[policy, no patch] Make a policy for Machine Learning ("AI") code contributions

Created on 4 March 2023, almost 2 years ago
Updated 28 July 2023, over 1 year ago

Problem/Motivation

As AI/ML moves from hype and wishful thinking to real possibilities, it is time for us as a community to take a stand on code contributions that are (in part) not made by humans. Whether it is full modules created by an ML script, AI peer code reviews, or Copilot-like functionality, we need standards that make clear whether each of these is allowed.

ML in its current form has a number of drawbacks and ambiguities that make a policy necessary. Examples include bias around race, gender, or other forms of discrimination (also in code!) and a lack of clarity about who owns the code, and thus whether the contributor has the right to distribute it under a GPL license.

PS: It is not that I am against progress or afraid of ML changing the lives of individuals. I believe in the future, but I want to give direction to a desirable form of it.

PPS: Several open source companies and projects already have policies on this, such as my employer SUSE at https://opensource.suse.com/legal/policy ("AI pair programming must not be used. The legal constructs around AI pair programming with respect to licensing and potential violations are not resolved.").

Proposed resolution

Discuss, come up with a (temporary?) policy.

🌱 Plan
Status

Active

Version

11.0 🔥

Component
Base

Last updated about 10 hours ago

Created by

🇳🇱 Netherlands bertboerland


Comments & Activities

  • Issue created by @bertboerland
  • 🇳🇿 New Zealand quietone

    Updating according to the 'special issue titles' and 'tag guidelines'. https://www.drupal.org/docs/develop/issues/fields-and-other-parts-of-an-...

  • 🇮🇳 India Captain Arnab

    Captain Arnab made their first commit to this issue's fork.

  • @captain-arnab opened merge request.
  • 🇭🇺 Hungary Gábor Hojtsy Hungary

    Moving to 11.x as per https://www.drupal.org/about/core/blog/new-drupal-core-branching-scheme-..., but I think no title or tag update is needed.

  • 🇱🇹 Lithuania mindaugasd

    Copy of a message from @fgm in another issue:

    I have a related, but not entirely equivalent question, and that is about code contributed on d.o.: are modules allowed to include code generated by an AI?

    The issue is that the AIs insist that they provide no license for the content they generate and have no copyright assertions, and I don't think there already is a legal doctrine around this.

    On the one hand, creations not by a human are deemed acts of nature (Naruto vs Slater was settled out of court, but the US copyright office stated in 2014 that "Only works created by a human can be copyrighted under United States law") which means a dev using such code should gain the copyright.

    But on the other hand, in 2020, in the Uniloc USA, Inc. v. Google LLC case, such code was found not to be copyrightable, meaning such code could not be placed by the contributor under GPL-2.0+, as all Drupal code must be, for lack of copyright.

    And that's without even considering the potential for infringement from existing code regurgitated by the AI.

  • 🇫🇷 France fgm Paris, France

    Thanks @mindaugasd. The reason why that Uniloc/Google case is so perplexing is that, in essence, it appears to a non-lawyer to be saying that because the code has no copyright when it is emitted by the AI, one cannot take it freely and attach a license to it. Yet ever since I started tracking IP laws in the 80s (eh...), I've been under the impression that anything in PD was available to anyone for any use, including relicensing. The obvious limitation is that other parties could copy the same code for free because it was also available from PD, regardless of the extra licensing offered by any source.

    But here, the reasoning seems to be that such code would not be copyrightable matter, which is slightly different from being public domain.

  • 🇱🇹 Lithuania mindaugasd

    A new policy was added to the documentation:
    https://www.drupal.org/docs/develop/issues/issue-procedures-and-etiquett...

    AI Generated Content

    There is no doubt that artificial intelligence tools such as ChatGPT can be powerful ways to jumpstart code or content. However, AI systems still have significant flaws. Often times the code they produce is non-functional, and the content they create includes assertions or citations that are untrue.

    When using AI in the course of making a contribution to Drupal, we require you to:

    • Disclose that AI was used in crafting the code or content.
    • Carefully review and test the output, to ensure it is relevant, and that it works.
    • Provide human intervention to correct inaccuracies, mistakes, or broken code.

    Bulk use of AI when it is not relevant to an issue, provides broken or unusable code, or provides false information will likely result in a ban.

  • 🇫🇷 France fgm Paris, France

    Don't we want to worry about the fact that the "generated" code may actually be copied from non-free code, introducing copyright violations into our code base? That seems more worrisome than broken code, which should be caught by tests anyway.

  • 🇱🇹 Lithuania mindaugasd

    @fgm code is unlikely to be copyrighted, because:

    • AI learned from code on the internet. Most of it is likely to be open source. Copyrighted code on the internet is not easy to find (I guess). And there is little incentive for OpenAI to train AI on copyrighted code, because avoiding it avoids issues and there is plenty of open source code to train on instead.
    • GPT-4 writes general patterns of coding, which it learned through many examples. GPT-4 is not only a generation engine, but it is smart, as studies are showing. For example, 90% of AI developer assistant code was written by GPT-4. I did it through a conversation: I ask the AI questions to evaluate the code, and I also ask it to improve the code until it looks flawless. For example, I ask "Please tell what we could improve", and then "Ok, great. Let's do that" :) And in practice, there is no copyrighted code, because it is original code generated in many iterations following general Drupal coding patterns.
    • If, in the end, GPT-4 generated a line of original copyrighted (not Drupal) code, then I think the developer could spot it and delete it from the codebase.
  • 🇦🇺 Australia darvanen Sydney, Australia

    Just a little context around #7: the problem is not so much broken code but a flood of half-hearted attempts at contribution becoming a large burden for reviewers. The addition was made in order to have something to point to when attempting to educate people whose efforts appear to be more skewed towards gaming the credit system than actually contributing. See #contribution-recognition-feedback channel in Slack for more.

  • 🇺🇸 United States dww

    Interesting topics, thanks for opening an issue about this. Removing credit from @Captain Arnab since this is a policy issue and there will be "no patch" (or MR). Sadly, I don't have perms to close https://git.drupalcode.org/project/drupal/-/merge_requests/3726.

  • 🇳🇿 New Zealand quietone

    I closed the MR.

  • 🇱🇹 Lithuania mindaugasd

    Wow, #contribution-recognition-feedback has some very interesting cases.
    They could be bots, and it will be increasingly difficult to tell.
    An interesting talk about this (broader, but still relevant): AI and the future of humanity | Yuval Noah Harari at the Frontiers Forum

  • 🇱🇹 Lithuania mindaugasd