Make a policy for Machine Learning ("AI") code contributions

Issue created by @bertboerland
Comment over 2 years ago
cilefen
Comment over 2 years ago
🇳🇿New Zealand quietone
Updating according to the 'special issue titles' and 'tag guidelines'. https://www.drupal.org/docs/develop/issues/fields-and-other-parts-of-an-...
Comment over 2 years ago
🇮🇳India Captain Arnab
Captain Arnab made their first commit to this issue’s fork.
@captain-arnab opened a merge request.
Comment over 2 years ago
🇭🇺Hungary Gábor Hojtsy Hungary
Moving to 11.x as per https://www.drupal.org/about/core/blog/new-drupal-core-branching-scheme-..., but I think no title or tag update is needed.
Comment about 2 years ago
🇱🇹Lithuania mindaugasd
Copying @fgm's message from another issue:

I have a related, but not entirely equivalent, question about code contributed on d.o.: are modules allowed to include code generated by an AI?

The issue is that the AIs insist that they provide no license for the content they generate and make no copyright assertions, and I don’t think there is an established legal doctrine around this yet.

On the one hand, creations not made by a human are deemed acts of nature (Naruto v. Slater was settled out of court, but the US Copyright Office stated in 2014 that “Only works created by a human can be copyrighted under United States law”), which means a dev using such code should gain the copyright.

But on the other hand, in 2020, in the Uniloc USA, Inc. v. Google LLC case, such code was found not to be copyrightable, meaning the contributor could not place it under GPL-2.0+, as all Drupal code must be, for lack of copyright.

And that’s without even considering the potential for infringement from existing code regurgitated by the AI.
Comment about 2 years ago
🇫🇷France fgm Paris, France
Thanks @mindaugasd. The reason the Uniloc/Google case is so perplexing is that, to a non-lawyer, it appears to be saying that because the code has no copyright when it is emitted by the AI, one cannot take it freely and attach a license to it. Yet ever since I started tracking IP laws in the 80s (eh...), I've been under the impression that anything in the public domain (PD) was available to anyone for any use, including relicensing, the obvious limitation being that other parties could copy the same code for free because it was also available from the PD, regardless of the extra licensing offered by any source.

But here, the reasoning seems to be that such code would not be copyrightable matter, which is slightly different from being public domain.
Comment about 2 years ago
🇱🇹Lithuania mindaugasd
A new policy was added to the documentation:
https://www.drupal.org/docs/develop/issues/issue-procedures-and-etiquett...

AI Generated Content

There is no doubt that artificial intelligence tools such as ChatGPT can be powerful ways to jumpstart code or content. However, AI systems still have significant flaws. Often times the code they produce is non-functional, and the content they create includes assertions or citations that are untrue.

When using AI in the course of making a contribution to Drupal, we require you to:

Disclose that AI was used in crafting the code or content.

Carefully review and test the output, to ensure it is relevant, and that it works.

Provide human intervention to correct inaccuracies, mistakes, or broken code.

Bulk use of AI when it is not relevant to an issue, provides broken or unusable code, or provides false information will likely result in a ban.
Comment about 2 years ago
🇫🇷France fgm Paris, France
Shouldn't we worry about the fact that the "generated" code may actually be copied from non-free code, introducing copyright violations into our codebase? That seems more worrisome than broken code, which should be caught by tests anyway.
Comment about 2 years ago
🇱🇹Lithuania mindaugasd
@fgm, the code is unlikely to be copyrighted, because:

AI learned from code on the internet, and most of it is likely to be open source. Copyrighted code is not easy to find on the internet (I guess). And OpenAI has little incentive to train its AI on copyrighted code and risk issues, because there is plenty of open source code to train on instead.

GPT-4 writes general coding patterns, which it learned through many examples. GPT-4 is not only a generation engine; it is also smart, as studies are showing. For example, 90% of the AI developer assistant code was written by GPT-4. I did it through a conversation: I ask the AI questions to evaluate the code, and I also ask it to improve the code until it looks flawless. For example, I ask "Please tell what we could improve", and then "Ok, great. Let's do that" :) So in practice, there is no copyrighted code, because it is original code generated over many iterations following general Drupal coding patterns.

If, in the end, GPT-4 did generate a line of original copyrighted (non-Drupal) code, I think the developer could spot it and delete it from the codebase.
Comment about 2 years ago
🇦🇺Australia darvanen Sydney, Australia
Just a little context around #7: the problem is not so much broken code as a flood of half-hearted contribution attempts that become a large burden for reviewers. The addition was made so that there is something to point to when educating people whose efforts appear to be skewed more towards gaming the credit system than actually contributing. See the #contribution-recognition-feedback channel in Slack for more.
Comment about 2 years ago
🇺🇸United States dww
Interesting topics, thanks for opening an issue about this. Removing credit from @Captain Arnab since this is a policy issue and there will be "no patch" (or MR). Sadly, I don't have perms to close https://git.drupalcode.org/project/drupal/-/merge_requests/3726.
Comment about 2 years ago
System Message
quietone closed merge request !3726
Comment about 2 years ago
🇳🇿New Zealand quietone
I closed the MR.
Comment about 2 years ago
🇱🇹Lithuania mindaugasd
Wow, #contribution-recognition-feedback has some very interesting cases.
Some of them could be bots, and it will be increasingly difficult to tell.
An interesting talk about this (broader, but still relevant): AI and the future of humanity | Yuval Noah Harari at the Frontiers Forum
Comment about 2 years ago
🇱🇹Lithuania mindaugasd

When this problem grows big enough, authentication may need to be strengthened (e.g. phone verification or a money payment).

In the longer term, people are thinking about solutions, for example: OpenAI's Sam Altman launches Worldcoin crypto project. "The project's core offering is its World ID, which the company describes as a 'digital passport' to prove that its holder is a real human, not an AI bot."

AI will also counter AI, e.g. #3377394: Content moderation AI copilot
Status changed to Closed: outdated 4 months ago (1:31am 23 April 2025)
Comment 4 months ago
🇳🇿New Zealand quietone
The AI-Generated Content section of the Issue etiquette document was added after this issue was opened. Having that policy statement fulfills the proposed resolution of this issue. Therefore, I think this can be closed.

Policy and documentation evolve, so if the existing statement needs adjusting, I suggest opening a new issue to focus on that.
Comment about 2 months ago
🇳🇿New Zealand quietone
Changing the version to the latest one at the time this was closed.