Do we need a policy on AI-generated content

Comment over 2 years ago →
🇳🇴Norway gisle Norway
ChatGPT-generated posts are starting to appear on Drupal.org.

There have been several of these reported in issue queues, but they have been summarily deleted as spam.

This morning, I spotted three in the support forums:

https://www.drupal.org/forum/general/general-discussion/2023-05-22/can-i... →

https://www.drupal.org/forum/general/general-discussion/2023-05-04/how-c... →

https://www.drupal.org/forum/support/post-installation/2014-10-12/combin... →

To me, it looks like the quality of the answers it generates is below par. These answers are trash and provide no benefit to the community. It looks like people use AI-generated content to gain recognition and earn issue credits.

Should we sanction people that use AI to generate the content they post on Drupal.org?
Comment over 2 years ago →
🇧🇪Belgium BramDriesen Belgium 🇧🇪
I think we need a policy for such cases. I'm okay with it being used, for example, to improve the wording of a sentence. But plainly copy-pasting answers, as in the 3 links you posted, is a no-go for me.

Here is the policy which Stack Overflow (and all subs) are following: https://meta.stackoverflow.com/questions/421831/temporary-policy-chatgpt...

In a nutshell:

Overall, because the average rate of getting correct answers from ChatGPT is too low, the posting of answers created by ChatGPT is substantially harmful to the site and to users who are asking and looking for correct answers.
Comment over 2 years ago →
🇬🇧United Kingdom John_B London (UK), Worthing (UK), Innsbruck (Tirol)
Yes, a policy would be good. I can see no motive other than spamming for posting such stuff. Marking it as spam makes sense to me. Can we have an AI bot to test posts for ChatGPT? It usually seems to churn out material (intelligently?) cut and pasted from elsewhere, which should be identifiable.
Comment over 2 years ago →
🇳🇴Norway gisle Norway
It usually seems to churn out material (intelligently?) cut and pasted from elsewhere, which should be identifiable.

Unfortunately, it does not. Academia has used plagiarism checkers for years, and these will discover copy-pasted material quite reliably. But they don't work with ChatGPT. (Yes, I've tested it.)

GPT is an acronym for "Generative Pre-trained Transformer" and this type of AI works by predicting what word a human would use next when making conversation about a specific topic.

However, ChatGPT output has a certain "style", and I like to believe I've gotten rather good at recognizing it. But what generally characterises the output produced by ChatGPT is how bad it is. Unlike the rule-based AI that we pursued in the seventies, GPT-based AI has zero knowledge about the domain. It is just incredibly good at generating letter-perfect and plausible-sounding sentences.

Until there is some reliable automated tool for detecting it, we just have to rely on human moderation, just like we do with spam. (Well, we also use an automated spam detector called "Akismet" – but it gets it wrong too often, and human site moderators often need to sort out both false positives and false negatives.)

If we end up following Stack Overflow, and banning AI-generated posts, we may need to have a mechanism to report such posts (or just overload reporting on the present spam flag), and leave it to site moderators to remove the posts and impose sanctions.

For starters, I think posting AI-generated posts should block a user from getting the 'confirmed' user role.
Comment over 2 years ago →
🇳🇴Norway gisle Norway
Moved examples to issue summary.
Comment over 2 years ago →
🇳🇴Norway gisle Norway
Added another one to issue summary.
Comment over 2 years ago →
🇬🇧United Kingdom catch
New examples, some of these have thankfully been deleted or unpublished by gisle already:

https://www.drupal.org/project/drupal/issues/3366843#comment-15158138 📌 Fix tests broken on PHP 8.3 Closed: cannot reproduce
https://www.drupal.org/project/drupal/issues/3224941#comment-15155329 📌 Remove usage of setAccessible() when core requires PHP 8.1 Fixed

These were very long, obviously generated comments, with no relationship to the discussion going on. They could possibly in some cases look more plausible if they were the first comment on a new issue instead of a non-sequitur.

I'm not entirely sure this needs a specific policy yet - these were worthless spam, so we can just ban the spammer; exactly how the text was produced was secondary except for the speed and volume.
Comment over 2 years ago →
🇳🇴Norway gisle Norway
I'm not entirely sure this needs a specific policy yet - these were worthless spam, so we can just ban the spammer; exactly how the text was produced was secondary except for the speed and volume.

The standard definition of "spam" is that it is content posted for the purposes of advertising. A lot of the AI-generated content is spam, where a chunk of ChatGPT-generated content is simply used as a wrapper for one or more spam links, to make the post appear legitimate. These posts are unceremoniously deleted, and the poster instantly banned (re our current policy → on spam).

However, there's a lot of bad content posted on Drupal. Being bad does not make it spam according to the above definition. Some of it is posted with good intentions by community members who happen to lack knowledge, and some of it is AI-generated content posted to create an appearance of contributing to issues (perhaps to earn some recognition or some issue credit). If it is obviously wrong or off-topic, I use my site moderator privileges to unpublish or delete it (depending on how bad it is), but the site moderators need to tread carefully to avoid exercising censorship. I, for one, do not interfere with bad content posted with good intentions.

Classifying these posts as "worthless spam" is a policy. That's not our current policy on AI-generated content, and until there is some consensus that this should be our policy, the site moderators cannot ban these users.

It should also be noted that there is some enthusiasm for AI-generated content in the community; for example, see this related issue: 📌 Use ChatGPT for solving Drupal issues to increase rate of development Active . I don't share their enthusiasm, but how to treat such content and the users posting it is clearly not a cut-and-dried case.

Independent of this issue, Hestenet of the DA recently expanded our guidelines on Issue Etiquette with a section on AI Generated Content → . It states the following policy:

Disclose that AI was used in crafting the code or content.

Carefully review and test the output, to ensure it is relevant, and that it works.

Provide human intervention to correct inaccuracies, mistakes, or broken code.

Bulk use of AI when it is not relevant to an issue, provides broken or unusable code, or provides false information will likely result in a ban.

My opinion is that this is reasonable, and that we probably now have our policy.
Comment over 2 years ago →
🇬🇧United Kingdom catch
The standard definition of "spam" is that it is content posted for the purposes of advertising.

hmm I think that's an incomplete definition. Let's say someone really loves the book module. So they post 3,000 nearly identical messages on random core issues about how much they love the book module. They're not advertising anything except for how much they love the book module, but it's still spam - bulk messages posted indiscriminately and unsolicited. Another example would be someone posting exactly the same support question in 15 different slack channels within the space of a minute.

However, there's a lot of bad content posted on Drupal. Being bad does not make it spam according to the above definition.

This is true, but we've also had cases like someone won't fixing 50 random issues in a day which swiftly resulted in a ban. There is a type of behaviour that is this kind of bulk + low effort posting, which for me is best described by 'spamming'. I think it's fine if there's another term for it, just that's the one I associate with it.

The addition looks basically fine, but I worry that it's going to let through not incorrect but redundant, useless, and lengthy content with a disclaimer that it was generated by AI, which I also don't want to wade through. I guess that can still be covered by 'not relevant' and can always be tightened later.
Comment over 2 years ago →
🇳🇴Norway gisle Norway
hmm I think that's an incomplete definition. Let's say someone really loves the book module. So they post 3,000 nearly identical messages on random core issues about how much they love the book module. They're not advertising anything

IMHO, this is not spam, but instead:

Repetitive posting of trash content (test posts, inappropriate book pages, ...)

And they're going to be banned for doing that (re our current policy → on trash content).

but we've also had cases like someone won't fixing 50 random issues in a day which swiftly resulted in a ban.

There are people outside of the site moderator team that may decide to ban a user for disruptive behavior (I believe this includes the employees of the DA, some CWG members, some senior core committers). IIRC, this decision to ban this particular user was not made by the site moderators.

However, users are banned by site moderators for abuse of the credit system (Example: #3375893: Block user Harshita Mehna (Harshita mehna) for abuse of the issue queue. → ). This is based on the policy stated in the breakout box "Abuse of the credit system" in our guidance on "Attribution and credit system" →

Speaking as a site moderator, I prefer there to be clear and objective policies for deleting content and banning users. When we do this, we engage in acts of censorship. It is important that such acts are based on transparent and recognized policies, and not on the personal opinion or bias of the individual site moderator.

As for the lengthy or irrelevant missives typically produced by ChatGPT, I just noticed that our automatic anti-spam tool (Akismet) unpublished one of them, even though it contained no advertisement link. I've left it unpublished (instead of deleting it) to see whether its author is going to complain about it not being published.
Comment over 2 years ago →
🇮🇹Italy apaderno Brescia, 🇮🇹
Yes, "spamming" is also used for posting the same content multiple times, but we do not treat people doing that like spammers; otherwise, even the accounts used by people who post the same comment on issues that must be closed (like I do) would have to be blocked.

We are restrictive about what we call spam because spammers are blocked without warning, while we warn people who are doing something they should not do, like repeatedly closing old issues. The account could still be temporarily blocked, for example to stop that person from closing issues at a rate of 10 issues per minute, but we also send a message to let the person know what should not be done on drupal.org.

We need a policy because when people do something they should not do, we contact them and give a link to a page explaining what should not be done. I agree, it can be difficult to write a policy about AI-generated content, but that policy does not need to give too much detail about why we do not like AI-generated content in issue queues and other places; it could follow the example of the Drupal.org Terms of Service page, which does not say why shared accounts are not allowed to make commits in drupal.org repositories.
Comment over 2 years ago →
🇬🇧United Kingdom catch
Speaking as a site moderator, I prefer there to be clear and objective policies for deleting content and banning users.

OK I agree with this, there will be occasions where someone comes up with some new horrible behaviour that's not covered, but then it can at least be added retrospectively. The main question for me was whether there needed to be a specific policy for AI-generated content or whether it could be covered by a less-specific one encompassing the same kinds of content whether human or AI-generated. The 'trash' policy deals with some of that. I haven't had site moderation permissions since I resigned when a spyware module company was re-instated without discussion in about 2010 or similar, so a bit out of touch.
Comment over 2 years ago →
🇺🇸United States hestenet Portland, OR 🇺🇸
As @gisle noted above, I've made a very basic effort to start developing a policy for AI generated content:

In which I updated both:
https://www.drupal.org/docs/develop/issues/issue-procedures-and-etiquett... →
And:
https://www.drupal.org/drupalorg/docs/marketplace/abuse-of-the-contribut... →

This may be a bit tricky to write a policy for, because (at least for the moment) I've been leaning towards allowing AI generated content if and only if the use of AI is disclosed, and the user posting has reviewed and edited the material for accuracy.

But maybe that's most of what we need:

Users agree to do the above when using AI, or else the account will be temporarily suspended, and the user contacted with a link to the policy pages above.

Users will be unblocked when they have acknowledged the policy materials and agreed to follow them in future. After an additional violation they may be blocked permanently.

Thoughts on what else we should include?
Comment over 2 years ago →
🇺🇸United States volkswagenchick San Francisco Bay Area
@hestenet The two links you posted in comment #14 lead me to 404s.
Comment almost 2 years ago →
🇨🇦Canada deviantintegral
I think it would be great if the policy applied to content in the drupal.org Planet feed too. As written, the policies wouldn't prevent sites from using generated content if they so choose, but would require a baseline of human review, quality, and attribution that would increase the overall quality of content on the feed.
Comment almost 2 years ago →
🇱🇹Lithuania mindaugasd
AI is good at writing. @deviantintegral are there any existing (or predicted) AI-related problems within the Planet feed?
Comment almost 2 years ago →
🇨🇦Canada deviantintegral
It may be good at the structure of writing, but hallucinations are a challenge still.

There have been some articles over the past 6 months that have felt like AI-generated content, in that they were filled with generalizations and so on. But I'm not confident enough to link to them. I think a policy would get ahead of things, so that if sites start publishing problematic content there is a guideline to refer authors and publishers to.
Comment almost 2 years ago →
🇳🇴Norway gisle Norway
AI is terrible at writing. It is just terribly good at phrasing and grammar.

The problem with AI-generated content is not just the much talked about "hallucinations", but the verbosity and utter triviality of its utterances (which seem to be getting worse all the time as the AI companies try to get rid of the "hallucinations" by removing any specificity from the output they spew forth).

I think the problem with AI generated content is the same as the problem with spam. It is not bringing the world down, but having to deal with it is a big waste of time and resources – including my time. My vote is for muting any contributor, including Planet Drupal contributors, that posts AI generated content.
Comment almost 2 years ago →
🇳🇴Norway gisle Norway
Fixed broken link in issue summary.
Comment almost 2 years ago →
🇦🇹Austria ai-sidekick
In considering our approach to AI-generated content, the emphasis should be on the quality of the text rather than the means by which it was created.

It's important to recognize the distinction between AI that generates content independently and AI that is employed to articulate and polish a user's pre-existing ideas effectively. While the first can be problematic, the second is merely better grammar correction.

Just to prove my point, AI mostly wrote this based on: `Write a comment: Focus on text quality, not how created; Differentiate AI can be used to make up content, or to formulate users' ideas nicely - the second is good.` Having a first draft and then refining it is much easier for me than writing from scratch.

PS: If someone were actually able to provide correct and helpful comments with an LLM and RAG, that would be wonderful. I wasn't able to do this yet.
Comment almost 2 years ago →
🇧🇪Belgium BramDriesen Belgium 🇧🇪
You are correct, but I think the issue that started it all here was that some user was posting forum answers by just copy pasting the question and then posting the (very long) answer of ChatGPT or whatever he used.

Like you say, if it’s to help you write what you’re trying to say (grammar/wording) it’s fine to an extent that it isn’t noticeable. Just like your example.

The tools will become better and more difficult to detect anyway over time. I think we need to keep an eye on this and keep evaluating what is happening in the community.
Comment almost 2 years ago →
🇳🇴Norway jenniferhook12
Yeah, ChatGPT can be used for multiple purposes, including programming and writing. It summarizes everything you ask it. You name it and it's done.
Comment 11 months ago →
🇩🇰Denmark ressa Copenhagen
There have been a few veiled attempts at what looks like SEO link spam recently: posting 5-6 sort-of-correct answers, along with some wrong answers -- seemingly by an LLM such as ChatGPT -- and then finally a post with a link to a site:

https://www.drupal.org/u/monapeterson →

https://www.drupal.org/u/albert-s →

So I agree with the current policy of not allowing LLM generated content in the forums:

Because the average quality of answers generated by ChatGPT and other AI tools seldom makes these answers useful, posting of answers created by AI-tools is considered harmful to community and to users who are coming to Drupal.org seeking support. Therefore, you should not use AI to generate your answer to a support question, feature request or bug report.

https://www.drupal.org/docs/administering-a-drupal-site/troubleshooting-... →

Maybe this rule could be used more?

For instance, posting a link to your own website and asking how to convert it to Drupal is too generic (as it cannot be answered beyond "Hire a consultant") and will be routinely unpublished.

https://www.drupal.org/docs/administering-a-drupal-site/troubleshooting-... →

Drupal RAG

I agree @rolandschuetz, a functioning Drupal RAG providing precise, non-hallucinatory answers would be useful. Akansha Saxena is closest, I think, see Inside the Codebase: A Deep Dive Into Drupal Rag Integration. She's also on Planet Drupal, 📌 Add akanshasaxena.com to Planet Drupal Fixed .
Comment 6 months ago →
🇨🇦Canada Charlie ChX Negyesi 🍁Canada
There's a much worse problem: people now post AI-generated modules which were clearly never even read by a human. The cherry on the cake is that they have the audacity to opt such code into security coverage. I just reported a security hole in one such module, and I haven't even started looking seriously. These modules should be removed from security coverage and their publisher needs to lose their security coverage privileges IMO. I doubt we can stop the proliferation of them, because they will just post this garbage to GitHub otherwise, but no one should force the security team to deal with slop.
Comment 6 months ago →
🇬🇧United Kingdom catch
I tried to review an MR against experience builder recently, and it turned out the entire MR was LLM-generated and had not been reviewed by a human, none of this was disclosed in the issue summary. It wasted more than an hour of my time wondering wtf was going on. See ✨ Add automated image optimization to image component Active .

I think there needs to be a disclosure policy, and when that's repeatedly broken, issues and projects should be treated as spam.

When there's disclosure, and the project code is still un-reviewed slop, which is the situation that chx describes, that starts to feel like we need a new issue for the Technical Working Group (which is not properly functional)/DA/Security team. Maybe a new issue in https://www.drupal.org/project/issues/securitydrupalorg → to start with?

I have seen some truly awful Drupal modules posted in my time, but LLMs allow these to be written and posted much, much faster than they previously could be.
Comment 6 months ago →
🇸🇰Slovakia poker10
Linking a similar core issue.
Comment 6 months ago →
🇺🇸United States cmlara
First off: I'm aware of the user and modules referred to in #23, and agree they are poorly written (to the extent that I've added them to an internal list of modules not to utilize and maintainers to avoid).

These modules should be removed from security coverage and their publisher needs to lose their security coverage privileges IMO perhaps based on a strike system.

I am wary of this as an option. Responsible/Ethical/Coordinated disclosure is a key aspect of security. It would be one thing if D.O. did not use wording such as "Security issues do not need to be privately reported for the module_name project." With this wording, D.O. is actively encouraging public uncoordinated disclosure for those not 'opted in'.

Any action D.O. takes to remove the ability for maintainers to compel private disclosure is an affront to security. D.O. requiring a test and requiring maintainers to remain in the 'good blessings' of the Security Team would conflict with security industry norms.

To put this in another light, how would the community feel if I demanded the Core Team pass an arbitrary test that I personally create, with a drawn-out approval process of months and the ability to revoke based on a condition I decide, before I would cease disclosing vulnerabilities publicly? I presume the community would be very upset, especially in the case of a Drupalgeddon-level vulnerability, and yet that is the standard applied to contrib maintainers today that this suggests extending.

Whatever policy emerges regarding AI-generated code, compromising security privileges should not be part of the equation. Consider LLM code spam, consider LLM code unacceptable for commit due to copyright laws, consider it unwanted and target the user on those grounds; whatever is done, do not make a policy that would compromise the ability to request private disclosure.
Comment 6 months ago →
🇨🇦Canada Charlie ChX Negyesi 🍁Canada
My two cents: already the security team has the policy to mark unmaintained projects unsupported. Spraying a probabilistic series of PHP tokens into git is not maintainership.

But, it's up to the security team whether they share my concern of being overrun.
Comment 6 months ago →
🇬🇧United Kingdom catch
@cmlara I actually agree, in this specific case, revoking git access altogether would be better. I really do not like the 'opt in approval' process, although it's better than what it replaced, but we do need to be able to prevent people from bulk-publishing inherently flawed code on Drupal.org. I've opened 📌 Bulk LLM-generated module publishing by bigbabert Active .
Comment 6 months ago →
🇺🇸United States mradcliffe USA
In the new first-time contributor workshop slides that volkswagenchick developed we have a single slide titled "Use of AI" with the following scripted notes. This could be a good start for an official policy.

There is no doubt that artificial intelligence tools such as ChatGPT can be powerful ways to jumpstart code or content. However, AI systems still have significant flaws. Often times the code they produce is non-functional, and the content they create includes assertions or citations that are untrue.

When using AI in the course of making a contribution to Drupal, we require you to:

Disclose that AI was used in crafting the code or content.
Carefully review and test the output, to ensure it is relevant, and that it works.
Provide human intervention to correct inaccuracies, mistakes, or broken code.
Bulk use of AI when it is not relevant to an issue, provides broken or unusable code, or provides false information will likely result in a ban.

I am not sure why we mention it as required, but it probably should be changed to recommended for now pending policies.
Comment 6 months ago →
🇺🇸United States cmlara
I am not sure why we mention it as required, but it probably should be changed to recommended for now pending policies.

That appears to be a direct pull from the Issue Etiquette page .

Hestenet added basic policy in comment #14 to the Credit Abuse policy and added the Etiquette page text.

Discussion on this issue switched to “does the policy need changes” after that post.
Comment 6 months ago →
🇺🇸United States mradcliffe USA
cmlara, thank you. That jogged my memory. I was trying to find where it came from.
Comment 6 months ago →
🇺🇸United States cmlara
Updating title to better reflect the status of this issue after #14's creation of policy by D.A. staff.
Comment 6 months ago →
🇺🇸United States mradcliffe USA
I updated the issue summary with links from #14 and added the links to recent issues in the examples item list.
Comment 5 months ago →
🇺🇸United States mradcliffe USA
I added an example of a module porting exercise using ChatGPT, the use of which is disclosed as part of the merge request (with prompts included in the merge request).
