Do we need a policy on AI-generated content?

Created on 11 January 2023, almost 2 years ago
Updated 9 December 2023, 11 months ago

The much-talked-about ChatGPT tool is capable of generating very plausible text about a number of subjects, including programming. Its large language model has been trained on 45 terabytes of text data, presumably scraped from the World Wide Web, so if the "answer" is somewhere on the web, odds are good that ChatGPT will come up with it.

The well-respected website Stack Overflow has (temporarily?) banned answers generated by ChatGPT; see Temporary policy: Generative AI (e.g., ChatGPT) is banned.

The site moderators on Stack Overflow seem to think abuse of ChatGPT to generate answers is severe enough to ban it. I don't know if it will create problems on Drupal.org, but I wanted to post this to the site moderators' issue queue for general discussion about whether we need a policy in this area.

Examples:

🌱 Plan

Status: Active
Component: Policy
Created by: 🇳🇴 Norway gisle Norway


Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • 🇳🇴 Norway gisle Norway

    ChatGPT-generated posts are starting to appear on Drupal.org.

    There have been several of these reported in issue queues, but they have been summarily deleted as spam.

    This morning, I spotted three in the support forums:

    To me, it looks like the quality of the answers it generates is below par. These answers are trash and provide no benefit to the community. It looks like people use AI-generated content to gain recognition and earn issue credits.

    Should we sanction people who use AI to generate the content they post on Drupal.org?

  • 🇧🇪 Belgium BramDriesen Belgium 🇧🇪

    I think we need a policy for such cases. I'm okay with it being used to, for example, improve the wording of a sentence. But plainly copy-pasting answers, as in the 3 links you posted, is a no-go for me.

    Here is the policy which Stack Overflow (and all subs) are following: https://meta.stackoverflow.com/questions/421831/temporary-policy-chatgpt...

    In a nutshell:

    Overall, because the average rate of getting correct answers from ChatGPT is too low, the posting of answers created by ChatGPT is substantially harmful to the site and to users who are asking and looking for correct answers.

  • 🇬🇧 United Kingdom John_B London (UK), Worthing (UK), Innsbruck (Tirol)

    Yes, a policy would be good. I can see no motive other than spamming for posting such stuff. Marking it spam makes sense to me. Can we have an AI bot to test posts for ChatGPT? It usually seems to churn out material (intelligently?) cut and pasted from elsewhere, which should be identifiable.

  • 🇳🇴 Norway gisle Norway

    It usually seems to churn out material (intelligently?) cut and pasted from elsewhere, which should be identifiable.

    Unfortunately, it does not. Academia has used plagiarism checkers for years, and these will discover copy-pasted material quite reliably. But they don't work with ChatGPT. (Yes, I've tested it.)

    GPT is an acronym for "Generative Pre-trained Transformer" and this type of AI works by predicting what word a human would use next when making conversation about a specific topic.

    However, ChatGPT output has a certain "style", and I like to believe I've gotten rather good at recognizing it. But what generally characterises the output produced by ChatGPT is how bad it is. Unlike the rule-based AI that we pursued in the seventies, GPT-based AI has zero knowledge about the domain. It is just incredibly good at generating letter-perfect and plausible-sounding sentences.

    Until there is some reliable automated tool for detecting it, we just have to rely on human moderation, just like we do with spam. (Well, we also use an automated spam detector called "Akismet", but it gets it wrong too often, and human site moderators often need to sort out both false positives and false negatives.)

    If we end up following Stack Overflow, and banning AI-generated posts, we may need to have a mechanism to report such posts (or just overload reporting on the present spam flag), and leave it to site moderators to remove the posts and impose sanctions.

    For starters, I think posting AI-generated posts should block a user from getting the 'confirmed' user role.

  • 🇳🇴 Norway gisle Norway

    Moved examples to issue summary.

  • 🇳🇴 Norway gisle Norway

    Added another one to issue summary.

  • 🇬🇧 United Kingdom catch

    New examples, some of these have thankfully been deleted or unpublished by gisle already:

    https://www.drupal.org/project/drupal/issues/3366843#comment-15158138 📌 Fix tests broken on PHP 8.3 (Closed: cannot reproduce)
    https://www.drupal.org/project/drupal/issues/3224941#comment-15155329 📌 Remove usage of setAccessible() when core requires PHP 8.1 (Fixed)

    These were very long, obviously generated comments, with no relationship to the discussion going on. They could possibly in some cases look more plausible if they were the first comment on a new issue instead of a non-sequitur.

    I'm not entirely sure this needs a specific policy yet - these were worthless spam, so we can just ban the spammer; exactly how the text was produced was secondary except for the speed and volume.

  • 🇳🇴 Norway gisle Norway

    I'm not entirely sure this needs a specific policy yet - these were worthless spam, so we can just ban the spammer; exactly how the text was produced was secondary except for the speed and volume.

    The standard definition of "spam" is that it is content posted for the purposes of advertising. A lot of the AI-generated content is spam, where a chunk of ChatGPT-generated content is simply used as a wrapper for one or more spam links, to make the post appear legitimate. These posts are unceremoniously deleted, and the poster instantly banned (re our current policy on spam).

    However, there's a lot of bad content posted on Drupal. Being bad does not make it spam according to the above definition. Some of it is posted with good intentions by community members who happen to lack knowledge, and some of it is AI-generated content posted to create an appearance of contributing to issues (perhaps to earn some recognition or some issue credit). If it is obviously wrong or off-topic, I use my site moderator privileges to unpublish or delete it (depending on how bad it is), but the site moderators need to tread carefully to avoid exercising censorship. I, for one, do not interfere with bad content posted with good intentions.

    Classifying these posts as "worthless spam" is a policy. That's not our current policy on AI generated content, and until there is some consensus that this should be our policy, the site moderators cannot ban these users.

    It should also be noted that there is some enthusiasm for AI generated content in the community; for example, see this related issue: 📌 Use ChatGPT for solving Drupal issues to increase rate of development (Active). I don't share their enthusiasm, but how to treat such content and the users posting it is clearly not a cut-and-dried case.

    Independent of this issue, Hestenet of the DA recently expanded our guidelines on Issue Etiquette with a section on AI Generated Content. It states the following policy:

    • Disclose that AI was used in crafting the code or content.
    • Carefully review and test the output, to ensure it is relevant, and that it works
    • Provide human intervention to correct inaccuracies, mistakes, or broken code.
    • Bulk use of AI when it is not relevant to an issue, provides broken or unusable code, or provides false information will likely result in a ban.

    My opinion is that this is reasonable, and that we probably now have our policy.

  • 🇬🇧 United Kingdom catch

    The standard definition of "spam" is that it is content posted for the purposes of advertising.

    hmm I think that's an incomplete definition. Let's say someone really loves the book module. So they post 3,000 nearly identical messages on random core issues about how much they love the book module. They're not advertising anything except for how much they love the book module, but it's still spam - bulk messages posted indiscriminately and unsolicited. Another example would be someone posting exactly the same support question in 15 different Slack channels within the space of a minute.

    However, there's a lot of bad content posted on Drupal. Being bad does not make it spam according to the above definition.

    This is true, but we've also had cases like someone won't fixing 50 random issues in a day which swiftly resulted in a ban. There is a type of behaviour that is this kind of bulk + low effort posting, which for me is best described by 'spamming'. I think it's fine if there's another term for it, just that's the one I associate with it.

    The addition looks basically fine, but I worry that it's going to let through not incorrect but redundant, useless, and lengthy content with a disclaimer that it was generated by AI, which I also don't want to wade through. I guess that can still be covered by 'not relevant' and can always be tightened later.

  • 🇳🇴 Norway gisle Norway

    hmm I think that's an incomplete definition. Let's say someone really loves the book module. So they post 3,000 nearly identical messages on random core issues about how much they love the book module. They're not advertising anything

    IMHO, this is not spam, but instead:

    • Repetitive posting of trash content (test posts, inappropriate book pages, ...)

    And they're going to be banned for doing that (re our current policy on trash content).

    but we've also had cases like someone won't fixing 50 random issues in a day which swiftly resulted in a ban.

    There are people outside of the site moderator team that may decide to ban a user for disruptive behavior (I believe this includes the employees of the DA, some CWG members, some senior core committers). IIRC, this decision to ban this particular user was not made by the site moderators.

    However, users are banned by site moderators for abuse of the credit system (Example: #3375893: Block user Harshita Mehna (Harshita mehna) for abuse of the issue queue). This is based on the policy stated in the breakout box "Abuse of the credit system" in our guidance on "Attribution and credit system".

    Speaking as a site moderator, I prefer there to be clear and objective policies for deleting content and banning users. When we do this, we engage in acts of censorship. It is important that such acts are based on transparent and recognized policies, and not on the personal opinion or bias of the individual site moderator.

    As for the lengthy or not relevant missives typically produced by ChatGPT, I just noticed that our automatic anti-spam tool (Akismet) unpublished one of them, without it containing any advertisement link. I've left it unpublished (instead of deleting it) to see whether its author is going to complain about it not being published.

  • 🇮🇹 Italy apaderno Brescia, 🇮🇹

    Yes, spamming is also used for posting the same content multiple times, but we do not treat people doing that like spammers; otherwise, even the accounts used by people who post the same comment on issues that must be closed (like I do) would have to be blocked.

    We are restrictive about what we call spam because spammers are blocked without warning, while we warn people who are doing something they should not do, like repeatedly closing old issues. The account could still be temporarily blocked, for example to stop that person from closing issues at a rate of 10 issues per minute, but we also send a message to let the person know what should not be done on drupal.org.

    We need a policy because when people do something they should not do, we contact them and give a link to a page explaining what should not be done. I agree, it can be difficult to write a policy about AI-generated content, but that policy does not need to go into too much detail about why we do not like AI-generated content in issue queues and other places; it could do as the Drupal.org Terms of Service page does, which does not say why shared accounts are not allowed to make commits in drupal.org repositories.

  • 🇬🇧 United Kingdom catch

    Speaking as a site moderator, I prefer there to be clear and objective policies for deleting content and banning users.

    OK, I agree with this. There will be occasions where someone comes up with some new horrible behaviour that's not covered, but then it can at least be added retrospectively. The main question for me was whether there needed to be a specific policy for AI-generated content or whether it could be covered by a less-specific one encompassing the same kinds of content whether human or AI-generated. The 'trash' policy deals with some of that. I haven't had site moderation permissions since I resigned when a spyware module company was re-instated without discussion in about 2010 or similar, so I'm a bit out of touch.

  • 🇺🇸 United States hestenet Portland, OR 🇺🇸

    As @gisle noted above, I've made a very basic effort to start developing a policy for AI generated content, for which I updated both:

    https://www.drupal.org/docs/develop/issues/issue-procedures-and-etiquett...
    And:
    https://www.drupal.org/drupalorg/docs/marketplace/abuse-of-the-contribut...

    This may be a bit tricky to write a policy for, because (at least for the moment) I've been leaning towards allowing AI generated content if and only if the use of AI is disclosed, and the user posting has reviewed and edited the material for accuracy.

    But maybe that's most of what we need:

    Users agree to do the above when using AI, or else the account will be temporarily suspended, and the user contacted with a link to the policy pages above.

    Users will be unblocked when they have acknowledged the policy materials and agreed to follow them in future. After an additional violation they may be blocked permanently.

    Thoughts on what else we should include?

  • 🇺🇸 United States volkswagenchick San Francisco Bay Area

    @hestenet The two links you posted in comment #14 lead me to 404s.

  • 🇨🇦 Canada deviantintegral

    I think it would be great if the policy would apply to content in the drupal.org planet feed too. As written, the policies wouldn't prevent sites from using generated content if they so choose, but would require a baseline of human review, quality, and attribution that would increase the overall quality of content on the feed.

  • 🇱🇹 Lithuania mindaugasd

    AI is good at writing. @deviantintegral are there any existing (or predicted) AI-related problems within the Planet feed?

  • 🇨🇦 Canada deviantintegral

    It may be good at the structure of writing, but hallucinations are a challenge still.

    There have been some articles over the past 6 months that have felt like AI-generated content in that they were filled with generalizations and so on. But I'm not confident enough to link to them. I think a policy would get ahead of things so that if sites start publishing problematic content, there is a guideline to refer authors and publishers to.

  • 🇳🇴 Norway gisle Norway

    AI is terrible at writing. It is just terribly good at phrasing and grammar.

    The problem with AI generated content is not just the much-talked-about "hallucinations", but the verbosity and utter triviality of its utterances (which seem to be getting worse all the time as the AI companies try to get rid of the "hallucinations" by removing any specificity from the output they spew forth).

    I think the problem with AI generated content is the same as the problem with spam. It is not bringing the world down, but having to deal with it is a big waste of time and resources, including my time. My vote is for muting any contributor, including Planet Drupal contributors, that posts AI generated content.

  • 🇳🇴 Norway gisle Norway

    Fixed broken link in issue summary.

  • 🇦🇹 Austria ai-sidekick

    In considering our approach to AI-generated content, the emphasis should be on the quality of the text rather than the means by which it was created.

    It's important to recognize the distinction between AI that generates content independently and AI that is employed to articulate and polish a user's pre-existing ideas effectively. While the first can be problematic, the second is merely better grammar correction.

    Just to prove my point, AI mostly wrote this based on: `Write a comment: Focus on text quality, not how created; Differentiate AI can be used to make up content, or to formulate users' ideas nicely - the second is good.` Having a first draft and then refining it is much easier for me than writing from scratch.

    PS: If someone were actually able to provide correct and helpful comments with an LLM and RAG, that would be wonderful. I wasn't able to do this yet.

  • 🇧🇪 Belgium BramDriesen Belgium 🇧🇪

    You are correct, but I think the issue that started it all here was that some user was posting forum answers by just copy-pasting the question and then posting the (very long) answer of ChatGPT or whatever he used.

    Like you say, if it's to help you write what you're trying to say (grammar/wording), it's fine to the extent that it isn't noticeable. Just like your example.

    The tools will become better and more difficult to detect anyway over time. I think we need to keep an eye on this and keep evaluating what is happening in the community.

  • 🇳🇴 Norway jenniferhook12

    Yeah, ChatGPT can be used for multiple purposes, including programming and writing. It summarizes everything you ask it. You name it and it's done.
