I see this as an opinionated, somewhat intrusive, one-sided stance on artificial intelligence as a whole.
The majority of public-facing websites exist to share content, not to make a profit. Most of the people who object to their content ending up in a model are a minority of powerful, profit-seeking content creators who stand to lose from the progress of technology, just as they did when the internet displaced print.
Although not an apples-to-apples comparison, this is like blocking the Internet Archive bot from your site because you might someday post content you want removed.
Besides, weak barriers like this one create a market for subversion. People will use lesser-known models to bypass the blacklist, and with open-source models like LLaMA and the many others approaching GPT-4 in capability, these non-commercial options are nearly as powerful as the popular ones, and workarounds will be found to deploy them. You will end up chasing a constantly growing list of bots.
I say still include the code, but commented out. Those who care about their content being indexed by AI should have the resources to uncomment it if they really need to.
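For illustration, here is a minimal sketch of what shipping the rules disabled could look like in robots.txt (GPTBot is OpenAI's documented crawler; whether this change targets it or other bots is an assumption on my part):

    # Uncomment the lines below to opt out of AI crawlers:
    # User-agent: GPTBot
    # Disallow: /

Site owners who want the block simply remove the leading # characters; everyone else is unaffected by default.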