[META] Create an AI Security module for custom moderation calls

Open on Drupal.org →

Created on 13 June 2024, about 1 year ago

Problem/Motivation

Some LLMs have no moderation calls prior to sending a prompt to an LLM. Whilst the standard moderation calls provided by the larger companies are being targetted by hackers constantly and may not offer adequet protection again things such as prompt injection.

This may cause issues due to prompt injection. Or individuals may want to prevent their users from asking even a self-hosted LLM potentially harmful prompts. Organisations may simply wish certain topics not be allowed via their chatbots.

Proposed resolution

Create some way in the core AI module for other modules to hook into events before a prompt is sent
Create a seperate more advanced AI security module for handling advanced cases including:
Ability to use more than one LLM provider's content moderation for any of the LLM prompts. (For example a self-hosted LLama 3 model might use both OpenAI and Athropic's content moderation before sending it to Llama3 via Ollama)
This could support using these moderation calls for other areas in Drupal, such as prior to comments being added to a blog.
Ability to create custom prompt rules into a moderation call. For example setting up a Chatgpt 3.5 Prompt that returns a Yes or No custom prompt could also look for evidence of other topics you don't want discussed. These can be arbitrary for example, no discussion of sports.
Ability to create "Prompt Injection" preventers. Create a prompt that is always asked to answer a secret deterministic question and to ignore the user prompt. If there is any prompt injection it will likely cause the LLM to provide a different answer (Need to research if this is a good idea)
Hook into third party services and libraries for prevent prompt injection or other malicious acts

As these are experimental, contain business logic and are quite heavy (potentially doubling the cost of any prompt). This should be a sub-module for now.

User interface changes

API changes

Data model changes

✨ Feature request

Status

Active

Version

1.0

Component

Code

Created by

🇬🇧United Kingdom yautja_cetanu

Live updates comments and jobs are added and updated live.

Sign in to follow issues

Comments & Activities

Issue created by @yautja_cetanu
Comment about 1 year ago →
🇱🇹Lithuania mindaugasd
Sounds like great idea 🙌🏻, security against LLM hacks is important.
Comment about 1 year ago →
🇩🇰Denmark ressa Copenhagen
Thanks for creating this issue @yautja_cetanu, it would be a nice feature to be able to block prompts based on some rules, before passing them on to a LLM.
Comment about 1 year ago →
solideogloria
Comment 11 months ago →
🇧🇪Belgium wouters_f Leuven
Traditional chatbot companies do this by having a layer in front of the communication to GPT.
They have a whitelist/blacklist kind of approach.

They first do intent detection (to find the forbidden/allowed events).
According to the output of the intent detection they will act accordingly.

I guess this is where the ai_external_moderation module comes into play?
https://git.drupalcode.org/project/ai/-/tree/1.0.x/modules/ai_external_m...
Comment about 1 month ago →
🇺🇸United States Kristen Pol Santa Cruz, CA, USA
We are doing some issue management housekeeping and adding/removing components.

We are removing the "Code" component and want people to categorize issues with the best module/submodule component.

Moving this issue to "...to be triaged".

See 📌 Update AI module project components Active for more details.
Comment 25 days ago →
🇩🇪Germany marcus_johansson
I think this will be mostly covered by Guardrails, but I'm moving it into the ideas section of Drupal AI Initiative anyway.

contrib.social Blog FAQ Discussions

Production build 0.71.5 2024