[META] Create an AI Security module for custom moderation calls

Created on 13 June 2024, 5 months ago
Updated 11 September 2024, 2 months ago

Problem/Motivation

Some LLMs have no moderation calls prior to sending a prompt to an LLM. Whilst the standard moderation calls provided by the larger companies are being targeted by hackers constantly and may not offer adequate protection again things such as prompt injection.

This may cause issues due to prompt injection. Or individuals may want to prevent their users from asking even a self-hosted LLM potentially harmful prompts. Organisations may simply wish certain topics not be allowed via their chatbots.

Proposed resolution

  • Create some way in the core AI module for other modules to hook into events before a prompt is sent
  • Create a seperate more advanced AI security module for handling advanced cases including:
  • Ability to use more than one LLM provider's content moderation for any of the LLM prompts. (For example a self-hosted LLama 3 model might use both OpenAI and Athropic's content moderation before sending it to Llama3 via Ollama)
  • This could support using these moderation calls for other areas in Drupal, such as prior to comments being added to a blog.
  • Ability to create custom prompt rules into a moderation call. For example setting up a Chatgpt 3.5 Prompt that returns a Yes or No custom prompt could also look for evidence of other topics you don't want discussed. These can be arbitrary for example, no discussion of sports.
  • Ability to create "Prompt Injection" preventers. Create a prompt that is always asked to answer a secret deterministic question and to ignore the user prompt. If there is any prompt injection it will likely cause the LLM to provide a different answer (Need to research if this is a good idea)
  • Hook into third party services and libraries for prevent prompt injection or other malicious acts

As these are experimental, contain business logic and are quite heavy (potentially doubling the cost of any prompt). This should be a sub-module for now.

User interface changes

API changes

Data model changes

Feature request
Status

Active

Version

1.0

Component

Code

Created by

🇬🇧United Kingdom yautja_cetanu

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Comments & Activities

Production build 0.71.5 2024