AI - KR3 - Making experimenting feel safe.

Created on 12 November 2024

Problem/Motivation

KR3 - Add an undo feature that lets marketers reverse agent actions: design and evaluate a UX for rollback functionality, and evaluate how much of Drupal's Workspaces functionality we can reuse. Goal: end-users can feel safe experimenting with AI, and we can take a bit more risk with AI agent development.

AI will probably not get every request 100% correct. We have a number of ways we can try to mitigate this:

Proposed resolution

  • Workspaces - Make use of core Drupal Workspaces so that the AI makes its changes in a dedicated workspace, and the user can check everything and publish when they are happy, or roll back to a previous version.
  • Have an "Undo" button next to something the AI has done. (Complicated to show exactly what is being undone when a series of actions has been taken.)
  • Provide logs and details of the specific actions the AI has taken in code, so that Drupal reports exactly what has happened. (Unlikely to make sense to the end-user.)
  • Provide verbose, AI-generated descriptions of what the AI has done, including links so the end-user can check everything. (Possible chance of hallucination.)
  • Create a step where the AI describes what it will do in plain English, and you have to say "Yes" or even click a button before it implements that step. (Again, confusing when multiple steps are required.)
  • Blueprints - Provide a step where the AI shows you the exact JSON/YAML it has generated to tell Drupal what to do, which must be accepted before Drupal implements it. Much safer, but unlikely to make sense to an end-user.
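The blueprint option can be pictured as a two-phase "plan, then apply" flow: the AI emits a declarative plan, the user sees a plain rendering of it, and nothing runs until it is approved. The sketch below is purely illustrative; the blueprint shape and all function names (`blueprint`, `describe`, `apply_if_approved`) are assumptions, not an actual module format.

```python
# Hypothetical blueprint sketch: a declarative plan the user must approve
# before anything is applied. Shapes and names are illustrative only.

blueprint = {
    "actions": [
        {"op": "create", "type": "vocabulary", "label": "Price Ranges"},
        {"op": "create", "type": "taxonomy_term", "label": "Cheap"},
    ]
}

def describe(plan: dict) -> str:
    """Plain rendering shown to the user before anything runs."""
    return "\n".join(
        f"- {a['op']} {a['type']} '{a['label']}'" for a in plan["actions"]
    )

def apply_if_approved(plan: dict, approved: bool) -> list:
    """Apply the plan only after explicit approval; otherwise do nothing."""
    if not approved:
        return []
    # In reality, each action would call the relevant Drupal API here.
    return plan["actions"]

plan_text = describe(blueprint)
```

The key property is that the rendered plan and the applied actions come from the same data structure, so what the user approves is exactly what runs.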

One issue is personas: logs of what the code is doing are likely to appeal to developers, but they may be too confusing for the ambitious site builder or marketer.

πŸ“Œ Task
Status

Active

Component

Track: AI

Created by

πŸ‡¬πŸ‡§United Kingdom yautja_cetanu


Comments & Activities

  • Issue created by @yautja_cetanu
  • πŸ‡¬πŸ‡§United Kingdom yautja_cetanu

    A couple of things from our initial investigations:

    1. Workspaces looks good, but it doesn't yet support field config, which is essential, so it's unlikely to be ready for RC1.

    Below is an AI Agent trying to add something to a vocab.

    It shows up as not yet pushed.

    2. We have verbose responses with links and details.

    Here it is at a medium level of verbosity. It tells you the category, but it doesn't list literally every term it's going to create. It does give you links to places where you can check the work.

    Below is a details dropdown. It shows a log of the actual things Drupal has done. This log is generated by code, not by an LLM, and is therefore completely accurate. However, it's what Drupal needs to know; it doesn't necessarily make sense to an end-user. We can at least improve it by printing labels, but it will still be fairly techy.

  • πŸ‡©πŸ‡ͺGermany marcus_johansson

    First of all, the Workspaces module + WSE Config is really cool. However, I got stuck on most config entities: it currently won't let me save them into a workspace, which is how things are supposed to work as far as I could see. With Views it works.

    I could have the chatbot switch over to its own workspace any time it takes an action, though currently I had to publish the workspace manually. But it's some work to get something useful there, both on our side and, I'm guessing, on WSE's side.

    In the Agent Interface, we have added the ability to fetch all content and config creations and changes as IDs. This was not originally meant for the chatbot UI, but we have added a details element under the chat answer, which can be toggled in the chatbot settings, where you can see the actual changes made by the chatbot's last response.

    What has also been added, so far in the taxonomy and fields agents, is that they return links for every part of the system they change. This means the chatbot can be prompted to return these links, which I have updated it to do in the Drupal CMS MR. So any beginner to Drupal will find links in the response where they can verify and see where changes were made.

    See screenshot below for example of both changes:

    Currently the agents are hardcoded not to do any deletions. The question is whether we should also extend this to edits of config that has a substantial number (10+) of content items connected to it. This would ensure you have a tool that can help you with the initial building, but that can't make any changes that aren't revertible without a database backup or Workspaces.
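    The guard described here could look something like the sketch below. The function name, action vocabulary, and the reading of the 10-item threshold are all assumptions for illustration, not the actual agent code.

```python
# Hypothetical guard: block deletions outright, and block config edits once
# a "substantial" amount of content (10+ items) depends on the config.
# Names and threshold interpretation are illustrative, not the real API.

SUBSTANTIAL_CONTENT_THRESHOLD = 10  # assumed cut-off from the comment above

def is_action_allowed(action: str, connected_content_count: int = 0) -> bool:
    """Return True if the agent may perform this action."""
    if action == "delete":
        return False  # agents are hardcoded never to delete
    if action == "edit_config":
        # Editing config with many content items attached is hard to
        # revert without a backup or Workspaces, so refuse it.
        return connected_content_count < SUBSTANTIAL_CONTENT_THRESHOLD
    return True  # creations can be reversed manually without data loss
```

Under this rule the agent stays useful for initial site building while refusing the operations that would need a backup to recover from.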

  • πŸ‡¬πŸ‡§United Kingdom yautja_cetanu

    - We should have more detail in the details element. If we're changing the name from X to Y, it should say that. If it's creating taxonomy terms, it should show the term itself, not just the ID.

  • πŸ‡©πŸ‡ͺGermany marcus_johansson

    From the fixed tickets, there is now a fixed structure for how to add original vs. changed config and content.

    Diff on config entities:

    Labels on content entities:

    Later I will see if I can enforce this via an interface, so anyone building agents has to follow it. It would also help with implementing a revert of the last agent run(s).

  • πŸ‡¬πŸ‡§United Kingdom yautja_cetanu

    For Drupal CMS v1 we want to avoid anything "destructive", which is defined as "any change that permanently deletes content we can't get back".

    - We don't allow deleting any content, deleting any terms, or deleting a field. The AI can help the end-user do this manually.
    - We have checked removing "Alt Text" from an image field, and this doesn't delete anything: the alt text is still there, and when it's re-enabled the field comes back.
    - We want to test changing multiple-value cardinality: if a field has a cardinality of 3, content uses all 3 values, and the cardinality is then changed to 1, what happens? (For Drupal CMS v1 we might make it so the AI simply can't change cardinality, or only allow it to increase it.)

    In the future I think I'd like to allow destructive changes when there is only a single piece of content, as it's sometimes helpful when building things to have an example piece of content you don't mind getting destroyed.
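    The cardinality rule floated above (only allow the field to get wider, never narrower) might be sketched as follows. This is a hypothetical helper, not real agent code; it only assumes Drupal's convention of -1 meaning unlimited cardinality.

```python
# Illustrative sketch: only allow the agent to raise a field's cardinality,
# never lower it, since lowering it below what existing content uses could
# silently drop values. Names are hypothetical, not the real agent code.

UNLIMITED = -1  # Drupal uses -1 for unlimited cardinality

def cardinality_change_allowed(current: int, proposed: int) -> bool:
    """Allow changes that only ever widen the number of allowed values."""
    if proposed == UNLIMITED:
        return True             # widening to unlimited is always safe
    if current == UNLIMITED:
        return False            # any finite value would narrow "unlimited"
    return proposed >= current  # otherwise only allow increases
```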

  • πŸ‡¬πŸ‡§United Kingdom yautja_cetanu

    Decisions:

    • We are happy with the AI generating the Review Stage before it does something and the Recap Summary after it does something. This means there is a chance of hallucinations, but things will be more readable.
    • We will turn off "Details" for now. The deterministic logs of actions will instead be stored in the logs, not directly available to the chatbot user.
    • We have decided that the "Review Stage" and "Recap" SHOULD use accurate Drupal terminology. However, if the end-user uses less exact terminology (such as "categories"), the AI will explain to the user how and why Drupal uses the specific systems it does.
    • The Recap stage should have links to everything so the end-user can verify everything themselves.
    • We have decided that for Drupal CMS 1.0 we will expect the end-user to perform undos manually, with the AI helping them do it for themselves. If the undo doesn't involve deleting things, the AI can perform a new action that reverses the previous action, but it is not technically an "undo".
    • Attached is an example of a workflow everyone really likes. We need to edit the instructions so Drupal outputs the model workflow below without the user having to type those instructions in to force it.
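    The "new action that reverses the previous action" decision amounts to compensating actions rather than a true undo. A minimal sketch, with entirely hypothetical action shapes and names:

```python
# Sketch of compensating actions: derive a reversing action from a recorded
# one instead of a true undo. Action shapes and names are hypothetical.
from typing import Optional

def compensating_action(action: dict) -> Optional[dict]:
    """Return an action that reverses `action`, or None when reversal
    would require a deletion (which the agents are not allowed to do)."""
    if action["op"] == "rename":
        # Renaming back is safe and loses nothing.
        return {"op": "rename", "type": action["type"],
                "from": action["to"], "to": action["from"]}
    if action["op"] == "create":
        return None  # reversing a create means deleting; leave to the user
    return None
```

Anything that maps to None is exactly the case where, per the decision above, the AI helps the user do the undo manually instead.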

    Post Drupal CMS v1.0

    • We want to work on an approach for Review and Recap that is more accurate. For example, we may create forms for creating hundreds of taxonomy terms that have a single purpose and an undo. We may use Workspaces. We may extend Experience Builder to forms and use its tools. We may also take the YAML file generated by "blueprints" that the AI follows and design a UI around that.
    • Explore approaches to undo, remembering that multiple people can work on the site at one time: if the AI creates 15 taxonomy terms, someone else could delete one before the chatbot user says undo. We need to explore that. (With Experience Builder or Views these issues are sidestepped, as they are handled by the editing framework.)

    Model Conversation:

    - Below is a picture that is formatted nicely, and text that can be copied and pasted but isn't formatted nicely.

    AI: Hello! How can I help you today?

    User:
    I would like to categorize the events into terms of price ranges.

    Can you do that for me?
    Before you do anything, please explain exactly what you will be doing. If you use Drupal terms, can you explain what those terms mean in terms of the language I have chosen to use?

    AI:
    Certainly! Here's what I plan to do to categorize the events into terms of price ranges:
    1. Create a Vocabulary: In Drupal, a "taxonomy" is used to categorize content. I'll create a new vocabulary named "Price Ranges". A vocabulary in Drupal is similar to a category type, where you can add terms/options (like Cheap, Moderate, Expensive) that can be selected to categorize content.
    2. Add Terms to the Vocabulary: I'll populate the "Price Ranges" vocabulary with terms that represent different price levels (e.g., Cheap, Moderate, Expensive).
    3. Add a Field to Events: Once the vocabulary is created, I'll attach it to the "Events" content type. This means I'll add a field to the Event content type that allows the person creating or editing an event to select one or more of the price range terms.
    Would you like me to go ahead and set this up for you?

    User:
    Yes please and when you do it please give me a review step where you explain exactly what you've done and give me links to all the places where you've created things so I can check it myself.

    AI:
    I have set up the categorization for the events based on your specified price ranges. Here's what I did:
    1. Created a Vocabulary:
    β—‹ A new vocabulary called "Price Ranges" was created to categorize events based on price.
    β—‹ You can review this vocabulary here: Price Ranges Vocabulary.
    2. Added Terms to the Vocabulary:
    β—‹ I added three terms to the "Price Ranges" vocabulary: Cheap, Moderate, and Expensive.
    β—‹ You can view these terms here: Price Range Terms.
    3. Added a Field to Events:
    β—‹ I attached the "Price Ranges" vocabulary to the Events content type by creating a new field.
    β—‹ This field allows event creators to select one of the price range terms.
    β—‹ You can review the field settings here: Event Fields and check the specific configuration here: Field Configuration.
    Feel free to explore these links and let me know if you want any changes or need further assistance!

  • πŸ‡¬πŸ‡§United Kingdom catch

    We are happy with AI generating the Review Stage before it does something and the Recap Summary after it does something. This means there is a chance of hallucinations but it means things will be more readable.

    Testing on the original AI recipe issue showed that this is an inherently broken approach. For example, the AI told someone it had created a vocabulary with 20 terms when it had instead created a vocabulary with a single term labelled '20 terms'. This was found by a single person doing testing, so it seems highly likely it will be a common outcome.

    It is not hard to produce a human-readable summary of changes made to a Drupal site; it just needs a list of 'created x type with y label' type messages. It might not be as fluid, but it won't misrepresent anything.
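    A deterministic summary of that kind can be as simple as the following sketch, built straight from the recorded actions with no LLM in the loop. The action-record shape here is assumed, not the actual module's data structure.

```python
# Minimal deterministic summary: render each recorded action as a plain,
# accurate "created x type with y label" sentence. The record shape is
# an assumption for illustration.

def summarize_actions(actions: list) -> list:
    """Render each recorded action as a plain, accurate sentence."""
    verbs = {"create": "Created", "update": "Updated"}
    lines = []
    for a in actions:
        verb = verbs.get(a["op"], a["op"].capitalize())
        lines.append(f"{verb} {a['type']} with label '{a['label']}'.")
    return lines

summary = summarize_actions([
    {"op": "create", "type": "vocabulary", "label": "Price Ranges"},
    {"op": "create", "type": "taxonomy term", "label": "Cheap"},
])
```

    Because the sentences are generated from the same records that drove the changes, the summary cannot describe anything other than what actually happened.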

    It's bad enough when the changes themselves are bad, but the AI 'hallucinating' about what it's done is even worse.
