yautja_cetanu → created an issue.
yautja_cetanu → created an issue.
yautja_cetanu → created an issue.
https://www.drupal.org/project/drupal_cms/issues/3467680 → - I've added a roadmap and all the important related issues.
A couple of things from our initial investigations:
1. Workspaces looks good but doesn't support field config yet, which is essential, so it's unlikely to be ready for RC1.
Below is an AI Agent trying to add something to a vocab.
It shows up as not yet pushed.
2. We have verbose responses with links and details.
Here it is at a medium level of verbosity. It tells you the category but it doesn't tell you literally every term it's going to make. It does give you links to places where you can check the work.
Below is a details dropdown. It shows a log of the actual things Drupal has done. This log is generated by code, not by an LLM, and therefore will be completely accurate. However it's what Drupal needs to know, and it doesn't necessarily make sense to an end-user. We can at least improve it by printing the labels, but it will still be fairly techy.
yautja_cetanu → created an issue.
yautja_cetanu → created an issue.
Added the alpha of AI Evaluations. I don't think we need Evaluations to reach beta before AI can get to release or RC1.
yautja_cetanu → created an issue.
For Tim Doyle at the Drupal Association. Before Phase 3 goes live it will likely be worth the DA looking at this as Phase 3 is a potential revenue source.
Hi Catch, I see what you mean, especially your last point: we may by design have things that could increase the probability of hallucinations. We really need to see the prompts. We're planning to do a very initial version of a systematic test of the AI agents tomorrow. I'll write up the issue about it today and post it here; if we could see the prompts above we would know.
I think by far the most important thing is to try these prompts out in reality with real people who might use this and know little about Drupal, with a full log of exactly what is happening with the AI. I also believe we should ask people "Did it work?" and "Are you happy with this / Would you use this again?", as I believe it's important to answer the "Expectations" question you raise.
We might find that for non-Drupal users a low probability of success is still really cool, because they had low expectations of AI, as you did. Or we might find they treat AI like traditional computers and get frustrated if it doesn't work, even when the success rate is high.
So if we describe the problem:
Potential issue: the AI reports in plain English what it did, but in reality it did something different.
There are 3 possible causes:
- Full hallucinations: The AI describes what it does in one response and in the same response does something completely different (e.g. it says it created 4 terms but actually creates 5, or adds an image field).
- Introduces bugs/typos in its output: The AI describes what it does and then tries to implement it but types out the instructions incorrectly. It might have a typo so the JSON can't be parsed, or might use the incorrect structure to define something.
- Bugs outside of the AI: The AI gets everything correct but the Drupal code around it implements things incorrectly, meaning the end-user gets told something has happened when it hasn't.
For the problem you're speaking about, all 3 are issues, as they all result in the same thing as far as the end-user is concerned. All 3 are also potentially side-stepped through the Workspaces approach you suggested. But they each need a different fix if you want to avoid them happening at all.
My intuition from what I've seen is that full hallucinations (1) are unlikely, that 2 is likely to very likely for some smaller open-source models, and that 3 is likely in the way bugs are likely in any code. However I do agree with you that to some degree we have made 1 more likely by design: because we want the AI to describe what it does in plain English rather than in Drupal terms, its description of what it has done is also plain English. Currently the LLM writing the logs of what is done is also a different LLM from the one describing what it has done.
I wonder if we want to do something like create a "Review" agent: have the agent that implements the taxonomy write up for the logs what it has done, describing it in Drupal language. Then we have a separate agent that actually goes in, checks, and writes a description of what has happened. But we should see the results of evaluations first.
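To illustrate the review idea (this is only a sketch, not how the agents module actually works), here's the kind of cheap cross-check a review step could run. Both arrays are made-up stand-ins for the agent's stated plan and the code-generated log:

```php
<?php

// Hypothetical inputs: the terms the agent *said* it would create, and the
// terms the code-generated log says were actually created.
$described = ['Bordeaux', 'Rioja', 'Tuscany', 'Napa'];
$created   = ['Bordeaux', 'Rioja', 'Tuscany', 'Napa', 'Mendoza'];

$missing    = array_diff($described, $created);   // promised but never created
$unexpected = array_diff($created, $described);   // created but never mentioned

if ($missing || $unexpected) {
  // A review agent (or plain code) could surface this mismatch to the user
  // instead of letting the plain-English summary stand unchecked.
  print 'The summary does not match the log: ' . implode(', ', array_merge($missing, $unexpected)) . "\n";
}
```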
Yes it is a typo!! 80% right!
Catch, re the example above: this seems to be a bug in the code, not the LLM hallucinating. This will be caught by better automated testing with more test coverage.
When we have the Evaluations module next week we'll be able to see the prompt and response to see if this is a case of the LLM hallucinating or not.
There are two things above. The Agents are doing the work, and there is something called the "Assistant API" that is orchestrating the agents and displaying answers to the end-user. Our first look is that the Agent replied with an error that the Assistant didn't catch properly. So it's a bug rather than an inherent AI issue.
Obviously both matter but they are solved differently. We'll know more when we can get proper logs.
I replied to Catch before I saw your message, Dries, as it was a long reply! Very much agree with most of it.
- Yes, I think we need that. We're starting it here: https://www.drupal.org/project/ai_evaluations → and hope to get a demo of it working on Monday. We will then need to export the prompts and what they do and store them publicly somewhere. We will also be speaking to the Drupal CMS privacy track to see how we can do this securely with GDPR in mind. This can help us improve the prompts and also eventually fine-tune an open-source model.
- It is something we can enforce and there are a number of ways of doing this. It does make everything much less fluid though and what if there are multiple steps the AI needs to do?
- Agreed, for now I think we just remove features we don't want in Drupal CMS. We should do agent-level permissions later with a recipe that defines what we want for Drupal CMS. (We have an agent for installing and removing modules contributed by the community, but this shouldn't be enabled by default and maybe shouldn't be in the agents module; it was more a proof of concept for someone.)
- Yes, hope to have something to show next week.
- I agree with you on the review step; early testing with it confused people who didn't get Drupal. Fundamentally Drupal has its own strange terminology (like taxonomy), so a review step will introduce people to lots of terms they don't understand and can't tell whether they need to. It's hard to review something when you don't know what you're reviewing or what the consequences are.
- I definitely find the stress of picking which abstraction to go for is a big headache when starting with Drupal. It's getting easier but I'm sure everyone remembers having to figure out "Do I use a node or entity?". One cool thing about AI migration if we get it working is it will become easier to convert a node into an entity or a select list into a taxonomy or an image field into a media field. So when we're further ahead with migration it might be one to revisit.
Re: Catch
I mostly agree with your response and definitely think Workspaces looks like an amazing solution to this.
Personally I've always thought the Minikanban approach here: https://www.youtube.com/watch?v=tXvIdjcB718 could be really good, as it can describe every step with images, links etc. Maybe it could just be a list of tasks in order instead of a kanban.
I've seen how other systems implement Agents and they always describe what they are going to do with each step and then output the JSON to do the steps. We had reasons not to do this but I think it might be good for us. One reason is that this counts as "Chain-of-thought" and will likely result in the output being more accurate. It will also help with debugging even if it doesn't perfectly match the JSON. We could also use AI review agents to check the description matches the JSON.
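As a rough, hypothetical sketch of that pattern (the tool names and description format here are made up, not the agents module's actual output): each step carries a plain-language description plus the JSON actions, and a deterministic check can compare the two.

```php
<?php

// Made-up example of an agent step: a description plus the actions as JSON.
$step = <<<'JSON'
{
  "description": "Create a Wine Regions vocabulary with 2 terms.",
  "actions": [
    {"tool": "create_vocabulary", "label": "Wine Regions"},
    {"tool": "create_term", "vocabulary": "wine_regions", "label": "Bordeaux"},
    {"tool": "create_term", "vocabulary": "wine_regions", "label": "Rioja"}
  ]
}
JSON;

$data = json_decode($step, TRUE, 512, JSON_THROW_ON_ERROR);

// One cheap cross-check: does the stated term count match the actions?
$term_actions = array_filter($data['actions'], fn (array $a): bool => $a['tool'] === 'create_term');
preg_match('/(\d+) terms/', $data['description'], $matches);
$matches_description = isset($matches[1]) && (int) $matches[1] === count($term_actions);
```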
I don't know if you're correct about expectations though. With GenAI, I don't know if expectations are always going to be higher when it does something for you than when you do it yourself. AI getting things wrong and hallucinating is so widespread, and you encounter it so quickly, that I think people will start to trust AI like they would trust a human intern, not like they would trust a calculator.
Even with perfect AI models, I think meaningful human-in-the-loop will always be necessary, in the same way I would review changes if a genius intern joined my organisation and started changing things on our website. It's one reason why I think Drupal will become the best and safest AI orchestration platform out there.
I think an expectation of "AI will get it wrong 80% of the time and I'll have to fix it" will be fine. If you aren't a developer but a project manager/owner/designer etc., you don't have expectations that developers will get things right either, and most of us have to build some way of checking into our process. So with time and testing it will be good to fully research what expectations the persona we are aiming at actually has.
Thanks for your review and glad it exceeded your admittedly low expectations! I'll try and reply to all your concerns.
- Firstly, whilst there was some prompt-crafting for the demo, we tried to craft in the opposite direction by making it harder for the AI and closer to what a Sarah persona would say. But as you can imagine it's much harder to do something real in the wild than in a demo, so starting from next week we will be testing the agents with real people fitting the persona and reporting back with statistics, so we can see what happens without any prompt crafting.
- We're opting to focus on Drupal CMS release to have a small number of agents working really well rather than all the agents. Views and Migrate won't be in scope for 1.0
- For the demo, taxonomy needed the most prompt-crafting. As of this morning we have significantly improved the field agents and also added all the features for the core fields, including configuration and display.
- As Drupal CMS developers we are likely going to have to start building up documentation of Drupal best practices, as you described with tagging. We tested with Claude and found that if you asked it "In Drupal should I use a select list or taxonomy?" it was quite good at explaining how to decide. With specific examples of "wine regions" vs "rough expense", it was good at selecting taxonomy for one and a select list for the other. However, going forwards, we shouldn't rely on a model's internal knowledge, especially if we want to use smaller open-source models. So this will be something we need to work on.
- To add further to the complication, I don't think these things are fully agreed on by the Drupal community. With all our clients we would never enable tagging in the way you've suggested, as the information architecture is too important. We have tended to use "Other + Suggest" so that a member of staff can choose whether to add it. Now I'm not suggesting my view should be the default in Drupal CMS. But decisions like this likely need to be owned, thought through and tested by someone to mould the vision of Drupal CMS vs Drupal core.
- Similarly we can prompt the agents to present options to Sarah as you've suggested. One issue we had during the demo was that for Wine Regions it would regularly create the vocabulary but not attach the field to the entity. So we've prompted the agent to always try to attach it to an entity. If it can't work it out, it asks the user if it's sure it wants a vocabulary not attached to anything. The issue is that asking the end-user too many questions inherently ruins a lot of the usability. So we have to decide what level we want.
- These prompts are all stored in YAML and so can be overridden on a site-by-site basis. Our plan is to refactor agents at some point to make it easier for site builders to edit the prompts in the UI and to understand the order and flow of them. It might also be possible for us to build a UI for creating "Tools/actions" (the abilities the agents have on the site). It's a question of priorities.
- Media is much more complicated to use than image fields; even as a seasoned Drupal site architect I find it difficult to know when to use one or the other. I think if we can make the image field work well we can port our work to a specific Media-for-images agent in Drupal CMS. But because of the flexibility of what Media can do, it will be a lot of work to make an agent that can handle every possible configuration of media entities. (Note: Marcus has told me that since yesterday it will use Media if you ask "Should I use Media or Image" and then choose it. But more work will need to be done for this.)
- We are working on Agents and Workspaces; we believe it's one of the main new features we need for release. Hope to have something to show early next week. We have the ability for agents to roll back specific actions, but Workspaces will allow rolling back a number of actions. Similarly, we initially had something called blueprints that would show the end-user the YAML of the actions it will take and have them click approve before it implements them. It's still in the agents module but there's no UI for it in the chatbot yet. I think this will appeal to developers more than marketeers/site builders.
- As previously stated, Migrate will not be in version 1.0. I've written a lot about this, and we've done a lot of experiments and made some demos of one-click full WordPress migration (theme, design, layout, content types, content, everything). We think that in the short term, tools that help site builders speed up migrations considerably are more likely to be successful than a truly magical one-click migration, for the reasons you've stated. I can go into detail elsewhere when we release our Migrate agents properly.
- Re: your question about the events recipe, it's up to us to decide. Long term I would like the agents to work with the Project Browser to suggest starting points and even create their own recipes. (For example, for adding review features, it could find the reviews module and configure it for wine tours.) We have started exploring it on http://askdrupal.com but it's not for version 1.0.
Re: Your questions on the codebase
- My plan is that an AI Agent will have a role assigned to it. The user will also have a role. So it will perform the task only if both the agent and the user have permission (a rough sketch of this check is after this list). However there are a couple of issues with it (do I want to have agent roles on the same page as user roles?). So for now we will remove the abilities from the agents that they shouldn't have permission for in Drupal CMS. We will then build agent-specific permissions into the AI Agents module.
- This is a good look at what agents are: https://github.com/openai/swarm . The agents in Drupal were built before this was released so there are differences (we use Drupal to orchestrate the workflow rather than having the agents do it themselves, for example). But at their heart agents are instructions (a prompt) plus tools that do things, where tools can also hand off a task to another agent. The tools are coded and either link to a specific Drupal function or a bunch of them. So we will/already have removed the ability for agents to delete. They don't just get full rein over everything Drupal does.
- Re: the delete. We had a demo to make people feel more comfortable about this where the agent would ask to delete something. It would try, find it has no permissions, tell the user and then help the user do it themselves. We removed that demo last minute but it's why the code was still there.
- You are correct that confirmation messages are provided by the LLM and there is a chance they can be different. In fact we might actively want them to be different, as we want the LLM to use plain English instead of specifically Drupal terms. I think Workspaces can help. Whilst the response of an LLM is non-deterministic and not repeatable (although that changes with a temperature of 0, which we could set for Drupal CMS), there are things it is more or less likely to get wrong. It's unlikely the LLM will report what it's done differently to what it's actually done.
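Here is the permission-check sketch mentioned above. The function name and the agent permission array are assumptions for illustration; only $user->hasPermission() is the real Drupal API:

```php
<?php

use Drupal\Core\Session\AccountInterface;

/**
 * Hypothetical helper: a tool only runs when the agent's configured
 * permissions AND the acting user both grant the permission.
 */
function agent_tool_access(array $agent_permissions, AccountInterface $user, string $permission): bool {
  return in_array($permission, $agent_permissions, TRUE)
    && $user->hasPermission($permission);
}

// e.g. even if the user may administer taxonomy, an agent configured without
// that permission would still be refused:
// agent_tool_access(['create terms in wine_regions'], \Drupal::currentUser(), 'administer taxonomy');
```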
If you're interested in a deeper dive or have more questions about this, Catch, Marcus and I are around on Slack for a chat or huddle! We are focusing on this stuff more or less full-time until the Drupal CMS release.
This isn’t finished yet but we also have a roadmap here: https://www.drupal.org/project/ai/issues/3485451 🌱 [Meta] Path to rc1 Active for the underlying AI module to get to version 1 (Even if not all the modules are used in Drupal CMS).
- Remove System Role (everywhere in the AI module but I think this is the only remaining place for it)
yautja_cetanu → created an issue.
For deciding conditionals (this should go in its own issue):
We think we should use Twig:
- Twig is ONLY used to decide IF a specific automator is run or not.
- Eventually we'd like a UI that can handle the logic for whether an automator is run or not, similar to how Views exposed filters work.
- But this is more complex: we need to choose each field, each field type will have a variety of operators and ways of working, and we don't know if people want to use everything. So let's start with Twig so the use cases we have with automators that are currently impossible become possible, and we can improve the UI in a later version (see the sketch after this list).
- Twig is off by default, and a global automator setting can turn it on.
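As a sketch of what that could look like (the condition string and context variables are made up, and this just calls the Twig library directly rather than any existing automator setting):

```php
<?php

use Twig\Environment;
use Twig\Loader\ArrayLoader;

// Hypothetical per-automator condition: only run when the bundle matches and
// the source field has a value. Site builders would edit just this string.
$condition = '{{ bundle == "article" and field_source is not empty }}';

$twig = new Environment(new ArrayLoader());
$rendered = $twig->createTemplate($condition)->render([
  'bundle' => 'article',
  'field_source' => 'https://example.com',
]);

// Twig prints true as "1" and false as an empty string.
$run_automator = trim($rendered) === '1';
```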
yautja_cetanu → created an issue.
yautja_cetanu → created an issue.
yautja_cetanu → created an issue.
yautja_cetanu → created an issue.
We will likely want to make use of this issue: https://www.drupal.org/project/drupal_cms/issues/3482992 ✨ The installer should collect input for recipes that need it Active
It will likely be important to decide:
- Whether you want AI at all in Drupal CMS: it's there to help people very new to Drupal, so it's possibly too much to ask them to find AI in the Project Browser, etc.
- Similarly with API keys and providers; these will be important before any of the agents work for someone.
We will need some way of exploring what the UI could look like for the rest of the Starshot team to look at.
☑ Documentation about what is in Beta 1 + Agents
- https://freelygive.io/blog/drupal-beta-ai
☑ Video about Beta 1 + Agents
https://www.youtube.com/watch?v=HoaJHYI-AmY
This is done and we have basic documentation. We don't have a handle on a drupal.ai site yet, though.
scott_euser → credited yautja_cetanu → .
OK, at the very least this makes me think that if we do this, we should do it as a module outside of the AI module.
Really like this! Could I get some screenshots of what this might look like?
Do you imagine the templating system to be a page within Drupal for end users? Or something in code, like example YAML files?
yautja_cetanu → created an issue.
yautja_cetanu → created an issue.
Changed the title to reflect that this is specifically about changing the model metadata while adding a model. We previously allowed adding new models with Hugging Face, but this will allow us to bring across metadata (such as context limit and its functionality).
By ensuring at a low level that the model metadata is there, modules built on top of the abstraction layer can rely on that information being there and react to it. (For example, a chatbot could allow for voice control, which goes away if the selected model doesn't support audio input.)
We are trying to balance providing the user setting up the model with the flexibility they need, whilst not overwhelming them with things they can't know the answer to.
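A small sketch of why that matters for modules on top of the abstraction layer; the metadata keys here are illustrative, not the provider plugin's actual schema:

```php
<?php

// Illustrative metadata brought across when a model is added.
$model_metadata = [
  'label' => 'Some hosted model',
  'context_window' => 128000,
  'input_modalities' => ['text', 'audio'],
];

// A chatbot can then reliably decide whether to expose voice control,
// instead of guessing or failing at request time.
$voice_control_available = in_array('audio', $model_metadata['input_modalities'], TRUE);
```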
A couple of other things:
https://drupal-cms-test.ai.freelygive.dev/admin/config/ai/providers/hugg...
"Choose an available key. If the desired key is not listed, create a new key. The Access Token. Can be found on Huggingface admin pages. Make sure that the token has the correct rights. (This is usually Read access, but you can fine-grain it if needed)"
Also a cool follow-up would be enabling recipes, as that is much safer and much more "Drupal CMS"-ey. Is it worth tying this to permissions?
nicxvan → credited yautja_cetanu → .
yautja_cetanu → created an issue.
dan2k3k4 → credited yautja_cetanu → .
Does this mean ai search itself requires 8.1 even if you're not using pinecone? I think we need the ai module and ai search to support 8.1.
yautja_cetanu → created an issue.
gábor hojtsy → credited yautja_cetanu → .
scott_euser → credited yautja_cetanu → .
https://bootcamp.uxdesign.cc/the-unstoppable-rise-of-spark-as-ais-iconic... - Explanation of the sparkle
yautja_cetanu → created an issue.
yautja_cetanu → created an issue.
OK, let's keep this closed.
OK, so either way we have to do something unique for each DB: Milvus does not allow custom IDs, Pinecone requires them.
So we pick Milvus's behaviour (it being the default by accident of history) and make the Pinecone provider create its own thing.
The next provider made, like Weaviate, can copy and paste code from Pinecone if it needs the same thing, and if this happens enough we should maybe provide a function for creating chunk IDs in the abstraction layer.
We can't even think of any reason why any module would need a consistent method for chunk IDs. Maybe appending entity IDs into the chunk IDs? But that seems useless. Maybe if I had a log system that stored the chunk IDs instead of the contents of the chunks I'd want to keep those IDs consistent, but every time you reindex or do anything funky the IDs are going to be messed up.
As Scott says "I think we can over time make traits or base classes for similar vdb providers when they share the same functionality to avoid them repeating each other"
Vivek. Do you think you could present some exploration of the benefits of putting something that handles chunk IDs into the VDB abstraction layer itself or putting it in the provider?
If it's in the layer, it means all modules can interact with the "chunk ID" in a consistent manner; what use cases would there be for that? If it's in the provider itself, then the abstraction layer can't decide how to create IDs and has to rely on each DB's internal approach to it. Does that matter?
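For what it's worth, a shared helper in the abstraction layer could be as small as this; the function name and ID format are just assumptions to make the trade-off concrete:

```php
<?php

/**
 * Hypothetical chunk ID generator for the VDB abstraction layer.
 * Pinecone-style providers (which must supply IDs) could call it;
 * Milvus-style providers (which assign their own IDs) would ignore it.
 */
function vdb_chunk_id(string $entity_type, string|int $entity_id, string $langcode, int $delta): string {
  // Deterministic per entity + chunk position, so re-indexing unchanged
  // content produces the same IDs (until chunking settings change).
  return hash('sha256', "$entity_type:$entity_id:$langcode:$delta");
}

// e.g. vdb_chunk_id('node', 42, 'en', 0);
```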
Basic messages:
If they choose the RAG service:
- "I'm looking up your answer" - (when the LLM responds saying it wants to do a RAG search)
- "I'm now analysing the results and preparing my response" (When we sent the new prompt with context to the LLM)
We originally had this proposed resolution here. However it's explored more in the child issue.
- Can we make it so that it chooses when to use RAG, rather than always doing a RAG search? This could be achieved via a simple chat model to save money (such as ChatGPT 4o mini). This is explored in this child issue: https://www.drupal.org/project/ai/issues/3470263 🌱 Handle threading and history with Assistants API and RAG Active
- Implement "SimpleChat" - default -
- Chatbot should be able to receive files. (Where are those files stored if we don't store history?)
- If we decide a simple model decides whether it's RAG or not, check the flag for JSON.
- Add the ability for a "Pre Search" prompt (also: what should it tell the next LLM - provide a default answer) and a "Post Search" prompt. If it's empty, don't use it. (DON'T DO THIS, BUT NEED TO MAKE SURE THIS USE CASE IS SORTED.)
OK, thinking about it more, the "Pre-Pre-System-Prompt" needs to be put together by our system. If we want the end user to make changes, they could write "When to use this service" as configuration on the assistant's config page that lets you pick the services.
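To make the "a simple model decides whether to use RAG" idea concrete, here's a rough sketch with closures standing in for the provider calls; nothing in it is the real Assistants API:

```php
<?php

/**
 * Hypothetical routing: a cheap "router" model returns a JSON flag, and only
 * then do we run the RAG search before the main model answers.
 */
function answer_question(callable $router, callable $rag_search, callable $answer, string $question): string {
  $decision = json_decode(
    $router('Reply only with JSON like {"use_rag": true} for: ' . $question),
    TRUE
  );

  $context = '';
  if (!empty($decision['use_rag'])) {
    // This is where the chatbot would show "I'm looking up your answer".
    $context = $rag_search($question);
  }

  // And here "I'm now analysing the results and preparing my response".
  return $answer($question, $context);
}
```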
yautja_cetanu → created an issue.
yautja_cetanu → created an issue.
Regarding your 4 points
Firstly, we plan to release AI Search with an LLM chatbot next week in alpha 5.
1. Unfortunately this is really hard to do in a way that is performant. We've been exploring lots of ways of doing this, and I think it's really important, but we haven't figured out how to do it in a way that scales.
Once we release it I'll make an issue about the problems surrounding this, and we welcome ideas.
2. We use Search API, and whilst it's not literally real-time it's pretty close? I'm finding Search API with Solr is usually only off by milliseconds.
3. In the AI module, the AI Search submodule requires Search API and so will have integration with that.
4. The goal is to integrate AI Search and Automators, so we would probably do it on the content. But in a couple of weeks we are going to try out using the AI module for a recommender.
yautja_cetanu → created an issue.
Automator Chain [Disposable Entity, Automator Chain Disposable Entity] - This is the entity that stores all the fields that have automators on them in one chain. Often this is just a content type, and the term is only a label on a tab alongside Manage fields (Manage Automator Chain). As a result this won't appear in code often. However, when we have disposable entities it will be called an Automator Chain Disposable Entity.
Automator [Machine name: automator type] - These are like the plugins or "Rules" you can download and install for the Automators module. They are the things that have to be written in code in order to do a specific thing. For example there will be an LLM Text Long Automator - it has the ability to call an LLM with a prompt and then put the output of that LLM into the field. There may be a web scraping automator that can scrape a website.
Automator Instructions [Machine name: Automator.Config] - This is the list of settings you see on a field when you click "Enable AI Automator/s". These settings tell the Automator what to do at that step in the chain (for example the prompt, the temperature, how many levels deep the web scraper scrapes). They are called Instructions conceptually in the UI as they are more than just config, but in code we'll call it config as it behaves much like config throughout Drupal.
Automator config - Doesn't matter as much; it's just a single field on the config.
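Purely as a conceptual illustration of how those terms relate (not the module's actual schema), one chain holds one set of Automator Instructions per field, each naming an Automator (plugin) type plus its config:

```php
<?php

// Illustrative only: an Automator Chain as ordered Automator Instructions,
// keyed by field, each pointing at an Automator (plugin) type with its config.
$automator_chain = [
  'field_summary' => [
    'automator_type' => 'llm_text_long',
    'config' => [
      'prompt' => 'Summarise {{ context }} in two sentences.',
      'temperature' => 0.2,
    ],
  ],
  'field_source_content' => [
    'automator_type' => 'web_scraper',
    'config' => ['depth' => 1],
  ],
];
```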
Chatted to a UX person; this is what we thought:
"Clear or straightforward vs branded" terms
Red - Automator Chain
Green - Automator Instructions
Purple - Automator (automator type)
Yellow - Automator config
valthebald → credited yautja_cetanu → .
valthebald → credited yautja_cetanu → .