Talking with content - is it the content management of the future? (plus Python side question)

Issue created by @mindaugasd
Comment almost 2 years ago →
🇱🇹Lithuania mindaugasd
Comment almost 2 years ago →
🇺🇸United States afinnarn
Maybe Drupal has to go all in to integrate with Python? So it would be easier to adopt all the richness of Python ecosystem across Drupal websites? (and so, adopt llama index)

I wholeheartedly agree with trying to leverage the Python libraries vs. trying to port them to PHP and keep up with the original Python library. IMHO, that approach would be far easier to maintain as you mention PHP not having as much activity in the AI/ML space as Python does...and PHP never will have a lot of activity since "connector tools" allow you access to the Python libraries without having to port the code to another language.

Another benefit is if there is a list of "data connectors" for ChatGPT/Llama Index that includes Drupal, people can say "hmm, what's this Drupal thing?" and try Drupal out vs. not trying to integrate and "stay on the island" so only Drupal and PHP devs are aware of how AI tools can be integrated and used in a structured CMS system.

My two cents...
Status changed to Postponed: needs info about 1 year ago5:28am 13 June 2024
Comment about 1 year ago →
🇦🇺Australia pameeela
This feels more like a general discussion than a specific proposal for Drupal core?
Status changed to Active about 1 year ago6:43am 13 June 2024
Comment about 1 year ago →
🇱🇹Lithuania mindaugasd
Yes, I am doing the same as for other issues and moving it to AI initiative project.
Comment about 1 year ago →
🇳🇱Netherlands jurriaanroelofs
I think this feature is the most important AI feature Drupal can develop. Drupal is uniquely positioned to offer the best AI CMS because Drupal content is typically very clearly structured with semantically named components like entity types, bundles, fields, paragraphs, layouts etc. A lot of content and metacontent (e.g. layout) that an LLM can work with.

"Dear AI, please find most innovative ideas that people have posted on drupal.org in the last 3 months"

For this to work, I think the technology to bet on is RAG (Retrieval Augmented Generation) which involves vectorizing the Drupal data you want to integrate with the LLM. There is a reason there aren't any AI PHP modules while there are many Python AI modules, Python is just much more computationally efficient and therefore suited to computationally intensive work, which vectorization is.

It would already be a great achievement to have a Hello world example that used vectorized Drupal data but here is a list of things I'd like to see as part of this intiative as well:

Multi tenancy Make it possible for an end user to ask "Why is my latest invoice so high" and for the chatbot to reply something like "A higher than usual metered "Line-Item-X" was consumser through your project "Software-Y". But won't want to expose invoices of other accounts to the chat instance.

Unsupervised updating of vectorized database. At some point real-time would be great of course

Integration with Search (API). I think they are already separately working on RAG but I haven't kept track. It would be great if the default search form could leverage LLM capabilities.

Recommender algorithm: finding related content with fuzzy keyword. If you like reading Drupal issues about RAG from the AI module you might enjoy these issues about using generative AI in search_api module issue queue. etc.
Comment about 1 year ago →
🇱🇹Lithuania mindaugasd
@JurriaanRoelofs good ideas, relates to this issue: 🌱 Project/issue nodes are knowledge that can be graphed Active

Current situation is:

I hear that SQL databases are slowly releasing native features for vectors/RAG, but don't follow the details of this (yet)

FreelyGive team have built working demo vectorizing project browser and assistant http://project-browser.freelygive.io/ (Source code https://gitlab.com/freelygive/demos/module-bot)

AI module https://drupal.org/project/ai have build many AI search features this month, and during next month I hear this will be developed further. Team includes people who worked on all previous solutions, so this is next version already.

Next week I will explore project browser demo myself and ways to improve. In general, improving project browser demo to a high level (a common feature we all have access to), can improve general Drupal capabilities in this area.

Previously I had project browser design without RAG (Retrieval Augmented Generation) in mind - to summarize every module description, and then give AI all those summaries. So AI having full context of everything, could produce very accurate answer which module to use for what use case. For this, was working on these modules and issues:

To bulk summarize everything ✨ Views bulk operations Active within https://www.drupal.org/project/ai →

To provide AI context ✨ Create "views" prompt segment plugin Fixed within https://www.drupal.org/project/aiprompt →

But using full context all the time can be expensive (the bigger the context, the more it costs per request), while RAG retrieved context can cost a lot less.

But best approach maybe is to combine summaries and RAG together, because one of the problems with project browser RAG probably is that not all modules are well described (like wrote more here here ✨ Add filter by project dependencies (ecosystem) Active and here 🌱 Project/issue nodes are knowledge that can be graphed Active in more detail), so summarizing all module descriptions first, and then do RAG retrieval can be perfect.
Comment about 1 year ago →
🇬🇧United Kingdom yautja_cetanu
Regarding your 4 points

Firstly, we plan to release ai search with an llm chatbot next week in alpha 5.

1. Unfortunately this is really hard to do in a way that is performance. We've been exploring lots of ways of doing this and I think it's really important but we haven't figured out to do it in a way that scales.

Once we release it I'll make an issue about the problems surrounding this problem and we welcome ideas.

2. We use search api and whilst it's not literally real time it's pretty close? Like I'm finding search api with solr is usually only off by milliseconds.

3. In the ai module the ai search module requires search api and so will have integration with that.

4. The goal is to integrate ai search and automators so then we would probably do it on the content. But in a couple of weeks we are going to try out using the ai module for a recommender.
Comment about 1 year ago →
🇱🇹Lithuania mindaugasd
Graph RAG vs. Vector RAG
https://youtube.com/watch?v=pvXnDNrOOQw
Comment about 1 year ago →
🇳🇱Netherlands jurriaanroelofs
Thx guys I appreciate your feedback and I'm looking forward to dive deeper into the topics. Will check out the video about graph RAG. I also recently saw a video about how Hybrid Search can be a better solution than RAG and maybe that is also easier to implement. I'm just starting out getting into this so you probably thought about this already.
Status changed to Postponed about 2 months ago3:36am 8 June 2025
Comment about 2 months ago →
🇺🇸United States Kristen Pol Santa Cruz, CA, USA
mindaugasd has handed over maintainership, so I'm going through all of the issues.

Postponing this one for now. We can review all the postponed issues in the coming weeks to see what might be worked on via the AI Initiative project → .

Talking with content - is it the content management of the future? (plus Python side question)

Problem/Motivation

Proposed resolution

Remaining tasks

User interface changes

API changes

Data model changes

Comments & Activities