Talking with content - is it the content management of the future? (plus Python side question)

Created on 20 September 2023, about 1 year ago
Updated 29 July 2024, 4 months ago

Problem/Motivation

Python has an interesting library, LlamaIndex: https://github.com/jerryjliu/llama_index

  1. It offers data connectors to ingest content from many data sources and formats:
    • Simple Directory
    • Psychic
    • DeepLake
    • Qdrant
    • Discord
    • MongoDB
    • Chroma
    • MyScale
    • Faiss
    • Obsidian
    • Slack
    • Web Page
    • Pinecone
    • Mbox
    • Milvus
    • Notion
    • Github Repo
    • Google Docs
    • Database (SQL etc.)
    • Twitter
    • Weaviate
    • Make
    • Deplot
  2. Provides ways to structure data (indices, graphs) so that this data can be easily used with LLMs.
  3. Provides an advanced retrieval/query interface over data: Feed in any LLM input prompt, get back retrieved context and knowledge-augmented output.
  4. Allows easy integrations with your outer application framework (e.g. with LangChain, Flask, Docker, ChatGPT, anything else).

To sum up: it lets you gather content from anywhere and talk with an AI about it. In other words, it manages content smartly and automatically.
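The ingest → index → query pattern that LlamaIndex automates can be sketched in plain Python. This is a toy illustration of the flow, not the real LlamaIndex API; all class and function names here are made up:

```python
# Toy sketch of the ingest -> index -> query pattern that LlamaIndex
# automates. All names here are illustrative, not the real API.

class SimpleReader:
    """Pretend 'data connector': loads documents from a dict."""
    def __init__(self, source):
        self.source = source

    def load_data(self):
        return [{"id": k, "text": v} for k, v in self.source.items()]

class KeywordIndex:
    """Pretend 'index': ranks documents by term overlap with a question."""
    def __init__(self, documents):
        self.documents = documents

    def query(self, question):
        terms = set(question.lower().split())
        # Return the document sharing the most terms with the question;
        # a real system would feed this context to an LLM as well.
        scored = sorted(
            self.documents,
            key=lambda d: len(terms & set(d["text"].lower().split())),
            reverse=True,
        )
        return scored[0]["text"] if scored else ""

docs = {
    "slack": "Slack connector ingests chat messages",
    "notion": "Notion connector ingests wiki pages",
}
index = KeywordIndex(SimpleReader(docs).load_data())
print(index.query("How do I ingest chat messages?"))
```

In the real library, the reader would be one of the data connectors listed above and the index would be backed by embeddings rather than keywords.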

An example of how content management of the future might work:

Let's say I want to find something on drupal.org. I could ask an AI:

"Dear AI, please find the most innovative ideas that people have posted on drupal.org in the last 3 months"

And the AI, having an index of all drupal.org content, would (ideally) give the answer.

Proposed resolution

  1. Drupal has an early version of a chat interface: https://drupal.org/project/aichat
  2. Drupal has seen many attempts to index content using Search API and various vector databases. A full list of modules is here: https://www.drupal.org/project/ideas/issues/3346258#modules-list-indexing 📌 [META] Drupal could be great for building AI tools (like ChatGPT) Active
  3. Drupal has a very nice new innovation, to my knowledge: https://drupal.org/project/aiprompt
  4. Drupal has some traditional benefits for AI content management, as explained in this video "Beyond Vector Search: Knowledge Management with Generative AI"
  5. [Drupal has many other good AI things which are not related to this issue]

But LlamaIndex is way ahead of Drupal.

I tried to look at what libraries PHP has to offer, but it is mostly void of AI activity (only a few unrelated libraries).

Maybe Drupal has to go all in on integrating with Python, so it would be easier to adopt all the richness of the Python ecosystem across Drupal websites (and so adopt LlamaIndex)?
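One low-friction bridge, short of a full integration, is to treat a Python script as a subprocess that Drupal (PHP) invokes, passing JSON on stdin and reading JSON from stdout. A minimal, stdlib-only sketch of the Python side; the `action`/`question` field names and the dispatch protocol are assumptions, not an existing interface:

```python
import json
import sys

def handle_request(payload: dict) -> dict:
    """Dispatch a request coming from PHP. In a real bridge this is
    where LlamaIndex (or any Python library) would be invoked."""
    if payload.get("action") == "query":
        # Placeholder answer; a real handler would query an index here.
        return {"ok": True, "answer": f"You asked: {payload.get('question', '')}"}
    return {"ok": False, "error": "unknown action"}

if __name__ == "__main__":
    # PHP side (sketch): pipe JSON in via proc_open() and read stdout.
    raw = sys.stdin.read()
    if raw.strip():
        print(json.dumps(handle_request(json.loads(raw))))
```

A subprocess bridge avoids porting anything to PHP, at the cost of process startup overhead per request; a long-running HTTP microservice (e.g. Flask, which LlamaIndex already integrates with) is the usual next step.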

Remaining tasks

User interface changes

API changes

Data model changes

💬 Support request
Status

Active

Component

Discussion

Created by

🇱🇹Lithuania mindaugasd


Comments & Activities

  • Issue created by @mindaugasd
  • 🇺🇸United States afinnarn

    Maybe Drupal has to go all in to integrate with Python? So it would be easier to adopt all the richness of Python ecosystem across Drupal websites? (and so, adopt llama index)

    I wholeheartedly agree with trying to leverage the Python libraries versus trying to port them to PHP and keep up with the original Python library. IMHO, that approach would be far easier to maintain since, as you mention, PHP does not have as much activity in the AI/ML space as Python does... and PHP never will have a lot of activity there, since "connector tools" give you access to the Python libraries without having to port the code to another language.

    Another benefit is that if there is a list of "data connectors" for ChatGPT/LlamaIndex that includes Drupal, people can say "hmm, what's this Drupal thing?" and try Drupal out, versus not integrating and "staying on the island," so that only Drupal and PHP devs are aware of how AI tools can be integrated and used in a structured CMS.

    My two cents...

  • Status changed to Postponed: needs info 5 months ago
  • 🇦🇺Australia pameeela

    This feels more like a general discussion than a specific proposal for Drupal core?

  • Status changed to Active 5 months ago
  • 🇱🇹Lithuania mindaugasd

    Yes, I am doing the same as for other issues and moving it to the AI initiative project.

  • 🇳🇱Netherlands jurriaanroelofs

    I think this feature is the most important AI feature Drupal can develop. Drupal is uniquely positioned to offer the best AI CMS, because Drupal content is typically very clearly structured with semantically named components like entity types, bundles, fields, paragraphs, layouts, etc. That is a lot of content and metacontent (e.g. layout) for an LLM to work with.

    "Dear AI, please find the most innovative ideas that people have posted on drupal.org in the last 3 months"

    For this to work, I think the technology to bet on is RAG (Retrieval Augmented Generation), which involves vectorizing the Drupal data you want to integrate with the LLM. There is a reason there aren't any AI PHP modules while there are many Python AI modules: Python's ecosystem is just much better suited to computationally intensive work, which vectorization is.
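The RAG flow described here can be sketched end to end with a toy embedding. Real systems use a learned embedding model and a vector database; the bag-of-words "embedding" and the tiny corpus below are stand-ins:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words vector. A real RAG system would
    call an embedding model instead."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, corpus: list[str], k: int = 1) -> list[str]:
    """Vectorize the corpus and rank by similarity to the question."""
    q = embed(question)
    return sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(question: str, corpus: list[str]) -> str:
    """Retrieval-augmented prompt: retrieved context + the question.
    This string is what would be sent to the LLM."""
    context = "\n".join(retrieve(question, corpus))
    return f"Context:\n{context}\n\nQuestion: {question}"

corpus = [
    "The views module builds listings of content",
    "The paragraphs module structures content into components",
]
print(build_prompt("which module builds listings of content", corpus))
```

The key property is that only the retrieved context, not the whole corpus, goes into the prompt, which is what keeps per-request cost bounded.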

    It would already be a great achievement to have a Hello World example that uses vectorized Drupal data, but here is a list of things I'd like to see as part of this initiative as well:

    1. Multi-tenancy. Make it possible for an end user to ask "Why is my latest invoice so high?" and for the chatbot to reply with something like "A higher than usual metered "Line-Item-X" was consumed through your project "Software-Y"." But we won't want to expose invoices of other accounts to the chat instance.
    2. Unsupervised updating of the vectorized database. At some point, real-time would be great, of course.
    3. Integration with Search API. I think they are already working on RAG separately, but I haven't kept track. It would be great if the default search form could leverage LLM capabilities.
    4. Recommender algorithm: finding related content with fuzzy keywords. "If you like reading Drupal issues about RAG from the AI module, you might enjoy these issues about using generative AI in the search_api module issue queue," etc.
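Point 1 (multi-tenancy) usually comes down to filtering retrieved chunks by the requesting account before anything reaches the LLM. A minimal sketch; the `account_id` field and the keyword-overlap ranking are illustrative assumptions:

```python
def retrieve_for_account(chunks: list[dict], account_id: str,
                         query_terms: set[str]) -> list[dict]:
    """Hard tenant filter first, relevance ranking second. The filter
    must happen server-side, before the LLM sees any context, so one
    account's invoices can never leak into another account's chat."""
    allowed = [c for c in chunks if c["account_id"] == account_id]
    return sorted(
        allowed,
        key=lambda c: len(query_terms & set(c["text"].lower().split())),
        reverse=True,
    )

chunks = [
    {"account_id": "acct-1", "text": "invoice 42 metered line item software"},
    {"account_id": "acct-2", "text": "invoice 99 metered line item software"},
]
results = retrieve_for_account(chunks, "acct-1", {"invoice", "metered"})
# Only acct-1 data survives, regardless of relevance score.
```

The performance concern raised later in the thread is real: per-tenant filtering either shrinks the candidate pool before the vector search (fast, but needs the vector store to support metadata filters) or post-filters results (simple, but wasteful at scale).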
  • 🇱🇹Lithuania mindaugasd

    @JurriaanRoelofs good ideas; this relates to this issue: 🌱 Project/issue nodes are knowledge that can be graphed Active

    The current situation is:

    • I hear that SQL databases are slowly gaining native features for vectors/RAG, but I don't follow the details of this (yet)
    • The FreelyGive team has built a working demo vectorizing the project browser, plus an assistant: http://project-browser.freelygive.io/ (source code: https://gitlab.com/freelygive/demos/module-bot)
    • The AI module https://drupal.org/project/ai has added many AI search features this month, and I hear this will be developed further next month. The team includes people who worked on all the previous solutions, so this is already the next iteration.
    • Next week I will explore the project browser demo myself and look for ways to improve it. In general, raising the project browser demo to a high level (a common feature we all have access to) can improve Drupal's overall capabilities in this area.

    Previously I had a project browser design without RAG (Retrieval Augmented Generation) in mind: summarize every module description, and then give the AI all those summaries. The AI, having full context of everything, could then produce a very accurate answer about which module to use for which use case. For this, I was working on these modules and issues:

    But using the full context all the time can be expensive (the bigger the context, the more it costs per request), while RAG-retrieved context can cost a lot less.

    But maybe the best approach is to combine summaries and RAG together. One of the problems with project browser RAG is probably that not all modules are well described (as I wrote in more detail here ✨ Add filter by project dependencies (ecosystem) Active and here 🌱 Project/issue nodes are knowledge that can be graphed Active), so summarizing all module descriptions first, and then doing RAG retrieval, could be perfect.
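The "summaries + RAG" combination described above can be sketched as a two-stage pipeline. The summarizer below is a crude stub standing in for an LLM call, and the module descriptions are invented examples:

```python
def summarize(description: str) -> str:
    """Stub for an LLM summarization call. A real pipeline would send
    each (possibly poorly written) module description to an LLM and
    store the normalized summary it returns."""
    return " ".join(description.split()[:8])  # crude truncation as a stand-in

def build_summary_index(modules: dict[str, str]) -> dict[str, str]:
    """Stage 1: summarize every module description once, up front."""
    return {name: summarize(desc) for name, desc in modules.items()}

def rag_over_summaries(summary_index: dict[str, str],
                       query_terms: set[str]) -> str:
    """Stage 2: retrieve against the summaries instead of raw text."""
    return max(
        summary_index,
        key=lambda name: len(query_terms & set(summary_index[name].lower().split())),
    )

modules = {
    "webform": "Build forms and surveys collecting user submissions with flexible settings and many extras",
    "pathauto": "Automatically generate URL aliases for content based on patterns and tokens",
}
index = build_summary_index(modules)
```

Summarization normalizes uneven descriptions before indexing, so retrieval quality no longer depends on how well each module author wrote their project page.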

  • 🇬🇧United Kingdom yautja_cetanu

    Regarding your 4 points

    Firstly, we plan to release AI search with an LLM chatbot next week, in alpha 5.

    1. Unfortunately this is really hard to do in a way that is performant. We've been exploring lots of ways of doing this, and I think it's really important, but we haven't figured out how to do it in a way that scales.

    Once we release it, I'll open an issue about the problems surrounding this, and we welcome ideas.

    2. We use Search API, and whilst it's not literally real time, it's pretty close: I'm finding Search API with Solr is usually only off by milliseconds.

    3. In the AI module, the AI search submodule requires Search API, and so it will have integration with that.

    4. The goal is to integrate AI search and automators, so we would probably do it on the content. But in a couple of weeks we are going to try out using the AI module for a recommender.

  • 🇳🇱Netherlands jurriaanroelofs

    Thx guys, I appreciate your feedback and I'm looking forward to diving deeper into the topics. I will check out the video about graph RAG. I also recently saw a video about how hybrid search can be a better solution than RAG, and maybe that is also easier to implement. I'm just starting to get into this, so you have probably thought about it already.
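Hybrid search, as mentioned here, typically means fusing a keyword ranking with a vector ranking. Reciprocal rank fusion (RRF) is one common, simple way to combine them; the two input rankings below are invented, standing in for results from an existing keyword search (e.g. Solr) and a vector search:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked result lists: each document scores
    1 / (k + rank) per list it appears in, summed across the lists.
    k=60 is the constant commonly used with RRF."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_results = ["doc-a", "doc-b", "doc-c"]   # e.g. from Solr
vector_results = ["doc-b", "doc-d", "doc-a"]    # e.g. from a vector DB
fused = reciprocal_rank_fusion([keyword_results, vector_results])
# Documents ranked well by both systems rise to the top.
```

RRF needs no score normalization between the two systems, only their ranks, which is why it is often easier to bolt onto an existing Search API setup than fusing raw scores would be.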
