Milvus-based server silently failing to insert to collection when indexing

Created on 16 October 2024

Problem/Motivation

After setting up a very standard, out-of-the-box AI Search server and index using OpenAI and Milvus, everything appears to work with no errors. Indexing succeeds and shows that 25/25 of my test nodes have been indexed, but the "Server index status" reads "there are 0 items indexed on the server", and my Zilliz collection does indeed show 0 entries. This worked fine just a few weeks ago with exactly the same setup. The only difference I can tell is the embeddings change that went out a few weeks ago from the following issue: https://www.drupal.org/project/ai/issues/3471759 (📌 Search API Index fields need more attributes, Active). Of course, it's entirely possible this is unrelated.

Steps to reproduce

1. Enable ai, ai_search, provider_openai, and vdb_provider_milvus
2. Under Config/System/Keys add OpenAI and Milvus keys.
3. Under Config/AI/Provider Settings/OpenAI Authentication, select the OpenAI key.
4. Under Config/AI/Vector DBs Settings/Milvus Configuration, add the Zilliz server and select the Milvus key.
5. Under Config/Search and Metadata/Search API, add a Server. Name it "AI Test Server", and select AI Search as the backend. I used text-embedding-3-small for the Embeddings Engine, but reproduced this bug with text-embedding-3-large as well. Select Milvus DB as the database, enter `default` as the Database name, and enter an unused name for the Collection.
6. Verify on Zilliz that the Collection created in the previous step now exists, and is empty and Unloaded.
7. Add an index. Name it "AI Test Index", select Content as the datasource, and select as bundles any content types that have existing nodes. In my case, that's all node types; we have 25 nodes in total across a few bundles. Select "AI Test Server" as the server.
8. Add fields to the index. Add "Rendered HTML output" and "Content: Title".
9. Set "Rendered HTML output" to be indexed as the main content, and set all bundles to render with the "default" view mode.
10. Go to the AI Test Index page and click "Index now".
11. Wait for the process to proceed. It appears to be processing!
12. Verify that while "Index status" says all nodes are indexed, "Server index status" confirms there are 0 items indexed on the server, and Zilliz confirms that the Collection verified in step 6 is empty.
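To rule out the Drupal side entirely, the row count can be read straight from the collection rather than from the "Server index status" line. Below is a minimal Python sketch. It assumes the `pymilvus` client library (2.4 or later) and its `MilvusClient.get_collection_stats()` method. The helper name `collection_row_count` is hypothetical. The function accepts any client with that method, so it can be exercised without a live server.

```python
from typing import Any, Protocol


class StatsClient(Protocol):
    """Anything shaped like pymilvus MilvusClient.get_collection_stats()."""

    def get_collection_stats(self, collection_name: str) -> dict[str, Any]: ...


def collection_row_count(client: StatsClient, collection_name: str) -> int:
    """Return the row count Milvus reports for a collection (hypothetical helper).

    get_collection_stats() does not require the collection to be loaded,
    but the reported number may lag slightly behind very recent inserts.
    """
    stats = client.get_collection_stats(collection_name=collection_name)
    # pymilvus returns a dict such as {"row_count": 25}; default to 0 if absent.
    return int(stats.get("row_count", 0))
```

Against Zilliz Cloud this would be called as `collection_row_count(MilvusClient(uri="<cluster endpoint>", token="<api key>"), "<collection name>")`, where all three values are placeholders for your own settings. A result of 0 right after a successful "Index now" confirms that nothing reached the collection.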

Proposed resolution

I have no proposals to correct this bug at this time; to be completely honest, I'm a bit out of my depth in trying to troubleshoot this to make sure it's not something I've done, and I'll be quite happy if it's some configuration issue.

Remaining tasks

🐛 Bug report
Status

Active

Version

1.0

Component

AI Search

Created by

🇺🇸 United States jwaxo


Comments & Activities

  • Issue created by @jwaxo
  • 🇬🇧 United Kingdom scott_euser

    I'll take a look tonight to see if I can spot anything in your setup that could use extra validation. Just a couple of hours ago, on the dev branch, I reindexed Milvus with Umami without issue.

  • 🇬🇧 United Kingdom scott_euser

    I had to click 'Edit' on the Rendered HTML output field and select the view mode there. I selected 'Search Index' so it renders the node without the wrapper header and footer.

    Checking whether I can reproduce the problem when the view mode selection isn't saved, or something along those lines...

  • 🇬🇧 United Kingdom scott_euser

    Yep, I tried a bunch of scenarios:

    1. Add rendered html via the initial wizard then add a bunch of other fields
    2. Add rendered html on its own via the initial wizard
    3. Add rendered html after already saving initial fields

    In all scenarios, it seems to render the content fine.

    Checking what else might be the problem.
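One quick way to double-check that a rendered item actually contains text is to strip the tags from its rendered HTML, for example HTML copied from the index's field preview. This catches output that is only wrapper markup or nothing at all. Below is a minimal stdlib-only Python sketch. `visible_text` and `has_indexable_text` are hypothetical helpers, not part of AI Search or Search API.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip = 0  # depth of open <script>/<style> elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def visible_text(html: str) -> str:
    """Return the human-visible text of an HTML fragment, whitespace-collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def has_indexable_text(html: str, min_chars: int = 1) -> bool:
    """True if the fragment has at least min_chars of visible text to embed."""
    return len(visible_text(html)) >= min_chars
```

If `has_indexable_text()` returns False for an item's rendered output, the problem is upstream of Milvus: the view mode or rendering produced nothing worth vectorizing.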

  • 🇬🇧 United Kingdom scott_euser

    Okay, I can't see where it goes wrong, but I can see a few extra checks and validations I can add to be helpful.

    Note that I did need to click 'Load data' a few times in the data preview area: it didn't load immediately and kept reporting no data even though the data was there. Eventually it did show up. If that is your issue, though, it would be a bug to take up with Zilliz Cloud, not here. I just indexed some content from Wikipedia to try it out.
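Because the Zilliz preview only showed the data after several attempts, any script that checks the collection right after indexing should allow for some delay before concluding it is empty. Below is a minimal polling sketch in Python. `wait_for_rows` is a hypothetical helper, and `count_rows` is assumed to be any zero-argument function returning the collection's current row count. The `sleep` and `clock` parameters are injectable so the loop can be tested without real waiting.

```python
import time
from typing import Callable


def wait_for_rows(
    count_rows: Callable[[], int],
    minimum: int = 1,
    timeout_s: float = 30.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Poll count_rows() until it reports at least `minimum` rows.

    Returns the last observed count. Raises TimeoutError if the minimum
    is not reached within timeout_s seconds.
    """
    deadline = clock() + timeout_s
    while True:
        count = count_rows()
        if count >= minimum:
            return count
        if clock() >= deadline:
            raise TimeoutError(f"only {count} rows after {timeout_s}s")
        sleep(interval_s)  # give the backend time to make new rows visible
```

A timeout here, after a generous wait, is stronger evidence that the inserts never happened than a single empty preview.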

    I'm also switching the version to 1.0.x-dev, as I can't help or test with alpha8 without setting up new test sites. Since I'm doing this in my free time, I'm trying to keep that to a minimum; I hope you understand. Someone else might be willing to try it out on alpha8, in which case they can switch the version back, but they will miss out on the extra validations/checks already in place on 1.0.x-dev.

  • 🇬🇧 United Kingdom scott_euser

    Couple more notes:

    • The number of indexed items comes from Search API, not the AI Search module. It is a bit misleading: the 'More information' text says it's actually the number of results returned by an empty query. Because of how vector databases work, an empty query will return 0 results; it needs something to compare against.
    • Check out the 'Preview' section at the bottom of the Search Index -> Fields page. It shows you what content actually gets vectorized, plus the filterable attributes that get added, without writing anything to the index. So you can double-check that there is actual content ready to be indexed for a particular content item.
    • Beyond that, if there are no errors on the Search API Server -> View page and nothing in the error logs, all I can think of is that some processor is excluding or altering something, e.g. access control.
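The point about empty queries can be illustrated with a toy similarity search. This is a plain-Python sketch of nearest-neighbour ranking by cosine similarity. It is not the code that Search API, AI Search, or Milvus actually run, and `vector_search` is a hypothetical name. Results are ranked by similarity to a query vector, so with no query there is nothing to rank and the result is empty even though data is stored.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_search(
    query_vector: list[float], stored: dict[str, list[float]], top_k: int = 3
) -> list[str]:
    """Return ids of the top_k stored vectors most similar to query_vector.

    An empty query gives nothing to compare against, so no results come back
    even when `stored` is not empty.
    """
    if not query_vector:
        return []
    ranked = sorted(stored, key=lambda item: cosine(query_vector, stored[item]), reverse=True)
    return ranked[:top_k]
```

This is why a count derived from an empty query can read 0 even when the collection holds vectors. Checking the collection itself is the more reliable test.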
  • 🇬🇧 United Kingdom scott_euser