All the points were incorporated. Main changes:
- Fields are now created programmatically in AI Log
- Added Raw request tracking
- The menu was changed into tabs; tabs are created via a deriver
- The Delete button is now a tab
- LLM processing is async, both via Cron and Batch
- Tabs were renamed for better clarity
I have opened a new MR for an improved version of the chunking. This one introduces a new table for chunks, allowing control over what has been indexed and when.
Thanks @scott_euser!
- I assume I shouldn't get this double raw + processed logs - that is strange, I have to investigate, it didn't do that for me.
- We need some sort of help text here explaining what these things are... what does 'processed mean'? I go to it, its empty. I guessed maybe I need to mark something as processed (like reviewed)? but I open a log and I don't see such a button, nor in dropbutton, so I hit a dead end there. - Good catch. These are 2 separate issues:
- We need help text. Processed logs are logs which have been processed by an LLM and have structured output (nice UX). Right now, the only ones we process that way are Chat logs, but that list shall grow. You should start getting something there as soon as you log the first call of type Chat.
- The tab with Processed logs should have sub-tabs with types of processed logs (again, now just Chat). But you should at least see an empty View there!
- Is tabs the right interface? The typical interface for this would be Local Tasks I believe, like attached. In which case 'Delete' could also be a local task like it is for Drupal Core database logs rather than the more scary danger button? - The UX here is quite complex. We need:
- Menu for selecting 'Raw logs' (what you have now) and 'Processed logs' (nice logs, processed by AI, with the call chain and all the attributes), with a submenu for each type of processed log, because they can have very different fields and post-processing:
- Chat processed logs
- Image generation processed logs
- ...
- Delete button - giving the Delete functionality the same prominence as 'Raw logs' and 'Processed logs' didn't feel right to me, but I might be the only one?
@marcus_johansson yes, we discussed that.
The issue with adding an event trigger for streaming responses into every LLM provider is that the functionality of this event then depends on 3rd-party implementations.
Not sure we want to rely upon these, especially with an ever-growing number of them.
Hence, I believe what we have now is a bug and it should be rectified ASAP, even if it indeed is a breaking change? Especially when we have just a few LLM providers which stream at the present time?
@andrewbelcher
- The change in the AI Providers (in their abstraction) is unfortunately necessary: an event triggered on streamed response reception was missing.
- 'Prompt' can be a misnomer here - it is the initial request to the LLM, commonly named 'Prompt'. That is why you sometimes see the dialogue here and sometimes not - the LLM receives all the dialogue content + System prompt on input. It is not a summary, it is really just the LLM input styled for humans (that is why you can download it and copy/paste it into an LLM and it still works).
- Response should probably be renamed to Chain Response and a new tab, Call Response, should be introduced, to distinguish between the response of the whole Call Chain (which is indeed the response of its last call - the final reaction to a human query) and the response of the actual LLM call (which frequently is an invocation of a function call, 'Function call for answer').
- With any complexity of nested calls, the chain will always be uninterrupted, beginning with a human request and ending with the LLM providing a final response. The only UX/UI change I would introduce is to make the function description more prominent, so you would see in the log exactly what was called, along with its arguments.
- We should do async processing of the logs, yes, both in Cron and in the batch.
- You do have RAW input - see the 'Prompt' above. The Prompt you download is the raw input. But we can/should also save the JSON payload sent to the LLM.
- The Requested and Performed action tabs are intended to show LLM-processed input and output. At runtime I replace that information with raw data if the processed data isn't available (for logs recorded before LLM processing was enabled), but I see this is confusing. I will hide the tabs if LLM-processed data isn't there.
- Vertical tabs are not natively utilized in the default display of entity views; they are designed to enhance the user experience during content creation and editing. They work on forms, not on display. It would probably be overkill to render a form just to get tabs.
- Having a real menu and real fields is an architectural decision stemming from us originally having the ability to edit Log entities (since the very first version of the module). I would also prefer to do it programmatically, as the practical advantage for logging is not clear.
The code is updated. Unfortunately, wpai-inc/anthropic-sdk-php doesn't allow for tools (last commit was 7 months ago), so we either change the library, or (I would recommend) fork it with our patch (attached).
Thanks, I am in the middle of a rewrite of the issue code so it plays nicely with the existing logging.
Agreed. OK, I will aim to release this module's 1.0 with this functionality, along with the 1.1 version of the AI module on which it depends.
Flexibility of having the block independent of the View is important for design purposes, hence using a View area (and thus the View's own caching) would not be good for UX builders.
But the point is valid.
I suggest adding fingerprinting of the content and caching the LLM response. That way, should the content fingerprint - the View's result - remain the same, the LLM call is bypassed and the cached output is used instead.
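Something along these lines is what I have in mind - a minimal sketch, assuming the block already has the executed View results and an injected cache backend; the cache ID prefix and the callLlm() helper are illustrative, not existing APIs:

use Drupal\Core\Cache\CacheBackendInterface;

// Illustrative only: $this->cache is an injected cache backend and
// callLlm() stands in for whatever actually sends the prompt to the provider.
protected function getSummary(array $view_results): string {
  $fingerprint = hash('sha256', serialize($view_results));
  $cid = 'ai_block:llm_response:' . $fingerprint;

  if ($cached = $this->cache->get($cid)) {
    // Same View result as last time: reuse the stored LLM output.
    return $cached->data;
  }

  // Fingerprint changed (or first run): call the LLM and cache the result.
  $output = $this->callLlm($view_results);
  $this->cache->set($cid, $output, CacheBackendInterface::CACHE_PERMANENT);
  return $output;
}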
Would that be satisfactory?
@scott_euser - you are 100% correct. The problem we have now is that it works for most scenarios (under 10-12 chunks), but fails completely for long nodes (and slow, local embedding LLMs).
The solution I coded is imperfect too - it bypasses batch-chunking of a long Entity on CRUD when immediate indexing is allowed. That not only potentially creates orphaned chunks, it can actually fail to chunk and index long Entities.
I suggest we implement a robust solution:
- Now, an Entity is marked as indexed when its child chunks are added to the queue. It should be the opposite - it should be marked as indexed only when all the child chunks are processed (see the sketch after this list).
- For the AI Search index we replace the Index status page with a page reflecting chunking, not Entities. The user will then see non-processed Entities and - if any - an unfinished Entity, including the number of unembedded chunks.
- We set up robust handling of indexing for Entity CRUD: on failure we show a message offering to index manually on the index page, or to add the Entity to a Cron queue. If the user doesn't react, it will be added to the queue anyway (that way the re-indexing request is actually recorded and will be processed eventually).
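To illustrate the first point, the marking could be driven by a helper like this - a hypothetical sketch, assuming the new chunk table tracks an embedded flag per chunk (table and column names are made up):

// Hypothetical helper: an Entity counts as indexed only once every one of
// its chunks has been embedded. Table/column names (ai_search_chunks,
// entity_id, embedded) are assumptions about the new chunk-tracking table.
protected function entityFullyIndexed(string $entity_id): bool {
  $pending = $this->database->select('ai_search_chunks', 'c')
    ->condition('c.entity_id', $entity_id)
    ->condition('c.embedded', 0)
    ->countQuery()
    ->execute()
    ->fetchField();
  return (int) $pending === 0;
}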
I have uploaded a patch which uses the suggested fix in the AI module.
Each AI provider supporting streaming should be able to iterate any way it needs to, so I suggest this change: https://git.drupalcode.org/issue/ai-3493277/-/blob/3493277-extend-loggin...
This way getIterator() works as a wrapper template, ensuring compatibility, and triggers the event at the end of the stream. If a provider requires a different implementation of the iterator, it can override overrideGetIterator().
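Roughly, the wrapper would look like this (only getIterator() and overrideGetIterator() come from the suggested change; the event class name and the dispatcher property are my assumptions):

public function getIterator(): \Generator {
  foreach ($this->overrideGetIterator() as $chunk) {
    // Pass every streamed chunk through untouched.
    yield $chunk;
  }
  // The provider's iterator is exhausted here, so the full streamed response
  // is known and the logging event can fire exactly once per call.
  $this->events->dispatch(new StreamedResponseFinishedEvent($this));
}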
Merge request is ready.
The facade search for Milvus IDs from Drupal IDs was limited to 10, hence when the number of chunks per node was larger, some of them remained in the VDB index, because only the first 10 were found and deleted before the new chunks were saved.
Cool, I will check Milvus for this.
I went through all the tickets and have done/learned the following:
Tickets mentioned in summary
- 🌱 [Meta] Bring AI modules Tests, Error Handling and Documentation (Active) has one unfinished child, 📌 Add search api integration test (Active), which was reported by @scott_euser. I have asked for it to be either implemented or assigned to me.
- 📌 Check that SOLR 'boost' of results finds results that are not found by traditional SOLR search (Active) is reported by @scott_euser. I have asked for it to be either implemented or assigned to me.
- ✨ Pass metric type to vector search function (Active) is reported by @joshhytr. I have asked for the implementation to be either finished or handed over to the maintainers to finish.
- 📌 Rename 'Boost' plugins to 'Combine' plugins (Active) is reported and possibly also finished by @scott_euser. Asked for an MR to be opened.
- 📌 Improve AI Search Module Indexing to Handle Long-Running Chunk Embedding Processes (Needs work) is reported by and assigned to me, with PR comments by @scott_euser. I am going to implement these.
- [Meta] Features to focus on for v1.1 - I have added a comment mentioning the things which were done. Unless @yautja_cetanu wants to create child tickets, I suggest treating the remaining Search module features mentioned in the ticket as not required for moving out of experimental.
Other open bug reports
- 🐛 Failure to insert into collection in Milvus 2.4+ with float32 instead of floatvector data passed (Postponed: needs info) - asked to close or follow up.
- 🐛 Incorrect View Mode Configuration in RagAction::fullEntityCheck and hardcoded default value in Form (Active) - merge request exists.
- 🐛 Milvus-based server silently failing to insert to collection when indexing (Active) - stale ticket, warning issued that it will be closed on Tuesday, 4 February 2025.
- 🐛 Error while indexing (Active) - stale ticket, warning issued that it will be closed on Tuesday, 4 February 2025.
- 🐛 Flush index if Embedding Strategy is set (Needs review) - possibly stale ticket, warning issued that it will be closed on Tuesday, 4 February 2025.
- 🐛 Non-UTF8 encoding results in the exception "Failed to insert into collection" (Active) - possibly stale ticket, warning issued that it will be closed on Tuesday, 4 February 2025.
- 🐛 Pinecone not visible in seach_api only after cache clear (Active) - the VDB Provider is only visible in the Search API create Index form after a cache clear; asked @mrdalesmith & @scott_euser for a heads-up and offered to investigate on Wednesday, 22 January 2025.
Suggested way ahead
I shall either create the fixes and open MRs for review, or close the tickets due to inactivity on the dates provided above.
Hence:
Assuming the work on stale tickets (where the reporter or assigned developer does not communicate, does not follow the recommendations, or asks us for help with finishing the ticket) starts on Tuesday, 4 February 2025, it can be ready for PR on Friday, 7 February 2025.
@mrdalesmith & @scott_euser will you follow up, or should I investigate further on Wednesday, 22 January 2025?
@fishfree are you happy to close this ticket as fixed in 🐛 Broken Byte-Pair Encoding (BPE) (Active)? Without reactions, I shall close this on Tuesday, 4 February 2025.
I believe we should not allow changing the embedding strategy for an index. The reason is that some of the strategies intentionally create a single chunk (so the search backend doesn't need to deal with multiple chunks per Entity ID), which can be expected and further processed as is. Changing the strategy would have unwanted consequences.
I believe this has fixed the issue: https://git.drupalcode.org/project/ai/-/merge_requests/62
Proposal: close this as Done, not Doing. I shall close this ticket accordingly on Tuesday, 4 February 2025 if there are no further objections.
If no new input is provided, this ticket will be closed on Tuesday, 4 February 2025 due to inactivity.
If no new input is provided, this ticket will be closed on Tuesday, 4 February 2025 due to inactivity.
@fishfree - I also cannot replicate. I plan to close the ticket on Tuesday, 4 February if nothing new is found/reported.
Crossed out feature requests were implemented/present in 1.0.x, commit 7809b7fd392ab3eef2c361437091aa91913a334e.
- - this is possible with the existing module. It depends on the VDB implementation (and capabilities).
- - the existing code does this.
- Views filters - working with RAG - does work, but not well yet. - Probably needs a specific Vector Exposed filter - this has to be done case by case, but we should/could probably start with disabling Views filters which do not actually work with a VDB Search API index.
- Have the search respect user permissions (permission checking in Drupal is computationally heavy - Drupal even does recalculation by itself). - This can actually be done using the same method Drupal uses; @yautja_cetanu, it might be worth opening a new ticket.
@scott_euser is this ready to merge?
I agree, it is a bug. In order to move forward with 🌱 [Meta] Bring AI Search, Assistants and Chatbot out of Experimental (Active), are you happy for me to do the code and assign the PR to you afterwards, or will you have time to do it in ~2 weeks?
@joshhytr this issue was flagged as important for 🌱 [Meta] Bring AI Search, Assistants and Chatbot out of Experimental (Active). We would like to have it fixed by 4 February 2025. Will you manage, or are you happy for the maintainers to help?
@scott_euser - are you going to implement this, or should I assign it to myself and ask for a PR review when done?
The code doesn't introduce any security issue, good.
However:
As pointed out by @MrDaleSmith, there is a duplicated decode call in:
public function runChatMessage(string $prompt, array $automatorConfig, $instance, ?ContentEntityInterface $entity = NULL) {
  $text = $this->runRawChatMessage($prompt, $automatorConfig, $instance, $entity);
  // Normalize the response.
  $json = $this->promptJsonDecoder->decode($text);
  if (!is_array($json)) {
    throw new AiAutomatorResponseErrorException('The response was not a valid JSON response. The response was: ' . $text->getText());
  }
  return $this->promptJsonDecoder->decode($text);
}
Which should probably be fixed by:
$json = $this->promptJsonDecoder->decode($text);
if (!is_array($json)) {
  throw new AiAutomatorResponseErrorException('Invalid JSON: ' . $text->getText());
}
return $json;
There are also repeated class_exists() checks:
$addressFieldOverrideClass = '\CommerceGuys\Addressing\AddressFormat\FieldOverride';
if (!class_exists($addressFieldOverrideClass)) {
  return [];
}
Again, as pointed out by @MrDaleSmith, these should be in a single method, e.g. addressClassesArePresent(), that checks for all needed \CommerceGuys\Addressing\... classes at once.
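For illustration, such a helper might look roughly like this (the exact list of required classes is an assumption):

protected function addressClassesArePresent(): bool {
  $required = [
    '\CommerceGuys\Addressing\AddressFormat\FieldOverride',
    '\CommerceGuys\Addressing\AddressFormat\AddressFormatRepository',
  ];
  foreach ($required as $class) {
    if (!class_exists($class)) {
      return FALSE;
    }
  }
  return TRUE;
}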
Lastly, there is potential for confusion with array_merge_recursive() in generate() - please check that it does what is expected: if the LLM returns the same keys, array_merge_recursive() can cause unexpected nesting. For example, if $values has keys that already exist in $total, it may end up generating nested arrays instead of appending in a simple manner.
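A quick example of the behaviour in question:

$total  = ['title' => 'First suggestion'];
$values = ['title' => 'Second suggestion'];

// array_merge_recursive() turns colliding string keys into nested arrays:
// ['title' => ['First suggestion', 'Second suggestion']]
$merged = array_merge_recursive($total, $values);

// If "last value wins" per key is what generate() actually needs,
// array_merge() keeps the structure flat: ['title' => 'Second suggestion']
$flat = array_merge($total, $values);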
I have tested the '3480861-translate-with-ai-retake' branch with the following setup:
- Drupal 11.1.1
- drupal/language_access:^2.0
- drupal/ai:dev-3480861-translate-with-ai-retake as 1.0.x-dev (corresponds to commit 7fd2b16335c78caba729ade3454f04cd963460ab, above ai:1.0.2)
- Modules enabled: content_translation, ai, ai_provider_openai, ai_translate, key, language_access
Result:
- No Watchdog issues.
- Language access prevents access to language versions the user has no access to.
- The AI translations column contains a translation link at '/node/{nid}/translations'.
- A translation is made when the link is clicked.
@arwillame can you provide a description of how to replicate the error you observed?
An LLM will inevitably hallucinate links sometimes. Prompt engineering can get you a reliable copy/paste result (on some models, including Gemini/Gemma, Claude 3.5 Sonnet and OpenAI 4o and later), but a reliable rewrite is almost impossible to guarantee.
However, because we know the path constraints in Drupal and the correct source URLs (in the case of search as a source), we can let the LLM just choose, as you suggested. That would work without hallucinations.
Don't you need to add prompts into ai_ckeditor.schema.yml?
The changes make the code more resilient to misconfiguration by:
- Properly handling the case when no model is configured.
- Properly handling the case when the default model string is empty.
- Only attempting to create the translation link when we have a valid model name.
This aligns with the issue's "Proposed resolution" of making the module more resilient to misconfiguration.
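For reference, the checks boil down to something like this (an illustrative sketch only - the real config keys, variable names and the buildTranslationLink() helper are assumptions, not the actual ai_translate code):

$default_model = (string) ($config->get('default_model') ?? '');
if ($default_model === '') {
  // No model configured, or the configured value is an empty string:
  // skip building the translation link instead of failing later.
  return [];
}
// A valid model name exists, so the translation link can be built.
$links[] = $this->buildTranslationLink($entity, $langcode, $default_model);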
Tested - the missing permission was restored.
I have added uninstall code, which makes sure the modules won't be enabled anymore.
Because there is no clear/safe way to uninstall a missing module, the removal should be done in 2.x, with a clear statement on the module's page that the module must be updated to the latest 1.x version before upgrading to 2.x.
I have added suggested initial updates to the module.
Now an LLM processes the user/agent communication and provides a summary of what the chat was about, what the result is, and what could be done better.
While we are at it, this is worth including: https://github.com/cognesy/instructor-php
Good catch, I shall update the JSON statement.
The infinite loop is prevented here:
// Nothing to do, if we do not have content field or it is empty.
if (!isset($data['content']) || (strlen($data['content']) == 0)) {
  throw new \Exception("Failed to record vector.");
}
If the calculation becomes wrong for some reason (possibly a change in the API of Milvus), we shorten the content by 5% during each loop:
$difference = -max(1, (int) (mb_strlen($data['content'], 'UTF-8') * 0.05));
And when the content length reaches 0, an exception is thrown.
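Put together, the retry logic works roughly like this (insertIntoCollection() is a placeholder for the actual Milvus insert call; the rest follows the snippets above):

while (TRUE) {
  // Nothing to do, if we do not have content field or it is empty.
  if (!isset($data['content']) || (strlen($data['content']) == 0)) {
    throw new \Exception("Failed to record vector.");
  }
  try {
    $this->insertIntoCollection($data);
    break;
  }
  catch (\Exception $e) {
    // Shorten the content by 5% (at least one character) and try again.
    $difference = -max(1, (int) (mb_strlen($data['content'], 'UTF-8') * 0.05));
    $data['content'] = mb_substr($data['content'], 0, $difference, 'UTF-8');
  }
}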
TikToken doesn't fail, our implementation of it does. The easiest temporary fix is to disable overlap (set it to 0) and use the averaging strategy. We need to extend TikToken so that this is properly accounted for.
This has been fixed in Alpha 2.
This has been fixed in Alpha 2.
Added AJAX, which doesn't slow View loading, and provided a Drush command for summarizing the source fields in bulk.
The difference between VDB and Solr (when only keyword search is allowed in Solr) is that Solr will still work the same if you add/remove a field. The field will just be returned as NULL if not populated. The vector, on the other hand, might not just have a different number of dimensions (hard fail), but also different handling of returned items (soft fail in post-processing in the strategy).
However, I see your point about being able to choose a different strategy for the same setup (e.g. during development, for testing strategies' outcomes). For that case, the index can be emptied (deleted/recreated) in the VDB backend whilst the index configuration stays (the deleting/recreating happens during saving of the changed index).
This is a destructive operation and we must issue a pop-up warning before allowing the changed index to be saved.
Adding/removing fields in that case doesn't cause any issues - they will be taken care of during new indexing.
I believe an index should not allow the strategy to be changed once it has been created. An index is a data structure defined by both the strategy and the embedding vectors. If something needs to change, a new index should be created.
The initial dev version was pushed to 'ai_search' branch.
The initial dev version was pushed to 'ai_search' branch.