Account created on 22 July 2015, about 10 years ago
#

Merge Requests

More

Recent comments

🇬🇧United Kingdom scott_euser

Which VDB Provider are to using? It's more the support of the filterable attributes to VDB provider filtering (which varies wildly) that I'd suspect limitations get hit for...

🇬🇧United Kingdom scott_euser

Will take care of this in Allow to do vector search by user supplied query Active first to make it easier to handle + satisfy other historical requests from other devs

🇬🇧United Kingdom scott_euser

Hmmmm actually I can see the use, though slightly tweaked (updated issue summary and title). It would be much easier to do a similarity search and avoid an extra LLM call to generate embedding if we could look-up by ID, e.g. for here Make an option to get embedding from the already indexed content to avoid going to LLM Active . In that case in my module there I would retrieve the embedding via the vdb provider and then use that to query the vector database.

🇬🇧United Kingdom scott_euser
  1. For your front-end error, I'm guessing its this 🐛 Fatal error on render on front-end resulting in no related content Active
  2. Adding in a no LLM call option here Make an option to get embedding from the already indexed content to avoid going to LLM Active

If there are other things missing do let me know, otherwise will wait and see what you push, or wait for you to reach out in Drupal Slack to chat.

Thanks!

🇬🇧United Kingdom scott_euser

scott_euser created an issue.

🇬🇧United Kingdom scott_euser

scott_euser created an issue.

🇬🇧United Kingdom scott_euser
🇬🇧United Kingdom scott_euser

scott_euser created an issue.

🇬🇧United Kingdom scott_euser

Of course I'm only seeing half the puzzle at the moment but maybe before you take it out of dev to a full release I'll be able to see and we can discuss better....

🇬🇧United Kingdom scott_euser

Hmmm trying to think back; I can imagine this came in to not stop the status quo of chunking adding a tiny bit of context (originally there was no control of contextual content + no way to index filterable attributes. Hard to follow git history as so much reworking happened to get it there. I expect the decision at that time was that title should be the minimum contextual content in a chunk (but its too long ago to remember/be sure)

Moving forward I don't think we should move to a place where title is not there as contextual content by default as we know however clear we make instructions, people will inevitably skip reading them. So whatever approach we take here should be an 'opt out' for those who specifically want to not have the title in contextual content.

And yes we should fix the duplicate title so it appears only once in what actually gets indexed. What actually gets indexed hopefully accurately matches what the Chunker Preview outputs but I cannot check rate now - ie, AiSearchIndexFieldsForm::buildCheckerChunkTable()

From our past discussions and searching, we had come to the conclusion that LLMs do correctly understand markdown, hence the '# $title' (conversion to h1 is only for the sake of the preview and only if you have the optional php league package installed, LLM is sent the markdown - again unless I remember wrong).

In terms of the test, I believe its testing against the chunker preview which again need to follow the code a bit more, but I don't think actually uses the indexed content, just checks if the item is in the index (could be wrong). Looking at AiSearchIndexFieldsForm::getCheckerEmbeddings() I believe it is generating fresh embedding based on current field indexing settings.

🇬🇧United Kingdom scott_euser

Thanks! Apologies that was in there in the first place, the module doesn't provide a drush command so removed it altogether

🇬🇧United Kingdom scott_euser

scott_euser made their first commit to this issue’s fork.

🇬🇧United Kingdom scott_euser

Personally would much prefer to keep to the issue summary here as we have spent a lot of time having a back and forth to agree approach and get test coverage in. The goal here is to break up the options so it's not an all or nothing but otherwise leave the status quo to keep the scope limited and in a place that is realistically mergeable.

Changing the UX and what the options are do sound like you're raising seperate issues that are related to the user controls in general. I'm not a maintainer here of course so would suggest waiting until they chime in :)

🇬🇧United Kingdom scott_euser

Do you want to consider making not using embeddings an option in there? Any chance you're on drupal slack to chat? Just feels a shame to not try to team up and make something more powerful overall

🇬🇧United Kingdom scott_euser

Thanks for the work on this! Does look like a related test failure, worth checking what the browser output is there before and after

🇬🇧United Kingdom scott_euser

No problem thanks for the update!

🇬🇧United Kingdom scott_euser

scott_euser created an issue.

🇬🇧United Kingdom scott_euser

For adding filterable attributes however assuming you are using AI Search, the Search API Indexing Fields configuration page is where to go to add new attributes to you meta data (and limited support via Views to filter via those)

🇬🇧United Kingdom scott_euser

We haven't attempted to normalise filters in VBD providers (their support is too inconsistent and not close to what views filters offers) so this is more likely a question for the milvus/zilliz module (although may also need a change here to allow milvus/zilliz to use it via whatever the chatbot code is). So probably this issue needs to be postponed on milvus/zilliz issue first.

Note that methods exist in the vdb providers to accept queries but you'd need to follow milvus/zilliz documentation rather than AI module documentation (unless things have changed there that I've missed)

🇬🇧United Kingdom scott_euser

Oh yeah that's not what I mean - definitely the functionality here is important and provides quite a bit more value; more thinking of the UX changes only.

Had to dig a bit but this was an earlier release (not in stable version yet of AI module): https://drupal.slack.com/archives/C0803LX4536/p1751900254090099?thread_t... which is this mp4 video of using a button below an entity reference field

🇬🇧United Kingdom scott_euser

Thanks again! merged in

🇬🇧United Kingdom scott_euser
  1. Updated issue support to reflect the making generic is addressed in #86.
  2. Merged 11.x into the branch locally to resolve conflicts, think I got it right, but let's see test coverage. Test warnings for php8.5 are from 11.x as they also occur on the daily pipeline
🇬🇧United Kingdom scott_euser

There are a few more issues raised by @kim.pepper around some refactoring that are breaking changes which require coordinated releases with the VDB provider modules (most of which we maintain) - those have suggested we target AI 2.0.x - so perhaps that is the point we aim for stable. So those I think should go into the 'must' area to deal with them at once before we tie ourselves to stable and then struggle to get them in later due to BC

🇬🇧United Kingdom scott_euser

I think we are probably pretty close; there are some minor issues in the issue queue but all bugfixes; in terms of features we are probably rather complete as this is of course essentially a bridge between TMGMT and AI module. The patch to TMGMT core listed on the module homepage shouldn't be a blocker as its an upstream problem and the consequence without it is higher token usage but at least it continues to work.

🇬🇧United Kingdom scott_euser

Thank you! Makes sense to make it more protective like that given the unpredictable nature of the AI suggestions. Thanks, getting merged.

🇬🇧United Kingdom scott_euser

For reference for someone coming across this, this is the status quo UX of this module:

🇬🇧United Kingdom scott_euser

I think before spending too much time on a direction would like to check in on the latest for Field Widget Actions in the AI module. Hoping we can leverage that somehow - its merge into the 1.2.x branch there.

I also wonder if while leveraging that, we'll see any design patterns there e.g. modal window or something with suggestions which could avoid a full form submission. With a lot of AI Initiative efforts happening there, aligning the UX as much as possible while keeping the suggestion levels and searching existing content from this module would great.

It may be hard however to avoid a page refresh unless we e.g. require the field to be Tagify like Drupal CMS uses (or at least remove support for autocomplete) otherwise e.g. an entity reference field that results in 3 references added yet only 1 field initially available doesn't work of course.

🇬🇧United Kingdom scott_euser

FYI there are other test failures now from https://www.drupal.org/project/ai/issues/3525303#comment-16266934 Create Plugin Action for Recipes to check for installed default provider Active it seems, so worth noting when working on this to expect that if fork gets updated with latest 1.2.x

🇬🇧United Kingdom scott_euser

I think this is adding some new test failures on phpunit next minor, e.g. https://git.drupalcode.org/project/ai/-/jobs/6550213

Error 0.723s testVdbServerSetup
Pass 0.728s testVdbServerNotSetup
Failure 10.775s *** Process execution output ***
PHPUnit 11.5.39 by Sebastian Bergmann and contributors.

Runtime: PHP 8.3.25
Configuration: /builds/project/ai/web/core/phpunit.xml.dist

DDDDDDDDDED 11 / 11 (100%)

Time: 00:10.553, Memory: 6.00 MB

Verify Setup Ai (Drupal\Tests\ai\Kernel\Plugin\ConfigAction\VerifySetupAi)
⚠ No provider setup
⚠ Provider setup
⚠ Provider does not exist
⚠ Provider exists
✔ Operation type has default model
✔ Operation type does not have default model
✔ Invalid value type
✔ Invalid array type
✔ Empty settings
✘ Vdb server setup

├ InvalidArgumentException: The VDB provider 'echo_db' does not exist, so this recipe will not work.

│ /builds/project/ai/src/Plugin/ConfigAction/VerifySetupAi.php:113
│ /builds/project/ai/tests/src/Kernel/Plugin/ConfigAction/VerifySetupAiTest.php:178

✔ Vdb server not setup

Could re-open this or someone could make it as a separate issue.

🇬🇧United Kingdom scott_euser

Thanks! I gave this a whirl:

  1. Checked UI and details shown
  2. Compared with VDB providers like Pinecone which have additional info added; Ping gets duplicated, but I can fix that in Pinecone as it makes sense in base here
  3. Removed a collection in Milvus and the auto-recreate works well
🇬🇧United Kingdom scott_euser

scott_euser made their first commit to this issue’s fork.

🇬🇧United Kingdom scott_euser

Thank you both!

🇬🇧United Kingdom scott_euser

Thanks for flagging! Can probably start with (1) given its a nice-to-have type feature provided by this module to keep it simple. Thanks!

🇬🇧United Kingdom scott_euser

That sounds like a good plan to have a setup method trigger the cache dir creation. I think probably can use

protected function setUp(): void {
  parent::setUp();
  ...

and in KernelTestBase that this class extends, Core already has an initFileCache() where they do a similar thing it seems, so we can have ours call initTiktokenCache to match

Its likely once we have something we will need to merge and monitor if the issue recurs as it'll be hard to know for sure if it solves, but can always revert if not.

🇬🇧United Kingdom scott_euser

Thanks for your persistence with it! Will close this one for now and continue in that thread as that one is the issue I saw failures for e.g. after committing the above into 1.2.x

🇬🇧United Kingdom scott_euser

Its definitely opinionated unfortunately, so it needs to be configurable.

🇬🇧United Kingdom scott_euser

scott_euser changed the visibility of the branch 3519927-citations-are-not to hidden.

🇬🇧United Kingdom scott_euser

Okay got some more automated test coverage in as well to ensure with & without grouping works okay. Will leave this here for a little bit as needs review in case anyone has any concerns

🇬🇧United Kingdom scott_euser

Thanks @dieterholvoet for the start on this!

We have been working on a giant publication with hundreds of footnotes and we have also realised that when across multiple fields, not only is the grouping not shared for identical numbers, the matching of same text appearing in different fields does not group those still.

Working on some refactoring to share some of the FootnotesFilter code with FootnotesGroup - moved to a separate branch for now.

Seems to be working for now, but will be adding test coverage to this as its a problem that has come back too often.

🇬🇧United Kingdom scott_euser

scott_euser made their first commit to this issue’s fork.

🇬🇧United Kingdom scott_euser

Refactored it, this I think is the most easy to read, but has the downside of repeating the title & description. Could move those into variables I suppose, but maybe this is keeping it simpler?

🇬🇧United Kingdom scott_euser

Even after merge, 1.2.x pipeline failed, then on retry passed. Wonder if its something to do with the order pipeline jobs get picked up or parallel test running. Open to help/suggestions!

🇬🇧United Kingdom scott_euser

Aha so its taking the Drupal Core version here, yet if the module is updated independently of core, its potentially an issue. Okay will merge, thanks!

🇬🇧United Kingdom scott_euser

Is this still an issue actually?

I tried to reproduce on 2.0.x:

  1. Added `$config['user.role.anonymous']['permissions'][9999] = 'access devel information';` to settings.local.php (confirmed with a die that it kicks in)
  2. Checked in UI to see if devel is checked and its not (ie, override is not affecting the form)
  3. Checked in original code here to add a dvm and can see that `access devel information` is not added to the array
    
        foreach ($roles as $role_name => $role) {
          if (in_array(self::ALL_OPTIONS, $filter['roles']) || in_array($role_name, $filter['roles'])) {
            // Retrieve role names for columns.
            $role_names[$role_name] = $role->label();
            // Fetch permissions for the roles.
            $role_permissions[$role_name] = $role->getPermissions();
            $admin_roles[$role_name] = $role->isAdmin();
          }
        }
    dvm($role_permissions);
    
  4. I also tried saving after anyways in case its an issue in submit handler only.
  5. After save, config export does not result in any changes, ie, devel not added to anonymous

I think I need a little bit more help understanding how to reproduce or perhaps this is just outdated.

🇬🇧United Kingdom scott_euser

Seems like it could be unrelated, as there is a random test failure from TextChunkerText, which on retry passes:

---- Drupal\Tests\ai\Kernel\Utility\TextChunkerTest ----
Status Duration Info
--------------------------------------------------------------------------------------------------------
Error 1.883s testTokenCount with data set #0
Pass 1.477s testTokenCount with data set #1
Pass 0.574s testTokenCount with data set #2
Pass 0.528s testTokenCount with data set #3
Pass 0.569s testTokenCount with data set #4
Pass 0.495s testTokenCount with data set #5
Pass 0.572s testTokenCount with data set #6
Pass 0.490s testTokenCount with data set #7
Pass 0.547s testTokenCount with data set #8
Pass 0.481s testTokenCount with data set #9
Pass 0.556s testTokenCount with data set #10
Pass 0.554s testTokenCount with data set #11
Log 11.252s *** Process execution output ***
PHPUnit 11.5.39 by Sebastian Bergmann and contributors.

Runtime: PHP 8.3.25
Configuration: /builds/project/ai/web/core/phpunit.xml.dist

E........... 12 / 12 (100%)

Time: 00:11.024, Memory: 8.00 MB

Text Chunker (Drupal\Tests\ai\Kernel\Utility\TextChunker)
✘ Token count with 0

├ Yethee\Tiktoken\Exception\ParseError: Could not decode token "IHdpbm5p=" at line 11231

│ /builds/project/ai/vendor/yethee/tiktoken/src/Vocab/Vocab.php:93
│ /builds/project/ai/vendor/yethee/tiktoken/src/Vocab/Loader/DefaultVocabLoader.php:89
│ /builds/project/ai/vendor/yethee/tiktoken/src/EncoderProvider.php:177
│ /builds/project/ai/vendor/yethee/tiktoken/src/EncoderProvider.php:138
│ /builds/project/ai/src/Utility/Tokenizer.php:51
│ /builds/project/ai/src/Utility/TextChunker.php:25
│ /builds/project/ai/tests/src/Kernel/Utility/TextChunkerTest.php:52

✔ Token count with 1
✔ Token count with 2
✔ Token count with 3
✔ Token count with 4
✔ Token count with 5
✔ Token count with 6
✔ Token count with 7
✔ Token count with 8
✔ Token count with 9
✔ Token count with 10
✔ Token count with 11

For now this separates out the indexing so failure to reach 100% indexing in setup becomes visible, as otherwise the pipeline does not show what the failure is. Will merge it since it passes, but I suspect its not the end of it if we are still getting random test failures from tiktoken like the above.

🇬🇧United Kingdom scott_euser

scott_euser changed the visibility of the branch 3544206-ai-search-next-minor-attempt2 to hidden.

🇬🇧United Kingdom scott_euser

scott_euser changed the visibility of the branch 3544206-ai-search-tests to hidden.

🇬🇧United Kingdom scott_euser

Thank you!

🇬🇧United Kingdom scott_euser

Thank you!

🇬🇧United Kingdom scott_euser

Hmmm strange, tests sometimes pass sometimes fail as can be seen here https://git.drupalcode.org/project/ai/-/merge_requests/867/pipelines

Won't revert as this commit at least fixes part of the issue.

Needs further digging unfortunately

🇬🇧United Kingdom scott_euser

Added a nitpick + a question, thanks!

🇬🇧United Kingdom scott_euser

Great thank you Andrii!

🇬🇧United Kingdom scott_euser

Thanks for looking into this and reviewing!

So what would be the proposal with getSetupData? That Gemini and other providers should suggest a default Tokenizer from Tiktoken PHP's list? https://github.com/yethee/tiktoken-php/blob/master/src/EncoderProvider.php

Personally I don't know enough about the other providers to know if there are noticeable differences in the counting from gpt3.5 token counting.

🇬🇧United Kingdom scott_euser

scott_euser changed the visibility of the branch 3491446-solr-boost-of to hidden.

🇬🇧United Kingdom scott_euser

scott_euser created an issue.

🇬🇧United Kingdom scott_euser

Probably needs update hook as well to set default empty value, but will wait for feedback as to whether this direction is acceptable first or not

🇬🇧United Kingdom scott_euser

Couldn't see a good option in Core for word splitting, opted to borrow the one used in the Search API Database backend and therefore suggesting to move it into Utility class as a result, but feel free to disagree of course

🇬🇧United Kingdom scott_euser

scott_euser created an issue.

🇬🇧United Kingdom scott_euser

That sounds very sensible yeah - and at least to me makes sense to have a very minimal page, e.g. like cloudflare managed challenge is.

Would you imagine a plugin system where we contribute plugins to the various captcha modules?

(And of course also contribute to them sending signals as you mentioned)

🇬🇧United Kingdom scott_euser

No reply for 5 months with request for more info, so closing. Feel free to reopen, thanks!

🇬🇧United Kingdom scott_euser

No reply for a year with request for more info, so closing. Feel free to reopen, thanks!

🇬🇧United Kingdom scott_euser

Hi! If you could please make as a merge request so test coverage runs, thanks!

🇬🇧United Kingdom scott_euser

Hi! If you could please make as a merge request so test coverage runs, thanks!

🇬🇧United Kingdom scott_euser

Hi! Thanks for this module, read about it in this blog https://www.drupal.org/blog/beyond-patching-drupal-association-and-crowd...

Would the captcha instead of block be something for the road map here as well? It is mentioned on the module homepage as a potential future feature.

Thanks!

🇬🇧United Kingdom scott_euser

Thanks for flagging! Should work now on 10x as well (have only been using it on 11x myself sorry)

🇬🇧United Kingdom scott_euser

scott_euser made their first commit to this issue’s fork.

🇬🇧United Kingdom scott_euser

Hi! As an FYI, the 11.3.x issue for AI Search is being addressed here 🐛 AI Search tests is failing on next minor Active

🇬🇧United Kingdom scott_euser

Thanks very much for the continued persistence with this. Its a really complicated chunk of work, hence the challenge as well to properly find big chunks of time to review in my free time, but at least for now holiday season over for me, back into school term time!

Indexing progress bar

Okay I had a bit of a dig into this to see what it would take to get the progress bar to show overall progress as I still find it so confusing as a user when I run indexing like this. I wanted to understand if its something we can easily do later in a follow-up, but I think its not easy to do later unfortunately. I think this is what would need to happen:

  1. The AI Search Tracker should do the chunking rather than chunking happening on demand. I think this would be fine as the chunking itself is not a heavy load as its no API calls.
  2. The tracker can then select the chunks that are not yet processed (we already have ai_search_chunks.status in the database table), and use it for all tracker methods which includes counts of number of items to be processed

A single batch rather than sub-batches can be used + the indexing status overview (index_progress variable as added in template_preprocess_search_api_index) simply then needs to clarify what is the count. E.g. something like:

Before:

65 of 980 indexed

After:

65 of 980 chunks indexed (This is 12 of 210 items which have been broken into 980 chunks in the vector database)

Other notes

  1. We have more outstanding unresolved comments still some indicating breaking changes we need to work out how to deal with
  2. The complexity of things here, particularly the indexing/chunking would be hard to maintain without test coverage I think, but open to opinions
🇬🇧United Kingdom scott_euser

Thanks, that debugging was very helpful!

Not solved yet, but I can see as a result of `git bisect` that 📌 Deprecate $variables['page'] for node.html.twig Active is the issue. Namely the change there to the node.html.twig now checks if view mode is full, and if so, does not output the URL.

So we are expecting to render the 'full' view mode and therefore the URL should not be expected after that commit.

We can update the tests to check that the title of the found results are as expected rather than title and link. E.g. either [Chocolate Cake]( (to keep it working in 11.2) or # Chocolate Cake (i.e., without the URL for 11.3)

🇬🇧United Kingdom scott_euser

Re #8 whoops sorry retargeted branch to 11x

Re #9 do you mean what to do if your database doesn't support json? You need to make sure your database is one the supported ones at https://www.drupal.org/docs/getting-started/system-requirements/database...

Leaving as needs work per refactor note in #8

🇬🇧United Kingdom scott_euser

Hi! Can you try debugging in the Hooks folder at the ::handleFloodProtection() method? That's all the logic there that affects the View filters.

🇬🇧United Kingdom scott_euser

Thanks! I couldn't test of course at the moment, so added a couple comments that might just be questions for you to confirm behaviour will be as expected. Thanks!

🇬🇧United Kingdom scott_euser

Sorry not very around at the moment but maybe first step is enabled ai api Explorer to make sure you're getting results in the vector database test section there.

Not necessary to have the indexed fields match at all, essentially the boost:

1) programmatically runs the free text search against the vector database to fetch entity IDs only

2) the does something like this pseudo query into database:
"WHERE
(keywords LIKE" %normal database search%" OR item id IN (IDs retrieved from vector database)
AND (other filters & exposed filters
ORDER BY
Vector database retrieved item IDs DESC,
Relevance from database search DESC,
Any other orders you have applied DESC
"

You can e.g. in the boost plugin code add a dvm to check item IDs are found in (1) and that should be exactly what the AI API Explorer submodule shows to make sure that part is working okay

Hope that helps keep you going on debugging

🇬🇧United Kingdom scott_euser

Hi thanks for the contribution. I believe @seanb was keen to avoid adding many more plugins here to the core functionality and possible use viewsreference_extras instead, but will check since we have 'Hide header', this would match and maybe sensible.

However per 🐛 EntityUntranslatableFieldsConstraint can cause translation issues when content is created with an earlier version of viewsreference Needs work adding plugins may need update hook as well as its caused a comparison regression in multilingual it seems (though I have not looked very deeply into it yet).

🇬🇧United Kingdom scott_euser

scott_euser made their first commit to this issue’s fork.

🇬🇧United Kingdom scott_euser

Will set to Needs Work per your comment in #3 in any case, but my guess is update hook is probably the way to go instead

🇬🇧United Kingdom scott_euser

Hi! Thanks for the contribution. Looks like it causes test failures for when JS is disabled. Test coverage should also be extended to avoid a future regression on this. Thanks!

🇬🇧United Kingdom scott_euser

Thanks for the contribution! This sounds like it might be better as a update hook instead? Though I have not looked deeply at the cause.
For now I rebased latest into your MR since tests past on latest at the moment

🇬🇧United Kingdom scott_euser

Thanks! In the end I modified what you wrote to fit in the alternative/complementary approaches section of the module homepage: https://www.drupal.org/project/protect_views_flood_control#module-project--alternatives

Thank you

🇬🇧United Kingdom scott_euser

I raised 💬 Provide more help understanding what triggers "Non-translatable fields can only be changed when updating the current revision" Active in core (needs consideration on best approach) to try to find a way to provide more information for developers to be able to understand the cause of the problem in their specific case, please do consider providing suggestions there on how to make this particular symptom more friendly to debug. Thanks!

🇬🇧United Kingdom scott_euser

I think the problem here is there are many ways to get to the same symptom so many people get to this issue via Google or other means.

Layer builder asymmetric translations for example doesn't have this module as a dependency as far as I can see.

Ultimately we need to try to solve the problem described in the issue summary and if there are other problems ideally those get raised as seperate issues (potentially in other modules or core if unrelated to this) or it will be extremely difficult for this to ever get merged (and even better, ideally having failing tests to reproduce the issue).

🇬🇧United Kingdom scott_euser

For honeypot I wonder if it's accessibility issue having extra hidden form field if its not then hidden from screen readers (in order to not be hidden from AI bots)

🇬🇧United Kingdom scott_euser

That's an interesting suggestion with the honeypot!

In https://capellic.com/blog/ai-bot-abuse-our-evolving-strategy-taming-perf... @smustgrave just posted he goes into details on what worked for them.

🇬🇧United Kingdom scott_euser

Despite ignoring me plea in #55 the comments seem to indicate this is working well and given we have test coverage for it as well, hoping its okay to move to RTBC even though I did some of the work here

🇬🇧United Kingdom scott_euser

In any case it's a sort of fallback to things that get past cloudflare

Production build 0.71.5 2024