Account created on 10 April 2015, about 9 years ago

Recent comments

achap 🇦🇺

Still not sure why it only failed in CI and not locally, but I changed the test to just need the exception name rather than the whole object, since the serialization error was only happening in the test state recording. Tests are passing now.

achap 🇦🇺

Setting back to needs review as I found a small edge case after the type hinting changes when the OS mappings are not set.

achap 🇦🇺

Have updated the code based on feedback. Looks like that error was unrelated to my code, but I was able to trace it to the search_api_item table not being installed in SpellCheckTest, so I added it there.

achap 🇦🇺

I took a stab at this with MR #52.

It retrieves the mappings from the OpenSearch server and compares them with the local Drupal mappings after event subscribers have fired.

I don't think comparing the original index fields to the new ones will work because the modified mappings after event subscribers have fired aren't stored anywhere and so it's not really possible to compare.

The other thing to be aware of is dynamic mappings in OpenSearch. If you don't explicitly provide a type, it will guess based on the data that is indexed. That means that every time you click save, even if nothing has changed, it will clear the index (because there are no local mappings). This is mostly a problem if you create fields in processors etc., and the fix is just to use the IndexParamsEvent to explicitly give your custom fields the correct types. I have also found that if you give a Drupal field the wrong type in Search API (for example, a string type on a field that is actually a float), OpenSearch will override it with the float type in the mappings once data has been indexed. That will also cause clearing rather than re-indexing, so you need to be careful to assign the correct type.
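
A rough sketch of that kind of subscriber is below. The event class name comes from the comment above, but the namespace, the getParams()/setParams() accessors, the structure of the params array, and the field name are assumptions; check the module's Event classes for the real API.

<?php

namespace Drupal\my_module\EventSubscriber;

use Drupal\search_api_opensearch\Event\IndexParamsEvent;
use Symfony\Component\EventDispatcher\EventSubscriberInterface;

/**
 * Casts custom processor fields so dynamic mapping can't guess the wrong type.
 */
class MyIndexParamsSubscriber implements EventSubscriberInterface {

  /**
   * {@inheritdoc}
   */
  public static function getSubscribedEvents(): array {
    return [
      IndexParamsEvent::class => 'onIndexParams',
    ];
  }

  /**
   * Forces an explicit value type for a field added by a processor.
   */
  public function onIndexParams(IndexParamsEvent $event): void {
    // Assumption: the event exposes the bulk request params via
    // getParams()/setParams(), with document rows keyed by field ID.
    $params = $event->getParams();
    foreach ($params['body'] as &$row) {
      if (isset($row['my_custom_processor_field'])) {
        $row['my_custom_processor_field'] = (float) $row['my_custom_processor_field'];
      }
    }
    unset($row);
    $event->setParams($params);
  }

}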

This does not tackle settings (analysers etc.), only field mappings, and it doesn't negate the need for a blue/green deployment, but it should definitely help reduce the number of times you need to switch indexes.

achap 🇦🇺

Same as #13: when using the patch at #15, which uses getUserInput, the blockForm method is called twice on load. The first time the defaultConfiguration is correct; the second time all the defaultConfiguration values are null. When I switch getUserInput to getValues as in #6, it is only called once and the defaultConfiguration is correct.

achap 🇦🇺

This is possible using rank_feature queries in OpenSearch, and it is implemented in search_api_opensearch. See https://git.drupalcode.org/project/search_api_opensearch/-/commit/fa23b3e5f6d24d39d126ab0ad40b974997e16ae4

The search_api_boost value is created for each document and you can use a rank_feature query on that field to boost results.
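
For illustration, a rank_feature clause looks roughly like this when written as the PHP array sent as the query body; the bool/match wrapper and the boost value are placeholders, only the search_api_boost field name comes from the commit above:

  $body = [
    'query' => [
      'bool' => [
        'must' => [
          ['match' => ['title' => 'drupal']],
        ],
        'should' => [
          [
            // Documents with a higher search_api_boost value score higher.
            'rank_feature' => [
              'field' => 'search_api_boost',
              'boost' => 2.0,
            ],
          ],
        ],
      ],
    ],
  ];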

achap 🇦🇺

I saw this as well.

Steps to reproduce:

1. Enable PHP8.2 on your environment.
2. Switch to a branch running core 10.2 (where enforceDefaultTranslation is added to ContentEntityBase).
3. Clear cache (no warnings).
4. Switch to a branch running core 10.1 (no enforceDefaultTranslation).
5. Clear cache (warnings).

But I don't think this is an issue with this module.

achap 🇦🇺

I've converted those annotations to PHP attributes. Unfortunately, when I merge in the latest code from 11.x there is a test failure in testRowSkippedEvent; however, the error message is obscured by the annoying "Exception: Serialization of 'Closure' is not allowed" message, which doesn't actually show what's wrong. When I run MigrateEventsTest.php locally I don't get any errors, so this one is hard to debug. Apparently this is quite a common issue in Migrate tests (see the related issue).

I do have a stack trace:

    /builds/issue/drupal-2756269/core/lib/Drupal/Core/Cache/MemoryBackend.php:119
    /builds/issue/drupal-2756269/core/lib/Drupal/Core/Cache/CacheCollector.php:275
    /builds/issue/drupal-2756269/core/lib/Drupal/Core/State/State.php:102
    /builds/issue/drupal-2756269/core/modules/migrate/tests/src/Kernel/MigrateEventsTest.php:481
    /builds/issue/drupal-2756269/vendor/symfony/event-dispatcher/EventDispatcher.php:206
    /builds/issue/drupal-2756269/vendor/symfony/event-dispatcher/EventDispatcher.php:56
    /builds/issue/drupal-2756269/core/modules/migrate/src/MigrateExecutable.php:247
    /builds/issue/drupal-2756269/core/modules/migrate/tests/src/Kernel/MigrateEventsTest.php:354

That shows that it is trying to serialize the cached data (which would contain an exception). My guess would be that the exception data has a closure in there somewhere, but, like I said, it is very hard to debug if I can't reproduce it locally.

Hopefully someone can help pinpoint the issue!

achap 🇦🇺

Just to remove any ambiguity, I updated the title to reflect that this is an issue with the 8302 update hook, not 8301. I think that was entered in error.

From looking at the committed code in that other issue, I don't see how it can be related to or fix this issue, as the modified code only affects the password_policy_history table, while this primarily affects user__field_pending_expire_sent. Unless the conclusion is that the error occurs due to DB vendors not following ANSI SQL, as suggested in #16? My website that was affected by this is hosted with Acquia, if that helps.

achap 🇦🇺

Just ran into another error code from our remote service (429), so I thought I would attempt to make this a little more flexible rather than just logging specific error codes. Unfortunately I couldn't find much documentation on OpenSearch error responses, so it's a little hard to test.

This error response from OpenSearch:

Array
(
    [_index] => index_a
    [_id] => entity:node/152691:en-AU
    [status] => 429
    [error] => Array
        (
            [type] => status_exception
            [reason] => Error from remote service: {
    "error": {
        "message": "You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.",
        "type": "insufficient_quota",
        "param": null,
        "code": "insufficient_quota"
    }
}
        )
)

Ends up looking like this in the logs:

 Error from remote service: { "error": { "message": "Incorrect API key provided: foobar. You can find your API key at https://platform.openai.com/account/api-keys.", "type": "invalid_request_error", "param": null, "code": "invalid_api_key" } } for id: entity:node/189871:en-AU. Status code: 401

I updated the log message to include the status code and also changed "for index" to "for id" since it is logging the id of the item not the index it's in.

I'm getting some composer errors in my pipeline, but I think they're unrelated to this change:

$ composer install $COMPOSER_EXTRA
No composer.lock file present. Updating dependencies to latest instead of installing from lock file. See https://getcomposer.org/install for more information.
Loading composer repositories with package information
Updating dependencies
Your requirements could not be resolved to an installable set of packages.
  Problem 1
    - drupal/core 10.2.x-dev conflicts with drush/drush <12.4.3.
    - drupal/core-recommended 10.2.x-dev requires drupal/core 10.2.x-dev -> satisfiable by drupal/core[10.2.x-dev].
    - Root composer.json requires drupal/core-recommended 10.2.x-dev -> satisfiable by drupal/core-recommended[10.2.x-dev].
    - Root composer.json requires drush/drush ^11.0 -> satisfiable by drush/drush[11.0.0-rc1, ..., 11.x-dev].

achap 🇦🇺

@jweowu this occurred for us on 9.5. I have run into this issue before, though, and when it did it was because I was installing field config programmatically in update hooks, which, from glancing at the update hook, is what is happening inside 8302:

  if (!isset($sandbox['progress'])) {
    /** @var \Drupal\Core\Config\ConfigInstallerInterface $configInstaller */
    $configInstaller = \Drupal::service('config.installer');
    $configInstaller->installDefaultConfig('module', 'password_policy');

    $sandbox['progress'] = 0;
    $sandbox['max'] = $db_connection->query('SELECT COUNT(uid) FROM {users}')->fetchField();
    $sandbox['current_user'] = -1;
  }

So I think you're right that this might be a core issue... but I can't help with much more than that unfortunately.

However, what has fixed it for me in the past is just resaving the field storage definition without changing it. That makes the error go away:

  $update_manager = \Drupal::entityDefinitionUpdateManager();
  $field_storage_definition = $update_manager->getFieldStorageDefinition(
    'field_pending_expire_sent',
    'user'
  );
  $update_manager->updateFieldStorageDefinition($field_storage_definition);

achap 🇦🇺

Have added a change record, made some adjustments based on the comments, and also made the getRow method consistent with the rest of the code with regard to type hinting/annotations. Will wait and see regarding the annotations issue.

achap 🇦🇺

Just ran into this as well with queues and the static cache in the State API. A workaround for those who need it is to use the keyvalue factory service directly, which is what the State API is built on top of and which doesn't use a static cache. Agree with #5 that it would be good to have a way of busting the cache for a particular key using state.
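
A minimal sketch of the workaround (the 'state' collection name is what the State API uses under the hood; the key is just an example):

  // The key/value factory reads straight from storage, no static cache.
  $store = \Drupal::service('keyvalue')->get('state');

  // Always returns the latest stored value, even mid queue run.
  $value = $store->get('my_module.some_flag');

  // Writes work the same way.
  $store->set('my_module.some_flag', TRUE);

Just be aware that anything in the same request that still reads the key through \Drupal::state() can keep serving its stale statically cached copy.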

achap 🇦🇺

6 years later and a need for this came up again.

I think I have addressed most of the issues from #19. I added additional tests for everywhere the migration fails and ensured an event is dispatched. In addition, I passed the original exception to the event, since in a few cases MigrateExecutable catches the generic \Exception, which means it will catch all exceptions, giving classes further up the stack no chance to react. In my case, a source plugin can throw a SuspendQueueException, but this was being caught by MigrateExecutable rather than the queue runners, so I needed to re-throw it in my event subscriber.
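
The re-throw itself is only a few lines; this is a sketch where the event name and its getException() accessor are placeholders for whatever the MR ends up dispatching:

<?php

namespace Drupal\my_module\EventSubscriber;

use Drupal\Core\Queue\SuspendQueueException;
use Symfony\Component\EventDispatcher\EventSubscriberInterface;

/**
 * Re-throws SuspendQueueException so the queue runner can see it.
 */
class SuspendQueueSubscriber implements EventSubscriberInterface {

  /**
   * {@inheritdoc}
   */
  public static function getSubscribedEvents(): array {
    // Placeholder event name; use the constant/class added by the MR.
    return ['migrate.exception' => 'onMigrateException'];
  }

  /**
   * Lets queue suspensions escape MigrateExecutable's catch-all.
   */
  public function onMigrateException($event): void {
    $exception = $event->getException();
    if ($exception instanceof SuspendQueueException) {
      throw $exception;
    }
  }

}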

Removed the additional ideas from the issue summary as they are outside the scope of the core issue. I guess they can be opened in new issues if they're still wanted.

achap 🇦🇺

Thanks for the MR. Checking if fields have changed does seem like a good idea, but I think a clear rather than a re-index will be necessary, as alluded to in previous comments. From the OpenSearch docs:

If you want to create or add mappings and fields to an index, you can use the put mapping API operation. For an existing mapping, this operation updates the mapping.

You can’t use this operation to update mappings that already map to existing data in the index. You must first create a new index with your desired mappings, and then use the reindex API operation to map all the documents from your old index to the new index. If you don’t want any downtime while you re-index your indexes, you can use aliases.

See docs: https://opensearch.org/docs/latest/api-reference/index-apis/put-mapping/

So if we could detect that a field doesn't yet exist, we could use the re-index flag, and if it does already exist, we would need to clear. However, that could get complicated if one field is added and another changed, etc.

So just checking whether any fields exist in the new index that aren't in the original, and only clearing then, already seems like an improvement. However, I think the clear would need to come before the calls to updateSettings and updateFieldMapping.

achap 🇦🇺

This also only occurred for me once I installed webprofiler. hook_event_dispatcher was working fine before then. Given that this is the case for others as well maybe this issue should be closed in favour of the related one in tracer?

achap 🇦🇺

Also, I'm not entirely sure whether the test failures are due to my new code, because the tests were already failing on the upstream branch.

achap 🇦🇺

Based on my understanding that hook is called after all the batches have been queued for processing, so it's too late for my use case. I want to prevent them from being included in the queue in the first place as in my case removing them gets rid of about 20,000 nodes that need to be processed.

Although dispatching the event in EntityUpdateManager is probably not necessary as in that case the event would already have been fired if we are coming from a batch process, or we are just saving an individual node. So in the latter case I could just use the hook.

achap 🇦🇺

I think the module will still have issues with subdomain languages as it just removes the www during comparison but this should fix the language prefix issue.

achap 🇦🇺

Just want to clarify my comments from #7. It should work if you pass the --uri parameter when running via Drush. Cron I'm less certain about, but I imagine you can set the correct value somewhere.

achap 🇦🇺

Discovered another issue with the derivative approach. It creates a mapping table for each derivative migration, so I think that also won't work as I will have thousands of new tables...

achap 🇦🇺

So I quickly realized that this wasn't going to work (or would be very difficult) and that the way I was going about it was sending me down a rabbit hole. Stepping back, I realized that I can achieve the same result by using the deriver key in my migration and then passing the product IDs I need to generate migration plugins for via the MemoryCache service. That way, each product ID has a migration plugin created for it, which can be added to a queue. The queue can then process migrations concurrently!
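
Roughly, the deriver side looks like the sketch below (referenced from the migration YAML via the deriver key). The my_module.product_id_store service stands in for whatever MemoryCache-backed storage the queue-building code writes the product IDs to; the service name and cache key are assumptions.

<?php

namespace Drupal\my_module\Plugin\Deriver;

use Drupal\Component\Plugin\Derivative\DeriverBase;

/**
 * Creates one migration derivative per product ID.
 */
class ProductMigrationDeriver extends DeriverBase {

  /**
   * {@inheritdoc}
   */
  public function getDerivativeDefinitions($base_plugin_definition) {
    // Product IDs placed here earlier in the same request by the code that
    // builds the queue. "my_module.product_id_store" is a custom
    // MemoryCache-based service registered by the module.
    $cached = \Drupal::service('my_module.product_id_store')->get('product_ids');
    $product_ids = $cached ? $cached->data : [];

    foreach ($product_ids as $product_id) {
      $definition = $base_plugin_definition;
      // Limit the source to a single product so each derivative can be
      // queued and executed independently.
      $definition['source']['product_id'] = $product_id;
      $this->derivatives[$product_id] = $definition;
    }
    return $this->derivatives;
  }

}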

achap 🇦🇺

While this approach allows migrations to run concurrently, the status becomes useless because the value is stored in key/value storage with the ID of the migration as the key. So even if you instantiated a migration for each individual item, the status would just keep being overridden and not be very helpful. I wonder if there is a way to have a UUID for each instance returned by createInstance?

In the Migration plugin class these two methods use the plugin id:

  /**
   * {@inheritdoc}
   */
  public function setStatus($status) {
    \Drupal::keyValue('migrate_status')->set($this->id(), $status);
  }

  /**
   * {@inheritdoc}
   */
  public function getStatus() {
    return \Drupal::keyValue('migrate_status')->get($this->id(), static::STATUS_IDLE);
  }

achap 🇦🇺

Just wanted to say I had the exact same issue as tallytarik. For the most part, everything being an array did not affect anything apart from when implementing a knn_vector field, which throws the same error for a multi-value field. I used the IndexParamsEvent to alter the field to be single-value and it works.

achap 🇦🇺

Our workaround for the above was to re-architect our migration using the Queue API to process one item at a time, and to give our queue worker a cron lease time of one hour (the same as the cron run interval). This way the buffer is emptied at least once per hour and doesn't overwhelm the purge queue. Hope that helps someone.
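
For anyone copying this, the one-hour lease comes from the cron "time" value on the queue worker plugin definition; the names in the sketch below are placeholders:

<?php

namespace Drupal\my_module\Plugin\QueueWorker;

use Drupal\Core\Queue\QueueWorkerBase;

/**
 * Processes one migrated item per queue entry.
 *
 * With cron time = 3600, each cron run works this queue for up to an hour
 * (and claims items with that lease), so purge's buffer gets flushed roughly
 * hourly instead of all at once after a multi-hour migration.
 *
 * @QueueWorker(
 *   id = "my_module_migration_item",
 *   title = @Translation("Process one migration item"),
 *   cron = {"time" = 3600}
 * )
 */
class MigrationItemWorker extends QueueWorkerBase {

  /**
   * {@inheritdoc}
   */
  public function processItem($data) {
    // Import a single source row here (details omitted).
  }

}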

achap 🇦🇺

Just want to chime in and say this issue has affected me too when running migrations. We often run migrations that can take a few hours or more. I stepped through the code and think I discovered what's happening. In Drupal\purge\Plugin\Purge\Queue\QueueService::add, invalidation tags are not added to the queue straight away but rather to an internal buffer. Then, at the end of the request (for example, a long-running cron job or drush script), in Drupal\purge\Plugin\Purge\Queue\QueueService::destruct, it looks like the items from the buffer are finally committed to the queue.

The problem is that during the whole time the migration is running, none of the invalidation tags it generates can be processed by any of the purge processors. They are all dumped at once at the end of the migration, which usually results in being over the 100k limit.

Not sure what the fix is but that seems to be the root cause of the issue at least for us.

achap 🇦🇺

Sorry for not replying, got a bit sidetracked :D I've been using this patch in production without issues for a while now. In terms of which one should be the default, something to consider is index size and performance. I don't have any hard data to back this up, but I suspect tokenizing every character is a lot more expensive than every word. So maybe because of that, and also to preserve backwards compatibility, it makes sense to keep the filter as the default and add the tokenizer as a new plugin?

achap 🇦🇺

Adds some basic validation that the value is a Point. Not sure this handles all cases, but it does fix the use case I came across.

achap 🇦🇺

It's possible that items were removed even if the queue stayed the same size or new items were added. It would probably be better to compare the original items with the new items: for any that are in the old items but not in the new items, dispatch a removed event with those items, and for any that are in the new items but not in the old items, dispatch an added event with those items.
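
A minimal sketch of that comparison (the event class names and the $dispatcher variable are placeholders, and items are compared by serializing them, which assumes they are plain arrays or value objects):

  $hash = fn ($item) => md5(serialize($item));
  $old_by_hash = array_combine(array_map($hash, $old_items), $old_items);
  $new_by_hash = array_combine(array_map($hash, $new_items), $new_items);

  // In the old set but not the new set: removed.
  $removed = array_diff_key($old_by_hash, $new_by_hash);
  // In the new set but not the old set: added.
  $added = array_diff_key($new_by_hash, $old_by_hash);

  if ($removed) {
    $dispatcher->dispatch(new ItemsRemovedEvent(array_values($removed)));
  }
  if ($added) {
    $dispatcher->dispatch(new ItemsAddedEvent(array_values($added)));
  }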

achap 🇦🇺

I see what you mean looking at the code now. I guess the same would be true for the pushImport feed handler, as it uses the Queue API and would have the same lease times. Does that mean I should remove the changes above for cronImport and pushImport, or should they be left as is?

I'm only using the import (via drush) and startBatchImport methods in my project so haven't tested the other methods.

achap 🇦🇺

I'm back on the project I had this issue on again (yearly festival site) and have solved my own problem using the Feeds parse event to filter out missing resources. See: https://www.drupal.org/docs/contributed-modules/feeds/feeds-howtos/altering-feed-data
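
For reference, the shape of that subscriber is roughly as below. It's only a sketch: the class and method names follow the Feeds event system as I understand it, and the offset-based item removal and the resource_id source key in particular are assumptions to adapt to your feed.

<?php

namespace Drupal\my_module\EventSubscriber;

use Drupal\feeds\Event\FeedsEvents;
use Drupal\feeds\Event\ParseEvent;
use Symfony\Component\EventDispatcher\EventSubscriberInterface;

/**
 * Drops parsed feed items that point at resources we know are missing.
 */
class FilterMissingResourcesSubscriber implements EventSubscriberInterface {

  /**
   * {@inheritdoc}
   */
  public static function getSubscribedEvents(): array {
    // Negative priority so this runs after the parser has produced items.
    return [FeedsEvents::PARSE => ['afterParse', -10]];
  }

  /**
   * Removes items whose referenced resource does not exist yet.
   */
  public function afterParse(ParseEvent $event): void {
    $result = $event->getParserResult();
    for ($i = $result->count() - 1; $i >= 0; $i--) {
      $item = $result->offsetGet($i);
      if (!$this->resourceExists($item->get('resource_id'))) {
        $result->offsetUnset($i);
      }
    }
  }

  /**
   * Looks the resource up however makes sense for the project.
   */
  protected function resourceExists($id): bool {
    return TRUE;
  }

}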

Regarding #26, I'm happy to re-open the issue, but I still think my solution isn't the best one because it's either on or off. It would solve your issue, but it would stop entity references that get imported after the main feed from ever getting referenced. So I think we can do better. With that said, it is opt-in...

Would be interested to know how the Migrate API approaches this.

achap 🇦🇺

I have previously played around with those settings on the edge n-gram field (before using a custom tokenizer) and they didn't appear to do anything, but I haven't had a chance to try it out yet for search_as_you_type. I imagine it's caused by the same issue, i.e. search_as_you_type is probably using the standard tokenizer, which splits tokens on word boundaries rather than individual characters.

This SO question appears to solve it in the same way for the search_as_you_type implementation (implementing an edge n-gram tokenizer): https://stackoverflow.com/questions/59677406/how-do-i-get-elasticsearch-to-highlight-a-partial-word-from-a-search-as-you-type

According to the Elasticsearch tokenizer docs (https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-tokenizers.html#analysis-tokenizers), a tokenizer is, among other things, responsible for recording the:

  • Order or position of each term (used for phrase and word proximity queries)
  • Start and end character offsets of the original word which the term represents (used for highlighting search snippets).

If I analyze a title field that is using the custom edge ngram tokenizer I get the following token information for the sentence "This is a title":

{
  "tokens" : [
    {
      "token" : "t",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "th",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "thi",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : "this",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "word",
      "position" : 3
    },
    {
      "token" : "i",
      "start_offset" : 5,
      "end_offset" : 6,
      "type" : "word",
      "position" : 4
    },
    {
      "token" : "is",
      "start_offset" : 5,
      "end_offset" : 7,
      "type" : "word",
      "position" : 5
    },
    {
      "token" : "a",
      "start_offset" : 8,
      "end_offset" : 9,
      "type" : "word",
      "position" : 6
    },
    {
      "token" : "t",
      "start_offset" : 10,
      "end_offset" : 11,
      "type" : "word",
      "position" : 7
    },
    {
      "token" : "ti",
      "start_offset" : 10,
      "end_offset" : 12,
      "type" : "word",
      "position" : 8
    },
    {
      "token" : "tit",
      "start_offset" : 10,
      "end_offset" : 13,
      "type" : "word",
      "position" : 9
    },
    {
      "token" : "titl",
      "start_offset" : 10,
      "end_offset" : 14,
      "type" : "word",
      "position" : 10
    },
    {
      "token" : "title",
      "start_offset" : 10,
      "end_offset" : 15,
      "type" : "word",
      "position" : 11
    }
  ]
}

If I analyze a search_as_you_type field I get the following information:

{
  "tokens" : [
    {
      "token" : "this",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "is",
      "start_offset" : 5,
      "end_offset" : 7,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "a",
      "start_offset" : 8,
      "end_offset" : 9,
      "type" : "<ALPHANUM>",
      "position" : 2
    },
    {
      "token" : "title",
      "start_offset" : 10,
      "end_offset" : 15,
      "type" : "<ALPHANUM>",
      "position" : 3
    }
  ]
}

So if the offset information is used for highlighting, that explains why only the edge_ngram_tokenizer is working as expected.
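
A custom analyzer along these lines produces the tokenizer-based offsets shown above. It's written here as the PHP array form of the index analysis settings; the edge_ngram_tokenizer/edge_ngram_analyzer names are arbitrary and the gram sizes are just examples:

  $settings = [
    'analysis' => [
      'tokenizer' => [
        'edge_ngram_tokenizer' => [
          // A tokenizer (not a token filter), so every n-gram keeps its own
          // start/end offsets and highlighting can target the partial match.
          'type' => 'edge_ngram',
          'min_gram' => 1,
          'max_gram' => 20,
          'token_chars' => ['letter', 'digit'],
        ],
      ],
      'analyzer' => [
        'edge_ngram_analyzer' => [
          'type' => 'custom',
          'tokenizer' => 'edge_ngram_tokenizer',
          'filter' => ['lowercase'],
        ],
      ],
    ],
  ];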

achap 🇦🇺

Switching the Edge N-gram from a filter to a tokenizer is working for me. I guess the two plugins can co-exist?

achap 🇦🇺

Thanks for putting that together. From what I'm seeing, it actually has the same issue as the original edge n-gram implementation, i.e. it's highlighting the entire word rather than the n-grams themselves. Not sure why that is, based on the docs: https://opensearch.org/docs/latest/search-plugins/searching-data/highlight/

achap 🇦🇺

One workaround for this is to create your own plugin, similar to the Session one, that returns NULL from the getLangcode method if it's an admin path and then falls back to either the URL plugin or the Content Language plugin. This at least makes nodes editable in other languages, but the Session plugin should handle it on its own.
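
A rough sketch of such a plugin (it doesn't reproduce everything the Session plugin does, it just shows the admin-path bail-out; the plugin ID and labels are placeholders):

<?php

namespace Drupal\my_module\Plugin\LanguageNegotiation;

use Drupal\language\LanguageNegotiationMethodBase;
use Symfony\Component\HttpFoundation\Request;

/**
 * Session-style negotiation that steps aside on admin paths.
 *
 * @LanguageNegotiation(
 *   id = "my-module-session-no-admin",
 *   weight = -5,
 *   name = @Translation("Session (ignores admin paths)"),
 *   description = @Translation("Session negotiation that skips admin routes.")
 * )
 */
class LanguageNegotiationSessionNoAdmin extends LanguageNegotiationMethodBase {

  /**
   * {@inheritdoc}
   */
  public function getLangcode(Request $request = NULL) {
    // Returning NULL lets negotiation fall through to the next enabled
    // method (URL, Content Language, etc.).
    if (\Drupal::service('router.admin_context')->isAdminRoute()) {
      return NULL;
    }
    // Otherwise behave like the Session plugin: read the configured
    // query parameter.
    $param = $this->config->get('language.negotiation')->get('session.parameter');
    return $request ? $request->query->get($param) : NULL;
  }

}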

achap 🇦🇺

For the comments in #40, #42 and #67, I agree that the original idea of the issue has been forgotten about, so I created a new issue to deal specifically with the fact that you can't edit a node if only session negotiation is enabled: https://www.drupal.org/project/drupal/issues/3349579 ("Using only Session language negotiation method makes it impossible to edit content other than the current session language").

achap 🇦🇺

Thanks! I just tried to pull the latest 2.x, but the code from the MR on this issue isn't in there.

achap 🇦🇺

Not sure if you agree with the testing approach or not?

I initially tried to mock the dependencies of BackendClient for a unit test, but there are a lot of them, and I ran into issues with AnalyserManager being a final class (which can't be mocked), as well as generally having too much mocking of dependencies. Eventually, I just created a test subscriber to assert that the event was dispatched correctly.

achap 🇦🇺

I think I have replicated the issue you were talking about with unrelated changes in the diff. It seems to happen only when you add .patch to the end. GitLab looks to be using a cached diff that no longer applies, though the diff itself (without .patch) is fine.

achap 🇦🇺

Hmm that's weird. Do you mind giving an example? When I look at the diff I only see changes related to this issue.

https://git.drupalcode.org/project/search_api_opensearch/-/merge_requests/10/diffs

achap 🇦🇺

Rather than having an event specific to analyzers, this creates a generic event allowing developers to edit any settings before they are sent to OS. I feel like this is a better solution than the previous one, although possibly not as good as making the plugins configurable, though I think that would be considerably more work.

achap 🇦🇺

Actually, it seems that by the time $item->getBoost() is called with both processors enabled, the scores are already combined. So this already works correctly for both use cases.

achap 🇦🇺

A question I have: there is also a per-item processor for "Number field-based boosting". So maybe it would be good to have separate fields in OS. Right now in OS there is just one search_api_boost. I guess you could either combine the values from Drupal into one boost field or have separate OS fields.

achap 🇦🇺

We also need this. Locally it's working great for me. I merged in the latest changes, fixed the failing tests, and also added a fallback to the decimal type. Does anything else need to be done?

achap 🇦🇺

achap made their first commit to this issue’s fork.
