Remove entity id mapping config

Issue created by @bcizej
Comment almost 2 years ago →
🇸🇮Slovenia bcizej
After spending a few hours writing this comment describing various possible solutions each with pros and cons I came up with a solution that I think fits the best with minimal cons.

The criteria for finding/using the best solution was:
- Can be mapped both ways without storage (obviously)
- Handles any datasource search api item id formats
- Human readability or should I say easy recognition of the item in Drupal from the meilisearch document id

The solutions I thought of were a variations of character replacements to fit the meilisearch primary key character restrictions, base64 encoding/decoding or using entity uuids neither of which satisfied all of the above.

So the solution I'm proposing is...

Generate uuid for every new document, add a uuid attribute to the document and set that as the document primary key in meilisearch index.
Then set a second attribute which allows any characters to store the search api item id as it is and set the attribute as distinct.

Pros:
- Uuid contains acceptable characters for primary key
- Uuid is unique by itself and the second attribute can be forced as distinct in meilisearch
- Requires no mapping logic
- Second attribute has the same value as search api item id making it readable/recognizable

Cons:
- Additional attribute (not that much of a problem)
- The index must insure that the attribute is displayable otherwise no mapping will occur on search as the attribute would be excluded from results. By default all fields are displayable so should not be a problem.
- The additional attribute should not be searchable I guess but not a big issue
- Would need a unique name and we would probably need some mechanism so that users can't choose this name for any fields they add to the index via UI or code

Now every solutions would require a reindex of every document either manually or via update hook. There are not that many sites reporting usage of the module but I do know @tgoeg - who has been very helpful with testing, reporting issues etc., thanks :) - uses it in production, so I'd like to know if reindexing everything would present an issue.

I'll do some code fiddling and see how this works but if anyone has another or better solution feel free to contribute.
Comment almost 2 years ago →
admirlju
Why not just use hyphens as "escape" characters? So looking at Drupal side, the ID is made from <machine name>:<machine name>/<numeric id>:<language code>.

From my understanding machine names can't contain hyphens. The only problem I see is with language codes since some of those can (for example Chinese Simplified zh-hans), that said we could use double hyphens.

So let's say we use something like this as encoding
--fs-- == /
--co-- == :

entity--co--node--fs--2--co--en
entity--co--node--fs--2--co--zh-hans

Since you mentioned using uuid as the primary key on meilisearch, I assume the character limit will not be a problem. While the readability is reduced, it's still easy to quickly figure out what is indexed and its id. When we need to convert one or the other way it's a simple str_replace()
Comment almost 2 years ago →
🇸🇮Slovenia bcizej
After playing around with the code I hit a wall with updating and deleting items which was the main concern as the ids no longer match and we would have to get all the documents first from meilisearch and match the additional attribute with the search api item id basically trading id map storage for extra api calls.

So I tried a similar approach with how we currently convert search api item ids but expand on it with extra field and account for other invalid characters.

In the ItemConverter class we have the following code to strip invalid characters

$itemId = str_replace([':', '/'], '_', $item->getId()); $convertedItem = ['id' => $itemId];
And similar in the `SearchApiMeiliSearchBackend::deleteItems()` method

$ids = []; foreach ($item_ids as $id) { $itemId = str_replace([':', '/'], '_', $id); $ids[] = $itemId; } $this->meiliService->deleteDocuments($index->id(), $ids);
Replace str_replace([':', '/'], '_', $item->getId()) with preg_replace('/[^A-Za-z0-9_-]/', '_', $item->getId()) so that any invalid character in search api item id is replaced with "_" which for the `ContentEntity` datasource nothing changes so the document ids stay the same as before. And for any other datasources we also get a valid format for document primary key.

Add a second "search_api_item_id" attribute to the document with original search api item id and make it distinct.

Then for search results we can use this code to create search api item id from the meilisearch document. I think the search results is the only place where we need "document id -> search api item id" mapping.

$item = $this->getFieldsHelper()->createItem($index, $row['search_api_item_id']); $results->addResultItem($item);
And then we remove the code and schema for meilisearch_id_mappings configuration.
Comment almost 2 years ago →
🇸🇮Slovenia bcizej
Why not just use hyphens as "escape" characters? So looking at Drupal side, the ID is made from :/:.

I had considered something like this, the problem here is that this would work only for ContentEntity datasource plugin, while there could be potentially other datasources that do not follow this format.
Assigned to bcizej
Comment almost 2 years ago →
🇸🇮Slovenia bcizej
Comment almost 2 years ago →
🇸🇮Slovenia bcizej
Updated the issue with proposed solution and plan.
Open in Jenkins → Open on Drupal.org →
Core: 9.5.x + Environment: PHP 7.4 & MySQL 5.7
last update almost 2 years ago
101 pass
@bcizej opened merge request.
Issue was unassigned.
Status changed to Needs review almost 2 years ago8:59pm 1 September 2023
Comment almost 2 years ago →
🇸🇮Slovenia bcizej
Implemented the solution and tests, although I did not just add the notice to update hook but schedule all meilisearch indexes for a full reindex.
Open in Jenkins → Open on Drupal.org →
Core: 9.5.x + Environment: PHP 7.4 & MySQL 5.7
last update almost 2 years ago
103 pass
Comment almost 2 years ago →
🇸🇮Slovenia bcizej
Added a few tests if the fields in index use reserved names for mapping and saw it would break searching so added a fix as well. 🐛 Remove the possibility to add field with machine name id Fixed will prevent naming fields as reserved words.
Open in Jenkins → Open on Drupal.org →
Core: 9.5.x + Environment: PHP 7.4 & MySQL 5.7
last update almost 2 years ago
103 pass
Comment almost 2 years ago →
🇸🇮Slovenia bcizej
Search api already defines special search_api_id field as item id as well as search_api_datasource and search_api_language so I changed the code a bit to reuse those fields and also index them as they are present by default on all indexes and can be also be put in views as displayed fields, filterable, sortable etc.
Open in Jenkins → Open on Drupal.org →
Core: 9.5.x + Environment: PHP 7.4 & MySQL 5.7
last update almost 2 years ago
103 pass
Open in Jenkins → Open on Drupal.org →
Core: 9.5.x + Environment: PHP 7.4 & MySQL 5.7
last update almost 2 years ago
103 pass
Comment almost 2 years ago →
admirlju
The changes worked fine, but there was a small problem in search_api_meilisearch_update_9002, if there was an index that did not have a server set, it would attempt to get the backend on NULL (line 88) $backend = $index->getServerInstance()->getBackend();. Pushed a quick fix, that will ignore indexes without a server.
Status changed to RTBC almost 2 years ago2:01pm 4 September 2023
Comment almost 2 years ago →
🇸🇮Slovenia deaom
Can confirm the changes are working, tests are also passing, so marking this as RTBC.
Comment almost 2 years ago →
🇦🇹Austria tgoeg
@tgoeg - who has been very helpful with testing, reporting issues etc., thanks :)

Thank you for the compliments, much appreciated :-)

I try to be of help where I can as my programming skills are pretty, well, non-existant in PHP. And I really like Meilisearch because of its simplicity from a sysadmin's PoV (and not being Java!), so I'd like Drupal to build upon this potential.

We have a few sites using Meilisearch in production, yes.
But re-indexing all of them is no big deal. It takes around 10mins on the biggest site, which is perfectly OK for our normal maintenance windows for Drupal updates.

(I'd even trade in more inconvenience to get rid of that cumbersome configuration files! It makes rolling out dev config to production a pretty dangerous step if you don't exclude these files. And export production config before. And then merge the new dev config with the old entity id mapping of prod. This is content-dependent configuration which always makes my life hard.)

I'll see whether I can test this.
I have another problem with search_api_meilisearch together with file_extractor at the moment (with Drupal 10.1) however, which kinda stopped me in my tracks when I tried to upgrade our infrastructure to more recent versions of search_api_meilisearch and Drupal core.

I'll open a separate issue for this - if anyone could be kind enough to take a look at that I am sure I can provide real world tests for most of the recently closed issues!
Open in Jenkins → Open on Drupal.org →
Core: 9.5.x + Environment: PHP 7.4 & MySQL 5.7
last update almost 2 years ago
103 pass
Comment almost 2 years ago →
🇸🇮Slovenia bcizej
Comment almost 2 years ago →
🇸🇮Slovenia bcizej
(I'd even trade in more inconvenience to get rid of that cumbersome configuration files! It makes rolling out dev config to production a pretty dangerous step if you don't exclude these files. And export production config before. And then merge the new dev config with the old entity id mapping of prod. This is content-dependent configuration which always makes my life hard.)

Ouch, I forgot about this completely. That is definitely a big problem on deployments. If you have any other hassles like this, open an issue! While we are trying to get the module into a stable production ready state we are limited in thinking of issues that potentially exist. We are aware of issues in the code or functionality, but stuff like fiddling with configs across deployments didn't cross my mind as I am running the module on local environment only with a clean drupal installation.
Comment almost 2 years ago →
System Message

bcizej → committed ebef53ac on 1.0.x
Issue #3384347 by bcizej, admirlju, DeaOm, tgoeg: Remove entity id...
Status changed to Fixed almost 2 years ago8:30pm 5 September 2023
Comment almost 2 years ago →
🇸🇮Slovenia bcizej
Rebased and merged, thanks everyone.
Comment almost 2 years ago →
🇦🇹Austria tgoeg
That is definitely a big problem on deployments. If you have any other hassles like this, open an issue!

I sometimes have a bad feeling for sending in so many issues, but as you say, it's only for the better of the product as a whole and points out aspects one likely doesn't see in dev-only environments. So yes, I will definitely report anything coming up sooner or later, I'm just buried in other stuff right now so I am somewhat slow at catching up.

We use config_split for a dev/prod Meilisearch instance, and this works out mostly as it should. There might be some issues in this direction once I get around to fully testing the setup based on D10.1 and 1.0.x-dev.
Comment almost 2 years ago →
🇦🇹Austria tgoeg
I add this here and in 📌 New release Fixed as this is important.
An upgrade to code implementing this fix here means not only do the indexes have to be rebuilt, they have to be deleted and created anew (easiest by just drush cim after deleting the index as all views etc. also get lost when deleting the index).

I got all sorts of strange connection timeout issues and errors in the log that stated I have a duplicate column "id" (which, in turn, is caused by 🐛 Remove the possibility to add field with machine name id Fixed , but it is correct like this!)

This should be part of the upgrade instructions/release notes on the front page.
Comment almost 2 years ago →
🇸🇮Slovenia bcizej
Hmm thanks for reporting this, this should be a new issue, we should handle this somehow via update hooks, the users should not be required to delete indexes.
Comment almost 2 years ago →
System Message
Automatically closed - issue fixed for 2 weeks with no activity.

Remove entity id mapping config

Problem/Motivation

Remaining tasks

Proposed resolution

User interface changes

API changes

Data model changes

Comments & Activities