- Issue created by @bcizej
- 🇸🇮Slovenia bcizej
After spending a few hours writing this comment describing various possible solutions each with pros and cons I came up with a solution that I think fits the best with minimal cons.
The criteria for finding/using the best solution was:
- Can be mapped both ways without storage (obviously)
- Handles any datasource search api item id formats
- Human readability or should I say easy recognition of the item in Drupal from the meilisearch document idThe solutions I thought of were a variations of character replacements to fit the meilisearch primary key character restrictions, base64 encoding/decoding or using entity uuids neither of which satisfied all of the above.
So the solution I'm proposing is...
Generate uuid for every new document, add a uuid attribute to the document and set that as the document primary key in meilisearch index.
Then set a second attribute which allows any characters to store the search api item id as it is and set the attribute as distinct.Pros:
- Uuid contains acceptable characters for primary key
- Uuid is unique by itself and the second attribute can be forced as distinct in meilisearch
- Requires no mapping logic
- Second attribute has the same value as search api item id making it readable/recognizableCons:
- Additional attribute (not that much of a problem)
- The index must insure that the attribute is displayable otherwise no mapping will occur on search as the attribute would be excluded from results. By default all fields are displayable so should not be a problem.
- The additional attribute should not be searchable I guess but not a big issue
- Would need a unique name and we would probably need some mechanism so that users can't choose this name for any fields they add to the index via UI or codeNow every solutions would require a reindex of every document either manually or via update hook. There are not that many sites reporting usage of the module but I do know @tgoeg - who has been very helpful with testing, reporting issues etc., thanks :) - uses it in production, so I'd like to know if reindexing everything would present an issue.
I'll do some code fiddling and see how this works but if anyone has another or better solution feel free to contribute.
Why not just use hyphens as "escape" characters? So looking at Drupal side, the ID is made from
<machine name>:<machine name>/<numeric id>:<language code>
.From my understanding machine names can't contain hyphens. The only problem I see is with language codes since some of those can (for example Chinese Simplified zh-hans), that said we could use double hyphens.
So let's say we use something like this as encoding
--fs-- == /
--co-- == :
entity--co--node--fs--2--co--en
entity--co--node--fs--2--co--zh-hans
Since you mentioned using uuid as the primary key on meilisearch, I assume the character limit will not be a problem. While the readability is reduced, it's still easy to quickly figure out what is indexed and its id. When we need to convert one or the other way it's a simple
str_replace()
- 🇸🇮Slovenia bcizej
After playing around with the code I hit a wall with updating and deleting items which was the main concern as the ids no longer match and we would have to get all the documents first from meilisearch and match the additional attribute with the search api item id basically trading id map storage for extra api calls.
So I tried a similar approach with how we currently convert search api item ids but expand on it with extra field and account for other invalid characters.
In the
ItemConverter
class we have the following code to strip invalid characters$itemId = str_replace([':', '/'], '_', $item->getId()); $convertedItem = ['id' => $itemId];
And similar in the `SearchApiMeiliSearchBackend::deleteItems()` method
$ids = []; foreach ($item_ids as $id) { $itemId = str_replace([':', '/'], '_', $id); $ids[] = $itemId; } $this->meiliService->deleteDocuments($index->id(), $ids);
Replace
str_replace([':', '/'], '_', $item->getId())
withpreg_replace('/[^A-Za-z0-9_-]/', '_', $item->getId())
so that any invalid character in search api item id is replaced with "_" which for the `ContentEntity` datasource nothing changes so the document ids stay the same as before. And for any other datasources we also get a valid format for document primary key.Add a second "search_api_item_id" attribute to the document with original search api item id and make it distinct.
Then for search results we can use this code to create search api item id from the meilisearch document. I think the search results is the only place where we need "document id -> search api item id" mapping.
$item = $this->getFieldsHelper()->createItem($index, $row['search_api_item_id']); $results->addResultItem($item);
And then we remove the code and schema for
meilisearch_id_mappings
configuration. - 🇸🇮Slovenia bcizej
Why not just use hyphens as "escape" characters? So looking at Drupal side, the ID is made from :/:.
I had considered something like this, the problem here is that this would work only for
ContentEntity
datasource plugin, while there could be potentially other datasources that do not follow this format. - Assigned to bcizej
- last update
over 1 year ago 101 pass - @bcizej opened merge request.
- Issue was unassigned.
- Status changed to Needs review
over 1 year ago 8:59pm 1 September 2023 - 🇸🇮Slovenia bcizej
Implemented the solution and tests, although I did not just add the notice to update hook but schedule all meilisearch indexes for a full reindex.
- last update
over 1 year ago 103 pass - 🇸🇮Slovenia bcizej
Added a few tests if the fields in index use reserved names for mapping and saw it would break searching so added a fix as well. 🐛 Remove the possibility to add field with machine name id Fixed will prevent naming fields as reserved words.
- last update
over 1 year ago 103 pass - 🇸🇮Slovenia bcizej
Search api already defines special
search_api_id
field as item id as well assearch_api_datasource
andsearch_api_language
so I changed the code a bit to reuse those fields and also index them as they are present by default on all indexes and can be also be put in views as displayed fields, filterable, sortable etc. - last update
over 1 year ago 103 pass - last update
over 1 year ago 103 pass The changes worked fine, but there was a small problem in
search_api_meilisearch_update_9002
, if there was an index that did not have a server set, it would attempt to get the backend on NULL (line 88)$backend = $index->getServerInstance()->getBackend();
. Pushed a quick fix, that will ignore indexes without a server.- Status changed to RTBC
over 1 year ago 2:01pm 4 September 2023 - 🇸🇮Slovenia deaom
Can confirm the changes are working, tests are also passing, so marking this as RTBC.
- 🇦🇹Austria tgoeg
@tgoeg - who has been very helpful with testing, reporting issues etc., thanks :)
Thank you for the compliments, much appreciated :-)
I try to be of help where I can as my programming skills are pretty, well, non-existant in PHP. And I really like Meilisearch because of its simplicity from a sysadmin's PoV (and not being Java!), so I'd like Drupal to build upon this potential.
We have a few sites using Meilisearch in production, yes.
But re-indexing all of them is no big deal. It takes around 10mins on the biggest site, which is perfectly OK for our normal maintenance windows for Drupal updates.(I'd even trade in more inconvenience to get rid of that cumbersome configuration files! It makes rolling out dev config to production a pretty dangerous step if you don't exclude these files. And export production config before. And then merge the new dev config with the old entity id mapping of prod. This is content-dependent configuration which always makes my life hard.)
I'll see whether I can test this.
I have another problem with search_api_meilisearch together with file_extractor at the moment (with Drupal 10.1) however, which kinda stopped me in my tracks when I tried to upgrade our infrastructure to more recent versions of search_api_meilisearch and Drupal core.I'll open a separate issue for this - if anyone could be kind enough to take a look at that I am sure I can provide real world tests for most of the recently closed issues!
- last update
over 1 year ago 103 pass - 🇸🇮Slovenia bcizej
(I'd even trade in more inconvenience to get rid of that cumbersome configuration files! It makes rolling out dev config to production a pretty dangerous step if you don't exclude these files. And export production config before. And then merge the new dev config with the old entity id mapping of prod. This is content-dependent configuration which always makes my life hard.)
Ouch, I forgot about this completely. That is definitely a big problem on deployments. If you have any other hassles like this, open an issue! While we are trying to get the module into a stable production ready state we are limited in thinking of issues that potentially exist. We are aware of issues in the code or functionality, but stuff like fiddling with configs across deployments didn't cross my mind as I am running the module on local environment only with a clean drupal installation.
- Status changed to Fixed
over 1 year ago 8:30pm 5 September 2023 - 🇦🇹Austria tgoeg
That is definitely a big problem on deployments. If you have any other hassles like this, open an issue!
I sometimes have a bad feeling for sending in so many issues, but as you say, it's only for the better of the product as a whole and points out aspects one likely doesn't see in dev-only environments. So yes, I will definitely report anything coming up sooner or later, I'm just buried in other stuff right now so I am somewhat slow at catching up.
We use
config_split
for a dev/prod Meilisearch instance, and this works out mostly as it should. There might be some issues in this direction once I get around to fully testing the setup based on D10.1 and 1.0.x-dev. - 🇦🇹Austria tgoeg
I add this here and in 📌 New release Fixed as this is important.
An upgrade to code implementing this fix here means not only do the indexes have to be rebuilt, they have to be deleted and created anew (easiest by justdrush cim
after deleting the index as all views etc. also get lost when deleting the index).I got all sorts of strange connection timeout issues and errors in the log that stated I have a duplicate column "id" (which, in turn, is caused by 🐛 Remove the possibility to add field with machine name id Fixed , but it is correct like this!)
This should be part of the upgrade instructions/release notes on the front page.
- 🇸🇮Slovenia bcizej
Hmm thanks for reporting this, this should be a new issue, we should handle this somehow via update hooks, the users should not be required to delete indexes.
Automatically closed - issue fixed for 2 weeks with no activity.