Document doesn't have a `search_api_document_id` attribute after upgrading 1.0.0 to 1.1.0-rc1

Created on 2 October 2023
Updated 26 November 2023

Problem/Motivation

After upgrading an existing 1.0.0 production instance to 1.1.0-rc1, I get the following error:

Document doesn't have a `search_api_document_id` attribute: `@"search_api_id":"entity:node/100:de","search_api_datasource":"entity:node","search_api_language":"de","id":"entity_node_100_de","node_uri":"/theurl","media_attachment_description":"the description","media_attachment_extract":"5000 chars of extracted PDF text ...","media_body":"

engelgwand mim um godds wujn gaudi wea nia ausgähd kummt nia hoam um godds wujn jo mei fingahaggln zua ham mongdratzal watschnbaam hoid oa schbozal woibbadinga ghupft wia gsprunga xaver sauba ned woar landla des milli dahoam biaschlegl spotzerl hendl semmlkneedl da oans zwoa gsuffa ramasuri wo hi am achtn tag schuf gott des bia eana in da naa woaß schoo de musi guad heimatland hallelujah sog i luja gfreit mi gidarn und sei ham guad a so a schmarn af servas wea ko dea ko noch da giasinga heiwog hi mi!
\r\n","media_category_name":["Category 1 (e.g. some, things, here)","Cat 2"],"media_role_name":"group1","media_title":"node title"`.

If I restore src/Plugin/search_api/backend/SearchApiMeiliSearchBackend.php to commit a87ff3fd7f0201a68caa97d91b65fd5c43ef0e64, it works, at least for this site (and it is an important one). This was just an experiment, since older versions had worked before; nothing really substantiated.

On another site, I get a very similar error (with the reverted file above):

Document doesn't have a `id` attribute: `@"search_api_id":"entity:node/342:de","search_api_datasource":"entity:node","search_api_language":"de","search_api_document_id":"entity_node_342_de","node_uri":"/material/mymaterial","field_age_group":["22","23"],"media_body":"some random \"text\"","media_category_name":["this & that","cat(egory) two"],"media_organisation":"my organisation","media_priority":50,"media_role_name":["kids: young & older","other people"],"media_title":"my title"`.

However, it happens only for two documents, so this seems to be something content-specific.

On both sites, I also see
Drupal\search_api\SearchApiException while adding Views handlers for field on index mydomain Node Index: Could not retrieve data definition for field '' on index 'mydomain Node Index'. in Drupal\search_api\Item\Field->getDataDefinition() (line 482 of /var/www/drupal/web/modules/contrib/search_api/src/Item/Field.php).
but this may have been logged before I ran the updb step; I'm just noting it here for reference, as I don't seem to see it anymore.

I'm recording this here mostly for documentation purposes, as I plan to move to 2.0.0 anyway and hope that it fixes this (though the content-specific nature of the problem makes me a little wary). However, since I can't reproduce it even on my dev setup, upgrading the production environment without knowing beforehand what will break is hard and risky, so this might still take a while (and some bravery!).
I'll try a 100% clone of the production DB and see whether that makes it easier to check things beforehand.

On the other hand, maybe you have a quick tip on where I should look so I can hotfix this easily.

The strange thing is that I cannot find the source of the error Document doesn't have a `<id-column>` attribute. Either I am searching in the wrong places (the search_api and search_api_meilisearch sources) or the string is concatenated from so many partial strings that I am unable to find it.
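The wording of this message appears to come from the Meilisearch engine itself (its missing_document_id error) rather than from Drupal code, which would explain why searching the PHP sources finds nothing; treat that as an assumption to confirm against the Meilisearch error reference. The check behind it can be sketched as follows: every document sent to an index must contain the index's primary-key attribute. The helper findDocumentsMissingPrimaryKey() is hypothetical and exists only to illustrate that rule, for example to find which documents trip it.

```php
<?php

/**
 * Hypothetical helper: returns the keys of documents that lack the given
 * primary-key attribute, mimicking (as an assumption) the check Meilisearch
 * performs before it accepts a batch of documents.
 *
 * @param string $primaryKey
 *   The primary key configured on the Meilisearch index, e.g. "id".
 * @param array<int|string, array<string, mixed>> $documents
 *   The documents about to be sent, keyed by any identifier.
 *
 * @return list<int|string>
 *   Keys of the documents that do not contain the primary-key attribute.
 */
function findDocumentsMissingPrimaryKey(string $primaryKey, array $documents): array {
  $missing = [];
  foreach ($documents as $key => $document) {
    // Only the presence of the attribute is checked, not whether its value
    // is a valid document ID.
    if (!array_key_exists($primaryKey, $document)) {
      $missing[] = $key;
    }
  }
  return $missing;
}

// Example: an index whose primary key is still search_api_document_id receives
// a document that only carries "id", as in the first error message above.
$documents = [
  'entity:node/100:de' => ['id' => 'entity_node_100_de', 'search_api_language' => 'de'],
];
print_r(findDocumentsMissingPrimaryKey('search_api_document_id', $documents));
```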

Steps to reproduce

Hard to say.
I would have said: take an existing 1.0.0 setup and upgrade it to 1.1.0-rc1. But even removing the index and reimporting the config on a 1.1.0-rc1 installation does not fix it, so this seems to be something content-specific I need to narrow down. However, the fact that reverting that one source file fixed it on one site smells like a bug.
I know all of this is much too little information for anyone to actually work on, but bear with me. I hope to provide more concrete findings. Or maybe you have a hunch that could get me to a solution faster.

Thanks for reading anyway!

🐛 Bug report
Status

Closed: outdated

Version

1.1

Component

Code

Created by

🇦🇹 Austria tgoeg


Comments & Activities

  • Issue created by @tgoeg
  • Hmm, this is very weird: it looks as if you were updating from the dev version of 🐛 Remove the possibility to add field with machine name id (Fixed), but this functionality was reworked/removed before 1.1.0-rc1. That issue is the only instance of search_api_document_id I know of.

  • 🇦🇹 Austria tgoeg

    Did a full DB and site files clone to a dev instance.

    Restored to 1.1.0-rc1.
    The second site works flawlessly.

    The first one breaks again with the error from the title, Document doesn't have a `search_api_document_id` attribute.

    If I index with the older SearchApiMeiliSearchBackend.php and then switch to the current version, I at least get a setup that somehow works for both sites. However, I guess indexing new (special?) nodes will not work in this setup.

    Something's off.

    The error
    Drupal\search_api\SearchApiException while adding Views handlers for field on index mydomain Node Index: Could not retrieve data definition for field '' on index 'mydomain Node Index'. in Drupal\search_api\Item\Field->getDataDefinition() (line 482 of /var/www/drupal/web/modules/contrib/search_api/src/Item/Field.php).
    is also back again.

    At this point I'm pretty convinced that this is indeed a bug and not a configuration/environment problem on my end.

    Finally found it!

    Due to some previous version, the first site's index had search_api_document_id as its primary key.
    I dropped all items from the index and changed the primary key directly in Meilisearch with
    curl -X PATCH 'http://localhost:7700/indexes/myindex' -H 'Content-Type: application/json' --data-binary '{ "primaryKey": "id" }' -H "Authorization: Bearer mypass"

    I am unsure whether this warrants adding some error handling around createField() in src/Plugin/search_api/backend/SearchApiMeiliSearchBackend.php.
    This seems to be the cause of the subsequent error mentioning field '', i.e. a field that never got created.

    protected function getSpecialFields(IndexInterface $index, ItemInterface $item = NULL): array {
      $fields = parent::getSpecialFields($index, $item);
      // Always add an "id" field so every document carries the Meilisearch primary key.
      $fields['id'] = $this->getFieldsHelper()->createField($index, 'id', [
        'type' => 'string',
        'original type' => 'string',
      ]);

      if ($item) {
        // Convert the Search API item ID (e.g. entity:node/100:de) into a document ID (e.g. entity_node_100_de).
        $fields['id']->setValues([MeilisearchUtils::formatAsDocumentId($item->getId())]);
      }

      return $fields;
    }

    createField() comes from Search API's src/Utility/FieldsHelper.php, which has no error handling either. It in turn calls new Field(), which I was unable to trace further.

    Why exactly I got the error only for two nodes is still a mystery to me.

    I'm unsure what to do with this ticket now; this will most likely never hit anyone else.
    Please decide whether some of the errors presented here deserve better handling and/or tests.
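One possible shape for such error handling, sketched under assumptions: before sending documents, compare the primary key reported by the Meilisearch index with the field the backend writes (id, per getSpecialFields() above) and fail with a message that names both. PrimaryKeyMismatchException and assertPrimaryKeyMatches() are hypothetical names; inside the real backend this would presumably throw Search API's SearchApiException rather than a plain RuntimeException subclass.

```php
<?php

/**
 * Hypothetical exception for a primary-key mismatch between the Meilisearch
 * index and the field the backend writes.
 */
final class PrimaryKeyMismatchException extends RuntimeException {}

/**
 * Hypothetical guard: throws if the index's primary key is not the expected one.
 *
 * @param string $indexUid
 *   The Meilisearch index UID, used only in the error message.
 * @param string|null $actualPrimaryKey
 *   The primaryKey reported by Meilisearch; NULL if none is set yet.
 * @param string $expectedPrimaryKey
 *   The field the backend puts into every document (assumed to be "id").
 */
function assertPrimaryKeyMatches(string $indexUid, ?string $actualPrimaryKey, string $expectedPrimaryKey = 'id'): void {
  // Assumption: an index without a primary key adopts one on the first
  // import, so only an already-set, conflicting key is treated as an error.
  if ($actualPrimaryKey !== NULL && $actualPrimaryKey !== $expectedPrimaryKey) {
    throw new PrimaryKeyMismatchException(sprintf(
      'Meilisearch index "%s" uses primary key "%s", but documents are indexed with "%s". Clear the index and change its primary key, or recreate the index.',
      $indexUid,
      $actualPrimaryKey,
      $expectedPrimaryKey
    ));
  }
}

// Usage: the situation from this issue, where the index still used the old key.
try {
  assertPrimaryKeyMatches('myindex', 'search_api_document_id');
}
catch (PrimaryKeyMismatchException $e) {
  echo $e->getMessage(), PHP_EOL;
}
```

Failing early with both key names in the message would point admins directly at the fix described above (emptying the index and patching primaryKey), instead of surfacing only the raw Meilisearch error.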

  • 🇦🇹 Austria tgoeg

    Our posts overlapped. Yes, your diagnosis is absolutely correct!

  • I don't think this issue needs extra handling, since it happened between dev versions of the module, and those are usually not recommended for production use. I'll leave the issue open so the maintainers can decide what should be done. However, one thing I do think should probably change is to explicitly define which field should be used as the ID when the Meilisearch index is created, so Meilisearch doesn't have to choose the field automatically.

  • 🇦🇹 Austria tgoeg

    Didn't I fix this already in 🐛 Primary key inference fails for Meilisearch >=1.x and fields ending in "id" (includes patch) (Fixed)? That is part of the current 1.1.0 release at least (and I guess of 2.0 as well, though I haven't had time to check yet). It's hard-coded to id.

    Yes, handling the problem at hand directly does not really make sense.
    I just think the error handling is not ideal and could be improved.

    I can imagine situations (this came up in another issue as well) where an existing index gets attached to a setup running this module. If the primary key differs, this could easily trigger the error described here, and I think that should be caught cleanly.

  • Ahh yeah, I missed that. I was looking at doing this at index creation: in the public function addIndex(IndexInterface $index) method, adding primaryKey to the options is another way to set the primary key.

    Yeah, the error handling could be improved. It would be much easier if Meilisearch had some way to check what was set as the ID; then a simple fix would be to use whatever ID the index already defines. I have to look more into this.
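A minimal sketch of the addIndex() idea, assuming the backend creates indexes through the meilisearch-php client, whose createIndex() method takes the index UID and an options array that may contain primaryKey. To keep the example runnable without the SDK, the client is hidden behind an interface; IndexCreator, createIndexWithPrimaryKey() and RecordingIndexCreator are hypothetical names for illustration only.

```php
<?php

/**
 * Hypothetical seam over the Meilisearch client so the sketch runs without
 * the SDK; a real implementation could forward to
 * Meilisearch\Client::createIndex($uid, $options).
 */
interface IndexCreator {
  public function createIndex(string $uid, array $options = []): void;
}

/**
 * Creates the index with an explicit primary key instead of letting
 * Meilisearch infer one from the first batch of documents.
 */
function createIndexWithPrimaryKey(IndexCreator $client, string $uid, string $primaryKey = 'id'): void {
  $client->createIndex($uid, ['primaryKey' => $primaryKey]);
}

/**
 * Test double that records every createIndex() call it receives.
 */
final class RecordingIndexCreator implements IndexCreator {
  /** @var list<array{0: string, 1: array}> */
  public array $calls = [];

  public function createIndex(string $uid, array $options = []): void {
    $this->calls[] = [$uid, $options];
  }
}

$client = new RecordingIndexCreator();
createIndexWithPrimaryKey($client, 'mydomain');
// $client->calls now holds [['mydomain', ['primaryKey' => 'id']]].
```

Setting the key at creation time removes the inference step entirely, although it does not help with indexes that already exist with a different key.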

  • 🇦🇹 Austria tgoeg

    It does have a way!
    If you request the index's details with
    xidel -s -e '$json' -H 'Content-Type: application/json' -H "Authorization: Bearer mypass" 'http://127.0.0.1:7700/indexes/mydomain'
    (you could just as well use curl; I use xidel because it understands and formats JSON and XML, supports XQuery/XPath, etc.)
    you get some fields back, one of them being the primary key:

    {
      "uid": "mydomain",
      "createdAt": "2023-07-12T13:48:59.640002336Z",
      "updatedAt": "2023-08-24T12:03:38.907236139Z",
      "primaryKey": "id"
    }

    However, I am not sure whether meilisearch_php exposes that, though judging from https://php-sdk.meilisearch.com/classes/Meilisearch-Endpoints-Indexes.html it seems to.
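Reading the primary key out of that response is straightforward. The sketch below decodes the JSON body shown above with json_decode() and returns primaryKey, or NULL when the field is missing or null (the assumed state of an index that has no primary key yet). The function name primaryKeyFromIndexDetails() is hypothetical; with meilisearch-php one would presumably use the SDK's index object instead of raw JSON.

```php
<?php

/**
 * Hypothetical helper: extracts the primary key from the JSON body returned
 * by GET /indexes/{uid}.
 *
 * @return string|null
 *   The primary key, or NULL if the index does not report one.
 *
 * @throws JsonException
 *   If the response body is not valid JSON.
 */
function primaryKeyFromIndexDetails(string $json): ?string {
  $details = json_decode($json, TRUE, 512, JSON_THROW_ON_ERROR);
  $primaryKey = $details['primaryKey'] ?? NULL;
  return is_string($primaryKey) ? $primaryKey : NULL;
}

// Response body copied from the xidel call above.
$response = <<<'JSON'
{
  "uid": "mydomain",
  "createdAt": "2023-07-12T13:48:59.640002336Z",
  "updatedAt": "2023-08-24T12:03:38.907236139Z",
  "primaryKey": "id"
}
JSON;

echo primaryKeyFromIndexDetails($response), PHP_EOL; // prints "id"
```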

  • Looking at how Search API works, this becomes a bit more complex: we'd have to create a custom datasource where the admin could then map a custom ID. To be honest, we'll eventually have to add datasources as a new feature anyway, especially if we want to support data created by services outside Drupal.

  • 🇦🇹 Austria tgoeg

    I'd say we should restrict the scope here to whether error handling makes sense, regardless of how/where errors originate, to potentially give a better clue for admins/devs if they occur.

    Supporting different datasources sounds like a useful addition with its own applications, but it should probably be created as a separate feature.

  • The problem comes down to where to do this error handling. Looking at that specific error, ... Could not retrieve data definition for field '' on index ..., this happens in a Views handler when it tries to map a field from Meilisearch to Drupal storage data, not in any code we wrote. The weird part is where it gets that field with an empty key; I would have understood if the error had happened on 'id' or 'search_api_document_id'.

    We switched back to using id so existing indexes still work correctly. This could only happen when using the development branch while code changed between versions, when someone modifies the index by hand (cURL, xidel, etc.), or through some third-party service. In that last case, the datasource feature would help admins map fields correctly.

  • So the Could not retrieve data definition error is potentially caused by the Search API module itself: 🐛 PHP 8.1 Deprecated Function (Needs work).

  • Status changed to Closed: outdated 12 months ago
  • 🇸🇮 Slovenia bcizej

    @admirlju Do we still need to do anything here? As you mentioned, this issue would only occur if reindexing happened on the dev branch in between the issues we were fixing regarding the document ID, and @tgoeg probably reindexed in between those issues. If I remember correctly, we added a BC (backward compatibility) change which had to be reverted.

    I'm closing this issue for now, feel free to reopen if I missed something.
