Mismatched language saved in Solr

Created on 22 May 2023, over 1 year ago
Updated 7 February 2024, 10 months ago

Problem/Motivation

On a complex site with 6 languages and a lot of nodes, the English document in Solr sometimes contains a different translation after a node is saved. After saving the node again, the translation may change to yet another language (not necessarily English). After a couple of saves it changes back to English and stays that way.
In Solr the document looks like:
{
"id":"62z07a-default-entity:node/8:en",
....
"ss_search_api_language":"es"
}

I don't think it is a Solr issue, because the translations are already scrambled when the index items are generated.
I have even seen an Item object with language English that contains an EntityAdapter with a different language, which in turn holds a translation matching neither the item's language nor the EntityAdapter's.
The problem is that this causes omissions and duplicates in the results returned by Solr: the content of the Spanish node exists in 2 documents in Solr (ids "62z07a-default-entity:node/8:en" and "62z07a-default-entity:node/8:es") while the English content is missing.
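
To check how many documents in an index are affected, the mismatch described above can be detected by comparing the langcode at the end of the document id with the ss_search_api_language field. The following is only a minimal sketch, not part of Search API or Search API Solr: it assumes the documents have already been fetched from Solr as plain dictionaries and that the langcode is whatever follows the last colon in the id, as in the example above. The function names id_langcode and find_mismatches are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def id_langcode(item_id: str) -> str:
    """Return the part after the last colon of an id like '...entity:node/8:en'."""
    return item_id.rsplit(":", 1)[-1]


def find_mismatches(docs: Iterable[Dict[str, str]]) -> List[Tuple[str, str, str]]:
    """List (id, langcode from id, indexed langcode) for every mismatched document.

    Documents without an 'ss_search_api_language' field are skipped.
    """
    mismatches = []
    for doc in docs:
        indexed = doc.get("ss_search_api_language")
        if indexed is None:
            continue
        expected = id_langcode(doc["id"])
        if expected != indexed:
            mismatches.append((doc["id"], expected, indexed))
    return mismatches


# Sample data modelled on the two documents mentioned in this report.
docs = [
    {"id": "62z07a-default-entity:node/8:en", "ss_search_api_language": "es"},
    {"id": "62z07a-default-entity:node/8:es", "ss_search_api_language": "es"},
]

for doc_id, expected, indexed in find_mismatches(docs):
    print(f"{doc_id}: id says {expected!r}, document says {indexed!r}")
```

Running this against an export of the index before and after a fix gives a rough count of affected items; it does not say anything about where the wrong translation comes from.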

Did anyone else encounter this or have an idea where to look for the source of the issue?

I guess it happens when creating the index items, more specifically in createItemFromObject in the FieldsHelper, but that is as far as I got.

πŸ› Bug report
Status

Needs work

Version

1.0

Component

General code

Created by

🇷🇴 Romania bogdan.dinu

  • Needs tests

    The change is currently missing an automated test that fails when run with the original code, and succeeds when the bug has been fixed.

Comments & Activities

  • Issue created by @bogdan.dinu
  • Status changed to Needs review over 1 year ago
  • Test run (last update over 1 year ago): Core: 9.5.x + Environment: PHP 8.1 & sqlite-3.27, 538 pass
  • 🇦🇹 Austria drunken monkey Vienna, Austria

    Oh, yes, that sounds quite bad. But no, I have never heard of something like that before.
    For the sake of debugging, could you check whether the attached patch resolves the issue for you? That would be more or less my only guess as to what could be happening: some code somewhere manipulating the entity/translation in question to point to a different language. If the patch helps, you might then be able to find the code responsible.

    (As said, the patch is just for debugging purposes. I don’t think I’d commit it as-is, as it would decrease performance for everyone else. However, if it does help but you can’t figure out the root cause of the problem, feel free to just keep it applied to the module on your site: better to have a little less performance during indexing/searching but correct search results.)

  • 🇷🇴 Romania bogdan.dinu

    Hi! Thank you so much for the help! The patch works and all mismatched documents are gone.
    I already had something similar implemented in the getOriginalObject method.
    As you said, better to have a small performance decrease and no duplicates/omissions in search results :)

    I'll try to do some debugging to get to the root cause of the issue.

  • 🇨🇦 Canada joseph.olstad

    This patch appears to be a safe change to make.

  • Status changed to Needs work about 1 year ago
  • 🇦🇹 Austria drunken monkey Vienna, Austria

    I don’t think I will commit the patch in this form, for the reasons stated in #2. I don’t want to make searches slower for lots of people for whom the current code works.

  • 🇧🇪 Belgium swentel

    I'm suffering from the same problem on a site. The setup is this: two languages, negotiation happens on domain (not sure whether that contributes to the problem or not).

    At first I thought the patch in 🐛 Rendered HTML Output doesnt respect activeLanguage completely (Needs work) solved the problem, but it didn't entirely. So I started looking into the index and saw that the ss_search_api_language value didn't necessarily match the language in the Search API ID.

    With this patch, things started acting as they should: the languages in both values match, and items only show up on the site in the right language (before, some of the content appeared twice in the same view, e.g. in Dutch, because the language contained nl).

    As far as I can see, the following might be the problem: whether or not the original language of the entity matches the default language of the site. The entities that had the problem had their original/initial language set to English, whereas the default language of the site is Dutch.

    It kind of relates to 🐛 Rendered HTML Output doesnt respect activeLanguage completely (Needs work) in that the entity which is being rendered/indexed isn't necessarily, well, 'right' .. :)

  • 🇦🇹 Austria drunken monkey Vienna, Austria

    @swentel:

    As far as I can see, the following might be the problem: whether or not the original language of the entity matches the default language of the site. The entities that had the problem had their original/initial language set to English, whereas the default language of the site is Dutch.

    For those items, do you get a Dutch item ID (entity:node/…:nl) with "ss_search_api_language":"en" or the other way round?

    Also, can you find out any less “costly” way of acquiring the correct entity adapter than cloning the entity, in that place in the code?
    Or, for you, does this maybe look like a Core issue that needs to be reported and resolved there?

    Maybe something like the attached works, where I at least check to make sure we need the clone fix before cloning?

    Finally, is someone able to provide a failing test case for this?

  • 🇧🇪 Belgium swentel

    For those items, do you get a Dutch item ID (entity:node/…:nl) with "ss_search_api_language":"en" or the other way round?

    What happened is that an entity which has English as its default langcode, on a site whose default language is Dutch, was indexed with the Dutch language. So, the other way around.

    I'll try to test the patch in the coming days.
