Reduce multilingual duplicate results via grouping or collapse?

Created on 17 September 2024
Updated on 19 September 2024

Setup

  • Solr version: 9.5.0
  • Drupal Core version: 10.3.5
  • Search API version: 8.x-1.35
  • Search API Solr version: 4.3.5
  • Configured Solr Connector: Standard

Issue

I have multilingual content indexed into Solr, which creates one Solr document per node and language, keyed by node ID and language code (e.g. 22:en, 22:fr). Search results are returned by a view, and by default the view returns a duplicate result for each translation of a node. I don't want to apply a Views language filter, as that could exclude matches in languages other than the site default.

It looks like the solution is to apply Solr grouping, as discussed in https://www.drupal.org/project/search_api_solr/issues/2900410. Since hook_search_api_solr_query_alter() is deprecated, I adapted the solution in https://www.drupal.org/project/search_api_solr/issues/2900410#comment-12... into a queryPreExecute() implementation:

$query->setOption('search_api_grouping', [
  'use_grouping' => TRUE,
  'fields' => ['nid_string'],
  'group.ngroups' => TRUE,
]);

(nid_string is used because Solr rejected the default its_nid field.)

This reduced the results to a single instance of each node, but the facet counts still reflected the ungrouped results.
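To see why the counts can disagree, here is a simplified in-memory model in plain PHP (not Solr or Search API code). It assumes Solr's default grouping behaviour: groups and group.ngroups are built per node, but field facets are still counted over every matching translation. The documents, the `type` facet field and the helper names are hypothetical. The Solr grouping guide also documents `group.facet` and `group.truncate` for grouped facet counts; whether the `search_api_grouping` option passes them through is not confirmed here.

```php
<?php
// Simplified, hypothetical model of Solr result grouping (not Solr code).
// Each Solr document is one translation of a node, keyed "nid:langcode".
$docs = [
    ['id' => '22:en', 'nid' => '22', 'type' => 'article', 'score' => 2.0],
    ['id' => '22:fr', 'nid' => '22', 'type' => 'article', 'score' => 1.5],
    ['id' => '23:en', 'nid' => '23', 'type' => 'page', 'score' => 1.0],
];

/** Groups matching documents by node ID, as a group field would. */
function groupByNid(array $docs): array {
    $groups = [];
    foreach ($docs as $doc) {
        $groups[$doc['nid']][] = $doc;
    }
    return $groups;
}

/** Counts facet values over whatever document set it is given. */
function facetCounts(array $docs, string $field): array {
    $counts = [];
    foreach ($docs as $doc) {
        $counts[$doc[$field]] = ($counts[$doc[$field]] ?? 0) + 1;
    }
    return $counts;
}

$groups = groupByNid($docs);
// Equivalent of group.ngroups: the number of distinct nodes that matched.
$ngroups = count($groups);
// Default grouping: facets are still computed over every matching translation.
$facets = facetCounts($docs, 'type');

printf("ngroups=%d, article=%d, page=%d\n", $ngroups, $facets['article'], $facets['page']);
```

In this model the results collapse to 2 groups while the `article` facet still counts 2, the same kind of mismatch described above.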

Since https://solr.apache.org/guide/solr/latest/query-guide/result-grouping.html says "Prefer Collapse & Expand instead", I made a second attempt following https://solr.apache.org/guide/solr/latest/query-guide/collapse-and-expan..., using a PreQuery event subscriber:

$solariumQuery = $event->getSolariumQuery();
$solariumQuery->addParam('fq', '{!collapse field=ss_nid_string}');

This restricts the results to a single instance of each node, and the facet counts are correct.
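A similar simplified model of the collapse filter, in plain PHP (not Solr code): used as a filter query, `{!collapse field=...}` keeps one document per field value (by default the highest-scoring one), and because it narrows the document set before faceting, the facet counts describe the collapsed results. The documents, the `type` facet field and the helper names are hypothetical.

```php
<?php
// Simplified, hypothetical model of fq={!collapse field=ss_nid_string} (not Solr code).
$docs = [
    ['id' => '22:en', 'nid' => '22', 'type' => 'article', 'score' => 2.0],
    ['id' => '22:fr', 'nid' => '22', 'type' => 'article', 'score' => 1.5],
    ['id' => '23:en', 'nid' => '23', 'type' => 'page', 'score' => 1.0],
];

/** Keeps the highest-scoring document per node ID (collapse's default group head). */
function collapseByNid(array $docs): array {
    $heads = [];
    foreach ($docs as $doc) {
        $nid = $doc['nid'];
        if (!isset($heads[$nid]) || $doc['score'] > $heads[$nid]['score']) {
            $heads[$nid] = $doc;
        }
    }
    return array_values($heads);
}

/** Counts facet values over the given document set. */
function facetCounts(array $docs, string $field): array {
    $counts = [];
    foreach ($docs as $doc) {
        $counts[$doc[$field]] = ($counts[$doc[$field]] ?? 0) + 1;
    }
    return $counts;
}

// The collapse acts as a filter, so faceting only sees the surviving group heads.
$collapsed = collapseByNid($docs);
$facets = facetCounts($collapsed, 'type');

echo implode(', ', array_column($collapsed, 'id')), PHP_EOL;
```

The design point the model captures is ordering: collapsing happens before faceting, whereas default grouping builds groups alongside facets computed on the full match set.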

Did I miss something in the first attempt? The second approach works, but is it the better one? If so, how could Search API Solr support it more directly (e.g. should Search API offer built-in support for Solr collapse)?

💬 Support request

  • Status: Active
  • Version: 4.3
  • Component: Code
  • Created by: jonathan_hunt (🇳🇿 New Zealand)


Comments & Activities

  • Issue created by @jonathan_hunt
  • mkalkbrenner (🇩🇪 Germany)

    I think this is the wrong approach. I know many multilingual sites that don't have all of their content available in every language, and none of them needs custom code like that.

    Drupal itself has a language fallback system: it displays the default translation if the requested translation is not available.
    Search API and Search API Solr follow the same path.

    Just add the general "Language (with fallback)" field to your index and filter the view using that field.
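A rough plain-PHP model (not Search API code) of how a "Language (with fallback)" value lets a language filter return exactly one translation per node: each indexed translation carries its own language plus every site language that has no translation of its own and therefore falls back to it. This sketch assumes the fallback is always the node's default translation; the real field relies on Drupal's language fallback system, which may be configured differently. The language codes and helper names are hypothetical.

```php
<?php
// Hypothetical model of "Language (with fallback)" values (not Search API code).
// Assumption: a language without its own translation falls back to the default translation.

/**
 * Maps each translation langcode to the languages a
 * "Language (with fallback)" filter would match it for.
 */
function languagesWithFallback(array $translations, string $default, array $siteLanguages): array {
    $result = [];
    foreach ($translations as $langcode) {
        $result[$langcode] = [$langcode];
    }
    foreach ($siteLanguages as $langcode) {
        if (!in_array($langcode, $translations, true)) {
            // No translation of its own: this language is served by the default translation.
            $result[$default][] = $langcode;
        }
    }
    return $result;
}

/** Returns the translations a view filter on $langcode would keep. */
function translationsForLanguage(array $map, string $langcode): array {
    return array_keys(array_filter($map, fn (array $langs) => in_array($langcode, $langs, true)));
}

// Node 22 exists in English (default) and French; the site also has German.
$map = languagesWithFallback(['en', 'fr'], 'en', ['en', 'fr', 'de']);
```

Filtering this map by `de` keeps only the English document, and filtering by `fr` keeps only the French one, so each language filter yields a single translation of the node.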

  • jonathan_hunt (🇳🇿 New Zealand)

    Thanks for the feedback. I added "Language (with fallback)" (language_with_fallback) to the index and tried filtering the search results view in two ways:
    1. "Language (with fallback)" filtered by the site's default language:
       1.1 returns 0 results for a keyword that exists only in a non-default translation
       1.2 returns 1 result for a keyword present in 2 translations
    2. "Language (with fallback)" filtered by each of the available languages:
       2.1 returns 1 result for a keyword present in a non-default translation
       2.2 returns 2 results for a keyword present in 2 translations
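The four observations can be reproduced with a small plain-PHP model (not Search API or Solr code), assuming a site with only English (default) and French, so that for a node translated into both, each translation's "Language (with fallback)" value is just its own language. The document texts, the keywords `bonjour` and `shared`, and the helper name are hypothetical.

```php
<?php
// Hypothetical model of the observations above (not Search API or Solr code).
// Node 22 has an English default translation and a French translation.
$docs = [
    ['id' => '22:en', 'nid' => '22', 'text' => 'shared term', 'lang_fallback' => ['en']],
    ['id' => '22:fr', 'nid' => '22', 'text' => 'shared term bonjour', 'lang_fallback' => ['fr']],
];

/** Keyword match AND a "Language (with fallback)" filter matching any of $langs. */
function searchWithLanguageFilter(array $docs, string $keyword, array $langs): array {
    $hits = array_filter($docs, fn (array $doc) =>
        str_contains($doc['text'], $keyword)
        && array_intersect($langs, $doc['lang_fallback']) !== []
    );
    return array_column($hits, 'id');
}

$case11 = searchWithLanguageFilter($docs, 'bonjour', ['en']);       // 1.1
$case12 = searchWithLanguageFilter($docs, 'shared', ['en']);        // 1.2
$case21 = searchWithLanguageFilter($docs, 'bonjour', ['en', 'fr']); // 2.1
$case22 = searchWithLanguageFilter($docs, 'shared', ['en', 'fr']);  // 2.2
```

In this model, filtering only by the default language hides the French-only match (1.1), while filtering by every language returns each matching translation as a separate result (2.2).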

    However, the desired result is that users get:
    a. 1 result for a keyword that exists only in a non-default translation
    b. 1 result for a keyword present in 2 translations (once they visit the node, they will see which non-default language field values exist).
    For now, {!collapse field=ss_nid_string} avoids the issue of 2 results for the same node.
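The desired outcomes (a and b) correspond to matching every translation and then keeping one document per node, which is what the collapse filter does. The plain-PHP sketch below models that with hypothetical documents and fixed scores (not Solr code); real Solr scores depend on the query, so which translation survives the collapse will vary.

```php
<?php
// Hypothetical model of the desired behaviour: match all translations, then
// keep one document per node, as {!collapse field=ss_nid_string} does.
$docs = [
    ['id' => '22:en', 'nid' => '22', 'text' => 'shared term', 'score' => 1.0],
    ['id' => '22:fr', 'nid' => '22', 'text' => 'shared term bonjour', 'score' => 0.8],
];

/** Returns matching document IDs, keeping only the best-scoring hit per node. */
function searchCollapsed(array $docs, string $keyword): array {
    $heads = [];
    foreach ($docs as $doc) {
        if (!str_contains($doc['text'], $keyword)) {
            continue;
        }
        $nid = $doc['nid'];
        if (!isset($heads[$nid]) || $doc['score'] > $heads[$nid]['score']) {
            $heads[$nid] = $doc;
        }
    }
    return array_column($heads, 'id');
}

// a. Keyword only in the French translation: one result.
$a = searchCollapsed($docs, 'bonjour');
// b. Keyword in both translations: still one result for the node.
$b = searchCollapsed($docs, 'shared');
```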

    This site may differ from others in that we don't offer a user language switcher, since 99% of the site is in English and only sparse translations are available.
