Search module should keep language information on snippet content.

Created on 8 April 2013, over 11 years ago
Updated 31 March 2023, over 1 year ago

Problem/Motivation

The search module is putting out snippet spans with xml:lang attributes on them, which is generally good. But if the original content has xml:lang spans in it, they are not reflected in the snippets.

As noted in #6, the solution should handle the use of lang attributes on any tag, and probably doesn't need to handle xml:lang attributes.

Steps to reproduce

  1. Install Drupal 10.1.x with Umami profile (Demo: Umami Food Magazine (Experimental))
  2. From the Manage menu, navigate to Content, search for "pasta" in the Title field filter.
  3. Edit the node with the title "Pasta vegetariana al horno súper fácil". (es/node/3/edit)
  4. For the Resumen field, change the text format (Formato de texto) to HTML completo. Click the blue Continuar button when the Change text format pop-up warning displays.
  5. On the Resumen field, in the CKEditor toolbar, click Origen to edit the HTML.
  6. Add the following inside the <p></p> tags: <span lang="en">Pasta is delicious!</span> <span xml:lang="en">XML Pasta is tasty!</span> so that the contents of the field are now:

    <p>
    	Una pasta al horno es la comida más fácil y saludable. Este delicioso plato es súper rápido de preparar y una comida ideal entre semana para toda la familia. <span lang="en">Pasta is delicious!</span> <span xml:lang="en">XML Pasta is tasty!</span>
    </p>
    
  7. Save the changes (at the bottom of the Edit (Editar) form, click the blue button Guardar (esta traducción).)
  8. Return to the site and ensure that you're on the Español version (which should be the case since you just edited a Spanish language node).
  9. Use the search field (Buscar) to search for the text "pasta".
  10. In the search result snippet for "Pasta vegetariana al horno súper fácil" (the 2nd result), the search result snippet displays: "Pasta vegetariana al horno súper fácil Una pasta al horno es la comida más fácil y saludable. Este … y una comida ideal entre semana para toda la familia. Pasta is delicious! XML Pasta is tasty! …"
  11. Inspect the HTML for this snippet. Notice that both <span> tags have been stripped.
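
The result of step 11 can be illustrated outside Drupal. The following is a minimal Python sketch, not the search module's code (which is PHP): it assumes the snippet builder removes every tag before excerpting. The names `TagStripper` and `strip_tags` are hypothetical, and the sample body is shortened from the field content above.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collects text nodes only, discarding every tag and attribute."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(html):
    """Return the text content of html with all markup removed."""
    parser = TagStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


body = (
    '<p>Una pasta al horno es la comida más fácil y saludable. '
    '<span lang="en">Pasta is delicious!</span> '
    '<span xml:lang="en">XML Pasta is tasty!</span></p>'
)
snippet = strip_tags(body)
print(snippet)
```

The English sentences survive in the output, but nothing in it marks them as English any more.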

Proposed resolution

I think we won't try to do the snippet processing really carefully -- it's just too complicated to break the text up and preprocess it. But we can fix the snippet creator to put out spans with xml:lang attributes.
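
One possible reading of this resolution, sketched in Python rather than in the module's PHP: strip all markup as before, but re-emit a span for any element that carries a lang attribute. Everything here is an assumption for illustration, including the names `LangPreservingStripper` and `strip_keep_lang`, the choice to ignore xml:lang (as suggested in #6), and the requirement that the input is well-formed.

```python
from html import escape
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not be tracked as open.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class LangPreservingStripper(HTMLParser):
    """Removes all tags, except that any element with a lang attribute
    is replaced by <span lang="...">...</span>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        # One flag per open element: True if a <span> was emitted for it.
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        lang = dict(attrs).get("lang")  # xml:lang is deliberately ignored
        self.open_tags.append(bool(lang))
        if lang:
            self.out.append('<span lang="%s">' % escape(lang, quote=True))

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.open_tags:
            return
        if self.open_tags.pop():
            self.out.append("</span>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def strip_keep_lang(html):
    """Assumes well-formed input: every non-void tag is closed in order."""
    parser = LangPreservingStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


result = strip_keep_lang(
    '<p>Una pasta al horno. <em lang="en">Pasta is delicious!</em> '
    '<span xml:lang="en">XML Pasta is tasty!</span></p>'
)
print(result)
```

This sketch does not cover excerpting: any later truncation of the text would also have to keep the emitted spans balanced.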

Remaining tasks

1. Manually apply the changes from the patch in #3 to core/modules/search/search.module, since that file has been massively refactored since the patch was made.
2. Add test(s)
3. Review, etc.
4. Commit

User interface changes

Snippets will have the right language attributes on them.

API changes

Original report by Heine

#867114: "Search results should add lang tag if language of search result differs" introduced an xml:lang attribute on search results whose declared language differs from the currently displayed language. This prevents the snippet title and snippet text from inheriting the ancestral language.

I don't think search/node module can claim to know the language of the snippet though as it will just strip tags, removing any language context from the content. Consider the following node in language "nl":

<p>De titel van ons nieuwste boek: <span xml:lang="en">The Late Roman Cemeteries; Stray Finds and Excavations</span></p>

Searching for "boek" gives the following, incorrect fragment:

<p class="search-snippet" xml:lang="nl">        De titel van ons nieuwste <strong>boek</strong>:   The Late Roman Cemeteries; Stray Finds and Excavations            ...</p>

Should it not specify xml:lang=""? (Or maybe xml:lang="und"? I need to dig for the correct XHTML spec.)
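
To make the mismatch concrete, the sketch below splits the example node into runs of text tagged with their effective language, inherited from the nearest ancestor that declares one. It is an illustration in Python, not Drupal code; `LangRunCollector` and `language_runs` are hypothetical names, and it assumes well-formed markup with no void elements such as <br>.

```python
from html.parser import HTMLParser


class LangRunCollector(HTMLParser):
    """Records (language, text) pairs for each non-empty text node."""

    def __init__(self, default_lang):
        super().__init__(convert_charrefs=True)
        # The bottom entry is the language declared for the whole node.
        self.lang_stack = [default_lang]
        self.runs = []

    def handle_starttag(self, tag, attrs):
        declared = dict(attrs)
        # An element without its own declaration inherits from its parent.
        lang = (declared.get("lang") or declared.get("xml:lang")
                or self.lang_stack[-1])
        self.lang_stack.append(lang)

    def handle_endtag(self, tag):
        # Assumes every start tag has a matching end tag, in order.
        if len(self.lang_stack) > 1:
            self.lang_stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.runs.append((self.lang_stack[-1], text))


def language_runs(html, default_lang):
    parser = LangRunCollector(default_lang)
    parser.feed(html)
    parser.close()
    return parser.runs


runs = language_runs(
    '<p>De titel van ons nieuwste boek: <span xml:lang="en">The Late Roman '
    'Cemeteries; Stray Finds and Excavations</span></p>',
    "nl",
)
for lang, text in runs:
    print(lang, "|", text)
```

The example yields one "nl" run and one "en" run, so a single xml:lang="nl" on the whole snippet mislabels the book title.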

🐛 Bug report
Status

Needs work

Version

9.5

Component
Search 

Last updated 8 days ago

Created by

🇳🇱Netherlands heine

Tags
  • Needs tests

    The change is currently missing an automated test that fails when run with the original code, and succeeds when the bug has been fixed.

  • Needs backport to D7

    After being applied to the 8.x branch, it should be considered for backport to the 7.x branch. Note: This tag should generally remain even after the backport has been written, approved, and committed.

  • D8MI

    (Drupal 8 Multilingual Initiative) is the tag used by the multilingual initiative to mark core issues (and some contributed module issues). For versions other than Drupal 8, use the i18n (Internationalization) tag on issues which involve or affect multilingual / multinational support. That is preferred over Translation.

  • Needs reroll

    The patch will have to be re-rolled with new suggestions/changes described in the comments in the issue.

Comments & Activities

  • 🇺🇸United States Amber Himes Matz Portland, OR USA

    We triaged this issue as part of the Bug Smash Initiative in the #bugsmash Drupal Slack channel.

    I think I reproduced the issue with the steps now listed under "Steps to reproduce" in the issue summary.

    Note to re-rollers! The code that the last patch modifies still exists in core/modules/search/search.module, but that file has been massively refactored, so I would be very surprised if a regular re-roll worked. Someone probably needs to apply the latest patch's changes to core/modules/search/search.module by hand and create a new patch.

    Tests will also be needed.

    I updated the issue summary with my steps to reproduce and also added remaining tasks.
