Highlighted excerpt does not match apostrophe in search query.

Created on 17 February 2025

When using a search query that includes an apostrophe, the highlighted search excerpt in the results does not match the apostrophe and that word remains un-highlighted.

Enable the search highlight processor plugin. Perform a search whose query contains an apostrophe and matches a result. For example, "Women's Law Society" will highlight "Law Society" but not "Women's".

Remove the extra HTML encoding before the search highlight is added.

Adjust the Highlight.php plugin accordingly.

Extra notes:

Maybe I'm missing something, but when the search excerpt highlight is calculated in Highlight::createExcerpt, the text is first decoded with $text = Html::decodeEntities($text); and the ranges to highlight are calculated on that decoded text. However, before highlightField is called, each range is re-encoded in $out[] = Html::escape(mb_substr($text, $from, $to - $from)); so the apostrophe is an HTML entity again and the search key no longer matches it.

Perhaps there's another reason for this Html::escape, but when I remove it, the apostrophe in the excerpt is highlighted correctly.
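
The mismatch described in these notes can be reproduced outside Drupal. The following is only a minimal sketch: decode_entities() and escape() are hypothetical stand-ins written with plain PHP functions, on the assumption that Html::decodeEntities() and Html::escape() behave like html_entity_decode() and htmlspecialchars() with these flags. It is not the module's code.

```php
<?php
// Hypothetical stand-ins for Drupal's Html helpers (assumption: the real
// helpers wrap these plain PHP functions with the same flags).
function decode_entities(string $text): string {
  return html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

function escape(string $text): string {
  return htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

$key = "Women's";
$plain = decode_entities('Women&#039;s Law Society');
$escaped = escape($plain);

// The key is found in the decoded text ...
var_dump(mb_strpos($plain, $key));    // int(0)
// ... but not once the apostrophe has been turned back into an entity.
var_dump(mb_strpos($escaped, $key));  // bool(false)
echo $escaped, "\n";                  // Women&#039;s Law Society
```

Any step that searches for the raw key in the escaped string will therefore skip "Women's" while still matching "Law" and "Society", which is consistent with the behaviour reported above.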

🐛 Bug report

Status: Active
Version: 1.0
Component: General code
Created by: 🇳🇿 New Zealand davidwhthomas

Comments & Activities

  • Issue created by @davidwhthomas
  • Pipeline finished with Failed
    about 1 month ago
    Total: 593s
    #426947
  • 🇳🇿New Zealand davidwhthomas

    One of the tests fails; I'm not sure how significant this is:

         ✘ Postprocess search results with complex keys
           ┐
           ├ Excerpt was added.                          
           ├ Failed asserting that two strings are equal.
           ┊ ---·Expected
           ┊ +++·Actual
           ┊ @@ @@
           ┊ -'…·This·foo·text·bar·will·get·baz·riddled·with·<strong>·tags.·…'
           ┊ +'…·This·foo·text·bar·will·get·baz·riddled·with··tags.·…'
           │
           │ /builds/issue/search_api-3507292/tests/src/Unit/Processor/HighlightTest.php:859
           ┴
  • 🇦🇹Austria drunken monkey Vienna, Austria

    Thanks for reporting this issue!
    However, as you can see from the failing test, this problem is unfortunately not that easy to fix. We want our excerpts to be valid HTML, which is why we need to escape the plain-text version before adding the highlighting tags. Conceivably, we could also do the highlighting on the plain-text version and escape the text only right before adding the highlighting tags, but that would require larger changes to the code.

    For now, would you be able to add a regression test for this problem to HighlightTest? That way we can immediately see whether a proposed solution works as intended without breaking any existing functionality.
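
The alternative mentioned in this comment (match on the plain text, escape only while assembling the output) might look roughly like the sketch below. highlight_plain() is a hypothetical helper, not part of Search API, and it uses plain htmlspecialchars() in place of Html::escape(); the real processor handles multiple keys, word boundaries and excerpt ranges, all of which are ignored here.

```php
<?php
// Hypothetical helper (not module code): find $key in the unescaped text and
// escape every piece individually, so the output stays valid HTML while the
// apostrophe can still be matched.
function highlight_plain(string $plain, string $key, string $prefix = '<strong>', string $suffix = '</strong>'): string {
  $escape = fn(string $s): string => htmlspecialchars($s, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
  $key_length = mb_strlen($key);
  if ($key_length === 0) {
    return $escape($plain);
  }
  $out = '';
  $offset = 0;
  // Case-insensitive search on the plain text; offsets are in characters.
  while (($pos = mb_stripos($plain, $key, $offset)) !== FALSE) {
    $out .= $escape(mb_substr($plain, $offset, $pos - $offset));
    $out .= $prefix . $escape(mb_substr($plain, $pos, $key_length)) . $suffix;
    $offset = $pos + $key_length;
  }
  // Escape whatever follows the last match.
  return $out . $escape(mb_substr($plain, $offset));
}

echo highlight_plain("Women's Law Society <AGM>", "women's"), "\n";
// <strong>Women&#039;s</strong> Law Society &lt;AGM&gt;
```

Because the tags are added around already-escaped pieces, markup in the source text (the made-up "<AGM>" here) is still neutralised, which is the property the existing test is protecting.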
