Trim in Highlight processor kills "normal" spaces.

Created on 17 April 2019, almost 6 years ago
Updated 5 April 2024, 11 months ago

We are trying to store the rendered HTML of an entity in Solr to get better performance when loading. Unfortunately, without the Highlight processor the HTML tags in the entity markup get removed. We therefore misuse the Highlight processor for this purpose, without adding any highlighting tags (in the index, the settings for "Highlighting prefix" and "Highlighting suffix" are empty).

When doing so, we noticed that the processor trims whitespace, e.g.:
<li class="footnote" id="footnote1_88xcp4o"><a class="footnote-label" href="#footnoteref1_88xcp4o">1.</a> Hierbei handelt es sich um die Endnummer der Signatur dieses Reisetagebuchs innerhalb des Bestandes <a href="https://editionhansposse.gnm.de/wisski/navigate/195/view" title="Posse, Hans">Hans Posses</a> im <a href="https://editionhansposse.gnm.de/wisski/navigate/2222/view" title="Deutsches Kunstarchiv">Deutschen Kunstarchiv</a>.</li>

This should be displayed as follows, with the corresponding links:
Hierbei handelt es sich um die Endnummer der Signatur dieses Reisetagebuchs innerhalb des Bestandes Hans Posses im Deutschen Kunstarchiv.

However, due to the trim in the Highlight processor, the result is:
Hierbei handelt es sich um die Endnummer der Signatur dieses Reisetagebuchs innerhalb des Bestandes Hans PossesimDeutschen Kunstarchiv.

It is unclear to me why the additional spaces in search_api/src/Plugin/search_api/processor/Highlight.php line 636 ($text = preg_replace($regex, $replace, ' ' . $text . ' ');) are added at all. I would prefer to leave them out and not trim the text afterwards.
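The effect can be reproduced in isolation with a minimal sketch. This is not the module's code: the function names (segment_trimmed, segment_preserving, rebuild), the tag-splitting regex, and the keyword "zzz" are all hypothetical, and the sketch assumes that the text between HTML tags is passed through the pad/replace/trim step one segment at a time. Under that assumption, a segment such as " im " loses both of its spaces, while removing only the single padding character on each side keeps them.

```php
<?php
// Hypothetical stand-in for the described step: pad the segment with spaces,
// run the replacement (empty prefix/suffix, so matches are left unchanged),
// then trim. The keyword "zzz" is an illustrative placeholder.
function segment_trimmed(string $text): string {
  $text = preg_replace('/\b(zzz)\b/i', '$1', ' ' . $text . ' ');
  return trim($text);
}

// Variant: strip exactly the one padding space added on each side, so that
// whitespace belonging to the original segment survives.
function segment_preserving(string $text): string {
  $text = preg_replace('/\b(zzz)\b/i', '$1', ' ' . $text . ' ');
  return substr($text, 1, -1);
}

// Assumption: markup is split on tags and only the text segments are
// processed. With one capture group and PREG_SPLIT_DELIM_CAPTURE, even
// indices hold text and odd indices hold tags.
function rebuild(string $html, callable $segment_fn): string {
  $parts = preg_split('#(<[^>]+>)#', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
  foreach ($parts as $i => $part) {
    if ($i % 2 === 0) {
      $parts[$i] = $segment_fn($part);
    }
  }
  return implode('', $parts);
}

$html = '<a href="#">Hans Posses</a> im <a href="#">Deutschen Kunstarchiv</a>.';

// Prints "Hans PossesimDeutschen Kunstarchiv."
echo strip_tags(rebuild($html, 'segment_trimmed')), "\n";
// Prints "Hans Posses im Deutschen Kunstarchiv."
echo strip_tags(rebuild($html, 'segment_preserving')), "\n";
```

Whether the padding is needed at all depends on the real regex (it may rely on a leading or trailing boundary character), so the second variant only shows that undoing the padding does not require a full trim.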

🐛 Bug report
Status

Needs work

Version

1.0

Component

Plugins

Created by

🇩🇪 Germany Knurg

  • Needs tests

    The change is currently missing an automated test that fails when run with the original code, and succeeds when the bug has been fixed.
