Highlighting is very slow even if done by solr directly

Created on 9 June 2022, almost 3 years ago
Updated 28 August 2024, 7 months ago

Setup

  • Solr version: 8.11.1
  • Drupal Core version: 9.3.13
  • Search API version: 1.23.0
  • Search API Solr version: 4.2.7
  • Configured Solr Connector: Standard

Issue

Hello,

I configured the module and everything works quite well. Only one thing is still unsatisfying: performance when highlighting is enabled. As described in https://www.drupal.org/project/search_api_solr/issues/2616090, I configured Solr to do the highlighting itself, which should be fast. In practice, a search takes about 4 s with highlighting and 1 s without. I suspect that highlighting through Solr isn't actually being used, even though I configured it that way.

While debugging, I ran into several problems:

1. If I understand the field captions in the module (version 4.2.7) correctly, even when I activate the highlighting snippet feature in the Solr server configuration (which I did), I still need to enable the highlighting processor (which I also did). If I disable the processor, I no longer get an excerpt. So I guess this differs from what is described in https://www.drupal.org/project/search_api_solr/issues/2616090. Is this the correct way to do it?

2. If 1. is correct, the problem is that the resulting query sent to Solr doesn't contain any entry in hl.fl. Where do I define the fields that end up in hl.fl? (You can see my field definition in the attachment.)

3. Even if I manually add a field using $hl->setFields('tcngramm_X3b_de_rendered_item'); in https://git.drupalcode.org/project/search_api_solr/-/blob/4.x/src/Plugin... the result then does contain data in the "highlight" JSON, but that data still isn't used in the resulting excerpt (which works, but takes about 4 s to generate). This means the data from Solr isn't used at all, and the "slow" PHP logic in the search_api module's highlighter is used instead. This shouldn't be the case, right?
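
For reference, here is a minimal sketch of what a highlighting request looks like at the HTTP level, and how the snippets come back in Solr's JSON response under the "highlighting" key. It is written in Python for brevity rather than the module's PHP, and it is not how search_api_solr builds its queries. The endpoint URL, core name, document ID, and sample response are hypothetical; only hl, hl.fl, and the field name come from the points above.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; replace host and core name with your own.
SOLR_SELECT_URL = "http://localhost:8983/solr/drupal/select"

# Field name taken from the issue text above.
FIELD = "tcngramm_X3b_de_rendered_item"


def build_highlight_url(keywords, highlight_fields):
    """Build a select URL that asks Solr to return highlighted snippets."""
    params = {
        "q": keywords,
        "wt": "json",
        "hl": "true",
        # hl.fl lists the fields to highlight, comma-separated.
        "hl.fl": ",".join(highlight_fields),
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)


def extract_snippets(response, field):
    """Return {document ID: [snippets]} for one field of a Solr JSON response."""
    highlighting = response.get("highlighting", {})
    return {
        doc_id: fields[field]
        for doc_id, fields in highlighting.items()
        if fields.get(field)
    }


# Hypothetical response, shaped like Solr's JSON output with highlighting on.
sample_response = {
    "response": {"numFound": 1, "docs": [{"id": "doc-1"}]},
    "highlighting": {
        "doc-1": {FIELD: ["partial <em>word</em> match"]},
    },
}

url = build_highlight_url("word", [FIELD])
snippets = extract_snippets(sample_response, FIELD)
print(url)
print(snippets)
```

If the request that actually reaches Solr has no hl.fl value, or the response has no entries for the field under "highlighting", there are no Solr-generated snippets for the excerpt to use.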

I suspect I'm configuring something the wrong way. Or maybe it's connected to my fields (I use ngram because I need partial word support in the search)?

Any help is highly appreciated!

Thanks in advance!

Bye Defcon0

🐛 Bug report
Status

Fixed

Version

4.0

Component

Code


Comments & Activities


  • 🇩🇰 Denmark ressa Copenhagen

    Setup

    • Solr version: 8.11.2
    • Drupal Core version: 10.0.9
    • Search API version: 1.29.0
    • Search API Solr version: 4.2.10
    • Configured Solr Connector: Solr Cloud with Basic Auth

    Thanks for reporting @Defcon0 and @mkalkbrenner for fixing it.

    I can confirm that the Search API Highlight processor, and the "Search: Excerpt" field it makes available in Views for Solr-based indexes, work well.

    But it's important to configure it correctly, otherwise you'll take a performance hit. I am so glad I found this issue. I had been trying to store and retrieve data in Solr, based on the great tips in @drunken monkey's "Create a search view that doesn't load entities from the database" and Karim Boudjema's elaborate explanation in "3. Solr as backend for Views" from "Solr in Drupal 8/9: a real live example" (SFDUG, Oct 2020, starts at 31:37), but couldn't get the response time really low ...

    By reading this issue and experimenting with the Highlight processor settings, I managed to get it pretty low. Here are some benchmarks with 7000 nodes, using a two-word search that returns 515 nodes:

    1. With the default values for Highlight processor

      "Highlight returned field data" > "Always"
      "Exclude fields from excerpt" > No fields selected

      Query build time   1.43 ms
      Query execute time   611.09 ms
      View render time   1240.25 ms
    2. Setting Highlight processor

      2.1 "Highlight returned field data" > Change from "Always" to "If the server returns fields"

      Query build time   1.29 ms
      Query execute time   306.15 ms
      View render time   634.63 ms

      2.2 Select all irrelevant fields under "Exclude fields from excerpt"
      Selecting all fields except the relevant one makes a big difference.

      Query build time   1.27 ms
      Query execute time   310.67 ms
      View render time   649.05 ms
    3. With both "Highlight returned field data" and "Exclude fields from excerpt" correctly configured

      Query build time   1.02 ms
      Query execute time   21.25 ms
      View render time   85.21 ms
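
To put the numbers above in proportion, here is a small Python sketch that computes how much faster each configuration is than the defaults. The timings are copied from the benchmarks above; the dictionary labels and the speedup helper are only illustrative names.

```python
# Timings in milliseconds, copied from the benchmarks above.
benchmarks = {
    "1 defaults": {"execute": 611.09, "render": 1240.25},
    "2.1 highlight if server returns fields": {"execute": 306.15, "render": 634.63},
    "2.2 irrelevant fields excluded": {"execute": 310.67, "render": 649.05},
    "3 both settings": {"execute": 21.25, "render": 85.21},
}


def speedup(baseline, other, metric):
    """How many times faster `other` is than `baseline` for one metric."""
    return baseline[metric] / other[metric]


baseline = benchmarks["1 defaults"]
for label, timing in benchmarks.items():
    execute_factor = speedup(baseline, timing, "execute")
    render_factor = speedup(baseline, timing, "render")
    print(f"{label}: execute x{execute_factor:.1f}, render x{render_factor:.1f}")
```

Each setting on its own roughly halves both times, while the two combined cut query execution by a factor of nearly 29 and view rendering by a factor of more than 14.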
      

    It seems to me that these two settings changes make no difference in performance, even though they are recommended in many places:

    • Enabling "Link this field to its item" for a Title field under "Fields"
    • Enabling "Skip item access checks" under Other > Query settings: Settings

    Maybe it makes a difference if you have hundreds of thousands of nodes?

  • Automatically closed - issue fixed for 2 weeks with no activity.

  • Status changed to Fixed 7 months ago
  • 🇺🇸 United States j-barnes

    @Defcon0 - I'm running into the same issue: my times went up significantly after enabling the Highlight processor (with "Retrieve result data from Solr" and "Retrieve highlighted snippets" enabled).

    Query build time   49.77 ms
    Query execute time   130.9 ms
    View render time   229.75 ms
    

    to

    Query build time   53.55 ms
    Query execute time   1988.97 ms
    View render time   2075.75 ms
    

    After troubleshooting a bit, I noticed that the field and excerpt were not available when the field was not explicitly added to the view. After adding my rendered_html field to the view (hidden), my times were near instant again. (Originally I just had Search: Excerpt and would select my search fields in Search: Fulltext search.)

    With render html field added to the view:

    Without render html field:
