Highlight processor performance?

Created on 13 December 2023, 11 months ago
Updated 22 January 2024, 10 months ago

Problem/Motivation

Since updating one of my projects to Drupal 10 and also updating all Search API related modules I have received a lot of feedback from site users reporting that searching is very slow now.

So I decided to monitor performance with New Relic (I'm not an expert with it, though, sorry), and it turns out that one of the slowest transactions seems to be related to Search API (over 20 seconds per transaction, but not every time). Closer investigation led me to the Highlight processor (as far as I was able to decipher the transaction's stack trace in New Relic). And indeed, when I switched off the Highlight processor in my search index, performance improved a lot. I wasn't even using highlighting anywhere, so in my case it was just a mistake that it was enabled at all.

Out of curiosity, I would like to know whether anyone else has encountered performance issues with the Highlight processor. I recall there was recently an issue with the HTML Filter processor, which was fixed. Could similar problems be going on with the Highlight processor, or is the problem somewhere else in my Search API configuration?

Here's some information from New Relic, in case it is of any use:

code.filepath: /var/www/html/web/modules/contrib/search_api/src/Plugin/search_api/processor/Highlight.php
code.function: postprocessSearchResults
code.lineno: 245
code.namespace: Drupal\search_api\Plugin\search_api\processor\Highlight
duration: 24831
exclusiveDurationMs: null
name: Custom/Drupal\search_api\Plugin\search_api\processor\Highlight::postprocessSearchResults

As far as I understand the result, that particular transaction took over 24 seconds. After I disabled the Highlight processor, New Relic no longer seems to pick it up among the slowest transactions at all.

Edit: Just a moment ago it seems New Relic picked up one transaction from my search page, and now the transaction duration is 623 ms compared to the previous >24000 ms, and as expected the stack trace doesn't contain anything about Highlight processor because it's disabled now.

💬 Support request
Status

Fixed

Version

1.31

Component

General code

Created by

🇫🇮Finland hartsak

Comments & Activities

  • Issue created by @hartsak
  • 🇫🇮Finland hartsak

    In case this helps a bit more, here are all the items from the same slow stack. I realized I had previously only added one of these as an example but I don't know how to get this information out of New Relic in a better way.

    Could it be that I have too many (fulltext) fields where the highlight processor should run?

    1. Drupal\search_api\Query\Query::postExecute
    2. Drupal\search_api\Entity\Index::postprocessSearchResults
    3. Drupal\search_api\Plugin\search_api\processor\Highlight::postprocessSearchResults
    4. Drupal\search_api\Plugin\search_api\processor\Highlight::addExcerpts
    5. Drupal\search_api\Plugin\search_api\processor\Highlight::getFulltextFields
    6. Drupal\search_api\Utility\FieldsHelper::extractItemValues
  • Status changed to Postponed: needs info 11 months ago
  • 🇦🇹Austria drunken monkey Vienna, Austria

    Thanks for reporting this issue!
    If the time is mostly spent in FieldsHelper::extractItemValues(), this cannot be the same kind of problem as the one in the HTML Filter processor, which was about (semi-)parsing HTML. The Highlight processor does that, too, so it would otherwise have been a plausible cause, especially since this code was changed not too long ago ( 🐛 A PCRE error (#2) occurred during results highlighting. Fixed ): if you upgraded from a version before 1.29, that change would be new for you.

    However, as it stands, this seems to have more to do with field extraction. Are you maybe using fields for the excerpt which might be very slow to retrieve, e.g., attachments? You should exclude those fields from the excerpt via the “Exclude fields from excerpt” option. (Or, of course, disable the whole processor if you’re not using it anyways.)
    Why this only showed after an update is another mystery in that case, though. Maybe you’re using Solr, upgraded that module, too, and then Solr suddenly failed to retrieve those fields (in a usable form)? For debugging, you could try downgrading just the Search API module and seeing if that fixes the issue. (If you are still interested in investigating this, of course – otherwise, we’ll have to wait and see whether someone else complains about this.)

    In any case, thanks again for reporting, and for the detailed information.

  • Status changed to Fixed 10 months ago
  • 🇫🇮Finland hartsak

    Thank you very much for the reply @drunken monkey!

    It's just as you said: one of the fields enabled in the Highlight processor was indeed a "Search api attachments" field. I hadn't paid any attention to those settings, as I wasn't the original creator of that particular search index. That might very well be the reason why the Highlight processor caused performance issues in my case. I double-checked this (it's always good to conduct testing in a production environment...) by enabling the Highlight processor again and looking at the transaction data in New Relic.

    After I selected these in the "Exclude fields from excerpt" options the performance issues were gone:
    * rendered html output (might have very well contained some attachments in some cases)
    * search api attachments fields (important!)
    * taxonomy reference fields (not completely sure whether this was needed too, but it might have helped)
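
    For anyone who prefers to check this in exported configuration rather than in the UI, here is a rough sketch of how such an exclusion might look in the index config file (search_api.index.<index_id>.yml). The field identifiers are hypothetical placeholders, not the ones from this index, and the other Highlight settings are left out:

    ```yaml
    processor_settings:
      highlight:
        excerpt: true
        # Hypothetical field identifiers; use the machine names from your own index.
        exclude_fields:
          - rendered_item
          - saa_field_attachments
          - field_tags
    ```

    Fields listed under exclude_fields are not used when building the excerpt, which is why leaving out slow-to-extract fields such as attachments can help.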

    And just for the record I have the following module versions enabled:
    * search_api: 8.x-1.31
    * search_api_solr: 4.3.1
    * search_api_attachments: 9.0.2
    * Drupal: 10.1.6
    * SOLR: 8.5.2

    So I guess this ticket is no longer needed, as the problem is solved in my case. Thanks a lot for the help!

  • Automatically closed - issue fixed for 2 weeks with no activity.
