Highlighter processor performance issue when fields are missing from search backend

Created on 17 April 2025

Problem/Motivation

I profiled a search page built with Views using XHGui/XHProf, and the Highlighter processor emerged as a significant bottleneck. When it cannot find a field in the results returned by the search backend, it loads the original object from the datasource and extracts the value from there. In our case the datasource's data lived on slow storage, which made the impact even worse.

```php
// Call chain:
// \Drupal\search_api\Plugin\search_api\processor\Highlight::addExcerpts()
//   -> \Drupal\search_api\Plugin\search_api\processor\Highlight::getFulltextFields()
//   -> \Drupal\search_api\Utility\FieldsHelper::extractItemValues()
if ($missing_fields) {
  // Fields not found on the result item are extracted from the original
  // object, which forces the item to be loaded from its datasource.
  $this->extractFields($item->getOriginalObject(), $missing_fields);
  foreach ($missing_fields as $property_fields) {
    foreach ($property_fields as $field) {
      $item_values[$field->getFieldIdentifier()] = $field->getValues();
    }
  }
}
```

Additionally, search backends neither index nor return empty property values, so fields that are empty for an item are always considered missing and trigger an additional load from the source. This further increases the number of "missing" fields.
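To make the distinction concrete, here is a minimal, self-contained PHP sketch of a "missing field" check. It is not Search API code: `findMissingFieldIds()` and the plain-array result format are hypothetical. It shows that a field omitted from the result counts as missing, while the same field returned explicitly as an empty array does not.

```php
<?php

declare(strict_types=1);

/**
 * Returns the IDs of requested fields that are absent from a result item.
 *
 * Hypothetical helper: $returnedValues maps field IDs to value arrays, as a
 * backend might return them. A field returned with an empty array counts as
 * present; only keys that are absent count as missing.
 *
 * @param string[] $requestedFieldIds
 * @param array<string, array> $returnedValues
 *
 * @return string[]
 */
function findMissingFieldIds(array $requestedFieldIds, array $returnedValues): array {
  $missing = [];
  foreach ($requestedFieldIds as $fieldId) {
    if (!array_key_exists($fieldId, $returnedValues)) {
      $missing[] = $fieldId;
    }
  }
  return $missing;
}

$requested = ['title', 'body', 'summary'];

// The backend omits the empty "summary" field: it is reported as missing
// and, in the real processor, would trigger loading the original object.
$omitted = ['title' => ['Foo'], 'body' => ['Lorem ipsum']];

// The backend returns "summary" explicitly as empty: nothing is missing.
$explicitEmpty = ['title' => ['Foo'], 'body' => ['Lorem ipsum'], 'summary' => []];

print_r(findMissingFieldIds($requested, $omitted));
print_r(findMissingFieldIds($requested, $explicitEmpty));
```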

Steps to reproduce

Proposed resolution

One potential solution I considered is to mark only required properties as missing, instead of every property not found in the search backend's result. However, this may not be ideal: a required property should always be present in the index, unless the index is outdated or the property became required after data was created (and the existing data did not receive a default value).

So, unless there is a better idea for improving the implementation, it would be helpful to display a warning in the "Create excerpt" configuration advising that only required fields should be included. Otherwise, the Highlighter loads items from the source, which can cause significant performance issues.
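The "only mark required properties as missing" option could look roughly like the following PHP sketch. It is not Search API code: `filterRequiredMissingFields()` and the `$isRequired` map are hypothetical, and in a real implementation the required flag would have to be derived from the property definitions, which this sketch does not model.

```php
<?php

declare(strict_types=1);

/**
 * Keeps only the missing fields whose underlying property is required.
 *
 * Hypothetical sketch: optional properties absent from the result are
 * assumed to be empty instead of being loaded from the datasource.
 *
 * @param string[] $missingFieldIds
 *   IDs of fields not found in the backend's result item.
 * @param array<string, bool> $isRequired
 *   Map of field ID to whether its property is required (assumed input).
 *
 * @return string[]
 *   Field IDs that should still be loaded from the original object.
 */
function filterRequiredMissingFields(array $missingFieldIds, array $isRequired): array {
  return array_values(array_filter(
    $missingFieldIds,
    static fn (string $fieldId): bool => $isRequired[$fieldId] ?? false,
  ));
}

$missing = ['summary', 'title'];
$isRequired = ['title' => true, 'body' => true, 'summary' => false];

// Only "title" would still trigger a load; the optional "summary" field is
// treated as having no values.
print_r(filterRequiredMissingFields($missing, $isRequired));
```

Fields not listed in `$isRequired` are treated as optional here, which errs on the side of skipping the load.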

Remaining tasks

Category: Feature request
Status: Active
Version: 1.0
Component: General code
Created by: mxr576 (Hungary)

Issue tags: Performance

Comments & Activities

  • Issue created by @mxr576
  • mxr576 (Hungary)

    Maybe the excerpt generator could also respect the "highlight returned field data" setting and simply skip fields that the server does not return.

  • drunken monkey (Vienna, Austria)

    One potential fix would be for backend plugins to include empty fields in their result items whenever they tried to retrieve those fields: that means the item was indexed without a value for the field, so explicitly returning the field as empty makes sense.

    However, it would also make sense to add a setting for the excerpt, similar to the one for highlighting individual fields, so that field values are not loaded from the database when they are not already included in the results. If you want to provide an MR for that, I'd be open to including this functionality. (Simply "respecting" the existing setting would break sites that rely on the current behavior, so I don't think we can do that.)
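The opt-in excerpt setting suggested here might work along the lines of the following PHP sketch. It is hypothetical, not Search API code: `collectExcerptValues()`, its `$loadMissing` flag and the `$loadFromSource` callback (a stand-in for loading the original object) are illustrative names. The flag defaults to TRUE so that sites relying on the current behavior are unaffected.

```php
<?php

declare(strict_types=1);

/**
 * Collects the values used to build an excerpt for one result item.
 *
 * Hypothetical sketch: when $loadMissing is FALSE, fields absent from the
 * backend result are skipped instead of being loaded. $loadFromSource
 * stands in for the expensive datasource load and is only called if needed.
 *
 * @param string[] $fieldIds
 * @param array<string, array> $returnedValues
 * @param callable(string[]): array<string, array> $loadFromSource
 *
 * @return array<string, array>
 */
function collectExcerptValues(array $fieldIds, array $returnedValues, callable $loadFromSource, bool $loadMissing = true): array {
  $values = [];
  $missing = [];
  foreach ($fieldIds as $fieldId) {
    if (array_key_exists($fieldId, $returnedValues)) {
      $values[$fieldId] = $returnedValues[$fieldId];
    }
    else {
      $missing[] = $fieldId;
    }
  }
  // Existing behavior (the default): load missing fields from the source.
  if ($missing && $loadMissing) {
    $values += $loadFromSource($missing);
  }
  return $values;
}

$loads = 0;
$loader = function (array $ids) use (&$loads): array {
  $loads++;
  return array_fill_keys($ids, ['loaded from source']);
};

$returned = ['body' => ['Lorem ipsum']];

collectExcerptValues(['body', 'summary'], $returned, $loader);         // $loads becomes 1.
collectExcerptValues(['body', 'summary'], $returned, $loader, false);  // No load; $loads stays 1.
```

Keeping the expensive load behind a callback makes it easy to verify that no datasource access happens when the flag is off.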
