Problem/Motivation
I profiled a search page built with Views using XHGui/XHProf, and the Highlighter processor emerged as a significant bottleneck. When it cannot find a field in the results returned by the search backend, it loads the original object from the datasource and tries to extract the value from it. In our case, the datasource was on slow storage, which made the problem worse. The relevant call chain and code are:
// \Drupal\search_api\Plugin\search_api\processor\Highlight::addExcerpts() ->
// \Drupal\search_api\Plugin\search_api\processor\Highlight::getFulltextFields() ->
// \Drupal\search_api\Utility\FieldsHelper::extractItemValues()
if ($missing_fields) {
  $this->extractFields($item->getOriginalObject(), $missing_fields);
  foreach ($missing_fields as $property_fields) {
    foreach ($property_fields as $field) {
      $item_values[$field->getFieldIdentifier()] = $field->getValues();
    }
  }
}
Additionally, search backends do not index or return empty property values. This increases the number of "missing" fields: an empty field is never present in the backend's result, so it is always considered missing and always triggers loading of the original object.
Steps to reproduce
Proposed resolution
One potential solution I considered is to only mark required properties as missing, instead of marking all properties not found in the search backend's result as missing. However, this may not be ideal, since required properties should always be present in the index unless the index is outdated or a property became required after data was created (and existing data did not receive a default value).
So unless there is a better idea for improving the implementation, it would be helpful to display a warning in the "Create excerpt" configuration, advising that only required fields should be included. Otherwise, the Highlighter will load items from the source, which can cause significant performance issues.
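To make the first idea concrete, here is a minimal standalone sketch of "only mark required properties as missing". It is not Search API code: the function find_missing_fields(), its parameters, and the sample field names are all hypothetical and use plain arrays instead of Search API's item and field objects. It only illustrates how the set of fields that would trigger a load of the original object differs between the current and the proposed behaviour.

```php
<?php

/**
 * Hypothetical helper (not part of Search API).
 *
 * Returns the field identifiers that would be treated as "missing" and
 * would therefore trigger loading the original object.
 *
 * @param string[] $requested      Field identifiers the excerpt needs.
 * @param array    $backend_values Field identifier => values returned by the backend.
 * @param bool[]   $required       Field identifier => whether the property is required.
 * @param bool     $only_required  FALSE: current behaviour, TRUE: proposed behaviour.
 */
function find_missing_fields(array $requested, array $backend_values, array $required, bool $only_required): array {
  $missing = [];
  foreach ($requested as $field_id) {
    // The backend returned a value, so nothing needs to be loaded.
    if (array_key_exists($field_id, $backend_values)) {
      continue;
    }
    // Proposed behaviour: an absent optional field is assumed to be empty.
    if ($only_required && empty($required[$field_id])) {
      continue;
    }
    $missing[] = $field_id;
  }
  return $missing;
}

// Assumed sample data: the backend returned only "title".
$requested = ['title', 'body', 'summary'];
$backend_values = ['title' => ['Hello world']];
$required = ['title' => TRUE, 'body' => TRUE, 'summary' => FALSE];

$current = find_missing_fields($requested, $backend_values, $required, FALSE);
$proposed = find_missing_fields($requested, $backend_values, $required, TRUE);
```

With this sample data, the current behaviour treats both body and summary as missing, while the proposed behaviour treats only body as missing. If every requested field is optional, the proposed behaviour never loads the original object, which is also why the caveat above applies: an optional field with a real value that the backend did not return would be skipped.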
Remaining tasks