Support OpenSearch server highlighting

Created on 22 November 2024

Problem/Motivation

Search API provides a Highlight SearchApiProcessor plugin, but it postprocesses the results in a simplistic way: it doesn't recognize stopwords, synonyms, or stemming. See also https://www.drupal.org/project/issues/search_api?text=highlight&status=Open

However, OpenSearch 2.18 has its own highlighter, which understands various locales, matches on synonyms and stems, and handles stopwords correctly.

I'm a maintainer of the Elasticsearch Connector module. When I tried porting the patch from our own version of this ticket ("Highlighting support (leverage Elasticsearch highlighting)", currently Needs review) to OpenSearch, it worked, so I'm filing this issue here so that both projects can collaborate on the idea!

Proposed resolution

Add a new Search API processor that uses OpenSearch's own highlighter.

More specifically, add a new SearchApiProcessor plugin that:

  1. preprocesses the search query sent to OpenSearch, adding a highlight clause that properly leverages the Highlighting API
  2. postprocesses the search results when they come back from OpenSearch to generate an Excerpt from the highlight clause in each search hit
  3. builds a SearchApiProcessor configuration form that exposes a bunch of options (see below)

Note that, because Search API OpenSearch doesn't yet support adding term_vector fields, the initial version of this patch won't support the Lucene Fast Vector Highlighter type (fvh) and related highlighting options.
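
To make the first two steps above concrete, here is a minimal, language-neutral sketch in Python of the request/response handling. It is not the module's code (which would be a PHP SearchApiProcessor plugin); the function names add_highlight_clause and build_excerpt, the default tag, and the joiner string are assumptions made for illustration. The "highlight" request clause and the per-hit "highlight" response key are part of the OpenSearch search API.

```python
from typing import Any, Dict, List


def add_highlight_clause(
    query_body: Dict[str, Any], fields: List[str], tag: str = "em"
) -> Dict[str, Any]:
    """Step 1 (preprocess): return a copy of the query body with a highlight clause.

    Hypothetical helper. Uses the unified highlighter, since fvh is out of
    scope for the initial patch.
    """
    body = dict(query_body)  # shallow copy so the caller's dict is not modified
    body["highlight"] = {
        "type": "unified",
        "pre_tags": [f"<{tag}>"],
        "post_tags": [f"</{tag}>"],
        # An empty object per field means "use the clause-level settings".
        "fields": {name: {} for name in fields},
    }
    return body


def build_excerpt(hit: Dict[str, Any], joiner: str = " … ") -> str:
    """Step 2 (postprocess): join one hit's highlight fragments into an excerpt.

    Hypothetical helper. Each hit may carry a "highlight" mapping of
    field name -> list of fragment strings; hits without it yield "".
    """
    fragments: List[str] = []
    for field_fragments in hit.get("highlight", {}).values():
        fragments.extend(field_fragments)
    return joiner.join(fragments)
```

For example, calling add_highlight_clause({"query": {"match": {"body": "running"}}}, ["title", "body"]) leaves the original query untouched and returns a body that also asks OpenSearch to highlight the title and body fields; build_excerpt would then be applied to each entry of hits.hits in the response.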

Remaining tasks

  1. Write a patch for the 3.x branch with tests
  2. Review and feedback
  3. RTBC and feedback
  4. Commit
  5. Backport to 2.x branch (?)

User interface changes

Adds an "OpenSearch Highlighter" processor to the page at /admin/config/search/search-api/index/YOUR_INDEX/processors with options to configure:

  1. the Fields to highlight (chosen from the list of fields in the index)
  2. the Highlighter type (Unified or Plain)
  3. the Boundary scanner to use (Sentence or Word; only if the Unified highlighter is selected)
  4. the Boundary scanner locale to use (only if the Sentence boundary scanner is selected)
  5. the Fragmenter to use (Simple or Span; only if the Plain highlighter is selected)
  6. the HTML tag used to highlight the search term in the excerpt
  7. the Snippet encoder (No encoding or HTML)
  8. the Maximum number of snippets per field
  9. the Snippet size (in characters)
  10. the Snippet size when there is no match (in characters)
  11. the Snippet order (the order in which they appear, or by relevance)
  12. whether to show snippets from all fields, or only from fields that match the query
  13. the text used to join snippets together when rendering the excerpt
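
As a rough illustration of how these options might translate into a highlight clause, the Python sketch below maps most of them onto OpenSearch highlighter parameters. The HighlighterOptions class, its attribute names, and its default values are hypothetical and are not taken from the merge request; the parameter names in the generated clause (type, boundary_scanner, boundary_scanner_locale, fragmenter, pre_tags, post_tags, encoder, number_of_fragments, fragment_size, no_match_size, order, require_field_match) are OpenSearch highlighter parameters. Mapping option 12 to require_field_match is an assumption. Option 13 (the join text) is not sent to OpenSearch, because it only affects how the excerpt is rendered.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class HighlighterOptions:
    """Hypothetical container for the processor options (illustrative defaults)."""

    fields: List[str]
    highlighter_type: str = "unified"        # "unified" or "plain"
    boundary_scanner: str = "sentence"       # "sentence" or "word" (unified only)
    boundary_scanner_locale: str = "en-US"   # used with the sentence scanner
    fragmenter: str = "span"                 # "simple" or "span" (plain only)
    tag: str = "em"
    encoder: str = "default"                 # "default" (no encoding) or "html"
    number_of_fragments: int = 3
    fragment_size: int = 60
    no_match_size: int = 0
    order_by_score: bool = False             # False keeps the order of appearance
    require_field_match: bool = True         # assumed mapping for option 12


def to_highlight_clause(opts: HighlighterOptions) -> Dict[str, Any]:
    """Build the highlight clause for an OpenSearch search request body."""
    clause: Dict[str, Any] = {
        "type": opts.highlighter_type,
        "pre_tags": [f"<{opts.tag}>"],
        "post_tags": [f"</{opts.tag}>"],
        "encoder": opts.encoder,
        "number_of_fragments": opts.number_of_fragments,
        "fragment_size": opts.fragment_size,
        "no_match_size": opts.no_match_size,
        "require_field_match": opts.require_field_match,
        "fields": {name: {} for name in opts.fields},
    }
    if opts.order_by_score:
        clause["order"] = "score"
    # Only send the settings that apply to the selected highlighter type,
    # mirroring the conditional options in the form.
    if opts.highlighter_type == "unified":
        clause["boundary_scanner"] = opts.boundary_scanner
        if opts.boundary_scanner == "sentence":
            clause["boundary_scanner_locale"] = opts.boundary_scanner_locale
    elif opts.highlighter_type == "plain":
        clause["fragmenter"] = opts.fragmenter
    return clause
```

Dropping the settings that do not apply to the selected highlighter keeps the request small and matches the way the form hides those options.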

API changes

Only API additions.

Data model changes

Adds a configuration in the plugin.plugin_configuration.search_api_processor.opensearch_highlight config namespace (see MR for details).

Feature request
Status

Active

Version

3.0

Component

Code

Created by

mparker17 (🇨🇦 Canada, UTC-4)
