Problem/Motivation
Search API defines a Highlight SearchApiProcessor plugin, but it postprocesses the results in a simplistic way: it doesn't recognize stopwords, synonyms, or stemming. See also the open Search API issues that mention highlighting:
https://www.drupal.org/project/issues/search_api?text=highlight&status=Open
However, OpenSearch 2.18 has its own highlighter, which understands various locales, matches on synonyms and stems, and handles stopwords correctly.
I'm a maintainer of the Elasticsearch Connector module, and when I tried porting the patch from our own version of this ticket ("Highlighting support (leverage Elasticsearch highlighting)", currently Needs review) to OpenSearch, it worked, so I'm filing this issue here so that both projects can collaborate on the idea!
Proposed resolution
Add a new Search API processor that uses OpenSearch's own highlighter.
More specifically, add a new SearchApiProcessor plugin that:
- preprocesses the search query sent to OpenSearch, adding a highlight clause that properly leverages the Highlighting API
- postprocesses the search results when they come back from OpenSearch to generate an Excerpt from the highlight clause in each search hit
- builds a SearchApiProcessor configuration form that exposes a bunch of options (see below)
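To make the first two steps concrete, here is a rough, language-neutral sketch in Python (the actual processor would be a PHP plugin; this is not code from the MR). The function names add_highlight_clause and build_excerpt, the default tags, and the joiner are hypothetical; only the highlight request clause and the per-hit highlight response key come from OpenSearch itself.

```python
import copy


def add_highlight_clause(query_body, fields, pre_tag="<em>", post_tag="</em>"):
    """Preprocess step: return a copy of the query body with a highlight clause.

    `fields` is the list of index field names to highlight (hypothetical input).
    """
    body = copy.deepcopy(query_body)  # leave the caller's query untouched
    body["highlight"] = {
        "pre_tags": [pre_tag],
        "post_tags": [post_tag],
        # An empty object per field means "use the top-level settings".
        "fields": {name: {} for name in fields},
    }
    return body


def build_excerpt(hit, joiner=" ... "):
    """Postprocess step: join every highlight fragment of one hit into an excerpt."""
    fragments = []
    # Each hit may carry {"highlight": {"field_name": ["fragment", ...]}}.
    for field_fragments in hit.get("highlight", {}).values():
        fragments.extend(field_fragments)
    return joiner.join(fragments)
```

A hit with no highlight key simply yields an empty excerpt, which lets the caller decide whether to fall back to something else.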
Note that, because Search API OpenSearch doesn't yet support adding term_vector fields, the initial version of this patch won't support the Lucene Fast Vector Highlighter type (i.e., fvh) and related highlighting options.
Remaining tasks
- Write a patch for the 3.x branch with tests
- Review and feedback
- RTBC and feedback
- Commit
- Backport to 2.x branch (?)
User interface changes
Adds an "OpenSearch Highlighter" processor to the page at /admin/config/search/search-api/index/YOUR_INDEX/processors, with options to configure:
- the Fields to highlight (i.e., choose from the list of fields in the index)
- the Highlighter type (i.e., Unified or Plain)
- the Boundary scanner to use (if the Unified highlighter is selected; i.e., Sentence or Word)
- the Boundary scanner locale to use (if the Sentence boundary scanner is selected)
- the Fragmenter to use (if the Plain highlighter is selected; i.e., Simple or Span)
- the HTML tag to use to highlight the search term in the excerpt
- the Snippet encoder (i.e., No encoding or HTML)
- the Maximum number of snippets per field
- the Snippet size (in characters)
- the Snippet size when there is no match (in characters)
- the Snippet order (i.e., the order in which they appear, or by relevance)
- whether to show snippets from all fields, or only show snippets from fields that match the query
- the text to use to join snippets together when rendering the excerpt
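For reviewers, here is a rough Python sketch of how these options could map onto OpenSearch highlight parameters. It is an illustration under assumptions, not the MR's code: the function name highlight_settings and the option keys on the input dict are hypothetical, and mapping the "all fields vs. matching fields" option to require_field_match is my assumption. The output keys (type, boundary_scanner, boundary_scanner_locale, fragmenter, pre_tags, post_tags, encoder, number_of_fragments, fragment_size, no_match_size, order, require_field_match) are standard highlight parameters. The joining text is not sent to OpenSearch, since it only matters when rendering the excerpt.

```python
def highlight_settings(options):
    """Translate processor options (hypothetical keys) into a highlight clause."""
    tag = options["tag"]  # e.g. "em" or "strong"
    settings = {
        "type": options["type"],                      # "unified" or "plain"
        "pre_tags": [f"<{tag}>"],
        "post_tags": [f"</{tag}>"],
        "encoder": options["encoder"],                # "default" (none) or "html"
        "number_of_fragments": options["max_snippets"],
        "fragment_size": options["snippet_size"],     # in characters
        "no_match_size": options["no_match_size"],    # in characters
        "order": options["order"],                    # "none" or "score"
        # Assumption: "only fields that match the query" maps to True.
        "require_field_match": options["only_matching_fields"],
        "fields": {name: {} for name in options["fields"]},
    }
    if options["type"] == "unified":
        settings["boundary_scanner"] = options["boundary_scanner"]  # "sentence" or "word"
        if options["boundary_scanner"] == "sentence":
            settings["boundary_scanner_locale"] = options["boundary_scanner_locale"]
    elif options["type"] == "plain":
        settings["fragmenter"] = options["fragmenter"]  # "simple" or "span"
    return settings
```

Options that only apply to one highlighter type are left out of the clause entirely when that type is not selected, which mirrors the conditional form fields described above.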
API changes
Only API additions.
Data model changes
Adds a configuration in the plugin.plugin_configuration.search_api_processor.opensearch_highlight config namespace (see MR for details).