Fulltext search with Relevance sorting yields incorrect search results

Created on 6 July 2024, 9 months ago
Updated 19 September 2024, 7 months ago

Setup

  • Solr version: 9.6.1
  • Drupal Core version: 10.3
  • Search API version: 8.x-1.35
  • Search API Solr version: 4.3.4
  • Configured Solr Connector: Solr Cloud with Basic Auth

Issue

  • Configure Solr on a fresh Drupal site.
  • Add the term 'Chocolate' to Tags (Structure -> Taxonomy -> Tags).
  • Create an article titled "Article with 'chocolate' as tag". Add the 'Chocolate' tag and publish it.
  • Create another article titled "Article with 'chocolate' in description". Do not add the 'Chocolate' tag; instead, add the text 'Chocolate' to the description as an H2 (<h2>Chocolate</h2>).
  • Add the tag name field to the index (property path: field_tags:entity:name). Choose 'Fulltext' as the type instead of 'string'.
  • Add the 'Rendered HTML output' field to the index. Choose 'Fulltext' as the type.
  • Enable the 'HTML Filter' processor for all fields. Keep the default values under 'Tag boosts'.
  • Index all content.
  • Create a view for this index. Add an exposed fulltext filter and, under its searched fields, select only 'Content » Tags » Taxonomy term » Name'.
  • Add an exposed 'Search: Relevance' sort.
  • Open the view and search for 'Chocolate'.

Expectation

  • Only the article with the 'Chocolate' tag should appear in the search results.

Actual result

  • Both articles appear in the search results

💬 Support request
Status

Closed: works as designed

Version

4.3

Component

Code

Created by

🇮🇳 India Akhil Babu Chengannur


Comments & Activities

  • Issue created by @Akhil Babu
  • 🇩🇪 Germany mkalkbrenner 🇩🇪

    I agree that the result isn't what you would expect. However, I don't consider it a bug but an edge case.

    You could implement an event subscriber that adds a condition on the term name based on the search keywords. That adds a filter query ("fq") which eliminates the false positives.
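    The event subscriber itself would be PHP code in a custom Drupal module. As a language-neutral sketch of its effect at the Solr level only, the Python below builds illustrative /select parameters with and without such a filter query, then simulates the filter locally on the two test articles. The field name `tm_tag_name` and both helper functions are hypothetical; the real Solr field name depends on how Search API Solr maps the indexed tag name field.

```python
from urllib.parse import urlencode

# Hypothetical Solr field name for the indexed tag name (field_tags:entity:name).
# Check the actual field name in your index's Solr schema.
TAG_NAME_FIELD = "tm_tag_name"


def build_select_params(keywords: str, restrict_to_tag: bool) -> dict:
    """Build illustrative Solr /select parameters for a fulltext search."""
    params = {"q": keywords, "fl": "id,score"}
    if restrict_to_tag:
        # A filter query only includes or excludes documents; it does not
        # contribute to the score, so relevance sorting is unaffected.
        params["fq"] = f'{TAG_NAME_FIELD}:"{keywords}"'
    return params


def apply_filter(docs: list[dict], keywords: str) -> list[dict]:
    """Simulate the fq locally: keep only docs whose tag names contain the keywords."""
    needle = keywords.lower()
    return [d for d in docs if any(needle in tag.lower() for tag in d["tags"])]


docs = [
    {"id": "article-1", "tags": ["Chocolate"]},  # article tagged 'Chocolate'
    {"id": "article-2", "tags": []},  # 'Chocolate' only in an <h2> in the body
]

print(urlencode(build_select_params("Chocolate", restrict_to_tag=True)))
print([d["id"] for d in apply_filter(docs, "Chocolate")])  # ['article-1']
```

    Because the filter is applied independently of scoring, the article that matches only through its rendered H2 text is dropped, while the remaining results keep their relevance order.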

    Alternatively, index the content a second time in a separate Search API index that contains only the term names. That will automatically lead to a different filter query.

    In general, many boosts, spellcheck dictionaries, suggestions, term queries and statistics are per core or collection in Solr. So if your setup allows it, it is always advisable to create separate cores or collections for your different use cases.
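    As one concrete example of such per-index statistics: assuming the schema uses Solr's default BM25 similarity (Lucene's BM25Similarity), the inverse document frequency of a term t is computed from the document counts of the index being searched. Here N is the number of documents that have a value in the field and n is the number of those containing t. The worked values show how strongly the same single match is weighted depending on what else shares the collection.

```latex
\mathrm{idf}(t) = \ln\!\left(1 + \frac{N - n + 0.5}{n + 0.5}\right)

% 2 documents, term in 1 of them:
N = 2,\; n = 1:\quad \mathrm{idf} = \ln\!\left(1 + \frac{1.5}{1.5}\right) = \ln 2 \approx 0.693

% 1000 documents, term in 1 of them:
N = 1000,\; n = 1:\quad \mathrm{idf} = \ln\!\left(1 + \frac{999.5}{1.5}\right) \approx \ln 667.33 \approx 6.50
```

    Mixing unrelated use cases in one collection therefore changes N and n, and with them the scores of every use case sharing that collection.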

    So I would only use tag-based boosting if you search the rendered content rather than a specific field.

  • Status changed to Closed: works as designed 7 months ago
  • 🇩🇪 Germany mkalkbrenner 🇩🇪

    no further feedback
