Mention that boosting on Content Type Based and Recent Dates only works if sorted by score

Created on 28 May 2024, about 1 month ago
Updated 18 June 2024, 9 days ago

I have a similar issue to https://www.drupal.org/project/search_api_solr/issues/3350879 (💬 Boosting Not Working On Content Type Based, Fixed).

Content Type boosting and Boost more recent dates are not included in the score calculation.

This is my boosting scheme:

I have 4 content types with specific boosts

  • Product group one = 21.00 boost
  • Product group two = 13.00 boost
  • Product group three = 8.00 boost
  • Product group four = 5.00 boost

I have "Boost more recent dates" enabled on the "Authored on" field with a 3.00 boost.

Body has a 1.00 boost and Title has a 2.00 boost.

This is the result I get as REST JSON from the view when I search for the term "eros":

[
{
"title": "Cui Eros Sed",
"created": "Sunday, April 14, 2024 - 00:46",
"type": "Product group one",
"search_api_relevance": "5.255206",
"search_api_solr_score_debugging": "
0.221677 <= weight(tcedgem_X3b_en_body:eros in 0) [SchemaSimilarity], result of:
... 0.221677 <= score(freq=2.0), computed as boost * idf * tf from:
... ... 0.280615 <= idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
... ... ... 35.000000 <= n, number of documents containing term
... ... ... 46.000000 <= N, total number of documents with field
... ... 0.789968 <= tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
... ... ... 2.000000 <= freq, occurrences of term within document
... ... ... 1.200000 <= k1, term saturation parameter
... ... ... 0.750000 <= b, length normalization parameter
... ... ... 536.000000 <= dl, length of field (approximate)
... ... ... 2081.565200 <= avgdl, average length of field

5.033529 <= weight(tcngramm_X3b_en_title:eros in 0) [SchemaSimilarity], result of:
... 5.033529 <= score(freq=1.0), computed as boost * idf * tf from:
... ... 2.000000 <= boost
... ... 3.444683 <= idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
... ... ... 1.000000 <= n, number of documents containing term
... ... ... 46.000000 <= N, total number of documents with field
... ... 0.730623 <= tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
... ... ... 1.000000 <= freq, occurrences of term within document
... ... ... 1.200000 <= k1, term saturation parameter
... ... ... 0.750000 <= b, length normalization parameter
... ... ... 3.000000 <= dl, length of field
... ... ... 39.304348 <= avgdl, average length of field

"
},
{
"title": "Volutpat",
"created": "Sunday, April 14, 2024 - 10:42",
"type": "Product group one",
"search_api_relevance": "0.25831726",
"search_api_solr_score_debugging": " 0.258317 <= weight(tcedgem_X3b_en_body:eros in 35) [SchemaSimilarity], result of:
... 0.258317 <= score(freq=6.0), computed as boost * idf * tf from:
... ... 0.280615 <= idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
... ... ... 35.000000 <= n, number of documents containing term
... ... ... 46.000000 <= N, total number of documents with field
... ... 0.920540 <= tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
... ... ... 6.000000 <= freq, occurrences of term within document
... ... ... 1.200000 <= k1, term saturation parameter
... ... ... 0.750000 <= b, length normalization parameter
... ... ... 504.000000 <= dl, length of field (approximate)
... ... ... 2081.565200 <= avgdl, average length of field

"
},
{
"title": "Plaga",
"created": "Wednesday, April 10, 2024 - 06:59",
"type": "Product group three",
"search_api_relevance": "0.2577701",
"search_api_solr_score_debugging": " 0.257770 <= weight(tcedgem_X3b_en_body:eros in 31) [SchemaSimilarity], result of:
... 0.257770 <= score(freq=6.0), computed as boost * idf * tf from:
... ... 0.280615 <= idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
... ... ... 35.000000 <= n, number of documents containing term
... ... ... 46.000000 <= N, total number of documents with field
... ... 0.918590 <= tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
... ... ... 6.000000 <= freq, occurrences of term within document
... ... ... 1.200000 <= k1, term saturation parameter
... ... ... 0.750000 <= b, length normalization parameter
... ... ... 536.000000 <= dl, length of field (approximate)
... ... ... 2081.565200 <= avgdl, average length of field

"
},
{
"title": "Jugis Roto Singularis Vindico",
"created": "Monday, April 8, 2024 - 11:57",
"type": "Product group three",
"search_api_relevance": "0.2549155",
"search_api_solr_score_debugging": " 0.254916 <= weight(tcedgem_X3b_en_body:eros in 14) [SchemaSimilarity], result of:
... 0.254916 <= score(freq=5.0), computed as boost * idf * tf from:
... ... 0.280615 <= idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
... ... ... 35.000000 <= n, number of documents containing term
... ... ... 46.000000 <= N, total number of documents with field
... ... 0.908417 <= tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
... ... ... 5.000000 <= freq, occurrences of term within document
... ... ... 1.200000 <= k1, term saturation parameter
... ... ... 0.750000 <= b, length normalization parameter
... ... ... 472.000000 <= dl, length of field (approximate)
... ... ... 2081.565200 <= avgdl, average length of field

"
},
{
"title": "Defui",
"created": "Thursday, April 11, 2024 - 21:25",
"type": "Product group two",
"search_api_relevance": "0.2549155",
"search_api_solr_score_debugging": " 0.254916 <= weight(tcedgem_X3b_en_body:eros in 19) [SchemaSimilarity], result of:
... 0.254916 <= score(freq=5.0), computed as boost * idf * tf from:
... ... 0.280615 <= idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
... ... ... 35.000000 <= n, number of documents containing term
... ... ... 46.000000 <= N, total number of documents with field
... ... 0.908417 <= tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
... ... ... 5.000000 <= freq, occurrences of term within document
... ... ... 1.200000 <= k1, term saturation parameter
... ... ... 0.750000 <= b, length normalization parameter
... ... ... 472.000000 <= dl, length of field (approximate)
... ... ... 2081.565200 <= avgdl, average length of field

"
},
{
"title": "Enim Melior Tamen Utinam",
"created": "Thursday, April 11, 2024 - 02:06",
"type": "Product group three",
"search_api_relevance": "0.2507502",
"search_api_solr_score_debugging": " 0.250750 <= weight(tcedgem_X3b_en_body:eros in 10) [SchemaSimilarity], result of:
... 0.250750 <= score(freq=4.0), computed as boost * idf * tf from:
... ... 0.280615 <= idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
... ... ... 35.000000 <= n, number of documents containing term
... ... ... 46.000000 <= N, total number of documents with field
... ... 0.893574 <= tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
... ... ... 4.000000 <= freq, occurrences of term within document
... ... ... 1.200000 <= k1, term saturation parameter
... ... ... 0.750000 <= b, length normalization parameter
... ... ... 408.000000 <= dl, length of field (approximate)
... ... ... 2081.565200 <= avgdl, average length of field

"
},
{
"title": "Genitus Quae",
"created": "Tuesday, April 9, 2024 - 02:08",
"type": "Product group one",
"search_api_relevance": "0.2484465",
"search_api_solr_score_debugging": " 0.248446 <= weight(tcedgem_X3b_en_body:eros in 24) [SchemaSimilarity], result of:
... 0.248446 <= score(freq=4.0), computed as boost * idf * tf from:
... ... 0.280615 <= idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
... ... ... 35.000000 <= n, number of documents containing term
... ... ... 46.000000 <= N, total number of documents with field
... ... 0.885365 <= tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
... ... ... 4.000000 <= freq, occurrences of term within document
... ... ... 1.200000 <= k1, term saturation parameter
... ... ... 0.750000 <= b, length normalization parameter
... ... ... 504.000000 <= dl, length of field (approximate)
... ... ... 2081.565200 <= avgdl, average length of field

"
},
{
"title": "Enim Luptatum Usitas",
"created": "Saturday, April 13, 2024 - 05:20",
"type": "Product group one",
"search_api_relevance": "0.2484465",
"search_api_solr_score_debugging": " 0.248446 <= weight(tcedgem_X3b_en_body:eros in 38) [SchemaSimilarity], result of:
... 0.248446 <= score(freq=4.0), computed as boost * idf * tf from:
... ... 0.280615 <= idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
... ... ... 35.000000 <= n, number of documents containing term
... ... ... 46.000000 <= N, total number of documents with field
... ... 0.885365 <= tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
... ... ... 4.000000 <= freq, occurrences of term within document
... ... ... 1.200000 <= k1, term saturation parameter
... ... ... 0.750000 <= b, length normalization parameter
... ... ... 504.000000 <= dl, length of field (approximate)
... ... ... 2081.565200 <= avgdl, average length of field

"
}
]

A Product group one node can't have a score as low as 5.255206, because its boost factor is 21.

Also, within the Product group one content type, an older node ("Tuesday, April 9, 2024 - 02:08") shouldn't rank ahead of a more recent one ("Saturday, April 13, 2024 - 05:20").
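
As a cross-check, the first result's debug output can be recomputed by hand. The following is a minimal Python sketch, assuming the BM25 formulas exactly as Solr prints them in the debug output above; the function names `idf`, `tf` and `clause_score` are illustrative and not part of any Solr or Drupal API. If the assumption holds, the relevance of the first result is simply the sum of the body and title clauses, with the 2.00 Title boost as the only boost involved, which suggests that neither the content type boost nor the date boost entered the score.

```python
import math


def idf(n: float, N: float) -> float:
    """Inverse document frequency, as printed in the Solr debug output."""
    return math.log(1 + (N - n + 0.5) / (n + 0.5))


def tf(freq: float, dl: float, avgdl: float, k1: float = 1.2, b: float = 0.75) -> float:
    """Term frequency with length normalization, as printed in the debug output."""
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))


def clause_score(boost: float, n: float, N: float, freq: float, dl: float, avgdl: float) -> float:
    """Score of one field clause: boost * idf * tf."""
    return boost * idf(n, N) * tf(freq, dl, avgdl)


# Values copied from the debug output of the first result ("Cui Eros Sed").
body = clause_score(boost=1.0, n=35, N=46, freq=2, dl=536, avgdl=2081.5652)
title = clause_score(boost=2.0, n=1, N=46, freq=1, dl=3, avgdl=39.304348)
total = body + title

print(f"body={body:.6f} title={title:.6f} total={total:.6f}")
```

The three printed values should land close to the 0.221677, 5.033529 and 5.255206 reported for that result.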

I'm using:
Drupal 10.2.2
Solr 8.8
Search API Solr 4.3.3
Search API 8.x-1.34

But the same issue occurs on:
Drupal 9.5.11
Solr 8.8.2
Search API Solr 4.3.1
Search API 8.x-1.30

✨ Feature request
Status

Fixed

Version

4.0

Component

User interface

Created by

🇧🇦 Bosnia and Herzegovina drale01 Sarajevo


Comments & Activities

  • Issue created by @drale01
  • 🇧🇦 Bosnia and Herzegovina drale01 Sarajevo
  • 🇩🇪 Germany mkalkbrenner 🇩🇪

    It is way more readable if you format the scoring:

    0.221677 <= weight(tcedgem_X3b_en_body:eros in 0) [SchemaSimilarity], result of:
    ... 0.221677 <= score(freq=2.0), computed as boost * idf * tf from:
    ... ... 0.280615 <= idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
    ... ... ... 35.000000 <= n, number of documents containing term
    ... ... ... 46.000000 <= N, total number of documents with field
    ... ... 0.789968 <= tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
    ... ... ... 2.000000 <= freq, occurrences of term within document
    ... ... ... 1.200000 <= k1, term saturation parameter
    ... ... ... 0.750000 <= b, length normalization parameter
    ... ... ... 536.000000 <= dl, length of field (approximate)
    ... ... ... 2081.565200 <= avgdl, average length of field
    
     5.033529 <= weight(tcngramm_X3b_en_title:eros in 0) [SchemaSimilarity], result of:
    ... 5.033529 <= score(freq=1.0), computed as boost * idf * tf from:
    ... ... 2.000000 <= boost
    ... ... 3.444683 <= idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
    ... ... ... 1.000000 <= n, number of documents containing term
    ... ... ... 46.000000 <= N, total number of documents with field
    ... ... 0.730623 <= tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
    ... ... ... 1.000000 <= freq, occurrences of term within document
    ... ... ... 1.200000 <= k1, term saturation parameter
    ... ... ... 0.750000 <= b, length normalization parameter
    ... ... ... 3.000000 <= dl, length of field
    ... ... ... 39.304348 <= avgdl, average length of field
    

    I can't see that you queried any date.
    Can you post the entire query sent to Solr?
    Use search_api_solr_devel to get it.

  • 🇧🇦 Bosnia and Herzegovina drale01 Sarajevo
  • 🇧🇦 Bosnia and Herzegovina drale01 Sarajevo

    @mkalkbrenner

    I edited the JSON.

    I think this is the query you need:
    https://www.drupal.org/files/issues/2024-05-28/solr_term_eros.txt

  • 🇧🇦 Bosnia and Herzegovina drale01 Sarajevo

    When I switch from the Solr server to the Database server, type-specific boosting works. So perhaps the bug is somewhere in the Solr backend.

    But "Boost more recent dates" doesn't work at all on either server.

  • 🇩🇪 Germany mkalkbrenner 🇩🇪
    "fl":"ss_search_api_id,ss_search_api_language,score,hash",
    "fq":["+index_id:testing_solr_boost",
            "ss_search_api_language:(\"en\" \"und\")"],
    "q":"(tcedgem_X3b_en_body:(+\"eros\")^1 tcedgem_X3b_und_body:(+\"eros\")^1 tcngramm_X3b_en_title:(+\"eros\")^2 tcngramm_X3b_und_title:(+\"eros\")^2)",

    The created field is neither in the list of fields to be returned nor part of the query or filters.
    I agree that there's a difference in behaviour compared to the database backend, even if the database backend is "cheating".
    Boosts are not blindly added to each query, only if the query has something to do with that field.

    We need to debug this step by step and then decide whether to fix something in the module or not.

    1. Try adding a filter like created > 0 to the view.

    2. Enable "Retrieve result data from Solr" on the server edit page.

  • 🇧🇦 Bosnia and Herzegovina drale01 Sarajevo

    These are the results taken directly from the Solr server edit page:

    https://www.drupal.org/files/issues/2024-05-29/solr_eros_query.txt

    I tried adding the filter created > 0 to the view, but the REST JSON contains the same results as the Solr server edit page:

    https://www.drupal.org/files/issues/2024-05-29/view_rest_eros.txt

    I enabled "Retrieve result data from Solr", but the results are the same with or without it.

    I have also added the "body" field to the view now.

  • 🇩🇪 Germany mkalkbrenner 🇩🇪

    Please post the new devel debug output.

  • 🇧🇦 Bosnia and Herzegovina drale01 Sarajevo

    I tried all versions of search_api_solr with Drupal 10, 9 and 8.8, and I was never able to get content type or recent date boosting to work.

  • 🇧🇦 Bosnia and Herzegovina drale01 Sarajevo

    I have a solution.

    I created a custom module that boosts content types with an event subscriber:

      public function preQuery(PreQueryEvent $event): void {
        $query = $event->getSearchApiQuery();
        $solarium_query = $event->getSolariumQuery();

        // One boost query per content type, keyed by bundle machine name.
        $boost_queries = [];
        $boost_queries['product_group_one'] = [
          'query' => 'ss_type:product_group_one^22.0',
        ];
        $boost_queries['product_group_two'] = [
          'query' => 'ss_type:product_group_two^8.0',
        ];
        $boost_queries['product_group_three'] = [
          'query' => 'ss_type:product_group_three^30.0',
        ];
        $boost_queries['product_group_four'] = [
          'query' => 'ss_type:product_group_four^4.0',
        ];

        // Register the boost queries on the EDisMax component.
        /** @var \Solarium\Component\EdisMax $edismax */
        $edismax = $solarium_query->getEDisMax();
        $edismax->addBoostQueries($boost_queries);

        // Force the edismax query parser for the entire query.
        $solarium_query->addParam('defType', 'edismax');
      }

    I also used the Search API Solr Boost by Date module for boosting recent dates.

    Both work for me.
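
For readers unfamiliar with what such a subscriber changes on the wire, here is a rough Python sketch of the extra request parameters it is expected to produce. It assumes that Solarium serializes each boost query into Solr's multi-valued `bq` parameter; the helper name `build_boost_params` is hypothetical, and the boost values are simply copied from the subscriber above.

```python
from urllib.parse import parse_qs, urlencode

# Boost values copied from the event subscriber in this thread.
TYPE_BOOSTS = {
    "product_group_one": 22.0,
    "product_group_two": 8.0,
    "product_group_three": 30.0,
    "product_group_four": 4.0,
}


def build_boost_params(type_boosts: dict, field: str = "ss_type") -> dict:
    """Build the extra Solr parameters: one bq clause per content type."""
    bq = [f"{field}:{bundle}^{boost}" for bundle, boost in type_boosts.items()]
    # defType=edismax switches the query parser for the entire query.
    return {"defType": "edismax", "bq": bq}


params = build_boost_params(TYPE_BOOSTS)
# doseq=True emits one bq=... pair per list element instead of one joined value.
query_string = urlencode(params, doseq=True)
decoded = parse_qs(query_string)

print(query_string)
```

Comparing a string like this with the request captured by search_api_solr_devel is one way to confirm whether the boost queries actually reach Solr.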

  • 🇩🇪 Germany mkalkbrenner 🇩🇪

    $solarium_query->addParam('defType', 'edismax');

    Good luck with that one. If it works for you, you could use it. But it has many side effects.

  • 🇩🇪 Germany mkalkbrenner 🇩🇪

    BTW, I'm considering removing support for this manually set query parser, as it causes a lot of issues. Usually edismax has to be set per field in case of multiple values, not for the entire query.

    I think I will start by writing warnings to the log.

  • 🇩🇪 Germany mkalkbrenner 🇩🇪

    I verified the code again. The processor adds a boost factor, but such boost factors only get applied to the query if the first sort criterion is the score. In Views it is named Search API Relevance.

    It works perfectly well.
    I checked the debug output in https://www.drupal.org/project/search_api_solr/issues/3450331#comment-15... (💬 Boosting Not Working On Content Type Based and Recent Dates, Active) again. It contains no sort parameter, and that must be your issue.
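
To check a captured request for this condition, something like the following Python sketch could be used. The helper name `first_sort_is_score` and the field name `ds_created` are hypothetical; the sketch only assumes Solr's usual comma-separated `sort` syntax ("field direction, field direction") and deliberately treats a missing `sort` parameter as "not sorted by score", mirroring the diagnosis in this thread rather than Solr's own default ordering.

```python
def first_sort_is_score(params: dict) -> bool:
    """Return True if the first sort criterion of a Solr request is the score."""
    sort = params.get("sort", "")
    if isinstance(sort, list):
        # Some debug dumps represent the parameter as a list of clauses.
        sort = ",".join(sort)
    first_clause = sort.split(",")[0].strip()
    if not first_clause:
        return False
    # A clause looks like "score desc"; compare only the field part.
    return first_clause.split()[0] == "score"


# Parameters as posted earlier in this thread: there is no "sort" key at all.
debug_params = {
    "fl": "ss_search_api_id,ss_search_api_language,score,hash",
    "fq": ["+index_id:testing_solr_boost", 'ss_search_api_language:("en" "und")'],
}

print(first_sort_is_score(debug_params))
print(first_sort_is_score({"sort": "score desc,ds_created desc"}))
print(first_sort_is_score({"sort": "ds_created desc,score desc"}))
```

Only the second call has the score as its first sort criterion, which is the situation in which the boost factors are applied according to the comment above.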

  • 🇧🇦 Bosnia and Herzegovina drale01 Sarajevo

    You were right, mkalkbrenner.

    When I sort by Search API Relevance, I get completely new scores.

    Boosting works with content type and recent dates.

    Thanks.

  • Assigned to mkalkbrenner
  • Status changed to Needs work 10 days ago
  • 🇩🇪 Germany mkalkbrenner 🇩🇪

    Thanks for your feedback. We should add a note to the settings form about that.

  • Status changed to Fixed 9 days ago
  • 🇩🇪 Germany mkalkbrenner 🇩🇪