How to boost more recent content with database backend?

Created on 14 March 2023, about 2 years ago

Problem/Motivation

I need to boost content by its created and last changed date. Created is more important than last changed.
Sadly, I could neither find an option for this, nor documentation.

I looked into

I found

Processors: Boost more recent dates: A processor to boost more recent dates.

at https://www.drupal.org/docs/8/modules/search-api-solr/search-api-solr-ho... but it seems to be only existing for search_api_solr? For search_api_db I can't find something like that.

Furthermore I added the created and changed date as indexed field and though perhaps I could use Number field-based boosting for these fields, as they are timestamps. But these fields do not appear (otherwise that might be a nice solution). Just the uid and the "created by" field appears as boostable.

I also found https://drupal.stackexchange.com/questions/107023/how-to-boost-relevance... but think it shouldn't need custom implementations, as more complicated boostings are already possible in the UI.

So what am I missing here?
If this isn't available yet, could this please be changed into a Feature request?

Steps to reproduce

Proposed resolution

Remaining tasks

💬 Support request
Status

Active

Version

1.0

Component

Database backend

Created by

🇩🇪Germany Anybody Porta Westfalica

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Comments & Activities

  • Issue created by @Anybody
  • 🇦🇹Austria drunken monkey Vienna, Austria

    No, this isn’t yet possible, but yes, we can change this into a feature request for it.
    Unfortunately, I don’t have time to work on this myself at the moment (or in the near future), but I’d definitely help reviewing if someone else wants to give it a try.

  • 🇩🇪Germany Anybody Porta Westfalica

    Thank you very much for your reply @drunken monkey!

    Could you give some feedback, if you think it would (not) make sense to simply expose the date fields to the existing number based boosting? At least if the values are timestamps (like created, changed, ...) that sounds relatively simple.

    Some ideas what you think could make sense would be great (if you can say something about it without investing much time, of course). Just to ensure you like the way it goes...

  • 🇨🇦Canada ryanrobinson_wlu

    +1 for this feature request

  • Status changed to Needs review about 2 years ago
  • 🇦🇹Austria drunken monkey Vienna, Austria

    You’re right, if that’s already a “good enough” solution for you, just treating dates as numbers would be pretty easy. It’s implemented in the attached patch – please test/review.
    However, please note that this is a very rough mechanism, which will probably not work as expected for you: since dates are represented as the number of seconds since 1970, scores for dates in the same year will be very close to each other, so this will only have very limited effect. The proper way to boost recent content would be to add a dynamic boost based on not only the indexed date field value, but also the current date. This would, unfortunately, require support from the database backend.

  • 🇩🇪Germany Anybody Porta Westfalica

    Thanks @drunken monkey for your feedback AND work on this incl. tests. Whao! :)

    I'll review this and mark it RTBC if it works as expected (I think it will - mitigated by the limitations you mentioned).

    Shall I create a follow-up feature request for a full-featured integration?

  • 🇦🇹Austria drunken monkey Vienna, Austria

    @Anybody: Any news on this? Would be great to get it committed.

    Regarding the follow-up: I think we might as well leave this one open, right? But I’m fine either way. Very much depends on someone wanting to work on it, anyways – the issue will likely not be the main problem.

  • 🇩🇪Germany Anybody Porta Westfalica

    @drunken monkey sorry for the missing feedback! Too many projects in parallel, I sadly didn't have the time to try this yet. I'll reply as soon as possible. Sorry.

    But I still like the plan and still think this makes a lot of sense.

  • Status changed to RTBC almost 2 years ago
  • 🇩🇪Germany Anybody Porta Westfalica

    Sorry for the delay @drunken monkey works fine for me so far, so RTBC! :) Thank you for the super helpful solution!

  • Status changed to Active almost 2 years ago
  • 🇦🇹Austria drunken monkey Vienna, Austria

    Good to hear, thanks for reporting back!
    Merged.

    As mentioned above, leaving this open in case someone wants to work on a proper solution.

  • 🇧🇪Belgium herved

    Hi everyone, I'm trying to rank newest documents higher according to a date field.
    This patch is interesting but after trying it on that date field it doesn't seem to behave as expected.

    This would be pretty easy with Solr but with this method, I wasn't able to apply a relevance boosting independently from keywords boosting. If the date field is present on all documents, and we search for some keywords, the boosting on the date doesn't seem to have any effect whatsoever since it seems to get merged with the keywords scoring/boosting.
    Am I missing something?

  • 🇩🇪Germany Anybody Porta Westfalica

    Okay I think this is still a highly relevant feature for non-solr Search API configuration

    It's absolutely typical to boost more recent content, but the workaround here isn't a proper fix. With the available number based boosting options you're able to boost the timestamp, but the resulting boost value is something like:

    1708939471 * 1.0 = Boost: 1708939471
    1708939471 * 1.1 = Boost: 18798334181
    and even
    1708939471 * 0.1 = Boost: 170893947.1

    So I think we need a solution similar to what search_api_solr provides: https://www.drupal.org/docs/8/modules/search-api-solr/search-api-solr-ho... for the "created" and "updated" fields. Any ideas? I'm sadly not very experienced in this topic :/

    Hope it's okay to set this feature request to major, as it's a really typical and relevant use-case in my eyes.

  • 🇩🇪Germany Anybody Porta Westfalica
  • 🇧🇪Belgium herved

    #13: this is what I attempted to explain in #12.
    To me the patch from #5 now included is not really what most people would need because it is applied as a multiplier of the overall keyword score.
    Therefore it is not independent from keywords, unlike in solr which gets added to the overall score, see https://git.drupalcode.org/project/search_api_solr/-/blob/4.x/src/Plugin...
    Also, it has no effect if no keywords are given.

    And if we don't search for any keywords, this boosting has no effect at all and will be skipped in \Drupal\search_api_db\Plugin\search_api\backend\Database::createDbQuery

  • 🇬🇧United Kingdom scott_euser

    Yes it looks like we need a separate boost plugin that works at the query stage rather than the process stage so it can be relative to the current date. Then whatever the current score is, add in the date factor to cause decay to the score perhaps:

    So for simplicity if current is
    SELECT item_id, score
    FROM...

    Then we could update this to
    SELECT item_id, (score * 1 / (1 + TIMESTAMPDIFF(MONTH, FROM_UNIXTIME(value), CURDATE()) / 12)) AS score
    FROM...

    This would result in approximately:
    Current year: Score x 1 - no change
    1 year ago: Score x 0.5
    2 years ago: Score x 0.333
    3 years ago: Score x 0.250
    4 years ago: Score x 0.200
    10 years ago: Score x 0.100
    15 years ago: Score x 0.067

    Its not quite like the SOLR one but its a somewhat close approximation I think.

    The 12 is the 'half-life' - ie, how long does the score take to divide in half. So the site builder could configure that, eg to make it divide in half in 60 months (5 years), or in 6 months, or whatever they want.

    Any thoughts on that approach?

  • 🇦🇹Austria drunken monkey Vienna, Austria

    Yes, something like this seems like it would be the way to go. Instead of a processor, it could also be a Views query option (in the SearchApiQuery class). In any case, though, this will need to be declared as a feature so backends can declare whether they support it or not.

    For backends supporting the feature, the processor (or the query plugin) could add an option to the search query containing both the used date field and the “half-life” value. The backend plugin would then need to check that option at search time and modify the result scores accordingly. (One additional hiccup being that, for the DB backend, we’d actually want to support all three DBMSs which I’d guess would all need a slightly different syntax for this.)

  • 🇧🇪Belgium herved

    Ideally it should also support future dates, resulting in a spike curve where the peak is NOW and decays in past or future.
    In mysql at least, it looks like we can achieve the same formula as the one used in solr:
    - solr: product(boost,recip(abs(ms(resolution,field_name)),m,a,b))
    - mysql: boost * ( 1 / (m * ABS(TIMESTAMPDIFF(SECOND, FROM_UNIXTIME(field_name), NOW()) * 1000) + a) + b)

    However I don't really know what would be the best to combine both numbers (keyword score that can be very high 1000-infinite), and this date boost. Maybe some kind normalization would be needed?
    Something like: (score/(min_score+score)) where min_score = 1000
    Then use a = 1, b= 0.1 producing a 0-1 date boost
    and add both results?

    I noticed that when keywords are given, we may need to join the main index table as search_api_db produces a group_by query which also makes things quite complicated.

Production build 0.71.5 2024