- Issue created by @Anybody
- 🇦🇹Austria drunken monkey Vienna, Austria
No, this isn’t yet possible, but yes, we can change this into a feature request for it.
Unfortunately, I don’t have time to work on this myself at the moment (or in the near future), but I’d definitely help reviewing if someone else wants to give it a try. - 🇩🇪Germany Anybody Porta Westfalica
Thank you very much for your reply @drunken monkey!
Could you give some feedback, if you think it would (not) make sense to simply expose the date fields to the existing number based boosting? At least if the values are timestamps (like created, changed, ...) that sounds relatively simple.
Some ideas what you think could make sense would be great (if you can say something about it without investing much time, of course). Just to ensure you like the way it goes...
- Status changed to Needs review
about 2 years ago 8:07pm 25 March 2023 - 🇦🇹Austria drunken monkey Vienna, Austria
You’re right, if that’s already a “good enough” solution for you, just treating dates as numbers would be pretty easy. It’s implemented in the attached patch – please test/review.
However, please note that this is a very rough mechanism, which will probably not work as expected for you: since dates are represented as the number of seconds since 1970, scores for dates in the same year will be very close to each other, so this will only have very limited effect. The proper way to boost recent content would be to add a dynamic boost based on not only the indexed date field value, but also the current date. This would, unfortunately, require support from the database backend. - 🇩🇪Germany Anybody Porta Westfalica
Thanks @drunken monkey for your feedback AND work on this incl. tests. Whao! :)
I'll review this and mark it RTBC if it works as expected (I think it will - mitigated by the limitations you mentioned).
Shall I create a follow-up feature request for a full-featured integration?
- 🇦🇹Austria drunken monkey Vienna, Austria
@Anybody: Any news on this? Would be great to get it committed.
Regarding the follow-up: I think we might as well leave this one open, right? But I’m fine either way. Very much depends on someone wanting to work on it, anyways – the issue will likely not be the main problem.
- 🇩🇪Germany Anybody Porta Westfalica
@drunken monkey sorry for the missing feedback! Too many projects in parallel, I sadly didn't have the time to try this yet. I'll reply as soon as possible. Sorry.
But I still like the plan and still think this makes a lot of sense.
- Status changed to RTBC
almost 2 years ago 10:27am 25 May 2023 - 🇩🇪Germany Anybody Porta Westfalica
Sorry for the delay @drunken monkey works fine for me so far, so RTBC! :) Thank you for the super helpful solution!
-
drunken monkey →
committed eb1431ae on 8.x-1.x
Issue #3347901 by drunken monkey: Added possibility for dates to be used...
-
drunken monkey →
committed eb1431ae on 8.x-1.x
- Status changed to Active
almost 2 years ago 3:42pm 27 May 2023 - 🇦🇹Austria drunken monkey Vienna, Austria
Good to hear, thanks for reporting back!
Merged.As mentioned above, leaving this open in case someone wants to work on a proper solution.
- 🇧🇪Belgium herved
Hi everyone, I'm trying to rank newest documents higher according to a date field.
This patch is interesting but after trying it on that date field it doesn't seem to behave as expected.This would be pretty easy with Solr but with this method, I wasn't able to apply a relevance boosting independently from keywords boosting. If the date field is present on all documents, and we search for some keywords, the boosting on the date doesn't seem to have any effect whatsoever since it seems to get merged with the keywords scoring/boosting.
Am I missing something? - 🇩🇪Germany Anybody Porta Westfalica
Okay I think this is still a highly relevant feature for non-solr Search API configuration
It's absolutely typical to boost more recent content, but the workaround here isn't a proper fix. With the available number based boosting options you're able to boost the timestamp, but the resulting boost value is something like:
1708939471 * 1.0 = Boost: 1708939471
1708939471 * 1.1 = Boost: 18798334181
and even
1708939471 * 0.1 = Boost: 170893947.1So I think we need a solution similar to what search_api_solr provides: https://www.drupal.org/docs/8/modules/search-api-solr/search-api-solr-ho... → for the "created" and "updated" fields. Any ideas? I'm sadly not very experienced in this topic :/
Hope it's okay to set this feature request to major, as it's a really typical and relevant use-case in my eyes.
- 🇧🇪Belgium herved
#13: this is what I attempted to explain in #12.
To me the patch from #5 now included is not really what most people would need because it is applied as a multiplier of the overall keyword score.
Therefore it is not independent from keywords, unlike in solr which gets added to the overall score, see https://git.drupalcode.org/project/search_api_solr/-/blob/4.x/src/Plugin...
Also, it has no effect if no keywords are given.And if we don't search for any keywords, this boosting has no effect at all and will be skipped in \Drupal\search_api_db\Plugin\search_api\backend\Database::createDbQuery
- 🇬🇧United Kingdom scott_euser
Yes it looks like we need a separate boost plugin that works at the query stage rather than the process stage so it can be relative to the current date. Then whatever the current score is, add in the date factor to cause decay to the score perhaps:
So for simplicity if current is
SELECT item_id, score
FROM...Then we could update this to
SELECT item_id, (score * 1 / (1 + TIMESTAMPDIFF(MONTH, FROM_UNIXTIME(value), CURDATE()) / 12)) AS score
FROM...This would result in approximately:
Current year: Score x 1 - no change
1 year ago: Score x 0.5
2 years ago: Score x 0.333
3 years ago: Score x 0.250
4 years ago: Score x 0.200
10 years ago: Score x 0.100
15 years ago: Score x 0.067Its not quite like the SOLR one but its a somewhat close approximation I think.
The 12 is the 'half-life' - ie, how long does the score take to divide in half. So the site builder could configure that, eg to make it divide in half in 60 months (5 years), or in 6 months, or whatever they want.
Any thoughts on that approach?
- 🇦🇹Austria drunken monkey Vienna, Austria
Yes, something like this seems like it would be the way to go. Instead of a processor, it could also be a Views query option (in the
SearchApiQuery
class). In any case, though, this will need to be declared as a feature → so backends can declare whether they support it or not.For backends supporting the feature, the processor (or the query plugin) could add an option to the search query containing both the used date field and the “half-life” value. The backend plugin would then need to check that option at search time and modify the result scores accordingly. (One additional hiccup being that, for the DB backend, we’d actually want to support all three DBMSs which I’d guess would all need a slightly different syntax for this.)
- 🇧🇪Belgium herved
Ideally it should also support future dates, resulting in a spike curve where the peak is NOW and decays in past or future.
In mysql at least, it looks like we can achieve the same formula as the one used in solr:
- solr:product(boost,recip(abs(ms(resolution,field_name)),m,a,b))
- mysql:boost * ( 1 / (m * ABS(TIMESTAMPDIFF(SECOND, FROM_UNIXTIME(field_name), NOW()) * 1000) + a) + b)
However I don't really know what would be the best to combine both numbers (keyword score that can be very high 1000-infinite), and this date boost. Maybe some kind normalization would be needed?
Something like: (score/(min_score+score)) where min_score = 1000
Then use a = 1, b= 0.1 producing a 0-1 date boost
and add both results?I noticed that when keywords are given, we may need to join the main index table as search_api_db produces a group_by query which also makes things quite complicated.