- Issue created by @smokris
- Issue was unassigned.
- Status changed to Needs review
over 1 year ago 6:58pm 4 March 2023 The last submitted patch, 2: 3345999-bigram.patch, failed testing. View results →
- 🇺🇸United States smokris Athens, Ohio, USA
Revised patch to address failure in
testAutocompletion
. - 🇦🇹Austria drunken monkey Vienna, Austria
Thanks a lot, this all sounds reasonable to me.
Had to think a bit about thecleanNumericString()
special case, but yes: When a numeric string becomes empty after stripping minuses and zeroes, then I guess0
really is the only possibly correct token, it should never be empty.
I just added a reference to thecleanNumericString()
method into its test method, as I try to always do that when accessing something via reflection. That way, it’s still possible to find all usages using an IDE.Now, I think only this is missing:
Yes, optional makes sense to me. Should it be enabled or disabled by default?
Hm, since it will bloat the database quite a bit, I guess we’d better keep it disabled by default? Or just disabled for existing indexes, but enabled for new ones, since in the latter case the users have a higher chance of seeing the option?
What do you think? Will users appreciate it more if phrase search “just works” (as most people would expect), or if we use their database storage sparingly? (Now that I write it like that, it more sounds like the former to me … Please just pick the option that makes most sense to you, and we’ll hope for the best.) - 🇺🇸United States tedfordgif
Although Solr is the primary backend we use, I find myself considering the db backend for specific use cases when Solr is not available, sometimes on large sites. I like the option of not touching existing indexes, and enabling it by default for new indexes with a sufficiently noticeable warning. Thanks for your work on this!
- Status changed to Needs work
over 1 year ago 9:38pm 26 March 2023 - 🇦🇹Austria drunken monkey Vienna, Austria
NW for the option.
Are you going to work on that, smokris? Otherwise, I can also put it on my todo list, but no guarantees on when I’ll get to it. - Status changed to Needs review
over 1 year ago 6:20pm 29 March 2023 - 🇺🇸United States smokris Athens, Ohio, USA
Will users appreciate it more if phrase search “just works” (as most people would expect), or if we use their database storage sparingly?
For new installations, enabling it by default sounds great.
For existing installations, if this change will be included in an upcoming 1.x minor-version release, I think it might be disruptive/surprising if it were automatically enabled and suddenly the site's index size grew significantly. (Though the upcoming 2.0 major-version release ( #3126115: Create a 2.0.0 release → ) might be a suitable time to automatically enable it for existing installations too?)
I've attached an updated patch that includes a first draft of the UI for toggling support for phrase indexing/searching, and a screenshot of it with the new option highlighted in pink. It's enabled by default for new installations, and for existing installations I added a
hook_update_N
implementation that disables it. - last update
over 1 year ago 535 pass - 🇦🇹Austria drunken monkey Vienna, Austria
Thanks a lot for your continued work, looks great!
+++ b/modules/search_api_db/src/Plugin/search_api/backend/Database.php @@ -580,6 +592,11 @@ + 'info' => ucfirst($this->configuration['phrase']),
This would fail to be translated on non-English websites. However, in this specific case, it should be fine to just wrap the expression in
$this->t()
(since we those values are already passed as literals in another place).Otherwise, this patch is already very good, there were only a few small details I changed. Please see the attached revision.
Most notably, I changed the description for the “Phrase indexing” option to be split into one description for the setting itself, and separate descriptions for the individual options. It now reads as follows:
Phrase indexing
Choose how phrase queries (queries with double quotes) should be handled. Better matching comes at the price of slower indexing and larger database size.- Disabled: Double-quoted phrases in search queries will be expanded into individual words, so searches will include results that contain those keywords but not necessarily consecutively and not necessarily in the same order. (This is the standard behavior of version 8.x-1.29 and earlier.)
- Bigram: Pairs of consecutive words will be included in the index, so double-quoted searches will only include results that contain those keywords consecutively and in the same order. Warning: this will increase the database index size by about 5x, and will increase indexing time by about 2x.
I’m not convinced that this is the best we can do, but it’s an improvement. Unfortunately, I’m rather bad at writing such descriptions in a way that make them understandable to laypeople.
This would be an alternative (just for the two options) – feedback appreciated on which you think would be better:<ul> <li><strong>Disabled:</strong> This will ignore quoted phrase in search keywords and just look for the individual words separately. For instance, <em>"blue house"</em> would match any item that contains the words “blue” and “house” somewhere in its indexed text, no matter where and in what order.</li> <li><strong>Bigram:</strong> This treats quoted phrases in keywords like it is generally expected, matching only items that contain those words in the exact same order, consecutively. For instance, <em>"blue house"</em> would match only items that contain the word “blue” immediately followed by the word “house”. Please make sure that the associated increase in database size (about 5x) and indexing time (about 2x) is acceptable for your site.</li> </ul>
Any combination of the two, or completely new suggestions, are of course also very welcome!
- 🇩🇰Denmark ressa Copenhagen
This would be an awesome feature in Search API. And I agree, it's something the users expect, so thanks a lot for working on it!
I just tried the latest patch, and it works perfectly.
+1 for enabling it by default in new installations.
About the wording, it is difficult to formulate these concepts, but I think you succeeded with the #2 version, which makes sense to me. Here is a new suggestion, with some slight tweaks:
Disabled: Ignore quotes in searches and just look for the individual words separately. For instance, "blue house" matches any item that contains the words "blue" and "house" somewhere in its indexed text, no matter where and in what order.
Bigram: Treat a quoted phrase in a search like it is generally expected [...] (the rest is great!)
- 🇺🇸United States aharown07
I have applied the patch to a dev site with good success in results so far, with Bigram enabled.
The test site has just over 23,000 items indexed.Seems to work really well with facets and everything as expected.
To echo ressa, thanks for your work on this. I hope it makes it into a future release. Also makes sense to me to enable by default in installs.
- last update
over 1 year ago 535 pass - 🇦🇹Austria drunken monkey Vienna, Austria
Thanks a lot for both your feedback!
Sounds very good so far. I’m just gonna leave it open a few more days to maybe get another review, but then I thinwe can merge this.Attached is a patch with the adapted descriptions suggested in #15, thanks a lot for those as well!
-
drunken monkey →
committed a834e7b3 on 8.x-1.x authored by
smokris →
Issue #3345999 by smokris, drunken monkey, ressa: Added support for...
-
drunken monkey →
committed a834e7b3 on 8.x-1.x authored by
smokris →
- Status changed to Fixed
over 1 year ago 5:25pm 27 May 2023 - 🇦🇹Austria drunken monkey Vienna, Austria
OK, merged.
Thanks again, everyone – especially, thanks a lot, @smokris! - 🇩🇰Denmark ressa Copenhagen
So cool, this is a great feature to have. Thanks @drunken monkey and @smokris!
Automatically closed - issue fixed for 2 weeks with no activity.
- Status changed to Fixed
about 1 year ago 2:11pm 21 September 2023 - 🇺🇸United States ja09
Excellent feature, thanks everyone. One possible addition that could be very useful, in addition to Bigram, would be "Required". This would default to phrase searching but without the user needing to enter the quotes. This would be useful for specific/technical searches, names of places/locations, names of people, leading zeros, etc.
Not sure about the performance ramifications of doing this and if another index would be needed. Seems like it could be done within the Bigram index.
- 🇨🇭Switzerland berdir Switzerland
The release notes are a bit confusing, they say the new feature is disabled by default, but that's not correct, for new servers, it's enabled by default, only existing servers/sites have it disabled through the update function.
That matters with default config in install profiles for example, if you don't explicitly provide the setting, it will default to bigram.