- 🇦🇹Austria drunken monkey Vienna, Austria
Thanks for suggesting this improvement, and sorry it took me a while to get back to you!
Unfortunately, the intricacies of Views’ caching mechanism is still quite a mystery to me, and caching search results (especially in combination with Solr) is very challenging to begin with, so I fear I can’t be of much help here, but have to count on as many reviewers/testers/feedbackers as possible to help make an informed decision.
However, it seems there has been a commit to the same code in the meantime in ✨ Views Integration - Consider compatibility with internal page cache Fixed . Does the problem still occur for you with the latest module version? If so, please re-base your MR and then post it as a patch – testing on issue forks unfortunately doesn’t work in this project, see #3190024: Problem with test dependencies when testing issue forks → .
- 🇫🇷France prudloff Lille
I can still reproduce with search_api 1.29: the view cache does not have the node cache tags.
Here is a reroll based on 1.29. - Status changed to Needs review
over 1 year ago 8:55am 15 May 2023 - last update
over 1 year ago 520 pass, 9 fail The last submitted patch, 6: search_api-3327645-6.diff, failed testing. View results →
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.- last update
12 months ago 546 pass - last update
12 months ago 544 pass, 1 fail - 🇭🇺Hungary huzooka Hungary 🇭🇺🇪🇺
My team also ran into an issue very similar to this; but our case was a bit more specific. Our indexes are configured to re-index items immediately, so we use post request indexing.
We have an SAPI index view using VBO. We had to build a Views Bulk Operation (VBO) action which changes SAPI-indexed entities. Our (content manager) users are using this action on a SAPI index-based Views display. So, these users visit the views page (which is a form), pick some items from the views list, choose the action and submit the VBO form.
What we've seen is that the Views display wasn't ever updated with the right data. Every other SAPI Views list using the same index seemed to be updated accordingly, but this admin view was stuck. The only solution which worked was to force a render cache invalidation.
I've spent days trying to catch the moment when things go wrong. Now, I'm quite confident to say that the bug we ran into is indeed about wrong cache metadata.
Steps to reproduce:
- SAPI index based view + VBO
- Index being configured to index items immediately.
- User performs a VBO action which changes indexed entities.
- After the form is submitted, Drupal displays the same view, changes are not visible on the current VBO Views page
- ...While at every other SAPI index view (using the same index), changes are visible;
- After dropping the render cache, the VBO Views page displays the expected (changed) data.
This is what happened in the PHP threads in our case, after submitting the form, without any patches applied:
If we refresh the SAPI VBO views page after these above:
- Request arrives
- HtmlRenderer starts rendering.
- SearchApiQuery is executed, because the index cache tag was invalidated. But: no views rows are re-rendered: render cache thinks they're all up-to-date!
- Rendered response is sent back to the client.
In my opinion:
- If A.6 and A.7 happens before B.3 (so: indexing + SAPI cache tag invalidation happens before SAPI views row re-rendering), then there will be no cache "corruption"; everything will be up-to-date.
- Otherwise, our users see outdated views results; which are cached "infinite".
Right now, users see the same (outdated) views rows on the VBO views form page until the render caches are rebuilt.
I agree that the problem is about corrupted / wrong cache metadata; but in my honest opinion, we should solve the problem the other way around: every SAPI index-based Views row must also include the cache tag of the index the displayed data belongs to – because that is the source of the data we display in SAPI index based views!
SearchApiQuery seems to be fine: in our case, we put our entities into different indexes, depending on the state of the entity (we have separate indexes for "published" and "unpublished" entities). If the action moves an item into the other index, the corresponding row gets removed from the view real-time.
What we did:We simply added the corresponding
search_api_list:<index-id>
cache tag to the cache tags of the SAPI views row. After doing this, things started to work logically (for me): after performing the VBO action, the first response (still) returned outdated data; but after a page reload, I've seen the right results.Of course, in our case, we were able* to (and we still have to) work around things since we want to display the right results to our users even after the form submission; But I'm sure that the solution to this problem (or at least one part of the solution) is to add the index cache tag to the Views rows too.
With the additional cache tag, the "backtrace" after form submission does not change:
But after reloading the VBO page, we see a bit more things to happen: the corresponding Views row gets re-rendered too!
- Request arrives
- HtmlRenderer starts
- SearchApiQuery::execute starts
- SearchApiQuery::execute finished
- Changed SAPI views field gets re-rendered <- this is what did not happen before!
- HtmlRenderer finishes
- Response sent back to client
What do you think about this fix?
Attaching an old-school test-only, fix-only and complete patch, hoping that they will be tested by CI.
The last submitted patch, 9: sapi-views_cache_metadata-3327645-9--test-only.patch, failed testing. View results →
- 🇦🇹Austria drunken monkey Vienna, Austria
Thanks a lot for your detailed analysis, @huzooka!
What I still don’t understand, though, is how/why the indexed data would affect how individual result rows are rendered?
Are you using Solr and have you configured it to retrieve the results field data from the server – is that the reason? Then I still don’t understand why this fails in the tests, to be honest. Seems I still don’t understand this problem fully, but mostly your description makes a lot of sense and really helps, so thanks again.If this is about using the fields returned from the server, then maybe we should just add the index’s list cache tag when adding such fields to the result row, not just do that always. With your solution, it would seem like this might unnecessarily impact performance for lots of other users.
(It would, of course, be even better to have specific cache tags for individual search items getting indexed, to only place those on the result rows, but that might be a bit overkill for this issue. Would be even more accurate, though, I think.)In any case, one big extra-thanks for providing a regression test, that’s really awesome!
- Status changed to Needs work
12 months ago 6:01pm 8 January 2024 - 🇭🇺Hungary huzooka Hungary 🇭🇺🇪🇺
@drunken monkey, Yeah, you're totally right! I think I got so lost in the details that I lost focus too!
Yes, we had this issue with the (SAPI) fields data fetched from the SAPI index. And your conclusion in regards of adding the cache tag to search API field plugins instead of adding it as the row's cache tag makes more sense.
Moving to NW because of #11.
- Status changed to Needs review
11 months ago 12:31pm 14 January 2024 - last update
11 months ago 544 pass, 1 fail - 🇦🇹Austria drunken monkey Vienna, Austria
Then maybe something like this? (Or is there a better way for actually adding the cache tag? Seems there is no way to do it directly in
SearchApiQuery::addResults()
.) The last submitted patch, 13: 3327645-13--missing_index_list_cache_tag.patch, failed testing. View results →
- 🇫🇷France prudloff Lille
I feel like #9 is reporting a different issue than ours.
We don't return fields from Solr and our problem is not about adding cache tags to those fields, but about adding returned node cache tags to the view. - Status changed to Needs review
11 months ago 11:14am 2 February 2024 - last update
11 months ago 540 pass, 3 fail - 🇦🇹Austria drunken monkey Vienna, Austria
Then maybe a combination of the two?
The last submitted patch, 16: 3327645-16--missing_index_list_cache_tag.patch, failed testing. View results →
- Status changed to Needs review
11 months ago 4:59pm 6 February 2024 - last update
11 months ago 546 pass - 🇦🇹Austria drunken monkey Vienna, Austria
The test fails mostly look like they’d be expected now, so I’m adapting the tests.
More importantly, though, it seems like we are now accidentally half-way to fixing ✨ Add a third Views cache plugin (tag- and time-based) Needs work : we were adding the individual nodes’ cache tags, but still not the search index list cache tag.
As discussed in the other issue, it seems I have always misunderstood the purpose of the time-based cache plugin: it doesn’t invalidate the cache less often than the tag-based one (only based on the time, not on tags) but more often (based on time and tags). Therefore, our current implementation seems incorrect.
I have therefore now corrected our time-based cache plugin to also include the search index list cache tag.However, since this does change behavior that no-one has complained about in over six years (I think?), I’m not entirely sure we should really change it. Instead, as suggested in the other issue, we could also provide a new cache plugin with this functionality. The key question is what users expect – whether they are really happy with the current behavior (time-based cache only invalidating based on time) or whether they just never noticed it and maybe (if they are savvy enough) just automatically assumed it would work like Core’s time-based cache plugin.
Feedback on this very much welcome!What I also wondered was whether this change would add unnecessary cache tags to views that don’t load any entities. However, looking at
SearchApiQuery::getAllEntities()
it seems pretty clear that only cache tags of entities that were actually loaded will be included, so this should work as expected. - 🇳🇱Netherlands ekes
Replying from Slack:
I for sure can think of a few sites, date based things, where it would be helpful if the option was there to invalidate the cache if the list is updated and 'once a day' (or whatever fixed time period).
- 🇩🇪Germany mkalkbrenner 🇩🇪
The initial issue description is about Solr.
It is important to understand that any drupal internal tag based cache invalidation isn't suitable for an "external" system, where you don't have control over when the data gets indexed and when it will be delivered as result (commit strategy, invalidation of searches, etc). This is the case for Solr, Elastic, file systems (especially NFS), Redis, ...
To be concrete, after a node is edited, you would need to track when the search backend delivers that change before you invalidate the cache.
I wrote some lines about that in Search API Solr's README:
In general it's recommended to disable the Views cache. By default the Solr
search index is updated asynchronously from Drupal, and this interferes with the
Views cache. Having the cache enabled will cause stale results to be shown, and
new content might not show up at all.In case you really need caching (for example because you are showing some search
results on your front page) then you use the 'Search API (time based)' cache
plugin. This will make sure the cache is cleared at certain time intervals, so
your results will remain relevant. This can work well for views that have no
exposed filters and are set up by site administrators.Since 8.x-2.0 in combination with Solr 6.6 or higher you can also use the
'Search API (tag based)' cache. But in this case you need to ensure that you
enable "Finalize index before first search" and "Wait for commit after last
finalization" in the "Solr specific index options".But be aware that this will slow down the first search after any modification to
an index. So you have to choose if no caching or tag based caching in
combination with finalization is the better solution for your setup.
The decision depends on how frequent index modification happen or how expensive
your queries are. - last update
10 months ago 547 pass - 🇦🇹Austria drunken monkey Vienna, Austria
Thanks a lot again for weighing in here, Markus!
Then it’s settled, time-based caching needs to stay as-is and we’d have to add a third cache plugin to replicate the behavior of Core’s built-in “Time-based” cache plugin (which is really tag- and time-based).The attached patch revision changes the behavior and tests accordingly. I think in
SearchApiTimeCache::getCacheTags()
instead of calling the parent and then removing some of the tags from there it makes more sense to just specifically return the two cache tags we want to include here.
Please test/review.@ekes: In this case, please go to ✨ Add a third Views cache plugin (tag- and time-based) Needs work and declare your support there. Seems it’s a separate issue then, since we’ll leave the existing time-based caching as-is.
- 🇦🇹Austria drunken monkey Vienna, Austria
Would really appreciate some tests/reviews of this before merging, as I’m loath to change anything regarding caching and/or Views that wasn’t vetted by others.