View recalculated with wrong data from cache

Created on 16 December 2022, over 1 year ago
Updated 24 February 2024, 4 months ago

Problem/Motivation

We noticed that some of our search_api views were not correctly invalidated when a node they are containing is edited.
This happens specifically when the page containing the view is reloaded after the node has been edited and before the Solr index has been updated.

What seems to happen is:

  1. The node is edited so the node:xx cache tag is invalidated.
  2. The page containing the view has the node:xx cache tag so it is invalidated in page cache.
  3. The view does not have this cache tag (it only has search_api_list:foo) so it is not invalidated in data cache.
  4. When the user reloads the page, the node is re-rendered but with outdated data from data cache.
  5. Then the index is updated, which invalidates the search_api_list:foo cache tag (which invalidates both the page and view cache, but not the node render cache).
  6. If the user reloads the page again, the view is recalculated but the node render is not (and still contains the outdated data from before).

This can only happen if nodes are not indexed immediately after they are updated (for example, if indexing is done with a cron job).

Steps to reproduce

  1. Create a search_api index with the index_directly setting disabled.
  2. Create a search_api view listing node view modes that returns at least one node.
  3. Edit the node title (and do not trigger an index update).
  4. Reload the page containing the view: you still see the old node title.
  5. Trigger an index update.
  6. Reload the page containing the view again: you still see the old node title.

Proposed resolution

One solution would be to add the node cache tags to the view data cache.
This way, the node render would be recreated with the correct data at the first reload.

This can be done by returning the entity cache tags in SearchApiQuery::getCacheTags() (similar to what Sql::getCacheTags() does.

🐛 Bug report
Status

Needs review

Version

1.0

Component

Views integration

Created by

🇫🇷France prudloff Lille

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • 🇦🇹Austria drunken monkey Vienna, Austria

    Thanks for suggesting this improvement, and sorry it took me a while to get back to you!

    Unfortunately, the intricacies of Views’ caching mechanism is still quite a mystery to me, and caching search results (especially in combination with Solr) is very challenging to begin with, so I fear I can’t be of much help here, but have to count on as many reviewers/testers/feedbackers as possible to help make an informed decision.

    However, it seems there has been a commit to the same code in the meantime in Views Integration - Consider compatibility with internal page cache Fixed . Does the problem still occur for you with the latest module version? If so, please re-base your MR and then post it as a patch – testing on issue forks unfortunately doesn’t work in this project, see #3190024: Problem with test dependencies when testing issue forks .

  • 🇫🇷France prudloff Lille

    I can still reproduce with search_api 1.29: the view cache does not have the node cache tags.
    Here is a reroll based on 1.29.

  • Status changed to Needs review about 1 year ago
  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 8.1 & sqlite-3.27
    last update about 1 year ago
    520 pass, 9 fail
  • The last submitted patch, 6: search_api-3327645-6.diff, failed testing. View results
    - codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

  • Open in Jenkins → Open on Drupal.org →
    Core: 10.0.11 + Environment: PHP 8.2 & sqlite-3.34
    last update 6 months ago
    546 pass
  • Open in Jenkins → Open on Drupal.org →
    Core: 10.0.11 + Environment: PHP 8.2 & sqlite-3.34
    last update 6 months ago
    544 pass, 1 fail
  • 🇭🇺Hungary huzooka Hungary 🇭🇺🇪🇺

    My team also ran into an issue very similar to this; but our case was a bit more specific. Our indexes are configured to re-index items immediately, so we use post request indexing.

    We have an SAPI index view using VBO. We had to build a Views Bulk Operation (VBO) action which changes SAPI-indexed entities. Our (content manager) users are using this action on a SAPI index-based Views display. So, these users visit the views page (which is a form), pick some items from the views list, choose the action and submit the VBO form.

    What we've seen is that the Views display wasn't ever updated with the right data. Every other SAPI Views list using the same index seemed to be updated accordingly, but this admin view was stuck. The only solution which worked was to force a render cache invalidation.

    I've spent days trying to catch the moment when things go wrong. Now, I'm quite confident to say that the bug we ran into is indeed about wrong cache metadata.

    Steps to reproduce:

    1. SAPI index based view + VBO
    2. Index being configured to index items immediately.
    3. User performs a VBO action which changes indexed entities.
    4. After the form is submitted, Drupal displays the same view, changes are not visible on the current VBO Views page
    1. ...While at every other SAPI index view (using the same index), changes are visible;
    2. After dropping the render cache, the VBO Views page displays the expected (changed) data.

    This is what happened in the PHP threads in our case, after submitting the form, without any patches applied:

    If we refresh the SAPI VBO views page after these above:

    1. Request arrives
    2. HtmlRenderer starts rendering.
    3. SearchApiQuery is executed, because the index cache tag was invalidated. But: no views rows are re-rendered: render cache thinks they're all up-to-date!
    4. Rendered response is sent back to the client.

    In my opinion:

    1. If A.6 and A.7 happens before B.3 (so: indexing + SAPI cache tag invalidation happens before SAPI views row re-rendering), then there will be no cache "corruption"; everything will be up-to-date.
    2. Otherwise, our users see outdated views results; which are cached "infinite".

    Right now, users see the same (outdated) views rows on the VBO views form page until the render caches are rebuilt.

    I agree that the problem is about corrupted / wrong cache metadata; but in my honest opinion, we should solve the problem the other way around: every SAPI index-based Views row must also include the cache tag of the index the displayed data belongs to – because that is the source of the data we display in SAPI index based views!

    SearchApiQuery seems to be fine: in our case, we put our entities into different indexes, depending on the state of the entity (we have separate indexes for "published" and "unpublished" entities). If the action moves an item into the other index, the corresponding row gets removed from the view real-time.

    What we did:

    We simply added the corresponding search_api_list:<index-id> cache tag to the cache tags of the SAPI views row. After doing this, things started to work logically (for me): after performing the VBO action, the first response (still) returned outdated data; but after a page reload, I've seen the right results.

    Of course, in our case, we were able* to (and we still have to) work around things since we want to display the right results to our users even after the form submission; But I'm sure that the solution to this problem (or at least one part of the solution) is to add the index cache tag to the Views rows too.

    With the additional cache tag, the "backtrace" after form submission does not change:

    But after reloading the VBO page, we see a bit more things to happen: the corresponding Views row gets re-rendered too!

    1. Request arrives
    2. HtmlRenderer starts
    3. SearchApiQuery::execute starts
    4. SearchApiQuery::execute finished
    5. Changed SAPI views field gets re-rendered <- this is what did not happen before!
    6. HtmlRenderer finishes
    7. Response sent back to client

    What do you think about this fix?

    Attaching an old-school test-only, fix-only and complete patch, hoping that they will be tested by CI.

  • 🇦🇹Austria drunken monkey Vienna, Austria

    Thanks a lot for your detailed analysis, @huzooka!

    What I still don’t understand, though, is how/why the indexed data would affect how individual result rows are rendered?
    Are you using Solr and have you configured it to retrieve the results field data from the server – is that the reason? Then I still don’t understand why this fails in the tests, to be honest. Seems I still don’t understand this problem fully, but mostly your description makes a lot of sense and really helps, so thanks again.

    If this is about using the fields returned from the server, then maybe we should just add the index’s list cache tag when adding such fields to the result row, not just do that always. With your solution, it would seem like this might unnecessarily impact performance for lots of other users.
    (It would, of course, be even better to have specific cache tags for individual search items getting indexed, to only place those on the result rows, but that might be a bit overkill for this issue. Would be even more accurate, though, I think.)

    In any case, one big extra-thanks for providing a regression test, that’s really awesome!

  • Status changed to Needs work 6 months ago
  • 🇭🇺Hungary huzooka Hungary 🇭🇺🇪🇺

    @drunken monkey, Yeah, you're totally right! I think I got so lost in the details that I lost focus too!

    Yes, we had this issue with the (SAPI) fields data fetched from the SAPI index. And your conclusion in regards of adding the cache tag to search API field plugins instead of adding it as the row's cache tag makes more sense.

    Moving to NW because of #11.

  • Status changed to Needs review 6 months ago
  • Open in Jenkins → Open on Drupal.org →
    Core: 10.1.4 + Environment: PHP 8.2 & sqlite-3.34
    last update 6 months ago
    544 pass, 1 fail
  • 🇦🇹Austria drunken monkey Vienna, Austria

    Then maybe something like this? (Or is there a better way for actually adding the cache tag? Seems there is no way to do it directly in SearchApiQuery::addResults().)

  • 🇫🇷France prudloff Lille

    I feel like #9 is reporting a different issue than ours.
    We don't return fields from Solr and our problem is not about adding cache tags to those fields, but about adding returned node cache tags to the view.

  • Status changed to Needs review 5 months ago
  • Open in Jenkins → Open on Drupal.org →
    Core: 10.1.4 + Environment: PHP 8.2 & sqlite-3.34
    last update 5 months ago
    540 pass, 3 fail
  • 🇦🇹Austria drunken monkey Vienna, Austria

    Then maybe a combination of the two?

  • Status changed to Needs review 5 months ago
  • Open in Jenkins → Open on Drupal.org →
    Core: 10.1.4 + Environment: PHP 8.2 & sqlite-3.34
    last update 5 months ago
    546 pass
  • 🇦🇹Austria drunken monkey Vienna, Austria

    The test fails mostly look like they’d be expected now, so I’m adapting the tests.

    More importantly, though, it seems like we are now accidentally half-way to fixing Add a third Views cache plugin (tag- and time-based) Needs work : we were adding the individual nodes’ cache tags, but still not the search index list cache tag.

    As discussed in the other issue, it seems I have always misunderstood the purpose of the time-based cache plugin: it doesn’t invalidate the cache less often than the tag-based one (only based on the time, not on tags) but more often (based on time and tags). Therefore, our current implementation seems incorrect.
    I have therefore now corrected our time-based cache plugin to also include the search index list cache tag.

    However, since this does change behavior that no-one has complained about in over six years (I think?), I’m not entirely sure we should really change it. Instead, as suggested in the other issue, we could also provide a new cache plugin with this functionality. The key question is what users expect – whether they are really happy with the current behavior (time-based cache only invalidating based on time) or whether they just never noticed it and maybe (if they are savvy enough) just automatically assumed it would work like Core’s time-based cache plugin.
    Feedback on this very much welcome!

    What I also wondered was whether this change would add unnecessary cache tags to views that don’t load any entities. However, looking at SearchApiQuery::getAllEntities() it seems pretty clear that only cache tags of entities that were actually loaded will be included, so this should work as expected.

  • 🇳🇱Netherlands ekes

    Replying from Slack:

    I for sure can think of a few sites, date based things, where it would be helpful if the option was there to invalidate the cache if the list is updated and 'once a day' (or whatever fixed time period).

  • 🇩🇪Germany mkalkbrenner 🇩🇪

    The initial issue description is about Solr.

    It is important to understand that any drupal internal tag based cache invalidation isn't suitable for an "external" system, where you don't have control over when the data gets indexed and when it will be delivered as result (commit strategy, invalidation of searches, etc). This is the case for Solr, Elastic, file systems (especially NFS), Redis, ...

    To be concrete, after a node is edited, you would need to track when the search backend delivers that change before you invalidate the cache.

    I wrote some lines about that in Search API Solr's README:

    In general it's recommended to disable the Views cache. By default the Solr
    search index is updated asynchronously from Drupal, and this interferes with the
    Views cache. Having the cache enabled will cause stale results to be shown, and
    new content might not show up at all.

    In case you really need caching (for example because you are showing some search
    results on your front page) then you use the 'Search API (time based)' cache
    plugin. This will make sure the cache is cleared at certain time intervals, so
    your results will remain relevant. This can work well for views that have no
    exposed filters and are set up by site administrators.

    Since 8.x-2.0 in combination with Solr 6.6 or higher you can also use the
    'Search API (tag based)' cache. But in this case you need to ensure that you
    enable "Finalize index before first search" and "Wait for commit after last
    finalization" in the "Solr specific index options".

    But be aware that this will slow down the first search after any modification to
    an index. So you have to choose if no caching or tag based caching in
    combination with finalization is the better solution for your setup.
    The decision depends on how frequent index modification happen or how expensive
    your queries are.

  • Open in Jenkins → Open on Drupal.org →
    Core: 10.1.4 + Environment: PHP 8.2 & sqlite-3.34
    last update 5 months ago
    547 pass
  • 🇦🇹Austria drunken monkey Vienna, Austria

    Thanks a lot again for weighing in here, Markus!
    Then it’s settled, time-based caching needs to stay as-is and we’d have to add a third cache plugin to replicate the behavior of Core’s built-in “Time-based” cache plugin (which is really tag- and time-based).

    The attached patch revision changes the behavior and tests accordingly. I think in SearchApiTimeCache::getCacheTags() instead of calling the parent and then removing some of the tags from there it makes more sense to just specifically return the two cache tags we want to include here.
    Please test/review.

    @ekes: In this case, please go to Add a third Views cache plugin (tag- and time-based) Needs work and declare your support there. Seems it’s a separate issue then, since we’ll leave the existing time-based caching as-is.

  • 🇦🇹Austria drunken monkey Vienna, Austria

    Would really appreciate some tests/reviews of this before merging, as I’m loath to change anything regarding caching and/or Views that wasn’t vetted by others.

Production build 0.69.0 2024