APCu returns outdated configs

Created on 16 November 2023, over 1 year ago

Problem/Motivation

cache.config uses the chained_fast backend by default, which leverages APCu. When multiple web servers run in parallel, each of them has its own APCu. If a config is updated on one server, the outdated cache entries for that config remain valid on the other servers.
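The chained fast backend is meant to guard against exactly this. Every write records a "last write" timestamp in the shared consistent backend, and a server ignores any entry in its own fast cache that was created before that timestamp. The following is a minimal Python sketch of that idea under a simplified model: each server's APCu is a private dict, the consistent backend (e.g. the database) is one shared dict, and times are passed in explicitly. The class, method, and key names are hypothetical and are not Drupal's API.

```python
"""Simplified model of the chained fast cache across several web servers.

Assumption: each server's APCu is a private dict and the consistent backend
(e.g. the database) is one shared dict. Names are hypothetical, not Drupal's API.
"""
from dataclasses import dataclass

LAST_WRITE = "last_write_timestamp"


@dataclass
class Item:
    value: object
    created: float  # when this server cached the value


class ChainedFastServer:
    def __init__(self, consistent: dict) -> None:
        self.consistent = consistent  # shared by all servers
        self.fast: dict = {}          # private to this server (stands in for APCu)

    def get(self, cid: str, now: float):
        last_write = self.consistent.get(LAST_WRITE, 0.0)
        item = self.fast.get(cid)
        # Trust a fast entry only if it was cached strictly after the most
        # recent write made by any server.
        if item is not None and item.created > last_write:
            return item.value
        stored = self.consistent.get(cid)
        if stored is None:
            return None
        self.fast[cid] = Item(stored, now)  # repopulate this server's fast cache
        return stored

    def set(self, cid: str, value: object, now: float) -> None:
        self.consistent[cid] = value
        # Recording the write time outdates the fast entries on every server.
        self.consistent[LAST_WRITE] = now
        self.fast.pop(cid, None)


shared: dict = {}
a, b = ChainedFastServer(shared), ChainedFastServer(shared)
cid = "webform.webform.contact"  # illustrative config name
a.set(cid, "v1", now=1.0)
print(b.get(cid, now=2.0))  # v1, now also cached in b.fast
a.set(cid, "v2", now=3.0)
print(b.get(cid, now=4.0))  # v2: b's fast entry (created 2.0) predates the write at 3.0
```

In this model, server `b` never serves the outdated value, because its fast entry is older than the last write recorded by server `a`.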

Steps to reproduce

Set up Drupal with multiple web servers.
The Webform contrib module stores webforms as config entities. Create a webform and make sure to open it via every server.
Now modify the webform on one server and save it. The other servers will keep delivering the old, outdated webform until all caches are cleared.

Proposed resolution

Remaining tasks

User interface changes

API changes

Data model changes

Release notes snippet

πŸ› Bug report
Status

Active

Version

10.1 ✨

Component
Cache

Last updated 12 days ago

Created by

🇩🇪 Germany mkalkbrenner 🇩🇪


Comments & Activities

  • Issue created by @mkalkbrenner
  • 🇩🇪 Germany mkalkbrenner 🇩🇪
  • Status changed to Needs review over 1 year ago
  • 🇩🇪 Germany mkalkbrenner 🇩🇪
  • last update over 1 year ago
    Custom Commands Failed
  • Status changed to Needs work over 1 year ago
  • The Needs Review Queue Bot tested this issue.

    While you are making the above changes, we recommend that you convert this patch to a merge request. Merge requests are preferred over patches. Be sure to hide the old patch files as well. (Converting an issue to a merge request without other contributions to the issue will not receive credit.)

  • Status changed to Needs review over 1 year ago
  • 🇩🇪 Germany mkalkbrenner 🇩🇪

    fixed whitespace

  • last update over 1 year ago
    29,665 pass, 3 fail
  • Status changed to Needs work over 1 year ago
  • 🇺🇸 United States smustgrave

    Recommended to use MRs now as patches are being phased out.

    As this is a bug, it will also need a test case showing the problem.

    Thanks

  • 🇬🇧 United Kingdom longwave UK

    As this involves multiple web servers with individual APCu caches, it will likely be hard or impossible to write an automated test for.

    But if this is a bug affecting all config objects, I'm amazed it hasn't been spotted before.

  • 🇺🇸 United States luke.leber Pennsylvania

    We've seen all manner of random, inexplicable weirdness with multi-web-server setups in Acquia Cloud Enterprise. We've primarily blamed Memcache, but this could be just as likely to toss monkey wrenches around, if it can be reproduced.

  • Status changed to Postponed: needs info over 1 year ago
  • 🇨🇭 Switzerland berdir Switzerland

    This was discussed quite a bit in slack.

    The fix is definitely not correct: a) it *should* not do anything on 10.1 and lower, as cache tags are stripped from the fast backend, and b) it causes a severe performance regression on 10.2, where cache tags are kept, because each fast lookup would then need an extra lookup against the cache tag invalidation service.

    We don't use 1:1 cache tags that are identical to the cache key; entity storage caches don't use such a cache tag either.

    Please do _not_ use this patch :)

    My only idea is that something in the setup prevents the chained fast backend from working as expected.

    Memcache: AFAIK the race condition around cache tag invalidation during database transactions, which was fixed in core/database and Redis, was never fixed in Memcache, so I'd absolutely expect random issues there.

  • 🇺🇸 United States smustgrave

    So do you think this one should be closed?

  • 🇬🇧 United Kingdom catch

    If the consistent backend is Memcache, it is more likely to be 🐛 Transaction support for cache (tags) invalidation (Needs review).

    The only other possibility I can think of would be significant clock drift between the servers, so that the fast backend timestamp doesn't work. But that's not covered by this approach; we'd need to change the timestamp to some kind of checksum/counter.

    Closing this as cannot reproduce.
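To make the clock-drift point concrete, here is a small Python model. It is not Drupal code, and all names are hypothetical. It assumes that each server stamps its fast entries with its own clock and that a write records the writing server's clock as the last-write time, with server `b` running 5 seconds fast. The counter variant is one possible reading of "some kind of checksum/counter". It ignores everything else a real backend must handle (expiry, tags, eviction).

```python
"""Clock drift vs. a shared write counter in a simplified chained fast cache.

Hypothetical model: each server has a private fast cache, `shared` stands in for
the consistent backend, and times are passed in explicitly as real time.
"""
from dataclasses import dataclass


@dataclass
class Item:
    value: object
    stamp: float  # server clock reading, or write-counter value in counter mode


class Server:
    def __init__(self, shared: dict, clock_offset: float = 0.0, use_counter: bool = False) -> None:
        self.shared = shared
        self.fast: dict = {}
        self.clock_offset = clock_offset  # how far this server's clock is ahead of real time
        self.use_counter = use_counter

    def _stamp(self, real_time: float) -> float:
        if self.use_counter:
            return self.shared.get("write_counter", 0)
        return real_time + self.clock_offset

    def _is_fresh(self, item: Item) -> bool:
        if self.use_counter:
            # Valid only if no write has happened since the entry was cached.
            return item.stamp == self.shared.get("write_counter", 0)
        # Valid if cached (per this server's clock) after the last write
        # (per the writing server's clock); this breaks when the clocks disagree.
        return item.stamp > self.shared.get("last_write", 0.0)

    def get(self, cid: str, real_time: float):
        item = self.fast.get(cid)
        if item is not None and self._is_fresh(item):
            return item.value
        # Take the stamp before reading the value, so a write in between
        # makes the new entry look outdated rather than fresh.
        stamp = self._stamp(real_time)
        value = self.shared[cid]
        self.fast[cid] = Item(value, stamp)
        return value

    def set(self, cid: str, value: object, real_time: float) -> None:
        self.shared[cid] = value
        if self.use_counter:
            self.shared["write_counter"] = self.shared.get("write_counter", 0) + 1
        else:
            self.shared["last_write"] = real_time + self.clock_offset
        self.fast.pop(cid, None)


def scenario(use_counter: bool) -> str:
    shared: dict = {}
    a = Server(shared, use_counter=use_counter)
    b = Server(shared, clock_offset=5.0, use_counter=use_counter)  # b's clock is 5 s fast
    a.set("cfg", "v1", real_time=1.0)
    b.get("cfg", real_time=2.0)        # b caches v1
    a.set("cfg", "v2", real_time=3.0)  # a records the write on its own clock
    return b.get("cfg", real_time=4.0)


print(scenario(use_counter=False))  # v1: stamp 7.0 > last_write 3.0, stale entry trusted
print(scenario(use_counter=True))   # v2: stamp 1 != write_counter 2, entry refetched
```

With drift, `b` keeps serving `v1` after `v2` was saved, which is the same symptom described in the issue. The counter variant compares two values read from the same shared source, so clock skew between servers does not affect it.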
