[random test failures] Race condition in state when individual keys are set with an empty cache

Issue created by @catch
First commit to issue fork.
Merge request !7334 "Draft: test repeat" (Open), created by berdir
Merge request !7336 "Handle the case where a cache item is set in between a cache miss and trying to set" (Open), created by catch
Merge request !7338 "Draft: usleep(300)" (Open), created by catch
Comment by catch
I ran WorkflowUiTest 100 times and got 3-4 failures.

https://git.drupalcode.org/project/drupal/-/jobs/1244616

Then, with the fix, I ran it 500 times and got 0 failures:

https://git.drupalcode.org/project/drupal/-/jobs/1244933
https://git.drupalcode.org/project/drupal/-/jobs/1248600
https://git.drupalcode.org/project/drupal/-/jobs/1248679
https://git.drupalcode.org/project/drupal/-/jobs/1248737
https://git.drupalcode.org/project/drupal/-/jobs/1248755
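As a rough sanity check on this evidence, here is a minimal sketch. It assumes the runs are independent and that, without the fix, each run fails with a probability of about 3-4% (the rate seen in the 100-run job). Under those assumptions it computes the chance of 500 straight passes if the bug were still present:

```python
# Rough check: how likely are 500 consecutive passes if the bug were still there?
# Assumptions: runs are independent, and the per-run failure rate is the one
# estimated from the pre-fix job (3-4 failures in 100 runs).


def prob_all_pass(failure_rate: float, runs: int) -> float:
    """Probability that every one of `runs` independent runs passes."""
    return (1.0 - failure_rate) ** runs


for failures_per_100 in (3, 4):
    p = failures_per_100 / 100
    print(f"failure rate {p:.0%}: P(0 failures in 500 runs) = {prob_all_pass(p, 500):.1e}")
```

At either rate the probability is well below one in a million. That supports reading 0/500 as strong evidence that the fix addresses this particular failure, although it cannot rule out other, rarer causes of random failures.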
Status changed to Needs review, 3:00pm 5 April 2024
Comment by catch
I need to back out the test changes here. They exist only for the repeated-test-run job, so that it runs one test method 100 times instead of the entire test. Reviews would be great, though, and I'm leaving the changes in for now in case we want to re-run that job even more times.
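To reproduce the same kind of measurement locally instead of through the CI job, a small harness can run one test command many times and count failures. This is a hedged sketch: the command is supplied by you, and the PHPUnit invocation in the usage line is a placeholder, not the job configuration used in MR !7334.

```python
import subprocess
import sys
from typing import Sequence


def count_failures(cmd: Sequence[str], runs: int) -> int:
    """Run `cmd` `runs` times and return how many runs exited with a non-zero status."""
    failures = 0
    for _ in range(runs):
        # A non-zero exit status is treated as a failed test run.
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        if result.returncode != 0:
            failures += 1
    return failures


if __name__ == "__main__" and len(sys.argv) > 1:
    # Everything after the script name is the test command to repeat.
    runs = 100
    failed = count_failures(sys.argv[1:], runs)
    print(f"{failed}/{runs} runs failed")
```

Usage (placeholder arguments): `python repeat_test.py vendor/bin/phpunit --filter <testMethodName> <path/to/TestFile.php>`. PHPUnit's `--filter` option selects a single test method, which keeps each run short, as the CI job did by running one method instead of the entire test.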
Status changed to Needs work, 3:28pm 5 April 2024
Comment by longwave
Looks good to me: the fixes make sense, and the comments definitely help explain what's going on. As discussed in Slack, this was originally a concern with the cache collector, but it never was an issue in practice until tests started occasionally hitting cold caches in this way. For me the evidence in #6 is enough to commit this, so setting to Needs work to remove the test changes.
Status changed to Needs review, 4:21pm 5 April 2024
Comment by catch
Yeah, we covered the warm (but invalid) cache situation in the cache collector, which is the main case, but cold + invalid just never came up or seemed realistic. Additionally, you would never hit this with a discovery cache (which covers most cache collector use cases), because such a cache is unlikely to be invalidated twice in the same request; it would need an extremely high write rate. I'm not sure a site would even run into this with state, because a fully cold cache followed immediately by a write seems unlikely there too, but we're clearly hitting it in tests.

Rebased to drop the test modifications.
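The cold-cache race described above can be pictured with a small, hypothetical model. Request A starts with an empty cache and reads a key from the persistent store. Request B then changes that key and a fresh cache item is written. At the end of its request, A writes its now-stale value back over the newer item. The class names, the write counter standing in for a creation timestamp, and the "delete instead of overwrite" response are assumptions made for this sketch. It is not Drupal's CacheCollector code, and it leaves out invalidation entirely.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class CacheItem:
    data: Dict[str, Any]
    created: int  # write counter standing in for a creation timestamp (assumption)


class SharedCache:
    """Hypothetical in-memory stand-in for a cache backend shared by requests."""

    def __init__(self) -> None:
        self.items: Dict[str, CacheItem] = {}
        self._clock = 0

    def get(self, cid: str) -> Optional[CacheItem]:
        return self.items.get(cid)

    def set(self, cid: str, data: Dict[str, Any]) -> None:
        self._clock += 1
        self.items[cid] = CacheItem(dict(data), self._clock)

    def delete(self, cid: str) -> None:
        self.items.pop(cid, None)


class Collector:
    """Hypothetical per-request collector (not Drupal's CacheCollector)."""

    def __init__(self, cache: SharedCache, db: Dict[str, Any], cid: str = "state") -> None:
        self.cache, self.db, self.cid = cache, db, cid
        item = cache.get(cid)
        self.storage: Dict[str, Any] = dict(item.data) if item else {}
        # None records that this request started with a cold (empty) cache.
        self.created_seen: Optional[int] = item.created if item else None
        self.to_persist: set = set()

    def get(self, key: str) -> Any:
        if key not in self.storage:
            # Miss for this key: read the persistent store and remember to cache it.
            self.storage[key] = self.db.get(key)
            self.to_persist.add(key)
        return self.storage[key]

    def write_back(self, guard: bool = True) -> None:
        """End-of-request write of the keys this request loaded from the store."""
        if not self.to_persist:
            return
        current = self.cache.get(self.cid)
        if guard and current is not None and current.created != self.created_seen:
            # An item was written after this request loaded (or missed) the cache,
            # so our values may be stale: drop the item and let it be rebuilt later.
            self.cache.delete(self.cid)
            return
        merged = dict(current.data) if current else {}
        merged.update({k: self.storage[k] for k in self.to_persist})
        self.cache.set(self.cid, merged)


def cold_cache_race(guard: bool) -> Optional[CacheItem]:
    """Replay the interleaving: A misses, B writes, A writes back."""
    db = {"example.key": False}
    cache = SharedCache()
    a = Collector(cache, db)        # request A starts with a cold cache
    a.get("example.key")            # A reads False from the store
    db["example.key"] = True        # request B sets the key...
    cache.set("state", dict(db))    # ...and a fresh item lands in the cache
    a.write_back(guard=guard)       # request A finishes
    return cache.get("state")
```

Without the guard, `cold_cache_race(guard=False)` leaves the stale `False` in the cache. With the guard, the item is deleted, so the next request rebuilds it from the store. Deleting costs an extra rebuild but never publishes data that a concurrent write has already superseded.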
Status changed to RTBC, 4:37pm 5 April 2024
Comment by longwave
MR !7336 looks great.
Comment by catch
This started off as a task, but it's definitely a bug fix now.
Comment by catch
Had one more look over this and realised the comment in State::set() was slightly out of date and could be clearer. Updated that but leaving RTBC since it's comment-only.
Comment by alexpott
alexpott changed the visibility of the branch 3438424-random-test-failures to hidden.
Comment by alexpott
alexpott changed the visibility of the branch 3438424-usleep to hidden.
Status changed to Needs work, 12:01pm 6 April 2024
Comment by alexpott
The latest changes make the MR fail tests; see https://git.drupalcode.org/project/drupal/-/merge_requests/7336/pipelines
Status changed to RTBC, 2:45pm 6 April 2024
Comment by catch
I managed to commit to the wrong branch; that commit is dropped from the MR now.
Status changed to Needs work, 3:10pm 6 April 2024
Comment by alexpott
Tests are still failing:

core/tests/Drupal/Tests/Core/Cache/CacheCollectorTest.php
core/profiles/standard/tests/src/FunctionalJavascript/StandardPerformanceTest.php

I think having the MR link in the issue summary confuses drupal.org, so it puts the failing-test link in the wrong place. I've removed the link from the issue summary, so hopefully the failures are easier to see now.
Comment by catch
OK, now I can see the test failures; that's helpful.

The unit test failure is very handy. I wasn't sure how to test the race condition here, but it turns out we already had a unit test for a cache item being set in the middle of the request. I've split that one method into two, so it now tests both the warm cache situation (already covered, but the test wasn't explicitly testing it) and the cold cache situation (what we're fixing here, and what the existing test was actually exercising, but with different expectations).
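A hypothetical way to picture the split is a single decision for the end-of-request cache write, plus two test methods, one per starting state. The function name, its "write"/"merge"/"delete" outcomes and the test names are illustrative assumptions. They are not the contents of CacheCollectorTest.php.

```python
import unittest
from typing import Optional


def write_back_action(created_at_load: Optional[int], created_now: Optional[int]) -> str:
    """Hypothetical decision for an end-of-request cache write.

    created_at_load: creation marker of the item this request loaded,
                     or None if the request started with a cold cache.
    created_now:     creation marker of the item in the cache at write time,
                     or None if there is no item.
    """
    if created_now is None:
        return "write"   # no item in the cache: safe to write this request's data
    if created_now == created_at_load:
        return "merge"   # still the item we loaded: merge newly loaded keys into it
    return "delete"      # someone else set the item in between: don't overwrite it


class WriteBackActionTest(unittest.TestCase):
    def test_item_set_during_request_warm_cache(self) -> None:
        # Loaded item 1; another request replaced it with item 2 meanwhile.
        self.assertEqual(write_back_action(1, 2), "delete")
        self.assertEqual(write_back_action(1, 1), "merge")

    def test_item_set_during_request_cold_cache(self) -> None:
        # Started with a cold cache; another request set item 1 before our write.
        self.assertEqual(write_back_action(None, 1), "delete")
        self.assertEqual(write_back_action(None, None), "write")
```

Saved as a file, the two cases can be run with `python -m unittest <file>.py`. Keeping them as separate methods means a regression in the cold-cache path is reported on its own instead of being hidden inside the warm-cache test.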
Status changed to Needs review, 10:17am 7 April 2024
Comment by catch
Performance tests are green now. The difference is one extra state cache set, plus the related key/value queries collected during that request. That's not surprising, since we're now more conservative about writing to the cache on a completely cold start.
Status changed to RTBC, 2:16pm 9 April 2024
Comment by alexpott
Given that catch's changes are a test fix and improved comments, I'm setting this back to RTBC and committing it. Hopefully this will reduce the random failures caused by the state cache collector.
Comment by alexpott
Committed 3a3e618 and pushed to 11.x. Thanks!
Committed af46c48 and pushed to 10.3.x. Thanks!
System message
alexpott committed af46c48e on 10.3.x
Issue #3438424 by catch, Berdir, alexpott, longwave: [random test...
Status changed to Fixed, 5:01pm 9 April 2024
System message
alexpott committed 3a3e6186 on 11.x
Issue #3438424 by catch, Berdir, alexpott, longwave: [random test...
System message
Automatically closed - issue fixed for 2 weeks with no activity.

Problem/Motivation

Steps to reproduce

Proposed resolution

Remaining tasks

User interface changes

API changes

Data model changes

Release notes snippet

Merge Requests

!7336 [random test failures] Race condition in state when individual keys are set with an empty cache (Open)

!7334 [random test failures] Race condition in state when individual keys are set with an empty cache (Open)

!7338 [random test failures] Race condition in state when individual keys are set with an empty cache (Open)
