- Issue created by @dww
- 🇩🇪Germany donquixote
This use of randomString() looks suspicious.
$this->drupalGet(\Drupal::service('file_url_generator')->generateAbsoluteString($directory . '/' . $this->randomString()));
- Merge request !10701Draft: Resolve #3487371 "2500x imagestylespathandurltest as is" → (Closed) created by spokje
- Merge request !10702Draft: Resolve #3487371 "2500x imagestylespathandurltest key value" → (Closed) created by spokje
- 🇳🇱Netherlands spokje
MR !10701 (which is
Drupal\Tests\image\Functional\ImageStylesPathAndUrlTest
and some pipeline changes to make it, and only it, run 1750 times) gives us 13 fails out of the 1750 runs.These 13 fails breaks down to:
2xtestImageStyleUrlAndPathPrivateUnclean()
10xtestImageStyleUrlAndPathPrivateLanguage()
1xtestImageStyleUrlAndPathPrivate()
MR !10702 shows that changing \Drupal::state()->set()/get() to Drupal::keyValue()->set()/get(); as suggested in the Slack thread passes a 1750 times run without errors.
- 🇺🇸United States smustgrave
Based on the findings in #9, and I re-ran 10703 a few times and never got these random failures, when before I probably least got 1 each time.
- 🇺🇸United States dww
Fantastic work! Great find.
Agree that the changes to
ImageStylesPathAndUrlTest
look good. That much is indeed RTBC.However, grep is finding more
\Drupal::state()
in core/modules/image/tests/src/* and I'm wondering if we want to expand the scope here to fixImageStyleFlushTest
andImageEffectsTest
while we're at it. I don't remember seeing those randomly fail as often, but it seems like we shouldn't just do this conversion when we happen to be nailed by the trouble. I'm in favor of preemptively fixing all the tests to stop using state(). So we should probably fix more than 1 test class per issue. - 🇳🇱Netherlands spokje
Strongly disagree with bundling more issues together, but feel free to add-n-fix whatever you like.
Problem here IMHO is where the scope-creep ends, also more tests to test 2500x, or at least as-many-times-fit-in-an-hour makes things very slow, distinct tests run lower and fail rate/ success prove drop.
Doubt that it's OK to just create an MR that changes state to keyvalue, without the above test-runs, since the mostly (very) low failure rate will probably mean a single run will pass the tests.
To me that would prove absolutely nothing about the need for the fix, but n=1, YMMV etc, etc :)
Let the powers-that-be decide on this one.
In the mean time I will continue to create single test fix issues, and surely not for the easy core credits, which I really don't need in the first place. - 🇺🇸United States nicxvan
In general I would agree with @dww's comment to expand, but here I think!@spokje's reasoning is sound.
Further there is a separate issue to address the race condition and anecdotally I've never seen this other tests fail.
- 🇺🇸United States dww
I was thinking that if release managers are saying “we shouldn’t use state in tests, only keyvalue, that we could do bulk.
But yes, strongly appreciate the long time it takes to do even single tests carefully. Wouldn’t dream of accusing you of lots of issues for credit. You’re a prolific contributor!
I was only wondering if we could add more scope without as much effort and rigor if we want to convert entirely. It’s a bit weird having a mix of both, so I’m hoping to minimize the time it takes to convert.
All that said, yes, following the issue where the underlying race condition in State API is hopefully going to be fixed. So maybe it’s moot to do this conversation?
Yeah, I’m torn, I guess we’ll leave it for the core committers to decide. Back to RTBC.
Thanks!
-Derek - 🇩🇪Germany donquixote
I'd say this issue needs an explanation why the problem occurs with state, but not with keyvalue.
From the comments here I understand it is a race condition, and from looking at the code, I assume it is caused by the cache layer in state, because otherwise state is just a wrapper around keyvalue.as suggested in the Slack thread
So, this should be in the issue summary..
I do find this issue from April 2024, which might be related:
🐛 [random test failures] Race condition in state when individual keys are set with an empty cache Fixed - 🇳🇱Netherlands spokje
@donquixote There's a problem with multiple issues: When you're working on a lot of them, you forget that other people didn't. :)
Here's the issue where the root cause is attacked, hope that helps?
🐛 Race conditions/bad lock logic in CacheCollector/State Active - 🇬🇧United Kingdom catch
Generally on the scope:
I would essentially commit any quick fix to
ImageStylesPathAndUrlTest
to stop it random failing including skipping it, because it is so annoying when it fails all the time.Fixing it instead of skipping it is much better.
I do think we should switch all test usage from state to key/value because it massively simplifies things, however it's currently useful that we haven't done that because it's the only way I know to reproduce 🐛 Race conditions/bad lock logic in CacheCollector/State Active . So would be quite happy to commit individual fixes for the frequently random tests to make the pipelines more reliable, then handle other issues which aren't randomly failing (as often) together in one issue later.
- 🇬🇧United Kingdom catch
Committed/pushed to 11.x, thanks!
This doesn't apply to any previous branches including 11.1.x. I think it would be worth backporting to 10.5, 11.1, and 10.4 (11.0 and 10.3 if it's an easy cherry pick) so that we get less random failures on branch runs.
- 🇳🇱Netherlands spokje
Not sure what either I or this issue has done to deserve waiingt 2:42 minutes for a full spellcheck on all files, but here we are...
- 🇳🇿New Zealand quietone
Updated the proposed resolution with the issue where the root cause was discussed.
Automatically closed - issue fixed for 2 weeks with no activity.