- Issue created by @spokje
- Merge request !9676Draft: Run repeat-class-test as 1500x LayoutBuilderBlocksTest::testBlockPlaceholder β (Closed) created by spokje
- π³π±Netherlands spokje
So MR!9676 (https://git.drupalcode.org/issue/drupal-3477586/-/jobs/2891429) shows that this test fails ~26/1500 runs.
- π³π±Netherlands spokje
At first glance, I can think of two ways the placeholder is not replaced:
- Setting the
$block_content
in\Drupal::state()
goes wrong/causes a race condition - The "magic" in
\Drupal\layout_builder\EventSubscriber\BlockComponentRenderArray::onBuildRender
does something unexpected and we end up in theif ($is_content_empty && $is_placeholder_ready)
where we set the placeholder again, with similar content as the original one.
Let's assert both.
- Setting the
- π³π±Netherlands spokje
So looking at all the test failures being:
Drupal\Tests\layout_builder\Functional\LayoutBuilderBlocksTest::testBlockPlaceholder Behat\Mink\Exception\ResponseTextException: The text "is_content_empty && is_placeholder_ready Placeholder for the "The block label" block" appears in the text of this page, but it should not.
I think we can safely say it's #2.
- π³π±Netherlands spokje
$content = $block->build()
in\Drupal\layout_builder\EventSubscriber\BlockComponentRenderArray::onBuildRender
ends up in\Drupal\layout_builder\Plugin\Block\ExtraFieldBlock::build
.The only way I can see that returning empty would be if we land in this if-branch:
if (!isset($extra_fields['display'][$this->fieldName]))
.Which in turn would mean
\Drupal\Core\Entity\EntityFieldManager::getExtraFields
would return an array without$extra_fields['display'][$this->fieldName]
, which would mean in\Drupal\Core\Entity\EntityFieldManager::getExtraFields
,$this->extraFields[$entity_type_id][$bundle]
would not be set. - π³π±Netherlands spokje
This is where it ends for me, I really have no idea what's happening to cause the random failure.
- Status changed to Needs work
3 months ago 11:22am 26 December 2024 - π³π±Netherlands spokje
Ok, so there seems to be, what;looks like, a delay for setting a value in State.
A lot of the currently randomly failing tests are using State to trigger some kind of behaviour change in the middle of a test.
Which, if the above statement is true, would make them also randomly fail, which seems to happen.Time for some nitty-gritty proof why I think this is this case, dug up from the killing fields of test runs I had on this particular issue:
Looking through comments on π Use cache collector for state Fixed , I think one thing to explore would be to replace all the uses of state with keyvalue when testing block content, such as all the lines that look like
\Drupal::state()->set('block_test.content', ...
\Drupal::state()->get('block_test.content')
and see if random failures for LayoutBuilderBlocksTest go away.
These lines could also be swapped for keyvalue:
\Drupal::state()->set('block_test.attributes', ...
\Drupal::state()->get('block_test.attributes')
#157 π Use cache collector for state Fixed from π Use cache collector for state Fixed might be relevant here:
however we're seeing other test failures in runs which aren't consistent. I think the issue is that any test using state to track.. well.. state between requests, now needs to either switch to WaitTerminateTestTrait or to use the key value service directly without the state wrapper - because the cache will be updated in terminate.
- Merge request !10689Issue #3477586: Replace use of state with keyvalue in block_test content. β (Closed) created by godotislate
Pushed up draft MR 10689 with just the state -> keyvalue changes mentioned in #10. Only can report that the the build did not fail.
- π³π±Netherlands spokje
Thanks @godotislate.
There's a Slack thread about this here: https://drupal.slack.com/archives/C079NQPQUEN/p1735214972315009
- π³π±Netherlands spokje
spokje β changed the visibility of the branch 11.x to hidden.
- Merge request !10692Draft: Resolve #3477586 "2500x layoutbuilderblockstest testblockplaceholder as is" β (Open) created by spokje
- π³π±Netherlands spokje
spokje β changed the visibility of the branch 3477586-random-test-failure to hidden.
- Merge request !10694Draft: Resolve #3477586 "2500x layoutbuilderblockstest testblockplaceholder replaced" β (Open) created by spokje
- π¨πSwitzerland berdir Switzerland
We *did* have random fails in that issue originally as well. But then it's been quiet for quite some time until they got a lot more frequent. Maybe in combination with the wait for terminate stuff and making something else more efficient?
I'm fairly certain that the actual key value write isn't delayed. We also immediately do a cache invalidation. The only thing that's delayed is updating the combined cache. And we do a lot of checks to prevent conflicts and incorrect caches there but apparently something seems to go wrong. It relies a lot on cache write time, but that's in microseconds, so that's very unlikely to happen at the same time.
One thing that however doesn't rely on microseconds and just seconds is cache expiration. \Drupal\Core\Cache\DatabaseBackend::invalidateAll() sets the expiration to request time - 1. I wonder if there's something weird going on there that cache reads in the test have a request time that's still ahead of that? What happens if we set that to -10 instead? Or maybe even just a fixed 1?
The thing is that CacheCollector is basically the _only_ thing in core that uses the invalidate API.
- π³π±Netherlands spokje
MR !10694 shows that changing
\Drupal::state()->set()/get()
toDrupal::keyValue()->set()/get();
as suggested in the Slack thread passes a 2500 times run without errors.Since this involves making changes in
core/modules/block/tests/modules/block_test/src/Plugin/Block/TestCacheBlock.php
, this means more tests are affected.@godotislate was already miles ahead of me and created MR !10689, which is the MR to be committed.
- Merge request !10695Draft: Resolve #3477586 "2500x layoutbuilderblockstest testblockplaceholder invalidate cache minus 10" β (Open) created by spokje
- Merge request !10696Draft: Resolve #3477586 "2500x layoutbuilderblockstest testblockplaceholder invalidate cache fixed 1" β (Closed) created by spokje
- π³π±Netherlands spokje
@berdir:
\Drupal\Core\Cache\DatabaseBackend::invalidateAll()
set the expiration to request time - 10 => MR!1069553 failures out of 2500 runs.\Drupal\Core\Cache\DatabaseBackend::invalidateAll()
set the expiration to 1 => MR!1069682 failures out of 2500 runs.So roughly the same as the unchanged 2500 run.
I'm going to set this to NR for #22 π [random test failure] LayoutBuilderBlocksTest::testBlockPlaceholder failing Active .
godotislate β changed the visibility of the branch 3477586-2500x-LayoutBuilderBlocksTest-testBlockPlaceholder-replaced to hidden.
godotislate β changed the visibility of the branch 3477586-2500x-LayoutBuilderBlocksTest-testBlockPlaceholder-as-is to hidden.
Per discussion on Slack, we can go forward with the state to keyvalue swap for the relevant block tests here. There should be a follow up though to investigate further about what is the cause of the intermittent failures state failures, if only to establish that it's a test-only condition.
The other random test failure issues linked in the meta π± [meta] Known intermittent, random, and environment-specific test failures Active should be looked at as well to see whether there are usages of state that can be replaced with key value instead.
- First commit to issue fork.
- πΊπΈUnited States dww
Excellent work here!
After checking out the issue fork branch, there are still a ton of references to
\Drupal::state()
incore/modules/block/tests/src/*
. Do we want to fix everything? I'm not sure why we're converting some tokeyValue
and leaving others asstate
. Per conversation with catch on Slack, the overall goal is to change everything in tests over to key value, except for tests specifically testing state, because writing to key value will be cheaper than writing to state.
The references changed in the MR were ones specific to
LayoutBuilderBlocksTest
and everything that shared those references. I think it's just a question of how to scope changing over all the state usage across all the tests.- π¬π§United Kingdom catch
I think I found the root cause for the random failures in π Race conditions/bad lock logic in CacheCollector/State Active (an actual race condition in
State
) but we as above we should go ahead here too. - π³π±Netherlands spokje
spokje β changed the visibility of the branch 3477586-2500x-LayoutBuilderBlocksTest-testBlockPlaceholder-invalidate-cache-minus-10 to hidden.
- πΊπΈUnited States nicxvan
I have been following along most of these issues and read through / followed the slack conversations as well as the root cause.
I know one of the issues requested expanding scope to convert more but it was discussed that each issue should tackle one failing test since it takes over 1000 test runs to confirm.
I have confirmed that the only changes are going from state to keyValue.
Further, while this changes more than just one test it is because the layout test relies on block as well so those can affect it.Looks good to me and it would be great to put one of the most common test failures to rest.
- π¬π§United Kingdom catch
This looks good but I'm relying on the random test failure to prove that π Race conditions/bad lock logic in CacheCollector/State Active fixes it, so it would be useful to not commit this for 24-48 hours just to get an extra few test runs on that issue.
- π³π±Netherlands spokje
so it would be useful to not commit this for 24-48 hours just to get an extra few test runs on that issue.
I can see the usefulness, but I also see numerous MRs failing on this (and others).
We _could_ commit this and in your test-MR reverse the changes made here, which leaves you with exactly the same failing MR as before?
- π¬π§United Kingdom catch
#40 is a good point, and hopefully got enough data on the other issue now anyway.
Committed/pushed to 11.x and cherry-picked back through to 10.3.x, thanks!
Automatically closed - issue fixed for 2 weeks with no activity.
- π³πΏNew Zealand quietone
Followup asked for in #30 has been made, π Identify usages in tests of State that can be replaced with key value instead Active