Pass RenderContext around in the Renderer

Created on 9 April 2025, 3 months ago

Problem/Motivation

To enable async rendering, we need to be able to leave and re-enter placeholder rendering using Fiber::suspend()/resume() or their equivalent Revolt wrappers.

This means that we need to pass RenderContext objects around by reference as much as possible, without relying on the static variable in Renderer.

There are several calls to ::executeInRenderContext() in the renderer which call known methods on itself - these could instead pass the render context down and process it, without altering the stack itself, or at least that ought to be possible.

We won't be able to remove the stack entirely until at least ✨ Improve query and cache API so that render() doesn't have to be called to add query cache metadata Active is fixed but we can work through the various cases in multiple issues until there's no calls left.

Steps to reproduce

Proposed resolution

Remaining tasks

User interface changes

Introduced terminology

API changes

Data model changes

Release notes snippet

πŸ“Œ Task
Status

Active

Version

11.0 πŸ”₯

Component

render system

Created by

πŸ‡¬πŸ‡§United Kingdom catch

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Merge Requests

Comments & Activities

  • Issue created by @catch
  • πŸ‡¬πŸ‡§United Kingdom catch

    Looking at this it's a lot more complicated than I outlined in the issue summary because the static render context stack is a final protection against container rebuilds in the middle of rendering etc.

    What I think we can probably do is when rendering a placeholder within a Fiber, when the fiber is suspended, check if the render context stack changed, and if it did, then immediately resume the fiber to keep going - that should prevent cross-leakage of render context then. I don't have an implementation yet, but something along these lines.

  • πŸ‡¬πŸ‡§United Kingdom catch

    OK I did some investigation with πŸ“Œ Try to replace path alias preloading with lazy generation Active and πŸ“Œ Deprecate RendererInterface::render()'s sole $is_root_call parameter Needs work combined, and have a fix.

    This is proper bug in core but because we don't actually have any runtime code that suspends fibers during rendering yet, it's not triggered by anything.

  • Merge request !11785Trap render context inside a fiber. β†’ (Open) created by catch
  • πŸ‡¬πŸ‡§United Kingdom catch
  • πŸ‡ΊπŸ‡ΈUnited States smustgrave

    This something we should add test coverage for?

  • πŸ‡¬πŸ‡§United Kingdom catch

    @smustgrave once you combine the three patches together we have a lot of implicit test coverage for it (as in core blows up without it).

    I haven't got my head around how to add explicit test coverage yet, or indeed why the current fibers test coverage doesn't trigger the bug.

  • πŸ‡«πŸ‡·France andypost

    is that really a bug? looks more like task kind

  • πŸ‡¬πŸ‡§United Kingdom catch

    @andypost without both fixes, any async code during rendering results in an exception, because we already protect against the escaping but didn't handle it everywhere when adding initial placeholder Fiber support.

  • πŸ‡«πŸ‡·France andypost

    @catch that's why I consider it task as async code was never supposed to be supported

  • The Needs Review Queue Bot β†’ tested this issue. It fails the Drupal core commit checks. Therefore, this issue status is now "Needs work".

    This does not mean that the patch necessarily needs to be re-rolled or the MR rebased. Read the Issue Summary, the issue tags and the latest discussion here to determine what needs to be done.

    Consult the Drupal Contributor Guide β†’ to find step-by-step guides for working with issues.

  • Pipeline finished with Failed
    3 months ago
    Total: 366s
    #477123
  • πŸ‡¬πŸ‡§United Kingdom catch
  • Pipeline finished with Failed
    3 months ago
    Total: 617s
    #477129
  • Pipeline finished with Failed
    2 months ago
    Total: 543s
    #492741
  • Pipeline finished with Failed
    about 2 months ago
    Total: 576s
    #496828
  • πŸ‡ΊπŸ‡ΈUnited States smustgrave

    Sorry to be that guy can the issue summary be filled out please.

  • Status changed to Needs review 25 days ago
  • πŸ‡¬πŸ‡§United Kingdom catch

    @andypost we already support async rendering via πŸ“Œ Add Fibers support to Drupal\Core\Render\Renderer Fixed and πŸ“Œ Add PHP Fibers support to BigPipe RTBC however because core doesn't have any async code yet this bug wasn't hit.

    @smustgrave I've updated the issue summary, this is a hard issue to explain (it was also hard to track down and fix), so feedback on whether it's readable would be useful.

    #7 is still the current state of test coverage - it's easy to reproduce with the two blocked issues but I'm still struggling to come up with a self-contained test for it. Might have another poke around though

  • Pipeline finished with Failed
    22 days ago
    Total: 228s
    #527108
  • Pipeline finished with Failed
    22 days ago
    Total: 200s
    #527283
  • πŸ‡¬πŸ‡§United Kingdom catch

    Was able to come up with a unit test that simulates the assertion failure in ::executeInRenderContext() during big pipe rendering when πŸ“Œ Try to replace path alias preloading with lazy generation Active is applied, which was otherwise only visible in a couple of functional tests. Coming up with a minimal render array was not trivial - it needs both nested calls to ::executeInRenderContext() and also multiple Fiber::suspends() to trigger the assertion.

  • Pipeline finished with Failed
    22 days ago
    Total: 254s
    #527300
  • Pipeline finished with Canceled
    22 days ago
    Total: 72s
    #527325
  • Pipeline finished with Canceled
    22 days ago
    Total: 202s
    #527327
  • Pipeline finished with Failed
    22 days ago
    Total: 442s
    #527331
  • πŸ‡ΊπŸ‡ΈUnited States moshe weitzman Boston, MA

    MR looks good to me. I am not too familiar with fibers but the comments do a decent job of explaining why this is needed ... There is currently a test failure in Drupal\Tests\demo_umami\FunctionalJavascript\OpenTelemetryFrontPagePerformance

  • Pipeline finished with Success
    22 days ago
    Total: 379s
    #527361
  • πŸ‡¬πŸ‡§United Kingdom catch
  • Pipeline finished with Canceled
    21 days ago
    Total: 127s
    #527876
  • Pipeline finished with Canceled
    21 days ago
    Total: 71s
    #527877
  • Pipeline finished with Success
    21 days ago
    Total: 386s
    #527878
  • Pipeline finished with Failed
    21 days ago
    #527883
  • πŸ‡¬πŸ‡§United Kingdom catch

    Not sure why there's an extra cache get in the performance test yet, but did a file_put_contents() of the performance_data object then diffed it, and this is the extra one (would be nice if it looked this nice in the diff output, there's also a load of hashes in cids that are different for probably no good reason):

                                 [21] => ckeditor5.langcodes
                                 [22] => ckeditor5.langcodes
                                 [23] => ckeditor5.langcodes
                                 [24] => ckeditor5.langcodes
    +                            [25] => ckeditor5.langcodes
    

    That comes from _ckeditor5_get_langcode_mapping(). Will open a spin-off to move that to a service and keep the mappings in a property etc.

  • Interesting test failure with somehow the attribute order changing:

    Drupal\Tests\ckeditor5\FunctionalJavascript\CKEditor5MarkupTest::testAttributeEncoding
    Failed asserting that two strings are equal.
    --- Expected
    +++ Actual
    @@ @@
    -'...e-images/image-test.png" width="40" height="20" alt="</em> Kittens & llamas are cute">'
    +'...e-images/image-test.png" alt="</em> Kittens & llamas are cute" width="40" height="20">'
    
    core/modules/ckeditor5/tests/src/FunctionalJavascript/CKEditor5MarkupTest.php:110
    
  • πŸ‡¬πŸ‡§United Kingdom catch

    #18 might not be right, debugging I can't see an extra cache get - just that there's too many. So possibly that call just moved around. Will need to look more.

  • πŸ‡¬πŸ‡§United Kingdom catch

    ckeditor5 was a red herring, although that's definitely a small optimisation we can make in another issue.

    The extra cache get is from this in Drupal\Core\Theme\Registry

        // If called from inside a Fiber, suspend it, this may allow another code
        // path to begin an asynchronous operation before we do the CPU-intensive
        // task of building the theme registry.
        if (\Fiber::getCurrent() !== NULL) {
          \Fiber::suspend();
          // When the Fiber is resumed, check the cache again since it may have been
          // built in the meantime, either in this process or via a different
          // request altogether.
          if ($cached = $this->cacheGet()) {
            return $cached;
          }
        }
    
    

    Because the execution is now inside a fiber, this suspend happens, then when it's resumed the cache is requested again.

    This was added in πŸ“Œ Add PHP Fibers support to BigPipe RTBC .

    I think there's a question of how useful this is without πŸ“Œ Use revolt to prewarm caches during lock waits Active or async database queries etc., but this explains the extra cache get. For me that means there's nothing to do in this issue although we might want to revisit that elsewhere.

    To get a useful diff, I had to remove the hash salt and private key from the user permissions hash - I think we might also want to open a spin-off for that because not only are the permissions hashes never really exposed anywhere, but also there's no useful information in a hash as such anyway - this would allow us to compare cache operations much easier when they're part of cache contexts.

  • Because the execution is now inside a fiber, this suspend happens, then when it's resumed the cache is requested again.

    Nice find.

    To get a useful diff, I had to remove the hash salt and private key from the user permissions hash - I think we might also want to open a spin-off for that because not only are the permissions hashes never really exposed anywhere, but also there's no useful information in a hash as such anyway - this would allow us to compare cache operations much easier when they're part of cache contexts.

    That comes from _ckeditor5_get_langcode_mapping(). Will open a spin-off to move that to a service and keep the mappings in a property etc.

    Adding "Needs followup" tag as a reminder for these two.

    I think only things left outstanding are the test failures in core/profiles/demo_umami/tests/src/FunctionalJavascript/OpenTelemetryNodePagePerformanceTest.php and core/modules/ckeditor5/tests/src/FunctionalJavascript/CKEditor5MarkupTest.php. (The SettingsTrayBlockFormTest seems like a recurring intermittent failure.)

  • πŸ‡ΊπŸ‡ΈUnited States nicxvan

    Comments are much clear now.

    I have two real questions.

  • πŸ‡¬πŸ‡§United Kingdom catch

    Opened πŸ“Œ Refactor _ckeditor5_get_langcode_mapping() Active for the ckeditor5 issue.

    Revived πŸ“Œ Implement xxHash for non-cryptographic use-cases Needs work with a new MR and added the permissions hash generator change there - that issue might need splitting but at least it documents the thought.

    OpenTelemetryNodePagePerformanceTest
    

    I think this is a timing issue with the test - it looks like the image style request didn't finish before the performance data started recording. Added a couple of sleep(1) calls.

    CKEditor5MarkupTest.php
    

    Re-ran this test and it looks random.

  • Pipeline finished with Failed
    20 days ago
    Total: 636s
    #528449
  • Pipeline finished with Success
    20 days ago
    Total: 683s
    #528500
  • Changes look good. The Build test failed, but probably just needs re-running. Also would be good to see the test-only job run and fail, then I think it's good for RTBC.

  • Pipeline finished with Success
    19 days ago
    Total: 3085s
    #529633
  • πŸ‡ΊπŸ‡ΈUnited States nicxvan

    I agree!

    Test only is failing as expected.

    I moved the follow ups to the the remaining tasks.

    There is a line: Explicit test coverage would be

    I assume that was going to say difficult, but there is a test now. If this is wrong please let me know.

    All feedback had been addressed and this is blocking a few other issues, I think this looks ready to me.

  • πŸ‡¬πŸ‡§United Kingdom catch

    Removed the line about test coverage - it was indeed difficult but they exist now.

  • πŸ‡¬πŸ‡§United Kingdom catch

    Thanks for the reviews and RTBC. I've just realised one small simplification we can make here.

    Also debugging a last but also quite horrible bug via πŸ“Œ Entity lazy multiple front loading Active which may or may not still be related to the logic this is trying to fix.

    Will definitely update the MR for the first part, there might be something for the second part but still in the middle of figuring out what's going wrong. So moving back to NR quickly.

  • πŸ‡¬πŸ‡§United Kingdom catch

    Pushed the simplification.

    I think the logic here is probably OK, I'm still stuck in πŸ“Œ Entity lazy multiple front loading Active but pretty sure that bug is not introduced here, just this issue gets us far enough along to find the next bug hopefully.

  • Pipeline finished with Success
    18 days ago
    Total: 554s
    #530008
  • The simplification is straightforward. Re-ran test-only job and it fails as expected. LGTM.

    I took a peek at the MR on πŸ“Œ Entity lazy multiple front loading Active to double check if it's related to anything here, and I agree it seems unlikely. I also added a couple comments on that MR about possible bugs.

  • πŸ‡§πŸ‡ͺBelgium kristiaanvandeneynde Antwerp, Belgium

    Left a comment. Mind you I am very new to fibers, but from what I understand we are manipulating fibers that we didn't start and that seems like it might cause a deadlock if the code that spawned said fiber isn't sturdy enough to deal with our suspension.

    Going to leave at RTBC, but would be nice if my mind were put at ease :)

  • πŸ‡¬πŸ‡§United Kingdom catch

    The code that creates and runs a fiber needs to run it until completion (unless it's got some special reason not to bother finishing it, but that would more likely be at a very low level, not page/placeholder rendering).

  • πŸ‡§πŸ‡ͺBelgium kristiaanvandeneynde Antwerp, Belgium

    Yeah that's what I also reckoned from the information I could find. I just felt I'd raise it in case my inexperience gave me false confidence and there was, in fact, an issue with the MR.

  • πŸ‡¬πŸ‡§United Kingdom catch

    The other way to look at it that prior to the MR, if a fiber suspended during this code execution, that would be the equivalent of the 'parent fiber' in the MR, so suspending that here is retaining the existing behaviour (that code below this suspending a fiber will reach the code above that started it).

  • πŸ‡¬πŸ‡§United Kingdom catch

    catch β†’ changed the visibility of the branch 1237636-with-3518179 to hidden.

  • πŸ‡¬πŸ‡§United Kingdom alexpott πŸ‡ͺπŸ‡ΊπŸŒ

    One thing I'm left pondering is whether this points to problematic design. Like instead of doing this should the stack work better with fibers and not allow the pollution - I'm not sure what that would look like but I'm guessing the problem is the static::$contextCollection.

    I'm also wondering about other places in Drupal with similar logic that will need fixing for fibers - like the request stack.

  • πŸ‡¬πŸ‡§United Kingdom catch

    One thing I'm left pondering is whether this points to problematic design. Like instead of doing this should the stack work better with fibers

    Yes it definitely does point to problematic design. There are existing issues like ✨ Improve query and cache API so that render() doesn't have to be called to add query cache metadata Active / #3052553: Entity query alter with cacheable metadata leaks and triggers LogicException β†’ which are trying to reduce the need for the stack. I also opened πŸ“Œ Refactor the render context stack to avoid a static property Active to see if we can refactor the stack itself. But no idea how to get through all of those issues to a good end point. If we have something that's ugly but that actually works, then it might help us to get there.

    I'm also wondering about other places in Drupal with similar logic that will need fixing for fibers - like the request stack.

    The request stack might be a problem in some cases, but it's not implicit in the same way - e.g. I can get a request from the request stack, and once I've got that request as a local variable, it's fine with Fibers. The issue would be with code popping things on and off the request stack in weird places and then other code retrieving requests from it. We could potentially do something like decorate the http kernel, and then add some similar 'fiber trapping' code in the ::handle() method. These are also problems for #2218651: [meta] Make Drupal compatible with persistent app servers like ReactPHP, PHP-PM, PHPFastCGI, FrankenPHP, Swoole β†’ .

  • Pipeline finished with Failed
    17 days ago
    Total: 636s
    #530878
  • πŸ‡¬πŸ‡§United Kingdom alexpott πŸ‡ͺπŸ‡ΊπŸŒ

    Ticking issue credit boxes...

  • πŸ‡¬πŸ‡§United Kingdom alexpott πŸ‡ͺπŸ‡ΊπŸŒ

    Only putting this in 11.x because while you could consider this a bug nothing is using this yet.

    Committed df7ee1d and pushed to 11.x. Thanks!

    • alexpott β†’ committed df7ee1db on 11.x
      Issue #3518179 by catch, godotislate, alexpott, kristiaanvandeneynde,...
  • Pipeline finished with Success
    17 days ago
    Total: 1511s
    #530904
  • Automatically closed - issue fixed for 2 weeks with no activity.

Production build 0.71.5 2024