[PP-2] Migrate BigPipe and Renderer fiber implementation to use Revolt Event Loop

Created on 3 March 2024, 9 months ago
Updated 25 March 2024, 8 months ago

Problem/Motivation

In πŸ“Œ Add PHP Fibers support to BigPipe RTBC Fiber support for BigPipe was implemented which performs its own mini event loop. The same was done for the Renderer in πŸ“Œ Add Fibers support to Drupal\Core\Render\Renderer RTBC . This requires quite a bit of boilerplate to make sure that we track which fibers are not yet completed as well as to make sure that if BigPipe/Renderer is run along side other async tasks that it plays nicely with those.

In 🌱 Adopt the Revolt event loop for async task orchestration Active we outlined the proposal to adopt the Revolt event loop to take care of those things for us, allowing code like that in BigPipe and the Renderer to be simplified and focus on the tasks that need to happen.

Steps to reproduce

Proposed resolution

The individual BigPipe tasks can fail but we don't want those to break the entire BigPipe processing. To achieve that we'll need the Result type introduced in ✨ Implement a Result type in Drupal Core Needs review . Additionally we'll need the revolt/event-loop package which will be added in πŸ“Œ Add revoltphp/event-loop dependency to core Active .

In the Revolt Playground on GitHub that I've created there is a BigPipe demo to show how the different tasks can be deferred and can indicate when they're done after which they can be individually processed (and sent to the browser).

In Drupal we'll want to adopt the stream utility which takes a set of asynchronous operations and allows looping over the results as they become available (which might be a different order than in which they were started).

We use the same strategy to clean up the Renderer loop although there we throw an exception if any of the renders fails which keeps the behavior consistent with a resumption at the previously existing $fiber->suspend() calls throwing any errors that happened in the fiber.

It was initially tried to first migrate BigPipe but it was found that the Renderer's event loop would run inside of tasks for BigPipe which would cause the Renderer loop to resume tasks that were pending for the Revolt event loop (this is why async implementations say there should only be a single event loop per application). With that in mind the Renderer rewrite was included in this issue.

Remaining tasks

  • Merge blocking issues and rebase MR
  • Write release notes snippet

User interface changes

API changes

Data model changes

Release notes snippet

πŸ“Œ Task
Status

Postponed

Version

11.0 πŸ”₯

Component
BigPipeΒ  β†’

Last updated about 11 hours ago

Created by

πŸ‡³πŸ‡±Netherlands kingdutch

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Merge Requests

Comments & Activities

  • Issue created by @kingdutch
  • πŸ‡³πŸ‡±Netherlands kingdutch

    I've created a branch to start work onto which I've cherry-picked the proposed changes for the issues that are blocking this. Those commits will need to be removed in a rebase when other issues land.

    The proposal for the BigPipe rewrite is in https://git.drupalcode.org/issue/drupal-3425212/-/commit/cd1c3fb87507dc5... which only looks at the BigPipe implementation and does not yet do anything for the tests themselves which contain some Fiber specific code that will also need adapting.

  • Pipeline finished with Failed
    8 months ago
    Total: 173s
    #114971
  • Pipeline finished with Failed
    8 months ago
    Total: 206s
    #114984
  • Pipeline finished with Failed
    8 months ago
    Total: 545s
    #114985
  • Pipeline finished with Failed
    8 months ago
    Total: 174s
    #115119
  • Pipeline finished with Failed
    8 months ago
    #115120
  • Pipeline finished with Failed
    8 months ago
    Total: 612s
    #115130
  • Pipeline finished with Success
    8 months ago
    #115137
  • Status changed to Postponed 8 months ago
  • πŸ‡³πŸ‡±Netherlands kingdutch

    Merging this is postponed on the two other issues landing but the proposed MR includes the code of the other two issues to show that altogether all the test coverage that exists is green and good to go.

    As stated in the Revolt documentation "every application making use of cooperative multitasking needs a single scheduler (also called event loop), which this package provides." which is also backed up by Node.js's event loop implementation or the different native PHP event loop implementations that Revolt delegates to for us.

    While working on converting the BigPipe mini-event loop into a set of tasks on Revolt's event loop this necessity was discovered in that the mini-event loop in the Renderer caused tasks to be resumed in placeholder tasks on the renderer. To fix this we'll have to convert both at the same time and I've adjusted this issue accordingly.

    There are still Fiber references in the Drupal code-base because they are for suspending on single-tasks and thus don't act as mini-event loops. This causes them to work with the Revolt event loop code. We probably want to convert those to Revolt's event loops primitives in the future, but I've left everything that wasn't required for the test coverage to function outside of the scope of this issue to reduce the review surface.

  • πŸ‡¨πŸ‡¦Canada Charlie ChX Negyesi 🍁Canada

    First of all: feel free to dismiss my concern. My view on things are warped because I spend little time building from the UI, not that much writing new code -- most of my time is spent with debugging. So when I see this:

      foreach ($operations as $key => $operation) {
        EventLoop::defer(function () use ($suspension, $key, $operation) {
          try {
            $result = $operation();
            $suspension->resume([$key, Result::ok($result)]);
          }
          catch (\Throwable $e) {
            $suspension->resume([$key, Result::error($e)]);
          }
        });
      }
    

    this makes me afraid it'll make code simply undebuggable. I have been burnt by the GraphQL module before using similar patterns. You put a breakpoint somewhere, get a stacktrace and it's completely useless because it comes from this thing which merely executes something assembled by something else and the two thing are so far removed you can't stitch it together again. I very well recognize this is a pattern used in other programming languages but that is a theoretical reassurance that doesn't help debugging Drupal code.

  • πŸ‡³πŸ‡±Netherlands kingdutch

    Thanks for sharing your concern!

    Without a specific example of the problematic GraphQL stack-trace it's hard to comment on the specific issue (perhaps sharing this in the GraphQL issue queue if you haven't done so already can help us find improvements there). For now I'm going to assume that the trouble you experience with GraphQL is specifically because we build the schema in one place and then execute against the schema later in another place. While doing that we use a user-land promise style which might complicate this a lot.

    There's also the fact that Drupal's exception handler delegates to Error::decodeException which only handles the top level exception and truncates any nested exceptions

    For example the following code in a PHP sandbox

    try {
        throw new \RuntimeException("Foo");
    }
    catch (\Exception $e) {
        throw new \Exception("Exception thrown", 0, $e);
    }
    

    Properly shows both stack-traces

    Fatal error: Uncaught RuntimeException: Foo in /in/Q5PRm:4
    Stack trace:
    #0 {main}
    
    Next Exception: Exception thrown in /in/Q5PRm:7
    Stack trace:
    #0 {main}
      thrown in /in/Q5PRm on line 7
    
    Process exited with code 255.
    

    That same implementation in Drupal would only log the \Exception which is thrown in the catch block and doesn't actually contain the useful information. Ensuring that Drupal prints the entire chain of exceptions would be a big developer experience win and might solve (some of) your debugging issues.

    With those two things in mind there's two changes for the implementation that we're discussing here.

    Firstly I made sure not to wrap any caught exception (or exception contained in the Error result) so that Drupal doesn't hide the origin of the exception. Instead I throw them as-is.

    Secondly the implementation we use here is built on Fibers which doesn't suffer the problem that GraphQL's promise-based implementation might have because PHP itself keeps track of where the work was postponed and can thus also restore the callstack. I created a small reproduction of what an exception would look like for the bigpipe code in my Revolt Playground. The resulting stacktrace is copied below and I've prepended numbers so I can annotate them below.

    [1] at revolt-playground/src/Demo/Stacktrace/Command.php:43
    [2] Kingdutch\RevoltPlayground\Demo\Stacktrace\Command->throwException() at revolt-playground/src/Demo/Stacktrace/Command.php:25
    [3] Kingdutch\RevoltPlayground\Demo\Stacktrace\Command->Kingdutch\RevoltPlayground\Demo\Stacktrace\{closure}() at revolt-playground/Drupal/Core/Async/functions.php:41
    [4] Drupal\Core\Async\{closure}() at revolt-playground/vendor/revolt/event-loop/src/EventLoop/Internal/AbstractDriver.php:597
    [5] Revolt\EventLoop\Internal\AbstractDriver->Revolt\EventLoop\Internal\{closure}() at n/a:n/a
    [6] Fiber->resume() at revolt-playground/vendor/revolt/event-loop/src/EventLoop/Internal/AbstractDriver.php:497
    [7] Revolt\EventLoop\Internal\AbstractDriver->invokeCallbacks() at revolt-playground/vendor/revolt/event-loop/src/EventLoop/Internal/AbstractDriver.php:553
    [8] Revolt\EventLoop\Internal\AbstractDriver->Revolt\EventLoop\Internal\{closure}() at n/a:n/a
    [9] Fiber->resume() at revolt-playground/vendor/revolt/event-loop/src/EventLoop/Internal/AbstractDriver.php:94
    [10] Revolt\EventLoop\Internal\AbstractDriver->Revolt\EventLoop\Internal\{closure}() at revolt-playground/vendor/revolt/event-loop/src/EventLoop/Internal/DriverSuspension.php:117
    [11] Revolt\EventLoop\Internal\DriverSuspension->suspend() at revolt-playground/Drupal/Core/Async/functions.php:55
    [12] Drupal\Core\Async\stream() at revolt-playground/src/Demo/Stacktrace/Command.php:28
    [13] Kingdutch\RevoltPlayground\Demo\Stacktrace\Command->execute() at revolt-playground/vendor/symfony/console/Command/Command.php:279
    [14] Symfony\Component\Console\Command\Command->run() at revolt-playground/vendor/symfony/console/Application.php:1031
    [15] Symfony\Component\Console\Application->doRunCommand() at revolt-playground/vendor/symfony/console/Application.php:318
    [16] Symfony\Component\Console\Application->doRun() at revolt-playground/vendor/symfony/console/Application.php:169
    [17] Symfony\Component\Console\Application->run() at revolt-playground/index.php:21
    

    1 and 2 is where the actual exception occurs.
    3 is the indirection of wrapping our function call in a closure to be able to prepare it as a deferrable operation.
    4-11 is Revolt internals which take care of the scheduling of the asynchronous operations.
    12 is where we loop over the operations (thus invoking them) using stream
    13-17 is Symfony's application and console command handling.

  • πŸ‡¬πŸ‡§United Kingdom catch

    Did a first pass review of the MR here. The BigPipe tests caught loads of bugs when I was working on the Fibers implementation, so them passing here makes me very confident this is all working as expected too. It definitely reduces a lot of duplication of the Fiber looping.

    The Revolt API looks 'new' and it's a bit bigger than the raw PHP Fibers API, however having worked on the initial Fibers implementation, first me until I actually tried to use it, then loads of other people I talked to about it, got very confused about what Fiber::suspend() and Fiber::resume() etc. actually do, so having a more commonly used API with its own high level docs around some of those things will probably help.

    Additionally, hardly anyone is going to be writing async code for Drupal. The actual async code we're likely to add in core will be in ✨ [PP-1] Async database query + fiber support Active and once that's added, Views should use it ✨ [PP-1] Add async query execution support to views Active , and that will mean it runs a lot but without anyone needing to write async queries themselves let alone a separate async implementation. There are other use cases (Guzzle http requests, sending e-mails etc.) but not that many because it needs to be an i/o heavy operation to be worthwhile.

  • Pipeline finished with Success
    8 months ago
    Total: 572s
    #118452
  • Pipeline finished with Failed
    8 months ago
    Total: 204s
    #121598
  • πŸ‡³πŸ‡±Netherlands kingdutch

    I've rebased the code on top of 11.x and re-applied the changes from πŸ“Œ Add revoltphp/event-loop dependency to core Active and ✨ Implement a Result type in Drupal Core Needs review so that their code is up-to-date according to the reviews from those issues.

    By larowlan

    I feel strongly either way about the use of namespaced functions. I would like to see some more documentation here of examples of what operations might be.

    e.g. some async, some delayed, some deferred

    I'm not entirely sure what examples to provide. Delayed and Deferred functions are examples of async functions. The semantics of what is "delayed" vs "deferred" are those matching with the Revolt Event Loop.

    There is no hard requirement that an operation is actually asynchronous either. In case the operations themselves are actually synchronous then the form of stream (by scheduling with EventLoop::defer) ensures that other operations can still run in between synchronous operations as needed. In case the operations themselves suspend then thanks to the event loop this automatically provides places for other work to occur. The important part for stream is that we want to process tasks that are done as soon as possible, so that they may be out of order if a task started first takes longer.

    I created an example of this in kingdutch/revolt-playground github repository. One way to show what would happen for synchronous operations would be to extend that example with placeHolderSync that uses sleep for a certain period of time, rather than using the EventLoop::delay function.

    I hope that example can help clarify of what kind of additional documentation you'd like to see :D

  • Pipeline finished with Success
    8 months ago
    Total: 610s
    #121603
  • πŸ‡¨πŸ‡¦Canada Charlie ChX Negyesi 🍁Canada
        EventLoop::defer(function () use ($suspension, $key, $operation) {
          try {
            $result = $operation();
            $suspension->resume([$key, Result::ok($result)]);
          }
          catch (\Throwable $e) {
            $suspension->resume([$key, Result::error($e)]);
          }
        });
    

    I really dislike this pattern. I complained in Slack this doesn't feel like PHP to me and later when digging around for the event dispatcher issue have I found https://www.drupal.org/project/drupal/issues/1972300#comment-9216241 β†’ which expresses the exact same concern about the same pattern.

    But whether it's typical to PHP or not, it's certainly alien to Drupal.

    And we have a lot of examples of what happens when brilliant people (and Kingdutch is one without a doubt) introduce patterns familiar to them.

    We see it in Views which was literally written by a wizard (hi Merlin :) ) and it is still a PHP4 codebase mostly because no one has the slightest idea any more what's going in there. If you disagree, please document what items are in MultiItemsFieldHandlerInterface , thanks :)

    We see it in entity query where Tables.php is an eldritch abomination written by me (sigh) and the moment I stopped working on πŸ› Entity field relationship queries for multi column field items stored in shared tables are broken Needs work the issue died.

    We see it in Typed Data where the number of bugfixes per year is below one (not counting the mechanical upkeep with latest upstream/Drupal). And that's not because it's bug free.

    The moment Kingdutch stops working on this, this component will die too.

    This is why I object.

  • πŸ‡¬πŸ‡§United Kingdom catch

    Worse, it builds on fibers which are, by themselves, a very new concept and not at all familiar to anyone.

    Fibers was added to PHP 8.1, which was released in 2021, we're now three years past that.

    PDO was added to PHP in 2004, the dbtng issue was committed in 2008, four years later.

    The diffstat here is 400 lines, the dbtng patch was nearly 400kb. Every contrib module had to rewrite their database queries due to dbtng, exactly zero contrib modules would be required to change anything because of the change here.

    It's true that there are some large parts of core that are either not well maintained, or are mostly only receiving bugfixes - but you could have said that about BigPipe until the Fibers patch landed too. Part of maintaining things is updating them to take advantage of new language features etc.

    And the worst thing is, it won't be core that will go sideways, no it'll be when you have a large codebase five years down the line having thirty calls to stream() -- what then?

    We're already using Fibers in core, but in literally two places, which are mutually exclusive on runtime - the renderer, and the BigPipe renderer that replaces it when you enable big_pipe module. There is one other place that we might add a fiber loop/stream call, which is in πŸ“Œ Add a cache prewarm API and use it to distribute cache rebuids after cache clears / during stampedes Needs work .

    I do not think any codebase will find thirty places to call stream() in fact even for complex code-bases, I would expect the number of places they would call it themselves would be between zero and 1.

    That's because by using the existing render placeholder system (and the cache prewarm fallback) as the entry point, there are few other opportunities to do so (and that's what makes the bigpipe implementation so encouraging from a cold cache performance, once we have async database query support to actually exploit it). What we will eventually do is find more places to call Fiber::suspend() or the revolt equivalent but that's also not going to be many.

    The exception would be very low-level, cli-only processes - for example something like aggregator module refreshing feeds on cron, or maybe update status's XML parsing - but those are inherently self-contained operations that aren't happening during normal page rendering, so the arbitrary callable issue doesn't really come up there in the same way.

Production build 0.71.5 2024