[PP-1] Ensure asynchronous tasks that run beyond the length of a request have the chance to complete before process exit

Created on 3 March 2024, about 1 year ago

Problem/Motivation

With πŸ“Œ Add revoltphp/event-loop dependency to core Active we enable core and contrib developers to easily kick-off asynchronous tasks. In a lot of use-cases (e.g. BigPipe) where the request code waits for the tasks that were kicked-off to be completed this doesn't matter because the tasks are shorter than the request. However it is also possible to kick-off tasks (e.g. search engine indexing) where the result is not important for the current request.

In the case that the tasks take longer to complete than it takes to send the response for a request and to terminate the kernel, in the current Drupal implementation, the tasks may get discarded as the request process exits.

To demonstrate this and other Revolt related things I've created a Revolt Playground repository on GitHub. For this problem compare the two examples in the Bootstrap demo.

Steps to reproduce

Proposed resolution

Add EventLoop::run() to the end of the index.php file so that async tasks that were started but not yet completed before we get to that line have the opportunity to complete.

This placement is under the assumption that $kernel->terminate($request, $response); does not throw away any data that these asynchronous tasks may need.

Remaining tasks

This is postponed on πŸ“Œ Add revoltphp/event-loop dependency to core Active

  • Agree on the placement of EventLoop::run or optionally provide a way for async tasks to signal they need to complete before $kernel->terminate($request, $response); and provide a synchronisation point in Drupal core
  • Create the merge request
  • Write the release notes

User interface changes

API changes

Data model changes

Release notes snippet

πŸ“Œ Task
Status

Active

Version

11.0 πŸ”₯

Component
BaseΒ  β†’

Last updated 43 minutes ago

Created by

πŸ‡³πŸ‡±Netherlands kingdutch

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Comments & Activities

  • Issue created by @kingdutch
  • πŸ‡³πŸ‡±Netherlands kingdutch

    πŸ“Œ Add revoltphp/event-loop dependency to core Active has been merged so this is no longer blocked.

  • πŸ‡¬πŸ‡§United Kingdom catch

    Briefly discussed this with @Kingdutch in slack. I think the last point in the request where something can or should run is at the very end of DrupalKernel::terminate().

    e.g. a needs termination service like automated cron could (and probably should) start off an async task - like sending an email from a queue could be async and interleave with other things, or cron might call out to something that doesn't know it's running in cron but makes use of async things. So it makes sense to make the point of no return after that rather than before.

  • πŸ‡³πŸ‡±Netherlands kingdutch

    Linking ✨ Use symfony/runtime for less bespoke bootstrap/compatibility with varied runtime environments Active while I try to figure out if it's a requirement or not. The feeling from initial trials of placing EventLoop::run in the kernel is that this shouldn't be a Kernel responsibility, but the runtime itself should handle the event loop.

    This is less obvious in a single request environment which Drupal currently primarily works in. However, it becomes more obvious when extrapolated to other environments like Drush or applications that handle multiple requests where there should still be a single event loop at the root of the application. The DrupalKernel is based on Symfony's HttpKernel whose job it is to "Turn a request into a response". However, this suggests that an application that handles multiple requests might want an instance of the kernel for each of them so that they're properly isolated. This becomes clearer when looking at TerminableInterface which terminates with a specific request/response pair.

    We could likely get to the EventLoop at root by just putting it in the index.php. However, that means everything that deals with Drupal needs to run EventLoop::run itself. That becomes much easier when standardising on a runtime pattern like Symfony's runtime.

  • πŸ‡¬πŸ‡§United Kingdom catch

    We could likely get to the EventLoop at root by just putting it in the index.php. However, that means everything that deals with Drupal needs to run EventLoop::run itself.

    If the event loop is in index.php, does that mean that e.g. drush is required to run it and will break without it, or does it mean that drush would need to implement it to support the event loop? If it's the latter, that seems fine to get things going?

    Alternatively if putting it in the kernel enables it everywhere, it feels like that would then enable us to move it to Symfony runtime transparently later.

  • πŸ‡³πŸ‡±Netherlands kingdutch

    If the event loop is in index.php, does that mean that e.g. drush is required to run it and will break without it, or does it mean that drush would need to implement it to support the event loop? If it's the latter, that seems fine to get things going?

    Anything that ends before the request would be fine. So if you use Revolt to schedule multiple async tasks and then suspend until they're completed, then that works fine. This is the case for most tasks (like async database queries) that are a dependency for the final request to be sent.

    Tasks that run longer than the request lifecycle would be killed by the process exiting because the EventLoop::run() is needed to block the process. This would be the case for things like optimistic cache warmups that were scheduled before the request completed but are not yet done.

    To support those kinds of tasks, Drush would have to implement it itself.

    Alternatively if putting it in the kernel enables it everywhere, it feels like that would then enable us to move it to Symfony runtime transparently later.

    I guess you're right. The biggest issue I had with putting it in the Kernel would be that we'd still block the possibility for people to create a long running process that might handle multiple requests at the same time, since in that case each request would call EventLoop::run() with its kernel, which would conflict.

    With that in mind I prioritised ✨ Use symfony/runtime for less bespoke bootstrap/compatibility with varied runtime environments Active . I hope that there's some support to get that finalised and merged and then this can go on top relatively quickly as an adjustment to DrupalKernelRunner. Drush could then either provide its own runtime for an application, or Drupal could provide one for e.g. ✨ CLI entry point in Drupal Core (Dex) Needs review that Drush could also benefit from.

    After that has landed, they can provide the modified kernel or application and we're free to provide any environment changes that may be needed without requiring them to adjust code.

Production build 0.71.5 2024