Exceptions in batch no longer are shown on the page when Javascript is disabled

Created on 6 October 2023, about 2 years ago
Updated 14 September 2024, about 1 year ago

Problem/Motivation

Discovered while working on Add Alpha level Experimental Automatic Updates module Needs work . We have functional tests for our updates via the form that use the batch system. We test that if there are exception in the process they are shown.

There seems to be 2 problems going on that are demonstrated in the tests that ran on the patch on #9

  1. In 10.0.x both the Functional and FunctionalJavascript tests pass. So the functionality works with both non-js and js users
  2. In 10.1.x the Functional tests pass but the FunctionalJavascript does not pass.

    When testing this manually with a browser that has javascript enabled(unlike the functional test).
    There is javascript error

    Uncaught TypeError: Cannot read properties of null (reading 'removeAttribute')
    at Drupal.Message.defaultWrapper (message.js?v=11.0-dev:43:17)
    at new Drupal.Message (message.js?v=11.0-dev:26:46)
    at new Drupal.AjaxError (ajax.js?v=11.0-dev:184:35)
    at Object.error (progress.js?v=11.0-dev:157:25)
    at c (jquery.min.js?v=3.7.0:2:25266)
    at Object.fireWith [as rejectWith] (jquery.min.js?v=3.7.0:2:26015)
    at l (jquery.min.js?v=3.7.0:2:77746)
    at XMLHttpRequest. (jquery.min.js?v=3.7.0:2:80204)

    Here is the lines

    wrapper = document.querySelector('[data-drupal-messages-fallback]');
            wrapper.removeAttribute('data-drupal-messages-fallback');
            wrapper.setAttribute('data-drupal-messages', '');
            wrapper.classList.remove('hidden');
    

    so it appears document.querySelector('[data-drupal-messages-fallback]'); does not return a result.

  3. On 11.x both tests fail.

    When testing with Javascript disabled
    on it appears to briefly so the progress and then a white screen

Proposed resolution

The functional JavaScript test's failure has a different cause than the functional test's failure. This issue only deals with the functional test. The JavaScript test has its own issue; see #21 for details.

The cause of the functional test failure is a conflict between output buffering and the Content-Length header that gets added to every response as of 🐛 Post-response task running (destructable services) are actually blocking; add test coverage and warn for common misconfiguration Fixed . The good news is, it's probably only breaking tests, not real sites, since the problem also involves a specific quirk of cURL and how it handles Content-Length headers. (Real-world browsers probably do not behave this way, although I'm just speculating here.)

A full explanation of the bug may be found in #17.

The proposed resolution is, as yet, uncertain. There are several possible paths forward, but framework manager review is probably needed to decide the right approach.

Remaining tasks

Test coverage is already written, so we just need to decide on the right fix here and implement it.

User interface changes

None.

API changes

TBD, but none expected.

Data model changes

None.

Release notes snippet

TBD

🐛 Bug report
Status

Needs work

Version

11.0 🔥

Component
Batch 

Last updated about 2 months ago

Created by

🇺🇸United States tedbow Ithaca, NY, USA

Live updates comments and jobs are added and updated live.
  • Needs framework manager review

    It is used to alert the framework manager core committer(s) that an issue significantly impacts (or has the potential to impact) multiple subsystems or represents a significant change or addition in architecture or public APIs, and their signoff is needed (see the governance policy draft for more information). If an issue significantly impacts only one subsystem, use Needs subsystem maintainer review instead, and make sure the issue component is set to the correct subsystem.

Sign in to follow issues

Merge Requests

Comments & Activities

  • Issue created by @tedbow
  • Open on Drupal.org →
    Environment: PHP 8.2 & MySQL 8
    last update about 2 years ago
    Not currently mergeable.
  • last update about 2 years ago
    Custom Commands Failed
  • 🇺🇸United States tedbow Ithaca, NY, USA

    uploading a patch version to demonstrate that this passes against 10.1.x 🤞🏻

  • last update about 2 years ago
    CI aborted
  • last update about 2 years ago
    2 fail
  • 🇺🇸United States tedbow Ithaca, NY, USA
  • last update about 2 years ago
    8 pass
  • 🇺🇸United States tedbow Ithaca, NY, USA
  • last update about 2 years ago
    CI error
  • last update about 2 years ago
    CI error
  • 🇺🇸United States tedbow Ithaca, NY, USA
  • last update about 2 years ago
    CI error
  • last update about 2 years ago
    CI error
  • last update about 2 years ago
    CI error
  • last update about 2 years ago
    CI error
  • last update about 2 years ago
    Build Successful
  • last update about 2 years ago
    3 pass, 3 fail
  • Status changed to Needs review about 2 years ago
  • last update about 2 years ago
    Build Successful
  • 🇺🇸United States tedbow Ithaca, NY, USA
  • last update about 2 years ago
    Build Successful
  • Status changed to Needs work about 2 years ago
  • 🇺🇸United States tedbow Ithaca, NY, USA
  • last update about 2 years ago
    515 pass, 1 fail
  • last update about 2 years ago
    491 pass
  • last update about 2 years ago
    512 pass, 3 fail
  • 🇺🇸United States xjm

    This blocks AU in core and is therefore critical.

  • 🇺🇸United States tedbow Ithaca, NY, USA

    @xjm ok thanks. I thought it might be critical

    i updated the summary to explain the different test failures on the different branches

    I started to debug this but so far I have not figured it out.

  • 🇺🇸United States phenaproxima Massachusetts

    @tedbow and I spent some time tracking this down with git bisect. We discovered that #3295790] broke the functional test, but not the functional JavaScript test. That leads us to believe there are actually two bugs here.

    We don't know what's breaking the functional JS test, but we did discover that, in the functional test, the problem is that, once you start the batch job, the output from Drupal is just...cut off. You get an incomplete snippet of an HTML page. If you modify ProcessingTest so that it looks like this:

    +    $edit = ['batch' => 'batch_8'];
    +    $this->drupalGet('batch-test');
    +    $this->submitForm($edit, 'Submit');
    +    $this->assertSession()->assertNoEscaped('<');
    +    print_r($this->getSession()->getPage()->getContent());
    

    You'll see the first bit of an HTML page, just cut off somewhere.

    You can do one of two things to make the test pass:

    • Disable \Drupal\Core\EventSubscriber\FinishResponseSubscriber::setContentLengthHeader() (modify getSubscribedEvents() to do this).
    • In _batch_progress_page(), remove the call to ob_start().

    In conclusion, we don't know what the actual bug is here...but it seems to have something to do with output buffering initiated by the batch system not playing nicely with what's happening in FinishResponseSubscriber::setContentLengthHeader()...if the batch operation callback raises an exception.

    We're hoping someone who knows more than us can quickly identify the nature of the problem, based on what we've discovered here.

  • 🇺🇸United States phenaproxima Massachusetts
  • 🇺🇸United States phenaproxima Massachusetts

    I did some digging around through the core HTTP system and I discovered something. This gets right to the heart of how the kernel works.

    I thought a bit about what I said in #12:

    You'll see the first bit of an HTML page, just cut off somewhere

    I also noticed what happens when there's an exception while producing a response -- other event subscribers are invoked, handling the \Symfony\Component\HttpKernel\KernelEvents::EXCEPTION event.

    \Drupal\Core\EventSubscriber\FinalExceptionSubscriber::onException() is one of the handlers invoked. It does this:

        $content = $this->t('The website encountered an unexpected error. Try again later.');
        $content .= $message ? '<br><br>' . $message : '';
        $response = new Response($content, 500, ['Content-Type' => $content_type]);
    
        // ...some probably-irrelevant stuff
    
        $event->setResponse($response);
    

    So it's replacing the response we were going to send with a new response containing the details of the exception.

    That made me wonder: is it possible that the Content-Length header is being set using the exception response, rather than the actual response?? That would explain the discrepancy.

    To test this, I did this in \Drupal\Core\EventSubscriber\FinishResponseSubscriber::setContentLengthHeader():

        if ($response instanceof StreamedResponse || $response->getStatusCode() >= 400) {
          return;
        }
    

    ...and lo, the functional test passed.

    And also, this might help explain why removing the ob_start() call in _batch_progress_page() fixes it -- because output buffering still sends the headers right away! PHP's ob_start documentation is subtle, but clear on this point (emphasis mine):

    This function will turn output buffering on. While output buffering is active no output is sent from the script (other than headers), instead the output is stored in an internal buffer.

    Content-Length is one of the headers. So maybe that's going out early, with a too-small value that it's deriving from the exception response.

    I know these thoughts aren't very organized, but this might explain, at least partially, what's going on.

  • 🇺🇸United States phenaproxima Massachusetts

    Crediting @tedbow for writing the test coverage, and myself for the debugging efforts.

  • 🇺🇸United States phenaproxima Massachusetts

    Did a little more investigating:

    1. I modified \Drupal\Core\EventSubscriber\FinalExceptionSubscriber::onException() to add a X-Testing: yesh header to the response.
    2. Then I put this into ProcessingTest::testBatchForm():
      print_r($this->getSession()->getResponseHeaders());
      print_r($this->getSession()->getPage()->getContent());
      $this->assertSession()->linkExists('the error page');
      

    Here's the test output:

    Array
    (
        [Server] => Array
            (
                [0] => nginx/1.25.2
            )
    
        [Content-Type] => Array
            (
                [0] => text/html; charset=UTF-8
            )
    
        [Transfer-Encoding] => Array
            (
                [0] => chunked
            )
    
        [Connection] => Array
            (
                [0] => keep-alive
            )
    
        [X-Powered-By] => Array
            (
                [0] => PHP/8.1.24
            )
    
        [Cache-Control] => Array
            (
                [0] => must-revalidate, no-cache, private
            )
    
        [Date] => Array
            (
                [0] => Fri, 20 Oct 2023 14:24:01 GMT
            )
    
        [X-Testing] => Array
            (
                [0] => yesh
            )
    
        [Content-language] => Array
            (
                [0] => en
            )
    
        [X-Content-Type-Options] => Array
            (
                [0] => nosniff
            )
    
        [X-Frame-Options] => Array
            (
                [0] => SAMEORIGIN
            )
    
        [Expires] => Array
            (
                [0] => Sun, 19 Nov 1978 05:00:00 GMT
            )
    
        [X-Generator] => Array
            (
                [0] => Drupal 11 (https://www.drupal.org)
            )
    
    )
    <!DOCTYPE html>
    <html lang="en" dir="ltr">
      <head>
        <meta charset="utf-8" />
    <meta name="Generator" content="Drupal 11 (https://www.drupal.org)" />
    <meta name="MobileOptimized" content="width" />
    <meta name="HandheldFriendly" content="true" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <link rel="icon" href="/core/misc/favicon.ico" type="image/vnd.microsoft.icon" />
    
        <title>Processing | Drupal</title>
        <link rel="stylesheet" media="all" href="/core/themes/stable9/css/system/components/ajax-progress.module.css?s2tzz3" />
    <link rel="stylesheet" media="all" href="/core/themes/stable9/css/system/components/align.module.css?s2tzz3" />
    <link rel="stylesheet" media="all" href="/core/themes/stable9/css/system/components/autocomplete-loading.module.css?s2tzz3" />
    <link rel="stylesheet" media="all" href="/core/themes/stable9/css/system/components/fieldgroup.module.css?s2tzz3" />
    <link rel="stylesheet" media="all" href="/core/themes/stable9/css/system/components/container-inline.module.css?s2tzz3" />
    <link rel="stylesheet" media="all" href="/core/themes/stable9/css/system/components/clearfix.module.css?s2tzz3" />
    <link rel="stylesheet" media="all" href="/core/themes/stable9/css/system/components/details.module.css?s2tzz3" />
    <link rel="stylesheet" media="all" href="/core/themes/stable9/css/system/components/hidden.module.css?s2tzz3" />
    <link rel="stylesheet" media="all" href="/core/themes/stable9/css/system/components/item-list.module.css?s2tzz3" />
    <link rel="stylesheet" media="all" href="/core/themes/stable9/css/system/components/js.module.css?s2tzz3" />
    <link rel="stylesheet" media="all" href="/core/themes/stable9/css/system/components/nowrap.module.css?s2tzz3" />
    <link rel="stylesheet" media="all" href="/core/themes/stable9/css/system/components/position-container.module.css?s2tzz3" />
    <link rel="stylesheet" media="all" href="/core/themes/stable9/css/system/components/progress.module.css?s2tzz3" />
    <link r
    

    (Yes, that's the end of the output from the server. It just stops there.)

    So what I'm seeing here is the output of the regular batch page, but the headers from the exception response!

    Is the output buffer getting flushed in some way before the exception response is handed to the kernel?

  • 🇺🇸United States phenaproxima Massachusetts

    I finally have a complete, and satisfying, explanation for what is breaking the functional test!

    First of all, this may not actually be breaking real sites in the real world. I don't know if you'd be able to reproduce this in manual testing (I doubt it).

    Why? Because we are running afoul of a quirk in cURL -- it enforces the Content-Length header. What this means that if the server tells cURL "hey, I'm about to send you 500 bytes", but you actually send it 1000, cURL will give you back the first 500 bytes, and throw out the rest! (There does not seem to be a way to override this behavior from PHP.)

    Now, previously Drupal didn't send the Content-Length header with its responses. That changed in 🐛 Post-response task running (destructable services) are actually blocking; add test coverage and warn for common misconfiguration Fixed .

    The problem is that the new \Drupal\Core\EventSubscriber\FinishResponseSubscriber::setContentLengthHeader() event handler doesn't account for the possibility that output buffering might be active, and that there might already be stuff in the output buffer!

    Anything in the output buffer will already be sent in the response. The batch system explicitly activates output buffering in _batch_process_page(), specifically for error handling, and sends some output. It's a very old and clunky form of error handling. (yes, very old - the lines concerned were written in #127539: batch - progressive operations (à la update.php) , which was committed by Gábor in 2007. The year I started using Drupal. Sheesh.)

    Thus, the sequence of events emerges:

    1. You start your batch job. _batch_process_page() turns on output buffering, and puts some output into it:
          ob_start();
          // ...some other stuff...
          print $fallback;
      
    2. The batch operation throws an exception!
    3. The exception bubbles up to the Drupal kernel, setting off a new chain of processing that eventually ends up in \Drupal\Core\EventSubscriber\FinalExceptionSubscriber::onException(), which in turn replaces the entire response with the standard HTTP 500 error ("the website has encountered an unexpected error...") that we know and love.
    4. The response, having been replaced, is handed off to the response event subscribers. Eventually we enter \Drupal\Core\EventSubscriber\FinishResponseSubscriber::setContentLengthHeader(), which dutifully sets the Content-Length header...but it's using the length of the exception response, not the original response.
    5. Which, okay...except that, because output buffering is active, and there's stuff in the buffer we've effectively already sent part of the original response!
    6. Therefore, we are actually sending a too-small Content-Length header.
    7. In the functional test, cURL -- which is the engine underlying all requests made to the test site -- dutifully receives the Content-Length header, and reads that many bytes from Drupal, disregarding everything after that.
    8. But since the Content-Length isn't accounting for the already-sent data (the stuff that was in the output buffer)...it's a lie. cURL cuts it off, the response is malformed, and the test fails hard.

    This kind of shit is why they pay me the big bucks.

    I think the proper fix, at least for the functional test, is to make \Drupal\Core\EventSubscriber\FinishResponseSubscriber::setContentLengthHeader() account for the possibility that output buffering is active. You can do this:

        $length = strlen($response->getContent());
        if (ob_get_level()) {
          $length += strlen(ob_get_contents());
        }
        $response->headers->set('Content-Length', $length, TRUE);
    

    ...and the test passes.

  • 🇨🇭Switzerland znerol

    Let's get that straight:

    • The _batch_process() function is called repeatedly in order to perform the work of a batch.
    • If the browser supports JavaScript (i.e., pretty much always), _batch_process() is called from _batch_do().
    • If the browser doesn't support JavaScript, _batch_process() is called from _batch_page()
    • In the non-JS case (i.e., the fallback case which virtually never happens on production), _batch_page() tries very much to present a prettier error page for fatal errors than the rest of Drupal.

    Is that more or less correct? Couldn't we just remove that buggy fallback error mechanism from _batch_page()? Basically anything between (and including) ob_start() and ob_end_clean() and replace it with [$percentage, $message, $label] = _batch_process();.

  • 🇺🇸United States phenaproxima Massachusetts

    That seems correct to me, but I am absolutely not an authority on the batch system, so I don't know if it's appropriate to remove that error handling code. I'd like to see it go, personally, but a more knowledgeable person than I should make the final decision.

    I guess the downside is that, if the batch job fails outright, you don't get any kind of error page - I believe you would just get a WSOD if error reporting was silenced. (But we could confirm or deny that with a test -- and the batch system's test coverage is lacking, which is why this problem was not caught before.)

    The upshot here is that adjusting the batch system would not solve the root problem, which is that the Content-Length provided by FinishResponseSubscriber can be flat-out wrong if an uncaught exception is thrown while output buffering is activated.

  • 🇨🇭Switzerland znerol

    I agree @phenaproxima, FinishResponseSubscriber::setContentLengthHeader() was introduced to optimize the performance of pages where some significant workload is executing in the terminate event. For sure we do not lose too much if this optimization is only applied to successful page loads.

  • 🇺🇸United States bnjmnm Ann Arbor, MI

    Info on the FunctionalJavaScript test failure:

    • The test failure is in fact due to a change introduced 🐛 Ajax errors are not communicated through the UI Fixed .
    • Prior to that change, AJAX errors were only reported in the JS console. Nothing reported in the Drupal UI. This was addressed by adding core/drupal.message as a dependency to core/drupal.ajax
    • Apparently core/drupal.message assumes theres a '#type' => 'status_messages' render element on the page. \Drupal\Tests\system\FunctionalJavascript\Batch\ProcessingTest is failing because it expects the markup that would be provided by the status_messages render element.

    Clearly something needs to be addressed in core/drupal.message since JS libraries can't declare dependencies on render elements. It's safe to say that isn't something that needs to be addressed in this issue's scope. Adding a '#type' => 'status_messages' element to the test page should get the FJS test working, and the underlying issue has its own issue now 🐛 The core/drupal.message library requires a status_messages render element Active

  • 🇺🇸United States phenaproxima Massachusetts
  • last update about 2 years ago
    CI error
  • 🇺🇸United States tedbow Ithaca, NY, USA

    @bnjmnm thanks for the explanation!

    Adding a '#type' => 'status_messages' element to the test page should get the FJS test working,

    We actually aren't using a test page this if failing on the regular batch system. We just use a test form to start the batch process. But then it gets directed to the core batch controller. Thats is why this would cause a problem any core or contrib usage of the batch form system when JS enabled(also without JS but the other reasons described on this issue), any exception that happened in batch callbacks would no longer show up unless they specifically were caught and handled in the callbacks.

    So this just needed '#type' => 'status_messages' in \Drupal\system\Controller\BatchController::batchPage see https://git.drupalcode.org/project/drupal/-/merge_requests/4957/diffs#9d... in the MR

    This gets the JS test passing for locally. I pushed up this change to see if we have any core tests that will fail if the status message element. I wouldn't think so but you never know.

    I think we should make fixing the JS version its own issue if it is really this simple

  • 🇺🇸United States phenaproxima Massachusetts

    Since it's not super clear how to fix this yet...this need not be a hard blocker for Automatic Updates.

    Automatic Updates could replace the finish_response_subscriber with its own version that disables the Content-Length header if output buffering is enabled. Then when this issue is resolved, it could remove its special subscriber and that would be that.

  • 🇬🇧United Kingdom catch

    If we revert 🐛 Post-response task running (destructable services) are actually blocking; add test coverage and warn for common misconfiguration Fixed the workaround shouldn't be necessary, but we could potentially keep this open specifically to change the batch error handling.

  • 🇺🇸United States phenaproxima Massachusetts
  • last update about 2 years ago
    30,420 pass, 8 fail
  • 🇺🇸United States tedbow Ithaca, NY, USA
  • 🇺🇸United States tedbow Ithaca, NY, USA

    wrong issue

  • 🇺🇸United States tedbow Ithaca, NY, USA

    Updating the title to reflect this issue only affects non-JS users

    🐛 Exceptions in batch no longer are shown on the page: Javascript error Fixed handles JS users

    but this issue still would affect all non-JS test, which we have a lot of in Add Alpha level Experimental Automatic Updates module Needs work

  • 🇬🇧United Kingdom catch

    There's a green MR on 🐛 Only set content-length header in specific situations Fixed now, is that issue sufficient to unblock automatic updates, or is there more here that's not related to the content length changs?

  • Status changed to Postponed almost 2 years ago
  • 🇬🇧United Kingdom catch

    Postponing this on 🐛 Only set content-length header in specific situations Fixed but assuming that issue fixes this bug, we should still keep this issue open to add the additional test coverage.

  • Status changed to Needs work almost 2 years ago
  • 🇺🇸United States tedbow Ithaca, NY, USA

    Need to check if the tests now pass since 🐛 Only set content-length header in specific situations Fixed is fixed

  • 🇺🇸United States damienmckenna NH, USA

    Is the test coverage the only thing not already committed from 🐛 Only set content-length header in specific situations Fixed ?

  • 🇬🇧United Kingdom catch

    @tedbow has this fixed the automatic updates tests?

Production build 0.71.5 2024