Exceptions in batch no longer are shown on the page when Javascript is disabled

Issue created by @tedbow
Merge request !4957Issue #3392196: Exceptions in batch no longer are shown on the page → (Open) created by tedbow
Open on Drupal.org →
Environment: PHP 8.2 & MySQL 8
last update almost 2 years ago
Not currently mergeable.
Open in Jenkins → Open on Drupal.org →
Environment: PHP 8.2 & MySQL 8
last update almost 2 years ago
Custom Commands Failed
Comment almost 2 years ago →
🇺🇸United States tedbow Ithaca, NY, USA
uploading a patch version to demonstrate that this passes against 10.1.x 🤞🏻
Open in Jenkins → Open on Drupal.org →
Environment: PHP 8.2 & MySQL 8
last update almost 2 years ago
CI aborted
Open in Jenkins → Open on Drupal.org →
Environment: PHP 8.2 & MySQL 8
last update almost 2 years ago
2 fail
Comment almost 2 years ago →
🇺🇸United States tedbow Ithaca, NY, USA
Open in Jenkins → Open on Drupal.org →
Environment: PHP 8.2 & MySQL 8
last update almost 2 years ago
8 pass
Comment almost 2 years ago →
🇺🇸United States tedbow Ithaca, NY, USA
Open in Jenkins → Open on Drupal.org →
Environment: PHP 8.2 & MySQL 8
last update almost 2 years ago
CI error
Open in Jenkins → Open on Drupal.org →
Environment: PHP 8.2 & MySQL 8
last update almost 2 years ago
CI error
Comment almost 2 years ago →
🇺🇸United States tedbow Ithaca, NY, USA
Open in Jenkins → Open on Drupal.org →
Environment: PHP 8.2 & MySQL 8
last update almost 2 years ago
CI error
Open in Jenkins → Open on Drupal.org →
Environment: PHP 8.2 & MySQL 8
last update almost 2 years ago
CI error
Open in Jenkins → Open on Drupal.org →
Environment: PHP 8.2 & MySQL 8
last update almost 2 years ago
CI error
Open in Jenkins → Open on Drupal.org →
Environment: PHP 8.2 & MySQL 8
last update almost 2 years ago
CI error
Open in Jenkins → Open on Drupal.org →
Environment: PHP 8.2 & MySQL 8
last update almost 2 years ago
Build Successful
Open in Jenkins → Open on Drupal.org →
Environment: PHP 8.2 & MySQL 8
last update almost 2 years ago
3 pass, 3 fail
Status changed to Needs review almost 2 years ago7:40pm 16 October 2023
Open in Jenkins → Open on Drupal.org →
Environment: PHP 8.2 & MySQL 8
last update almost 2 years ago
Build Successful
Comment almost 2 years ago →
🇺🇸United States tedbow Ithaca, NY, USA
Open in Jenkins → Open on Drupal.org →
Environment: PHP 8.2 & MySQL 8
last update almost 2 years ago
Build Successful
Comment almost 2 years ago →
System Message
The last submitted patch, 7: 3392196-exceptions-in-batch-7.patch, failed testing. View results →
Status changed to Needs work almost 2 years ago7:50pm 16 October 2023
Comment almost 2 years ago →
🇺🇸United States tedbow Ithaca, NY, USA
Open in Jenkins → Open on Drupal.org →
Environment: PHP 8.2 & MySQL 8
last update almost 2 years ago
515 pass, 1 fail
Open in Jenkins → Open on Drupal.org →
Environment: PHP 8.2 & MySQL 8
last update almost 2 years ago
491 pass
Open in Jenkins → Open on Drupal.org →
Environment: PHP 8.2 & MySQL 8
last update almost 2 years ago
512 pass, 3 fail
Comment almost 2 years ago →
🇺🇸United States xjm
This blocks AU in core and is therefore critical.
Comment almost 2 years ago →
🇺🇸United States tedbow Ithaca, NY, USA
@xjm ok thanks. I thought it might be critical

i updated the summary to explain the different test failures on the different branches

I started to debug this but so far I have not figured it out.
Comment almost 2 years ago →
🇺🇸United States phenaproxima Massachusetts
@tedbow and I spent some time tracking this down with git bisect. We discovered that #3295790] broke the functional test, but not the functional JavaScript test. That leads us to believe there are actually two bugs here.

We don't know what's breaking the functional JS test, but we did discover that, in the functional test, the problem is that, once you start the batch job, the output from Drupal is just...cut off. You get an incomplete snippet of an HTML page. If you modify ProcessingTest so that it looks like this:

+ $edit = ['batch' => 'batch_8']; + $this->drupalGet('batch-test'); + $this->submitForm($edit, 'Submit'); + $this->assertSession()->assertNoEscaped('<'); + print_r($this->getSession()->getPage()->getContent());
You'll see the first bit of an HTML page, just cut off somewhere.

You can do one of two things to make the test pass:

Disable \Drupal\Core\EventSubscriber\FinishResponseSubscriber::setContentLengthHeader() (modify getSubscribedEvents() to do this).

In _batch_progress_page(), remove the call to ob_start().

In conclusion, we don't know what the actual bug is here...but it seems to have something to do with output buffering initiated by the batch system not playing nicely with what's happening in FinishResponseSubscriber::setContentLengthHeader()...if the batch operation callback raises an exception.

We're hoping someone who knows more than us can quickly identify the nature of the problem, based on what we've discovered here.
Comment almost 2 years ago →
🇺🇸United States phenaproxima Massachusetts
Comment almost 2 years ago →
🇺🇸United States phenaproxima Massachusetts
I did some digging around through the core HTTP system and I discovered something. This gets right to the heart of how the kernel works.

I thought a bit about what I said in #12:

You'll see the first bit of an HTML page, just cut off somewhere

I also noticed what happens when there's an exception while producing a response -- other event subscribers are invoked, handling the \Symfony\Component\HttpKernel\KernelEvents::EXCEPTION event.

\Drupal\Core\EventSubscriber\FinalExceptionSubscriber::onException() is one of the handlers invoked. It does this:

$content = $this->t('The website encountered an unexpected error. Try again later.'); $content .= $message ? '<br><br>' . $message : ''; $response = new Response($content, 500, ['Content-Type' => $content_type]); // ...some probably-irrelevant stuff $event->setResponse($response);
So it's replacing the response we were going to send with a new response containing the details of the exception.

That made me wonder: is it possible that the Content-Length header is being set using the exception response, rather than the actual response?? That would explain the discrepancy.

To test this, I did this in \Drupal\Core\EventSubscriber\FinishResponseSubscriber::setContentLengthHeader():

if ($response instanceof StreamedResponse || $response->getStatusCode() >= 400) { return; }
...and lo, the functional test passed.

And also, this might help explain why removing the ob_start() call in _batch_progress_page() fixes it -- because output buffering still sends the headers right away! PHP's ob_start documentation is subtle, but clear on this point (emphasis mine):

This function will turn output buffering on. While output buffering is active no output is sent from the script (other than headers), instead the output is stored in an internal buffer.

Content-Length is one of the headers. So maybe that's going out early, with a too-small value that it's deriving from the exception response.

I know these thoughts aren't very organized, but this might explain, at least partially, what's going on.
Comment almost 2 years ago →
🇺🇸United States phenaproxima Massachusetts
Crediting @tedbow for writing the test coverage, and myself for the debugging efforts.

🇺🇸United States phenaproxima Massachusetts

Did a little more investigating:

I modified \Drupal\Core\EventSubscriber\FinalExceptionSubscriber::onException() to add a X-Testing: yesh header to the response.

Then I put this into ProcessingTest::testBatchForm():

print_r($this->getSession()->getResponseHeaders());
print_r($this->getSession()->getPage()->getContent());
$this->assertSession()->linkExists('the error page');

Here's the test output:

Array
(
    [Server] => Array
        (
            [0] => nginx/1.25.2
        )

    [Content-Type] => Array
        (
            [0] => text/html; charset=UTF-8
        )

    [Transfer-Encoding] => Array
        (
            [0] => chunked
        )

    [Connection] => Array
        (
            [0] => keep-alive
        )

    [X-Powered-By] => Array
        (
            [0] => PHP/8.1.24
        )

    [Cache-Control] => Array
        (
            [0] => must-revalidate, no-cache, private
        )

    [Date] => Array
        (
            [0] => Fri, 20 Oct 2023 14:24:01 GMT
        )

    [X-Testing] => Array
        (
            [0] => yesh
        )

    [Content-language] => Array
        (
            [0] => en
        )

    [X-Content-Type-Options] => Array
        (
            [0] => nosniff
        )

    [X-Frame-Options] => Array
        (
            [0] => SAMEORIGIN
        )

    [Expires] => Array
        (
            [0] => Sun, 19 Nov 1978 05:00:00 GMT
        )

    [X-Generator] => Array
        (
            [0] => Drupal 11 (https://www.drupal.org)
        )

)
<!DOCTYPE html>
<html lang="en" dir="ltr">
  <head>
    <meta charset="utf-8" />
<meta name="Generator" content="Drupal 11 (https://www.drupal.org)" />
<meta name="MobileOptimized" content="width" />
<meta name="HandheldFriendly" content="true" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<link rel="icon" href="/core/misc/favicon.ico" type="image/vnd.microsoft.icon" />

    <title>Processing | Drupal</title>
    <link rel="stylesheet" media="all" href="/core/themes/stable9/css/system/components/ajax-progress.module.css?s2tzz3" />
<link rel="stylesheet" media="all" href="/core/themes/stable9/css/system/components/align.module.css?s2tzz3" />
<link rel="stylesheet" media="all" href="/core/themes/stable9/css/system/components/autocomplete-loading.module.css?s2tzz3" />
<link rel="stylesheet" media="all" href="/core/themes/stable9/css/system/components/fieldgroup.module.css?s2tzz3" />
<link rel="stylesheet" media="all" href="/core/themes/stable9/css/system/components/container-inline.module.css?s2tzz3" />
<link rel="stylesheet" media="all" href="/core/themes/stable9/css/system/components/clearfix.module.css?s2tzz3" />
<link rel="stylesheet" media="all" href="/core/themes/stable9/css/system/components/details.module.css?s2tzz3" />
<link rel="stylesheet" media="all" href="/core/themes/stable9/css/system/components/hidden.module.css?s2tzz3" />
<link rel="stylesheet" media="all" href="/core/themes/stable9/css/system/components/item-list.module.css?s2tzz3" />
<link rel="stylesheet" media="all" href="/core/themes/stable9/css/system/components/js.module.css?s2tzz3" />
<link rel="stylesheet" media="all" href="/core/themes/stable9/css/system/components/nowrap.module.css?s2tzz3" />
<link rel="stylesheet" media="all" href="/core/themes/stable9/css/system/components/position-container.module.css?s2tzz3" />
<link rel="stylesheet" media="all" href="/core/themes/stable9/css/system/components/progress.module.css?s2tzz3" />
<link r

(Yes, that's the end of the output from the server. It just stops there.)

So what I'm seeing here is the output of the regular batch page, but the headers from the exception response!

Is the output buffer getting flushed in some way before the exception response is handed to the kernel?

Comment almost 2 years ago →
🇺🇸United States phenaproxima Massachusetts
I finally have a complete, and satisfying, explanation for what is breaking the functional test!

First of all, this may not actually be breaking real sites in the real world. I don't know if you'd be able to reproduce this in manual testing (I doubt it).

Why? Because we are running afoul of a quirk in cURL -- it enforces the Content-Length header. What this means that if the server tells cURL "hey, I'm about to send you 500 bytes", but you actually send it 1000, cURL will give you back the first 500 bytes, and throw out the rest! (There does not seem to be a way to override this behavior from PHP.)

Now, previously Drupal didn't send the Content-Length header with its responses. That changed in 🐛 Post-response task running (destructable services) are actually blocking; add test coverage and warn for common misconfiguration Fixed .

The problem is that the new \Drupal\Core\EventSubscriber\FinishResponseSubscriber::setContentLengthHeader() event handler doesn't account for the possibility that output buffering might be active, and that there might already be stuff in the output buffer!

Anything in the output buffer will already be sent in the response. The batch system explicitly activates output buffering in _batch_process_page(), specifically for error handling, and sends some output. It's a very old and clunky form of error handling. (yes, very old - the lines concerned were written in #127539: batch - progressive operations (à la update.php) → , which was committed by Gábor in 2007. The year I started using Drupal. Sheesh.)

Thus, the sequence of events emerges:

You start your batch job. _batch_process_page() turns on output buffering, and puts some output into it:

ob_start(); // ...some other stuff... print $fallback;

The batch operation throws an exception!

The exception bubbles up to the Drupal kernel, setting off a new chain of processing that eventually ends up in \Drupal\Core\EventSubscriber\FinalExceptionSubscriber::onException(), which in turn replaces the entire response with the standard HTTP 500 error ("the website has encountered an unexpected error...") that we know and love.

The response, having been replaced, is handed off to the response event subscribers. Eventually we enter \Drupal\Core\EventSubscriber\FinishResponseSubscriber::setContentLengthHeader(), which dutifully sets the Content-Length header...but it's using the length of the exception response, not the original response.

Which, okay...except that, because output buffering is active, and there's stuff in the buffer we've effectively already sent part of the original response!

Therefore, we are actually sending a too-small Content-Length header.

In the functional test, cURL -- which is the engine underlying all requests made to the test site -- dutifully receives the Content-Length header, and reads that many bytes from Drupal, disregarding everything after that.

But since the Content-Length isn't accounting for the already-sent data (the stuff that was in the output buffer)...it's a lie. cURL cuts it off, the response is malformed, and the test fails hard.

This kind of shit is why they pay me the big bucks.

I think the proper fix, at least for the functional test, is to make \Drupal\Core\EventSubscriber\FinishResponseSubscriber::setContentLengthHeader() account for the possibility that output buffering is active. You can do this:

$length = strlen($response->getContent()); if (ob_get_level()) { $length += strlen(ob_get_contents()); } $response->headers->set('Content-Length', $length, TRUE);
...and the test passes.
Comment almost 2 years ago →
🇨🇭Switzerland znerol
Let's get that straight:

The _batch_process() function is called repeatedly in order to perform the work of a batch.

If the browser supports JavaScript (i.e., pretty much always), _batch_process() is called from _batch_do().

If the browser doesn't support JavaScript, _batch_process() is called from _batch_page()

In the non-JS case (i.e., the fallback case which virtually never happens on production), _batch_page() tries very much to present a prettier error page for fatal errors than the rest of Drupal.

Is that more or less correct? Couldn't we just remove that buggy fallback error mechanism from _batch_page()? Basically anything between (and including) ob_start() and ob_end_clean() and replace it with [$percentage, $message, $label] = _batch_process();.
Comment almost 2 years ago →
🇺🇸United States phenaproxima Massachusetts
That seems correct to me, but I am absolutely not an authority on the batch system, so I don't know if it's appropriate to remove that error handling code. I'd like to see it go, personally, but a more knowledgeable person than I should make the final decision.

I guess the downside is that, if the batch job fails outright, you don't get any kind of error page - I believe you would just get a WSOD if error reporting was silenced. (But we could confirm or deny that with a test -- and the batch system's test coverage is lacking, which is why this problem was not caught before.)

The upshot here is that adjusting the batch system would not solve the root problem, which is that the Content-Length provided by FinishResponseSubscriber can be flat-out wrong if an uncaught exception is thrown while output buffering is activated.
Comment almost 2 years ago →
🇨🇭Switzerland znerol
I agree @phenaproxima, FinishResponseSubscriber::setContentLengthHeader() was introduced to optimize the performance of pages where some significant workload is executing in the terminate event. For sure we do not lose too much if this optimization is only applied to successful page loads.
Comment almost 2 years ago →
🇺🇸United States bnjmnm Ann Arbor, MI
Info on the FunctionalJavaScript test failure:

The test failure is in fact due to a change introduced 🐛 Ajax errors are not communicated through the UI Fixed .

Prior to that change, AJAX errors were only reported in the JS console. Nothing reported in the Drupal UI. This was addressed by adding core/drupal.message as a dependency to core/drupal.ajax

Apparently core/drupal.message assumes theres a '#type' => 'status_messages' render element on the page. \Drupal\Tests\system\FunctionalJavascript\Batch\ProcessingTest is failing because it expects the markup that would be provided by the status_messages render element.

Clearly something needs to be addressed in core/drupal.message since JS libraries can't declare dependencies on render elements. It's safe to say that isn't something that needs to be addressed in this issue's scope. Adding a '#type' => 'status_messages' element to the test page should get the FJS test working, and the underlying issue has its own issue now 🐛 The core/drupal.message library requires a status_messages render element Active
Comment almost 2 years ago →
🇺🇸United States phenaproxima Massachusetts
Open in Jenkins → Open on Drupal.org →
Environment: PHP 8.2 & MySQL 8
last update almost 2 years ago
CI error
Comment almost 2 years ago →
🇺🇸United States tedbow Ithaca, NY, USA
@bnjmnm thanks for the explanation!

Adding a '#type' => 'status_messages' element to the test page should get the FJS test working,

We actually aren't using a test page this if failing on the regular batch system. We just use a test form to start the batch process. But then it gets directed to the core batch controller. Thats is why this would cause a problem any core or contrib usage of the batch form system when JS enabled(also without JS but the other reasons described on this issue), any exception that happened in batch callbacks would no longer show up unless they specifically were caught and handled in the callbacks.

So this just needed '#type' => 'status_messages' in \Drupal\system\Controller\BatchController::batchPage see https://git.drupalcode.org/project/drupal/-/merge_requests/4957/diffs#9d... in the MR

This gets the JS test passing for locally. I pushed up this change to see if we have any core tests that will fail if the status message element. I wouldn't think so but you never know.

I think we should make fixing the JS version its own issue if it is really this simple
Comment almost 2 years ago →
🇺🇸United States phenaproxima Massachusetts
Since it's not super clear how to fix this yet...this need not be a hard blocker for Automatic Updates.

Automatic Updates could replace the finish_response_subscriber with its own version that disables the Content-Length header if output buffering is enabled. Then when this issue is resolved, it could remove its special subscriber and that would be that.
Comment almost 2 years ago →
🇬🇧United Kingdom catch
If we revert 🐛 Post-response task running (destructable services) are actually blocking; add test coverage and warn for common misconfiguration Fixed the workaround shouldn't be necessary, but we could potentially keep this open specifically to change the batch error handling.
Comment almost 2 years ago →
🇺🇸United States phenaproxima Massachusetts
Open in Jenkins → Open on Drupal.org →
Environment: PHP 8.2 & MySQL 8
last update almost 2 years ago
30,420 pass, 8 fail
Comment over 1 year ago →
🇺🇸United States tedbow Ithaca, NY, USA
Created 🐛 Exceptions in batch no longer are shown on the page when Javascript is disabled Needs work for the javascript issue
Comment over 1 year ago →
🇺🇸United States tedbow Ithaca, NY, USA
wrong issue
Comment over 1 year ago →
🇺🇸United States tedbow Ithaca, NY, USA
Updating the title to reflect this issue only affects non-JS users

🐛 Exceptions in batch no longer are shown on the page: Javascript error Fixed handles JS users

but this issue still would affect all non-JS test, which we have a lot of in ✨ Add Alpha level Experimental Automatic Updates module Needs work
Comment over 1 year ago →
🇬🇧United Kingdom catch
There's a green MR on 🐛 Only set content-length header in specific situations Fixed now, is that issue sufficient to unblock automatic updates, or is there more here that's not related to the content length changs?
Status changed to Postponed over 1 year ago9:20am 31 December 2023
Comment over 1 year ago →
🇬🇧United Kingdom catch
Postponing this on 🐛 Only set content-length header in specific situations Fixed but assuming that issue fixes this bug, we should still keep this issue open to add the additional test coverage.
Status changed to Needs work over 1 year ago9:15pm 4 January 2024
Comment over 1 year ago →
🇺🇸United States tedbow Ithaca, NY, USA
Need to check if the tests now pass since 🐛 Only set content-length header in specific situations Fixed is fixed
Comment over 1 year ago →
🇺🇸United States damienmckenna NH, USA
Is the test coverage the only thing not already committed from 🐛 Only set content-length header in specific situations Fixed ?
Comment 12 months ago →
🇬🇧United Kingdom catch
@tedbow has this fixed the automatic updates tests?

Exceptions in batch no longer are shown on the page when Javascript is disabled

Problem/Motivation

Proposed resolution

Remaining tasks

User interface changes

API changes

Data model changes

Release notes snippet

Merge Requests

!4957Exceptions in batch no longer are shown on the page when Javascript is disabled
Open

Comments & Activities

Exceptions in batch no longer are shown on the page when Javascript is disabled

Problem/Motivation

Proposed resolution

Remaining tasks

User interface changes

API changes

Data model changes

Release notes snippet

Merge Requests

!4957Exceptions in batch no longer are shown on the page when Javascript is disabledOpen

Comments & Activities

!4957Exceptions in batch no longer are shown on the page when Javascript is disabled
Open