- Issue created by @tedbow
- Merge request !4957Issue #3392196: Exceptions in batch no longer are shown on the page → (Open) created by tedbow
- Open on Drupal.org →Environment: PHP 8.2 & MySQL 8last update
about 1 year ago Not currently mergeable. - last update
about 1 year ago Custom Commands Failed - 🇺🇸United States tedbow Ithaca, NY, USA
uploading a patch version to demonstrate that this passes against 10.1.x 🤞🏻
- last update
about 1 year ago CI aborted - last update
about 1 year ago 2 fail - last update
about 1 year ago 8 pass - last update
about 1 year ago CI error - last update
about 1 year ago CI error - last update
about 1 year ago CI error - last update
about 1 year ago CI error - last update
about 1 year ago CI error - last update
about 1 year ago CI error - last update
about 1 year ago Build Successful - last update
about 1 year ago 3 pass, 3 fail - Status changed to Needs review
about 1 year ago 7:40pm 16 October 2023 - last update
about 1 year ago Build Successful - last update
about 1 year ago Build Successful The last submitted patch, 7: 3392196-exceptions-in-batch-7.patch, failed testing. View results →
- Status changed to Needs work
about 1 year ago 7:50pm 16 October 2023 - last update
about 1 year ago 515 pass, 1 fail - last update
about 1 year ago 491 pass - last update
about 1 year ago 512 pass, 3 fail - 🇺🇸United States tedbow Ithaca, NY, USA
@xjm ok thanks. I thought it might be critical
i updated the summary to explain the different test failures on the different branches
I started to debug this but so far I have not figured it out.
- 🇺🇸United States phenaproxima Massachusetts
@tedbow and I spent some time tracking this down with
git bisect
. We discovered that #3295790] broke the functional test, but not the functional JavaScript test. That leads us to believe there are actually two bugs here.We don't know what's breaking the functional JS test, but we did discover that, in the functional test, the problem is that, once you start the batch job, the output from Drupal is just...cut off. You get an incomplete snippet of an HTML page. If you modify ProcessingTest so that it looks like this:
+ $edit = ['batch' => 'batch_8']; + $this->drupalGet('batch-test'); + $this->submitForm($edit, 'Submit'); + $this->assertSession()->assertNoEscaped('<'); + print_r($this->getSession()->getPage()->getContent());
You'll see the first bit of an HTML page, just cut off somewhere.
You can do one of two things to make the test pass:
- Disable
\Drupal\Core\EventSubscriber\FinishResponseSubscriber::setContentLengthHeader()
(modifygetSubscribedEvents()
to do this). - In
_batch_progress_page()
, remove the call toob_start()
.
In conclusion, we don't know what the actual bug is here...but it seems to have something to do with output buffering initiated by the batch system not playing nicely with what's happening in
FinishResponseSubscriber::setContentLengthHeader()
...if the batch operation callback raises an exception.We're hoping someone who knows more than us can quickly identify the nature of the problem, based on what we've discovered here.
- Disable
- 🇺🇸United States phenaproxima Massachusetts
I did some digging around through the core HTTP system and I discovered something. This gets right to the heart of how the kernel works.
I thought a bit about what I said in #12:
You'll see the first bit of an HTML page, just cut off somewhere
I also noticed what happens when there's an exception while producing a response -- other event subscribers are invoked, handling the
\Symfony\Component\HttpKernel\KernelEvents::EXCEPTION
event.\Drupal\Core\EventSubscriber\FinalExceptionSubscriber::onException()
is one of the handlers invoked. It does this:$content = $this->t('The website encountered an unexpected error. Try again later.'); $content .= $message ? '<br><br>' . $message : ''; $response = new Response($content, 500, ['Content-Type' => $content_type]); // ...some probably-irrelevant stuff $event->setResponse($response);
So it's replacing the response we were going to send with a new response containing the details of the exception.
That made me wonder: is it possible that the Content-Length header is being set using the exception response, rather than the actual response?? That would explain the discrepancy.
To test this, I did this in
\Drupal\Core\EventSubscriber\FinishResponseSubscriber::setContentLengthHeader()
:if ($response instanceof StreamedResponse || $response->getStatusCode() >= 400) { return; }
...and lo, the functional test passed.
And also, this might help explain why removing the
ob_start()
call in_batch_progress_page()
fixes it -- because output buffering still sends the headers right away! PHP'sob_start
documentation is subtle, but clear on this point (emphasis mine):This function will turn output buffering on. While output buffering is active no output is sent from the script (other than headers), instead the output is stored in an internal buffer.
Content-Length
is one of the headers. So maybe that's going out early, with a too-small value that it's deriving from the exception response.I know these thoughts aren't very organized, but this might explain, at least partially, what's going on.
- 🇺🇸United States phenaproxima Massachusetts
Crediting @tedbow for writing the test coverage, and myself for the debugging efforts.
- 🇺🇸United States phenaproxima Massachusetts
Did a little more investigating:
- I modified
\Drupal\Core\EventSubscriber\FinalExceptionSubscriber::onException()
to add aX-Testing: yesh
header to the response. - Then I put this into ProcessingTest::testBatchForm():
print_r($this->getSession()->getResponseHeaders()); print_r($this->getSession()->getPage()->getContent()); $this->assertSession()->linkExists('the error page');
Here's the test output:
Array ( [Server] => Array ( [0] => nginx/1.25.2 ) [Content-Type] => Array ( [0] => text/html; charset=UTF-8 ) [Transfer-Encoding] => Array ( [0] => chunked ) [Connection] => Array ( [0] => keep-alive ) [X-Powered-By] => Array ( [0] => PHP/8.1.24 ) [Cache-Control] => Array ( [0] => must-revalidate, no-cache, private ) [Date] => Array ( [0] => Fri, 20 Oct 2023 14:24:01 GMT ) [X-Testing] => Array ( [0] => yesh ) [Content-language] => Array ( [0] => en ) [X-Content-Type-Options] => Array ( [0] => nosniff ) [X-Frame-Options] => Array ( [0] => SAMEORIGIN ) [Expires] => Array ( [0] => Sun, 19 Nov 1978 05:00:00 GMT ) [X-Generator] => Array ( [0] => Drupal 11 (https://www.drupal.org) ) ) <!DOCTYPE html> <html lang="en" dir="ltr"> <head> <meta charset="utf-8" /> <meta name="Generator" content="Drupal 11 (https://www.drupal.org)" /> <meta name="MobileOptimized" content="width" /> <meta name="HandheldFriendly" content="true" /> <meta name="viewport" content="width=device-width, initial-scale=1.0" /> <link rel="icon" href="/core/misc/favicon.ico" type="image/vnd.microsoft.icon" /> <title>Processing | Drupal</title> <link rel="stylesheet" media="all" href="/core/themes/stable9/css/system/components/ajax-progress.module.css?s2tzz3" /> <link rel="stylesheet" media="all" href="/core/themes/stable9/css/system/components/align.module.css?s2tzz3" /> <link rel="stylesheet" media="all" href="/core/themes/stable9/css/system/components/autocomplete-loading.module.css?s2tzz3" /> <link rel="stylesheet" media="all" href="/core/themes/stable9/css/system/components/fieldgroup.module.css?s2tzz3" /> <link rel="stylesheet" media="all" href="/core/themes/stable9/css/system/components/container-inline.module.css?s2tzz3" /> <link rel="stylesheet" media="all" href="/core/themes/stable9/css/system/components/clearfix.module.css?s2tzz3" /> <link rel="stylesheet" media="all" href="/core/themes/stable9/css/system/components/details.module.css?s2tzz3" /> <link rel="stylesheet" media="all" href="/core/themes/stable9/css/system/components/hidden.module.css?s2tzz3" /> <link rel="stylesheet" media="all" href="/core/themes/stable9/css/system/components/item-list.module.css?s2tzz3" /> <link rel="stylesheet" media="all" href="/core/themes/stable9/css/system/components/js.module.css?s2tzz3" /> <link rel="stylesheet" media="all" href="/core/themes/stable9/css/system/components/nowrap.module.css?s2tzz3" /> <link rel="stylesheet" media="all" href="/core/themes/stable9/css/system/components/position-container.module.css?s2tzz3" /> <link rel="stylesheet" media="all" href="/core/themes/stable9/css/system/components/progress.module.css?s2tzz3" /> <link r
(Yes, that's the end of the output from the server. It just stops there.)
So what I'm seeing here is the output of the regular batch page, but the headers from the exception response!
Is the output buffer getting flushed in some way before the exception response is handed to the kernel?
- I modified
- 🇺🇸United States phenaproxima Massachusetts
I finally have a complete, and satisfying, explanation for what is breaking the functional test!
First of all, this may not actually be breaking real sites in the real world. I don't know if you'd be able to reproduce this in manual testing (I doubt it).
Why? Because we are running afoul of a quirk in cURL -- it enforces the
Content-Length
header. What this means that if the server tells cURL "hey, I'm about to send you 500 bytes", but you actually send it 1000, cURL will give you back the first 500 bytes, and throw out the rest! (There does not seem to be a way to override this behavior from PHP.)Now, previously Drupal didn't send the
Content-Length
header with its responses. That changed in 🐛 Post-response task running (destructable services) are actually blocking; add test coverage and warn for common misconfiguration Fixed .The problem is that the new
\Drupal\Core\EventSubscriber\FinishResponseSubscriber::setContentLengthHeader()
event handler doesn't account for the possibility that output buffering might be active, and that there might already be stuff in the output buffer!Anything in the output buffer will already be sent in the response. The batch system explicitly activates output buffering in
_batch_process_page()
, specifically for error handling, and sends some output. It's a very old and clunky form of error handling. (yes, very old - the lines concerned were written in #127539: batch - progressive operations (à la update.php) → , which was committed by Gábor in 2007. The year I started using Drupal. Sheesh.)Thus, the sequence of events emerges:
- You start your batch job.
_batch_process_page()
turns on output buffering, and puts some output into it:
ob_start(); // ...some other stuff... print $fallback;
- The batch operation throws an exception!
- The exception bubbles up to the Drupal kernel, setting off a new chain of processing that eventually ends up in
\Drupal\Core\EventSubscriber\FinalExceptionSubscriber::onException()
, which in turn replaces the entire response with the standard HTTP 500 error ("the website has encountered an unexpected error...") that we know and love. - The response, having been replaced, is handed off to the response event subscribers. Eventually we enter
\Drupal\Core\EventSubscriber\FinishResponseSubscriber::setContentLengthHeader()
, which dutifully sets theContent-Length
header...but it's using the length of the exception response, not the original response. - Which, okay...except that, because output buffering is active, and there's stuff in the buffer we've effectively already sent part of the original response!
- Therefore, we are actually sending a too-small
Content-Length
header. - In the functional test, cURL -- which is the engine underlying all requests made to the test site -- dutifully receives the
Content-Length
header, and reads that many bytes from Drupal, disregarding everything after that. - But since the
Content-Length
isn't accounting for the already-sent data (the stuff that was in the output buffer)...it's a lie. cURL cuts it off, the response is malformed, and the test fails hard.
This kind of shit is why they pay me the big bucks.
I think the proper fix, at least for the functional test, is to make
\Drupal\Core\EventSubscriber\FinishResponseSubscriber::setContentLengthHeader()
account for the possibility that output buffering is active. You can do this:$length = strlen($response->getContent()); if (ob_get_level()) { $length += strlen(ob_get_contents()); } $response->headers->set('Content-Length', $length, TRUE);
...and the test passes.
- You start your batch job.
- 🇨🇭Switzerland znerol
Let's get that straight:
- The
_batch_process()
function is called repeatedly in order to perform the work of a batch. - If the browser supports JavaScript (i.e., pretty much always),
_batch_process()
is called from_batch_do()
. - If the browser doesn't support JavaScript,
_batch_process()
is called from_batch_page()
- In the non-JS case (i.e., the fallback case which virtually never happens on production),
_batch_page()
tries very much to present a prettier error page for fatal errors than the rest of Drupal.
Is that more or less correct? Couldn't we just remove that buggy fallback error mechanism from
_batch_page()
? Basically anything between (and including)ob_start()
andob_end_clean()
and replace it with[$percentage, $message, $label] = _batch_process();
. - The
- 🇺🇸United States phenaproxima Massachusetts
That seems correct to me, but I am absolutely not an authority on the batch system, so I don't know if it's appropriate to remove that error handling code. I'd like to see it go, personally, but a more knowledgeable person than I should make the final decision.
I guess the downside is that, if the batch job fails outright, you don't get any kind of error page - I believe you would just get a WSOD if error reporting was silenced. (But we could confirm or deny that with a test -- and the batch system's test coverage is lacking, which is why this problem was not caught before.)
The upshot here is that adjusting the batch system would not solve the root problem, which is that the Content-Length provided by FinishResponseSubscriber can be flat-out wrong if an uncaught exception is thrown while output buffering is activated.
- 🇨🇭Switzerland znerol
I agree @phenaproxima,
FinishResponseSubscriber::setContentLengthHeader()
was introduced to optimize the performance of pages where some significant workload is executing in the terminate event. For sure we do not lose too much if this optimization is only applied to successful page loads. - 🇺🇸United States bnjmnm Ann Arbor, MI
Info on the FunctionalJavaScript test failure:
- The test failure is in fact due to a change introduced 🐛 Ajax errors are not communicated through the UI Fixed .
- Prior to that change, AJAX errors were only reported in the JS console. Nothing reported in the Drupal UI. This was addressed by adding
core/drupal.message
as a dependency tocore/drupal.ajax
- Apparently
core/drupal.message
assumes theres a'#type' => 'status_messages'
render element on the page.\Drupal\Tests\system\FunctionalJavascript\Batch\ProcessingTest
is failing because it expects the markup that would be provided by thestatus_messages
render element.
Clearly something needs to be addressed in
core/drupal.message
since JS libraries can't declare dependencies on render elements. It's safe to say that isn't something that needs to be addressed in this issue's scope. Adding a'#type' => 'status_messages'
element to the test page should get the FJS test working, and the underlying issue has its own issue now 🐛 The core/drupal.message library requires a status_messages render element Active - last update
about 1 year ago CI error - 🇺🇸United States tedbow Ithaca, NY, USA
@bnjmnm thanks for the explanation!
Adding a '#type' => 'status_messages' element to the test page should get the FJS test working,
We actually aren't using a test page this if failing on the regular batch system. We just use a test form to start the batch process. But then it gets directed to the core batch controller. Thats is why this would cause a problem any core or contrib usage of the batch form system when JS enabled(also without JS but the other reasons described on this issue), any exception that happened in batch callbacks would no longer show up unless they specifically were caught and handled in the callbacks.
So this just needed
'#type' => 'status_messages'
in\Drupal\system\Controller\BatchController::batchPage
see https://git.drupalcode.org/project/drupal/-/merge_requests/4957/diffs#9d... in the MRThis gets the JS test passing for locally. I pushed up this change to see if we have any core tests that will fail if the status message element. I wouldn't think so but you never know.
I think we should make fixing the JS version its own issue if it is really this simple
- 🇺🇸United States phenaproxima Massachusetts
Since it's not super clear how to fix this yet...this need not be a hard blocker for Automatic Updates.
Automatic Updates could replace the finish_response_subscriber with its own version that disables the Content-Length header if output buffering is enabled. Then when this issue is resolved, it could remove its special subscriber and that would be that.
- 🇬🇧United Kingdom catch
If we revert 🐛 Post-response task running (destructable services) are actually blocking; add test coverage and warn for common misconfiguration Fixed the workaround shouldn't be necessary, but we could potentially keep this open specifically to change the batch error handling.
- last update
about 1 year ago 30,420 pass, 8 fail - 🇺🇸United States tedbow Ithaca, NY, USA
Created 🐛 Exceptions in batch no longer are shown on the page when Javascript is disabled Needs work for the javascript issue
- 🇺🇸United States tedbow Ithaca, NY, USA
Updating the title to reflect this issue only affects non-JS users
🐛 Exceptions in batch no longer are shown on the page: Javascript error Fixed handles JS users
but this issue still would affect all non-JS test, which we have a lot of in ✨ Add Alpha level Experimental Automatic Updates module Needs work
- 🇬🇧United Kingdom catch
There's a green MR on 🐛 Only set content-length header in specific situations Fixed now, is that issue sufficient to unblock automatic updates, or is there more here that's not related to the content length changs?
- Status changed to Postponed
12 months ago 9:20am 31 December 2023 - 🇬🇧United Kingdom catch
Postponing this on 🐛 Only set content-length header in specific situations Fixed but assuming that issue fixes this bug, we should still keep this issue open to add the additional test coverage.
- Status changed to Needs work
12 months ago 9:15pm 4 January 2024 - 🇺🇸United States tedbow Ithaca, NY, USA
Need to check if the tests now pass since 🐛 Only set content-length header in specific situations Fixed is fixed
- 🇺🇸United States DamienMcKenna NH, USA
Is the test coverage the only thing not already committed from 🐛 Only set content-length header in specific situations Fixed ?