Add functional test that proves there is reasonable UX whenever a stage event subscriber has an exception

Issue created by @wim leers
Comment over 2 years ago →
🇺🇸United States tedbow Ithaca, NY, USA
Comment over 2 years ago →
🇧🇪Belgium wim leers Ghent 🇧🇪🇪🇺
Comment over 2 years ago →
🇧🇪Belgium wim leers Ghent 🇧🇪🇪🇺
I now wonder if this overlaps with 🐛 Unhandled Composer Stager exceptions leave the update process in an indeterminate state Fixed — AFAICT this would provide the end-to-end test coverage that that issue would exercise. But perhaps that's overkill?

If 100% of composer interactions are happening via php-tuf/composer-stager, then that ought to be sufficient.

We know for a fact that ComposerInspector will call the installed composer binary, but … we already have ComposerValidator to ensure that foundational things work fine.

So … I'm now thinking that this is perhaps overkill? OTOH, 100% of our current Build tests test only the happy path. That just seems wrong. The sole exception is \Drupal\Tests\automatic_updates\Build\CoreUpdateTest::assertReadOnlyFileSystemError(), but that tests an error triggered by one of our validators.

Conclusion: I think this would still be valuable.
Comment over 2 years ago →
🇺🇸United States effulgentsia
Do we currently have tests for, or does 🐛 Unhandled Composer Stager exceptions leave the update process in an indeterminate state Fixed include exceptions for, a timeout (e.g., due to a short max-execution-time) that occurs during either copying files from active to stage, or from stage to active? That latter one would be especially concerning if it ends up only copying half of the updated files from stage to active.
Comment over 2 years ago →
🇧🇪Belgium wim leers Ghent 🇧🇪🇪🇺
I know of no such tests. \Drupal\Tests\automatic_updates\Functional\DeleteExistingUpdateTest::testDeleteExistingUpdate() comes closest, but does definitely not simulate what happens in case of an execution timeout. (Or for that matter: a sudden power loss, which would behave equivalently.)
Assigned to omkar.podey
Comment over 2 years ago →
🇮🇳India omkar.podey
Comment over 2 years ago →
🇮🇳India omkar.podey
Comment over 2 years ago →
🇺🇸United States effulgentsia
I remembered about 📌 Inform the site admin if Composer Stager's committer failed, possibly leaving the site in a half-updated state Fixed , so I asked a similar question as #5 in #3291770-19: Inform the site admin if Composer Stager's committer failed, possibly leaving the site in a half-updated state → as well.
@omkarpodey opened merge request.
Comment over 2 years ago →
🇮🇳India omkar.podey

So thought of multiple ways but everything had some issue or other so i finally settled on making individual install packages with event subscriber that fail the test on different events.

So we can start testing in a loop for the last firing event like PostDestroyEvent and then PreDestroyEvent but the problem i'm stuck on is clearing the errors caused by the previous fired events , is there any way to clear them in a build test
Comment over 2 years ago →
🇮🇳India omkar.podey
Okay so from last what I remember, we settled on writing functional tests to observe the hard composer failures through UI, and dropped the idea of having a build test for patch failing to apply from composer-patches as it doesn't seem possible as described in 📌 Add build test to test cweagans/composer-patches end-to-end Closed: won't fix
Comment over 2 years ago →
🇮🇳India omkar.podey
Comment over 2 years ago →
System Message
omkar.podey → closed merge request !747
@omkarpodey opened merge request.
Comment over 2 years ago →
🇮🇳India omkar.podey

There is no point in testing PreDestroyEvent, PostDestroyEvent, PostApply as they are not subscribed by any validators with composer operations.

It's very difficult to test PreCreateEvent, I tried to set the priorities to -999 and PHP_INT_MAX - 1 but either I run into a StageException or the StageManipulator .

StatusCheckEvent also needs to be tested
Status changed to Needs review over 2 years ago12:32pm 24 March 2023
Comment over 2 years ago →
🇮🇳India omkar.podey
Issue was unassigned.
Comment over 2 years ago →
🇮🇳India omkar.podey
Status changed to Needs work over 2 years ago8:54am 27 March 2023
Comment over 2 years ago →
🇧🇪Belgium wim leers Ghent 🇧🇪🇪🇺
Assigned to omkar.podey
Comment over 2 years ago →
🇮🇳India omkar.podey
Issue was unassigned.
Status changed to Needs review over 2 years ago9:40am 28 March 2023
Comment over 2 years ago →
🇮🇳India omkar.podey
Comment over 2 years ago →
🇺🇸United States phenaproxima Massachusetts
Status changed to Needs work over 2 years ago2:25pm 4 April 2023
Comment over 2 years ago →
🇺🇸United States phenaproxima Massachusetts
I like the general idea and approach here. I feel like it could be a little clearer in some places...
Status changed to Needs review over 2 years ago10:45am 5 April 2023
Comment over 2 years ago →
🇮🇳India omkar.podey
Status changed to Needs work over 2 years ago1:03pm 5 April 2023
Comment over 2 years ago →
🇺🇸United States phenaproxima Massachusetts
This is shaping up, but there are still some outstanding things.

The other question is something I must discuss with @tedbow: adding a functional test proves...what, exactly? We already have a bunch of functional tests that prove we catch validation errors during batch jobs. A Composer hard failure is the same basic problem. If that's all we're trying to prove -- that a validation error is treated the same way as a Composer problem -- then this test will do the trick. But I want to confirm that's what we're aiming for here.
Assigned to tedbow
Comment over 2 years ago →
🇺🇸United States tedbow Ithaca, NY, USA
@phenaproxima was asking me about this current approach which I think is good but want to look at to make sure I understand
Issue was unassigned.
Comment over 2 years ago →
🇺🇸United States tedbow Ithaca, NY, USA
I liked the general approach. I left a couple of comments but didn't give it a full review
Assigned to omkar.podey
Comment over 2 years ago →
🇮🇳India omkar.podey
Issue was unassigned.
Status changed to Needs review over 2 years ago9:25am 10 April 2023
Comment over 2 years ago →
🇮🇳India omkar.podey
Assigned to phenaproxima
Comment over 2 years ago →
🇺🇸United States phenaproxima Massachusetts
Self-assigning for review.
Assigned to omkar.podey
Status changed to Needs work over 2 years ago2:13pm 10 April 2023
Comment over 2 years ago →
🇺🇸United States phenaproxima Massachusetts
OK. As I dove into this to try and understand it, my biggest question quickly emerged: why are we only testing some events? ANY event subscriber could run into a hard Composer failure, so why don't we test all events over the full update life cycle?

If it's just that we didn't ask for that when this issue was filed, then I'm asking for it now :) But if there's a specific why we don't test all events, then that will need an explanation and comment.

Sorry to kick this back, but I think this test coverage is useful and we should have it be as robust as we can make it.
Comment over 2 years ago →
🇺🇸United States phenaproxima Massachusetts
@omkar.podey explained why we're not testing all events, way back in #16:

There is no point in testing PreDestroyEvent, PostDestroyEvent, PostApply as they are not subscribed by any validators with composer operations.

It's very difficult to test PreCreateEvent, I tried to set the priorities to -999 and PHP_INT_MAX - 1 but either I run into a StageException or the StageManipulator

I do not agree with this reasoning. Although it's true that we don't really use PreDestroy, PostDestroy, and PostApply ourselves, external code may very well listen to those events, and interact with Composer. Therefore, we need to prove that we're correctly handling hard Composer failures for those events as well.

As for PreCreateEvent...we need to test that one as well. If it's hard to test, I think we should probably pair on it and do what's needed to ensure it's tested. One way suggested by @tedbow in Slack:

I think for PreCreateEvent you don’t have actually use a subscriber at all. Just get the user to the start page and then delete the composer.json manually. then have the user click “Update”
Issue was unassigned.
Status changed to Needs review over 2 years ago12:32pm 11 April 2023
Comment over 2 years ago →
🇮🇳India omkar.podey
Comment over 2 years ago →
🇮🇳India omkar.podey
I was able to cover all the events the PostDestroyEvent is the only one that doesn't lead to nicely formatted page.
Status changed to Needs work over 2 years ago1:08pm 11 April 2023
Comment over 2 years ago →
🇺🇸United States phenaproxima Massachusetts
Thanks for the expanded coverage! I think this still needs some cleaning up for clarity's sake, but this is definitely the right level of coverage I was hoping for.
Comment over 2 years ago →
🇮🇳India omkar.podey
The two Events that stand out are

PreCreateEvent as it fails on ComposerNotReadyException where every other event fails on StageEventException.

PostDestroyEvent as it doesn't lead to a properly formatted page, just on another exception page with messageThis operation was already canceled.
Assigned to phenaproxima
Status changed to Needs review over 2 years ago3:23pm 12 April 2023
Comment over 2 years ago →
🇮🇳India omkar.podey
Comment over 2 years ago →
🇮🇳India omkar.podey
I know it's still not super clear but it does follow the flow of the events from up to down PreCreate to PostDestroy. I was able to remove another else if but that's about it for now.
Status changed to Needs work over 2 years ago5:18pm 12 April 2023
Comment over 2 years ago →
🇺🇸United States phenaproxima Massachusetts
Comment over 2 years ago →
🇺🇸United States phenaproxima Massachusetts
OK, the current failure is due to a pre-existing bug that this test has turned up! 🎉

Here's the problem: when an exception is thrown by PostDestroyEvent, the batch processor tries to return us to the UpdateReady form. The first thing that form does is try to claim the stage. However, because the stage has already been destroyed, the stage can't be claimed -- and thus, we get a WSOD.

I'm not exactly sure how to fix this, but it's an existing bug and therefore I'm thinking we should simply not test PostDestroyEvent in this issue, and file a follow-up to add that coverage and fix the bug.
Comment over 2 years ago →
🇺🇸United States phenaproxima Massachusetts
Comment over 2 years ago →
🇮🇳India omkar.podey
Everything else looks good also Comment#36 →
The failure during the PreCreateEvent is a ComposerNotReadyException while it's StageEventException for everything else , it's because of an unhandled call in \Drupal\automatic_updates\UpdateStage::begin where it uses composerInspector
Assigned to tedbow
Status changed to Needs review over 2 years ago3:00pm 13 April 2023
Comment over 2 years ago →
🇺🇸United States phenaproxima Massachusetts
Over to @tedbow for final review.
Status changed to Needs work over 2 years ago4:34pm 13 April 2023
Comment over 2 years ago →
🇺🇸United States tedbow Ithaca, NY, USA
See MR comments
Status changed to Needs review over 2 years ago4:51pm 13 April 2023
Comment over 2 years ago →
🇺🇸United States phenaproxima Massachusetts
Addressed your feedback.
Status changed to Needs work over 2 years ago5:52pm 13 April 2023
Comment over 2 years ago →
🇺🇸United States tedbow Ithaca, NY, USA
See MR comments

I think there was misunderstanding because the issue summary just quoted another issue but did not give the full context. But to be fair I think before this issue was worked on at all the issue summary should have been updated and in general we should not work on issues unless the issue summary actually say what should be done

Leaving assigned to myself because I want to see how the current test coverage overlaps with what we already have.
Comment over 2 years ago →
🇺🇸United States tedbow Ithaca, NY, USA
Because there has been so much work on this issue it doesn't cover the original problem I think we should rescope this issue to what it actually covers and create a new issue for the original intent of the issue

The current test overlaps with \Drupal\Tests\automatic_updates\Functional\UpdateErrorTest::testStageDestroyedOnError()
Although that issue has

Tests that the update stage is destroyed if an error occurs during require.

It is not actually simulating an error in require but PostRequire, So again covering Events not actual error cause we require via composer.

So I think we could probably remove that test and fold it's assumptions into the current test.

It also overlaps with \Drupal\Tests\automatic_updates\Functional\UpdateErrorTest::testUpdateErrors
Particularly this section

// Make the validator throw an exception during pre-create. $error = new \Exception('The update exploded.'); TestSubscriber1::setException($error, PreCreateEvent::class);

For function tests I don't really think using \Drupal\automatic_updates_test\EventSubscriber\TestSubscriberComposerHardFailure::setEvent actually that much different than using \Drupal\package_manager_test_validation\EventSubscriber\TestSubscriber::setException but allow you to trigger an error at particular event. `TestSubscriberComposerHardFailure` just means this error with because by `Composer validate` but since we catch `Throwable` in StageBase::dispatch I don't think this will cause much of UX difference.

In the new issue when we cover the original problem I don't think we should use deleting the composer.json because the intent was to cover the case where we run a composer command in the stage. But deleting the composer.json has the side effect making every other composer interaction after that in the stage also fail
Comment over 2 years ago →
🇺🇸United States tedbow Ithaca, NY, USA
Retitling

also 📌 If an error occurs during an attended process, delete the stage on behalf of the user Needs review is related because our test should check if the "Update" button is present once and "delete existing update" is not when you are redirected to the start after an error. If that is already the case we can close #3346644

Otherwise we can solve it in that issue
Assigned to phenaproxima
Comment over 2 years ago →
🇺🇸United States tedbow Ithaca, NY, USA
Created 📌 Add functional test that proves there is reasonable UX whenever Composer Stager operations have a hard failure Fixed for original intent of this issue and assigned to @omkar.podey

The description here still needs to be updated. Removing the old one because we know that is not what the issue does
Comment over 2 years ago →
🇧🇪Belgium wim leers Ghent 🇧🇪🇪🇺
I don't understand. The scope is crystal clear to me. The original title Add functional test that proves there is reasonable UX whenever Composer has a hard failure is more precise than the one you specified in #48.

The whole point of this issue was to prove that there was no way that a Composer error could occur without the user getting informed about it through the UI. At any point in the stage lifecycle/during any interaction with Composer. Because that was not yet tested.

This issue discovered that that was not yet the case. Wonderful, that was the exact intent!

The scope ballooned because we tried to fix some of what was not yet working in this issue. That was reasonable at first, but perhaps that became too complex. Fine, then we should limit the scope to not fixing anything and just committing the parts that are already working, and spinning out the fixes into a follow-up.

What am I missing here? 🤔
Comment over 2 years ago →
🇺🇸United States tedbow Ithaca, NY, USA
re #50

What am I missing here? 🤔

Will explain.

This issue discovered that that was not yet the case. Wonderful, that was the exact intent!

The issue discovered any exception thrown by a StatusCheckEvent subscriber or a PostDestoryEvent subscriber would cause a UX problem. I just happened to be that test forced a Composer related error error. The fact that test forced a Composer exception in particular made it very unclear that any exception would cause this same problem.

The scope is crystal clear to me. The original title Add functional test that proves there is reasonable UX whenever Composer has a hard failure

To me "whenever" is not clear. It was obviously interpreted by @omkar.podey and @phenaproxima as in any StageEvent subscriber.
But from the original quote from the issue on the issue summary

whenever composer has a hard failure — no matter whether that is due to a patch failing to apply from composer-patches or whether that is from a networking error, wrong requirement, whatever

The test on this issue would not have caught the original problem we were looking to test "patch failing to apply from composer-patches". This does not happen in a StageEvent subscriber. This happens when we pass control over to composer stager to do the update in \Drupal\package_manager\StageBase::require specifically

$this->stager->stage($command, $active_dir, $stage_dir, NULL, $timeout);

I don't think you can then say that "whenever" meant the StageEvent subscribers and $this->stager->stage() because then you are not covering the direct calls to \Drupal\package_manager\ComposerInspector::getInstalledPackagesList() in \Drupal\automatic_updates\Form\UpdateReady and \Drupal\automatic_updates\UpdateStage::begin places that also could have a hard Composer failure.

I think "whenever" is broad and even if all of these places were intended it is too much for one test.

The fact nobody has mentioned the 2 direct calls to \Drupal\package_manager\ComposerInspector::getInstalledPackagesList() that I mentioned in this comment(I had just now thought of them) also makes the point that original issue summary did not have enough context and was not clear.
Comment over 2 years ago →
🇺🇸United States phenaproxima Massachusetts
Comment over 2 years ago →
🇺🇸United States phenaproxima Massachusetts
Assigned to tedbow
Status changed to Needs review over 2 years ago5:35pm 14 April 2023
Comment over 2 years ago →
🇺🇸United States phenaproxima Massachusetts
Open in Jenkins → Open on Drupal.org →
Core: 10.1.x + Environment: PHP 8.1 & MySQL 8
last update over 2 years ago
713 pass
Open in Jenkins → Open on Drupal.org →
Core: 10.1.x + Environment: PHP 8.1 & MySQL 8
last update over 2 years ago
Custom Commands Failed
Assigned to wim leers
Comment over 2 years ago →
🇺🇸United States phenaproxima Massachusetts
Thanks to a brilliant insight from Wim during a Zoom, I figured out why PostApplyEvent and PreDestroyEvent are failing: it's because StageBase::postApply() calls drupal_flush_all_caches(), which calls drupal_static_reset().

Our batch processor is storing the error messages in &$context, which gets reset by drupal_static_reset() -- basically, because it's a reference, the reset makes it point to memory that's no longer used by Drupal! So when we store error messages in it, we're really just throwing them into the abyss.

The solution, IMHO, is to store the error messages in the session, which is not affected by drupal_static_reset(). The batch processor already stores certain things in the session, so there's precedent for that. This makes PostApplyEvent and PreDestroyEvent behave properly, and renders 🐛 [PLACEHOLDER] Exception messages raised during BatchProcessor::postApply() are not propagated to the error page Closed: duplicate unnecessary.
Open in Jenkins → Open on Drupal.org →
Core: 10.1.x + Environment: PHP 8.1 & MySQL 8
last update over 2 years ago
715 pass
Open in Jenkins → Open on Drupal.org →
Core: 10.1.x + Environment: PHP 8.1 & MySQL 8
last update over 2 years ago
715 pass
Open in Jenkins → Open on Drupal.org →
Core: 10.1.x + Environment: PHP 8.1 & MySQL 8
last update over 2 years ago
715 pass
Open in Jenkins → Open on Drupal.org →
Core: 10.1.x + Environment: PHP 8.1 & MySQL 8
last update over 2 years ago
715 pass
Open in Jenkins → Open on Drupal.org →
Core: 10.1.x + Environment: PHP 8.1 & MySQL 8
last update over 2 years ago
715 pass
Open in Jenkins → Open on Drupal.org →
Core: 10.1.x + Environment: PHP 8.1 & MySQL 8
last update over 2 years ago
715 pass
Open in Jenkins → Open on Drupal.org →
Core: 10.1.x + Environment: PHP 8.1 & MySQL 8
last update over 2 years ago
715 pass
Status changed to RTBC over 2 years ago7:35pm 17 April 2023
Comment over 2 years ago →
🇺🇸United States tedbow Ithaca, NY, USA
@phenaproxima thanks! Looks good!
Open in Jenkins → Open on Drupal.org →
Core: 10.1.x + Environment: PHP 8.1 & MySQL 8
last update over 2 years ago
715 pass
Comment over 2 years ago →
System Message

phenaproxima → committed af056ae6 on 3.0.x authored by omkar.podey →
Issue #3338666 by omkar.podey, phenaproxima, tedbow, Wim Leers: Add...
Status changed to Fixed over 2 years ago7:48pm 17 April 2023
Comment over 2 years ago →
🇺🇸United States phenaproxima Massachusetts
What a nightmare this was to get done. Finally! Now, to the follow-ups...
Comment over 2 years ago →
🇧🇪Belgium wim leers Ghent 🇧🇪🇪🇺
See you in 🐛 If a PostDestroyEvent subscriber throws an exception, users are redirected to a WSOD Fixed & 📌 If an error occurs during an attended process, delete the stage on behalf of the user Needs review !
Issue was unassigned.
Comment over 2 years ago →
🇧🇪Belgium wim leers Ghent 🇧🇪🇪🇺
Comment about 2 years ago →
System Message
Automatically closed - issue fixed for 2 weeks with no activity.

Add functional test that proves there is reasonable UX whenever a stage event subscriber has an exception

Problem/Motivation

Proposed resolution

Comments & Activities