If an error occurs during an attended process, delete the stage on behalf of the user

Created on 8 March 2023, over 1 year ago
Updated 7 July 2023, over 1 year ago

Problem/Motivation

In 🐛 Unhandled Composer Stager exceptions leave the update process in an indeterminate state Fixed we are wrapping the line $this->beginner->begin($active_dir, $stage_dir, new PathList($event->getExcludedPaths()), NULL, $timeout); and $this->stager->stage($command, $active_dir, $stage_dir, NULL, $timeout);in try-catch structures which can handle Composer Stager exceptions.

But, if an event subscriber to the stage lifecycle throws an exception, it can result in bad, incongruous UX. For example, if a PostRequireEvent subscriber throws an exception, the user is still redirected to the confirmation page to continue the update (/admin/automatic-update-ready). They will see the error message, and they won't be able to continue the update...but if a PreRequireEvent subscriber throws an exception, users are redirected to the start page!

Worse yet, there may not be any way to recover from that error. The only recourse is to press the "Delete existing update" button and try again, and hope it works this time. That sucks.

Also there is a inconsistency in handling of composer stager exceptions in StageBase that is the require() catches all throwable exceptions and then catch it checks for the two classes( InvalidArgumentException and PreconditionException) while in apply() there are two catches one for the two classes and other for the all throwable exceptions

Proposed resolution

When an exception is thrown by a stage lifecycle event subscriber, we should destroy the stage on behalf of the user, before returning them to an appropriate page for them to continue their work. Change StageBase::create(), StageBase::require() and StageBase::postApply() to destroy stage when there is an error in dispatching the events. With this change the user will now won't have to click "Cancel update" button on the UpdateReady when there is an error because we now have destroyed the stage for them.

Currently BatchProcessor::postApply() throws the exception which causes it the site to be redirected to UpdateReady form when there is an error. Change this behaviour to match with code>BatchProcessor::clean() because if we got successfully till BatchProcessor::postApply() the update is already done and we don't need to redirect it back to update ready form when there is an error.

Also make handling of composer stager exceptions in StageBase::require() consistent with StageBase::apply().

Remaining tasks

file a follow-up to repeat the solution for Automatic Updates Extensions

📌 Task
Status

Needs review

Version

3.0

Component

Code

Created by

🇮🇳India kunal.sachdev

Live updates comments and jobs are added and updated live.
  • Usability

    Makes Drupal easier to use. Preferred over UX, D7UX, etc.

Sign in to follow issues

Comments & Activities

  • Issue created by @kunal.sachdev
  • 🇺🇸United States tedbow Ithaca, NY, USA
  • 🇺🇸United States phenaproxima Massachusetts

    Merging in some stuff from 🐛 If a PostRequireEvent subscriber throws an exception, users are redirected to the confirmation page Closed: duplicate , which I'm closing as a duplicate.

  • 🇺🇸United States phenaproxima Massachusetts

    Re-titling.

  • Assigned to kunal.sachdev
  • Open on Drupal.org →
    Core: 10.1.x + Environment: PHP 8.1 & MySQL 8
    last update over 1 year ago
    Not currently mergeable.
  • @kunalsachdev opened merge request.
  • Open in Jenkins → Open on Drupal.org →
    Core: 10.1.x + Environment: PHP 8.1 & MySQL 8
    last update over 1 year ago
    796 pass, 1 fail
  • Open in Jenkins → Open on Drupal.org →
    Core: 10.1.x + Environment: PHP 8.1 & MySQL 8
    last update over 1 year ago
    800 pass
  • 🇮🇳India kunal.sachdev

    I think the problem that we want to solve in this issue is that if there are any errors during PostCreateEvent or PostRequireEvent then we want to destroy the stage in UpdateStage for the user. If that's right then why shouldn’t we do it in StageBase so that the same thing will not have to be repeated for Project Browser @tedbow, @phenaproxima? Actually @Wim Leers and I were discussing about this issue and he suggested this.

  • Status changed to Needs work over 1 year ago
  • 🇧🇪Belgium wim leers Ghent 🇧🇪🇪🇺

    Actually @Wim Leers and I were discussing about this issue and he suggested this.

    Yep.

    The rationale for doing it only for UpdateStage is not documented on this issue nor on 🐛 Unhandled Composer Stager exceptions leave the update process in an indeterminate state Fixed .

    Furthermore, @tedbow wrote at #3277034:

    […] because we know in Automatic Updates we don't have a UI to recover from an error here to in Updater we would still want to delete the stage

    … but implementing it in UpdateStage affects both manually triggered updates (i.e. "user is present") as well unattended updates, because CronUpdateStage extends UpdateStage.

    Tagging for clarifying this.

    @kunal.sachdev: Can you think of a reason why it should be implemented as you did, @kunal.sachdev? Like I explained on our meeting earlier, that'd help inform the discussion.

    #8 is too succinct: it doesn't try to articulate a rationale for the current approach. I expected more detail.

  • 🇮🇳India kunal.sachdev

    @Wim Leers I don't think I had any reason for implementing it this way, I just followed the same way I implemented this in 🐛 Unhandled Composer Stager exceptions leave the update process in an indeterminate state Fixed in which @tedbow while reviewing initially requested changes in UpdateStage( Updater) https://git.drupalcode.org/project/automatic_updates/-/merge_requests/63... but later in https://git.drupalcode.org/project/automatic_updates/-/merge_requests/74... he suggested we should do this in separate issue and that is the current issue.

  • Open in Jenkins → Open on Drupal.org →
    Core: 10.1.x + Environment: PHP 8.1 & MySQL 8
    last update over 1 year ago
    800 pass
  • Open in Jenkins → Open on Drupal.org →
    Core: 10.1.x + Environment: PHP 8.1 & MySQL 8
    last update over 1 year ago
    789 pass, 3 fail
  • 🇮🇳India kunal.sachdev

    Discussed with @tedbow and @Wim Leers and decided that we' should destroy the stage in the StageBase only. I reverted the changes in UpdateStage and added the logic to destroy the stage in StageBase itself.

  • 🇮🇳India kunal.sachdev

    \Drupal\Tests\package_manager\Kernel\LockFileValidatorTest::testLockFileChanged and \Drupal\Tests\package_manager\Kernel\LockFileValidatorTest::testLockFileDeleted is on Pre-require event while asserting status check results because now we are deleting stage on error during Pre-require event in StageBase see

        $on_error = $destroy_on_failure ? [$this, 'destroy'] : NULL;
    
        $this->dispatch(new PreRequireEvent($this, $runtime, $dev), $on_error);
    

    and we pass the same stage while checking the status check results in test see

    $stage = $this->assertResults([$result], $event_class);
    // A status check should agree that there is an error here.
    $this->assertStatusCheckResults([$result], $stage);
    

    but now that stage is not available the error is not added. I think this behaviour is right as already during pre-require event we have shown the error and destroy the stage so we should have error added during status-check event after that.

    So for now I think instead of

    $stage = $this->assertResults([$result], $event_class);
    // A status check should agree that there is an error here.
    $this->assertStatusCheckResults([$result], $stage);
    

    we should do

        $stage = $this->assertResults([$result], $event_class);
        $this->assertStatusCheckResults($event_class === PreRequireEvent::class
          // A status check should not surface this error, because the stage has
          // automatically been destroyed.
          ? []
          // A status check should agree that there is an error here.
          : [$result],
          $stage);
    

    because this method test pre-apply event also during which we don't destroy the stage and we should see error during status check event.

    Also while debugging this I found some inconsistencies handling composer stager exceptions in \Drupal\package_manager\StageBase::require and \Drupal\package_manager\StageBase::apply i.e. require() does

          try {
            $this->stager->stage($command, $active_dir, $stage_dir, NULL, $timeout);
          }
          catch (\Throwable $e) {
            // If the caught exception isn't InvalidArgumentException or
            // PreconditionException, a Composer operation was actually attempted,
            // and failed. The stage should therefore be destroyed, because it's in
            // an indeterminate and possibly unrecoverable state.
            assert(!$e instanceof InvalidArgumentException, sprintf("%s is calling Composer Stager with invalid arguments: this indicates a bug in your stage. The message was: %s.", get_class(), $e->getMessage()));
            assert(!$e instanceof InvalidArgumentException, sprintf("A precondition wasn't fulfilled while calling Composer Stager in %s: this indicates a bug in your stage. The message was: %s.", get_class(), $e->getMessage()));
            if ($destroy_on_failure) {
              $this->destroy();
            }
            elseif (!$e instanceof InvalidArgumentException && !$e instanceof PreconditionException) {
              $this->destroy();
            }
            $this->rethrowAsStageException($e);
          }
        };

    while apply() does

        try {
          $this->committer->commit($stage_dir, $active_dir, $paths_to_exclude, NULL, $timeout);
        }
        catch (InvalidArgumentException | PreconditionException $e) {
          // The commit operation has not started yet, so we can clear the failure
          // marker.
          $this->failureMarker->clear();
          $this->rethrowAsStageException($e);
        }
        catch (\Throwable $throwable) {
          // The commit operation may have failed midway through, and the site code
          // is in an indeterminate state. Release the flag which says we're still
          // applying, because in this situation, the site owner should probably
          // restore everything from a backup.
          $this->setNotApplying()();
          // Update the marker file with the information from the throwable.
          $this->failureMarker->write($this, $this->getFailureMarkerMessage(), $throwable);
          throw new ApplyFailedException($this, $this->failureMarker->getMessage(), $throwable->getCode(), $throwable);
        }
    

    . The inconsistency is that the require() catches all throwable exceptions and then catch it checks for the two classes( InvalidArgumentException and PreconditionException) while in apply() there are two catches one for the two classes and other for the all throwable exceptions. So tagging this issue as Needs architectural review.

  • Open in Jenkins → Open on Drupal.org →
    Core: 10.1.x + Environment: PHP 8.1 & MySQL 8
    last update over 1 year ago
    799 pass, 2 fail
  • Open in Jenkins → Open on Drupal.org →
    Core: 10.1.x + Environment: PHP 8.1 & MySQL 8
    last update over 1 year ago
    800 pass
  • Issue was unassigned.
  • Status changed to Needs review over 1 year ago
  • 🇺🇸United States phenaproxima Massachusetts

    After reading #12, I too am wondering why they act differently. On its face, I can't see any reason for the discrepancy.

    I'm going to open a separate MR on this same issue fork to change it and see what breaks. I think that what apply() does is cleaner.

  • Open in Jenkins → Open on Drupal.org →
    Core: 10.1.x + Environment: PHP 8.1 & MySQL 8
    last update over 1 year ago
    800 pass
  • @phenaproxima opened merge request.
  • 🇺🇸United States phenaproxima Massachusetts

    Looks like changing require() to be consistent with the way apply() works won't break anything. So let's go ahead and make them consistent, either in this issue or a follow-up.

  • Assigned to kunal.sachdev
  • Status changed to Needs work over 1 year ago
  • 🇮🇳India kunal.sachdev

    I think we make require() and apply() consistent in this issue. So, updating the issue summary.

  • Open in Jenkins → Open on Drupal.org →
    Core: 10.1.x + Environment: PHP 8.1 & MySQL 8
    last update over 1 year ago
    802 pass
  • Open in Jenkins → Open on Drupal.org →
    Core: 10.1.x + Environment: PHP 8.1 & MySQL 8
    last update over 1 year ago
    802 pass
  • Issue was unassigned.
  • Status changed to Needs review over 1 year ago
  • Open in Jenkins → Open on Drupal.org →
    Core: 10.1.x + Environment: PHP 8.1 & MySQL 8
    last update over 1 year ago
    810 pass
  • 🇺🇸United States tedbow Ithaca, NY, USA

    I don't think the "Proposed resolution" actually covers what this issue is doing. It talks about "lifecycle event subscriber" but we are only handling "require". Is that right?

    What happens if there is error on preApply()? In PostApply?

    I think we should handle the other events in this issue or rescope the "Proposed resolution" make follow-ups for other events. I think we should handle them here.

    Also even if we only handled require I think an issue like this would need functional test coverage

    from the summary

    For example, if a PostRequireEvent subscriber throws an exception, the user is still redirected to the confirmation page to continue the update (/admin/automatic-update-ready).

    If that is not the case anymore we should have test coverage confirming this.

    \Drupal\Tests\automatic_updates\Functional\UpdateErrorTest::testUpdateStoppedByEventSubscriber might be good place for this test coverage if we don't have it anywhere else. We actually already have todo in the test referencing this issue

    In the "Proposed resolution" we should detail what the changes will be from the user's perspective now that they we will be deleting the stage for them. Will they be redirected to different page? Will they not have to click the "Cancel update" button on the UpdateReady from because we now have destroyed the stage for them?

    for instance I debugged testUpdateStoppedByEventSubscriber. If an error happens PostApply the user is redirected back to UpdateReady and there only option is "Cancel update". They also get the LockFileValidator error because the stage has already been applied. In this case I think it seems reasonable that we would just delete the stage for them.

    testUpdateStoppedByEventSubscriber also showed me that if an error happens in PreApply the user is also redirected back to UpdateReady but they have the option to "Continue" or "Cancel Update". In this case I think it is reasonable not to delete the stage automatically because the validation error might be something they could fix. But we may want to look into updating \Drupal\automatic_updates\BatchProcessor::finishCommit to tell the user they should try to resolve this issue if possible before continuing. That can be another issue.

    Also in PostCreate if we get an error we don't delete the stage. There is already a @todo for this in the test to solve it in this issue.

    Basically we need to figure out the situations where the only thing the user can do is "Cancel Update" and start over. In those situations we should probably be deleting the stage for them.

    I am including a patch I used to output the page for all the test cases in testUpdateStoppedByEventSubscriber where the user is left. This can be used to determine when we should delete the stage.

    Also including the zip file of the debug output I got from the current MR with this patch. It has all 14 test cases. Some of them are probably fine but I think at least PostApply and PostCreate are not great UX

  • Assigned to kunal.sachdev
  • Status changed to Needs work over 1 year ago
  • 🇺🇸United States tedbow Ithaca, NY, USA
  • Open in Jenkins → Open on Drupal.org →
    Core: 10.1.x + Environment: PHP 8.1 & MySQL 8
    last update over 1 year ago
    749 pass, 6 fail
  • Open in Jenkins → Open on Drupal.org →
    Core: 10.1.x + Environment: PHP 8.1 & MySQL 8
    last update over 1 year ago
    766 pass, 4 fail
  • 🇮🇳India kunal.sachdev

    Created [#3371076} as per discussion with @tedbow during scrum.

  • Open in Jenkins → Open on Drupal.org →
    Core: 10.1.x + Environment: PHP 8.1 & MySQL 8
    last update over 1 year ago
    766 pass, 4 fail
  • Open in Jenkins → Open on Drupal.org →
    Core: 10.1.x + Environment: PHP 8.1 & MySQL 8
    last update over 1 year ago
    770 pass, 2 fail
  • 🇮🇳India kunal.sachdev

    I think we still need to destroy stage in \Drupal\automatic_updates\BatchProcessor::stage() for the case if $this->stager->stage($command, $active_dir, $stage_dir, NULL, $timeout); in \Drupal\package_manager\StageBase::require() with InvalidArgumentException or PreconditionException where we don't destroy stage.

  • Open in Jenkins → Open on Drupal.org →
    Core: 10.1.x + Environment: PHP 8.1 & MySQL 8
    last update over 1 year ago
    811 pass
  • Issue was unassigned.
  • Status changed to Needs review over 1 year ago
  • Assigned to kunal.sachdev
  • Status changed to Needs work over 1 year ago
  • 🇧🇪Belgium wim leers Ghent 🇧🇪🇪🇺

    A new quarter, time to increase the expectations! 🤓 I know you can do it! 😊

    1. #21 added . Why is this marked if the tag is still present? Are the necessary tests present now?
    2. I spotted a few small issues in the MR, and have a few questions.
    3. Finally: I tagged this for something very specific in #9 >3 weeks ago. That clarification is still necessary. If you're able to push this forward, you're also able to address this, which would make reviews much more efficient.
  • Open in Jenkins → Open on Drupal.org →
    Core: 10.1.x + Environment: PHP 8.1 & MySQL 8
    last update over 1 year ago
    811 pass
  • 🇮🇳India kunal.sachdev

    Created 📌 Tell the user to try resolve the issue if possible before continuing if there is an error in PreApply Active as per #22 📌 If an error occurs during an attended process, delete the stage on behalf of the user Needs review

    testUpdateStoppedByEventSubscriber also showed me that if an error happens in PreApply the user is also redirected back to UpdateReady but they have the option to "Continue" or "Cancel Update". In this case I think it is reasonable not to delete the stage automatically because the validation error might be something they could fix. But we may want to look into updating \Drupal\automatic_updates\BatchProcessor::finishCommit to tell the user they should try to resolve this issue if possible before continuing. That can be another issue.
  • 🇮🇳India kunal.sachdev

    The test is already there i.e. \Drupal\Tests\automatic_updates\Functional\UpdateErrorTest::testUpdateStoppedByEventSubscriber which is updated accordingly.

  • 🇮🇳India kunal.sachdev

    Updated the issue summary.

  • Issue was unassigned.
  • Status changed to Needs review over 1 year ago
  • Status changed to Needs work over 1 year ago
  • 🇺🇸United States tedbow Ithaca, NY, USA

    looking good

  • Open in Jenkins → Open on Drupal.org →
    Core: 10.1.x + Environment: PHP 8.1 & MySQL 8
    last update over 1 year ago
    Custom Commands Failed
  • Open in Jenkins → Open on Drupal.org →
    Core: 10.1.x + Environment: PHP 8.1 & MySQL 8
    last update over 1 year ago
    817 pass
  • Status changed to Needs review over 1 year ago
Production build 0.71.5 2024