For web cron updates, run each stage life cycle phase in a different request

Created on 16 May 2023, over 1 year ago
Updated 13 June 2023, over 1 year ago

Problem/Motivation

Currently we run all stages of the update life cycle in a single cron request. This might cause timeouts in some cases. See #3357632-2: [META] Update doc comment on BatchProcessor to specify why use the batch system for background info.

Proposed resolution

Cron is probably not the right mechanism for these unattended updates, considering that they could be split over 3 cron runs and the default interval for Automated Cron is 3 hours (see Remaining tasks).

Proposal

  1. Create a new service that subscribes to both KernelEvents::TERMINATE and KernelEvents::REQUEST
  2. On KernelEvents::TERMINATE, if no update is in progress, it would check whether updates have been checked for in the last X minutes (probably 1 hour by default)
  3. If a check is due, it would check for updates and, if none are available, return
  4. If there is an update to start, it would do the "begin" stage and set a state value, including the stage ID, indicating that "require" should be done in the next request
  5. On KernelEvents::TERMINATE, if the state value says that "require" should be done, it will do "require", set a state value indicating that "apply" should be done in the next request, and put the site in maintenance mode
  6. On KernelEvents::REQUEST, it will check whether there is a state value indicating that "apply" should be done.
    If so, it will do "apply" and then make a sub-request to handle "post-apply", as \Drupal\automatic_updates\CronUpdateStage::triggerPostApply() does now.
    When "post-apply" is finished, it should take the site out of maintenance mode
  7. @todo In the request that does "apply", nothing else should happen, even if the user can access the site in maintenance mode.

    Not sure what the best way to do this is. We could just set a response in KernelEvents::REQUEST, and it looks like \Symfony\Component\HttpKernel\HttpKernel::handleRaw() will then just return the response instead of calling the controller that would normally handle the page. We will have to investigate how maintenance mode itself works.
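The steps above amount to a small state machine driven by two kernel events. A minimal sketch, using plain arrays as a stand-in for Drupal's state API (the `autoupdate.*` state keys and the inlined stage work are hypothetical, not the actual Automatic Updates API):

```php
<?php

/**
 * Runs on KernelEvents::TERMINATE: advance "begin" -> "require".
 *
 * $state is an in-memory stand-in for \Drupal::state().
 */
function onTerminate(array &$state): void {
  $pending = $state['autoupdate.pending_stage'] ?? NULL;
  if ($pending === NULL) {
    // No update in progress: check for new releases (elided here) and,
    // if one exists, do "begin" and queue "require" for the next request.
    $state['autoupdate.pending_stage'] = 'require';
  }
  elseif ($pending === 'require') {
    // Do "require", put the site in maintenance mode, and queue "apply".
    $state['autoupdate.maintenance'] = TRUE;
    $state['autoupdate.pending_stage'] = 'apply';
  }
}

/**
 * Runs on KernelEvents::REQUEST: perform "apply" and "post-apply".
 */
function onRequest(array &$state): void {
  if (($state['autoupdate.pending_stage'] ?? NULL) === 'apply') {
    // Apply, run "post-apply" in a sub-request (elided), then
    // take the site out of maintenance mode and clear the flag.
    $state['autoupdate.maintenance'] = FALSE;
    unset($state['autoupdate.pending_stage']);
  }
}

// One full update spread over three requests:
$state = [];
onTerminate($state);  // request 1: begin, queue "require"
onTerminate($state);  // request 2: require, enter maintenance mode
onRequest($state);    // request 3: apply + post-apply, exit maintenance
```

The point of the sketch is that each request does at most one expensive stage, so no single request risks a timeout.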

Deleting the stage

The above steps would have the update applied, but the stage would not be deleted. There are a few ways we could handle deletion:

  1. Just delete the stage in KernelEvents::REQUEST when we apply the update.

    Con: might take too long and cause timeout issues

  2. Flag the stage for deletion in KernelEvents::REQUEST after it has been applied, and then actually delete it in the next request during KernelEvents::TERMINATE

    Con: if the next request is from a UI that tries to claim the stage, the user would get a notice that there is an existing stage

    We could mitigate this by having stages flag themselves as "ready to destroy"; if another stage then tried to claim the directory, the existing stage would automatically be destroyed.

  3. Idea from @phenaproxima

    Change the way \Drupal\package_manager\StageBase::destroy() works to NOT actually delete the directory, but instead flag it for deletion, or rename it with a DELETE_ prefix to avoid a more complex system. Because each stage directory has a unique name under a site-unique staging directory, leaving the directory in place will not affect the next use of Package Manager. Then, in package_manager_cron(), we would actually do the directory deletion.

    This change should make the call to \Drupal\package_manager\StageBase::destroy() very quick. Then we could probably leave the delete call in \Drupal\automatic_updates\CronUpdateStage::handlePostApply().

    This would have the side benefit of speeding up UI interactions. For instance, in Automatic Updates we could probably combine \Drupal\automatic_updates\BatchProcessor::postApply() and BatchProcessor::clean() and do both post-apply and destroy in one step.

    Con: can't think of any
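The rename-based idea above can be sketched in a few lines. This is a hedged illustration only: the function names are hypothetical, the real stage directories are nested trees (this demo uses flat ones), and the purge would live in package_manager_cron():

```php
<?php

/**
 * Fast "destroy": rename the stage directory instead of deleting it.
 *
 * Renaming is near-instant compared to recursively deleting the tree,
 * so the request that calls destroy() is not slowed down.
 */
function fastDestroy(string $stageDir): string {
  $flagged = dirname($stageDir) . '/DELETE_' . basename($stageDir);
  rename($stageDir, $flagged);
  return $flagged;
}

/**
 * Actual deletion of flagged stages; would run during cron.
 */
function purgeFlaggedStages(string $stagingRoot): void {
  foreach (glob($stagingRoot . '/DELETE_*') ?: [] as $dir) {
    // A real implementation would delete recursively; this demo
    // assumes flat directories for brevity.
    array_map('unlink', glob($dir . '/*') ?: []);
    rmdir($dir);
  }
}

// Demo with a throwaway staging root.
$root = sys_get_temp_dir() . '/staging_' . uniqid();
mkdir($root . '/abc123', 0777, TRUE);
$flagged = fastDestroy($root . '/abc123');
purgeFlaggedStages($root);
```

Because every stage directory has a unique name, the renamed DELETE_* directories never collide with a new stage, which is what makes deferring the real deletion safe.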

Once we have 📌 Add Drush command to allow running cron updates via console and by a separate user, for defense-in-depth Fixed, we could also document that you should set up a cron job for this too. Making 2 external cron jobs is not twice as hard as making 1, considering that for many people the hard part is figuring out how to make one at all. This would allow people to run regular cron every 3 hours and ours much more frequently.

Previous Cron idea

Use the cron queue worker system to run each stage as its own item in the queue. We would need to determine whether the queue system would allow us to run all items in one request, for instance if we determine that "create" and "apply" completed under a certain time.

There are a number of problems with using the cron system, documented in Remaining tasks.

Additionally, if we use the cron queue system, it runs after all implementations of hook_cron(). Since we don't know how long those will take, we still might have timeouts. We could use the queue system but invoke it directly in our own implementation of hook_cron() and make sure our implementation runs first. But that could still mean that things running after us are more likely to time out, because as far as I can tell you can't tell cron not to run items after you.

If we did use the queue system, we could at least make sure ours runs first.

Remaining tasks

  1. Determine how this would affect how quickly critical security updates will happen. Right now in CronFrequencyValidator we warn if you have cron set to run every 3 hours and error if you have it set to run every 24 hours.

    This would mean that if you have cron set to run every 23 hours, it could take almost 3 days to apply a critical security update if it takes 3 cron runs. There are 4 stages, but by the "destroy" stage the update has already been applied.
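The worst-case arithmetic here is simply the cron interval multiplied by the number of cron runs the update needs (3 active stages, since "destroy" happens after the update is applied). A trivial sketch, with a hypothetical helper name:

```php
<?php

/**
 * Worst-case hours before an update is fully applied, assuming each
 * active stage (begin, require, apply) needs its own cron run.
 */
function worstCaseHours(int $cronIntervalHours, int $runsNeeded = 3): int {
  return $cronIntervalHours * $runsNeeded;
}

// Cron every 23 hours (just under the 24-hour error threshold):
$slow = worstCaseHours(23);  // 69 hours, i.e. almost 3 days

// Cron every 3 hours (the warning threshold):
$fast = worstCaseHours(3);   // 9 hours
```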

  2. Determine if there are other UX considerations. I think that currently, if a cron update is running, you could not start one via the UI unless you delete the current update (which you can't do if it is currently applying).

    If the update is going to take 3 cron runs, even if you have cron set to run every hour, we still might want to show the user a prompt like "An unattended update is in progress, complete this update now" instead of waiting for the 3 cron runs. This could trigger a batch process that either destroys the cron stage and starts from scratch, or just does the remaining stages of the cron stage. Doing the remaining stages might be tricky, as \Drupal\package_manager\StageBase::claim() checks that the update was started in the same session.

    We may want to prompt the user on the update form, or, if the site is currently not on a secure version, prompt them on every page.

    If we wrote a CronUpdateInProgressValidator that subscribed to StatusCheckEvent, then we could inform the user and provide a link to complete the update. We could vary the message based on whether the site was currently insecure. If the site was NOT insecure, we could also provide a link to delete the cron update. This would allow other Package Manager UIs to be used even if a cron update had been started.

  3. Should we do the first "create" stage of the update in cron even if there is not an available update? This would mean that when a new update arrives, if each stage of the life cycle took its own cron run, we would only need 2 cron runs to apply the update. This could be very important for critical updates when cron is set to run infrequently.

    The logic in LockFileValidator could tell us whether the existing stage created in a previous cron run is still valid. If not, we could recreate the stage.
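The validity check amounts to comparing the active composer.lock against what the stage was created from. A hedged sketch (the function and variable names are hypothetical; the real check lives in LockFileValidator and compares lock-file hashes):

```php
<?php

/**
 * A pre-created stage is reusable only if the active composer.lock
 * is unchanged since the stage was created.
 */
function stageIsStillValid(string $stagedLockHash, string $activeLockContents): bool {
  return hash('sha256', $activeLockContents) === $stagedLockHash;
}

// When the stage is created, record a hash of the active lock file.
$activeLock = '{"packages": []}';
$stageHash = hash('sha256', $activeLock);

// A later cron run: lock file unchanged, so the stage can be reused.
$reusable = stageIsStillValid($stageHash, $activeLock);

// If dependencies changed in the meantime, the stage is stale and
// should be recreated.
$stale = !stageIsStillValid($stageHash, '{"packages": ["drupal/core"]}');
```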

User interface changes

API changes

Data model changes

📌 Task
Status

Closed: won't fix

Version

3.0

Component

Code

Created by

🇺🇸United States tedbow Ithaca, NY, USA

