Vastly improve reliability of minor updates

Created on 18 September 2018, almost 6 years ago
Updated 16 January 2024, 5 months ago

Problem

Despite our efforts, we keep introducing regressions with every minor release. This is problematic for many (obvious) reasons:

  • Harms Drupal's reputation.
  • Discourages people from upgrading promptly, which adds to the support burden: more sites stay on older or no-longer-supported releases, and their owners still ask for help in the d.o. queues, where contrib module maintainers might not be interested in supporting them.
  • Increases the chance of security issues.
  • All of which lead to more harm to Drupal’s reputation and possibly to adoption issues.

On top of that, regressions in minor upgrades don't only affect complex websites, which are likely to have developers taking care of them. In fact, they hit harder one of the audiences Drupal has historically served well: moderately complex websites with a low budget and little or no technical expertise. Some of the 8.6.0 update issues were even reported by sites with very simple setups.

And again, these regressions happen despite our best efforts. Some update issues can be caught in review or via automated tests, but the most complex updates are normally part of big patches, so it’s easy for problems to go unnoticed. On the other hand, some bugs are hard or nearly impossible to catch by simply writing a regular automated test: #2997982: Orphan term hierarchy records can cause taxonomy_update_8502 to enter an infinite loop is a glaring example.

Last but not least, automatic updates are going to exacerbate these issues if we don't take action.

Proposed solution

Significantly widen the tested scenarios.

This could be accomplished in many ways that aren’t mutually exclusive:

  • Launch a “test the upgrade” campaign to raise awareness of the value of testing the upgrade before the stable release is out. The message: by doing this you are doing a favor not only to future-you, but to all the site owners who lack the resources and/or the expertise to recover from a failed upgrade. We could improve our documentation to feature a step-by-step guide on how to test the upgrade, with particular attention to the backup/restore phases. It seems that in many cases people are already doing this, but with the stable release instead of betas or RCs.
    First step: https://www.drupal.org/blog/pilot-program-help-us-improve-reliability-of...
  • Develop one (or more) canary websites based on a widely used distribution, featuring common setups but also advanced use cases, and involving as many popular modules as possible. Use them to continuously run the core and contrib test suites (ideally on every core commit), possibly increasing test coverage for the latter. Many regressions were discovered by simply running very thorough contrib test suites (e.g. the one for Paragraphs). (A copy of) Drupal.org itself could become a canary once on D8.
  • Introduce a distributed regression testing tool. This would require us to implement a client module that sanitizes/anonymizes the DB and configuration of the website it is running on and uploads them to the Drupal.org infrastructure (or, alternatively, to a local machine with the companion server module installed). There the anonymized site would be recreated and all available tests would be run. Failures would be reported back to the original website and on Drupal.org. This would help core developers spot regressions earlier, and it would also give people an incentive to use and contribute to the system, because they would know almost in real time how ready their website is for the next minor. In practice this would become a companion system for automatic updates, which could even be aborted if errors were reported. At the same time, every website participating in this system would become an automated canary (see previous bullet).
    Drupal.org itself could join the system, once on D8.
  • Other suggestions welcome :)
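
The distributed regression testing idea above can be sketched roughly as follows. This is only an illustration in Python, not a proposed implementation: the column names in SENSITIVE_COLUMNS, the anonymize_row and run_checks helpers, and the Report class are all hypothetical, and a real client would be a Drupal module working against the actual schema and test suites. The sketch shows the three steps the bullet describes: anonymize the data, run the available checks against the copy, and use the result to decide whether an automatic update should go ahead.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical list of columns that hold personal data.
SENSITIVE_COLUMNS = {"mail", "name", "pass"}

Row = Dict[str, str]
Snapshot = List[Row]


def anonymize_row(row: Row, salt: str) -> Row:
    """Replace sensitive values with a salted hash; keep everything else."""
    out: Row = {}
    for column, value in row.items():
        if column in SENSITIVE_COLUMNS:
            digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
            out[column] = digest[:12]
        else:
            out[column] = value
    return out


@dataclass
class Report:
    """Collects the names of failed checks for one anonymized site."""
    failures: List[str] = field(default_factory=list)

    @property
    def safe_to_update(self) -> bool:
        # An automatic update would be aborted if any check failed.
        return not self.failures


def run_checks(snapshot: Snapshot,
               checks: Dict[str, Callable[[Snapshot], bool]]) -> Report:
    """Run every check against the anonymized snapshot and record failures."""
    report = Report()
    for name, check in checks.items():
        try:
            passed = check(snapshot)
        except Exception as exc:  # a crashing check counts as a failure
            report.failures.append(f"{name}: {exc}")
            continue
        if not passed:
            report.failures.append(name)
    return report
```

A participating site would call anonymize_row on each exported row, hand the result to run_checks together with whatever update checks are available, and proceed with the update only if safe_to_update is true. Hashing with a per-site salt keeps equal values equal, so relationships between rows survive anonymization.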

This idea aims to prevent regressions and is related to Create an official "Minor release upgrade path" initiative (closed as outdated), which instead focuses on dealing with existing regressions in a better way.

📌 Task
Status

Fixed

Component

Proposed Plan

Created by

🇮🇹Italy plach Venezia
