[PP-2] Reduce CPU requirements for core gitlab pipelines

Created on 22 August 2024
Updated 23 August 2024

Problem/Motivation

Once 📌 [PP-1] Order tests by number of public methods to optimize gitlab job times (Postponed) lands, we will have tests distributed fairly evenly between test jobs on the GitLab pipeline.

I've also been experimenting in a sandbox branch that combines 📌 Add the ability to install multiple modules and only do a single container rebuild to ModuleInstaller (Needs review), Use one-time login link instead of user login form in BrowserTestBase tests (Needs review), and various individual test performance fixes, to see what the absolute potential floor of test run time is.

With all of those applied, the best run I've managed is 4m55s, down from the current floor of about 5m30s.

https://git.drupalcode.org/project/drupal/-/pipelines/261246

However, that time was achieved with much lower overall CPU requirements than the current pipelines. This issue is to extract those changes from the sandbox MR; it will depend on some of the other issues landing in order not to be a regression against the current state (at least in terms of wall time).

Steps to reproduce

Proposed resolution

Remaining tasks

User interface changes

Introduced terminology

API changes

Data model changes

Release notes snippet

📌 Task
Status

Postponed

Version

11.0 🔥

Component
PHPUnit 

Created by

🇬🇧United Kingdom catch

Merge Requests

Comments & Activities

  • Issue created by @catch
  • 🇬🇧United Kingdom catch
  • Merge request !9302: Lower CPU requests for pipeline jobs (Open) created by catch
  • Status changed to Postponed 28 days ago
  • 🇬🇧United Kingdom catch

    What this does:

    1. Lowers the CPU request from 24 to 16 for most jobs (see the sketch after this list). The theory behind this is that the total CPUs per machine is more likely to be a multiple of 16 than of 24, so theoretically we can fit more jobs onto a smaller number of machines (or onto 16-CPU machines, if such machines exist). For example, a 64-CPU node fits four 16-CPU requests exactly, but only two 24-CPU requests, leaving 16 CPUs idle. I don't fully (or even much) understand the relationship between CPU requests, Kubernetes and AWS instances, so this might be flawed, but in general lower and simpler numbers seem better.

    2. Lowers the concurrency of a couple of jobs quite a lot, especially functional tests, where I am pretty sure the concurrency in HEAD is leading to CPU contention and hence slower rather than faster test runs. This is made possible by 📌 [PP-1] Order tests by number of public methods to optimize gitlab job times (Postponed), which removes @group #slow from the vast majority of tests, relying on a better ordering algorithm instead.

    3. Increases the parallelism for functional JS and functional tests by 1 each. This is because in theory most test runs (in the sandbox branch with various changes applied) can finish within about 2m30s, but we still have a lot of individual tests that take over 2 minutes each. With lower concurrency, those long-running tests are spread out enough that we don't run two slow tests end to end in the same job. I'm pretty sure there is potential to bring this lower by continuing to optimise some of the slower tests, but it also gives us a bit of headroom when we add new coverage.
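
    A rough sketch of how these three changes could look in GitLab CI configuration (a hypothetical illustration, not the actual MR diff: the job name and the CONCURRENCY variable are assumptions; KUBERNETES_CPU_REQUEST is the GitLab Runner Kubernetes executor's standard resource-override variable, and parallel is a standard GitLab CI keyword):

        # .gitlab-ci.yml fragment - hypothetical sketch, not the real MR.
        functional:                        # assumed job name
          parallel: 8                      # 3. was 7; one extra job spreads out slow tests
          variables:
            KUBERNETES_CPU_REQUEST: "16"   # 1. was 24; multiples of 16 pack onto nodes better
            CONCURRENCY: "12"              # 2. assumed variable name and value: lower
                                           #    run-tests.sh concurrency to avoid CPU contention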

    If we look at the jobs, we can see that the overall CPU requirement is reduced dramatically:

    Functional JS:
    Before: 2 * 24 = 48
    After: 3 * 16 = 48

    Functional:
    Before: 7 * 24 = 168
    After: 8 * 16 = 128

    W3 legacy:
    Before: 1 * 24 = 24
    After: 1 * 16 = 16

    So an overall reduction of 48 CPUs (240 before, 192 after), with potential scope to reduce further.

  • 🇬🇧United Kingdom catch

    Found an extra 16 CPUs of requests to drop in 📌 [PP-1] Order tests by number of public methods to optimize gitlab job times (Postponed), which brings the total reduction here to 64.
