Reduce CPU requirements for core gitlab pipelines

Issue created by @catch
Comment 12 months ago →
🇬🇧United Kingdom catch
Merge request !9302 Lower CPU requests for pipeline jobs. → (Open) created by catch
Status changed to Postponed 12 months ago, 12:24pm 22 August 2024
Comment 12 months ago →
🇬🇧United Kingdom catch
What this does:

1. Lowers the CPU request from 24 to 16 for most jobs. The theory behind this is that the total-CPUs-per-machine is more likely to be a multiple of 16 than 24, so theoretically we can fit more jobs on a smaller number of machines (or on 16 CPU machines if such a machine exists). I don't fully (or even much) understand the relationship between CPU requests, kubernetes and AWS instances, so this might be flawed, but in general lower and simpler numbers seem better.

2. Lowers the concurrency of a couple of jobs quite a lot, especially functional tests where I am pretty sure the concurrency in HEAD is leading to CPU contention and hence slower rather than faster test runs. This is made possible by 📌 Order tests by number of public methods to optimize gitlab job times Fixed which removes @group #slow from the vast majority of tests, relying on a better ordering algorithm instead.

3. Increases the parallelism for functional js and functional tests by 1 each. This is because in theory most test runs (in the sandbox branch with various changes applied) can finish within about 2m30s, but we still have a lot of individual tests over 2 minutes each. With lower concurrency, those long-running tests are spread out enough that we don't run two slow tests back to back. I'm pretty sure there is potential to bring this lower by continuing to optimise some of these slower tests, but it also gives us a bit of headroom when we add new coverage.
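
As a rough illustration of the packing theory in point 1, here is a minimal sketch that counts how many jobs fit on one machine for a 24 CPU versus a 16 CPU request. The node sizes are hypothetical (the thread does not say which instance types the runners use), and it ignores the CPU that kubernetes reserves for system pods, so real allocatable capacity would be somewhat lower.

```python
# Hypothetical node sizes in vCPUs; not taken from the actual runner setup.
NODE_SIZES = [32, 48, 64, 96]


def jobs_per_node(node_cpus: int, cpu_request: int) -> tuple[int, int]:
    """Return (jobs that fit, CPUs left idle) for one node.

    Assumes every job requests the same CPU count and that the whole
    node is allocatable, which is a simplification.
    """
    return node_cpus // cpu_request, node_cpus % cpu_request


for request in (24, 16):
    for size in NODE_SIZES:
        jobs, idle = jobs_per_node(size, request)
        print(f"request={request:>2} node={size:>2} jobs={jobs} idle_cpus={idle}")
```

Under these assumptions a 16 CPU request leaves no idle CPUs on any of the listed sizes, while a 24 CPU request strands CPUs on the 32 and 64 CPU nodes. Whether that holds in practice depends on the real node sizes and reservations.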

If we look at the jobs, we can see that the overall CPU requirement is reduced dramatically:

Functional JS:
Before: 2 * 24 = 48
After: 3 * 16 = 48

Functional:
Before: 7 * 24 = 168
After: 8 * 16 = 128

W3 legacy:
Before: 1 * 24 = 24
After: 1 * 16 = 16

So an overall reduction of 48 CPUs, with potential scope to reduce further.
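
The arithmetic above can be double-checked with a short sketch. The job names and numbers are the ones listed in this comment; the dictionary layout and the function name are just for illustration.

```python
# (parallel jobs, CPU request per job) before and after the change.
JOBS = {
    "Functional JS": ((2, 24), (3, 16)),
    "Functional": ((7, 24), (8, 16)),
    "W3 legacy": ((1, 24), (1, 16)),
}


def total_cpus(index: int) -> int:
    """Sum parallel * request over all jobs; index 0 is before, 1 is after."""
    return sum(spec[index][0] * spec[index][1] for spec in JOBS.values())


before, after = total_cpus(0), total_cpus(1)
saving = before - after
print(f"before={before} after={after} saving={saving}")
```

Functional JS stays flat at 48, so the saving comes from Functional (40) and W3 legacy (8).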
Comment 11 months ago →
🇬🇧United Kingdom catch
Found an extra 16 CPU requests to drop on 📌 Order tests by number of public methods to optimize gitlab job times Fixed which brings the total to 64 here.
Status changed to Needs review 11 months ago, 7:35pm 19 September 2024
Comment 11 months ago →
🇬🇧United Kingdom catch
Just rebased and the full pipeline took six minutes: https://git.drupalcode.org/project/drupal/-/pipelines/287574

Our best runtime is currently around 5m30s. Given the amount of variation between runs, six minutes seems in range for the overall CPU saving here. Moving to needs review.
Status changed to RTBC 11 months ago, 8:31am 20 September 2024
Comment 11 months ago →
🇪🇸Spain fjgarlin
The changes look good and so does your maths in #4. Pipelines are also happy. RTBC.
Comment 11 months ago →
System Message

longwave → committed 88774524 on 11.x
Issue #3469687 by catch, fjgarlin: Reduce CPU requirements for core...
Status changed to Downport 11 months ago, 10:35am 20 September 2024
Comment 11 months ago →
🇬🇧United Kingdom longwave UK
Committed 8877452 and pushed to 11.x. Thanks!

The patch doesn't apply to 11.0.x and below. We don't run as many tests there, but this feels like a good candidate for backport if it reduces costs for the DA?
Comment 11 months ago →
🇬🇧United Kingdom catch
I'm not sure it's worth backporting to 11.0.x but it probably is worth backporting to 10.4.x since that will then carry forward to the next 10.x branches which will have daily test runs for another couple of years.

10.4.x does not have all of the test performance improvements in 11.x, but I'm sure that tests would still finish in 7-8 minutes or less with these changes, and we don't run that many MR pipelines against 10.4.x (compared to on-commit/scheduled runs).

Also if my uninformed kubernetes theories are correct, it might help recycling/re-use of test runners since they'll be more consistent sizes between the branches?

So moving back there. If there's a problem with 10.4.x and the changes here, we'll find out from the backport pipeline hopefully.
Comment 11 months ago →
🇬🇧United Kingdom catch
catch → changed the visibility of the branch 3469687-pp-2-reduce-cpu to hidden.
Merge request !9561 10.4.x: reduce CPU requirements for gitlab jobs → (Closed) created by catch
Pipeline finished with Success
11 months ago
Total: 378s
#288832
Status changed to Fixed 11 months ago, 7:27am 21 September 2024
Comment 11 months ago →
🇬🇧United Kingdom catch
Backport pipeline finished in 6 minutes and 12 seconds. https://git.drupalcode.org/project/drupal/-/pipelines/288832

Since the backport itself was trivial, going to go ahead and commit here.

catch → committed 6dc60ef8 on 10.4.x

Issue #3469687 by catch, fjgarlin, longwave: Reduce CPU requirements for...

Comment 11 months ago →
System Message
catch → closed merge request !9561
Comment 10 months ago →
System Message
Automatically closed - issue fixed for 2 weeks with no activity.
