Lower default concurrency

Issue created by @catch
Merge request !250: Resolve #3469828 "Concurrency" → (Merged) created by catch
Pipeline finished with Failed
12 months ago
Total: 51s
#262194
Status changed to Needs review 12 months ago, 12:43am 23 August 2024
Comment 12 months ago →
🇬🇧United Kingdom catch
The pipeline fails due to expected versions which I think might be HEAD and I'm not able to trigger a child pipeline (probably due to that, or maybe permissions?).
Pipeline finished with Failed
12 months ago
Total: 142s
#262195
Comment 12 months ago →
🇬🇧United Kingdom catch
First phpunit run with no real changes (just a comment) was 5 min 26 sec. https://git.drupalcode.org/issue/paragraphs-3469830/-/jobs/2527933

Realised 10.3/10.4 doesn't have test timings in run-tests.sh, so adding opt-in next major to see the longest-running tests.
Status changed to RTBC 12 months ago, 8:43am 23 August 2024
Comment 12 months ago →
🇪🇸Spain fjgarlin
Triggered downstream pipelines https://git.drupalcode.org/issue/gitlab_templates-3469828/-/pipelines/26...

Based on the mentioned tickets where there is plenty of investigation I think this change makes sense.
Comment 12 months ago →
🇬🇧United Kingdom jonathan1055
I don't know if this is useful, but here is a test on Scheduler using MR250. The PHPUnit job is using _concurrent=1 and there are five branches, run in a parallel matrix. The js and kernel tests are separated out already, so the benefit might not be seen.
https://git.drupalcode.org/project/scheduler/-/pipelines/264154

Here's the same branch, but using ref 'main'
https://git.drupalcode.org/project/scheduler/-/pipelines/264160
Status changed to Needs work 12 months ago, 12:00am 26 August 2024
Comment 12 months ago →
🇬🇧United Kingdom catch
hmm that is useful, it actually makes it slower.

e.g. https://git.drupalcode.org/project/scheduler/-/jobs/2546041 vs https://git.drupalcode.org/project/scheduler/-/jobs/2546071

There is every possibility that the target -> main job was able to use spare CPUs rather than just 2, which means the 32 concurrency might be fine in those scenarios. Ideally, though, we want the results of this to be neutral when runs aren't CPU constrained and faster when they are. If it fails to take advantage of extra CPUs when they're available, it's not really an improvement.

Currently when --directory is passed we order by test suite, which results in:

functional js -> functional -> kernel -> unit

Because scheduler already divides by test suite as you point out, there is no 'running quick tests at the end' effect.

Getting closer to an optimal ordering would need to expand on the logic in 📌 Order tests by number of public methods to optimize gitlab job times Fixed, so that we first order by test suite, then order by number of methods.

However for example https://git.drupalcode.org/project/scheduler/-/blob/2.x/tests/src/Functi... uses a dataProvider which isn't taken into account by the logic there.

We can't expect all contrib authors to add @group #slow manually to their tests, and don't want to significantly regress existing performance, so should try to automate this a bit more I think.

I just opened 📌 Include a check for data providers when ordering by method count Active against core which would try to account for data providers and run them earlier.

The next step after that would be implementing the test ordering by type and method logic when --directory is used.
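
As a rough illustration of the ordering discussed in this comment (test suite first, then number of public methods, largest first), here is a minimal Python sketch. It is not the run-tests.sh implementation: the ClassInfo record, the suite labels, and the example classes and method counts are all assumptions made up for the illustration.

```python
from dataclasses import dataclass

# Suite order taken from the comment above: functional js -> functional -> kernel -> unit.
# The dictionary keys are illustrative labels, not run-tests.sh identifiers.
SUITE_ORDER = {"functional-javascript": 0, "functional": 1, "kernel": 2, "unit": 3}


@dataclass(frozen=True)
class ClassInfo:
    """Hypothetical record describing one test class."""
    name: str
    suite: str
    public_methods: int


def order_tests(tests):
    """Sort by suite first, then by public method count (largest first)."""
    return sorted(
        tests,
        key=lambda t: (SUITE_ORDER[t.suite], -t.public_methods, t.name),
    )


if __name__ == "__main__":
    # Made-up classes and method counts, for illustration only.
    example = [
        ClassInfo("ExampleUnitTest", "unit", 12),
        ClassInfo("ExampleSmallFunctionalTest", "functional", 2),
        ClassInfo("ExampleLargeFunctionalTest", "functional", 9),
        ClassInfo("ExampleKernelTest", "kernel", 4),
    ]
    for info in order_tests(example):
        print(info.suite, info.public_methods, info.name)
```

The method count is only a proxy for runtime. A test class that uses a dataProvider would be undercounted by this sort key, which is the gap the data provider issue mentioned above is meant to address.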
Pipeline finished with Success
4 months ago
Total: 50s
#487443
Comment 4 months ago →
🇬🇧United Kingdom jonathan1055
Rebased and fixed conflicts. I presume we do still want to work on this?
📌 Switch to using run-tests.sh by default Active is also being looked at again.
Comment 4 months ago →
🇪🇸Spain fjgarlin
Thanks for rebasing.

Yeah we can investigate, but we will merge only if we find it makes a real difference, which based on #6 / #7, we couldn't really prove.
Comment 4 months ago →
🇬🇧United Kingdom catch
I think we might want to postpone this on 📌 Deprecate TestDiscovery test file scanning, use PHPUnit API instead Active which orders tests slowest-first more effectively than the current logic in run-tests.sh does, but I think it's worth trying to get done to see if we can get the best combination of pipeline wall time vs. CPU time for contrib tests.
Comment 3 months ago →
🇪🇸Spain fjgarlin
Postponing based on the previous comment.
Comment 3 months ago →
🇬🇧United Kingdom jonathan1055
I had already done some testing before you postponed this issue, so I might as well post the results here. At some point since the tests in #6 and #7 above, I made a change so that running _PHPUNIT_TESTGROUPS: --all only runs one job, not the matrix, so we get a proper comparison now.

A baseline using '--all' to run everything in one job, without this MR
https://git.drupalcode.org/issue/scheduler-3445052/-/jobs/5133960
phpunit test run duration: 11 min 25 sec
Overall job took 12 mins 16 sec

Then with MR250
https://git.drupalcode.org/issue/scheduler-3445052/-/jobs/5135182
phpunit 22 min 32 sec. Job took 24 mins 32 secs
re-run phpunit 19 mins 38 sec. Job 20 mins 32

The outcome is similar to before: with this particular project, changing concurrency from 32 to 4 significantly increases the run times.
Comment 3 months ago →
🇬🇧United Kingdom jonathan1055
Revert unintended status change
Comment 3 months ago →
🇪🇸Spain fjgarlin
Thanks for putting the results of the testing. They are really useful.
Comment 3 months ago →
🇬🇧United Kingdom catch
With https://git.drupalcode.org/issue/scheduler-3445052/-/jobs/5135182 specifically there are several tests that take more than 300s each and one that is over 700s. To improve run times below 11 minutes, those tests would have to be split up or otherwise refactored to take less time per test class.

However, it also looks like if these four tests were explicitly marked with @group #slow:

SchedulerFieldsDisplayTest, SchedulerHooksTest, SchedulerPermissionsTest, SchedulerRulesEventsTest

then it might get the runtimes with lower concurrency down closer to 11 minutes. Or we might need to move the concurrency a bit higher to something like 8 too.
Comment 3 months ago →
🇬🇧United Kingdom jonathan1055
Thanks for the info. I marked those four tests with @group #slow and it was quicker. Initial run 17 min 16 sec, and re-run 20 min 36 sec, so it does appear faster with this.
https://git.drupalcode.org/issue/scheduler-3445052/-/jobs/5196407#L482

Then I changed concurrency from 4 (as in the MR) to 8 and got runtimes of 7 min 54 sec, 19 min 38 sec and 14 min 50 sec (I guess plenty of other factors are affecting the runtimes, which is why I did three runs).
https://git.drupalcode.org/issue/scheduler-3445052/-/jobs/5196975#L482
Comment 3 months ago →
🇬🇧United Kingdom catch
Thanks for testing, this is encouraging.

I think the next tweak would be to mark this one with @group #slow

SchedulerNonEnabledTypeTest
Because that one finished last in the run. With eight concurrency, that still leaves three spare processes to run the other tests from the start of the job. Theoretically might get under 7m58 then, but also at 7m58 this looks like it's up to 4 minutes faster than HEAD?

Once 📌 Deprecate TestDiscovery test file scanning, use PHPUnit API instead Active lands in 11.2, it would be great to see what a result looks like with @group #slow against that issue.

And then once we've got a baseline against that issue, it would be interesting to see if reverting the @group #slow from scheduler still results in faster test runs (because the default ordering should in some cases lead to the same results).
Comment 3 months ago →
🇬🇧United Kingdom jonathan1055
With SchedulerNonEnabledTypeTest also marked #slow, the times are 13 mins 50, 11 mins 48 and 13 mins 45
https://git.drupalcode.org/issue/scheduler-3445052/-/jobs/5198370#L482
Comment 3 months ago →
🇬🇧United Kingdom catch
Looks like SchedulerRequiredTest could also use @group #slow.

(sorry for the back and forth, this is what it was like trying to get core test runs down too - constant whack-a-mole).
Comment 3 months ago →
🇬🇧United Kingdom jonathan1055
sorry for the back and forth, this is what it was like trying to get core ...

No problem at all, I'm pleased you are giving good feedback and ideas.

With #slow added in SchedulerRequiredTest - 9 min 53 sec, 7 min 2 sec and 6 min 18 sec. Each re-run was faster, but I'm sure that is not salient.
https://git.drupalcode.org/issue/scheduler-3445052/-/jobs/5198987
Comment 3 months ago →
🇬🇧United Kingdom catch
In https://git.drupalcode.org/issue/scheduler-3445052/-/jobs/5198987 it looks like SchedulerRequiredTest might not be marked as #slow yet (it starts in the middle of the job still), is that definitely the right link?
Comment 3 months ago →
🇬🇧United Kingdom jonathan1055
Sorry, yes you are right. Not sure what happened there, the tests were run with the correct change, so the results were right, I just had the wrong link. Here is the first of that run, and the two re-runs next door. 9 min 53 sec, 7 min 2 sec and 6 min 18
https://git.drupalcode.org/issue/scheduler-3445052/-/jobs/5206786
Comment 3 months ago →
🇬🇧United Kingdom catch
Thank you!

https://git.drupalcode.org/issue/scheduler-3445052/-/jobs/5206786 looks pretty good.

SchedulerTestLegacyHooks still overhangs, and SchedulerMultilingualTest nearly overhangs but does not (e.g. SchedulerLightweightCronTest finishes in-between).

Since SchedulerLightweightCronTest starts sixth from last and takes 30s, I think it's likely that the last few finishing tests all finish within a few seconds of each other.

As soon as we use @group #slow for more than 8 tests, one of them won't start until another has finished, and then you're back to square one. We even considered @group #really_slow in core for that but didn't go that far yet. So if we tagged SchedulerTestLegacyHooks as #slow it might save 10-30s still, but it starts to get towards diminishing returns at this point.

So for me this is showing that we can get good pipeline runtimes with 8 concurrency, but it will probably require manual tweaking for projects with several long-running tests and more than 8 tests in total.

Next thing is to see if 📌 Deprecate TestDiscovery test file scanning, use PHPUnit API instead Active gets us the same or better results with less manual tweaking.

Explicitly marking this with PP-1 on that issue, I think it could make things worse for some projects if we make the switch before that lands. But once it lands, hopefully we can make everything a bit more efficient.
Comment 3 months ago →
🇬🇧United Kingdom jonathan1055
Six tests have @group #slow and we have _CONCURRENCY_THREADS: 8. So shall I mark those two you identified (LegacyHooks and Multilingual) with #slow, to make sure that they get run by the last two threads, and don't get picked together? Or is the tactic to always have more concurrency than #slow tests? I will try it anyway, to see what that shows us.

By the way, is there any way to widen the summary output? The test classes are truncated at 60 chars in the summary, and three of them show as identical Drupal\Tests\scheduler_rules_integration\Functional\Schedule, and two as Drupal\Tests\scheduler\FunctionalJavascript\SchedulerJavascr
I was just doing a little bit of data collection and analysis, finding average runtimes per test, but cannot programmatically get one-to-one numbers when the test classes are not shown as distinct.
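
To make the collision concrete, here is a small Python sketch of why a 60-character cut from the start of the name makes classes in the same namespace indistinguishable, while keeping the last 60 characters would not. This only illustrates the problem and does not describe what run-tests.sh does. The two fully qualified class names are assumptions: the namespace prefix comes from the truncated output quoted above, but SchedulerRulesActionsTest is a hypothetical class name.

```python
LIMIT = 60  # summary width reported in this comment


def keep_head(name, limit=LIMIT):
    """Truncate by keeping the first `limit` characters."""
    return name[:limit]


def keep_tail(name, limit=LIMIT):
    """Truncate by keeping the last `limit` characters."""
    return name[-limit:]


# Assumed fully qualified names, for illustration only.
names = [
    r"Drupal\Tests\scheduler_rules_integration\Functional\SchedulerRulesEventsTest",
    r"Drupal\Tests\scheduler_rules_integration\Functional\SchedulerRulesActionsTest",
]

if __name__ == "__main__":
    for name in names:
        print(keep_head(name), "|", keep_tail(name))
```

With names like these, both head-truncated values come out as the identical string quoted above, so per-class timings cannot be matched up, whereas the tail-truncated values still end in the distinct short class name.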
Comment 3 months ago →
🇬🇧United Kingdom catch
Or is the tactic to always have more concurrency than #slow tests? I will try it anyway, to see what that shows us.

Exactly the same number is fine, so if the 8 slowest tests are all tagged that's great - as soon as one finishes, a process is freed up for a faster test to start.

Once you get to 9 @group #slow tests with 8 concurrency, the risk is that the 9th slow test is the slowest one - say a test that takes 5 minutes, and it's been displaced by a test that takes 45s that runs first, now the test run is going to be at least 5m 45s. If the 9th test takes 90 seconds, then it'd be 90s + 45s which is usually fine, but this is where it gets into diminishing returns very quickly. Unfortunately speaking from experience here...

So it's best to tag the minimum number of tests as @group #slow to avoid overhangs, and let ordering (improved by the discovery MR) take care of the rest. It may be the case with scheduler that the ideal minimum number of tests to tag is exactly 8 :) at least until the discovery MR lands and changes things again.
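
The arithmetic in the 9-slow-tests example can be checked with a tiny scheduling model. This is a simplified sketch, not how run-tests.sh is implemented: it assumes tests start strictly in list order, each on whichever process frees up first, and the durations (in seconds) are made up apart from the 45s, 90s and 5 minute figures used in this comment.

```python
import heapq


def wall_time(durations, concurrency):
    """Return total wall time when tests start in list order on the first free process."""
    free_at = [0] * concurrency  # time at which each process becomes free
    heapq.heapify(free_at)
    finish = 0
    for duration in durations:
        start = heapq.heappop(free_at)  # earliest available process
        end = start + duration
        finish = max(finish, end)
        heapq.heappush(free_at, end)
    return finish


if __name__ == "__main__":
    # Eight tagged tests start first; the ninth (5 minutes) has to wait for the 45s one.
    displaced = [45] + [120] * 7 + [300]
    # Same tests, but with the 5 minute test started in the first batch.
    slowest_first = [300] + [120] * 7 + [45]
    print(wall_time(displaced, 8), wall_time(slowest_first, 8))
```

Under these assumptions the displaced ordering takes 345 seconds (5m 45s) and the slowest-first ordering takes 300 seconds, which is the overhang described in this comment.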
Comment 3 months ago →
🇬🇧United Kingdom catch
By the way, is there any way to widen the summary output?

I think something like this just landed in 11.2. IIRC it now truncates from the right instead of the left, or something like that, so next minor might pick that up already.
Comment 3 months ago →
🇬🇧United Kingdom jonathan1055
Here are the visual results from the sets of tests when we started with 4 #slow, then added the 5th and the 6th. "Total minutes" is the combined time of all tests, which is obviously much larger than the elapsed time, but I just added that for interest.

I have not done any more test runs with the two (7th and 8th) you mentioned in comment 23. There is some variation in the positions. The javascript and rules integration tests seem to have the most variability, looking at the colours.
Comment 3 months ago →
🇬🇧United Kingdom catch
📌 Deprecate TestDiscovery test file scanning, use PHPUnit API instead Active is in 11.2 now, so while this can't be committed yet, I think it would be possible to test scheduler without @group #slow and see how it gets on with next minor, ideally both with and without the change here. If it's neutral or better, then this might be ready to go (with concurrency of 6 or 8, probably not 4) when 11.2.0 is tagged.
Comment 3 months ago →
🇬🇧United Kingdom jonathan1055
I created a new test branch without the @group #slow so that we can still re-test the above if we need to.

'Next minor' throws many deprecations, some in Scheduler and many in the 3rd-party modules needed for testing integration and plugins. So this is not any help so far. The log length limit has been reached.
https://git.drupalcode.org/issue/scheduler-3445052/-/jobs/5372389
Pipeline finished with Success
3 months ago
Total: 83s
#506836
Comment 3 months ago →
🇬🇧United Kingdom catch
While the log length limit doesn't help, 7m30s seems in-line with previous runs that had @group #slow and also faster than HEAD - so that might be encouraging?

I'm not sure how gitlab templates deprecation support works, is it possible to turn that off so the logs are quieter?

🇬🇧United Kingdom jonathan1055

Yes the time of 7m30 does seem encouraging.

I realised that the 'next minor' test above was actually running against 11.3-dev (due to a quirk in the templates which is being fixed generically, as part of the change in 📌 11.2.x is released - update NEXT_MINOR Active )

When I run against 11.2.0-dev, every test produces

Drupal\Core\Test\Exception\MissingGroupException: Missing group metadata in test class Drupal\Tests\Composer\Generator\BuilderTest
in /builds/issue/scheduler-3445052/web/core/lib/Drupal/Core/Test/PhpUnitTestDiscovery.php:304
Stack trace:
#0 /builds/issue/scheduler-3445052/web/core/lib/Drupal/Core/Test/PhpUnitTestDiscovery.php(187):
Drupal\Core\Test\PhpUnitTestDiscovery->getTestClassInfo(Object(PHPUnit\Framework\TestSuite), 'PHPUnit-Unit')
#1 /builds/issue/scheduler-3445052/web/core/lib/Drupal/Core/Test/PhpUnitTestDiscovery.php(120):
Drupal\Core\Test\PhpUnitTestDiscovery->getTestList(Object(PHPUnit\Framework\TestSuite), NULL)
#2 /builds/issue/scheduler-3445052/web/core/scripts/run-tests.sh(1044):
Drupal\Core\Test\PhpUnitTestDiscovery->getTestClasses(NULL, Array)
#3 /builds/issue/scheduler-3445052/web/core/scripts/run-tests.sh(189): simpletest_script_get_test_list()

See https://git.drupalcode.org/issue/scheduler-3445052/-/jobs/5378256
This sounds related to the change, so I wanted to draw your attention to it.

Comment 3 months ago →
🇬🇧United Kingdom catch
The only known issue from that change is 🐛 PhpUnitApiGetTestClassesTest and PhpUnitApiFindAllClassFilesTest need to execute PHPUnit discovery before TestDiscovery Active , might be worth posting the error there for now.
Comment 3 months ago →
🇬🇧United Kingdom jonathan1055
I have raised GitLab Templates issue 📌 MissingGroupException: Missing group metadata with run-tests.sh in 11.2.0-dev Active . It may end up being a core problem, but I need to investigate further.
Comment 2 months ago →
🇬🇧United Kingdom catch
Just re-committed 📌 Deprecate TestDiscovery test file scanning, use PHPUnit API instead Active with a fix for the discovery, so this should be unblocked again.
Pipeline finished with Failed
about 2 months ago
Total: 52s
#526213
Comment about 2 months ago →
🇬🇧United Kingdom catch
Pushed a commit changing this to 8 instead of 4 per the discussion in #23-#28. Short version is scheduler and several other modules will have enough slower tests that 4 isn't enough space to start them at the same time. 8 is more likely, and still considerably lower than 32.
Comment about 2 months ago →
🇬🇧United Kingdom catch
I think this is safe to try after 📌 Bump Core to 11.2.0 Active
Status changed to RTBC about 1 month ago, 4:36pm 3 July 2025
Comment about 1 month ago →
🇪🇸Spain fjgarlin
This being such a small MR, and core having changed the logic to optimize slow/fast tests, I think it's worth merging. We can always revert if we see that it is disrupting contrib, but it shouldn't, based on some of the tests above.
Comment about 1 month ago →
🇬🇧United Kingdom jonathan1055
Yes, RTBC. I just left a comment in the MR on the description. Do we want to add "The range can be 1 - 32, with 8 as a good default" or something like that?
Pipeline finished with Success
about 1 month ago
Total: 48s
#539133
Pipeline finished with Skipped
about 1 month ago
#539217
Comment about 1 month ago →
System Message

fjgarlin → committed 6fefed42 on main authored by catch →
Issue #3469828 by fjgarlin, catch, jonathan1055: Lower default...
Comment about 1 month ago →
🇪🇸Spain fjgarlin
Thanks! I've now merged this.
Comment 29 days ago →
System Message
Automatically closed - issue fixed for 2 weeks with no activity.
