Reevaluate the break-up of the various testsuites on GitLab

Created on 16 January 2024, 10 months ago
Updated 15 February 2024, 9 months ago

Problem/Motivation

Once all child issues of 📌 [META] Convert Functional tests classes which make no HTTP requests into Kernel tests (Active) are committed, we should reevaluate the break-up of all the testsuites on GitLab.

Even now, with "only" six of them committed, the job "PHPUnit Kernel 1/2" is the slowest, taking ~30 seconds longer than the next slowest job ("PHPUnit Functional Javascript 1/2").

Proposed resolution

- Reshuffle tests/break-ups so that all jobs take roughly the same amount of time

📌 Task
Status

Fixed

Version

10.2

Component
PHPUnit 

Last updated 1 day ago

Created by

🇳🇱Netherlands spokje


Merge Requests

Comments & Activities

  • Issue created by @spokje
  • 🇳🇱Netherlands spokje

    Postponed on all of the child issues of 📌 [PP-1] Reevaluate the break-up of the various testsuites on GitLab Postponed being committed.

  • 🇬🇧United Kingdom catch

    Yes, these were already finishing at around the same time as other jobs, so it's not surprising that moving tests over could push them over the top.

    Couple of options, could do one or both:

    1. Change parallel: 2 to parallel: 3
    2. Move them up in the YAML before functional tests (this puts them in the queue earlier, often makes a few seconds difference).

    If we do #1, we also need to check for 'overhanging' tests (slow kernel tests that finish last in a job) and add those to @group #slow so they get distributed across the start of each test job instead.

    It would be nice to be able to also reduce the parallel of functional tests to match, but probably need to shift quite a few more tests before that can be done without having an effect.
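The 'overhanging' effect mentioned above can be illustrated with a small simulation. This is only a sketch under stated assumptions: it assumes a job runs its tests on a fixed number of concurrent workers that pick up tests in list order, and the durations and the `job_duration` helper are hypothetical, not taken from the real test runner.

```python
import heapq


def job_duration(test_seconds: list[int], workers: int) -> int:
    """Return the wall time of a job whose workers take tests in list order."""
    # Each heap entry is the time at which one worker becomes free.
    free_at = [0] * workers
    heapq.heapify(free_at)
    for duration in test_seconds:
        # The next test starts on whichever worker is free first.
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + duration)
    return max(free_at)


# Hypothetical durations in seconds: nine quick tests and one slow test.
quick = [10] * 9
slow = [60]

slow_last = job_duration(quick + slow, workers=2)
slow_first = job_duration(slow + quick, workers=2)
print(f"slow test last: {slow_last}s, slow test first: {slow_first}s")
```

With these made-up numbers, starting the slow test first lets the quick tests fill the time next to it, which is the intent behind moving such tests into @group #slow.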

  • 🇳🇱Netherlands spokje

    With a ~30 second "lead" (or rather lag) already, and given the number of NR/RTBC issues that will convert more Functional => Kernel tests, I don't think option #2 alone will cut it.

    In fact, and that's just IMHO, I don't see much gain in it even when combined with #1.

    I would go all-in on #1.

    But let's first get all the children home/committed so we can assess the full "damage" done :)

  • 🇳🇱Netherlands spokje

    Just for giggles/n=1/yada yada

    These are the durations from https://git.drupalcode.org/issue/drupal-3245997/-/pipelines/76305, which ran about 3 hours before the first of the "Convert BlahTest into a Kernel Test" issues was committed:

    PHPUnit Kernel 1/2 - Duration: 4 minutes 49 seconds 
    PHPUnit Kernel 2/2 - Duration: 4 minutes 46 seconds  
    PHPUnit Functional 1/7 - Duration: 4 minutes 39 seconds 
    PHPUnit Functional 2/7 - Duration: 4 minutes 12 seconds
    PHPUnit Functional 3/7 - Duration: 3 minutes 51 seconds 
    PHPUnit Functional 4/7 - Duration: 3 minutes 54 seconds
    PHPUnit Functional 5/7 - Duration: 4 minutes 44 seconds 
    PHPUnit Functional 6/7 - Duration: 4 minutes 35 seconds 
    PHPUnit Functional 7/7 - Duration: 4 minutes 54 seconds 
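To compare the figures above more easily, they can be converted to seconds. This is a minimal sketch: the job names and durations are copied from the list above, while the `to_seconds` helper is a hypothetical name introduced for illustration.

```python
import re

# Durations copied from the list above (pipeline 76305).
DURATIONS = {
    "PHPUnit Kernel 1/2": "4 minutes 49 seconds",
    "PHPUnit Kernel 2/2": "4 minutes 46 seconds",
    "PHPUnit Functional 1/7": "4 minutes 39 seconds",
    "PHPUnit Functional 2/7": "4 minutes 12 seconds",
    "PHPUnit Functional 3/7": "3 minutes 51 seconds",
    "PHPUnit Functional 4/7": "3 minutes 54 seconds",
    "PHPUnit Functional 5/7": "4 minutes 44 seconds",
    "PHPUnit Functional 6/7": "4 minutes 35 seconds",
    "PHPUnit Functional 7/7": "4 minutes 54 seconds",
}


def to_seconds(text: str) -> int:
    """Convert a string like '4 minutes 49 seconds' to a number of seconds."""
    match = re.fullmatch(r"(\d+) minutes? (\d+) seconds?", text.strip())
    if match is None:
        raise ValueError(f"Unrecognised duration: {text!r}")
    minutes, seconds = map(int, match.groups())
    return minutes * 60 + seconds


seconds = {job: to_seconds(text) for job, text in DURATIONS.items()}
slowest = max(seconds, key=seconds.get)
fastest = min(seconds, key=seconds.get)
spread = seconds[slowest] - seconds[fastest]
print(f"slowest: {slowest}, fastest: {fastest}, spread: {spread}s")
```

On this single run, the gap between the slowest and fastest listed job is about a minute, and both Kernel jobs sit within a few seconds of the slowest Functional job.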
    
  • 🇬🇧United Kingdom catch

    It's usually safer to go by 'Test run duration', which you can find with ctrl-f in the logs, e.g. https://git.drupalcode.org/issue/drupal-3245997/-/jobs/622282 shows 4 min 5 sec.

    There's still a lot of variation, but it excludes things like 'waiting for pod to be ready' which is even more variable.

  • 🇳🇱Netherlands spokje

    Well, that's slightly annoying. I expected that, since there's also a Queued time, all the overhead would already be deducted.

    Anyway, thanks for the insight, updated the times.

  • 🇬🇧United Kingdom catch

    The queue time is while the job is queued, but once the job starts, it still needs to get a pod to actually run the tests on and that's not included in the queue time - it's due to the interaction between gitlab and kubernetes (what I typed is about as much as I understand - but it's something along those lines).

  • Status changed to Active 10 months ago
  • 🇳🇱Netherlands spokje

    Even though not all children of 📌 [META] Convert Functional tests classes which make no HTTP requests into Kernel tests (Active) are in (there are one or two still to come), I think it's time to unpostpone this issue.

    By now, both Kernel test jobs are taking ~1 minute longer than the Functional test jobs.

    I think it's time to split the two Kernel jobs into three?

    @catch I'll happily leave this in your capable hands if you have the time and interest.

  • 🇬🇧United Kingdom catch

    I've actually been messing around in https://git.drupalcode.org/project/drupal/-/merge_requests/6271 already, but let's get a 'clean' MR on here.

  • Merge request !6325 "Increase kernel jobs from 2 to 3." (Closed) created by catch
  • Status changed to Needs review 10 months ago
  • 🇬🇧United Kingdom catch

    This successfully lowers the long runtime of the two kernel jobs, but it does it a bit unevenly, so we may have a few more @group #slow to add to even things out between the three jobs.

  • 🇳🇱Netherlands spokje

    "This successfully lowers the long runtime of the two kernel jobs, but it does it a bit unevenly,"

    Agreed.

    "so we may have a few more @group #slow to add to even things out between the three jobs."

    Not sure if you want to get this in first and do the above in a follow-up (in which case I would happily RTBC this), or want to do that in here as well?

  • Status changed to RTBC 10 months ago
  • 🇺🇸United States smustgrave

    @catch can we open follow-ups for the slow groups? They would be easier to find in a search later if we ever needed to.

  • 🇬🇧United Kingdom catch

    I think it's fine to do that in a follow-up, yeah. It's a bit never-ending unfortunately, because as you find some, that exposes more. Eventually we can try to order tests by how long they take (descending) rather than relying on the manual @group #slow.
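One way the idea of ordering tests by duration (descending) could work is a greedy split: sort the tests longest first, then give each one to whichever job currently has the least work. This is only a sketch of that general technique, not how the test runner actually distributes tests; the test names, durations and the `split_by_duration` helper are hypothetical.

```python
def split_by_duration(durations: dict[str, int], jobs: int) -> list[list[str]]:
    """Greedily assign tests, longest first, to the least-loaded job."""
    buckets: list[list[str]] = [[] for _ in range(jobs)]
    totals = [0] * jobs
    for name in sorted(durations, key=durations.get, reverse=True):
        # Pick the job with the smallest total so far (first one on ties).
        target = totals.index(min(totals))
        buckets[target].append(name)
        totals[target] += durations[name]
    return buckets


# Hypothetical test durations in seconds.
example = {
    "ATest": 90,
    "BTest": 70,
    "CTest": 60,
    "DTest": 40,
    "ETest": 30,
    "FTest": 20,
}

buckets = split_by_duration(example, jobs=3)
job_totals = [sum(example[name] for name in bucket) for bucket in buckets]
print(buckets, job_totals)
```

A greedy split like this does not always find the best possible balance, but because every job starts with its longest tests, it avoids a slow test landing at the end of a job.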

    • larowlan committed a16f3fc8 on 10.2.x
      Issue #3415004 by catch, Spokje: Reevaluate the break-up of the various...
    • larowlan committed e94e1d77 on 11.x
      Issue #3415004 by catch, Spokje: Reevaluate the break-up of the various...
  • 🇦🇺Australia larowlan 🇦🇺🏝.au GMT+10

    Committed to 11.x and backported to 10.2.x - thanks folks

  • Status changed to Fixed 10 months ago
  • Automatically closed - issue fixed for 2 weeks with no activity.

  • 🇺🇸United States smustgrave

    Follow-up added here: 📌 Add @slow group to newly split kernel jobs (Active)
