GitLab CI pods load too high when running multiple pipelines at the same time

Created on 30 November 2023, about 1 year ago

Problem/Motivation

Whilst working on and debugging [investigation] get metrics of request to see which settings need to be changed for random failures (Active), the infra team recommended increasing the resources of the Nightwatch and FunctionalJavascript test jobs so that pod allocation works better.

Steps to reproduce

There were several long threads, but the recommendation was here: https://drupal.slack.com/archives/CGKLP028K/p1701280510860119?thread_ts=...

Proposed resolution

Increase the resources requested by those jobs.

Remaining tasks

MR

User interface changes

API changes

Data model changes

Release notes snippet

📌 Task
Status

Fixed

Version

11.0 🔥

Component
PHPUnit 

Last updated about 11 hours ago

Created by

🇪🇸Spain fjgarlin

  • JavaScript

    Affects the content, performance, or handling of JavaScript.


Merge Requests

Comments & Activities

  • Issue created by @fjgarlin
  • Merge request !5620 Up resources. → (Open) created by fjgarlin
  • Status changed to Needs review about 1 year ago
  • 🇬🇧United Kingdom catch

    Committing this one from needs review so we can monitor, thanks!

    • catch committed 0213c58f on 10.2.x
      Issue #3405242 by fjgarlin: GitLab CI pods load too high when running...
  • Status changed to Fixed about 1 year ago
    • catch committed f9037be4 on 11.x
      Issue #3405242 by fjgarlin: GitLab CI pods load too high when running...
  • 🇧🇪Belgium wim leers Ghent 🇧🇪🇪🇺

    😳

    So: claim more than we need for any chromedriver-dependent tests, because those tests are especially likely to be timing-sensitive and hence load-sensitive?

  • 🇪🇸Spain fjgarlin

    Apparently, we were under-resourcing some jobs, so pods would take more jobs because they had space for them, but then the jobs were more demanding than anticipated and the whole pod would struggle. With this change, we are trying to make sure that no two big jobs land on the same pod.

    My k8s jargon is far from great so take the above with a grain of salt.
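    A simplified model of the behaviour described above, assuming (hypothetically) two 16-CPU nodes and two JS jobs that each really use 12 CPUs. The point is that pods are placed according to the CPU they request, not the CPU they end up consuming, so a low request lets two heavy jobs share one node. This is a first-fit sketch for illustration only; the real Kubernetes scheduler filters and scores nodes differently, and every name and number here is made up.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    cpu_request: float  # CPUs the pod asks the scheduler to reserve
    cpu_actual: float   # CPUs the job really uses while running (hypothetical)


@dataclass
class Node:
    name: str
    cpu_capacity: float
    jobs: list = field(default_factory=list)

    def requested(self) -> float:
        return sum(j.cpu_request for j in self.jobs)

    def actual(self) -> float:
        return sum(j.cpu_actual for j in self.jobs)


def schedule(jobs, nodes):
    """First-fit placement that only looks at requests, never at real usage."""
    for job in jobs:
        for node in nodes:
            if node.requested() + job.cpu_request <= node.cpu_capacity:
                node.jobs.append(job)
                break
        else:
            raise RuntimeError(f"no node has room for {job.name}")
    return nodes


def overloaded(nodes):
    """Names of nodes whose real CPU demand exceeds their capacity."""
    return [n.name for n in nodes if n.actual() > n.cpu_capacity]


def run(request):
    # Two heavy JS jobs on two 16-CPU nodes; all numbers are illustrative.
    jobs = [Job("js-1", request, 12.0), Job("js-2", request, 12.0)]
    nodes = [Node("node-a", 16.0), Node("node-b", 16.0)]
    return schedule(jobs, nodes)


if __name__ == "__main__":
    print(overloaded(run(4.0)))   # ['node-a']: both jobs packed onto one node
    print(overloaded(run(12.0)))  # []: the second job no longer fits, lands on node-b
```

    Raising the request from 4 to 12 does not make either job faster; it only stops the scheduler from packing both onto the same node. The cost is the one wim leers points out: the cluster reserves more than a job needs on average.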

  • 🇧🇪Belgium wim leers Ghent 🇧🇪🇪🇺

    Thanks, that makes sense 👍 😊

  • 🇬🇧United Kingdom catch

    Quoting from slack:

    nnewton
    1 day ago
    this node is struggling

    nnewton
    1 day ago
    I can't even shell onto it

    nnewton
    1 day ago
    load is 116

    load of 80, good lord

    It may be that the tests use CPU that we couldn't see when we logged load levels to set the CPU requests in the first place (i.e. maybe the load we measured was only in one container, not in the other containers the tests also use, which have heavier loads). I'm not sure how to establish that conclusively.

    But it's definitely also the case that JS tests are particularly susceptible to timing issues.
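    A sketch of the hypothesis above, assuming the job's services (here a database and chromedriver) run as extra containers in the same pod as the build container, which is how GitLab Runner's Kubernetes executor runs services. If CPU requests were sized from the load observed in the build container alone, whatever the other containers use goes unaccounted for. The container names and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Container:
    name: str
    cpu_request: float   # what the pod spec reserves for this container
    cpu_observed: float  # what the container actually used (hypothetical)


def pod_request(containers):
    # The scheduler reserves the sum of all container requests in the pod.
    return sum(c.cpu_request for c in containers)


def pod_usage(containers):
    # Node load reflects every container, not just the one being watched.
    return sum(c.cpu_observed for c in containers)


def missed_by_measurement(containers, measured=("build",)):
    """CPU use that a measurement limited to `measured` containers would not see."""
    return sum(c.cpu_observed for c in containers if c.name not in measured)


# Hypothetical JS test pod: a build container plus two service containers.
JS_POD = [
    Container("build", cpu_request=2.0, cpu_observed=2.0),
    Container("database", cpu_request=0.5, cpu_observed=1.5),
    Container("chromedriver", cpu_request=0.5, cpu_observed=3.0),
]

if __name__ == "__main__":
    print(pod_request(JS_POD))            # 3.0 CPUs reserved
    print(pod_usage(JS_POD))              # 6.5 CPUs actually used
    print(missed_by_measurement(JS_POD))  # 4.5 CPUs invisible to a build-only view
```

    Comparing per-container usage against the requests in the pod spec would be one way to test the hypothesis.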

  • Automatically closed - issue fixed for 2 weeks with no activity.
