- Issue created by @fjgarlin
- Status changed to Needs review
2:27pm 30 November 2023 - 🇬🇧United Kingdom catch
Committing this one from needs review so we can monitor, thanks!
- Status changed to Fixed
2:31pm 30 November 2023 - 🇧🇪Belgium wim leers Ghent 🇧🇪🇪🇺
😳
So: claim more than we need for any chromedriver-dependent tests, because those tests are especially likely to be timing-sensitive and hence load-sensitive?
- 🇪🇸Spain fjgarlin
Apparently, we were under-resourcing some jobs, so nodes would accept more job pods because they appeared to have room for them. The jobs then turned out to be more demanding than their requests suggested, and the whole node would struggle. With this change, we are trying to make sure that no two big jobs land on the same node.
My k8s jargon is far from great, so take the above with a grain of salt.
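A toy model can make the scheduling explanation above concrete. It relies on one real Kubernetes behaviour: a pod is placed on a node only if the sum of the CPU requests fits the node's allocatable CPU, and actual usage is not checked at placement time. Everything else is hypothetical and is not a description of how Drupal's CI is configured. That includes the Job/Node classes, the 16-core node, and the 4-core and 10-core figures.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    cpu_request: float  # cores the scheduler reserves for the job (hypothetical)
    cpu_actual: float   # cores the job really uses while running (hypothetical)


@dataclass
class Node:
    capacity: float                           # allocatable cores on the node
    jobs: list = field(default_factory=list)  # jobs placed on this node

    def requested(self) -> float:
        """Sum of the requests: the only number the scheduler looks at."""
        return sum(j.cpu_request for j in self.jobs)

    def actual_load(self) -> float:
        """Sum of the real usage: what the node actually has to cope with."""
        return sum(j.cpu_actual for j in self.jobs)


def schedule(job: Job, node: Node) -> bool:
    """Admit the job if its request fits; actual usage is never consulted."""
    if node.requested() + job.cpu_request <= node.capacity:
        node.jobs.append(job)
        return True
    return False


def fill(capacity: float, jobs: list) -> Node:
    """Offer every job to a single node of the given capacity, in order."""
    node = Node(capacity)
    for job in jobs:
        schedule(job, node)
    return node


if __name__ == "__main__":
    # Hypothetical: a 16-core node and JS jobs that really use 10 cores each.
    low = fill(16, [Job(f"js{i}", cpu_request=4, cpu_actual=10) for i in range(6)])
    high = fill(16, [Job(f"js{i}", cpu_request=10, cpu_actual=10) for i in range(6)])
    print(len(low.jobs), low.actual_load())    # 4 40
    print(len(high.jobs), high.actual_load())  # 1 10
```

With the low request, four jobs fit "on paper" and the node ends up running 40 cores of work on 16 cores. Raising a heavy job's request above half the node's capacity is one way to guarantee that two such jobs never share a node. The cost is that some capacity sits idle whenever a job uses less than it requested.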
- 🇬🇧United Kingdom catch
Quoting from Slack:
nnewton: this node is struggling
nnewton: I can't even shell onto it
nnewton: load is 116
load of 80, good lord
It may be that the tests use CPU we couldn't see when we logged load levels to set the CPU requests in the first place. For example, maybe the load we measured was only on one container, and not on the other containers the tests also use, which carry heavier loads. I'm not sure how to confirm that conclusively.
But it's definitely also the case that JS tests are particularly susceptible to timing issues.
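The container hypothesis above can be shown with a small worked example. It assumes that a test job's pod runs a build container plus service containers, such as chromedriver and a database, in the same pod. GitLab Runner's Kubernetes executor does this with CI services. It also assumes that only the build container's load was logged when the CPU requests were chosen. All container names and numbers are made up for illustration.

```python
# Hypothetical peak CPU usage, in cores, of each container in one test-job pod.
# Container names and numbers are illustrative, not measurements from this issue.
PEAK_CPU = {
    "build": 1.5,         # assumed to be the only container whose load was logged
    "chromedriver": 2.5,  # assumed service container that went unobserved
    "database": 1.0,      # assumed service container that went unobserved
}


def pod_total(usage: dict[str, float]) -> float:
    """CPU the whole pod needs: the sum over all of its containers."""
    return sum(usage.values())


def missed_cpu(usage: dict[str, float], observed: list[str]) -> float:
    """CPU a request fails to reserve if it was sized from `observed` containers only."""
    return pod_total(usage) - sum(usage[name] for name in observed)


if __name__ == "__main__":
    print(pod_total(PEAK_CPU))              # 5.0
    print(missed_cpu(PEAK_CPU, ["build"]))  # 3.5
```

In this made-up case, a request sized from the build container alone would tell the scheduler the pod needs 1.5 cores while it really uses 5. That gap is large enough to let several such pods pile onto one node.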
Automatically closed - issue fixed for 2 weeks with no activity.