- Issue created by @cmlara
- 🇪🇸 Spain fjgarlin
I wonder if the list in retry:when could be trimmed down so that we get "good retries" rather than "retry almost anything that fails".
- 🇺🇸 United States cmlara
Everything except script_failure is already configured as a global default to retry two times.
https://git.drupalcode.org/project/gitlab_templates/-/blob/e8fa5b6816adf...
All I am suggesting is adding script_failure to the existing list for the composer jobs.
Ideally we might have been able to limit it to a specific exit code; however, I am not aware of a specific code for the issues we encounter with timeouts.
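For reference, recent GitLab versions also accept exit_codes under retry, so if a distinct exit code for the timeouts were ever identified, the retry could be narrowed to it. A minimal sketch, assuming a hypothetical job name, a placeholder script, and a made-up exit code (none of these come from the templates):

```yaml
# Sketch only: job name, script, and exit code are placeholders.
composer-example:
  script:
    - composer install
  retry:
    max: 2
    # Retry only when the script exits with this code.
    # 137 is a stand-in; no timeout-specific code is known for this issue.
    exit_codes: 137
```

Without a known exit code this does not help today, which is why adding script_failure to the list is the practical option.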
- 🇪🇸 Spain fjgarlin
Oh cool. I didn't even check the code.
Maybe we can trigger a timeout somehow while trying things here.
So it'd be only adding one more category for retrying, thanks for clarifying.
- 🇺🇸 United States cmlara
I doubt we want to set script_failure globally; composer-base would likely be better.
Leaving as NR. The change will technically work, but it increases the probability of re-running on other stages. See this downstream pipeline with phpunit failures: https://git.drupalcode.org/project/api/-/pipelines/328919/builds
- 🇪🇸 Spain fjgarlin
Good point. It's also annoying that GitLab uses script_failure for two things: https://docs.gitlab.com/ee/ci/yaml/#retry
script_failure:
  - The script failed.
  - The runner failed to pull the Docker image. For docker, docker+machine, kubernetes executors.
- 🇪🇸 Spain fjgarlin
In any case, the issue seems to happen in the "composer" jobs more often than others, so I've made the change you suggested in #8, so it only affects "composer-base".
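A minimal sketch of what scoping the retry to composer-base might look like. The failure reasons listed are illustrative and not copied from the templates, and the leading dot on .composer-base plus the job names and scripts are assumptions. The point is that a job-level retry replaces the global default rather than merging with it, so the existing reasons have to be repeated alongside script_failure.

```yaml
# Sketch only: the `when` entries are assumed, not the templates' actual list.
default:
  retry:
    max: 2
    when:
      - unknown_failure
      - api_failure
      - stuck_or_timeout_failure
      - runner_system_failure

.composer-base:
  retry:
    max: 2
    when:
      # Repeat the global reasons: a job-level retry overrides the default.
      - unknown_failure
      - api_failure
      - stuck_or_timeout_failure
      - runner_system_failure
      # Added for composer jobs only.
      - script_failure

composer:
  extends: .composer-base
  script:
    - composer install    # placeholder

phpunit:
  script:
    - vendor/bin/phpunit  # placeholder; keeps the global default, so test failures are not retried
```

Jobs that do not extend .composer-base keep the global list, so a genuine phpunit failure still fails on the first attempt.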
- 🇺🇸 United States cmlara
D7 branch as well?
"It's also annoying that GitLab uses script_failure for two things:"
Thankfully, to my knowledge, we have not seen these in production. If we get to this level of failure we likely have a very significant infrastructure issue.
Although if the root cause of composer script failures is a GitHub block we may eventually see issues with loading 3rd party images from the GitHub registry.
- 🇬🇧 United Kingdom jonathan1055
Would a test on Scheduler at D10 and D7 be helpful? It is unlikely to show anything, but I can do it if useful.
- 🇪🇸 Spain fjgarlin
That'd be great if you can. Tests on non-downstream projects are really useful too, to make sure things work on projects where we don't have full permissions.
- 🇬🇧 United Kingdom jonathan1055
Tested on D11
https://git.drupalcode.org/project/scheduler/-/pipelines/331061
With Composer Max PHP I used
PHP_VERSION: $CORE_PHP_NEXT
which cannot resolve to an installable set of dependencies yet, so it automatically retried twice (I did not re-run it manually). So that is working as expected, even though re-running would not help in this situation.
Tested on D7
https://git.drupalcode.org/project/scheduler/-/pipelines/331067
- 🇺🇸 United States cmlara
Visually looks good to me, and with the tests from #15 it appears safe to call this RTBC.
Note: To clarify, this is not intended to stay long term. Ideally the Infra team finds some solution to our timeout issue in the near future; as soon as they do, we should revert this commit.
This is intended as a tradeoff: potentially higher CI runtime on valid failures, in an attempt to reduce the amount of maintainer/contributor labor due to failures outside their control.
-
fjgarlin →
committed 5d373e84 on main
Issue #3484713 by fjgarlin, cmlara, jonathan1055: Use Retry:2 on script...
- 🇪🇸 Spain fjgarlin
Agree. I'm also happy that this is only for the "composer" jobs and not all jobs.