Use Retry:2 on script failure for composer jobs

Issue created by @cmlara
Comment 9 months ago →
🇺🇸United States cmlara
Comment 9 months ago →
🇪🇸Spain fjgarlin
I wonder if the list in "when" could be trimmed down so that we get "good retries" rather than "retry almost anything that fails".
Comment 9 months ago →
🇺🇸United States cmlara
Everything except script_failure is already configured as a global default to retry two times.

https://git.drupalcode.org/project/gitlab_templates/-/blob/e8fa5b6816adf...

All I am suggesting is adding script_failure to the existing list for the composer jobs.

Ideally we might have been able to limit it to a specific exit code; however, I am not aware of a specific code for the issues we encounter with timeouts.
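
For illustration, GitLab's retry keyword can also be limited to specific exit codes through retry:exit_codes. The following is only a sketch of what that would look like: the job name and the exit code are hypothetical placeholders, since no specific exit code for the timeout failures is known.

```yaml
# Hypothetical job; not taken from gitlab_templates.
composer-example:
  script:
    - composer install
  retry:
    max: 2
    # Placeholder value: no exit code specific to the timeout
    # failures has been identified.
    exit_codes: 100
```

With a configuration like this, a script failure would only be retried when the script exits with the listed code, so genuine dependency errors would fail immediately.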
Comment 9 months ago →
🇪🇸Spain fjgarlin
Oh cool. I didn’t even check the code 😅
Maybe we can trigger a timeout somehow while trying things here.

So it’d be only adding one more category for retrying, thanks for clarifying.
Merge request !282 Add retry category. → (Merged) created by fjgarlin
Comment 9 months ago →
🇪🇸Spain fjgarlin
MR needs review.
Comment 9 months ago →
🇺🇸United States cmlara
I doubt we want to set the script failure globally; composer-base would likely be better.

Leaving as NR. The change will technically work; however, it increases the probability of re-running jobs in other stages. See this downstream pipeline with phpunit failures https://git.drupalcode.org/project/api/-/pipelines/328919/builds
Comment 9 months ago →
🇪🇸Spain fjgarlin
Good point. It's also annoying that GitLab uses script_failure for two things: https://docs.gitlab.com/ee/ci/yaml/#retry

script_failure:
- The script failed.
- The runner failed to pull the Docker image. For docker, docker+machine, kubernetes executors.
Comment 9 months ago →
🇪🇸Spain fjgarlin
In any case, the issue seems to happen in the "composer" jobs more often than others, so I've made the change you suggested in #8, so it only affects "composer-base".
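
A minimal sketch of what scoping the retry to the composer jobs could look like. The failure types listed under "default" are an assumption for illustration and are not copied from gitlab_templates; the hidden job name ".composer-base" is likewise assumed from the "composer-base" name used in this thread.

```yaml
# Assumed global default: retry twice on infrastructure-type failures.
default:
  retry:
    max: 2
    when:
      - unknown_failure
      - api_failure
      - stuck_or_timeout_failure
      - runner_system_failure
      - scheduler_failure

# Composer jobs only: same list plus script_failure.
.composer-base:
  retry:
    max: 2
    when:
      # A job-level retry replaces the default rather than merging
      # with it, so the default entries are repeated here.
      - unknown_failure
      - api_failure
      - stuck_or_timeout_failure
      - runner_system_failure
      - scheduler_failure
      - script_failure
```

Jobs that do not extend the composer base keep the default list, so a failing phpunit script would not be re-run.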
Comment 9 months ago →
🇺🇸United States cmlara
D7 branch as well?

It's also annoying that GitLab uses script_failure for two things:

Thankfully, to my knowledge, we have not seen these in production. If we get to this level of failure, we likely have a very significant infrastructure issue.

Although, if the root cause of composer script failures is a GitHub block, we may eventually see issues with loading 3rd party images from the GitHub registry.
Comment 9 months ago →
🇪🇸Spain fjgarlin
D7 branch as well?

Done.
Comment 9 months ago →
🇬🇧United Kingdom jonathan1055
Would a test on Scheduler at D10 and D7 be helpful? It is unlikely to show anything, but I can do it if useful.
Comment 9 months ago →
🇪🇸Spain fjgarlin
That’d be great if you can. Tests on non-downstream projects are really useful too, to make sure things work on projects where we don’t have full permissions.
Comment 9 months ago →
🇬🇧United Kingdom jonathan1055
Tested on D11
https://git.drupalcode.org/project/scheduler/-/pipelines/331061
With Composer Max PHP I used PHP_VERSION: $CORE_PHP_NEXT which cannot resolve to an installable set of dependencies yet, so it automatically retried twice (I did not re-run it manually). So that is working as expected, even though re-running would not help in this situation.

Tested on D7
https://git.drupalcode.org/project/scheduler/-/pipelines/331067
Comment 9 months ago →
🇺🇸United States cmlara
Visually looks good to me; with the tests from #15 it appears safe to call this RTBC.

Note: To clarify, this is not intended to stay long term. Ideally the Infra team finds a solution to our timeout issue in the near future; as soon as they do, we should revert this commit.

This is intended as a tradeoff: potentially higher CI runtime on valid failures, in an attempt to reduce the amount of maintainer/contributor labor caused by failures outside their control.
Pipeline finished with Skipped
9 months ago
#335177

fjgarlin → committed 5d373e84 on main

Issue #3484713 by fjgarlin, cmlara, jonathan1055: Use Retry:2 on script...

Comment 9 months ago →
🇪🇸Spain fjgarlin
Agree. I'm also happy that this is only for the "composer" jobs and not all jobs.
Comment 8 months ago →
System Message
Automatically closed - issue fixed for 2 weeks with no activity.
