Use artifacts to share the phpstan result and cspell caches from core to MRs

Created on 20 July 2024, 5 months ago
Updated 18 September 2024, 3 months ago

Problem/Motivation

~
Opening this as an alternative to #3387117: Enable distributed caching in GitLab Runner .

gitlab has an API for fetching artifacts from arbitrary branches and jobs and it recommends this over using the distributed cache, which it looks like may not be shared between issue repositories and the main project (and also isn't enabled on Drupal gitlab yet).

phpstan is the longest job in the lint step, taking 1m51s (and also requiring a lot of cpu), whereas the others complete in less than a minute.

If we can cut phpstan jobs to under one minute, that will potentially mean all pipeline runs return a minute faster.

What this MR does:

1. Configures the phpstan tmp directory so that it's within the workspace
2. Sets phpstan artifacts to always upload.
3. Downloads the phpstan result cache using curl from the gitlab API. This is to avoid using 'needs:' which results in a hard failure if the file isn't available, whereas we need the cache get to fail silently so phpstan can just run without the cache if it's not there. This is to prevent a circular dependency from the job to itself.

Steps to reproduce

https://git.drupalcode.org/project/drupal/-/jobs/2191580 instead of finishing in about 1m44s, finished in 55s. Because phpstan is the slowest lint step, this has the potential to reduce overall pipeline time (currently around 7m30s+ down to about 6m40+

Proposed resolution

Remaining tasks

Follow-ups are open to apply the same pattern to cspell and eslint. We should do phpcs too.

User interface changes

API changes

Data model changes

Release notes snippet

📌 Task
Status

RTBC

Version

10.4

Component
PHPUnit 

Last updated about 12 hours ago

Created by

🇬🇧United Kingdom catch

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Merge Requests

Comments & Activities

  • Issue created by @catch
  • 🇬🇧United Kingdom catch
  • Status changed to Closed: won't fix 5 months ago
  • 🇬🇧United Kingdom catch

    No this is a dead-end - no way to share artificacts between different pipelines of the same branch, let alone different branches.

    https://docs.gitlab.com/ee/ci/caching/

  • Pipeline finished with Failed
    5 months ago
    #230662
  • Status changed to Needs work 5 months ago
  • Pipeline finished with Failed
    5 months ago
    #230664
  • Pipeline finished with Failed
    5 months ago
    #230667
  • Pipeline finished with Failed
    5 months ago
    #230669
  • Pipeline finished with Failed
    5 months ago
    #230671
  • Pipeline finished with Failed
    5 months ago
    #230682
  • Pipeline finished with Failed
    5 months ago
    #230684
  • Pipeline finished with Failed
    5 months ago
    #230686
  • Pipeline finished with Failed
    5 months ago
    #230692
  • 🇬🇧United Kingdom catch

    Getting closer.

    I had to move the phpstan tmpDir into core so it can be used as an artificat, with the associated .gitignore etc. cspell cache is already in the core directory so hopefully this is fine.

    Temporarily set the ref to point to the MR branch, that lets us create artifacts in one pipeline and download them in the next to simulate what would happen once this is committed. Also added --vvv to the phpstan command so it tells us if it gets cache hits and misses.

    https://git.drupalcode.org/project/drupal/-/jobs/2190395 shows a cache hit.

    Here's a cache hit with Drupal.php changed https://git.drupalcode.org/project/drupal/-/jobs/2190459

  • Pipeline finished with Failed
    5 months ago
    #230801
  • Pipeline finished with Failed
    5 months ago
    #230805
  • 🇬🇧United Kingdom catch

    To make this optionally get the cache, we'd need https://gitlab.com/gitlab-org/gitlab/-/issues/389050 which doesn't exist. But I think we can get it in the script section with curl via the gitlab API instead per https://docs.gitlab.com/ee/ci/jobs/job_artifacts.html. This might be the only way to avoid a circular dependency between the job and itself...

  • Pipeline finished with Failed
    5 months ago
    #230818
  • Pipeline finished with Failed
    5 months ago
    #230820
  • 🇳🇱Netherlands bbrala Netherlands

    Interesting to just do it manually... Keeping an eye on this one.

  • Status changed to Needs review 5 months ago
  • 🇬🇧United Kingdom catch

    Here's a successful run using curl against the gitlab API, with the ref temporarily changed to the merge request ref (because HEAD doesn't store the phpstan result cache as an artifact yet).

    Given that works, this ought to work once it's merged and tries to fetch from HEAD instead.

  • 🇬🇧United Kingdom catch

    Ready for review, updated the issue summary with the eventual approach. A lot of twists and turns in the commit hisory because I was learning as I go, but it's not that complicated at the end of it all.

  • 🇬🇧United Kingdom catch

    One more commit (for now) to reduce the number of CPUs requested to 4 - we don't need 16 CPUs on a cache hit and this should mean less time waiting for pods to be available.

  • 🇬🇧United Kingdom catch
  • 🇬🇧United Kingdom catch

    One thing we could potentially do here but which I haven't is to have a specific job for warming this cache instead of using the job itself.

    - run the job only on commit runs
    - potentially run phpstan, cspell, anything else we want to generate artifact caches for
    - in the normal job, don't upload the cache as an artifact, only download it from gitlab

    This would mean duplicating most of the phpstan job into a second job, but it would result in less bandwidth and storage usage.

  • 🇳🇱Netherlands bbrala Netherlands

    You could possible scope the caching with usage of the variables of gitlab and add it to the phpstan job id think. That menas no extra job, but only cache on commit.

  • Merge request !8871Add a cache warming job → (Open) created by catch
  • 🇬🇧United Kingdom catch

    I couldn't work out how to do what with artifacts: when but there might be a way yeah.

    Added an extra MR branch here with an extra job for comparison in the meantime.

  • 🇬🇧United Kingdom catch
  • 🇬🇧United Kingdom catch
  • 🇳🇱Netherlands bbrala Netherlands

    I looked at using artifacte: when: and indeed, seems pretty hard to get it to do the work there.

    I specific cache job is a possibility, but thinking about this, this kinda is a bandaid until we can just use real caches by gitlab? I ended up not caring to much if it is seperate or not. Its a little more controlled seperately which is good. Might make the move to gitlab cache a little easier since we only need to change very little in the main job.

    I end up with preferring a seperate job. But the preference is kinda light.

  • 🇫🇷France andypost

    Nice to see this working!

    btw both cache and artifacts has TTL so some fallback when cache is missing still required.
    so instead of || true for curl if could use condition to print error message and re-init cache

  • 🇳🇱Netherlands bbrala Netherlands

    Isnt the fallback pretty much automatic, since if its not available it will just run without?

  • 🇬🇧United Kingdom catch

    Yes #21 is right, and the MR demonstrates this because we don't have the artifacts being created in core yet, so the cache is always empty (except when I hacked it to point to the MR instead).

  • 🇬🇧United Kingdom catch

    https://gitlab.com/gitlab-org/gitlab/-/issues/269404 suggests that forks can't use the cache from the upstream project, assume those forks are the same as our issue forks (but maybe they're not?), then this might not be a stop-gap but the only way to share the cache between core and all the issue forks.

  • 🇬🇧United Kingdom catch

    Put this into a sandbox MR, hacking it to get the cache from the MR itself, and got a test run of 7m10s, I think the shortest we've seen prior to that was about 7m25s. My hope is that this will get us under 7 minutes, which has been elusive so far, but 7m10s is in the right direction at least.

    https://git.drupalcode.org/project/drupal/-/pipelines/231155

  • Pipeline finished with Canceled
    5 months ago
    Total: 470s
    #232834
  • Pipeline finished with Canceled
    5 months ago
    Total: 155s
    #232852
  • Pipeline finished with Canceled
    5 months ago
    Total: 218s
    #232854
  • Pipeline finished with Success
    5 months ago
    Total: 450s
    #232862
  • 🇬🇧United Kingdom catch

    Opened 📌 [PP-1] Use artifacts to share the cspell cache between runs Postponed for cspell. Using the cache the entire cspell job completes in about a minute which means we can remove the git diff logic entirely. Postponed on this one because it uses the same lint cache warming job as this.

  • Pipeline finished with Failed
    5 months ago
    Total: 933s
    #234745
  • 🇬🇧United Kingdom catch

    catch changed the visibility of the branch 3462763-try-to-store to hidden.

  • 🇬🇧United Kingdom catch
  • 🇳🇱Netherlands bbrala Netherlands

    Yeah I'm also leaning towards a separate job for preparing the caches. Although it might cost you a little in feedback after commit.

  • 🇬🇧United Kingdom catch

    Although it might cost you a little in feedback after commit.

    This should hopefully be OK because the on-commit jobs (at least as the current MR stands) will continue to run the separate lint jobs, they'll just run this extra cache warming job on top. So if we introduce a lint failure, we should still be able to see what went wrong in those individual jobs. However it definitely introduces some level of uncertainty because any jobs that don't run on MRs are hard to test - should possibly add a when: manual to make that easier?

  • Status changed to Needs work 5 months ago
  • The Needs Review Queue Bot tested this issue. It no longer applies to Drupal core. Therefore, this issue status is now "Needs work".

    This does not mean that the patch necessarily needs to be re-rolled or the MR rebased. Read the Issue Summary, the issue tags and the latest discussion here to determine what needs to be done.

    Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.

  • Status changed to Needs review 5 months ago
  • 🇬🇧United Kingdom catch

    Rebased and merged 📌 [PP-1] Use artifacts to share the cspell cache between runs Postponed into here because the cspell changes are very small relative to the MR size, and I think it's easier to see how it'll work with multiple jobs when there's two examples in here.

  • Pipeline finished with Failed
    5 months ago
    #235314
  • Pipeline finished with Failed
    5 months ago
    #235315
  • Pipeline finished with Failed
    5 months ago
    #235316
  • Pipeline finished with Failed
    5 months ago
    #235317
  • Pipeline finished with Success
    5 months ago
    Total: 645s
    #235310
  • Pipeline finished with Success
    5 months ago
    Total: 566s
    #235318
  • 🇬🇧United Kingdom catch

    catch changed the visibility of the branch 3462763-cache-warming-job to hidden.

  • Pipeline finished with Failed
    5 months ago
    Total: 547s
    #235328
  • 🇬🇧United Kingdom catch

    Set up an MR in a sandbox issue to change the cache target to point back to the MR's own jobs - this is because the actual change here restricts the artificat download to only core branches, which means the cache is always empty until we actually commit this.

    phpstan: https://git.drupalcode.org/project/drupal/-/jobs/2245377
    cspell: https://git.drupalcode.org/project/drupal/-/jobs/2245381

    And the actual cache warming job itself. https://git.drupalcode.org/project/drupal/-/jobs/2245376

    The cache warming job also fetches its own cache, because this will save CPU cycles on the commit runs - then it'll write a new cache based on the changes. We don't have to do this, but it'll mean the updated files are available much quicker.

    We will probably want a follow-up to remove the verbose output from the jobs, but I think it's worth keeping that until we see it actually working after commit.

  • 🇳🇱Netherlands bbrala Netherlands

    Trying again. My 2 g9tlab comments got eaten.

    Can we make a reference of the cachefetcj script lines? That cleans the jobs themselves up, better elreadability and no duplication?

    Secondly, is cspell caching strategy content optimal? Id say metadata is fine and way less compute/io.

  • 🇬🇧United Kingdom catch

    Metadata is the default so at least by accident I tried that first. However it uses mtime which is a no go, so had to change it to content.

    The cache fetch lines are unique to each job so not sure that will help us much? Each only specifies the files it needs.

  • 🇳🇱Netherlands bbrala Netherlands

    Might have missed something then. Thought the cache fetch for the phpstan job was the same as the fetch fo r the warm job and assumed the same for cspell. Mightve missed a env var then that might be different.

    You do fetch the cache in the warm job right?

  • 🇬🇧United Kingdom catch

    No it's me that missed it. Thought you meant sharing the command between phpstan and cspell, not between the cache warming job and the lint job. Good plan and will do when back at computer.

  • 🇫🇷France andypost

    Maybe instead of custom scripting the glab ci a better option

  • 🇬🇧United Kingdom catch

    @andypost: afaict glab ci is intended for local use rather than on gitlab runners themselves, we'd have to install it in the gitlab images, and we'd also need to figure out authentication, if it supports job tokens, it doesn't mention that in the docs.

    @bbrala: moved the cache fetching to references, that is indeed tidier. Also moved cspell command to references since that's identical too. phpstan command we do different things between the cache warming job and the two invocations in the job itself, so left that bit.

  • 🇳🇱Netherlands bbrala Netherlands

    Nice and clean like this. Changes look good except I haven't seen the cache run with the latest code so bit wear to rtbc.

  • 🇫🇷France andypost

    @catch glab is limited in CI but because of absence of write permissions, the auth is glab auth login --token=$CI_JOB_TOKEN

  • Status changed to RTBC 5 months ago
  • 🇳🇱Netherlands bbrala Netherlands

    Well, this seems great. Checked if cache is downloaded and used, all seems like it should be.

    cpsell is 7 seconds cached, phpstan wat less.

    Think this is great, don't think the glab cli is really needed right now. It will need a small change before merge, but feel this is all good.

    Thanks catch for working on this <3

    Added credits for catch, me and also andypost.

  • 🇬🇧United Kingdom catch

    It will need a small change before merge

    Just to say the MR here doesn't have the hack to point the cache items to itself, those pipeline runs are from a branch on a different, sandbox issue so that this one stayed relatively clean (still managed to break it in numerous ways while working on it of course).

    @mstrelan I originally did that on purpose since they're both dealing with the version, but yeah let's move that comment back where it was. Done that, leaving RTBC because it's just moving a comment one line down.

  • 🇬🇧United Kingdom catch
  • Pipeline finished with Canceled
    5 months ago
    Total: 306s
    #237040
  • Pipeline finished with Failed
    5 months ago
    Total: 159s
    #237045
  • Pipeline finished with Failed
    5 months ago
    Total: 122s
    #237051
  • 🇳🇱Netherlands bbrala Netherlands

    bbrala changed the visibility of the branch 3462763-test-mtime-optimization to hidden.

  • Pipeline finished with Canceled
    5 months ago
    Total: 743s
    #237055
  • 🇳🇱Netherlands bbrala Netherlands

    Added followup issue to investigate using git-restore-mtime to optimize even more. Could mean we don't need strategy -> content but could use file metadata. Should theoretically be more performant.

    • alexpott committed c5d4f8a4 on 11.0.x
      Issue #3462763 by catch, bbrala, andypost: Use artifacts to share the...
    • alexpott committed c56ebfb7 on 11.x
      Issue #3462763 by catch, bbrala, andypost: Use artifacts to share the...
  • Status changed to Fixed 5 months ago
  • 🇬🇧United Kingdom alexpott 🇪🇺🌍

    Committed c56ebfb and pushed to 11.x. Thanks!
    Committed c5d4f8a and pushed to 11.0.x. Thanks!

    • alexpott committed 2cd8c6ec on 11.0.x
      Issue #3462763 follow-up by alexpott: Use artifacts to share the phpstan...
  • 🇬🇧United Kingdom alexpott 🇪🇺🌍

    This broke 11.0.x because of dictionary changes due to 📌 Update JavaScript dependencies for Drupal 11.0-rc Fixed - that remade the dictionary on 11.x but did not account for differences on 11.0.x... to fix I remade the dictionary on 11.0.x and committed.

  • Merge request !9008Also run on daily tests. → (Closed) created by catch
    • catch committed fffc68d7 on 11.0.x
      Issue #3462763 by catch, bbrala, andypost, alexpott: Use artifacts to...
    • catch committed fbe398b5 on 11.x
      Issue #3462763 by catch, bbrala, andypost, alexpott: Use artifacts to...
  • 🇬🇧United Kingdom catch

    Pushed a follow-up to also run this on the daily tests.

    I've been seeing 404s for the artifacts on the 11.x branch, but correctly seeing them on the 11.0.x branch.

    The gitlab API URL we're using provides 'the artifacts from the latest successful pipeline' - my theory is that if the latest successful pipeline doesn't actually run the job, then you end up with a 404, even if the last pipeline where the job ran was successful.

    If so, this means at least for now, we need to switch the scheduled performance jobs to run on the 11.0.x branch, and ensure that the cache warming job runs on the daily pipeline - so that every 11.x pipeline runs this job.

    Should find out whether the above is all correct within an hour or so.

  • Pipeline finished with Success
    5 months ago
    Total: 362s
    #240046
  • 🇬🇧United Kingdom catch

    Confirmed. After disabling the 11.x performance test schedule (now moved to 11.0.x) and after a successful on-commit pipeline run, I got a cache hit on a core MR: https://git.drupalcode.org/project/drupal/-/jobs/2295859

    • catch committed 2ef15514 on 11.0.x
      Issue #3462763 by catch, bbrala, andypost, alexpott: Use artifacts to...
    • catch committed 20ea83cb on 11.x
      Issue #3462763 by catch, bbrala, andypost, alexpott: Use artifacts to...
  • 🇬🇧United Kingdom catch

    One more follow-up commit - after the above, I still haven't seen a cache hit on an MR that's not from a core developer. I wonder if it's the job token not being valid for branch pipelines when it's from a fork. However, @fjgarlin tested the URL logged out and was able to download the artifacts, which means we don't need the job token at all anyway because the API URL is completely publicly accessible anyway.

  • Status changed to Downport 5 months ago
  • 🇬🇧United Kingdom catch

    We should try to backport this to 10.4.x and 10.3.x, not because MR pipeline times matter that much on those branches, but it would help to keep the gitlab files in sync for other changes. There's other issues tagged for backport though we should get in first.

  • First commit to issue fork.
  • Merge request !9353Use artifacts to share 10.4.x → (Open) created by quietone
  • 🇳🇿New Zealand quietone

    Made an MR for the 10.4 backport, MR 9353.

  • Status changed to Needs review 3 months ago
  • 🇬🇧United Kingdom catch
  • Status changed to RTBC 3 months ago
  • 🇺🇸United States smustgrave

    Backport seems inline with 11.x

  • 🇧🇪Belgium wim leers Ghent 🇧🇪🇪🇺

    WOWWWWWWWWW! 🤩🤩🤩 👏

    Can we achieve the same for https://www.drupal.org/project/gitlab_templates , to accelerate contrib CI?

  • 🇬🇧United Kingdom catch

    @Wim the current solution only works for core scheduled/on-commit pipelines and MRs from committers so far. This is due to how both phpstan and all the yarn linters validate their cache files (mostly to do with absolute paths, but also what project ID points to etc.), 🐛 Hardcode the core project ID for fetching artifacts from the gitlab API Needs review has details.

    I think that contrib would have the same issue - i.e. caching working for scheduled/on-commit pipelines and committers but not anyone else, so we should probably get core 100% working before trying to apply the same pattern. However in theory it should be be possible to make it work the same way.

  • 🇦🇺Australia larowlan 🇦🇺🏝.au GMT+10

    I think there's some differences between what's in 11.x and what's in the backport. I'm not sure they're significant but would appreciate a second opinion

Production build 0.71.5 2024