Ignore more git files

Created on 4 March 2025, about 1 month ago

Problem/Motivation

When projects only have a dev release, they are brought in via git clone, and the CI artifacts can grow so large that the job fails.
We've ignored a few git files in the past for this reason, but it seems we need to be more aggressive.

Steps to reproduce

Look at ✨ CI Active and note that a good number of commits fail on the composer step.

Proposed resolution

Widen the exclude patterns.

๐Ÿ› Bug report
Status

Active

Component

gitlab-ci

Created by

🇪🇸Spain fjgarlin


Merge Requests

Comments & Activities

  • Issue created by @fjgarlin
  • Merge request !336 Widen files to exclude. → (Merged) created by fjgarlin
  • 🇪🇸Spain fjgarlin

    The suggested code

        exclude:
        - '.git'
        - '.git/**/*'
        - '$_WEB_ROOT/**/.git'
        - '$_WEB_ROOT/**/.git/**/*'
        - 'vendor/**/.git'
        - 'vendor/**/.git/**/*'
    

    is set in the MR in the linked issue and it's working.

  • 🇬🇧United Kingdom jonathan1055

    So the actual addition is

    - '$_WEB_ROOT/**/.git'
    - '$_WEB_ROOT/**/.git/**/*'
    

    but you are also changing vendor/**/.git/* to vendor/**/.git/**/* in d10 to match what we already had in main-d7. So the two sets are now identical. All looks good.

    In https://git.drupalcode.org/project/contribution_records/-/merge_requests/3 can we change that to use this MR and drop the composer customization? That would prove we have this right.

  • 🇬🇧United Kingdom jonathan1055

    I changed MR3 to use and test MR336, and in fact more files were ignored. In the previous run, with custom composer artifacts, the log shows

    .: found 102676 matching artifact files and directories 
    web/**/.git: excluded 1 files                      
    web/**/.git/**/*: excluded 49 files                
    .git: excluded 1 files                             
    .git/**/*: excluded 19 files   
    

    In the pipeline using MR336 we get

    .: found 102676 matching artifact files and directories 
    .git: excluded 1 files                             
    .git/**/*: excluded 26 files           <<< this has increased, was 19 above         
    web/**/.git: excluded 1 files                      
    web/**/.git/**/*: excluded 49 files 
    

    Do you want to investigate what those extra files are?

    Also, 102,676 files! Is there any scope to ignore more? I already have in my notes of things to raise the fact that the composer artifact is huge when downloaded (for example 720 MB). But am I right in thinking it's the same set of files that is needed for the subsequent jobs? I'm wondering whether we can specify a smaller subset when downloading, or whether it's one and the same thing.

  • 🇬🇧United Kingdom jonathan1055

    The other thing I noted is that the composer artifacts definition does not have a 'name:' key, so the downloaded file is just called artifacts.zip. The same goes for 'upgrade status'. All the other jobs have
    name: artifacts-$CI_PIPELINE_ID-$CI_JOB_NAME_SLUG
    Can we change that to match? If so, shall we do it in this MR?
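    As a rough sketch, assuming the composer artifacts live on the hidden `.composer-base` job (the key mentioned later in this thread) and leaving out its other keys, the change would look something like this; the upgrade status job would take the same `name:` line:

    ```yaml
    .composer-base:
      artifacts:
        # Same naming scheme as the other jobs, so two composer jobs in one
        # pipeline produce distinct download names instead of artifacts.zip.
        name: artifacts-$CI_PIPELINE_ID-$CI_JOB_NAME_SLUG
    ```

    $CI_PIPELINE_ID and $CI_JOB_NAME_SLUG are GitLab predefined variables, so no extra setup is needed.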

  • 🇪🇸Spain fjgarlin

    Happy to include #6 in this MR or in a separate issue.

    Re #5, not sure what the extra files are, but in any case, we should just ignore anything ".git" related as that has the potential to be really big.

    100K+ files and 700+ MB is heavy, but as long as it doesn't contain git files it should be fine. Bear in mind that the "vendor" folder is part of the artifact, and so is all of Drupal core.

    The alternative to passing these big artifacts between jobs is to run the composer install commands inside each job, but that'll be a big change. There is an issue for this, 📌 Consider merging the build and validate stages Active, which I tagged 2.x because I don't know if we could ever achieve BC on it. In any case, so far there has been no problem with these big artifacts, as they are deleted automatically after a few weeks.
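    A minimal sketch of that alternative, purely for illustration: the job name and script are hypothetical and not part of gitlab_templates, and it assumes an image with PHP and Composer and phpstan as a dev dependency. An empty `needs:` list means the job neither waits for nor downloads artifacts from earlier jobs, so it builds the codebase itself:

    ```yaml
    # Hypothetical job for illustration only; not part of gitlab_templates.
    phpstan-on-demand:
      stage: validate
      # Empty needs: start immediately and download no artifacts
      # from the composer job.
      needs: []
      script:
        # Build the codebase inside this job instead of unpacking a large artifact.
        - composer install --no-interaction --no-progress
        - vendor/bin/phpstan analyse
    ```

    The trade-off is repeated install time in every job versus storing and transferring one large artifact.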

    --

    So, if doing #6 is quick and easy let's do it here, otherwise as a follow up and maybe we can RTBC this one. Happy either way.

  • Pipeline finished with Success
    30 days ago
    Total: 52s
    #441810
  • Pipeline finished with Success
    30 days ago
    Total: 50s
    #441845
  • Pipeline finished with Success
    30 days ago
    Total: 276s
    #441848
  • 🇬🇧United Kingdom jonathan1055

    "The alternative to passing these big artifacts between jobs is to run the composer install commands inside each job, but that'll be a big change."

    So I think that answers my question "... can specify a smaller subset when downloading, or is it one and the same thing?"
    The composer artifact is deleted in 1 week, which is good. All the others are 6 months.

    I have pushed the change to specify a name for the composer and upgrade-status artifacts. Here is the re-run of Contribution Records MR3: the artifact name is OK.

    I also manually tested the GTD MR7 (which runs d9-basic on Drupal 9 and 10, with Upgrade Status to check D11) via the UI, specifying this MR to test against. The two composer jobs correctly have $CI_PIPELINE_ID-$CI_JOB_NAME_SLUG added, making it possible to download them both without a name clash. The Upgrade Status job likewise has the correct artifact name. Here is that pipeline.

    I did notice that in the composer artifacts there are several vendor projects with .github files. They are not large, of course, but could they also be ignored? Would simply changing .git to .git* in all the filter rows achieve that? It might be worth checking how much that reduces the file count and overall size. Other than that question, this would be RTBC.
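    For reference, a sketch of what changing .git to .git* in every row of the current list would look like. Note that a trailing * also matches names such as .gitignore, .gitattributes, .github and .gitlab-ci, not only .git:

    ```yaml
    exclude:
      # Each pattern now matches any name starting with ".git".
      - '.git*'
      - '.git*/**/*'
      - '$_WEB_ROOT/**/.git*'
      - '$_WEB_ROOT/**/.git*/**/*'
      - 'vendor/**/.git*'
      - 'vendor/**/.git*/**/*'
    ```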

  • Pipeline finished with Success
    30 days ago
    Total: 51s
    #441870
  • 🇬🇧United Kingdom jonathan1055

    I pushed the change to ignore .git*, and it made a bit of a difference to the number of files in the log:

    web/**/.git*/**/*: excluded 478 files              
    .git*: excluded 3 files                            
    .git*/**/*: excluded 26 files                      
    vendor/**/.git*: excluded 53 files                 
    vendor/**/.git*/**/*: excluded 85 files            
    web/**/.git*: excluded 435 files    

    The unzipped download size was only reduced by a few MB, and there are still .github folders downloaded, so I don't exactly understand how that exclude: key is meant to work. I thought I did, but clearly not.

    This is RTBC if you want to get on and do it. Can the last commit stay in? It does not appear to do any harm.

  • 🇪🇸Spain fjgarlin

    Let's revert the last commit, as that could be too greedy and ignore files needed for pipelines, like this folder: https://git.drupalcode.org/project/drupal/-/tree/11.x/.gitlab-ci?ref_typ...

    I think just targeting ".git" and ".git/**" as it was before that commit should be enough for now. If we need to get deeper into excluding more files this could be a follow-up, but this one should be ready as a "quick" improvement.

  • Pipeline finished with Success
    30 days ago
    Total: 49s
    #441939
  • 🇬🇧United Kingdom jonathan1055

    OK fine with me, I've reverted that.
    RTBC

  • Pipeline finished with Skipped
    30 days ago
    #441944
  • 🇪🇸Spain fjgarlin

    Merged. Thanks for the reviews, the extra addition and the tests.

  • 🇺🇸United States cmlara

    Just a note: this should likely have been done as a v2-only change, as it can still break pipelines.

    As noted in 📌 Ignore all git files in artifacts Active, I have used jobs in the past that depend upon git being present in the modules folder (my workflow involves copying all of the module's code to the custom folder and working out of the custom folder for all later steps, as it better aligns with core design and avoids symlink faults).

    Quoting the issue summary: "and the CI artifacts can grow really big so it fails."

    The better solution for this would likely be to not depend upon the artifacts being built until needed. We need to deal with the lack of caching on d.o; however, over in Quasar I've been designing (in local testing) my phpunit and phpstan stages so that they can be built "on demand" without storing the large asset files. Something similar could be done as a v2 for gitlab_templates, and it is somewhat a design expectation if the templates ever move to components.

  • 🇪🇸Spain fjgarlin

    This was more of a bug fix than a feature addition. We never intended to pack ".git" files in the artifacts.

    The workaround, if you do need to have the ".git" files in the artifacts, would be to override the .composer-base:artifacts:exclude section.
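    A minimal sketch of that override in a project's .gitlab-ci.yml, assuming the templates are included as usual. GitLab merges a locally defined hidden job with the included one, but replaces arrays rather than merging them, so this list replaces the template's exclude entirely. This example keeps excluding the root and vendor .git folders while letting .git folders under $_WEB_ROOT (such as a module checked out in the modules folder) into the artifact:

    ```yaml
    # Project .gitlab-ci.yml, after the gitlab_templates include.
    .composer-base:
      artifacts:
        # Replaces the template's whole list; the $_WEB_ROOT/**/.git rows are
        # left out, so .git folders under the web root stay in the artifact.
        exclude:
          - '.git'
          - '.git/**/*'
          - 'vendor/**/.git'
          - 'vendor/**/.git/**/*'
    ```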

    Building "on demand" is what's suggested in 📌 Consider merging the build and validate stages Active, and it's definitely a 2.x must-have, to avoid scenarios like this.

  • Automatically closed - issue fixed for 2 weeks with no activity.
