Add job for CSPELL in contrib pipeline

Created on 4 December 2023, 7 months ago
Updated 7 March 2024, 4 months ago

Are there any plans to add a job to run CSPELL for contrib? Core has this and it would be great for contib too. I don't think it was available for contrib on drupal.org CI so this might be a new addition?

I've not been involved with any of it before, but happy to look at the core template and take some guesses as to how it could be added for contrib. If anyone has some ideas, please share.

✨ Feature request
Status

Fixed

Component

gitlab-ci

Created by

πŸ‡¬πŸ‡§United Kingdom jonathan1055

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Merge Requests

Comments & Activities

  • Issue created by @jonathan1055
  • πŸ‡ͺπŸ‡ΈSpain fjgarlin

    Core uses a package brought via yarn: https://git.drupalcode.org/project/drupal/-/blob/11.x/.gitlab-ci.yml?ref...

    We'd need to somehow add the ability for a module to have its own dictionary of allowed words, in addition to core's one.

  • πŸ‡§πŸ‡ͺBelgium Wim Leers Ghent πŸ‡§πŸ‡ͺπŸ‡ͺπŸ‡Ί

    I'm looking forward to adopting this in my contrib modules! See #3347168-3: Match core's code quality checks β†’ .

    We'd need to somehow add the ability for a module to have its own dictionary of allowed words, in addition to core's one.

    There's a PoC of that in https://git.drupalcode.org/project/cdn/-/merge_requests/10/diffs πŸ˜‡

  • Merge request !128Add job for CSPELL β†’ (Merged) created by fjgarlin
  • Status changed to Needs review 4 months ago
  • πŸ‡ͺπŸ‡ΈSpain fjgarlin

    Added a proof of concept. It tries to read a .cspell.json file from the repo. otherwise it creates an empty one, to which we can pass some words (in json format).

  • πŸ‡¬πŸ‡§United Kingdom jonathan1055

    Looks good on the first test. No .cspell.json and nothing specified for variable _CSPELL_WORDS gives
    CSpell: Files checked: 178, Issues found: 723 in 114 files
    https://git.drupalcode.org/project/scheduler/-/jobs/824499

    Second run with _CSPELL_WORDS: '"unpublish", "unpublishing", "revisionable"'
    I guess this is not the correct syntax for the words, because it fails with

    $ test -f .cspell.json || printf '{"words":["%s"]}\n' "$_CSPELL_WORDS" > .cspell.json
    $ $CI_PROJECT_DIR/$_WEB_ROOT/core/node_modules/.bin/cspell -c .cspell.json $_WEB_ROOT/modules/custom/$CI_PROJECT_NAME/**
    Configuration Error: Failed to read config file: "/builds/project/scheduler/.cspell.json"
    CSpell: Files checked: 0, Issues found: 0 in 0 files
    
  • πŸ‡¬πŸ‡§United Kingdom jonathan1055

    Easy fix, there was an extra set of double quotes around %s
    I have pushed a fix, and also added a temporary cat of the .cspell.json file, to show what we are working with.

  • Pipeline finished with Success
    4 months ago
    Total: 33s
    #95764
  • Pipeline finished with Success
    4 months ago
    Total: 33s
    #95855
  • Pipeline finished with Success
    4 months ago
    Total: 33s
    #96006
  • πŸ‡¬πŸ‡§United Kingdom jonathan1055

    I have made three more change here, but if you don't like them I can undo.

    1. Added --show-context --show-suggestions to show the context in which the word is found. This helps decide whether to add it as an acceptable word or not. Also show the possible correct words, to help you fix the source
    2. Change directory into web/modules/custom/{project} so that the path to the files to check is just **. This means that the filenames in the are shorter and relative to the project root - see screen grab
    3. created an artifact of the unrecognised words, for download and copy/paste into the projects custom dictionary if they have one.

    All looking very good so far. Last test run https://git.drupalcode.org/project/scheduler/-/jobs/827373

  • πŸ‡¬πŸ‡§United Kingdom jonathan1055

    I've just realised I have broken the artifact download by changing directory. Will fix it tomorrow.

  • Pipeline finished with Success
    4 months ago
    #96502
  • πŸ‡¬πŸ‡§United Kingdom jonathan1055

    Fixed the path. Ready for review.

  • Pipeline finished with Success
    4 months ago
    Total: 33s
    #96531
  • Pipeline finished with Success
    4 months ago
    Total: 33s
    #96538
  • Status changed to Needs work 4 months ago
  • πŸ‡ͺπŸ‡ΈSpain fjgarlin

    We'll need a new entry in the readme table where we list the jobs.

  • πŸ‡¬πŸ‡§United Kingdom jonathan1055

    Doing some testing on the return code when no spelling error are found. I thought the quickest way to test this was to ignore all files in the project, so I added

        // Paths that do not need to be checked.
        "ignorePaths": ["*", "./*"]
    

    to the project's .cspell.json, but the cpell command still returned false, so that EXIT_CODE=$? gave 1
    https://git.drupalcode.org/project/scheduler/-/jobs/833922
    Maybe ignoring everything it is not counted as a valid 'good' clean pass? I will test without ignoring all, but have all wrong words added to my allowed file and see what we get.

  • πŸ‡ͺπŸ‡ΈSpain fjgarlin

    Not sure I'm following.

    My suggestion is to wrap the echo and the second run, and run them only if the first one is unsuccessful.

    if [[ $EXIT_CODE ]]; then
      echo "# CSPELL Unrecognised or mis-spelled words.\n# You may want to add some of these to your custom project dictionary" >> _cspell_unrecognized_words.txt
      $CI_PROJECT_DIR/$_WEB_ROOT/core/node_modules/.bin/cspell -c .cspell.json --words-only --unique --no-progress ** | sort --ignore-case >> $CI_PROJECT_DIR/_cspell_unrecognized_words.txt || true
    fi
    
  • πŸ‡¬πŸ‡§United Kingdom jonathan1055

    That's exactly what I have done :-) Maybe I was mis-leading, we don't need to ignore anything, I was trying to do that as a shortcut for making it appear that there were no spelling errors. I think it gave a false return because 0 files were checked. The log shows CSpell: Files checked: 0, Issues found: 0 in 0 files

    I've now retested without that ignore, but adding all the wrong words into a custom dictionary, and we correctly get blank for $EXIT_CODE. In the rare case that someone wants to ignore all spellchecking by adding "ignorePaths": ["*", "./*"] instead of using SKIP_CSPELL=1 then they will just have to accept that the second run is still executed. It is such an edge case that it does not matter! Checking for "$EXIT_CODE" != "" is the right thing and will give the required outcome. I will push the changes.

  • Pipeline finished with Success
    4 months ago
    Total: 63s
    #96736
  • Pipeline finished with Success
    4 months ago
    Total: 33s
    #96742
  • Status changed to Needs review 4 months ago
  • πŸ‡¬πŸ‡§United Kingdom jonathan1055

    There are a couple of echos for debug, I will remove them when we see it is all working.

    Test when no words are mis-spelled
    https://git.drupalcode.org/project/scheduler/-/jobs/834852
    File is browseable but is empty (artifact cannot be optional)

    Lots of spelling mistakes
    https://git.drupalcode.org/project/scheduler/-/jobs/834872

    Test when no .cspell.json is found, passing in variable _CSPELL_WORDS
    https://git.drupalcode.org/project/scheduler/-/jobs/834906

    Test when no .cspell.json is found. No words added
    https://git.drupalcode.org/project/scheduler/-/jobs/834963

    Testing SKIP_CSPELL
    https://git.drupalcode.org/project/scheduler/-/pipelines/96529

    Ready for review, and for testing on other projects.

  • Status changed to RTBC 4 months ago
  • πŸ‡ͺπŸ‡ΈSpain fjgarlin

    It looks good to me. We just need to clean up the unneeded comments/output and it's got to be shipped.

  • Pipeline finished with Success
    4 months ago
    Total: 33s
    #96793
  • Pipeline finished with Success
    4 months ago
    Total: 640s
    #96795
  • Pipeline finished with Skipped
    4 months ago
    #96828
    • fjgarlin β†’ committed 06bfa0a9 on main
      Issue #3405955 by jonathan1055, fjgarlin: Add job for CSPELL in contrib...
  • Status changed to Fixed 4 months ago
  • πŸ‡ͺπŸ‡ΈSpain fjgarlin

    Merged, marking this as fixed and re-generating tag 1.2.0.
    Thanks.

  • πŸ‡§πŸ‡ͺBelgium Wim Leers Ghent πŸ‡§πŸ‡ͺπŸ‡ͺπŸ‡Ί

    I want to adopt this job in πŸ“Œ [PP-1] Match core's code quality checks: cspell Postponed but it's not clear to me when this new 1.2.0 version will become the default? (I'd rather not pin to a specific version.)

  • πŸ‡¬πŸ‡§United Kingdom jonathan1055

    I have two more things regarding this cspell job, the first is an easy change which can be done here, the second is longer, so we can start the discussion here but it may end up as a separate issue.

    1. Write the number of mis-spellled words to the log

    I started work fixing the cspell errors in Scheduler, there were hundreds, but a great many were repeats. I added a core dictionary and that reduced the number, I added a second dictionary and that helped more. It would be really informative to show the number of different unrecognised words in the log, right after the cspell summary. This shows instant feedback on how the latest change performed, gives a measure of progress and serves as encouragement to get the number lower. I will create a new MR on this issue for this change.

    2. Use of core dictionaries/config

    Not at first, but I soon realised that many of the unrecognised words should already be stored in a dictionary. Drupal words like theme names etc. I then looked at what Core does, and there are two dictionaries core/misc/cspell/drupal-dictionary.txt and core/misc/cspell/dictionary.txt which are added via Core's .cspell.json. So I created my own config and added these. There is no project that can run cspell and need these two dictionaries. I also had to add the 'php' dictionary as many function names were being reported. The question is - can we make the cspell job better by copy/inheriting the core config file? Or parts of it. Currently if .cspell.json does not exist we create one and add the words from the $CSPELL_WORDS variable. But this is almost a useless feature, because if you want to add dictionaries the project needs its own .cspell.config thus making the WORDS variable redundant. So we could copy/inherit the config from core. Then maintainers will be faced with the real but much smaller task of fixing their spelling errors, not be put off by 1000s of unrecognised words which may mean they simply choose to skip_cspell=1 and ignore it all.

  • πŸ‡ͺπŸ‡ΈSpain fjgarlin

    @wim-leers, we usually leave a week or two between releasing new features and making them available, by default, to all contrib.
    If you want the latest and brightest for your module as we commit it, you can do ref: main in your .gitlab-ci.yml file.

    You have some more documentation about the allowed values here https://git.drupalcode.org/project/gitlab_templates/-/blob/main/gitlab-c... or here https://git.drupalcode.org/project/gitlab_templates/-/tree/main?ref_type....

  • πŸ‡ͺπŸ‡ΈSpain fjgarlin

    @jonathan1055 - maybe we can do both of those suggestions in a follow-up. "Improve cspell job" or something like that.

    We could have a default ".cspell.json" file in the repo with the recommended set up, which we use (bringing it via curl) if we don't find one in the project. We can have the file with a placeholder for the words so this feature keeps on working.

    I think follow-up is best.

  • πŸ‡§πŸ‡ͺBelgium Wim Leers Ghent πŸ‡§πŸ‡ͺπŸ‡ͺπŸ‡Ί

    @fjgarlin in #22: RE "week or two" πŸ‘

    The question is - can we make the cspell job better by copy/inheriting the core config file?

    Yes, please.

    I thought that already was the case here πŸ˜… Without that, this job would likely fail for virtually every project? πŸ€”

    In other words, without #21 being fixed, there is no real way to using this? This will with almost 100% certainty break every contrib module using ref: main πŸ˜‡

  • πŸ‡¬πŸ‡§United Kingdom jonathan1055

    OK. I didn't see your comment before I created a new branch for the number of words. I will create a follow-up issue.

  • πŸ‡¬πŸ‡§United Kingdom jonathan1055

    jonathan1055 β†’ changed the visibility of the branch 3405955-number-of-words to hidden.

  • πŸ‡¬πŸ‡§United Kingdom jonathan1055

    jonathan1055 β†’ changed the visibility of the branch main to hidden.

  • πŸ‡¬πŸ‡§United Kingdom jonathan1055

    I have created #3422323: Improve CSPELL β†’ and will work on it right now. This can probably be included in the 1.2.0 ref also.

  • πŸ‡ͺπŸ‡ΈSpain fjgarlin

    @wim-leers, that's the reason for our "wait to change" the default tag strategy vs main.
    Thanks @jonathan1055 for the follow-up, let's make it now Drupalisms-proof :-)

  • πŸ‡§πŸ‡ͺBelgium Wim Leers Ghent πŸ‡§πŸ‡ͺπŸ‡ͺπŸ‡Ί

    Will we retroactively update the 1.2.0 tag? πŸ€”

    Wouldn't it be better to use a beta or rc tag to emphasize the experimental nature?

  • πŸ‡¬πŸ‡§United Kingdom jonathan1055

    All the versioning plans were done in #3404175: Adopt (semver) versioning for gitlab_templates β†’ and I think it is working really well. This project is not like other Drupal projects. The moving tags are perfect for it.

  • πŸ‡ͺπŸ‡ΈSpain fjgarlin

    We will just bump it to 1.2.1 after fixing the "bug".

  • πŸ‡§πŸ‡ͺBelgium Wim Leers Ghent πŸ‡§πŸ‡ͺπŸ‡ͺπŸ‡Ί

    πŸ‘ Thanks for the context.

  • Automatically closed - issue fixed for 2 weeks with no activity.

  • πŸ‡ΊπŸ‡ΈUnited States apotek

    Is there anyone who takes the time to assess how these decisions affect contrib developers. A person makes a suggestion, there's no meaningful debate, and the next thing you know, every maintainer of every contrib module now has to spend time making cspell happy. When code linters came out I was an early adopter. Now I see merge request after merge request that add nothing to the functionality of the modules, but merely strive to keep them compliant with the latest enforcement tools. And... in the comments on contributed fixes, more is made of an linter warning on a yml indentation, that on how the code is actually working.

    I am waiting for someone to roll out a linter that enforces the correct use of the oxford comma.

    Where is this stuff being talked about? And is it ok to push the same core procedures out onto all contributed module developers without any meaningful debate at all? (I apologize if there is such a forum, in which case, I would love a link to that discussion).

  • πŸ‡ͺπŸ‡ΈSpain fjgarlin

    #3389921: [META] New templates for contrib: mirror as much as possible all the GitlabCI jobs from core β†’ was created in September 2023. Our goal was/is to get the tools available for core also available for contrib.
    - Core runs the following lints: phpstan, phpcs, eslint, stylelint and cspell.
    - It also runs in the testing phase: phpunit, nightwatch and the test-only job.

    Debate about issues can be found on that meta issue, the related ones, and constantly on slack.

    As you can see, we are now running the same ones, but with one very important difference, none of the lint jobs here will break or change the overall outcome of a pipeline. If a pipeline succeeds before, it succeeds after. A warning is just a warning (and it's just the way for GitLab to display it), which you can decide to ignore, or skip. Maintainers can decide to do nothing, and their pipelines will still succeed. It's like getting deprecation warnings when running phpunit (the tool, not the job), the tests still pass, but the warnings are shown, and you can decide whether to act on them or not.

    The same could be argued for the phpstan job, or the soon-to-come test-only job. Some maintainers like some, some maintainers like others, some maintainers like none, and that's all ok as it's very easy to tweak.

    You can argue that they don't add functionality, but again, why would we run linters on core? or phpcs at all in contrib? Each maintainer can decide what they want to run, and that will depend on the context of the module. Maybe cspell is a good tool for non-natives, or phpstan is a good tool for people new to Drupal.

    Again, and coming back to the first paragraph, you can see that we are close to reaching our goal of features available in core vs features available in contrib. We are all working in the same framework, so having tools that help us work in the same way makes sense, but again, you are free to choose what you run or not run.

    Note that #3389921: [META] New templates for contrib: mirror as much as possible all the GitlabCI jobs from core β†’ introduced Semver tags, so if you don't want any new features, you could use 1.1.x-latest instead. This was announced in slack and it's documented here and here. Again, it's a choice.

    Each feature that we add is discussed thoroughly if it should be opt-in or skip-based, so your feedback is really welcome in the issue queue and slack. We are also tracking the costs and we are trying to make decisions based on that too (for example, we don't run parallel matrix of php + database versions, even though we could).

    I'm not sure if I have replied to all your feedback/concerns, but if something is still not clear, please do mention it.

  • πŸ‡¨πŸ‡­Switzerland Berdir Switzerland

    I might not have feel as strong about this as #35 but I pretty much agree with the TL;DR. I think this should be opt-in, not opt-out.

    Setting up and maintaining a cspell dictionary is a non-trivial amount of work and from my experience, it's often pretty tedious in core issues to make cspell happy.

    > Now I see merge request after merge request that add nothing to the functionality of the modules, but merely strive to keep them compliant with the latest enforcement tools

    FWIW, this has been a problem for a long time, long before merge requests and GitlabCI were a thing. Popular modules have so many overlapping and duplicate issues and the patches/MR's in many cases have really bad quality (invalid syntax, generated comments, empty docblocks just to make phpcs happy). That said, having proper tools as part of CI makes it actually easier to deal with in my experience.

    Either way, it's pretty much out of scope for this issue, but it's something that we need to discuss as a community, because it is in my experience burning out both new contributors and project maintainers.

  • πŸ‡ͺπŸ‡ΈSpain fjgarlin

    Agree that a bigger discussion should probably happen out of here.
    Also, reiterate that we are matching what's available for core. I don't see many more new jobs being created soon (other than the test-only).

    --

    Now specific to the cspell, you can either:
    - Add SKIP_CSPELL: 1 to skip it and forget about it.
    - Add _CSPELL_WORDS: '"list","of","words"' to add a list of allowed words.
    - See the artifacts generated by the job (.cspell-project-words.txt) and put them in your project.

  • πŸ‡ΊπŸ‡ΈUnited States apotek

    Thank you @fjgarlin, @berdir for your patient replies. Many of us mortals do not have the ability to monitor slack daily and stay on top of every META issue. It's hard enough to just (fail) at keeping up with bug fixes and feature requests. I accept that the cost of not participating on Mt Olympus is that our fate must be handed to us.

    I just feel in the past year that the balance of work done to develop actual features or fix bugs is reaching less than half, and the other more-than-half is just keeping up with deprecations, Symfonyisms, phpstan and linters. That's what I am seeing in the issue queues. Thanks again for the responses. I'll try to stay on top of META developments in the future.

  • πŸ‡§πŸ‡ͺBelgium kristiaanvandeneynde Antwerp, Belgium

    Well guess who stumbled upon this issue after finding out that all of their projects on GitLab now really hate their name :D

    On topic: While it's tempting to simply use SKIP_CSPELL: 1, it's a bit sad to see that cspell isn't all that clever, or at least isn't out-of-the-box. Could we at least make it ignore standard metastuff like composer.json so that it doesn't complain about my name and all of the garbled words in the dependencies?

    I'd love for it to catch actual spelling mistakes, but having to create a custom dictionary just for the author names, composer packages, etc. does seem rather tedious.

  • πŸ‡ͺπŸ‡ΈSpain fjgarlin

    Note that you can setup _CSPELL_IGNORE_PATHS: '"composer.json"'.

  • πŸ‡§πŸ‡ͺBelgium kristiaanvandeneynde Antwerp, Belgium

    Yes, I already guessed we can, but the question is more if we should. I would assume that most modules will have some stuff in composer.json that is causing cspell failures. So why don't we ignore that by default?

Production build 0.69.0 2024