Fix cspell errors in Search API

Created on 11 March 2024, 10 months ago
Updated 5 May 2024, 8 months ago

Unfortunately, our GitLab CI pipelines are currently failing due to lots of CSpell errors being reported (example). I’m pretty sure most, if not all, of them are bogus, but it would still be great to have both green pipelines and spellchecking.

Therefore, we need to go through all the reported typos and fix them in some way – in most cases, by either adding a // cspell:disable WORD line at the top of the file or ignoring any lines with bogus text with either a single // cspell:disable-next-line directive or wrapping multiple such lines in // cspell:disable/// cspell:enable. (I already started with the src/Plugin/search_api/processor/Resources/*.php files, but this is gonna take a while …)
Details can be found here and here.

It’s also possible apparently to create a custom .cspell.json for the module – for instance, this could be the only way to fix reported errors for correctly spelled words in non-code files (for instance, README.md). Also, it would be more elegant for words like “datasource” which we use in lots of files.
Unfortunately, our custom .cspell.json would replace the one provided by Core, with no way to combine directives from both of them. There is already a template .cspell.json file which solves this by copying the necessary directives from Core’s .cspell.json file, but this seems less than ideal both because of the unstable and unpredictable relative paths and because we’d have to manually keep track of any changes to that template (or Core’s .cspell.json file). There is already an issue open for part of this, maybe some nicer solution will be found there: #3426136: Execute CSPELL in project root folder and make fixing words easier .

Passing extra allowed words in the _CSPELL_WORDS GitLab CI variable would be another option, but as this wouldn’t help with running CSpell locally, I wouldn’t want to do that.

📌 Task
Status

Fixed

Version

1.0

Component

General code

Created by

🇦🇹Austria drunken monkey Vienna, Austria

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Merge Requests

Comments & Activities

  • Issue created by @drunken monkey
  • 🇦🇹Austria drunken monkey Vienna, Austria

    Temporarily disabled CSpell for our pipelines in 📌 Temporarily disable CSpell so GitLab CI pipelines are green Fixed , so they are green in the meantime while this gets fixed. So, the MR for this issue should re-enable it (that is, remove the SKIP_CSPELL variable from .gitlab-ci.yml again).

  • 🇦🇹Austria drunken monkey Vienna, Austria

    Not sure this is Novice-level after all, since I don’t know myself how to best do this.

  • 🇬🇧United Kingdom jonathan1055

    #3426136: Execute CSPELL in project root folder is very near completion and this will make is much easier for you to fix spelling errors. I suggest you wait until that issue is done, then you will have a much nicer time here :-)

  • 🇬🇧United Kingdom jonathan1055

    https://project.pages.drupalcode.org/gitlab_templates/jobs/cspell/ has the gtilab cspell documentation. But there will be some updates when the new issue is committed

  • 🇬🇧United Kingdom jonathan1055

    Hi,
    Are you OK if I use this issue to do final testing on the gitlab_templates #3426136: Execute CSPELL in project root folder and make fixing words easier ?

  • Merge request !116Issue #3427068: Fix cspell errors → (Merged) created by jonathan1055
  • Pipeline finished with Success
    9 months ago
    Total: 210s
    #120129
  • Pipeline finished with Success
    9 months ago
    #120213
  • Pipeline finished with Success
    9 months ago
    Total: 130s
    #120279
  • Pipeline finished with Success
    9 months ago
    Total: 186s
    #120292
  • 🇬🇧United Kingdom jonathan1055

    Here is the starting point for fixing the cspell job

    CSpell: Files checked: 548, Issues found: 1601 in 177 files
    The number of unrecognised/misspelled words is 251
    An artifact file has been created containing a unique list of these unrecognized words, for you to browse or download.
    

    I have attached the artifact file to this issue. Some may be genuine typos, others are test data.

    This output is from a completely un-customised job. The project does not have a .cspell-project-words.txt file, and the standard set of files are ignored (see the list on docs/jobs/cspell.md#commonly-ignored-files)
    There are no custom words added to _CSPELL_WORDS and no extra paths ignored in _CSPELL_IGNORE_PATHS

    You do not need to create a custom .cspell.json file. But you mention running cspell locally, so you may wish to download the cspell_json artifact and use that. You can modify any of the dictionary paths to meet your requirements because these are reset to the gitlab paths during the pipeline job.

  • 🇦🇹Austria drunken monkey Vienna, Austria

    Thanks a lot!
    Do you need me to help you here with anything so #3426136: Execute CSPELL in project root folder and make fixing words easier can be fixed, or should I just wait for that? Anyways, very good to see this becoming easier for contrib maintainers.

  • 🇬🇧United Kingdom jonathan1055

    The issue is in final review, and I think it will get committed early next week. But that is only to the 'main' branch, and will only be available for those projects using 'main'. See the explanation on gitlab-ci/template.gitlab-ci.yml?ref_type=heads#L36-42

    If you wanted to start working on your spelling fixes, you can use this MR, and anything you do here will be perfectly valid for when the 'default-ref' tags gets updated in gitlab_templates and the new feature gets rolled out to all users.

    Here is the latest updated cspell doc page and if you do start work on this, you could read the new docs and let us know if anything is unclear or could be expained better. It would be nice to see in a real example like your project, how far you can get without adding your own .cspell.json file and just using .cspell-project-words.txt and _cspell_ignore_paths, and maybe some cspell:disable-next-line comments.

  • Pipeline finished with Success
    9 months ago
    #131351
  • Pipeline finished with Success
    9 months ago
    Total: 209s
    #131355
  • 🇬🇧United Kingdom jonathan1055

    #3426136: Execute CSPELL in project root folder and make fixing words easier was committed and also the gitlab_templates default-ref tag has been updated to include this change for all contrib by default. Hence I have reverted the temporary change here.

    The updated CSPELL doc page is https://project.pages.drupalcode.org/gitlab_templates/jobs/cspell/

  • Pipeline finished with Success
    9 months ago
    Total: 172s
    #131366
  • 🇬🇧United Kingdom jonathan1055

    After adding .cspell-project-words.txt with just two words + plurals the number of instances is reduced from 1601 to 505

    CSpell: Files checked: 548, Issues found: 505 in 56 files
    The number of unrecognised/misspelled words is 241
    
  • Pipeline finished with Success
    9 months ago
    Total: 189s
    #131396
  • Pipeline finished with Success
    9 months ago
    Total: 163s
    #131416
  • 🇬🇧United Kingdom jonathan1055

    I also noticed you have some Lorem Ipsum text, and we have a new issue #3436206: Add lorem-ipsum spelling dictionary to load and use a dictionary for those words. I changed to use that MR and the result is now down to 257 words.

    CSpell: Files checked: 548, Issues found: 257 in 55 files
    The number of unrecognised/misspelled words is 135
    

    I tried using a wildcard * suffix as per https://cspell.org/docs/dictionaries-custom/#words-list-syntax but it did not work.

    It will be up to you how you decide to clean the rest of the issues. Some are genuine typos, but many are part-words which of course is the nature of this module. You may want to exlcude some sections with cspell:disable and cspell:enable, or add the allowed words into a cspell:ignore thisword thatword comment at the top of the specific file that uses them. Or add them all into the .cspell-project-words.txt file.

  • 🇬🇧United Kingdom jonathan1055

    I can see you are fixing this - nice work. I left a comment on the MR but you have already noticed the error, and changed cspell:disable theword to cspell:ignore theword. So you can disregard my comment.

  • Status changed to Needs review 9 months ago
  • 🇦🇹Austria drunken monkey Vienna, Austria

    Thanks again for all your work and help on this!
    The pipeline passes now, so this would be mergeable. However, I’m not sure whether it’s better to do that right away or better to wait until #3436206: Add lorem-ipsum spelling dictionary is fixed? Probably the latter, but that makes it more likely that merge conflicts will be introduced.

    Also, one more question: Is there an easy way to run CSpell locally with the same settings as in GitLab CI? I tried to just copy .cspell.json from core/ to this module (excluding it from Git so I can just use it locally) and adapting the paths, but still there are lots of errors – even ignoring the Lorem Ipsum ones. E.g., “cacheable“ and “cacheability” are reported as typos.

    Furthermore, it seems GitLab CI doesn’t check for the forbidden words that Core uses – especially “please”, which this module unfortunately uses a lot. Do you know whether that is on purpose, should contrib modules not avoid using “please”, too? If so, is there a way to get the forbidden words checks back for this module’s CI? (There would be a lot more to fix in that case, but not too much to make it infeasible.)

  • Status changed to Active 9 months ago
  • 🇬🇧United Kingdom jonathan1055

    You are welcome, I'm pleased to help, as we also used your module to test the previous gitlab_tempates MR.

    ... wait until #3436206: Add lorem-ipsum spelling dictionary is fixed? Probably the latter, but that makes it more likely that merge conflicts will be introduced.

    There's no harm in committing this now, as it would avoid the potential conflicts. The gitlab MR will remain useable even when it is merged. You would be effectively running against a static gitlab_templates that is a little ahead of the 'default-ref' tag, but that is fine. The lorem-ipsum MR will probably be merged next week, but that would only be to the 'main' ref. You would need to wait until the 'default-ref' tag gets updated before returning your template to use the two recommended ref/repo values. I can let you know here when that is done, or you can check via https://git.drupalcode.org/project/gitlab_templates/-/tags You can also see what is committed to 'main' but not rolled out to 'default-ref' yet, via https://git.drupalcode.org/project/gitlab_templates/-/compare/default-re...

    Is there an easy way to run CSpell locally with the same settings as in GitLab CI?

    If you have got cspell running locally, you can download the modified .cspell.json file that is actually used in the cspell job, as it is saved as an artifact named _cspell_json.txt - as mentioned here https://project.pages.drupalcode.org/gitlab_templates/jobs/cspell/#full-...
    But it depends whether you just want to use this to check spelling locally (in which case you could add it to .gitignore) or whether you want to use that file for other customisations and add it in to your project. I designed the prepare-cspell.php to cope with using any changes made but also to make it run on gitlab pipelines. So you can alter the paths to the two core dictionaries, depending on whatever local folder hierarchy you are using.

    GitLab CI doesn’t check for the forbidden words that Core uses – especially “please”, which this module unfortunately uses a lot. Do you know whether that is on purpose, should contrib modules not avoid using “please”, too?

    Yes, we had some discussion about that, and the decision was to remove of those words, which included 'please', but we do still forbid 'e-mail', 'grey' and 'queuing'. You will see this when you download your _cspell_json.txt file. You could add 'please' into the array if you were intending to commit a .cspell.json file. Or you could raise an issue on https://www.drupal.org/project/gitlab_templates/issues to request that all the core words get added back by default, and make a good case for doing it.

    I am interested to see how you get on with the downloaded .cspell.json file, as I do not actually have cspell locally!

  • 🇦🇹Austria drunken monkey Vienna, Austria

    Thanks for the feedback. Then I’ll merge this right away. Would be great if you could notify me once I can revert the changes to the used template file project/branch.
    In any case, thanks a lot again!

    I am interested to see how you get on with the downloaded .cspell.json file, as I do not actually have cspell locally!

    Well, I just used the one already included in the Drupal installation. I just ran yarn install in core/ and then ran ../../core/node_modules/.bin/cspell . inside the module directory. After downloading and adapting the .cspell.json file from the CI artifacts this worked like a charm – so, once again, thank you for your help! Then I can just run this locally as a pre-commit hook (as I already do with PHPStan and others) to spot any new typos right away. Plus, I can make sure that “please” stays out of this module after we fix 📌 Remove uses of "please" from this module Active . (Though I guess this still wouldn’t guard against MR commits by others …)

    Keeping this issue open so I can still revert the changes to the used template file project/branch.

  • 🇬🇧United Kingdom jonathan1055

    Would be great if you could notify me once I can revert the changes

    Yes I will, but you can also check yourself. When #3436206: Add lorem-ipsum spelling dictionary is merged you should see the commit listed in this link:
    https://git.drupalcode.org/project/gitlab_templates/-/compare/default-re...
    This shows all commits that are in 'main' but not yet reached 'default-ref'. If the commit is showing here then you need to wait.

    The following link also show the commits and where the tags are. You can see at the moment there are two commits which are above the set of tags, meaning that these are only in 'main'
    https://git.drupalcode.org/project/gitlab_templates/-/commits/main
    When the tags have been updated to include the lorrm-ipsum commit you will see it here.

    Do you think that 'please' should be added back into the flag-words for contrib?

  • 🇦🇹Austria drunken monkey Vienna, Austria

    Yes I will, but you can also check yourself. When #3436206: Add lorem-ipsum spelling dictionary is merged you should see the commit listed in this link:
    https://git.drupalcode.org/project/gitlab_templates/-/compare/default-re...

    Thanks, that’s very simple to follow. Then I’l just do that.

    Do you think that 'please' should be added back into the flag-words for contrib?

    I’m not sure, honestly, otherwise I would have created an issue. I personally want to follow that rule in this module, and I do think it makes sense that, if Core thinks the word should not be used, then contrib modules should also follow that. Especially those that care enough about standards to actually have cspell run on their code.
    However, I’m still hesitant to proscribe this to all modules. I think I’d rather just enable it for this one – unfortunately, that doesn’t seem possible without forking the .cspell.json file (and thus not getting further updates for it).
    I think the ideal solution would be if it would be easy to enable/disable specific flag words (or specify your own list). Then the default value wouldn’t matter as much, and I’d probably argue that the same list that Core uses should also be default for contribt. However, as it currently is, making this the default would make it rather difficult for modules that do want to use the word (especially if it’s in lots of places in their code) but also want to pass cspell checks.

    Is there any link where I can read your discussion on the topic, why you chose to remove “please” from the list? (In any case, it’s clear why it’s just “please”, all the others are just wrong/undesired spellings of words that are still allowed. “Please” is the only one Core just doesn’t want used at all.)

  • 🇬🇧United Kingdom jonathan1055

    Thanks for your response, really detailed, thoughtful and helpful. Regarding the discussion, I've checked the spelling issues, searched for 'flagwords', in the issues and on Slack, and cannot find it. But it was not a long in-depth discussion at all, it was just @fjgarlin and me, and we started with the full set that Drupal core has e-mail, grey, hte, ist, please, queuing and removed the two three-letter words (obviously Core had lots of misspelling of 'the' and 'its' at some point in the past). @fjgarlin then just said 'we should remove please, as that's just a "core" drupalism thing' or something like that. There was no more discussion.

    I can see that if a maintainer wants to add "please" into the flagwords, like you do, they have to take a copy of .cspell.json and modify it, which has a down-side as you say. I think this is a very good candidate for another _CSPELL... variable, _CSPELL_FLAGWORDS to make it simple for extra words to be flagged, without having to create the config file. If that sounds good to you I can work on it.

  • 🇦🇹Austria drunken monkey Vienna, Austria

    I can see that if a maintainer wants to add "please" into the flagwords, like you do, they have to take a copy of .cspell.json and modify it, which has a down-side as you say. I think this is a very good candidate for another _CSPELL... variable, _CSPELL_FLAGWORDS to make it simple for extra words to be flagged, without having to create the config file. If that sounds good to you I can work on it.

    That would be great, yes! Thanks for the suggestion!

  • 🇬🇧United Kingdom jonathan1055

    Can you create an issue in gitlab_template queue.

  • Merge request !131Remove lorem ipsum branch → (Merged) created by drunken monkey
  • Status changed to Needs review 8 months ago
  • 🇦🇹Austria drunken monkey Vienna, Austria

    Oh, seems the Lorem Ipsum changes are now also deployed. Created an MR to remove our custom workaround.

  • Status changed to Fixed 8 months ago
  • 🇦🇹Austria drunken monkey Vienna, Austria

    GitLab CI seems happy, so I merged and created 🐛 Fix ESLint errors Active as a follow-up for the ESLint errors.

    Thanks again for all your help on this, @jonathan1055!

  • Status changed to Needs review 8 months ago
  • 🇬🇧United Kingdom jonathan1055

    You're welcome. Thanks for helping with my testing. I will work on the flagwords issue.

  • Status changed to Fixed 8 months ago
  • 🇬🇧United Kingdom jonathan1055

    Sorry. I replied but was viewing the thread without your last commit. Hence the stale form data still had 'needs review'. That's often happening. it is a bit of a fault with the issue form submission. On submit, any hidden status change should be detected and fail validation. Putting back to 'fixed'

  • Automatically closed - issue fixed for 2 weeks with no activity.

Production build 0.71.5 2024