Add a dictionary for Drupal-specific words

Created on 22 December 2022, over 1 year ago
Updated 29 November 2023, 7 months ago

Problem/Motivation

In the various issues to remove misspelled words from dictionary.txt valid words are accidentally removed. This just more reviews and rerolls.

Steps to reproduce

Proposed resolution

Add a second dictionary for words that CSpell identifies as misspellings that we want to keep. These will include, at least, Drupal specific words and some technical terms.

Remaining tasks

Add change record for script changes
Commit
Update docs page, see #12

User interface changes

API changes

Data model changes

Release notes snippet

📌 Task
Status

Fixed

Version

10.2

Component
Other 

Last updated 15 minutes ago

Created by

🇳🇿New Zealand quietone New Zealand

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • 🇳🇿New Zealand quietone New Zealand

    Only a reroll.

  • Status changed to Needs review over 1 year ago
  • 🇳🇿New Zealand quietone New Zealand
  • 🇺🇸United States smustgrave

    from your comment in the other ticket should drupalism words be moved too. Example mlid.

  • 🇳🇿New Zealand quietone New Zealand

    Yes, I did not do an exhaustive search for Drupalisms. I wanted to get this started and get feedback on the idea.

  • 🇺🇸United States smustgrave

    I do think organizing them will make it easier to manage.

    So +1 for RTBC

  • Status changed to RTBC over 1 year ago
  • 🇺🇸United States smustgrave

    Actually going to go ahead and mark it as it may be hard to get all phrases in one go. Maybe once this lands there is a follow up to move drupal like words over?

  • Status changed to Needs work about 1 year ago
  • 🇬🇧United Kingdom catch

    This is a good idea, but a few comments on the patch:

    +++ b/core/misc/cspell/drupal.txt
    @@ -0,0 +1,22 @@
    +llamma
    

    While llama's are very close to being a Drupalism, llamma is a real mis-spelling that we should fix to llama, and I think lammaids and lammasarelame should be handled by cspell-ignore maybe?

    Also not sure about fapi, mlid, nids, tids - these could potentially be expanded to Form API, menu link ID, node IDs, term IDs etc. and removed from the dictionary, but depends on the context.

    Should we call it drupal-dictionary.txt so it's easy to see that it relates to dictionary.txt?

    There are also some other changes in here - is that because we need to update dictionary.txt again after a cspell/dependency update? Might want to do that in a separate issue first to get a clean diff here.

  • Status changed to Needs review about 1 year ago
  • Open in Jenkins → Open on Drupal.org →
    Environment: PHP 8.1 & MySQL 5.7
    last update about 1 year ago
    29,380 pass
  • 🇳🇿New Zealand quietone New Zealand

    I grepped the usages of the 'llama' type words, fapi, mlid, nids and tids. Most of those could be changed so I restored those to dictionary.txt. They will be handled in a spelling error issue in due course.

    This patch add as command to build the new drupal dictionary, and renames the existing command to spellcheck:make-dict.

    I have not made an interdiff because the patch is small.

  • 🇳🇿New Zealand quietone New Zealand

    If this is committed this doc page needs to be updated, https://www.drupal.org/docs/develop/development-tools/cspell

  • Status changed to RTBC about 1 year ago
  • 🇺🇸United States smustgrave

    Does renaming a script require a change record.

  • Status changed to Needs work about 1 year ago
  • 🇬🇧United Kingdom catch
    +++ b/core/misc/cspell/dictionary.txt
    @@ -80,13 +80,11 @@ autowiring
     backreferences
    -bakeware
     bangpow
     barbar
    

    Sorry bakeware's also not a Drupal-specific term.

    +++ b/core/misc/cspell/dictionary.txt
    @@ -379,13 +373,11 @@ endapply
     endcode
     endembed
    -endlink
     endmacro
     endofheader
     endofscript
     endpo
     endset
    -endtrans
     enim
    

    Also not sure about endlink and endtrans, I think those are from phpdoc?

  • Status changed to Needs review about 1 year ago
  • last update about 1 year ago
    29,505 pass
  • 🇳🇿New Zealand quietone New Zealand

    A change record make sense to me, adding tag.

    For #14. I searched and found references to @endtrans in laravel docs but I did not find any references to @endlink. However, it does seem sensible to remove the pairs of @trans and @endtrans and @link and @endlink. Either way, we can always update the dictionaries later. I think it is more important to have this in core. It will help in the other spelling error issues where people accidentally remove drupalisms.

    When making the patch I rebuilt both dictionaries.

  • Status changed to RTBC about 1 year ago
  • 🇺🇸United States smustgrave

    Started the change record from my understanding of the issue.

  • last update about 1 year ago
    29,551 pass
  • last update about 1 year ago
    29,553 pass
  • last update about 1 year ago
    29,559 pass
  • last update about 1 year ago
    29,563 pass
  • last update about 1 year ago
    29,571 pass
  • last update 12 months ago
    29,801 pass
  • last update 12 months ago
    29,801 pass
  • last update 12 months ago
    29,802 pass
  • last update 12 months ago
    29,802 pass
  • last update 12 months ago
    29,807 pass
  • last update 12 months ago
    29,811 pass
  • last update 12 months ago
    29,815 pass
  • last update 12 months ago
    Patch Failed to Apply
  • last update 12 months ago
    Patch Failed to Apply
  • last update 12 months ago
    Patch Failed to Apply
  • last update 11 months ago
    Patch Failed to Apply
  • last update 11 months ago
    Patch Failed to Apply
  • last update 11 months ago
    Patch Failed to Apply
  • last update 11 months ago
    Patch Failed to Apply
  • last update 11 months ago
    Patch Failed to Apply
  • Status changed to Needs review 11 months ago
  • last update 11 months ago
    29,946 pass
  • 🇳🇿New Zealand quietone New Zealand

    There have been recent commits that have changed the dictionary, so I am rerolling and updating the patch. It was a simple reroll.

    Since we now spellcheck:core when the dictionary changes I have added a check so that it runs when either dictionary.txt or drupal.txt is modified. I also rebuilt both dictionaries.

  • Status changed to RTBC 11 months ago
  • 🇺🇸United States smustgrave

    Reroll appears fine

    Applied #18 and ran both spellcheck scripts and got the same results.

  • last update 11 months ago
    29,946 pass
  • last update 11 months ago
    29,953 pass
  • last update 11 months ago
    29,953 pass
  • last update 11 months ago
    29,953 pass, 1 fail
  • Status changed to Needs work 11 months ago
  • 🇳🇿New Zealand quietone New Zealand

    Unrelated failure, retesting.

    1) Drupal\Tests\system\Functional\System\CronRunTest::testAutomatedCron
    Failed asserting that 1691657672 is less than 1691657672.
    
  • last update 11 months ago
    29,958 pass
  • First commit to issue fork.
  • last update 11 months ago
    CI aborted
  • @sakthi_dev opened merge request.
  • 🇮🇳India sakthi_dev

    As the patch test not running created a MR. Please review.

  • 🇳🇿New Zealand quietone New Zealand

    @sakthi_dev, perhaps you missed it but my comment stated that I was restarting the tests. And once an issue is using a patch work then it should stay with that workflow. Adding an MR here is not helping. I have canceled the tests and closed the MR.

  • 🇬🇧United Kingdom alexpott 🇪🇺🌍

    I really like this idea. This makes lots of sense. Also maybe some of the things here and in our dictionary.txt should be pushed upstream into a CSpell dictionary? For example oembed could be in the SoftwareTerms dictionary.

    +++ b/core/package.json
    @@ -21,8 +21,9 @@
    +    "spellcheck:make-drupal-dict": "rm -f misc/cspell/drupal.txt && touch misc/cspell/drupal.txt && yarn -s spellcheck:core --unique --words-only | perl -Mopen=locale -pe '$_=lc$_' | LC_ALL=en_US.UTF-8 tr -d \\\\\\\\ | LC_ALL=C sort -u -o misc/cspell/drupal.txt",
    

    I don't think we can have an automatic drupal dictionary maker. When we add a word to this dictionary it should be an active choice by the community via an issue on drupal.org.

    +++ b/core/misc/cspell/dictionary.txt
    @@ -590,7 +584,6 @@ laatste
     langcode
    -langcodes
    

    Why add langcodes to drupal.txt and not langcode?

  • Status changed to Needs review 9 months ago
  • last update 9 months ago
    30,168 pass
  • 🇳🇿New Zealand quietone New Zealand

    The patch needed a reroll.

    Also, per #10 the new dictionary is now named, drupal-dictionary.txt
    #27.1. Having the community validate adding words is a good point. And that should still happen if the command is used to generate the drupal dictionary. But, we also need a way to validate the dictionary and update it when words are removed. For the later reason, I have left the command in.
    #27.2 Added langcode. I don't know why it was not included.

  • Status changed to RTBC 9 months ago
  • 🇺🇸United States smustgrave

    Reroll seems good. Going to mark but question, what will be the criteria for "drupal specific" words. Wouldn't works like lazybuilder or drupalorg qualify?

  • last update 9 months ago
    30,205 pass
  • last update 9 months ago
    30,363 pass
  • Status changed to Needs work 9 months ago
  • 🇬🇧United Kingdom alexpott 🇪🇺🌍

    Done another round of thinking and I'm not that sure about what I said in #27 about this change being a good idea. The reason being once we've fixed all the misspellings that are in dictionary.txt - what is the point of drupal-dictionary.txt? And maintaining the extra command package.json all makes me pause.

    That said maybe it doesn't matter and it is just a case of taste. I'm not really sure that having this extra dictionary is going to speed up spelling fixes though. I'd guess that it will introduce a bigger chance of a bike shed about where a word belongs and for that reason I'm changing my opinion in #27.

    +++ b/core/package.json
    @@ -21,8 +21,9 @@
    -    "spellcheck:make-drupal-dict": "rm -f misc/cspell/dictionary.txt && touch misc/cspell/dictionary.txt && yarn -s spellcheck:core --unique --words-only | perl -Mopen=locale -pe '$_=lc$_' | LC_ALL=en_US.UTF-8 tr -d \\\\\\\\ | LC_ALL=C sort -u -o misc/cspell/dictionary.txt",
    -    "spellcheck:core": "cspell -c .cspell.json --root .. \"core/**/*\" \"composer/**/*\" \"composer.json\" \".gitlab-ci/*\" \".gitlab-ci.yml\"",
    +    "spellcheck:core": "cspell -c .cspell.json --no-progress --root .. \"core/**/*\" \"composer/**/*\" \"composer.json\"",
    +    "spellcheck:make-dict": "rm -f misc/cspell/dictionary.txt && touch misc/cspell/dictionary.txt && yarn -s spellcheck:core --unique --words-only | perl -Mopen=locale -pe '$_=lc$_' | LC_ALL=en_US.UTF-8 tr -d \\\\\\\\ | LC_ALL=C sort -u -o misc/cspell/dictionary.txt",
    +    "spellcheck:make-drupal-dict": "rm -f misc/cspell/drupal-dictionary.txt && touch misc/cspell/drupal-dictionary.txt && yarn -s spellcheck:core --unique --words-only | perl -Mopen=locale -pe '$_=lc$_' | LC_ALL=en_US.UTF-8 tr -d \\\\\\\\ | LC_ALL=C sort -u -o misc/cspell/drupal-dictionary.txt",
    

    The spellcheck:core command should not be changing.

  • 🇬🇧United Kingdom catch

    I think this will help to remove mis-spellings, because it'll stop people trying to remove words that we're going to keep. Maybe we could add it, then move it back to dictionary.txt when that file is empty?

  • Status changed to Needs review 9 months ago
  • last update 9 months ago
    30,360 pass
  • 🇳🇿New Zealand quietone New Zealand

    Yes, #31 is a good idea. This is definitely about making it easier to work on spelling issues. And once dictionary.txt is empty we can review the remaining words.

    And since the 'drupal' dictionary is an interim step, we can add more words that are not strictly Drupalisms. This could include a variety of technical terms and perhaps the non English ones needed for testing. Some of the technical terms may need an issue upstream but it will still help if they were moved out of dictionary.txt so we can focus on the misspellings.

  • Status changed to RTBC 9 months ago
  • 🇺🇸United States smustgrave

    Changes to spellcheck:core have been reverted.

  • last update 9 months ago
    30,371 pass
  • last update 9 months ago
    Patch Failed to Apply
  • last update 9 months ago
    Patch Failed to Apply
  • last update 9 months ago
    Patch Failed to Apply
  • last update 9 months ago
    Patch Failed to Apply
  • last update 9 months ago
    Patch Failed to Apply
  • last update 9 months ago
    Patch Failed to Apply
  • last update 9 months ago
    Patch Failed to Apply
  • last update 9 months ago
    Patch Failed to Apply
  • last update 8 months ago
    Patch Failed to Apply
  • last update 8 months ago
    Patch Failed to Apply
  • last update 8 months ago
    Patch Failed to Apply
  • last update 8 months ago
    Patch Failed to Apply
  • last update 8 months ago
    Patch Failed to Apply
  • last update 8 months ago
    Patch Failed to Apply
  • last update 8 months ago
    Patch Failed to Apply
  • last update 8 months ago
    Patch Failed to Apply
  • last update 8 months ago
    Patch Failed to Apply
  • Status changed to Needs work 8 months ago
  • 🇬🇧United Kingdom alexpott 🇪🇺🌍

    The patch does not apply.

  • Status changed to Needs review 8 months ago
  • 🇳🇿New Zealand quietone New Zealand

    Simple reroll.

  • 🇬🇧United Kingdom longwave UK
    1. +++ b/core/misc/cspell/drupal-dictionary.txt
      @@ -0,0 +1,13 @@
      +jsonapi
      

      "jsonapi" is apparently in the cspell dictionary now as it has already been removed from dictionary.txt.

    2. +++ b/core/package.json
      @@ -21,8 +21,9 @@
      +    "spellcheck:make-drupal-dict": "rm -f misc/cspell/drupal-dictionary.txt && touch misc/cspell/drupal-dictionary.txt && yarn -s spellcheck:core --unique --words-only | perl -Mopen=locale -pe '$_=lc$_' | LC_ALL=en_US.UTF-8 tr -d \\\\\\\\ | LC_ALL=C sort -u -o misc/cspell/drupal-dictionary.txt",
      

      Does this actually make sense, or do we want to maintain drupal-dictionary.txt by hand?

  • Status changed to Needs work 8 months ago
  • 🇬🇧United Kingdom longwave UK
  • last update 8 months ago
    30,510 pass
  • Status changed to Needs review 8 months ago
  • 🇳🇿New Zealand quietone New Zealand

    I thought I left an explanation for this but I see I did not. A reason to keep that new command is to remove words like you found in #36.1. And we expect this list to grow so it will be easier to have some way to remove words that are added by any of cspells dictionaries. In fact, I used the new command to regenerate the new drupal dictionary to make this patch.

  • last update 8 months ago
    30,510 pass
  • Status changed to RTBC 8 months ago
  • 🇺🇸United States smustgrave

    Reroll and json removal seems good.

  • last update 8 months ago
    Patch Failed to Apply
  • Status changed to Needs work 8 months ago
  • 🇺🇸United States xjm

    This already doesn't apply again.

    Once this is rerolled, I will prioritize this over other cspell issues since I think this dictionary will accelerate cleaning up the dictionary (terms like "cacheable" and "uncacheable" that Drupal uses widely, for example, which get swept up in other scopes but are probably keepers). I think additional words should be discussed in followup issues, so the scope here is fine.

    I think this is a 100% valid choice and everything in the patch (including the dictionary and command changes) seems correct and in scope to me.

    Many Drupalisms should be made more global by converting them to correctly spelled and formatted English words, but others need to be kept. I suggest very stringent review before adding terms to the Drupal dictionary. E.g., "mlid" would perhaps be better converted to a snake_case_key in a perfect world, and we just need to weigh the tradeoffs of data model changes for things like that against keeping them. Whereas other terms will always be used.

  • Status changed to Needs review 8 months ago
  • 🇺🇸United States smustgrave

    Rerolled and reran the dictionary scripts

  • last update 8 months ago
    30,510 pass, 2 fail
  • Status changed to Needs work 8 months ago
  • The Needs Review Queue Bot tested this issue.

    While you are making the above changes, we recommend that you convert this patch to a merge request . Merge requests are preferred over patches. Be sure to hide the old patch files as well. (Converting an issue to a merge request without other contributions to the issue will not receive credit.)

  • Status changed to RTBC 8 months ago
  • 🇺🇸United States xjm

    womp womp, randoms. Also we will want to commit this before the next bot run so the patch is fine..

  • Open in Jenkins → Open on Drupal.org →
    Environment: PHP 8.1 & MariaDB 10.3.22
    last update 8 months ago
    30,516 pass
    • xjm committed a10e0f8d on 11.x
      Issue #3328741 by quietone, smustgrave, xjm, catch, alexpott, longwave:...
    • xjm committed f48bdfcd on 10.2.x
      Issue #3328741 by quietone, smustgrave, xjm, catch, alexpott, longwave:...
  • Status changed to Fixed 8 months ago
  • 🇺🇸United States xjm

    Committed to 11.x and 10.2.x. I also tried to backport it to 10.1.x, since it is a minimally disruptive coding standards tooling improvement, but it did not cherry-pick cleanly. Either the 10.1 codebase is sufficiently different from 11.x to cause non-mergeable changes in the dictionary, or we forgot to backport other CSpell issues.

    I don't think a different backport version is worth the effort, though, so marking fixed. Thanks!

  • Automatically closed - issue fixed for 2 weeks with no activity.

  • Status changed to Fixed 7 months ago
  • 🇺🇸United States Dave Reid Nebraska 🇺🇸

    As a project that has been re-using core's dictonary.txt, which file should we be referencing going forwards as the "good list"? Because normal Drupal spellings like itok are not present in the drupal-dictonary despite being used as a query parameter for image generation as one example.

  • 🇳🇿New Zealand quietone New Zealand

    @Dave Reid, If you think 'itok' should be in the Drupal dictionary, then can you make a sibling of this issue to discuss that?

    Thanks.

  • 🇺🇸United States Dave Reid Nebraska 🇺🇸

    I guess I'm just asking for consistency in recognizing that other projects might be re-using the dictionary file. If the idea is to slowly verify misspellings in the existing dictionary file, could those have been moved to a dictionary-unverified.txt file instead?

Production build 0.69.0 2024