Add missing words to dictionary eg varchar

Created on 9 January 2025, about 1 month ago

Problem/Motivation

Since mid December the word "varchar" has been flagged as a spelling mistake in many contrib projects. This is a reserved word used thousands of times in many projects and core. It seems to have been erroneously removed back on July 10th 2024, but only affected tests in the past 3-4 weeks after the dictionary was rebuild.

There are many other words that have been removed in this commit:
https://git.drupalcode.org/project/drupal/-/commit/09020e6b863ac9f1231da...

Original issue and linked issue
- πŸ“Œ Bump cspell to 8.16.1 Active
- πŸ“Œ Update JavaScript dependencies for Drupal 11.0-rc Fixed

Steps to reproduce

Run a test pipeline with cspell including a varchar field in the hook_schema() of a contrib module. Or run cspell on Drupal Core.

Proposed resolution

  • Review words that have been removed
  • Add words that should not have been removed back into the dictionary or drupal-dictionary

Remaining tasks

Add the following words to the appropriate dictionary

  • varchar

User interface changes

n/a

Introduced terminology

n/a

API changes

n/a

Data model changes

n/a

Release notes snippet

πŸ“Œ Task
Status

Active

Version

11.1 πŸ”₯

Component

other

Created by

πŸ‡¦πŸ‡ΊAustralia elc

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Merge Requests

Comments & Activities

  • Issue created by @elc
  • πŸ‡¦πŸ‡ΊAustralia elc
  • πŸ‡ΊπŸ‡ΈUnited States tr Cascadia
  • πŸ‡³πŸ‡ΏNew Zealand quietone

    There is something else going on here.

    Spellchecking is run on every commit of Drupal core and core is passing cspell checks. Like contrib, core does use "varchar", in fact 1,163 times as the following shows.

    (11.x)$ git grep varchar | wc -l
    1163
    

    The words 'removed' in the issues given in the issue summary were done when the dictionary was rebuilt, which is the standard practice when CSpell is updated. Thus, the word were not removed by mistake and reviewing them should not be necessary.

    However, since the core spell checking is not showing an error, why is contrib? And why is this happening when cspell trace does not show 'varchar' in any of the dictionaries that core uses? What change around 3-4 weeks ago? Anything with the Gitlab templates?

    I checked CSpell and there are no issues about 'varchar' suddenly reporting errors.

  • πŸ‡ΊπŸ‡ΈUnited States tr Cascadia

    There is something else going on here.

    Anything with the Gitlab templates?

    @quietone: Please read #3494834-17: Add common words not in core dictionaries β†’ et seq., which I added as a related issue, for some further information.

    Regardless, it is affecting contrib - hundreds of thousands of users are affected.

    For example, here's an issue in Webform: πŸ“Œ Fix broken tests Active
    And here's an issue in Project Browser: πŸ“Œ Fix CI failures Active
    This affects Honeypot too, but I'm not going to just add varchar to the Honeypot dictionary because that's stupid - everyone needs to use varchar, so this should be fixed in core for everyone.

  • πŸ‡ΊπŸ‡ΈUnited States tr Cascadia
  • πŸ‡ΊπŸ‡ΈUnited States tr Cascadia

    This core change is causing test failures in contrib, failures that didn't happen before πŸ“Œ Bump cspell to 8.16.1 Active . I call that a regression and a bug.

  • πŸ‡³πŸ‡ΏNew Zealand quietone

    Can someone provide a link to the failure in contrib?

  • πŸ‡¬πŸ‡§United Kingdom alexpott πŸ‡ͺπŸ‡ΊπŸŒ

    varchar is in at least 4 of the dictionaries included by default - see

    cspell trace varchar --no-color -c .cspell.json --only-found
    Word    F Dictionary Dictionary Location
    varchar * cpp        node_modules/@cspell/dict-cpp/dict/cpp.txt
    varchar * fullstack  node_modules/@cspell/dict-fullstack/dict/fullstack.txt
    varchar * java       node_modules/@cspell/dict-java/dict/java.trie
    varchar * sql        node_modules/@cspell/dict-sql/sql.txt.gz
    

    I wonder if the problem is different versions of cspell being used by contrib testing and core? It'd be really helpful to have a link to failing contrib job to see the versions of everything being used.

  • πŸ‡¬πŸ‡§United Kingdom jonathan1055

    The gitlab pipeline cspell is now using version 8.16.1 the same as core, when running with core 11.1
    The 'cpp' dictionary does indeed contain 'varchar' but it did not look like dictionary was added. In the core .cspell.json it only shows this list
    https://git.drupalcode.org/project/drupal/-/blob/11.x/core/.cspell.json?...

    Here's a test, where I have also added extra variant to run cspell at 11.0 (previous minor) and 10.4 (previous major)
    https://git.drupalcode.org/issue/scheduler-3356800/-/pipelines/389907

    In Gitlab Templates we looked at adding 'cpp' but it contains 40,000 words, many of which are rubbish or gibberish. It also includes some unwanted British spellings such as 'colour' which means that we can't enfore US spelings. See #26 onward on #3494834-25: Add common words not in core dictionaries β†’

  • πŸ‡¬πŸ‡§United Kingdom alexpott πŸ‡ͺπŸ‡ΊπŸŒ

    @jonathan1055 see πŸ“Œ Use all the included dictionaries with cspell Closed: won't fix - by default cspell includes a lot of dictionaries. Has contrib cspell disabled this? Core does not. Core and contrib should do the same thing.

    Here are all the dictionaries core loads:

    cspell trace varchar
    Word    F Dictionary           Dictionary Location
    varchar - [flagWords]*         From Settings `flagWords`
    varchar - [ignoreWords]*       From Settings `ignoreWords`
    varchar - [suggestWords]*      From Settings `suggestWords`
    varchar - [words]*             From Settings `words`
    varchar - aws*                 node_modules/@cspell/dict-aws/dict/aws.txt
    varchar - companies*           node_modules/@cspell/dict-companies/dict/companies.txt
    varchar - computing-acronyms*  node_modules/@cspell/dict-software-terms/dict/computing-acronyms.txt
    varchar * cpp                  node_modules/@cspell/dict-cpp/dict/cpp.txt
    varchar - cryptocurrencies*    node_modules/@cspell/dict-cryptocurrencies/dict/cryptocurrencies.txt
    varchar - dictionary*          misc/cspell/dictionary.txt
    varchar - drupal*              misc/cspell/drupal-dictionary.txt
    varchar - en_us*               node_modules/@cspell/dict-en_us/en_US.trie.gz
    varchar - en-us-common-misspe* node_modules/@cspell/dict-en-common-misspellings/dict-en-us.yaml
    varchar - filetypes*           node_modules/@cspell/dict-filetypes/filetypes.txt.gz
    varchar - fonts*               node_modules/@cspell/dict-fonts/dict/fonts.txt
    varchar * fullstack            node_modules/@cspell/dict-fullstack/dict/fullstack.txt
    varchar - html*                node_modules/@cspell/dict-html/dict/html.txt
    varchar * java                 node_modules/@cspell/dict-java/dict/java.trie
    varchar - php*                 node_modules/@cspell/dict-php/dict/php.txt
    varchar - public-licenses*     node_modules/@cspell/dict-public-licenses/public-licenses.txt.gz
    varchar - software-term-sugge* node_modules/@cspell/dict-software-terms/cspell-corrections.yaml
    varchar - softwareTerms*       node_modules/@cspell/dict-software-terms/dict/softwareTerms.txt
    varchar * sql                  node_modules/@cspell/dict-sql/sql.txt.gz
    varchar - web-services*        node_modules/@cspell/dict-software-terms/dict/webServices.txt
    
  • πŸ‡³πŸ‡ΏNew Zealand quietone

    @tr, I had read that issue a few days ago, so missed that particular comment.

    @jonathan1055, thanks for the testing. Did you run the contrib tests with yarn? Are there any other differences between core and contrib testing?

    @alexpott, Yes, cspell does bring in lots of dictionaries but the only ones used for core are in cspell.json,

        "dictionaries": [
          "dictionary",
          "drupal",
          "companies",
          "fonts",
          "html",
          "php",
          "softwareTerms"
        ],

    I agree with @jonathan1055, it is puzzling that core is not failing on 'varchar'.

  • πŸ‡¬πŸ‡§United Kingdom jonathan1055

    The contrib cspell job in Gitlab Templates executes cspell directly via core/node_modules/.bin/cspell -c .cspell.json --show-suggestions --show-context --no-progress - see this bit of the job.

    Locally, and in gitlab templates internal code checking, we use npm, and execute npx cspell --show-suggestions --show-context --no-progress --dot {**,.**} with a .cspell.json file in the top-level folder so that it finds that automatically.

    The intention was to match what core does, so initially we had the same seven dictionaries as in #7 above (the 2 from Drupal and 5 built-in). Then we still had lots of words reported and I realised there were more dictionaries available, so we added 'misc', 'typescript', 'node', 'css', 'bash', 'filetypes', 'npm', and 'lorem-ipsum' to help reduce the maintainers work.

    Locally the trace for 'varchar' shows:

    npx cspell trace varchar
    Word    F Dictionary           Dictionary Location
    varchar - [flagWords]*         From Settings `flagWords`
    varchar - [ignoreWords]*       From Settings `ignoreWords`
    varchar - [suggestWords]*      From Settings `suggestWords`
    varchar - [words]*             From Settings `words`
    varchar - aws*                 node_modules/@cspell/dict-aws/dict/aws.txt
    varchar - companies*           node_modules/@cspell/dict-companies/dict/companies.txt
    varchar - computing-acronyms*  node_modules/@cspell/dict-software-terms/dict/computing-acronyms.txt
    varchar * cpp                  node_modules/@cspell/dict-cpp/dict/cpp.txt
    varchar - cryptocurrencies*    node_modules/@cspell/dict-cryptocurrencies/dict/cryptocurrencies.txt
    varchar - dictionary*          misc/cspell/dictionary.txt
    varchar - drupal*              misc/cspell/drupal-dictionary.txt
    varchar - en_us*               node_modules/@cspell/dict-en_us/en_US.trie.gz
    varchar - en-us-common-misspe* node_modules/@cspell/dict-en-common-misspellings/dict-en-us.yaml
    varchar - filetypes*           node_modules/@cspell/dict-filetypes/filetypes.txt.gz
    varchar - fonts*               node_modules/@cspell/dict-fonts/dict/fonts.txt
    varchar * fullstack            node_modules/@cspell/dict-fullstack/dict/fullstack.txt
    varchar - html*                node_modules/@cspell/dict-html/dict/html.txt
    varchar * java                 node_modules/@cspell/dict-java/dict/java.trie
    varchar - php*                 node_modules/@cspell/dict-php/dict/php.txt
    varchar - public-licenses*     node_modules/@cspell/dict-public-licenses/public-licenses.txt.gz
    varchar - software-term-sugge* node_modules/@cspell/dict-software-terms/cspell-corrections.yaml
    varchar - softwareTerms*       node_modules/@cspell/dict-software-terms/dict/softwareTerms.txt
    varchar * sql                  node_modules/@cspell/dict-sql/sql.txt.gz
    varchar - web-services*        node_modules/@cspell/dict-software-terms/dict/webServices.txt

    The word is found in the same four dictionaries as @alexpott. The enabled dictionaries also matches @alexpott but this is not the same as the list specified in .cspell.json. Something must be overriding that config setting.

    Also none of the four dictionaries which has 'varchar' has a * at the end, indicating that this word would be flagged as unknown, which is exactly what we get in the contrib testing.

    I will keep on with the investigation, and we will ebentually discover the answers.

  • πŸ‡¬πŸ‡§United Kingdom jonathan1055

    I have noticed that more dictionaries appear to be used than are configured to be enabled.

    npx cspell -v --no-must-find-files {composer.json,co*.php}
    

    gives the following (with some bits removed for easier reading)

    Checking: composer.json, File type: auto, Language: default
    Checked: composer.json, File type: json, Language: en-US ... Issues: 0 378.22ms
    Config file Used: .cspell.json
    Dictionaries Used: companies, cryptocurrencies, filetypes, public-licenses, softwareTerms, computing-acronyms, software-term-suggestions, web-services, dictionary, drupal, fonts, html, php, aws, en_us, en-us-common-misspellings, node, npm
    1/2 composer.json 378.22ms
    
    Checking: core.api.php, File type: auto, Language: default
    Checked: core.api.php, File type: php, Language: en-US ... Issues: 0 133.24ms
    Config file Used: .cspell.json
    Dictionaries Used: companies, cryptocurrencies, filetypes, public-licenses, softwareTerms, computing-acronyms, software-term-suggestions, web-services, dictionary, drupal, fonts, html, php, aws, en_us, en-us-common-misspellings, fullstack, typescript, css, npm, html-symbol-entities
    2/2 core.api.php 133.24ms
    CSpell: Files checked: 2, Issues found: 0 in 0 files.
    

    The key thing is that the php file uses more dictionaries, and in particular it adds "fullstack" which is one of the four dictionaries containing 'varchar'. If this same process happens in the core gitlab spelling job then it goes partly to explaining why we get no report of 'varchar' being unknown. Next we need to find out why those extra dictionaries are enabled when they are not in the config file.

  • πŸ‡¬πŸ‡§United Kingdom alexpott πŸ‡ͺπŸ‡ΊπŸŒ

    Next we need to find out why those extra dictionaries are searched when they are not enabled in the config file.

    That's the default cspell behaviour. You need to specify --no-default-configuration to not load them. It's just the way it works(tm). This has been talked about before on πŸ“Œ Use all the included dictionaries with cspell Closed: won't fix

  • πŸ‡¬πŸ‡§United Kingdom jonathan1055

    That wasn't mentioned directly on the other issue, but thanks @alexpott for the hint about the default behavior. It certainly explains why 'varchar' is not reported by the core jobs, because that words is allowed in .php files even if the 'fullstack' or 'sql' dictionaries are not enabled in .cspell.json. It also explains why contrib jobs fail for 'varchar' because that word was used in a .install file. It seems that this file extension does not trigger cspell to automatically use the 'fullstack' dictionary.

    Given that the main core 'dictionary.txt' is rebuilt when cspell is upgraded, 'varchar' cannot be added manually, and will never get re-added automatically unless it appears in a core file type that does not use the 'fullstack' dictionary. Neither can it be added to the 'drupal' dictionary, as that is also rebuilt during updates.

    So one solution would be that core has a new file of words that had been in the dictionary but are no longer needed by core, but contrib are using them. This solution is unlikely to be accepted though. The other solution is that Gitlab Templates adds back these words, for all of contrib to use. That is not a perfect solution, but for the sake of the sanity of all contrib maintainers it might be the best way forward.

  • πŸ‡¬πŸ‡§United Kingdom alexpott πŸ‡ͺπŸ‡ΊπŸŒ

    We should tell cspell to treat .module and .install like PHP files. That should be part of core. In fact it is!

        "overrides": [
          {
            "filename": "**/{*.engine,*.inc,*.install,*.module,*.profile,*.theme}",
            "languageId": "php"
          }
        ]
    

    We must be missing this from contrib's default cspell config!

  • πŸ‡¬πŸ‡§United Kingdom alexpott πŸ‡ͺπŸ‡ΊπŸŒ

    Moving this to the correct component.

  • πŸ‡¬πŸ‡§United Kingdom alexpott πŸ‡ͺπŸ‡ΊπŸŒ
  • Pipeline finished with Success
    about 1 month ago
    Total: 112s
    #391683
  • πŸ‡¬πŸ‡§United Kingdom jonathan1055

    Yes, that will certainly help in this specific case. The gitlab templates default was originally intended to match core, but maybe the significance of those overrides was missed (could be my fault there!).

    But we may also add 'fullstack' as an enabled dictionary so that those words can appear in any file. I don't see the point in banning 'varchar' from appearing in a text file if it is a good enough word for .php and .install. There are lots of other words that have been dropped from the core dictionaries, but that's a separate issue.

  • πŸ‡¬πŸ‡§United Kingdom alexpott πŸ‡ͺπŸ‡ΊπŸŒ

    @jonathan1055 words are being dropped from core dictionaries when they don't need to be there. Either because they were a mistake or because other dictionaries have them in. I don't think we should be maintaining another special dictionary in the gitlab templates project. If we think fullstack should be on for .txt and .md then we should make it so for core too.

  • πŸ‡¬πŸ‡§United Kingdom jonathan1055

    words are being dropped from core dictionaries when they don't need to be there.

    Yes, I understand that. But core is, if you like, a special case. It is just one project, so can maintain it's dictionaries specifically for it own purpose, which is fine. But in Contrib we are trying to cater for many projects, and lots of them were using valid words in the core dictionary which have been dropped, and the jobs are now failing. It is good that we've discovered the reason why the specific word 'varchar' was being reported in contrib tests, and thank you for the MR, I will test it, and I know it will work correctly, and will definitely get merged.

    However, on the other issue πŸ“Œ Add new dictionary to cater for common words not in core dictionaries Active we may still want to add the other words that have been dropped. The spell checker job needs to be a help not a hindrance to maintainers, and making projects add 'curopt' and 'endapply' to their own dictionaries is just making work for many, when gitlab templates can do it for all.

  • πŸ‡¬πŸ‡§United Kingdom jonathan1055

    @alexpott Are you OK if I push another commit to the MR? I tested it, and the updated .cspell.json is being created with the overrides, but 'varchar' was still reported. I'd like to add some verbose output to the calls.

  • πŸ‡¬πŸ‡§United Kingdom jonathan1055

    RTBC from me, but @flgarlin may also want to review this.

  • First commit to issue fork.
  • Pipeline finished with Skipped
    about 1 month ago
    #394452
  • πŸ‡ͺπŸ‡ΈSpain fjgarlin

    Looks good to me as well, Thanks so much for all the back and forth, reviewing and the fix!

    Merged!

  • Automatically closed - issue fixed for 2 weeks with no activity.

Production build 0.71.5 2024