- Issue created by @elc
- π³πΏNew Zealand quietone
There is something else going on here.
Spellchecking is run on every commit of Drupal core and core is passing cspell checks. Like contrib, core does use "varchar", in fact 1,163 times as the following shows.
(11.x)$ git grep varchar | wc -l 1163
The words 'removed' in the issues given in the issue summary were done when the dictionary was rebuilt, which is the standard practice when CSpell is updated. Thus, the word were not removed by mistake and reviewing them should not be necessary.
However, since the core spell checking is not showing an error, why is contrib? And why is this happening when
cspell trace
does not show 'varchar' in any of the dictionaries that core uses? What change around 3-4 weeks ago? Anything with the Gitlab templates?I checked CSpell and there are no issues about 'varchar' suddenly reporting errors.
- πΊπΈUnited States tr Cascadia
There is something else going on here.
Anything with the Gitlab templates?
@quietone: Please read #3494834-17: Add common words not in core dictionaries β et seq., which I added as a related issue, for some further information.
Regardless, it is affecting contrib - hundreds of thousands of users are affected.
For example, here's an issue in Webform: π Fix broken tests Active
And here's an issue in Project Browser: π Fix CI failures Active
This affects Honeypot too, but I'm not going to just add varchar to the Honeypot dictionary because that's stupid - everyone needs to use varchar, so this should be fixed in core for everyone. - πΊπΈUnited States tr Cascadia
This core change is causing test failures in contrib, failures that didn't happen before π Bump cspell to 8.16.1 Active . I call that a regression and a bug.
- π³πΏNew Zealand quietone
Can someone provide a link to the failure in contrib?
- π¬π§United Kingdom alexpott πͺπΊπ
varchar is in at least 4 of the dictionaries included by default - see
cspell trace varchar --no-color -c .cspell.json --only-found Word F Dictionary Dictionary Location varchar * cpp node_modules/@cspell/dict-cpp/dict/cpp.txt varchar * fullstack node_modules/@cspell/dict-fullstack/dict/fullstack.txt varchar * java node_modules/@cspell/dict-java/dict/java.trie varchar * sql node_modules/@cspell/dict-sql/sql.txt.gz
I wonder if the problem is different versions of cspell being used by contrib testing and core? It'd be really helpful to have a link to failing contrib job to see the versions of everything being used.
- π¬π§United Kingdom jonathan1055
The gitlab pipeline cspell is now using version 8.16.1 the same as core, when running with core 11.1
The 'cpp' dictionary does indeed contain 'varchar' but it did not look like dictionary was added. In the core .cspell.json it only shows this list
https://git.drupalcode.org/project/drupal/-/blob/11.x/core/.cspell.json?...Here's a test, where I have also added extra variant to run cspell at 11.0 (previous minor) and 10.4 (previous major)
https://git.drupalcode.org/issue/scheduler-3356800/-/pipelines/389907In Gitlab Templates we looked at adding 'cpp' but it contains 40,000 words, many of which are rubbish or gibberish. It also includes some unwanted British spellings such as 'colour' which means that we can't enfore US spelings. See #26 onward on #3494834-25: Add common words not in core dictionaries β
- π¬π§United Kingdom alexpott πͺπΊπ
@jonathan1055 see π Use all the included dictionaries with cspell Closed: won't fix - by default cspell includes a lot of dictionaries. Has contrib cspell disabled this? Core does not. Core and contrib should do the same thing.
Here are all the dictionaries core loads:
cspell trace varchar Word F Dictionary Dictionary Location varchar - [flagWords]* From Settings `flagWords` varchar - [ignoreWords]* From Settings `ignoreWords` varchar - [suggestWords]* From Settings `suggestWords` varchar - [words]* From Settings `words` varchar - aws* node_modules/@cspell/dict-aws/dict/aws.txt varchar - companies* node_modules/@cspell/dict-companies/dict/companies.txt varchar - computing-acronyms* node_modules/@cspell/dict-software-terms/dict/computing-acronyms.txt varchar * cpp node_modules/@cspell/dict-cpp/dict/cpp.txt varchar - cryptocurrencies* node_modules/@cspell/dict-cryptocurrencies/dict/cryptocurrencies.txt varchar - dictionary* misc/cspell/dictionary.txt varchar - drupal* misc/cspell/drupal-dictionary.txt varchar - en_us* node_modules/@cspell/dict-en_us/en_US.trie.gz varchar - en-us-common-misspe* node_modules/@cspell/dict-en-common-misspellings/dict-en-us.yaml varchar - filetypes* node_modules/@cspell/dict-filetypes/filetypes.txt.gz varchar - fonts* node_modules/@cspell/dict-fonts/dict/fonts.txt varchar * fullstack node_modules/@cspell/dict-fullstack/dict/fullstack.txt varchar - html* node_modules/@cspell/dict-html/dict/html.txt varchar * java node_modules/@cspell/dict-java/dict/java.trie varchar - php* node_modules/@cspell/dict-php/dict/php.txt varchar - public-licenses* node_modules/@cspell/dict-public-licenses/public-licenses.txt.gz varchar - software-term-sugge* node_modules/@cspell/dict-software-terms/cspell-corrections.yaml varchar - softwareTerms* node_modules/@cspell/dict-software-terms/dict/softwareTerms.txt varchar * sql node_modules/@cspell/dict-sql/sql.txt.gz varchar - web-services* node_modules/@cspell/dict-software-terms/dict/webServices.txt
- π³πΏNew Zealand quietone
@tr, I had read that issue a few days ago, so missed that particular comment.
@jonathan1055, thanks for the testing. Did you run the contrib tests with yarn? Are there any other differences between core and contrib testing?
@alexpott, Yes, cspell does bring in lots of dictionaries but the only ones used for core are in cspell.json,
"dictionaries": [ "dictionary", "drupal", "companies", "fonts", "html", "php", "softwareTerms" ],
I agree with @jonathan1055, it is puzzling that core is not failing on 'varchar'.
- π¬π§United Kingdom jonathan1055
The contrib cspell job in Gitlab Templates executes cspell directly via
core/node_modules/.bin/cspell -c .cspell.json --show-suggestions --show-context --no-progress
- see this bit of the job.Locally, and in gitlab templates internal code checking, we use npm, and execute
npx cspell --show-suggestions --show-context --no-progress --dot {**,.**}
with a.cspell.json
file in the top-level folder so that it finds that automatically.The intention was to match what core does, so initially we had the same seven dictionaries as in #7 above (the 2 from Drupal and 5 built-in). Then we still had lots of words reported and I realised there were more dictionaries available, so we added 'misc', 'typescript', 'node', 'css', 'bash', 'filetypes', 'npm', and 'lorem-ipsum' to help reduce the maintainers work.
Locally the trace for 'varchar' shows:
npx cspell trace varchar Word F Dictionary Dictionary Location varchar - [flagWords]* From Settings `flagWords` varchar - [ignoreWords]* From Settings `ignoreWords` varchar - [suggestWords]* From Settings `suggestWords` varchar - [words]* From Settings `words` varchar - aws* node_modules/@cspell/dict-aws/dict/aws.txt varchar - companies* node_modules/@cspell/dict-companies/dict/companies.txt varchar - computing-acronyms* node_modules/@cspell/dict-software-terms/dict/computing-acronyms.txt varchar * cpp node_modules/@cspell/dict-cpp/dict/cpp.txt varchar - cryptocurrencies* node_modules/@cspell/dict-cryptocurrencies/dict/cryptocurrencies.txt varchar - dictionary* misc/cspell/dictionary.txt varchar - drupal* misc/cspell/drupal-dictionary.txt varchar - en_us* node_modules/@cspell/dict-en_us/en_US.trie.gz varchar - en-us-common-misspe* node_modules/@cspell/dict-en-common-misspellings/dict-en-us.yaml varchar - filetypes* node_modules/@cspell/dict-filetypes/filetypes.txt.gz varchar - fonts* node_modules/@cspell/dict-fonts/dict/fonts.txt varchar * fullstack node_modules/@cspell/dict-fullstack/dict/fullstack.txt varchar - html* node_modules/@cspell/dict-html/dict/html.txt varchar * java node_modules/@cspell/dict-java/dict/java.trie varchar - php* node_modules/@cspell/dict-php/dict/php.txt varchar - public-licenses* node_modules/@cspell/dict-public-licenses/public-licenses.txt.gz varchar - software-term-sugge* node_modules/@cspell/dict-software-terms/cspell-corrections.yaml varchar - softwareTerms* node_modules/@cspell/dict-software-terms/dict/softwareTerms.txt varchar * sql node_modules/@cspell/dict-sql/sql.txt.gz varchar - web-services* node_modules/@cspell/dict-software-terms/dict/webServices.txt
The word is found in the same four dictionaries as @alexpott. The enabled dictionaries also matches @alexpott but this is not the same as the list specified in .cspell.json. Something must be overriding that config setting.
Also none of the four dictionaries which has 'varchar' has a * at the end, indicating that this word would be flagged as unknown, which is exactly what we get in the contrib testing.
I will keep on with the investigation, and we will ebentually discover the answers.
- π¬π§United Kingdom jonathan1055
I have noticed that more dictionaries appear to be used than are configured to be enabled.
npx cspell -v --no-must-find-files {composer.json,co*.php}
gives the following (with some bits removed for easier reading)
Checking: composer.json, File type: auto, Language: default Checked: composer.json, File type: json, Language: en-US ... Issues: 0 378.22ms Config file Used: .cspell.json Dictionaries Used: companies, cryptocurrencies, filetypes, public-licenses, softwareTerms, computing-acronyms, software-term-suggestions, web-services, dictionary, drupal, fonts, html, php, aws, en_us, en-us-common-misspellings, node, npm 1/2 composer.json 378.22ms Checking: core.api.php, File type: auto, Language: default Checked: core.api.php, File type: php, Language: en-US ... Issues: 0 133.24ms Config file Used: .cspell.json Dictionaries Used: companies, cryptocurrencies, filetypes, public-licenses, softwareTerms, computing-acronyms, software-term-suggestions, web-services, dictionary, drupal, fonts, html, php, aws, en_us, en-us-common-misspellings, fullstack, typescript, css, npm, html-symbol-entities 2/2 core.api.php 133.24ms CSpell: Files checked: 2, Issues found: 0 in 0 files.
The key thing is that the php file uses more dictionaries, and in particular it adds "fullstack" which is one of the four dictionaries containing 'varchar'. If this same process happens in the core gitlab spelling job then it goes partly to explaining why we get no report of 'varchar' being unknown. Next we need to find out why those extra dictionaries are enabled when they are not in the config file.
- π¬π§United Kingdom alexpott πͺπΊπ
Next we need to find out why those extra dictionaries are searched when they are not enabled in the config file.
That's the default cspell behaviour. You need to specify --no-default-configuration to not load them. It's just the way it works(tm). This has been talked about before on π Use all the included dictionaries with cspell Closed: won't fix
- π¬π§United Kingdom jonathan1055
That wasn't mentioned directly on the other issue, but thanks @alexpott for the hint about the default behavior. It certainly explains why 'varchar' is not reported by the core jobs, because that words is allowed in .php files even if the 'fullstack' or 'sql' dictionaries are not enabled in .cspell.json. It also explains why contrib jobs fail for 'varchar' because that word was used in a .install file. It seems that this file extension does not trigger cspell to automatically use the 'fullstack' dictionary.
Given that the main core 'dictionary.txt' is rebuilt when cspell is upgraded, 'varchar' cannot be added manually, and will never get re-added automatically unless it appears in a core file type that does not use the 'fullstack' dictionary. Neither can it be added to the 'drupal' dictionary, as that is also rebuilt during updates.
So one solution would be that core has a new file of words that had been in the dictionary but are no longer needed by core, but contrib are using them. This solution is unlikely to be accepted though. The other solution is that Gitlab Templates adds back these words, for all of contrib to use. That is not a perfect solution, but for the sake of the sanity of all contrib maintainers it might be the best way forward.
- π¬π§United Kingdom alexpott πͺπΊπ
We should tell cspell to treat .module and .install like PHP files. That should be part of core. In fact it is!
"overrides": [ { "filename": "**/{*.engine,*.inc,*.install,*.module,*.profile,*.theme}", "languageId": "php" } ]
We must be missing this from contrib's default cspell config!
- π¬π§United Kingdom alexpott πͺπΊπ
Moving this to the correct component.
- Merge request !317CSpell: treat *.engine,*.inc,*.install,*.module,*.profile,*.theme as PHP files β (Merged) created by alexpott
- π¬π§United Kingdom jonathan1055
Yes, that will certainly help in this specific case. The gitlab templates default was originally intended to match core, but maybe the significance of those overrides was missed (could be my fault there!).
But we may also add 'fullstack' as an enabled dictionary so that those words can appear in any file. I don't see the point in banning 'varchar' from appearing in a text file if it is a good enough word for .php and .install. There are lots of other words that have been dropped from the core dictionaries, but that's a separate issue.
- π¬π§United Kingdom alexpott πͺπΊπ
@jonathan1055 words are being dropped from core dictionaries when they don't need to be there. Either because they were a mistake or because other dictionaries have them in. I don't think we should be maintaining another special dictionary in the gitlab templates project. If we think fullstack should be on for .txt and .md then we should make it so for core too.
- π¬π§United Kingdom jonathan1055
words are being dropped from core dictionaries when they don't need to be there.
Yes, I understand that. But core is, if you like, a special case. It is just one project, so can maintain it's dictionaries specifically for it own purpose, which is fine. But in Contrib we are trying to cater for many projects, and lots of them were using valid words in the core dictionary which have been dropped, and the jobs are now failing. It is good that we've discovered the reason why the specific word 'varchar' was being reported in contrib tests, and thank you for the MR, I will test it, and I know it will work correctly, and will definitely get merged.
However, on the other issue π Add new dictionary to cater for common words not in core dictionaries Active we may still want to add the other words that have been dropped. The spell checker job needs to be a help not a hindrance to maintainers, and making projects add 'curopt' and 'endapply' to their own dictionaries is just making work for many, when gitlab templates can do it for all.
- π¬π§United Kingdom jonathan1055
@alexpott Are you OK if I push another commit to the MR? I tested it, and the updated .cspell.json is being created with the overrides, but 'varchar' was still reported. I'd like to add some verbose output to the calls.
- π¬π§United Kingdom jonathan1055
RTBC from me, but @flgarlin may also want to review this.
- First commit to issue fork.
-
fjgarlin β
committed 075e6096 on main authored by
alexpott β
Issue #3498323 by alexpott, jonathan1055, tr, quietone, elc: CSpell: *....
-
fjgarlin β
committed 075e6096 on main authored by
alexpott β
- πͺπΈSpain fjgarlin
Looks good to me as well, Thanks so much for all the back and forth, reviewing and the fix!
Merged!
Automatically closed - issue fixed for 2 weeks with no activity.