- Issue created by @jonathan1055
- š¬š§United Kingdom jonathan1055
Ready for testing and feedback. But I undertand if you do not think this is our job to maintain. However, it is gitlab_templates decision to keep up with the core version, so we are causing contrib projects to fail when we bump to the next version. This won't be the last time that words are dropped.
- š¬š§United Kingdom jonathan1055
The name "extra-dictionary.txt" is not very imaginative. If anyone has a better idea, let me know. It is not just for words dropped from core, so I did not want to include 'core' in the name.
- šŖšøSpain fjgarlin
Iām thinking that while we have a handful of them only we might want to keep them in the array that will also populate the artifacts, instead of as a separate file.
That way if maintainers see the cspell config file with those words they might choose to keep them or not.
If or when we have more words we might consider a separate file but if we want to do a quick-fix Iād do it in the most simple way. Thoughts?
- š¬š§United Kingdom jonathan1055
OK. I will revert the initial work, and just add the words via the array in .cspell.json.
But first, I wanted to record the tests using this extra dictionary, just to see how it worked.
Here's a test with new words, using existing gitlab templates 1.6.11 running core 11.1
https://git.drupalcode.org/issue/scheduler-3445052/-/pipelines/373557Using MR310 with new dictionary
https://git.drupalcode.org/issue/scheduler-3445052/-/pipelines/373584
The job passes and the cspell.json artifact has the new dictionary - šŖšøSpain fjgarlin
Thanks for running the tests. With the external file approach, we'd need to include it as well in the artifacts, and for example, in the second run, the cspell.json file has
"path": "./extra-dictionary.txt",
but the artifact is not in there.I think that if we see the number of words growing to more than 15-20, we can do the extra file/artifact in a follow-up, but for now it might just be easier to add the files to the array as we had "ddev" and "lando". The only reason that I'm pushing for this simpler approach is that the solution would be lower risk and we can merge it and create a new release before tomorrow.
- š¬š§United Kingdom jonathan1055
I have pushed the changes, and re-tested on the same branch:
https://git.drupalcode.org/issue/scheduler-3445052/-/pipelines/373633
The cspell.json artifact has the new words (and not the new dictionary)Ready for review.
- šŖšøSpain fjgarlin
Looks good to me. Thanks so much!
As per #7, I'm happy if a follow-up issue is created to move those array of words into a separate file, where we can improve this and test it properly, but without the pressure of getting it out quick that this issue has.
-
fjgarlin ā
committed ab7fe3eb on main authored by
jonathan1055 ā
Issue #3494834: Add common words not in core dictionaries
-
fjgarlin ā
committed ab7fe3eb on main authored by
jonathan1055 ā
- šŖšøSpain fjgarlin
Merged. I'll tag a new release and make it default-ref.
-
fjgarlin ā
committed 4df6d2cd on main
#3494834: Changelog.
-
fjgarlin ā
committed 4df6d2cd on main
- š¦š¹Austria drunken monkey Vienna, Austria
Would it be possible to also include
varchar
there? See, e.g., š Fix CI failures Active .
The two other words re-added there to the dictionary,CURLOPT
andRETURNTRANSFER
, seem less common, but Iād expectvarchar
is a biggy. - š¬š§United Kingdom jonathan1055
OK, maybe we need to look at the whole file. This is not an ideal thing to do, but as we have started on this path, it needs fuller investigation.
Using the following
FILE1=/path/to/local/drupal-10.4-dev/core/misc/cspell/drupal-dictionary.txt FILE2=/path/to/local/drupal-11.0-dev/core/misc/cspell/drupal-dictionary.txt echo $FILE1 && cat $FILE1 && wc -l < $FILE1; echo $FILE2 && cat $FILE2 && wc -l < $FILE2; echo '===\nin first, not second';cat $FILE1 $FILE2 $FILE2 | sort | uniq --unique
has 18 words in D10 and 20 in Drupal 11, none are dropped. So that's all OK.
However has 823 words in D10 but only 718 in Drupal 11. There are 3 new words but attached is the list of 108 dropped words. Now we need to decide what we do with this info.
- šŗšøUnited States tr Cascadia
I brought up the
varchar
issue in the core queue: #3493141-25: Bump cspell to 8.16.1 ā
This cspell error started showing up with the Drupal 11.1 rollout that uses cspell 8.16.1 instead of the 8.13.0 that was used for Drupal 11.0. More details in that issue ...Not sure that gitlab_templates is the right place to address this, but no-one is answering my questions in core when I ask where this should be addressed...
Re #13: I also noticed that
CURLOPT
started to generate an error, but I just threw in somecspell:ignore/cspell:disable
for the few usages of curl in my modules. In principle because core requires curl, the core cspell configuration should not be complaining when I use perfectly valid curl constants, but in practice I've always had to explicitly add ignores for things likeCURLOPT_FOLLOWLOCATION
.Regardless, I think these are all related. I don't know what changed between 11.0 and 11.1 that's causing this - I would not expect it to be a problem with gitlab_templates because I do not get the failures testing against 11.0 but I do get them testing against 11.1.
- š¦š¹Austria drunken monkey Vienna, Austria
I have to admit that I donāt really understand this whole problem, or the whole mechanism at work here. Why did Core remove those words, and why does this not cause CSpell errors in the Core pipelines?
However, from a contrib perspective it seems clear that this is a regression and at least some of those words should be added back. (Though, by the above, I have no idea whether to do this in Core or in this project.) Looking through your list (Thanks!) I see lots of words that are actual nonsense, for which it makes sense to remove them and let contrib deal with any instances of them in their code.
However, a few words Iād say are common enough (some are even on Wiktionary, I linked them below) to warrant re-adding and I think Iād be confused if any of them were suddenly reported as spelling errors:
- autowire
- autowired
- bodyless
- divs
- doxygen
- iframes
- kernighan
- overridable
- permissionless
- phpcs
- postprocess
- rethrown
- varchar
That being said, it might be the case that some of those are now just part of the inbuilt CSpell dictionary, or something like that, so they donāt need to be included anymore? I quickly tested in the Search API module and this indeed seems to be the case: only ābodylessā, ākernighanā, āpermissionlessā and āvarcharā are actually reported as errors.
I would suggest readding those four words in some way but, again, I donāt actually know where to best do that. - š¬š§United Kingdom jonathan1055
I have just spent several hours investigating this problem, trying to track the changes to core cspell verions against core releases and the dictionary changes, and I believe I have just made a huge breakthrough!
Summary
The 'varchar' string seems to have been added/moved between internal cspell dictionaries. In cspell 8.13 it must have been in a dictionary that we use but in cspell 8.16 it is now in a dictionary 'sql' that Gitlab Templates has not previously used. Adding this dictionary will solve the problem. In addition, 'curlopt' and 'endapply' are in a dictionary called 'cpp' which we do not use. Neither of these dictionaries are specified in the core .cspell.json so they were also not specified in the Gitlab Templates list of dictionaries (because the inital work was based on the core config). I have no idea why the core cspell job is not reporting these words. But it will be an easy fix here.
History
š Update JavaScript dependencies for Drupal 11.0-rc Fixed
cspell version ^8.8.1 updated to ^8.10.4
The dictionary.txt was entirely rebuilt which dropped 38 words including varchar (no words added)
only 3 files changed in the commit - dictionary.txt, core/package.json and core/yarn.lock. All three of these files are in the .cspell.json config "ignorePaths" so what changes have caused the words to be dropped? Either (1) the words are now included in one of the cspell built-in dictionaries, or (2) other core changes had been done previously without a dictioinary rebuild, or (3) a combination of both.
10 July Committed to 11.x and 11.0.x same dictionary changes on each. (no backport to 10.x)š Update JavaScript dependencies for Drupal 11.0.0 Fixed
Update JavaScript dependencies for Drupal 11.0.0
cspell 8.10.4 updated to 8.13.0. 5 words dropped
1 Aug committed to 11.x and 11.1.x (No backport to 10.x)7 Aug Gitlab Templates switched up to 11.0 for the 'current' variant.
š Update cspell to latest Active
cspell ^8.13.0 -> ^8.16.0
dictionary.txt 32 dropped words (eg autoloader and backported), 1 added 'lspeak' (but this word does not appear in any core file. We have lolspeak but not lspeak, so why was it added?)
25 Nov committed to 11.1.x and 11.x (same changes)
NOT backported to 11.0.x which remains at cspell 8.13.0
backported to 10.4 and 10.5 cspell upgrade was a bigger jump 8.0.0 -> 8.16.0.
81 words dropped (and lspeak added). varchar is still in the file for 10.x11 Dec gitlab templates (TR project honeypot) https://git.drupalcode.org/project/honeypot/-/pipelines/366142
current core 11.0.8 running cspell 8.13.0 and varchar is OKš Bump cspell to 8.16.1 Active
cspell 8.16.0 -> 8.16.1
11 Dec in 11.1.x and 11.x this change removed 'endapply', 'nightwatchjs' and 'testgroups'
Backport to 10.5.x removed nightwatchjs and testgroups but endapply was left in.
Backport to 10.4.x did not remove endapply, nightwatchjs and testgroups. varchar is still in.18 Dec Gitlab templates updated current from 11.0.9 to 11.1.0
gitlab templates https://git.drupalcode.org/project/honeypot/-/pipelines/373004
current core 11.1.0 running cspell 8.16.1 and varchar failsDetails
https://cspell.org/docs/dictionaries/ lists the dictionaries, but this is not all of them. cspell provides a
trace
parameter which allows you to find which dictionaries a word is contained in. This shows many more dictionaries, and specifically highlights the ones where 'varchar' and 'curlopt' are stotred. And for a bonus, we also get all the curl constants, so for example 'FOLLOWLOCATION' is now in a dictionary and TR will not have to add this to his projects custom words.The question still remains as to why core jobs do not fail. Maybe that list of dictionaries in core/.csepll.json is somehow augmented to use every intenal dictionary? Core does run cspell via yarn, whereas Gitlab Templates runs is via npx, but surely that can't make a difference?
I will create a MR with the extra dictionaries ready for testing.
- šŗšøUnited States tr Cascadia
Wow! Great work @jonathan1055 ! Thanks for looking into this.
- šŖšøSpain fjgarlin
@tr - you can easily modify your .gitlab-ci.yml file to point to the fork created for the MR of this issue. See this: https://project.pages.drupalcode.org/gitlab_templates/info/testing-mrs/
- šŖšøSpain fjgarlin
@jonathan1055 - thanks so much for the detailed investigation and insight. I'd be happy to add those dictionaries as soon as you say it's ready if it's going to help the rest of contrib with all these cspell issues.
- š¬š§United Kingdom jonathan1055
Hi TR,
Thanks for the offer of testing, that's great. You can see my example here
https://git.drupalcode.org/issue/scheduler-3356800/-/blob/3356800-spelli...
Also I've skipped everything except CSPELL to save resources.
https://git.drupalcode.org/issue/scheduler-3356800/-/blob/3356800-spelli...fjgarlin,
While we are making changes to the spelling config, I also would like to help to solve the deadlock on š Rename ColourDefinitionSniff to ColorDefinitionSniff Needs work and related issues, by adding the six sniff names/error codes with spelling errors that cannot be renamed (which would break BC). The spell checker is intended to help maintainers by highlighting real typos, not make our lives harder by adding extracspell:ignore
everywhere. I know it is not strictly Gitlab Templates job to make these additions, but if we don't we risk people simply turning off the cspell job altogether, or adding more files to the ignore path, neither of which is a good outcome. If you'd like a separate issue for that, let me know, but it would be easy to do it here too. - šŖšøSpain fjgarlin
Happy for those six names to be added too and for that to happen on this issue. I'd like to see how the core issue develops separately, but we can defo add the suggested fix to the contrib templates regardless.
- š¬š§United Kingdom jonathan1055
I have added those six phpcs sniff names. Are you bothered about the array being on two lines not six? It passes all phpcs checks like this.
Also, some of the previously added words (ddev, endapply and curlopt) are in the cpp (c++) dictionary so those can be removed. We could also remove 'lando' as that was a temporary addition for an earlier file that has now changed.
Ready for feedback.
- šŖšøSpain fjgarlin
The array syntax is weird, but if the CI check is happy, I'm ok with it.
The changes look good to me. I just added a comment about modifying a comment and maybe recovering a word back, just for clarity.
- š¬š§United Kingdom jonathan1055
I have put back 'ddev' and updated the comment as requested.
Unfortunately I've found one downside to including the 'cpp' dictionary. It allows the British spelling of 'colour' and also the incorrect version of 'uneccessary' with only 1 'n'. We can very easily add these into the
flagWords
list, just like we already do with 'e-mail', 'grey' etc copying what Core does. But we'd want to try to find other incorrect words that would now go unreported. - šŖšøSpain fjgarlin
I'd rather not add "cpp" and add a set of words. "cpp" seems to contain nearly 40k words (https://github.com/streetsidesoftware/cspell-dicts/blob/main/dictionarie...).
We could grab the dictionaries from D10 and D11 and add the diff, or at least see which is the diff and then decide what to do. Very similar, if not almost the same, to what you did in #17 ( https://www.drupal.org/files/issues/2025-01-06/words-in-D11.0-dropped-fr... ā ).
- š¬š§United Kingdom jonathan1055
I'm not an expert on C++ but that dictionary of 40k words contains a lot of total rubbish and you are right that we don't want to enable it. Yes, we can get the dropped words and check through, as many of them won't be needed. Shall we use the approach that I started in the first three commits on the MR310, by having our .txt dictionary files in /assets.
- šŖšøSpain fjgarlin
Yeah, now that we have more words it make sense to have our own dictionary with the words instead of a massive array. We'll need to expose it as artifact too.
- š¬š§United Kingdom jonathan1055
Thanks for linking to that issue, I've made some more progress, from #10 #3498323-10: Add missing words to dictionary eg varchar ā
I have pushed a few tempoary changes to this MR, to force a fault by adding 'varchar' in a .php and a .sh file. The verbose job log shows that extra dictionaries are being checked on the .php file and the word is not reported because it is in the 'fullstack' dictionary. However it is reported in the .sh file
https://git.drupalcode.org/issue/gitlab_templates-3494834/-/jobs/3958908...If the word was in our custom dictionary then that would be used for every file and the word would never get reported. I think that is the behavior we want.
- š¬š§United Kingdom jonathan1055
I have now added 'fullstack' to the dictionary list and the job passes.
https://git.drupalcode.org/issue/gitlab_templates-3494834/-/jobs/3960072In the failed job, the extra internal dictionaries are added by default, even if not specifically enabled in the .cspell.json file. The list for a .sh file is
Checked: scripts/set-default-tag.sh, File type: shellscript, Language: en-US ... Issues: 1 Dictionaries Used: aws, companies, cryptocurrencies, filetypes, public-licenses, softwareTerms, coding-compound-terms, computing-acronyms, software-term-suggestions, web-services, allowed-words, fonts, html, php, misc, typescript, node, css, bash, npm, lorem-ipsum, en_us, en-us-common-misspellings, shellscript
This is because of the default behaviour of how cspell is designed. It checks some extra dictionaries based on the file type.
The list of dictionaries specified in .cspell.json can be thought of as "always used these regardless of the file type, in addition to the defauts we automatically add". I think that behavior can be avoided by using
--no-default-configuration
but I have not tested that yet.The passing job had one extra dictionary in the list:
Dictionaries Used: aws, companies, cryptocurrencies, filetypes, public-licenses, softwareTerms, coding-compound-terms, computing-acronyms, software-term-suggestions, web-services, allowed-words, fonts, html, php, misc, typescript, node, css, bash, npm, lorem-ipsum, fullstack, en_us, en-us-common-misspellings, shellscript
because I added 'fullstack' into the array in .cspell.json.
My hunch is that following the core changes in 11.1 the cspell job would report a failure for 'varchar' if it was written in a .sh file or one of the other file types which does not load the 'fullstack' extra dictionary.
So I think our way forward is to add the sql and fullstack dictionaries into our list, so that they are checked regardless of the file type. Also we could add some specific curlopt constants into our extra allowed words, if that would be helpful.
'endapply' was validly dropped from the core dictionary between 11.0 and 11.1 because that word was used in .twig files in 11.0 but does not appear anywhere in the 11.1 source files.
'curlopt' was dropped from the dictionarry but Core 11.1 has introduced some
// cspell:ignore curlopt returntransfer
and //cspell:ignore curle curlopt customrequest failonerror postfields
I don't know why that is better than having those two words in the dictionary where everyoone could use them. We can add those back if we decide its a good idea. - š¬š§United Kingdom jonathan1055
Here is a test which checks three more files -
- 14 words in 10.3 dropped in 10.4
- 108 words in 10.4 dropped in 11.0
- 64 words in 11.0 dropped in 11.1
Total dropped = 186
https://git.drupalcode.org/issue/scheduler-3356800/-/pipelines/392151
You can see the different numbers of reported word in the 3 variants just for info, but it is the regular cspell (at core 11.1) which gives us the full picture. Of the 186 dropped words, some of these, for example varchar, are now in one or more of the built-in dictionaries, and only 95 are reported as unknown - in the attached.Of these 95, we clearly want to add 'curopt' and 'returntransfer' and we could add other curlopt constants listed here -
https://github.com/streetsidesoftware/cspell-dicts/blob/main/dictionarie...
I don't know which of these are commonly used. Maybe we shouldn't be adding anything that had not previosuly been in core dictionary?We now need to review this list of 95 to see if any others are common enough in Contrib to warrant us adding them in. I think the list will be small, but worth checking through carefully.
- š¬š§United Kingdom jonathan1055
I have sifted through the 95 words, looking at the obvious oddities and identifying where they hads been used in previous core releases. Where they were obviously only for core files, mostly in tests or Drupal 6 migration files that no longer exist in D11, I have removed them from the checking list. The list is now down to 44, attached.
https://git.drupalcode.org/issue/scheduler-3356800/-/jobs/3991461
I have not checked the original use of these remaining 44, but someone else is welcome to do that. I have a few suggestions that we may want to add back
- backlinks (because backlink is a valid word and suggested alternative)
- mdhash
- uploaders (because uploader is a valid word in software terms)
- vocabs (because vocab is a valid word in en-us)
I have pushed a change to add these four (but can undo it if you disagree) and this is now ready for review. Then we can get it merged and out for use by all contrib.
- šŖšøSpain fjgarlin
The changes look good to me, we are basically adding a few words which we know can be really common and the "SQL" and "fullstack" dictionaries, which also seem like good and generic enough to add to contrib.
Downstream pipelines.
RTBC from my end.
-
fjgarlin ā
committed cb0d1519 on main authored by
jonathan1055 ā
Issue #3494834 by jonathan1055, fjgarlin, tr, drunken monkey: Add common...
-
fjgarlin ā
committed cb0d1519 on main authored by
jonathan1055 ā
- š¬š§United Kingdom jonathan1055
Thanks for merging. It will be good to have this in the upcoming release that you are doing.
-
fjgarlin ā
committed e76a264b on main
#3494834 Changelog.
-
fjgarlin ā
committed e76a264b on main
- š¦š¹Austria drunken monkey Vienna, Austria
Thanks a lot, @jonathan1055, great job!
Automatically closed - issue fixed for 2 weeks with no activity.