Not quite, procedural hooks are not deprecated yet.
Oh, you are right.
But I struggle to see the use case where one would have both an OOP and a procedural implementation for the same hook in the same module.
Quite simple, you have the procedural implementation which has not been migrated yet, then you add the OOP implementation which may be needed to cover a different problem.
E.g. you might have a mymodule_node_update() and then you add a \Drupal\mymodule\Hook\MyHook::nodeUpdate() to do something unrelated. You don't want to touch mymodule_node_update() because you were asked (and paid) to work on a very specific problem, and changing existing code is out of scope.
In most cases the order of these two won't matter, if they are for unrelated problems. But you could have funky side effects that depend on the order.
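A condensed sketch of that scenario (the names are illustrative; in a real module the class would live in src/Hook and the function in the .module file, here they are squeezed into one snippet):

```php
<?php

use Drupal\Core\Hook\Attribute\Hook;

// Legacy procedural implementation, untouched because changing it is out of
// scope for the paid work.
function mymodule_node_update($node): void {
  // ... existing behavior ...
}

// New OOP implementation of the same hook, added for an unrelated problem.
// Without a #[LegacyHook] bridge, both implementations run, and their
// relative order within the module is not guaranteed.
class MyModuleNodeHooks {

  #[Hook('node_update')]
  public function nodeUpdate($node): void {
    // ... new behavior ...
  }

}
```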
They said that this was for an edge case, so the CR may not be needed.
Whether this is an edge case is in the eye of the beholder.
It applies when a module implements the same hook twice, once as a function and once as a method, without using the #[LegacyHook] attribute. So it would already be doing something deprecated.
Note the proposal I made in https://www.drupal.org/project/l10n_client/issues/3395488#comment-16065118 (📌 Localize client authentication with d.o infrastructure, Active).
donquixote → changed the visibility of the branch 3472140-remove-user-field-on-uninstall to hidden.
I think a README for the submodule would be fine.
I changed my mind about this.
The existing README.txt already needs some work; e.g. for installation it mentions Admin / Modules, but since D8 this is "Extend".
If we want to create a README.md or .txt in the submodule, we have to decide what goes there, and possibly change the main README.txt too.
In the spirit of smallest change possible, in this issue we can just add an UNINSTALL section to the existing README.txt.
A lot of parts of the above could be covered as separate modules:
(and some might already exist)
- Public/private key storage with endpoint to get the public key.
- Entities for authorized websites with public key.
Possibly with different "scopes", where translation contribution is only one scope.
- User field for a linked account on a separate website.
The field and field type could be in different submodules to avoid the dependency trap.
Alternatively to a public/private key check, we could also have drupal.org query the contributing website each time to verify that a submitted translation is legitimate. This still requires some kind of secret in that website, e.g. a JWT key, but drupal.org would not need to store public keys. But is this really better?
I am sure similar systems exist elsewhere, would be interesting to see.
I would have an alternative proposal.
Goals
Mainly we want to prove that a translation is coming from a user who owns a specific drupal.org account.
In addition, we should (but currently don't) prove that the translation is coming from an authorized Drupal website.
(This becomes relevant if a user entrusts a secret token to a malicious Drupal website. But it also makes the following proposal easier.)
Both of these can be regarded as "signature" problems: We send some information, and prove that it is legitimate and who sent it.
Problems with the current proposal
The main problem I see with the current proposal is that we are asking users to copy around secret tokens to different websites, some of which might not be trustworthy, or might become compromised. Then we send this literal token with each translation request.
Anything that gets its hands on one of these tokens will be able to submit translations.
New proposal
Data model
In l10n_client_contributor:
We introduce a site-wide private + public keypair (ssh, gpg or whatever) that is unique per website.
- The key could be rotated regularly (that is, a new key generated).
- We have yet to figure out how it would be stored, e.g. with key module.
- We provide a public url that allows downloading the public key from that website.
TBD: We introduce a permission to allow a user to contribute translations to drupal.org.
TBD: We introduce a checkbox per user to opt in or out of submitting translations when they are saved.
We introduce a new permission that allows registering the website in drupal.org.
In l10n_server / drupal.org:
We introduce an entity type "Website that can contribute translations".
With each such entity we store:
- Url of the website. This is to identify the website, and to download a key.
- Public key.
This can be downloaded automatically (for publicly accessible websites) or set manually (for localhost websites).
- (TBD) Old public key, to ease the rotation.
- A boolean / checkbox to enable/disable.
- An entity owner field.
We introduce a special permission that allows privileged users to register such contributing websites in drupal.org.
(This operation requires a signature from that website, see below, to avoid fake entries.)
For drupal.org user accounts, we add a multiple-value field or a separate entity, where for each record, we have:
- The drupal.org user id.
(this is a given if it is a field on the user account)
- A reference to one of the above contributing website entities.
- A user id on that contributing website.
TBD: Normally each drupal.org user should register only one remote account per website, but we could allow multiple for testing purposes.
- A boolean / checkbox to enable/disable.
Translation process
On the contributing website:
- A user saves a translation in a website with l10n_client.
- If the user has the respective permission on that website, and has not opted out of the feature, the website will submit the translation to drupal.org.
- The request contains the translation, the user id (from the contributing website), the url of the contributing website as registered in the drupal.org entity, and a signature using the private key.
On drupal.org:
- The l10n_server module finds the entity that matches the contributing website url.
- The l10n_server module verifies the signature using the public key for that website.
- The l10n_server module finds the drupal.org user account based on the user id.
- The l10n_server module checks permissions of the drupal.org user account.
- The l10n_server module accepts the translation, and logs where it is coming from.
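The signing and verification steps above could be sketched with libsodium (bundled with PHP since 7.2). The payload fields and key handling here are illustrative assumptions, not a finalized wire format:

```php
<?php
// On the contributing website: a keypair, generated once and stored
// somewhere safe, e.g. via the Key module.
$keypair = sodium_crypto_sign_keypair();
$secretKey = sodium_crypto_sign_secretkey($keypair);
$publicKey = sodium_crypto_sign_publickey($keypair);

// The translation request: translation, local user id, and the url of the
// contributing website as registered on drupal.org.
$payload = json_encode([
  'site' => 'https://example.com',
  'uid' => 123,
  'source' => 'Hello',
  'translation' => 'Hallo',
]);
$signature = sodium_crypto_sign_detached($payload, $secretKey);

// On drupal.org: look up the stored public key for 'https://example.com',
// then verify that the payload really comes from that website. No shared
// secret ever leaves the contributing site.
$valid = sodium_crypto_sign_verify_detached($signature, $payload, $publicKey);
var_dump($valid); // true
```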
Adding a contributing site on drupal.org
- A privileged user logs into the contributing website.
- The user goes to a specific page with a confirm form (no input fields), saying "register this website on drupal.org for submitting translations".
- The user is redirected to a special page with a form in drupal.org. The url query contains a signature that was generated with the private key, and the url of the website.
- The user must be logged in on drupal.org, or is asked to log in now.
- The l10n_server module verifies the permission to register translation-contributing websites.
- The l10n_server module downloads the public key from the website.
- The l10n_server module verifies the signature in the url, using the public key.
- The l10n_server module looks for already existing entities with the same url.
- The l10n_server module presents a summary of what will be saved in the new entity.
- The user clicks save/submit.
- The new entity is created, with the user as the owner.
Connect your website account with your drupal.org account
- A translator logs into the contributing website.
- The translator goes to a specific page with a confirm form (no input fields), saying "link this account with my drupal.org account to submit translations".
- The translator is redirected to a url in drupal.org. The url contains a signature and the user id from the source website.
- The translator must be logged in on drupal.org, or is asked to do so now.
- The l10n_server module checks permission and the token.
- The l10n_server checks for possible duplicate entries.
- The l10n_server shows a confirm form which presents the data.
- The user clicks save.
- The new field item or entity is created, linking the website user account to the drupal.org user account.
Key rotation
A new private + public key is generated on a contributing website, e.g. from a cron job.
The website pings drupal.org.
The l10n_server module requests the new public key from the contributing website.
OR
A new private + public key is generated on a contributing website, e.g. from a cron job.
A translator submits a request.
The key does not match.
The l10n_server module requests the new public key from the contributing website.
Block/cancel an untrusted or abandoned website
(by the owner)
The owner logs in to drupal.org, and disables or deletes the entity for that website.
(if the owner is still trusted)
Somebody disables or deletes the entity for that website.
(can be the owner, or another privileged user if the owner is not available)
(... if the owner is no longer trusted)
Somebody disables or deletes the entity for that website, and removes privileges (roles) from the owner's account.
Block/cancel a specific translator
(if the drupal.org account is still trusted)
Somebody disables or removes the field item that links the website account and drupal.org account.
(if the drupal.org account is no longer trusted)
Somebody removes privileges (roles) from that drupal.org account.
Benefits
We don't ask users to copy around their secret token to possibly untrusted websites.
We don't need to store secrets per user.
We can block compromised or abandoned websites.
We have more granular reporting where a translation came from.
TBD
Non publicly accessible websites
A question came up whether to support localhost websites that are not publicly available, so drupal.org cannot download the public key.
For such websites, the public key would have to be manually set.
In that case, the url alone cannot be the only identifier; otherwise we get duplicates, because plenty of sites will be called localhost or e.g. mydrupaltest.ddev.site, but all will have different public/private keys.
The question is whether we actually need this.
If the only reason is testing, it is better to have a local docker/compose/ddev setup where one test site has l10n_server and the other has l10n_client.
New issue?
We could continue this discussion in a new issue, if it would cause too much distraction here.
(in this issue) Add an entry in the README.txt about uninstalling the module.
I notice that the submodule does not have a README.md or .txt.
Should we add one, or put everything in the main module README?
Or do we have a better place for this kind of documentation?
We could also create an uninstall validator that adds additional information, but it seems overkill.
I tested again, this time I had data in the field.
Again, "drush config:import" successfully removed the field and uninstalled the module.
What does this mean for an upgrade path, when a site has this module installed and wants to uninstall it?
I just tried it, it seems really simple for a site with config/sync:
- In your local instance, delete the field (manually), uninstall the module, and export (drush cex).
- Deploy and import config.
No custom update hook was needed.
I had a look at the core comment module, which also introduces a field type 'comment', that can be added (manually) to other entities like nodes.
The field does not store the comments but rather the comment settings for that entity.
Once a comment field has been created, the comment module can no longer be uninstalled.
The fields must be manually deleted first.
The comment module has a `comment_uninstall()` function, which deletes all fields of the type 'comment'.
But it is unclear to me why this exists.
At the time we get to that hook, all the fields will be already manually removed.
The main difference between comment and l10n_client_contributor is that comment fields are created manually, but the field from l10n_client_contributor is created automatically on install. This creates an asymmetry between install and uninstall.
We could add our own l10n_client_contributor_uninstall(). But that won't really do anything.
At the same time, I notice that since commit d941cd35 from 📌 Replace xmlrpc for new localize.drupal.org (Needs work), we are not using that field anymore.
So I would propose the following steps to "fix" this:
- (in this issue) Add an entry in the README.txt about uninstalling the module.
- (in a follow-up) Remove the field, or rethink it, based on how we interact with the l10n_server in the future.
donquixote → changed the visibility of the branch 3472140-uninstall-validator to hidden.
donquixote → changed the visibility of the branch 3485896-donquixote-order-at-call-time to active.
donquixote → changed the visibility of the branch 3485896-donquixote-order-at-call-time to hidden.
I created a branch 3485896-parts and MR https://git.drupalcode.org/project/drupal/-/merge_requests/11783
This has a cleaned-up incremental history, which should make it easier to review step by step.
Ideally, if I did things correctly, each commit should have passing tests. But I have not really tried this, and I don't want to spam the pipelines.
We do lose original authorship.
In the end this is meant to help with reviews, the other branch is still the main review target.
donquixote → changed the visibility of the branch 3485896-parts to hidden.
I am proposing this change to avoid relying on two synced arrays in ModuleHandler:
https://git.drupalcode.org/project/drupal/-/merge_requests/11676/diffs?c...
donquixote → changed the visibility of the branch 3485896-ImplementationList-object to hidden.
donquixote → changed the visibility of the branch 3485896-order-operations-by-specifity to hidden.
I updated the comment and removed the todo.
But to recap my point from our conversation:
We should add cache metadata for dependencies when we actually use them.
In some cases this means adding the same dependency twice in two different methods, so that if we remove one of them, the other is still there.
On the other hand, in general we should not add cache metadata for something that we think will be used elsewhere.
This does not cause much harm for a config object, but it breaks the pattern and can lead people astray, who then rely on that cache metadata being added elsewhere.
Here we are already in that scenario, where we now have to support those hypothetical people.
For this reason it seems wrong to add config cache metadata to CasRedirectData, if nothing in that object really depends on that config.
E.g. there is nothing you could update in the config that would lead to different outward behavior of CasRedirectData.
Of course this can change if an event subscriber adds some data to CasRedirectData that is dependent on that config. Also because the config is made available in the event object. Normally that event subscriber would then have to add the config cache metadata to the CasRedirectData.
CasRedirectData to implement RefinableCacheableDependencyInterface and use RefinableCacheableDependencyTrait
This actually seems like a good idea.
However, perhaps it is better to rethink the architecture for cas 4.x, maybe a lot of it becomes obsolete.
But if this can be done BC-friendly for 3.x, sure.
The change proposed here allows snake_case to coexist in a file with promoted properties, which is good!
So I support this change, in general.
What I still don't like is that right now we don't have a clear preference in core whether to use snake_case or lowerCamel, and that decision is left to MR authors. I have seen MRs that keep the old snake_case but also some that move to lowerCamel on a by-file basis.
I think it is ok to be more flexible for contrib and custom code that wants to use Drupal conventions, but I think for core we should have some kind of guideline.
My personal preference is currently snake_case, partly because of arguments of readability made in the past, which at the time I was skeptical about but which I think do have merit. But I think this should be secondary here, my main point is I don't like it to be arbitrary. The point of conventions is to eliminate meaningless choices.
The conflict with promoted constructor parameters can be resolved in different ways:
- For promoted properties, the respective parameters shall also be lowerCamel (which they have to be, naturally).
- For a constructor with at least one promoted property, all parameters shall be lowerCamel.
This will allow converting other parameters to promoted properties in the future, without breaking named-argument calls.
- For any constructor, the parameters shall be lowerCamel.
This will allow converting that constructor to use property promotion in the future, without breaking named-argument calls. Everything else remains snake_case.
- For any constructor and any static factory, the parameters shall be lowerCamel.
This allows copying parameter lists more easily between constructor and factory.
- Every parameter of any method in a class (that has a constructor or may get one in the future, which means any class) shall be lowerCamel.
(I don't really like this one, it is just to extend consistency half way.)
- Every parameter and every local variable in a class file shall be lowerCamel.
(This is what I see in some MRs.)
Outside of property promotion, we could also say things like:
- Files copied from elsewhere, or files already existing in Drupal core, can preserve the naming style for parameters and local variables, as long as it is one of snake_case and lowerCamel.
- For newly added class files, the preference as expressed above applies.
All of this can be a follow-up.
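To illustrate the underlying conflict (class names here are made up): with constructor property promotion, the parameter name IS the property name, so the parameter naming style dictates the property naming style.

```php
<?php

class WithPromotion {
  public function __construct(
    // Parameter and property are one and the same: $this->entityTypeId.
    // A snake_case parameter would force a snake_case property.
    protected string $entityTypeId,
  ) {}
}

class WithoutPromotion {
  protected string $entityTypeId;

  // Without promotion, the parameter can stay snake_case while the
  // property is lowerCamel.
  public function __construct(string $entity_type_id) {
    $this->entityTypeId = $entity_type_id;
  }
}

// But converting WithoutPromotion to promotion later would rename the
// parameter to $entityTypeId, breaking existing named-argument calls:
$a = new WithoutPromotion(entity_type_id: 'node');
$b = new WithPromotion(entityTypeId: 'node');
```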
I would advocate for closing this; 1-4 microseconds isn't worth chasing. There are far larger bottlenecks.
The measurements so far don't look promising, but I would not say they are conclusive (yet).
The 1-4 ms are some kind of artificial scenario.
Also, some of this might change after the changes from 📌 Hook ordering across OOP, procedural and with extra types (Active), when omitting the $modules could provide other benefits.
donquixote → changed the visibility of the branch 3485896-test-file-scan-order to hidden.
donquixote → changed the visibility of the branch 3485896-donquixote-order-at-call-time-nicxvan-changes-TBD to hidden.
donquixote → changed the visibility of the branch test-file-scan-order to hidden.
Some equation juggling which may or may not be helpful.
Note that `(int)` behaves like floor() for positive numbers and like ceil() for negative numbers, which makes it harder to reason about.
But the bug still happens if we replace (int) with floor().
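A quick demonstration of that asymmetry in plain PHP:

```php
<?php
// (int) truncates toward zero; floor() always rounds down.
// For positive values they agree; for negative values with a fractional
// part they differ by 1.

var_dump((int) 1.7);    // int(1)
var_dump(floor(1.7));   // float(1)

var_dump((int) -1.7);   // int(-1)  -- truncated toward zero
var_dump(floor(-1.7));  // float(-2) -- rounded down
```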
getRequestTime() = getCurrentTime() - (REAL.CURRENT_TIME - REAL.REQUEST_TIME)
getCurrentTime() = floor(getCurrentMicroTime())
getCurrentMicroTime() = SPECIFIED_TIME + getMicroTimePassed()
getMicroTimePassed() = REAL.CURRENT_MICRO_TIME - TIME_STARTED

Substituting, and moving SPECIFIED_TIME out of the floor() call (valid if it is a whole number):

getRequestTime() = floor(REAL.CURRENT_MICRO_TIME - TIME_STARTED)
                 - REAL.CURRENT_TIME
                 + REAL.REQUEST_TIME
                 + SPECIFIED_TIME

Splitting REAL.CURRENT_MICRO_TIME = REAL.CURRENT_TIME + REAL.CURRENT_TIME_FRACTION:

getRequestTime() = floor(REAL.CURRENT_TIME + REAL.CURRENT_TIME_FRACTION - TIME_STARTED)
                 - REAL.CURRENT_TIME
                 + REAL.REQUEST_TIME
                 + SPECIFIED_TIME

Moving the integer REAL.CURRENT_TIME out of the floor() call, where it cancels:

getRequestTime() = floor(REAL.CURRENT_TIME_FRACTION - TIME_STARTED)
                 + REAL.REQUEST_TIME
                 + SPECIFIED_TIME

Splitting TIME_STARTED = TIME_STARTED_SECONDS + TIME_STARTED_FRACTION:

getRequestTime() = floor(REAL.CURRENT_TIME_FRACTION - TIME_STARTED_FRACTION)
                 - TIME_STARTED_SECONDS
                 + REAL.REQUEST_TIME
                 + SPECIFIED_TIME
Here we split float variables into varname_int + varname_fraction, so that we can move the integer part out of the floor() call.
This would not be possible with just (int) due to the asymmetry mentioned above.
The problem would go away if TIME_STARTED_FRACTION is zero.
donquixote → created an issue.
Some people in chat were concerned about serialization of order operations, I think mostly for security.
I am not strongly attached to serialize/unserialize, but I think the security should be ok if we use a hard-coded list of very simple classes for 'allowed_classes'.
From https://www.php.net/manual/en/function.unserialize.php:
Warning
Do not pass untrusted user input to unserialize() regardless of the options value of allowed_classes. Unserialization can result in code being loaded and executed due to object instantiation and autoloading, and a malicious user may be able to exploit this. Use a safe, standard data interchange format such as JSON (via json_decode() and json_encode()) if you need to pass serialized data to the user. If you need to unserialize externally-stored serialized data, consider using hash_hmac() for data validation. Make sure data is not modified by anyone but you.
In this case the data is not "untrusted".
But, even if it was, what exactly can go wrong if the classes we allow are as simple as here?
I would like to know.
(And don't tell me "if the class has a __wakeup() method...": the specific classes we are looking at don't have one.)
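As a sketch of what the hard-coded allowlist buys us (OrderFirst is a made-up stand-in for the simple order-operation value objects discussed here):

```php
<?php

class OrderFirst {
  public function __construct(
    public readonly string $class,
    public readonly string $method,
  ) {}
}

$op = new OrderFirst('Drupal\node\Hook\NodeHooks', 'nodeUpdate');
$serialized = serialize($op);

// Round trip with the class allowlisted: we get the object back.
$restored = unserialize($serialized, ['allowed_classes' => [OrderFirst::class]]);
var_dump($restored instanceof OrderFirst); // true

// With an empty allowlist, any object payload is neutered into
// __PHP_Incomplete_Class instead of being instantiated.
$blocked = unserialize($serialized, ['allowed_classes' => []]);
var_dump(get_class($blocked)); // "__PHP_Incomplete_Class"
```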
We could actually create a separate issue where we add a part of HookOrderTest for HMIA as baseline, to be merged _before_ this one.
This will make it more visible if or how the ordering changes.
There is no issue with:
#[LegacyModuleImplementsAlter]
#[RemoveHook]
#[ReOrderHook]
Correct, technically these work as designed.
The problem is not with the #[LegacyModuleImplementsAlter] attribute but the greater purpose behind it.
Of course technically the problem is with the new order property in #[Hook].
We want modules to be able to use the new order property of the #[Hook] attribute, while at the same time keeping HMIA around to support older Drupal versions.
This will cause problems in 11.1.x, as described above.
What these modules could do is declare compatibility with 10.x and 11.2.x but not with 11.1.x.
We found one problem with #[LegacyModuleImplementsAlter], or the idea behind it.
The solution could be a polyfill in a separate package outside of Drupal core.
Collection order should not have an effect.
It matters because if a module has multiple implementations, their order can be different on different systems.
We key by class and method, but we don't sort.
Also, if we were to sort by class, the procedural implementations would end up at a weird position, based on the alphabetical position of the ProceduralCall class.
Instead, we should decide whether procedural implementations should always come before or always after OOP implementations from the same module (if no other order is specified).
#[LegacyModuleImplementsAlter] problem
We found one problem with #[LegacyModuleImplementsAlter], or the idea behind it.
The idea was that a contrib module can support Drupal 11.1.x and Drupal 11.2.x+ by:
- Creating a MYMODULE_module_implements_alter() with #[LegacyModuleImplementsAlter] so that it is only called in Drupal < 11.2.0.
- Adding OOP hook implementations with e.g. #[Hook('my_hook', order: Order::First)], where the "order" part would only be picked up in Drupal 11.2.0 and up.
Problem:
In 11.1.x, the hook attributes will be discovered, but then the `order: Order::First` part will blow up because:
- The enum for Order::First does not exist.
- The named parameter `order: ` does not exist.
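The failure mode can be reproduced in plain PHP. Here "Hook" is a simplified stand-in for the 11.1.x attribute class (no $order parameter), and the 'first' string stands in for the Order::First enum case that does not exist there either:

```php
<?php

#[Attribute(Attribute::TARGET_METHOD)]
class Hook {
  public function __construct(public string $hook) {}
}

class MyHooks {
  // Contrib code written for 11.2.x, using the new named parameter.
  #[Hook('my_hook', order: 'first')]
  public function myHook(): void {}
}

$method = new ReflectionMethod(MyHooks::class, 'myHook');
$attribute = $method->getAttributes()[0];
// Discovery itself works: the attribute is found without being instantiated.
var_dump($attribute->getName());

$caught = null;
try {
  // Instantiation blows up: the constructor has no $order parameter.
  $attribute->newInstance();
}
catch (Error $e) {
  $caught = $e->getMessage();
}
echo $caught, "\n"; // mentions the unknown named parameter $order
```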
I found that objects in a container service parameter definition are no good, it prevents dumping of the container. So I changed it to serialize them explicitly.
Now we have HookOrderTest failing in the pipeline but not in my local.
The reason seems to be that RecursiveDirectoryIterator / DirectoryIterator does not produce a predictable order.
https://stackoverflow.com/questions/29102983/order-in-filesystemiterator
And the way we look for hooks, we mix procedural and oop in the same scan operation.
This means it is not clear whether OOP hooks are first or procedural hooks are first, if from the same module.
I created a separate MR with call time ordering.
Still a bit rough, but this is what I have for now.
To explain it differently:
The sorting done in one specific group of hooks can only be used when ->alter() is called with those exact hook names.
But this is not what the MR is doing.
Instead, when you call ->alter() with hooks/types ['a', 'b', 'c'], you might get the pre-ordered list from ['b', 'c', 'd'] or from ['b', 'x'], which gives you not only the wrong order, but can also give you the wrong implementations. Even if we correct the list of implementations at call time, by adding and removing implementations, the order would still be wrong.
There is no way to fix this within the current proposal, because naturally there won't always be a pre-ordered list for the current combination.
We can check that the implementation exists before adding to the order group.
This is missing the point.
I just pushed another version of the side effect test, which is using existing implementations.
What mechanism are you proposing that does that?
The "extra types" mechanism is just fundamentally flawed.
For an implementation to ask "I want to be ordered together with X" inevitably has an effect on other implementations and other groupings that can exist at call time. This can be mitigated to some extent, but the basic flaw is always present.
The correct thing is to order at call time.
Otherwise we would have to predict every possible combination of hooks that can be passed to alter().
New branch to prove the side effect:
https://git.drupalcode.org/issue/drupal-3485896/-/compare/11.x...3485896...
This commit:
https://git.drupalcode.org/issue/drupal-3485896/-/commit/404bc461f0d3fa5...
So we add a #[ReOrderHook] that targets a non-existing implementation in a non-existing module.
This should have no effect on the order of the other implementations.
But the extraTypes messes things up.
We see the first fail, but I suspect there is more, including possibly implementations disappearing.
Should we take this out of needs review for a bit?
Done.
I was on the fence about splitting the hook collector out, but honestly the pass is getting huge. I wonder if we could have more files: one that does collection, one with parsing and ordering, and we leave the registration in HCP?
I am undecided about how exactly to split the classes.
For now I just wanted to see that we _can_ split this up, so that ModuleHandler no longer has to deal with a compiler pass directly.
A separate HookCollector feels like a more generic class to use in different places.
Currently we still have lots of stuff in that class. So indeed we might want to move some of it back into HookCollectorPass, anything that deals with the container.
But, any decision we take now about the final shape of HookCollector might become obsolete with further changes.
I created new branches with tests.
For now I created an MR only for one of them, for tests that include the 3485896-attempt-simplification and do not include my own changes.
In general I think this is the direction that these tests should take:
We call a hook or an alter hook, we get a list of called implementations as from __FUNCTION__ or __METHOD__, and we compare that to the expected order.
The test has some additional magic where we want to check that `->alter(['a', 'b', 'unknown'], ...)` produces the same call list as `->alter(['a', 'b'])`.
Currently this is more like a proof of concept, we definitely don't want to merge as-is.
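Stripped of all Drupal plumbing, the test pattern looks like this (names are made up; the real tests would invoke implementations through ModuleHandler):

```php
<?php
// Each implementation records itself via __FUNCTION__ / __METHOD__, and the
// test compares the recorded call list to the expected order.

$calls = [];

function module_a_test_hook(array &$calls): void {
  $calls[] = __FUNCTION__;
}

class ModuleBHooks {
  public function testHook(array &$calls): void {
    $calls[] = __METHOD__;
  }
}

// Simulate the module handler invoking the implementations in order.
module_a_test_hook($calls);
(new ModuleBHooks())->testHook($calls);

var_dump($calls); // ['module_a_test_hook', 'ModuleBHooks::testHook']
```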
The tests confirm my suspicion:
An order operation with "extra" types has side effects beyond that one implementation being reordered.
We could probably craft something even more clear to confirm this.
The tests would become a lot more interesting if we fix the grouping by module problem in ModuleHandler.
I know this is currently planned as follow-up, but we might actually want to do this first.
Some thoughts about relative ordering
In hook_module_implements_alter(), relative ordering always means to order one module relative to another module.
With the new code, we can order specific implementations relative to other specific implementations, OR to all implementations of a module.
As mentioned earlier, ordering relative to a specific implementation easily breaks when that implementation gets renamed.
Ordering relative to all implementations of a module might have unintended effects if one of these has a "first" or "last" order applied.
In my older proposal, there was the possibility to order relative to a module's default ordering position.
So, even if the implementations of that module get moved around with first, last, before or after ordering, we still only order relative to the original position.
So, let's say:
- We have modules A, B, C, D, in that order.
- We have hook implementations A1, B1, B2, B3, C1, D1.
- We have ordering info "B1 first" and "D1 before B*".
With the current proposal we get D1, B1, A1, B2, B3, C1.
If "before B*" is changed to "before the default position of B", we get B1, A1, D1, B2, B3, C1.
We can even make this work if none of the B* implementations remain in the default position.
Another way to approach this is to define an order of ordering operations.
E.g. we could apply all relative ordering (before/after) before all absolute ordering (first/last). But this only gets us so far.
The third MR extracts a new class HookCollector from HookCollectorPass which now does most of the logic and is mostly independent from the container.
https://git.drupalcode.org/project/drupal/-/merge_requests/11415/diffs
If you want a cleaner diff, look at v2 instead:
https://git.drupalcode.org/project/drupal/-/merge_requests/11344/diffs
I would still like to do more changes:
- I think we should use $snake_case for all new local variables and non-promoted parameters. But for now I want to reduce disruption. We can do this once we agree.
- I would like to rename more variables, but again want to reduce disruption.
- More tests.
- Order and alter hook implementations at call time, especially for multi hook alter calls.
Let's keep this as follow-up, with actual measurements.
As said, I suspect the main cost is loading the file. After that, I suspect we can call "new ReflectionClass" and "$reflector->getAttributes()" repeatedly without much extra cost.
Atm I think we should drop everything that smells like additional complexity.
In this case, dropping that symfony reflection cache allows doing most of the calculations without a container.
I suspect that the main cost of reflection is loading and parsing the class file. And of course scanning the filesystem.
After that, I suspect we can call "new ReflectionClass" and "$reflector->getAttributes()" repeatedly without much extra cost.
So we should be really careful about the cost/benefit.
I tried to see what kind of caching symfony does.
I see a runtime cache (in a variable), I don't see anything persistent.
Besides, that symfony code looks quite scary to me, like it does a lot of stuff we don't need.
Maybe the benefit depends on how many times we want attributes from the same class for different purposes.
Regarding a parser instead of php:
In the past I created https://github.com/donquixote/quick-attributes-parser
It is definitely faster than nikic php-parser, at least it was at the time.
But I still would not recommend this for performance gains. I only did this because I wanted to support attributes in older php versions.
I refactored this based on a discussion with @GhostOfDrupalPast to use the container reflections that Symfony caches, this should be much faster.
What do we actually gain from this?
Worst case is that a ReflectionClass is created again, after it was already created in symfony.
I suspect the main cost of "new ReflectionClass" is just to load and parse the class file. Once the class is known to php, it will be very fast.
See also https://gist.github.com/mindplay-dk/3359812
symfony ContainerBuilder::getReflectionClass() actually does some crazy magic that we don't need, so it might actually be slower than just creating the reflection class again.
It definitely increases complexity and mysteriousness.
I would like to remove it to reduce the places that get passed a container.
Hi
Just a quick clarification.
If I read it right you can remove Hook priority entirely right? I'd have to take some more time to look at it, but it looks interesting.
Yes, we can remove the HookPriority class, I just did not get to it yet.
I don't think you made any substantial changes beyond what you changed in HCP
The main change here was to change the algorithm within HookCollectorPass.
I wanted less interaction with the container, and only deal with priorities once everything is finished.
Once this is complete, we can run the main part of the sorting logic in a clean unit test without too much mocking.
I did try to keep the behavior of the existing MR.
This was successful as far as existing tests, _but_ as mentioned, it could be that additional tests will reveal functional changes.
In this version, we cannot remove the $order->extraHooks / ->extraTypes.
In a future version (which I think we can still do in this issue), we can try sorting at call time, which would allow us to get rid of $order->extraTypes.
(I say in this issue, because I much prefer not to add something than to add it and then remove it.)
Once HMIA is gone this edge case is gone.
We can cross that bridge when we come to it, if we ever do.
Realistically we won't need to. This only takes effect if they are form alters.
To me these are not good answers that justify "resolving" a review comment.
We are designing a new system, and we want to support BC.
All of this has to work reliably and predictably, otherwise it should not be added.
Form alters are not an edge case. HMIA is not an edge case, it has to be 100% supported until we remove it in 12.x.
Also, the time spent on an issue does not determine the mergeability.
The next steps I see for me:
- Add tests to cover more combinations.
- Further cleanup and refactoring of the code in this issue.
https://git.drupalcode.org/project/drupal/-/merge_requests/11344/diffs
The current version of this (commit af51ab5bb1ef81e35658ffa2dd1514d7757669c2) shows how we can calculate everything first and _then_ write to the container, in HookCollectorPass.
(It also contains some unrelated changes.)
It passes existing tests.
However, it is very possible that we just don't have enough coverage.
In that case, we should improve the tests.
To improve test coverage for different combinations, we can create a hook with a bunch of implementations with different ordering info, where each does "return __METHOD__;", or sets `$arg[] = __METHOD__` for an alter hook.
Then, instead of asserting something in $GLOBALS, we can assert the return value which gives us a list of all the methods that were called, and their order.
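A self-contained sketch of that technique (class and method names are made up; a plain loop stands in for invokeAll()). Each implementation returns __METHOD__, so the collected results double as a record of which methods ran, and in which order:

```php
<?php

// Made-up hook implementation classes.
class AlphaHooks {

  public function onTestHook(): string {
    return __METHOD__;
  }

}

class BetaHooks {

  public function onTestHook(): string {
    return __METHOD__;
  }

}

// Stand-in for invokeAll(): call each implementation, collect return values.
$implementations = [
  [new AlphaHooks(), 'onTestHook'],
  [new BetaHooks(), 'onTestHook'],
];
$results = array_map(fn (callable $impl): string => $impl(), $implementations);

// The result is the list of called methods, in call order.
assert($results === ['AlphaHooks::onTestHook', 'BetaHooks::onTestHook']);
```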
There are still more possible simplifications we can do.
donquixote → changed the visibility of the branch 3485896-donquixote-suggested-changes-part2 to active.
$snake_case vs $camelCase
The 11.x version of HookCollectorPass and other classes has a mix of snake and camel case in local variables and parameters, with snake case being the majority.
The MR introduces new variables that are mostly lower camel case, although in HookPriority I see new snake case variables.
I have a personal preference which has shifted over time, currently leaning towards snake case.
But this should not be relevant here.
The coding standards regarding this have been relaxed. I even saw discussions to soften the rule of not mixing within a file. But this does not mean it should be arbitrary in Drupal core.
But I also agree with @nicxvan to avoid renaming until there is broader consensus. Don't rename things just for me.
I can also all but guarantee that anybody trying to use this will probably not need it and it will be wrong.
We can't build this feature for what might be or how people might use it, it's far, far too much of an edge case. If someone opens a bug against it I will happily chase it down and try to find a solution, but we can't make guesses in this area, if it were not for HMIA supporting something like it, we would not have built it in the first place. As it stands we do need it unfortunately.
The bar for this really should be: does it work in core, and can we address contrib if they need it? The truth is, if there is a contrib module doing some ordering on extra types that this doesn't work for, they can keep their procedural implementation and we can work on solving their use case before Drupal 12.
We are designing a public API here, and anything we create will be very hard to remove later.
We can fix things before a release, but in general we should only commit what we are ready to release.
From what I remember, ckeditor5 would work just fine with "Order::Last" instead of "Order::After".
The main reason why we would try to support this case in core instead of replacing it with "Order::After" is that there are likely more such cases in contrib, and we keep this one as a canary.
(Or maybe we are concerned about BC impact of moving this hook priority? I don't think so.)
What we don't want is to add a bunch of complex code and complex extension points, and then, if we are worried it might not work, say that nobody will use it anyway. If we only care about one specific case in core, we had better do some kind of one-off hack.
This makes me wonder: Order::First and Order::Last do not allow specifying additional hook names.
How will first and last work if alter is called with multiple hooks?
To be fair my core contribution experience is more limited than yours
To be clear I don't have any authority and have been a bit out of the loop in the last years.
I mostly express what makes sense to me.
I welcome any opinions from others.
Another higher-level recap.
Basic concepts
(these are just observations to verify my understanding.)
The new attributes and attribute properties allow specifying ordering information for hook implementations, or entirely removing specific implementations.
An implementation can be first, last, or before/after specific other implementations.
For relative ordering and for removal, other implementations are addressed by hook name, class name and method name.
(Consequence: When the class or method for a hook implementation get renamed, the ordering attributes that target the old name no longer work.)
Alternatively, a relative ordering can address module names, which will target all implementations by that module.
Ordering rules from attributes are applied _after_ the implementations have been reordered with hook_module_implements_alter().
Ordering rules from attributes are applied one by one, in the order in which they were discovered, but with the order on regular Hook attributes being applied first, then the order from ReOrderHook attributes.
An ordering rule applied later will overwrite / win over a conflicting order by a previous rule.
A relative ordering rule can have "extra hooks" specified, which means that this ordering rule should be applied for ->alter() calls with multiple hooks. (more on that below)
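For reference, roughly how this reads in code. The attribute classes here are simplified, self-contained stubs; the real names and signatures in the MR may differ:

```php
<?php

// Simplified stubs for the ordering attributes discussed above.
// In the MR these live in Drupal core and may be named differently.
#[\Attribute(\Attribute::TARGET_METHOD | \Attribute::IS_REPEATABLE)]
class Hook {

  public function __construct(
    public string $hook,
    public mixed $order = NULL,
  ) {}

}

class OrderBefore {

  public function __construct(public array $modules = []) {}

}

class MyModuleHooks {

  // Run before all implementations from the listed modules.
  #[Hook('form_alter', order: new OrderBefore(modules: ['other_module']))]
  public function formAlter(array &$form): void {}

}

// The collector pass would read this via reflection, e.g.:
$method = new \ReflectionMethod(MyModuleHooks::class, 'formAlter');
$hook = $method->getAttributes(Hook::class)[0]->newInstance();
assert($hook->hook === 'form_alter');
assert($hook->order->modules === ['other_module']);
```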
Conflict resolution
The resolution of conflicts is quite simple:
Any "later" ordering rule will overwrite any "earlier" ordering rule.
If we want to control this further in the future, we should provide ways to prioritize the ordering rules.
(This could bring us to a similar situation as explicit priorities, but we can discuss that in the future)
Overall this basic concept is fine!
It is different from what I proposed in my older MR, but it is a very valid way to order hook implementations.
Extra hooks logic
Tbh I find it hard to follow the "extra hooks" logic.
I mean I think I understand the intention, but then I have trouble following what exactly is happening in the code.
I also suspect that developers who want to apply order attributes may not fully understand the consequences of adding or omitting hooks in the order's extra types.
At the same time I had the suspicion that it might not behave as we expect in many scenarios.
But it is hard to point out possible flaws when I might just not correctly understand it.
This is why my earlier comments were primarily focused on docs, naming and possible refactoring.
Naming of methods and variables
(these are generic comments, for specific comments see the review)
My comments about naming may appear nitpicky and pedantic.
In general there is no "perfect" naming, but there are some mistakes we can avoid.
Overall, I like it when the same thing has the same name throughout a subsystem, and even more so within the same method, and when different things have different names.
E.g. when I see something like `foreach ($priorities as $weight)` or `foreach ($trees as $forest)`, I get confused.
Of course it is much easier to point out flaws than to propose better names.
Sometimes, the difficulty of finding a good name can point to a design issue, where the variable or method should not really exist in that way.
Test coverage
Actually I have not looked at all into the tests.
But I suspect we need more, given that some possible problems I found in the code did not cause test failures.
If we could isolate the calculation some more, it might become possible to cover a lot of stuff with unit tests, which are much faster than kernel tests.
Follow-up issues?
To me, follow-up issues should be for additional functionality, or to fix things that we left unchanged.
I don't like to plan follow-ups to change things from a current MR.
We still have some time until the next release, so we _could_ change or rewrite parts of this ordering in follow-ups, but I would prefer if we can avoid it.
Alternative: Order at call time (cached)
I can imagine one alternative that (I think) would allow us to drop the "extra hooks" or "extra types" in the order object added to an attribute.
We could simply apply the ordering rules from attributes at runtime:
At container build time:
- Calculate the order for regular single hook.
- Store ordering rules per hook.
At runtime for a multi-hook alter call:
- Load and merge the implementations for each hook.
- Group by module, apply module order, call HMIA with the primary hook (as was done in older versions of Drupal).
- Undo the grouping by module, so we have a list of individual implementations, by identifier.
- Apply the ordering rules for each of the hooks from the ->alter() call.
- Cache the result.
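A toy model of those steps, with made-up identifiers, just to illustrate the merge-then-apply-rules idea:

```php
<?php

// Toy model: listener lists per hook, as they would come out of the
// container-build-time ordering (identifiers are made up).
$listeners_by_hook = [
  'form_alter' => ['a::formAlter', 'b::formAlter'],
  'node_form_alter' => ['c::nodeFormAlter'],
];

// Step 1: merge the implementations for each hook of the alter call.
$merged = array_merge(...array_values($listeners_by_hook));

// Step 2: apply one stored ordering rule ("move subject before target").
$apply_before_rule = function (array $list, string $subject, string $target): array {
  $list = array_values(array_diff($list, [$subject]));
  $pos = (int) array_search($target, $list, TRUE);
  array_splice($list, $pos, 0, [$subject]);
  return $list;
};

$ordered = $apply_before_rule($merged, 'c::nodeFormAlter', 'b::formAlter');
assert($ordered === ['a::formAlter', 'c::nodeFormAlter', 'b::formAlter']);
// Step 3: in the real system, $ordered would now be cached per hook combination.
```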
A tricky edge case here would be if the same class + method implements multiple hooks that are invoked together.
So, if something calls `->alter(['A', 'B'], ..)`, and we have `#[Hook('A_alter'), Hook('B_alter')]` on one method, possibly with different ordering information, then the targeting with class + method would become ambiguous.
To avoid this ambiguity there could be an additional parameter in the Order object, to determine whether we order relative to A_alter or B_alter.
But, I think it would be much better to just disallow that, or to order relative to both.
This alternative will lead to a situation where we order "combined" hooks at call time, but regular hooks at container build time.
We _could_ instead just do all the ordering at call time, which would probably reduce code duplication.
Note that in the current MR we already do ordering at call time for a multi-hook, if we do not find a pre-ordered combined hook.
donquixote → changed the visibility of the branch 3485896-donquixote-suggested-changes-part2 to hidden.
donquixote → changed the visibility of the branch 3485896-donquixote-suggested-changes-part1 to hidden.
donquixote → changed the visibility of the branch 3485896-donquixote-suggested-changes to hidden.
I encountered this myself while working on another core issue.
To me, clearing out all empty arrays after their children have been cleaned up would be the preferable behavior.
Unlike the current behavior, this would be idempotent, which is generally nice to have.
However, this would be a change of existing behavior that we could consider a BC break.
An alternative would be to introduce a separate method which has the desired behavior.
Like NestedArray::filterRecursive() or NestedArray::filterCompletely().
The solution proposed in the issue summary is correct either way: We filter the parent array _after_ filtering the children.
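A sketch of what such a method could do, assuming a hypothetical name like NestedArray::filterRecursive() (shown here as a plain function): children are filtered before the parent, so arrays that only contained empty arrays are removed in the same pass, which is what makes it idempotent.

```php
<?php

// Hypothetical idempotent variant of the recursive filter.
function filter_recursive(array $array, ?callable $callable = NULL): array {
  foreach ($array as $key => $value) {
    if (is_array($value)) {
      $array[$key] = filter_recursive($value, $callable);
    }
  }
  // Filter the parent _after_ filtering the children, so arrays that
  // only contained empty arrays are dropped too.
  return $callable !== NULL
    ? array_filter($array, $callable)
    : array_filter($array);
}

$data = ['a' => ['b' => []], 'c' => 1];
assert(filter_recursive($data) === ['c' => 1]);
// A second pass changes nothing:
assert(filter_recursive(filter_recursive($data)) === ['c' => 1]);
```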
Thank you so much for your review and this comment, you've obviously put a lot of thought into this topic over the years and this issue specifically.
I regret the inconsistency in my level of participation and commitment.
At a high level, I don't see anything here that should block this issue, once this is in there is some opportunity for clean up / expansion, but this is already far broader than a typical issue due to the nature of HMIA. I truly think for this issue the thought should be
In the MR for this issue we are introducing the different Order classes, including ComplexOrder with the ->extraTypes, which would be obsolete if we use a different model / architecture for the ordering.
(I need to do a more detailed analysis of these)
We should avoid adding features and extension points that we later remove.
We don't need to do everything in one issue, but we should know in which direction we are moving.
The initial OOP hook issue did use explicit priorities, and that blew up, which is why it was pulled out. This issue was specifically borne out of the need for a BC way to allow relative ordering.
Can you remind me what is the "initial OOP hook issue" and how it failed? Maybe I missed it.
(I assume we are not talking about my older issue here, which mostly suffered from being overly ambitious. I don't recall that anything "blew up".)
This said, I can imagine how this could have failed, depending how it is done.
Yes, hooks are different so the divergence is expected
My point is the more we diverge and the less we reuse from symfony events, the more we have to ask what we gain from using the symfony system, or if it would have been better to do something custom.
Maybe it is too late now.
multiple implementations are new, going to be rare, and ordering them as a separate entity will be even rarer.
The MR provides a way to order specific implementations relative to each other.
If we assume all implementations of one module to have the same ordering position, then all we would need is to order relative to a module, not relative to an implementation.
To me this MR is incomplete if we don't properly support multiple implementations per module. It could even reveal shortcomings in the approach.
Some higher level thoughts.
Divergence from symfony event system
We are using the symfony event system, but then we are reinventing the way that these events are declared, how we specify ordering, and how we invoke them.
Explicit vs calculated numeric priorities
Atm, if a module uses explicit event subscriber priorities, I suspect it does not really work so well in combination with the dynamic/calculated priorities produced by the hook ordering system proposed here.
If we want to stick closer to the event system, then the default way to specify ordering would just be to specify explicit priorities. There are reasons why we might want to avoid this, because modules will come up with random numbers like -999, -5 or -1000000. But this also applies to events that we already have in Drupal, and where we are ok with just priorities.
We could also consider ordering event subscribers without setting a calculated priority.
If we have 10 subscribers with prio = 0, we can just order that array and set it, while the priority remains 0 for each.
We could either do this later in the process, when the tagged services have already been evaluated and applied as a service parameter, or we rely on the order of tagged service definitions in the container. Not sure if this works.
Relative ordering
One thing the symfony event system does not cover is relative ordering.
I remember older discussions where it was said the relative ordering is not really needed.
We only had one such case in core (ckeditor5) and here it was not even really required. But perhaps contrib has such cases.
In the past, with no more than one implementation per module (for regular hooks), relative ordering just meant ordering of module names relative to each other.
Even with advanced multi-hook alter hooks, where you could already have multiple implementations per module, we would group them by module for ordering, to have something in the format we can pass to hook_module_implements_alter().
Now we want to order implementations relative to each other, which is more than what we would need if we only want to preserve the old/existing ordering functionality.
In my older MR, we would order implementations relative to module names, which completely eliminates the possibility of circular conflict.
Module name per implementation
Also we need to know the module name for each implementation, which symfony does not really help us with.
We have to store the impl -> module map in a separate place, outside the actual event subscribers list.
Order lost when grouping by module
Currently in ModuleHandler::getHookListeners(), we are grouping implementations by module:
$this->invokeMap[$hook][$module][] = $callable;
This means that previous ordering by implementation is destroyed, if we have multiple implementations for one module.
A solution here could be to let ::getHookListeners() return an iterator, which would allow repeating the same key.
yield $module_A => $module_A_early_implementations;
yield $module_B => $module_B_implementations;
yield $module_A => $module_A_late_implementations;
Of course the `$this->invokeMap` would need to have a different structure in that case. Perhaps it could contain iterator closures, or IteratorAggregate objects.
Ordering conflicts
With the system proposed here, there is always the potential for ordering conflicts.
E.g. A wants to run before B, but B also wants to run before A.
This type of conflict already existed (in theory) with hook_module_implements_alter(), and it almost never was a problem in practice, so I don't see it as a blocker.
And then we can have different implementations that want to run first, in that case the original module order decides.
Again, we already know this edge case problem from hook_module_implements_alter(). But explicit priorities or weights would avoid this fully.
Alternative
I want to recap the ordering logic of my older PR, from what I remember.
This was done without the event system, but in fact it would be very possible to integrate it with the event system, similar to what we do now, but without calculated priorities.
Basically, we have different levels of ordering, similar to semantic versions:
- Explicit numeric priority or weight. This would beat everything else.
- Altered order of modules. This would start with the default module order by module weight, and can be altered with hook_module_implements_alter(). This could either be applied for each weight/prio group, or only for priority = 0.
- Relative ordering relative to a module name. Instead of asking to run before or after a specific implementation, ask to run before or after a specific module's regular implementations, if the priority/weight is the same.
So it would look something like this:
// $impl_by_module order has been altered by hook_module_implements_alter().
foreach ($impl_by_prio as $prio => $impl_by_module) {
  foreach ($impl_by_module as $module => $module_implementations) {
    $ordered = [
      ...($before[$prio][$module] ?? []),
      ...$module_implementations,
      ...($after[$prio][$module] ?? []),
    ];
    foreach ($ordered as $impl) {
      yield $impl->module => $impl->callback;
    }
  }
}
The `$module_implementations` would include procedural _and_ OOP implementations, and a derivative of it is passed to hook_module_implements_alter().
This made it possible to convert a procedural implementation with no impact on the ordering.
(The MR looked more complex due to micro optimization.)
This would also completely solve "combined" alter hooks, because we can just combine implementations with same prio and module.
I think the "artifacts" message is a distraction, main thing is to fix the test.
Possible follow-up: Completely remove PHP 8.1 support.
In the pipeline this is already the case (I think), but local tests could still run with 8.1, and websites could run on 8.1.
This is why I don't want to do it here, to keep the impact minimal.
See this for different versions of the message by php version:
https://3v4l.org/N7Mc4
For the real test fail:
DeprecationHelper::backwardsCompatibleCall(
  \Drupal::VERSION,
  '10.4.2',
  fn () => '"foobarbaz" is not a valid backing value for enum Drupal\cas\Model\CasProtocolVersion',
  fn () => '"foobarbaz" is not a valid backing value for enum "Drupal\cas\Model\CasProtocolVersion"',
),
Seems like we should distinguish by PHP version, not Drupal version.
donquixote → created an issue.
donquixote → created an issue.
Also wondering why login and logout are on the same service - they have quite different dependencies, e.g. logging out doesn't need the entity storage.
I think I would prefer to see two distinct services.
We can add tests if we are happy with the direction.
The LogoutController->getServerLogoutUrl() should perhaps move to a service.
But this would add BC risks to this MR.
donquixote → created an issue.
donquixote → created an issue.
Let's see if anything is wrong with this direction, then I can add tests.
Related but not the same: #3196381: Forced login missing cache metadata →.
donquixote → created an issue.
A lot of standard entities, such as nodes and taxonomy terms, return string IDs.
These are stringified integers like '123' in php, a result of how we fetch the values from the database, even though in the database they are stored as proper integers. All of these would benefit from proper casting, at the cost of some possible BC hiccups for code that relies on the stringified values.
I was talking about whether we have content entity types that allow non-numeric string ids like 'xyz'.
I guess contrib could create such types, in theory?
Btw for `->getOwnerId()`, it will return a real integer 123 when the value was explicitly set earlier, but the string '123' when it was loaded from the database.
For `->id()` I don't think we can explicitly set it, so it would always have the db value.
One idea to bring this to entity interfaces:
Add `@template TId of int|string` on EntityInterface, then use `@template-extends EntityInterface<int>` or `@template-extends EntityInterface<string>` on specific entity interfaces.
Not sure if there are content entity types that have string ids, or config entity types with int ids.
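Roughly how the annotations could look, with simplified, made-up interface names (these are phpstan/psalm generics in docblocks, not runtime types):

```php
<?php

/**
 * @template TId of int|string
 */
interface GenericEntityInterface {

  /**
   * @return TId|null
   */
  public function id();

}

/**
 * Content entities with auto-increment integer IDs.
 *
 * @template-extends GenericEntityInterface<int>
 */
interface IntIdEntityInterface extends GenericEntityInterface {}

/**
 * Config entities with machine-name string IDs.
 *
 * @template-extends GenericEntityInterface<string>
 */
interface StringIdEntityInterface extends GenericEntityInterface {}
```

With this, static analysis can tell that `$node->id()` is an int while `$view->id()` is a string, without changing any runtime behavior.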
I would suggest to use some kind of feature detection instead of version detection.
This way we also support sites that apply a patch.
Not sure yet how that would work..
donquixote → created an issue.
Everything old is new again
Right, but now is as good a time as any to address it..
we might even backport to 10.x and/or 11.1.x pre OOP hooks.
Profiling PRs
I pushed two branches for profiling:
2865609-invokeAll-profile - adds profiling and does not include the change
2865609-inline-invokeAll-profile - adds profiling and includes the change
The profiling is for an invokeAll('theme') that actually never happens like this.
I only picked hook_theme() because it is the first that came to mind that has plenty of implementations and can be executed without doing lots of other stuff first.
Arguably hook_theme() is a bad choice, because a lot of the time will be spent on the array merging, which we are not optimizing in this version of the MR. Still, if we can measure a difference here, then the experiment is not completely useless.
The way to run the profiling is to execute the kernel test.
The failure message reports different profiling metrics.
To compare, repeatedly switch branches and run the profiling kernel test.
Profiling results
For me, I get:
- Without the change, I get 'repeat.best' in the range of 54 to 56 microseconds.
- With the change, I get 'repeat.best' in the range of 52 to 55 microseconds.
There are outliers, but it is relatively consistent.
The difference between testHookTheme() and testHookThemeWithBogusArguments() is quite small, for me it was around 1 microsecond.
Conclusion
So, the improvement is very small, but it is reproducible.
I also see no reason to assume that it would cause slowness in other areas.
For complexity, I think the changed version is actually better, because you don't need to read two methods to see what is going on.
Some feedback.
I don't like attributes that are not related to the symbol they are attached to.
#[ReOrderHook] and #[RemoveHook] can be placed anywhere with the same effect, so perhaps there should be a different way to do this?
First version does not do anything about NestedArray::mergeDeep().
Imo we could postpone this to a separate issue, just to keep things incremental.
Replace isset($result) with NULL !== $result.
Reviewing this, it seems that the benefits are negligible and can throw warnings where isset cannot.
The performance benefit would be negligible, true.
isset() treats undefined like null, it does not produce a warning if a variable or array offset does not exist.
Whenever we know for sure that a variable exists, we should use === NULL or !== NULL over isset(), it is more explicit and we learn quickly if a variable is misspelled, both from static analysis (IDE, phpstan) and from the php warning.
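A small demonstration of the difference:

```php
<?php

// isset() treats an undefined variable the same as NULL, silently:
$result = NULL;
assert(isset($result) === FALSE);

// A strict comparison is explicit about NULL. A misspelled variable like
// $reslut would trigger an "Undefined variable" warning (and be flagged by
// the IDE and phpstan), instead of being silently treated as "not set":
assert(($result === NULL) === TRUE);
```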
Actually, if we inline most of the stuff in invokeAll() we probably get some improvement.
We can do this in the other issue 📌 Optimize ModuleHandler::invokeAll() Active.
Perhaps the remaining benefit from separate methods per return value behavior would have diminishing returns after that other optimization.
And yes this part still applies with the new event hook system.
The most obvious first step here would be dedicated logic for hooks with no return value and regular passing of parameters, so the first group in my previous comment.
I am not sure how much improvement this would bring, seems like a micro optimization.
If we only measure the raw overhead from the module handler, it will make a difference. If we measure total request time it will be overshadowed by other things.
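A toy sketch of such a dedicated path, with made-up names: for the no-return-value group, we just call each listener, with no result collection or merge logic at all.

```php
<?php

// Hypothetical fast path for hooks with no return value.
function invoke_all_void(array $listeners, array $args = []): void {
  foreach ($listeners as $listener) {
    $listener(...$args);
  }
}

// Usage sketch: two listeners, called in order, results discarded.
$log = [];
invoke_all_void([
  function (string $tag) use (&$log): void { $log[] = "first:$tag"; },
  function (string $tag) use (&$log): void { $log[] = "second:$tag"; },
], ['flush']);
assert($log === ['first:flush', 'second:flush']);
```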
Looking at ->invokeAll() calls, here are some examples for the most common types.
No return value:
hook_cache_flush()
hook_rebuild()
hook_(entity|ENTITY_TYPE)_(presave|insert|update|predelete|delete|revision_create)()
hook_entity_bundle_create()
hook_entity_bundle_delete()
Merge:
hook_updater_info()
hook_filetransfer_info()
hook_requirements()
hook_update_requirements()
hook_entity_preload()
hook_(entity|ENTITY_TYPE)_(access|create_access)()
hook_field_info_max_weight()
hook_entity_extra_field_info()
hook_entity_operation()
We could investigate if these all need the same merge logic.
I am looking at ->invokeAllWith() calls now..
For the inheritance problem:
I just found out that we can put attributes on variadic parameters :)
class B {
  public function __construct(
    protected readonly ServiceForB $serviceForB,
  ) {}
}

class C extends B {
  public function __construct(
    protected readonly ServiceForC $serviceForC,
    #[AutowireParentArgs]
    mixed ...$parent_args,
  ) {
    parent::__construct(...$parent_args);
  }
}
If we could create this attribute or find out that it already exists somewhere...
@deepali sardana
Please review the existing patches first, before you post your own.
Quick idea,
would it make sense to return a markup object with cacheable metadata, instead of a render element?
$markup = $renderer->renderRoot($element);
assert($markup instanceof MarkupInterface);
assert($markup instanceof CacheableDependencyInterface);
An alternative would be to put #[TrustedCallback] on the class itself. Which means we would have to change the #[Attribute] attribute on that attribute class.
But I think it is simpler and less "special" to just have the attribute on the __invoke() method.
@mdsohaib4242
Actually this does not do anything, because at the time you add your own '#process' callback, the '#process' array is still empty.
The would-be-replaced callbacks are added later by the form builder, after hook_form_alter(), based on $element['#type'].
@donquixote which MR? The newer one in draft?
Well, I am proposing to use the newer MR which is in draft.
Imo the old MR is a dead end, because it breaks existing behavior and makes it impossible to fully remove/replace the callbacks from #type.
But I don't want to remove the old one yet, unless everybody agrees on the new direction.
Having both means we can discuss which direction is preferable.
I notice that '#pre_render' does not allow an object with an __invoke() method, if it is not a closure.
See 📌 Support object with __invoke() for #pre_render. Active
(We can eventually handle #pre_render in a separate issue, but for now I find it interesting to include it here as proof-of-concept, to see the bigger picture of where this is going.)
donquixote → created an issue.