I just had another look and while I agree that part of ::getValidatedCachedRedirectChain() rehashes what we already commented elsewhere, I don't think it hurts here to be so verbose. VariationCache is becoming quite complex and the comments may help understand it.
As for the "you can" vs "See xy", I'll leave that up to committers to split hairs over :) I'm also not a native speaker but I try to do my best to write proper English in my comments, yet might fail sometimes.
If it's all the same to you, I'll leave the MR as-is for now.
We initially counted all database queries in performance tests, but this caused random test failures with the chained fast backend because it uses timestamps to determine whether an item is valid, so millisecond rounding issues can result in an item being considered valid or not. Personally, I think the current split by cache bins is good - we implicitly know which bins use chained fast and which don't.
Right, almost forgot about how annoying we were to everyone else with our performance tests 😅 Agreed that we can leave this as is.
I'll see if I can comb over the MR one more time to find places to improve the comments. VariationCache has become a lot more complex with the addition of getMultiple() and the internal cache now, so maybe having this many comments isn't a bad thing?
Just wanted to add for posterity that this is all a slightly convoluted alternative to making the render cache a chained backend with a memory cache in front. However, that would have several drawbacks:
- We have to use MemoryBackend rather than MemoryCache or otherwise the render cache would no longer be a true cache. MemoryBackend is slow due to all the (un)serializing
- We might not see any benefit in our performance tests, or even worse performance, because we don't differentiate between DB gets and memory gets. Maybe we should?
- It would needlessly check all cache redirects for invalidation, even though they don't need to be invalidated. For now... some might if we land PseudoCacheContext from 🐛 Cache redirect error displayed when using 'view own unpublished content' permission alongside node grants Active
So all in all, given how crucial the render VariationCache is to a page's performance, I think it's a good thing we came up with a custom solution here.
That reduced the cache GETs by 3, probably fixing what you found in #35
Pushing something wild.
Rather than a simple "is the entire chain still valid", I'm swapping it out for "give me the part of the chain that is still valid". This should fix the false invalid result on chains that led to a cache HIT before, but also immediately optimize scenarios where an entire chain was deemed invalid because cache contexts changed value mid-request, even though we may have been able to reuse the start of the chain.
Oh, right. I will amend that, good catch!
But because we don't persist the cache hit in the $chain, this will never be TRUE for any lookups that were a HIT, it only works for misses.
That was by design, though, and discussed previously. We cannot persist the cache HIT as it may get invalidated and our redirect chain cache would have no clue that it did. Tests were even failing because of this: at first I did store cache hits in the cached redirect chain.
Off the top of my head: Don't grants also not apply if an access hook already gave a definitive answer?
Just noticed, fixed it.
Done
It is. It's the new API that we introduced here.
The CachedStrategy itself isn't that CR-worthy (IMO) as it is merely an optimization.
Sure why not
Looks great to be honest. Comment is also very clear.
The MR you opened still has the following going for it:
1 and 3 will still lead to a VC error for anonymous people with the permission while grant implementations exist because, depending on the order, we will either try to overwrite a redirect for 'node_grants:view' with 'user.roles:authenticated' or vice versa. Anonymous people shouldn't get the permission, but still...
It also means we check the grants cache context where it doesn't apply. But that's only for one out of many configurations of status vs owner vs permissions. So I'd be able to live with that as collateral damage.
However, this is the first time we encountered this bug and we cannot know whether we might encounter it more. We may not always have the luxury of adding an unrelated cache context just to fix cache redirect collisions. So a proper solution is still needed to be safe and that is what I was trying to come up with.
I'd say the following:
- Are you displaying something? --> dependency
- Are you varying based on something? --> cache context if we know the (global) source of said something, pseudo cache context if we don't
More details and edge cases to be added, but that would be the gist of it.
So take the status field for example (a short sketch follows this list):
- If it means you show the node background color in pink or not, then IMO that's not a variation and you can simply add the node as a dependency.
- If you vary access based on the status, we do not necessarily care about what will be output in that context. We do care about the status as part of a decision tree on what to return. So here it's a variation.
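A minimal sketch of that distinction, with made-up render code (the CacheableMetadata API is real, everything else is illustrative):

use Drupal\Core\Cache\CacheableMetadata;

// Case 1: the status only changes what we display, so the node itself is a
// plain cacheable dependency of the output.
$build['#markup'] = $node->isPublished() ? 'Pink background' : 'Default background';
CacheableMetadata::createFromRenderArray($build)
  ->addCacheableDependency($node)
  ->applyTo($build);

// Case 2: the status is a branch in a decision tree about what to return, so
// it is a variation and would ideally be represented by a cache context,
// which is exactly what core is missing for entity statuses.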
Just added the cache context and value object for review. Still need to implement in VC and the node access control handler as proof-of-concept.
Could this be called CacheableDependencyCacheContext - a cache context that varies by its cache dependencies?
Sure, we can name it whatever makes most sense.
But it's important to state that it does not vary by the dependency. Rather, we have a dependency that causes variations which we cannot represent with a regular cache context, so we need to be able to invalidate a part of the redirect chain to reflect the dependency.
Is there a foreseeable situation where someone might want to make a similar cache context that also returns something?
I don't see a use case where the cache context would return any value, really. It will essentially serve as a substitute for a redirect chain link that we cannot otherwise declare. By virtue of that it does not make sense to have it return anything other than a fixed value (like 1).
The upside of the above is that the DX isn't nearly as bad.
It's easy to document PseudoCacheContext as:
"When you need to vary based on a cacheable dependency's property but you have no cache context to represent that, perhaps because you don't know where the dependency came from. In that case, add the dependency to a PseudoCacheContext and pass in the PseudoCacheContext as a cacheable dependency of your return value."
But with clearer language than what I could come up with just now.
Okay so moving on from my previous comment, I just thought of something that could make pseudo cache contexts work.
Imagine a value object PseudoCacheContext that uses CacheableDependencyTrait but (see the sketch after this list):
- On any method that can add contexts it throws an exception so only cache tags and max age are allowed
- Its getters for tags and max age would return "nothing" ([] and -1)
- Its getCacheContexts() method sorts the tags and max-age (for better hit ratio) and puts that into a 'pseudo:some_sorted_string' cache context
- Said cache context would always return 1 (what @catch suggested before)
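Roughly what I have in mind, purely as a sketch (the class does not exist and all details are up for debate):

use Drupal\Core\Cache\Cache;
use Drupal\Core\Cache\CacheableDependencyInterface;
use Drupal\Core\Cache\CacheableDependencyTrait;

final class PseudoCacheContext implements CacheableDependencyInterface {

  use CacheableDependencyTrait;

  public function __construct(array $cache_tags, int $max_age) {
    // Only tags and max-age are accepted; there is deliberately no way to add
    // cache contexts to this object.
    $this->cacheTags = $cache_tags;
    $this->cacheMaxAge = $max_age;
  }

  public function getCacheTags() {
    // Report "nothing" so the tags do not bubble up as regular cache tags.
    return [];
  }

  public function getCacheMaxAge() {
    // Same for the max-age.
    return Cache::PERMANENT;
  }

  public function getCacheContexts() {
    // Fold the sorted tags and the max-age into a single stable cache context
    // string for a better hit ratio.
    $tags = $this->cacheTags;
    sort($tags);
    return ['pseudo:' . implode(',', $tags) . ';' . $this->cacheMaxAge];
  }

}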
Now in VariationCache, when we detect such a context, it will tag the next chain item after the context with said metadata.
Now we could take the example of my previous comment and tell VariationCache exactly which redirect needs to be tagged. So no matter what redirect the pseudo-context gets added to, that's where we flag the next redirect for receiving the node cache tag. So from the example above, no matter if the chain looks like:
<INIT> --> AXZpseudo --> <CACHED DATA>
<INIT> --> ABpseudo --> <CACHED DATA>
<INIT> --> A --> pseudo --> XZ --> <CACHED DATA>
<INIT> --> A --> pseudo --> X --> Y --> <CACHED DATA>
<INIT> --> A --> pseudo --> X --> Z --> <CACHED DATA>
<INIT> --> A --> pseudo --> B --> <CACHED DATA>
We would always tag the one after the pseudo context with the node status.
The offending code in NACH would then become:
protected function checkViewAccess(NodeInterface $node, AccountInterface $account, CacheableMetadata $cacheability): ?AccessResultInterface {
  // NO LONGER NEEDED!
  // $cacheability->addCacheableDependency($node);

  // We do not know where the node comes from, but we have to vary by its
  // status. We therefore create a pseudo cache context that gets invalidated
  // whenever the node is invalidated.
  $pseudo_context = new PseudoCacheContext($node->getCacheTags(), $node->getCacheMaxAge());
  $cacheability->addCacheableDependency($pseudo_context);
  if ($node->isPublished()) {
    return NULL;
  }
  $cacheability->addCacheContexts(['user.permissions']);
This feels like it could work with minimal effort on changing VariationCache. The only drawback is that the pseudo-context will in some cases introduce an extra redirect, but that would be the case anyway if we had a proper "node status cache context".
I've given this some more thought and I don't see "hinting" at what to tag a redirect with working.
Say we have a scenario where we have three critical decision points:
- First one adds cache context A
- Second one adds B when published or X when unpublished and a pseudo-context (N:1) to indicate we need to tag with node 1
- Third one in the X branch can add Y or Z depending on value of X
Ideally, we want the node cache tag to be added to the redirect for B or X, but how do we know where to add it?
We could come to a cold cache with AB(N:1), AXY(N:1) or AXZ(N:1) and that would be one big fat redirect pointed to by the initial cache contexts (). So the chain would, for example, be:
<INIT> --> AB --> <CACHED DATA>
Now invalidating A whenever the node changes would be silly, but for the sake of argument let's say we write the tag at the last redirect. If we now change the status, the whole chain is flushed, but no real damage is done. However, if we visit a second node page that triggers XY, the redirect path would be rewritten to:
<INIT> --> A --> XY --> <CACHED DATA>
At this point, tagging the last redirect would give us what we want. Only XY and the cached data get flushed when the node status changes and we keep redirect A. But if we now visit a third node that triggers XZ, we get:
<INIT> --> A --> X --> Z --> <CACHED DATA>
Now we're in a pickle again, because we need to tag X with the node cache tag, but how do we know that? When we get to the cache we only see AXZ(N:1); we do not know which context was added in which order.
So hinting what to tag a redirect with would only work if we had a chronological order of when cache contexts were added, and I still think it would be really poor DX to expect people to provide pseudo-contexts whenever they vary by something that is not part of the global context.
But that would require a complete rewrite of our cacheable metadata add and merge methods.
So moving on to another option: Source detection.
If we could tell NACH where the node that's being passed in is coming from, we could add the cache context for whatever global source we used to get the node. In this case it could be route.args:node or something. An example of this is the route.group cache context in Group, but a more generic version of it.
But that would require some sort of service where we can pass in an object and get a source in return. And if the node was actually loaded at random and passed into NACH, it would still break because we'd have no cache context to represent the status check.
So that left us with one final solution: A cache context to represent status checks. I am really not a fan of this either because it's just plain dirty. The node in question may be from a global source, but we don't know that. Falling back to loading the node without context is just asking for performance issues and poor cache hit ratio.
This was also touched on in the issue mentioned in #22: to add insult to injury, there was a notion of "current user" vs "any user" being discussed there and that could lead to poor cacheability all around.
Now, I may have got a solution but I'll write that down in a separate comment. This whole comment was to outline why we're stuck.
Did you delete the MigrateControllerTest message by mistake?
I think that's the final one to open a child issue for, but it's probably a bit more complex to fix as I recall it involves changing JavaScript.
Oh wow, for a second there my brain processed the assert as an if-statement. Left one more suggestion, but that can be fixed upon commit.
It is not possible to do this: you can ignore placeholders, but you can't remove them.
If CachedStrategy does not find any hits, it will also return an empty array. Isn't that the same as what the test strategy did? BigPipeStrategy can also return an empty array. The only difference I see is that the test did not have SingleFlushStrategy as a fallback, which ends up returning all placeholders anyway.
Might be my sickness talking but if I read ChainedPlaceholderStrategy::processPlaceholders() correctly, we reduce the set of $placeholders whenever a strategy returned some of them back. At the end of the loop we want $placeholders to be empty, no? So does that not mean that the check has to be !empty() to figure out that something was, in fact, not replaced?
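For reference, this is roughly how I read that loop, paraphrased from memory rather than copied from core:

$new_placeholders = [];
foreach ($this->placeholderStrategies as $strategy) {
  $processed = $strategy->processPlaceholders($placeholders);
  // Whatever a strategy claims is removed from the remaining set.
  $placeholders = array_diff_key($placeholders, $processed);
  $new_placeholders += $processed;
  if (empty($placeholders)) {
    break;
  }
}
// Anything still left in $placeholders was not picked up by any strategy,
// hence the question whether the assert should be checking !empty().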
ChainedPlaceholderStrategyTest is choking on the new assert, I think because of a small mistake.
Americanizzze the code comments.
😂
Dug up the ::class docs for people stumbling upon this issue: https://www.php.net/manual/en/language.oop5.basic.php#language.oop5.basi...
Found some minor things, but looks good to me otherwise.
Re #10: at first it did that for me too, but when I refreshed it seemed fine. Must have been a hiccup in GitLab on my end; I wouldn't set it to RTBC if it had a garbled history like it initially did.
Having said that, MR still looks great to me so RTBC +1.
Setting back to NR and opting out of bot to reduce noise.
@berdir, I don't really mind resolving merge conflicts like these, PhpStorm has a nice tool for it. So whichever issue goes in first, I'm not too fussed about having to resolve a merge conflict on the others. Can ping me to do it if you want.
but if the former returns forbidden can the latter override that decision? It looks like AccessResult::orIf would prevent that?
Returning Forbidden is what I meant by saying "subtract from". The default is neutral and then people can explicitly allow or forbid access, how is that not add to or subtract from?
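To illustrate with core's existing AccessResult behavior: in orIf() a forbidden result always wins, so a later allowed() cannot override an earlier forbidden().

use Drupal\Core\Access\AccessResult;

// Neutral is the default; modules can then explicitly allow or forbid.
$result = AccessResult::neutral()
  ->orIf(AccessResult::allowed())     // "add to": access is now allowed...
  ->orIf(AccessResult::forbidden());  // "subtract from": ...and now forbidden.

// Forbidden wins regardless of the order in which the results are combined.
$result->isForbidden(); // TRUE.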
If that is the case, then why would a query be any different at all?
Queries don't have the luxury of running code on them. So if one bad actor messes up the query, everyone else is stuck with that outcome. A processor in between could prevent a lot of that damage.
Providing a mechanism (a hook?) to either enable or disable an existing permission in core would fix the same problem without having to express intent at all, which is why I'm talking about the design decisions.
You are literally describing the Access Policy API. Which is one piece of this puzzle, but query access runs after permission checks.
Okay so I feel like this thread is seriously getting derailed, even if there was no intention to do so.
So I'll answer one more time, but after that I don't think we should still be discussing the design decisions we made in the past regarding how we should allow people to alter other modules' access. Even with the most recent addition to our access layer (Access Policy API) we allowed modules to both add to and subtract from passed in access logic. The goal is not to change that, so why are we discussing it?
Could you expand on this? Why would it be a ball ache and for whom? A lot of systems have access controls like this so I'm confused why it would be such a problem?
If you want a sports editor to only be able to post articles to the sports section of your news website, currently all you need to do is assign them the "editor" role in the sports section. If we take one of your suggestions, we would have to make them an editor on the entire website and then make sure they can't do anything in fashion, politics, economics, etc. which is a lot of extra work.
It would also mean we break with everything Drupal users have grown to expect. We already have an issue floating around somewhere about the massive mindfuck we have when it comes to neutral access results. The route access checks, entity access checks and grants system all treat a neutral result differently.
Now on topic: When you build an entity query, we know that the conditions are filters because access conditions should go into the access check (= why we have the accessCheck() method). For Views it should be the same thing.
So if we build an API that takes an existing query and assumes that all conditions are filters, we could very well create a system that gathers the access logic and allows for easy alteration. Only after everything is processed do we add it all to the query. This would solve all of the problems I explained above because it would be crystal clear whether a status check is a filter or an access rule.
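To make the distinction concrete with the existing entity query API:

// The condition expresses what the listing is about (a filter), while
// accessCheck() asks for access rules to be applied on top of it.
$nids = \Drupal::entityQuery('node')
  ->condition('status', 1)
  ->accessCheck(TRUE)
  ->execute();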
I'd like to pursue this route when I have a bit more time on my hands.
If we really want to allow modules to loosen permissions (which I don't think is a good idea, but fine)
But it is a good idea? Even node grants supported that for viewing unpublished nodes.
@kindutch put the concept behind this all very well. Thanks for that!
It all ties into the access policy API's concept of scopes and identifiers. One scope should not be concerned with another. It's perfectly valid to not have node edit access in the global scope, but to have said access in a specific domain or group scope. It would be an absolute ball ache to achieve per-domain editor access if it meant you first had to grant general node edit access for the entire website.
Currently it shows 2 fewer cache gets in 'render' after the getMultiple support.
Re #25, I'm fine either way we commit this. I consider this issue "done" now and the results are amazing. Also they're actual DB gets, not memory gets, so the impact is even bigger.
I'm also considering creating an issue where we can differentiate between DB and memory gets.
Not sure why it's failing on QuickStartTest and RecipeQuickStartTest
Felt useless while sick on the couch, added back support for ::getMultiple(). Back to rest mode now.
I think I've got a clean solution for ::getMultiple() but it'll have to wait a few more days. Sickness got the best of me, tried to commit today but my head is not functioning and I can't guarantee that whatever I commit today is airtight.
Okay, all green :) Now for the final hurdle...
Fixed the final test failure locally and the other one also went green locally. So if all goes green here, we can focus on reintroducing the static cache in getMultiple().
Pushed some fixes already, you can see the pattern is exactly like how we use refreshVariables()
Yeah I got a fix, but got something urgent to attend to.
I think NodeTest might be failing for the same reasons we need refreshVariablesTrait. One request has a MISS, the other makes it so something is cached, but the cache in the test's thread still thinks there's a MISS and returns as such.
That's looking a lot better already. I suggest we fix these test fails and then introduce the same logic in getMultiple().
Updated the code, we still get good results on performance tests. Just 3 cache gets more than the initial improvement.
I also temporarily reverted the getMultiple() changes so we can first see which tests fail on the single changes. Easier to debug that way.
I think we can do one better in getRedirectChain(). Rather than not setting a cache on a cache HIT, we can still store the path leading up to the hit and reuse that next time we construct a redirect chain. At least that way we can skip requesting all the redirects again.
Refresh status change...
LOL, @berdir beat me to it. I lost sleep last night thinking we're caching too much on cache HITS because we know redirects are permanent but the end of the chain on a hit is not.
So either we switch to a MemoryCache for the redirect chain, which would turn this whole thing into a no-op and a glorified chained backend or we just make sure we pop a cache HIT off the chain and store only the parts leading up to it.
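Something along these lines, purely as a sketch of the idea (variable names are illustrative, not the actual MR code):

use Drupal\Core\Cache\CacheRedirect;

// Redirects are permanent and never need to be invalidated, but the final
// cache HIT is not, so drop it before persisting the redirect chain.
$persistable_chain = $chain;
$last = end($persistable_chain);
if ($last && !($last->data instanceof CacheRedirect)) {
  array_pop($persistable_chain);
}
$this->cacheBackend->set($chain_cid, $persistable_chain);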
Let's see what testbot has to say.
Spin-offs from recent discussion:
Will fail tests, but that's the idea. If it's only performance tests that fail, we have no collateral damage.
LGTM
Actually, the docs in PlaceholderingRenderCache seem off. Because we don't write the placeholder output until we start rendering the placeholder. At that point there should be no more GETs for said placeholder. Only during subsequent requests.
So it seems the optimization on PRC::set() does nothing. Need to look into that...
Then again from bigpipe
This bit surprises me, as we now have CachedStrategy and that one should hit warm caches by the time we get to them and as a result stop big pipe from looking up the placeholders. I'm thinking this is because you're using NullBackend for debugging and therefore CachedStrategy can't prevent big pipe from running.
But yeah, generally what happens is this:
- We try to render the block, a first cache lookup happens for the rendered output of the lazy builder
- Regardless of whether we get a cache hit or miss, the block is placeholdered either in the Renderer (miss) or PlaceholderingRenderCache (hit) because we don't want it to bubble up. So when we get to the final part of the render process, we have to do another cache lookup to see if the placeholder's actual output is in the cache. However, the PlaceholderingRenderCache has an optimization for this and keeps the placeholders it already found or set in memory so we don't repeat the work.
- During the above lookups and/or writes, VC will try to follow or construct a redirect chain, leading to a few lookups on the underlying cache backend.
The first two points mean that for any placeholdered block you will always have two render cache lookups, but only one of them ever goes to the actual underlying cache backend.
Let's assume most blocks have 1 cache redirect because the lazy builder added extra contexts, so in total we're talking about:
- Cold cache GET: 1 lookup in VC for get. 1 lookup in VC during SET. 0 lookups during replacement because of optimization in PlaceholderingRenderCache.
- Warm cache GET with 1 redirect: 2 lookups in VC for get. No set. 0 lookups during replacement because of optimization in PlaceholderingRenderCache.
- Invalidated warm cache GET with 1 redirect: 2 lookups in VC for get, but it fails to find the entry. 2 lookups during SET as it follows the previously created redirect (which does not get flushed). 0 lookups during replacement because of optimization in PlaceholderingRenderCache.
So that's 2 lookups on both a cold and a warm cache, or 4 after invalidating a warm cache. However, if the blocks weren't placeholdered, we'd still have the initial cache get regardless, so we're really only talking about up to three extra cache gets for every placeholdered block. Usually it will only be one extra.
In the MR we placeholder 5 blocks and we're seeing 10 extra cache gets. So that could come from the above explanation.
P.S.: While writing this I realized we can optimize the redirect chain fetching in VC by storing it in memory and flushing it whenever a set on the same cache keys is done. That would reduce the amount of cache gets for scenario 3 above.
LGTM and I agree with @daffie that we want to preserve the right to make changes as we get more metrics. Test changes also make sense.
I'm also confused why we even do a cache lookup in \Drupal\Core\Render\Renderer::doRender() for a placeholdered block? shouldn't we skip that?
The actual rendered output of something that got placeholdered can still be cached when the thing had cache keys, even if it varies by user or something else we normally don't want to bubble up to DPC. Blocks meet those criteria.
So when we encounter a placeholder in the attachments, it contains the cache keys and we take that to the Renderer to see if it can find whatever we want to replace the placeholder markup with in the placeholdering render cache.
If we didn't do this, we would always have to calculate placeholdered stuff live, even though it's possible to cache those calculations. We just don't want their cacheable metadata to "poison" the entire page, which is why we placeholder them.
Trying to come up with an answer for your debugging...
Also, the default conjunction on entity queries is AND, which screws Group over if modules start to implement entity query access without a way to show intent. Right now, a View that filters by status has the following query after it's altered by Group:
status=1
AND
(node_is_not_grouped OR (node_is_grouped AND group_grants_node_access))
So if the node isn't grouped, you will see it as long as it is published. But if it is grouped and you do not have Group access, it will not show up in the list. Which is what Group promises: To protect your private content.
Now imagine if the above query came from the fact that entities finally have generic query access and the status check was an access check. Now we run into the intent problem from above. If we were able to determine intent, we could change the query to:
(node_is_not_grouped AND status=1)
OR
(node_is_grouped AND group_grants_node_access)
But without intent, this would be impossible. Which is exactly why Group doesn't play nice with node grants right now.
I'm trying to understand why you would need to undo a condition from another module? Wouldn't that imply that you're trying to override an access restriction put in place by another module?
Group is a perfect example of this.
Say Node adds a condition for "status=1" to the query because the current user does not have any permission to view unpublished nodes. However, in one of the current user's groups they do have access to view any unpublished node. If Group were not able to undo Node's access check, how would it be able to also have unpublished nodes from the user's group show up in the results?
It doesn't seem like it's a problem that two modules could add conditions on the same field? It seems like that would only compile to a single JOIN?
I'm talking about manual joins. Access modules tend to want to join their "records" table to cross-reference.
I'm really trying to understand how this isn't exactly what I'm proposing.
There are many issues I've encountered in the past with a similar approach, but I'll give you a big one: Your suggestion lacks the expression of intent.
Assume the following query condition: bypass=1, which comes from a module that allows you to bypass any entity access if you have a particular permission.
Now the query also specifies status=1, but we don't know where this came from. Was it from a View that wishes to filter the results, meaning the query result should definitely not contain any unpublished entities even if you have access to them? Or was it from an access check because the current user does not have access to unpublished entities?
If we don't know intent, we don't know whether we should use status=1 AND bypass=1 (Views filter) or status=1 OR bypass=1 (access check).
See how this is a problem if all we have to go on is the query? And why we tried to go with a value object based solution before (because then you have intent)?
The only problem with this that I can see is that it doesn't cover Views which feels like a completely separate beast
You can convert Views just fine, I do so in Group already. So I don't really see a problem there.
Why would you need to do that? The AccessResults should get merged from the entity access handler as well as any hook implementations. Likewise the query gets modified by the hook implementations regardless of final AccessResult.
Say we allow this to live in a handler and a module has a query alter that looks up a particular flag. Now another module comes in and wants to make it so you need both the original and a custom flag. They now need to extend said handler to make that happen, but by doing so they lock out all other contrib modules from making similar changes to the handler.
You cannot easily undo what was done in a queryAccess() because the changes were made directly on the QueryInterface. Good luck digging through all of the joins and conditions finding the thing you need to undo and adjust.
This is why the original proposal in Entity API gathered value objects that were all processed at the very end. So that you could prevent the spaghettification of the query by the time you get your chance at changing it.
Similarly, if we allow anyone to alter the query anywhere we cannot know what joins were made by other handlers. So if we need to make some, we might end up joining the same table 5 times because everyone and their dog wanted to get some info from it.
It will lead to some really bad queries like this. Which is why I'd rather have a system where we can first allow modules to specify simple conditions, which get compiled in the end, and then figure out a way for more complex alters to step in before it's too late. This will not be easy, but we should keep our options open from the get go. If we start simple, ship it and then realize we cannot serve complicated, we're going to get a lot of angry contrib maintainers. Myself included.
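Purely to illustrate the shape of what I mean, with every class and method below invented for this sketch:

// Modules declare simple, intent-carrying conditions instead of touching the
// query directly.
$access = new QueryAccessConditions();
$access->addCondition('status', 1, QueryAccessConditions::INTENT_ACCESS);

// A module with complex needs can still register a late-stage alter instead.
$access->addComplexAlter(new MyModuleGrantsAlter());

// Only at the very end does a compiler turn the gathered instructions into
// joins and conditions on the actual query, so nothing gets spaghettified
// before everyone has had their say.
$compiler->applyTo($query, $access);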
I'm having trouble understanding how/why the proposed hook(s) wouldn't work for you?
See above. But I'll try to read through your suggestions again to see if I didn't miss anything. I'm about to sign off for the day, though, so may need to wait a bit.
All green again 🎉
Needs to merge in origin/11.x again for the performance tests and needs to update docs. The only thing that bothers me about both the MR and CR is that it's now painfully obvious that an allowlist can be limiting. We only want to turn off BigPipe and have to specify all other strategies. If contrib introduces another strategy, it won't apply to the messages block.
Doesn't a denylist make more sense here? You opt into all strategies out of the box, so the only possible change you'd want to make here is opt out of some of them, meaning a denylist makes more sense.
Only failing test is AssetAggregationAcrossPagesTest on something you added, @catch: 'StylesheetBytes' => 89000. I've taken the 86642 the tests report and rounded it up to 86650; feel free to correct if necessary.
Seems to fail on random stuff, running tests again.
I noticed a bunch of commits got added from a branch other than 11.x as the diff showed all of those changes. I was fortunate enough to have the last "real" commit on my local so I merged in origin/11.x and force pushed that here.
Looks good to me and looking forward to the results when we land the blocks issue. Setting to NW for performance reasons.
LGTM.
Maybe wait until @alexpott confirms a services.yml file in sites/default is enough to take away his concerns in #27?
Can you help me understand how the interface I proposed in #245 does not allow you to modify the query more thoroughly?
If you put it on an entity storage handler, you imply that it only serves to define access over that entity type's queries. This is not always the case and thus it completely rules out anyone trying to provide entity query access for multiple entity types. You should not put user query logic into a NodeAccessControlHandler, for instance.
And what about any module trying to leverage entity query access across multiple entity types? As we know, entity handlers are a nightmare to extend because only one module can swap out the original (something I also fixed in Group). This is exactly why I abandoned the approach in Entity API in favor of a custom one in Group.
We need a system that can take instructions from various places, one of them could be an entity type handler for the default behavior, and compile those into an efficient query.
If you take a look at EntityOwnerTrait it exclusively relies on the owner key
This again assumes you are dealing with the entity type table. IIRC core does(/did?) some dirty juggling, looking for a "nid" column because it knows node access doesn't always get added to the node table directly. It could have been Book that did that, but I can't immediately recall.
Either way heavy -1 on relying solely on entity handlers to deal with this problem space. They are inflexible and almost non-extensible. You'd make access modules dead in the water from the get go.
Thanks for the quick reply @phenaproxima. I agree that a simple solution could indeed be nice here and move things forward far quicker.
The only thing I see your suggested approach not covering is base fields. E.g. if I were to create a new content type, the author field would not use Tagify because we never saw any field configuration form. The admittedly more complex solution did cover that part.
Either way, nothing that can't be solved manually and the added UX of not having to choose a widget but get Tagify suggested as the default is already a big win IMO. Maybe we don't need to babysit our users to such extreme lengths as I suggested in #11.
The reason it's important we do not touch existing fields is that you'd end up frustrating users who want to opt out of Tagify and see it reapplied over and over after, say, every cache rebuild.
Okay so I just had a call with @gxleano and we agree that Tagify should support this out of the box so that DrupalCMS can use it without too much custom code, if any. What we agreed upon was this:
Tagify will provide a configuration form where for each target entity type, you can configure if a widget should automatically be set and specify the configuration for said widget. The default would be to not do anything, because we don't want to force a Tagify widget on a target entity type that might not make sense.
So you could configure any entity reference to a user to use the profile widget, while any entity reference to a taxonomy term would use a different widget.
Tagify will only apply this once to new fields so that people can still change the behavior if they want. New fields are:
- Base and bundle fields of supported types whenever a new bundle entity is installed (e.g. new content type)
- Manually created fields, i.e. a FieldConfig is saved
All of these settings will be exported to a single config object, so that DrupalCMS can opt into this feature by providing the necessary config file in a config/install or config/optional folder. DrupalCMS will then also need to make sure whatever config files it has for default content types and their fields are set to use Tagify as Tagify will not touch existing fields.
Finally, a bit of an afterthought but IMO still important. Instead of hard-coding the config schema of all widgets' settings, Tagify's new config object will use the schema notation to inherit the widget settings' schema so that the config object generated by Tagify can be fully validated.
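A rough sketch of the "only apply once to new fields" part; the hook is real, but the config name, widget settings and overall wiring are made up for illustration:

use Drupal\field\FieldConfigInterface;

/**
 * Implements hook_ENTITY_TYPE_insert() for field_config entities.
 */
function tagify_field_config_insert(FieldConfigInterface $field): void {
  // Only react to newly created entity reference fields whose target entity
  // type has been configured to receive a Tagify widget automatically.
  $settings = \Drupal::config('tagify.auto_widgets')->get($field->getSetting('target_type'));
  if ($field->getType() !== 'entity_reference' || empty($settings)) {
    return;
  }
  \Drupal::service('entity_display.repository')
    ->getFormDisplay($field->getTargetEntityTypeId(), $field->getTargetBundle())
    ->setComponent($field->getName(), $settings['widget'])
    ->save();
}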
Couldn't people change this from their services.yml file which is automatically picked up because it's in the container_yamls setting?
Which reminds me we didn't add this new container param to default.services.yml. We might want to do that as it gives us the opportunity to write some brief documentation in its comment.
As I stated before, there is a reason I abandoned this approach in Group. We cannot possibly translate all access logic into simple conditions; this is also the case for the node grants system.
So rather than try to pigeonhole all access logic into conditions, we need a system where you can choose to use simple conditions but at the same time have the freedom to alter the query more thoroughly if necessary.
We could perhaps come up with a few custom value objects such as OwnerCheck, but again it needs instructions on where to find that owner: Is it on the same table and, if so, under what column? Or is it in a different table and, if so, how do we join that one?
So, even though this has been on my todo list for a long time and I really wish I'd have more time to work on it, it requires more thought.
We need a system that:
- Works for simple cases with good DX
- Is flexible enough for complex cases without breaking item 1
- Supports any entity type and entity operation and can therefore replace node grants completely
We need to both kill node grants with a flamethrower and come up with something that people can easily switch to. Given a proper budget I could spearhead this effort with the experience I've gained from writing Group's query alters but, even though it's a critical part of core we need to urgently fix, I'm not sure anyone is willing to free up said budget.
I'm already grateful we did get funding for the Access Policy API in 2023, allowing it to get into core in 2024.
https://www.drupal.org/node/3512407
Does this seem sufficient? It won't "kill" migrations per se, but it will trigger more batches and as a result may end up loading the same entities into memory over and over for every batch, leading to slower migrations.
I'll draft a CR, one sec.
I'd rather we leave it at 1k but at least mention this in the release notes, accompanied with a CR explaining you can change this behavior with a container param. Adding dynamic memory profiling is perhaps a bit overkill and may lead to obscure bugs.
Should we document 'entity.memory_cache.slots: 1000' and/or its usefulness for migrations somewhere? Seems like something that could save a lot of time if we explain it beforehand rather than when found during debugging of your migration's memory issues.
MigrateExecutable used to release the entities from memory but no longer does, yet we have (minor) concerns from @heddn that people (with free hosting plans) will run out of memory. To me, that seems like a behavioral change that could lead to some nasty hour-long debugging sessions figuring out why stuff isn't working anymore.
So I'm not sure about RTBC. Seems like we may have some collateral damage here if we release this without a CR or documentation.
Okay everything I wrote holds true, just the bug I reported in #25 is not present in the other MR. Everything else still fits as the grants cache context is added too late for the stopgap to work.
You could erase the VC error coming from 1 vs 3 by moving the node grants cache context higher up in checkViewAccess(). That could be a stopgap solution, but we really need to look into an entity status cache context instead. Visiting node pages after changes to said node will now face several extra queries that we ideally should not even run.
I'm not sure this would work. There's a bug in the MR and the intended solution leads to several cache inefficiencies.
NACH::checkViewAccess() currently only checks if you can view your own unpublished nodes. It also properly adds any cacheable metadata to the CacheableMetadata object that's passed into it and said metadata is merged in ::checkAccess() wherever we return.
So not returning early, still running the grants check and then returning the grants check means we killed off access for viewing your own unpublished node, as we no longer return said access check. It also means that the following lines do nothing:
if ($view_access_result) {
  $access_result->addCacheableDependency($view_access_result);
}
Now, this can be fixed. So let's move on to the changes in ::viewAccess() instead and see what's changed.
- In case the node is published, we skip all of the ::viewAccess() stuff and end up adding only 'node_grants:view' if there are implementations
- In case the node is unpublished and the user has no permission, we end up adding only 'user.permissions', which does nothing
- In case the node is unpublished and the user is anonymous and has the permission, we end up adding 'user.permissions', which does nothing, and 'user.roles:authenticated'.
- In case the node is unpublished and the user is authenticated and has the permission, we end up adding 'user.permissions', which does nothing, 'user.roles:authenticated', 'user' and 'user.node_grants:view' if there are implementations. The 'user.node_grants:view' part is new.
Now 1 soft collides with 2 for someone who can't view their own unpublished nodes while grant implementations exist. If we visit a published node first and then visit the same node when unpublished, we will overwrite the CacheRedirect we had for 'node_grants:view' with a direct cache entry because it varies by nothing extra. Vice versa, if we visit the unpublished node first and then publish it, we write a cache redirect and then the cache entry.
So in principle VC will work fine, it's just slightly inefficient to kill off the redirect path when the status changes or to follow a redirect path that we know will lead to an invalidated item. No big deal, I suppose.
1 and 3 will still lead to a VC error for anonymous people with the permission while grant implementations exist because, depending on the order, we will either try to overwrite a redirect for 'node_grants:view' with 'user.roles:authenticated' or vice versa. Anonymous people shouldn't get the permission, but still...
1 and 4 will no longer lead to a VC error for authenticated users with the permission when grants are implemented, because we will always have a redirect for 'node_grants:view' regardless of node status. However, because the node status is still not reflected in the redirect chain, we will see the same inefficiencies we saw between 1 and 2.
Same story when grants are not implemented, but here we won't see 'node_grants:view' at all and it's merely the difference between nothing extra and some contexts added by checkViewAccess() so it's also the same as the behavior between 1 and 2.
TL;DR: We still have errors in some ill-advised configurations and we definitely still have inefficiencies across the board. The essence remains that NACH is varying by node status but we have no context in core to reflect that.
If we don't want to support such cache contexts, then the problem becomes a VC problem because now we need to be able to add cache tags to cache redirects and we get to the possible suggestion @catch made.
After sleeping on it, it does not seem like a VC issue after all because ::checkViewAccess() does add a cache context whenever it varies by something, only it fails to do so for the status check. That is the root of the issue.
Moving back to node subsystem for that reason. If the solution is to add a status cache context, we can do that in a different issue which is for the cache subsystem.
I'm wondering if we can have a node status cache context that always returns TRUE (or whatever), but it adds the node ID as the cache tag (possibly without loading it as an optimisation) so that the redirect gets tagged when it's present.
You mean a cache context that only serves a purpose to "hint" to VC that it should add some cache tags to the redirect where it's present? Now that would be an awesome alternative. Would have to be its own kind of cache context, though. We should not mix that behavior with regular cache contexts.
Example with cache context "entity_status:node|1" (see the sketch after this list):
- Returns TRUE, so there is no performance hit trying to load stuff
- Extends an interface that gets aggregated so we have a list of these contexts that VC can compare against
- If VC finds it is dealing with a "special" cache context when trying to write a redirect, it calls the ::getCacheableMetadata() method of the cache context and adds those tags to the redirect
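Roughly what I'm picturing; CalculatedCacheContextInterface exists, but the class and the marker interface that VariationCache would aggregate are invented here:

use Drupal\Core\Cache\CacheableMetadata;
use Drupal\Core\Cache\Context\CalculatedCacheContextInterface;

// A cache context that exists purely to hint extra cache tags to VariationCache.
final class EntityStatusCacheContext implements CalculatedCacheContextInterface, RedirectTaggingCacheContextInterface {

  public static function getLabel() {
    return t('Entity status');
  }

  public function getContext($parameter = NULL) {
    // Always the same value, so resolving the context costs nothing.
    return 'TRUE';
  }

  public function getCacheableMetadata($parameter = NULL) {
    // E.g. $parameter = 'node|1': hand VariationCache the tag to put on the
    // redirect this context is part of, without having to load the entity.
    [$entity_type_id, $id] = explode('|', $parameter);
    return (new CacheableMetadata())->setCacheTags(["$entity_type_id:$id"]);
  }

}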
There's still the question of what happens if it's present on a node page when all caches are warm. Because at that point we'd try to write a redirect with the following added:
- 'route'
- 'entity_status:node|1'
- 'user.roles:authenticated'
- 'user'
If this gets optimized on a second visit to, say, a taxonomy page then self-healing would break it up and store the route redirect. That would erase the node stuff and next time we visit said node we'd store the redirect without 'route' and tag that with the node tag.
HOLY SHIT THAT WOULD WORK. That's awesome.
We could even remove the "special" cache contexts from the redirect if we want, but it might be better to keep them for debugging purposes. We could then go as far as to allowing these cache contexts to not have a value at all, only return a set of cache tags.
For this particular case, what we have is this:
- BlockViewBuilder::build() builds a render array with keys ['entity_view', 'block', $entity->id()]
- The Renderer will automatically add in the required cache contexts:
  - 'languages:language_interface'
  - 'theme'
  - 'user.permissions'
- LocalTaskManager::getLocalTasks() adds the 'route' cache context
- NACH adds extra cache contexts
So this leads to a top-level redirect with just the required cache contexts, pointing to 'route' and whatever else the local tasks of the very first page after a cache rebuild specifies. As we visit more pages, the cacheable metadata of the local tasks will vary but always contain route and as such we get to a second redirect through self-healing seen in #15.
So now we have for cache keys ['entity_view', 'block', $entity->id()]:
- The top level cache entry as a CacheRedirect to ['languages:language_interface', 'theme', 'user.permissions']
  - This is the case for ALL local task blocks
- The second level cache entry as a CacheRedirect to ['route']
  - This comes from a massive amount of different routes, not per se node related
- A third level cache entry with a CacheRedirect for our node specific cache contexts
Now imagine we hit the node page first and tag both the first and second tier redirect with 'node:1'. We then visit a few more pages, build more redirects and warm some caches and node 1 gets saved. Boom, the local tasks of all those pages cannot find their redirect path to their cache entry any more and have to calculate their contents anew.
Trying to be smart about this during self-healing is also impossible because you cannot know at what point the tag became relevant. Set it too early and you clear too many caches, set it too late and you might have a security risk on your hands.
Now imagine we did have a cache context to represent an entity status. Then we would have the following:
- The top level cache entry as a CacheRedirect to ['languages:language_interface', 'theme', 'user.permissions']
- The second level cache entry as a CacheRedirect to ['route']
  - ANY REDIRECT AFTER THIS POINT IS FOR THE SPECIFIC NODE ON THE ROUTE ONLY
- A third level cache entry as a CacheRedirect to 'entity_status:node|1'
- A fourth level cache entry with a CacheRedirect for our node specific cache contexts that varied by status
This is fine, because if we subsequently visit the node 2 page, we would only find the first two redirects and then resolving the route cache context would lead to a miss, at which point we can store a redirect for 'entity_status:node|2'.
Finally, there's still a very real chance the final page would be tagged with the node cache tag, but if that one gets invalidated, the redirect chain remains intact as it is not affected by any change on the node whatsoever. Status 0 will always point to one CacheRedirect and status 1 to the other. That will never, ever change.
So I'm starting to really like the prospect of an EntityStatusCacheContext for this scenario. Tagging redirects is a no-go and flushing them all whenever the end of the chain is invalidated is just enormously wasteful.
Oh and before anyone panics: There is no immediate risk here.
The hardening in VC will complain about the redirect, but it will then store the new redirect just fine. The risk is that the goal was to turn the warning into an exception and then we would be in trouble, so we can't do that until we figure this one out.
Keep in mind that the bug only exists because we have no entity status cache context.
Basically the node access check outcome doesn't depend on the entity status, it varies by it. So if in NACH::checkViewAccess() we could do the following:
- // If the node status changes, so does the outcome of the check below, so
- // we need to add the node as a cacheable dependency.
- $cacheability->addCacheableDependency($node);
+ $cacheability->addCacheContexts(['node_status:1']);
  if ($node->isPublished()) {
    return NULL;
  }
  $cacheability->addCacheContexts(['user.permissions']);
Then we would be in the clear because the second redirect would be moved to the third and in its place would be a new redirect which only redirects to said node's status.
Now the question is whether such a cache context would make sense, because it could require us to load an entity that's not on the route. Here it is, but for argument's sake... And given how cache contexts need to be fast I'm not sure that's such a great idea.
So the second best thing would be to add cache tags to redirects, but then they might get invalidated all the time, killing the VC performance gain. Because, by the time we get to VC, we have the cache tags of the entire thing and we can't know at what point they became relevant. So we might be tagging the first redirect with node:1 even though the status of said node will never affect it.
The whole point of not flushing redirects is because everything that could lead to a redirect must be representable by a cache context. In the case of NACH it's not. Oof.
When we get to NACH::checkAccess()'s last return statement, the access result only has 'user.node_grants:view' cache context on it. This is because the node is published and checkViewAccess() does nothing. But it does also have the node:1 cache tag. So if the node gets unpublished, the entire cache entry for said node should vanish.
I'm thinking this is a VariationCache bug where it still finds the old redirects.
This comment is actually not true:
// Cache redirects are stored indefinitely and without tags as they never
// need to be cleared. If they ever end up leading to a stale cache item
// that now uses different contexts then said item will either follow an
// existing path of redirects or carve its own over the old one.
In this scenario, we have the following chain:
array (
'entity_view:block:olivero_primary_local_tasks:[languages:language_interface]=en:[theme]=olivero:[user.permissions]=060b2fe6874d87de20e63c5747f201eba0ae8bc76218084bdefae863baed5349' =>
(object) array(
'cid' => 'entity_view:block:olivero_primary_local_tasks:[languages:language_interface]=en:[theme]=olivero:[user.permissions]=060b2fe6874d87de20e63c5747f201eba0ae8bc76218084bdefae863baed5349',
'data' =>
\Drupal\Core\Cache\CacheRedirect::__set_state(array(
'cacheContexts' =>
array (
0 => 'languages:language_interface',
1 => 'theme',
2 => 'user.permissions',
3 => 'route',
),
'cacheTags' =>
array (
),
'cacheMaxAge' => -1,
)),
'created' => '1741266457.894',
'expire' => '-1',
'serialized' => '1',
'tags' =>
array (
),
'checksum' => '0',
'valid' => true,
),
'entity_view:block:olivero_primary_local_tasks:[languages:language_interface]=en:[route]=entity.node.canonical02d270e527ee2db38f14edbd305a4fea048b224ab212dd8a734fefc49e4a04ab:[theme]=olivero:[user.permissions]=060b2fe6874d87de20e63c5747f201eba0ae8bc76218084bdefae863baed5349' =>
(object) array(
'cid' => 'entity_view:block:olivero_primary_local_tasks:[languages:language_interface]=en:[route]=entity.node.canonical02d270e527ee2db38f14edbd305a4fea048b224ab212dd8a734fefc49e4a04ab:[theme]=olivero:[user.permissions]=060b2fe6874d87de20e63c5747f201eba0ae8bc76218084bdefae863baed5349',
'data' =>
\Drupal\Core\Cache\CacheRedirect::__set_state(array(
'cacheContexts' =>
array (
0 => 'languages:language_interface',
1 => 'theme',
2 => 'user.permissions',
3 => 'route',
4 => 'user.roles:authenticated',
5 => 'user',
),
'cacheTags' =>
array (
),
'cacheMaxAge' => -1,
)),
'created' => '1741266457.895',
'expire' => '-1',
'serialized' => '1',
'tags' =>
array (
),
'checksum' => '0',
'valid' => true,
),
'entity_view:block:olivero_primary_local_tasks:[languages:language_interface]=en:[route]=entity.node.canonical02d270e527ee2db38f14edbd305a4fea048b224ab212dd8a734fefc49e4a04ab:[theme]=olivero:[user]=2' => false,
)
The first redirect never changes no matter how many times we save the node. And because we don't tag redirects it means it will always lead to the second redirect where (if the first visit was unpublished) we have the following contexts:
array (
0 => 'languages:language_interface',
1 => 'theme',
2 => 'user.permissions',
3 => 'route',
4 => 'user.roles:authenticated',
5 => 'user',
),
Now, the 'user.roles:authenticated' and 'user' cache context should only ever be added if the node is unpublished. But because we never flushed the redirects, the cache GET when the node gets published finds the still accurate first redirect, which leads to the stale second redirect and that's where shit hits the fan.
Oh dear.
When the node is unpublished, $links contains:
\Drupal\Core\Access\AccessResultAllowed::__set_state(array(
'cacheContexts' =>
array (
0 => 'user.permissions',
1 => 'user.roles:authenticated',
2 => 'user',
),
'cacheTags' =>
array (
0 => 'node:1',
),
'cacheMaxAge' => -1,
))
Managed to reproduce following the steps in the IS.
On my latest branch where blocks are placeholdered, it shows the following two errors:
User warning: Trying to overwrite a cache redirect for "entity_view:block:olivero_secondary_local_tasks:[languages:language_interface]=en:[route]=entity.node.canonical02d270e527ee2db38f14edbd305a4fea048b224ab212dd8a734fefc49e4a04ab:[theme]=olivero:[user.permissions]=060b2fe6874d87de20e63c5747f201eba0ae8bc76218084bdefae863baed5349" with one that has nothing in common, old one at address "languages:language_interface, theme, user.permissions, route" was pointing to "user.roles:authenticated, user", new one points to "user.node_grants:view". in Drupal\Core\Cache\VariationCache->set() (line 185 of core/lib/Drupal/Core/Cache/VariationCache.php).
User warning: Trying to overwrite a cache redirect for "entity_view:block:olivero_primary_local_tasks:[languages:language_interface]=en:[route]=entity.node.canonical02d270e527ee2db38f14edbd305a4fea048b224ab212dd8a734fefc49e4a04ab:[theme]=olivero:[user.permissions]=060b2fe6874d87de20e63c5747f201eba0ae8bc76218084bdefae863baed5349" with one that has nothing in common, old one at address "languages:language_interface, theme, user.permissions, route" was pointing to "user.roles:authenticated, user", new one points to "user.node_grants:view". in Drupal\Core\Cache\VariationCache->set() (line 185 of core/lib/Drupal/Core/Cache/VariationCache.php).
With both having the same stacktrace of:
Drupal\Core\Cache\VariationCache->set() (Line: 117)
Drupal\Core\Render\RenderCache->set() (Line: 121)
Drupal\Core\Render\PlaceholderingRenderCache->set() (Line: 539)
Drupal\Core\Render\Renderer->doRender() (Line: 203)
Drupal\Core\Render\Renderer->render() (Line: 120)
Drupal\Core\Render\Renderer->Drupal\Core\Render\{closure}() (Line: 593)
Drupal\Core\Render\Renderer->executeInRenderContext() (Line: 119)
Drupal\Core\Render\Renderer->renderInIsolation() (Line: 146)
Drupal\Core\Render\Renderer->doRenderPlaceholder() (Line: 183)
Drupal\Core\Render\Renderer->renderPlaceholder() (Line: 715)
Drupal\big_pipe\Render\BigPipe->renderPlaceholder() (Line: 495)
Drupal\big_pipe\Render\BigPipe->Drupal\big_pipe\Render\{closure}()
Fiber->start() (Line: 502)
Drupal\big_pipe\Render\BigPipe->sendPlaceholders() (Line: 254)
Drupal\big_pipe\Render\BigPipe->sendContent() (Line: 113)
Drupal\big_pipe\Render\BigPipeResponse->sendContent() (Line: 395)
Symfony\Component\HttpFoundation\Response->send() (Line: 20)
So that definitely narrows it down to the olivero_secondary_local_tasks and olivero_primary_local_tasks blocks having the wrong cacheable metadata. Both use the local_tasks_block block plugin.
Debugging on 11.x I was able to reproduce it. It is coming from the local tasks for nodes under certain conditions after the entity status is changed.
At a quick glance, the NodeAccessControlHandler definitely adds the node as a cacheable dependency. So if the status changes, anything that was set in there should get invalidated and the error you report should not occur. I'll see if I can reproduce this locally. Might only be tomorrow or next week, though.
LocalTasksBlock::build() seems to take care with its cacheable metadata, though...
One reason for that is the account parameter, because due to that, you never know if it's really/always for the current user or not. There are issues open about that problem.
Yeah, I know. This is one of them: 🐛 Access result caching per user(.permissions) does not check for correct user Needs work
That shouldn't stop us from fixing all the places where currently cacheable metadata is thrown in the garbage, though. As we are moving pieces to placeholders, these issues are popping up left and right and we should try to put an end to them in bulk. Deprecating a boolean return value from access checks would go a long way already.
Perhaps it would put more pressure on us to finally fix the issue regarding non-current-user access checking. We have one in AccessPolicyProcessor using the account switcher, but that's only possible because said processor's internal cacheable metadata will never bubble up and we know exactly which user we're dealing with. This was by design as I was honestly annoyed with how tightly coupled our access checks are to the user cache contexts (and therefore current user).
For context, the reason it is like this is BC. This was introduced late in the D8 development phase and it wasn't feasible to break all existing code using those APIs.
Yeah I'm aware of the history. I could have phrased the IS as "How is this still a thing", I suppose.
FWIW, I don't really see it as the same thing. entity query forces you to make a decision, $entity->access()->isAllowed() doesn't force you to make a decision or do anything with that it just requires to type out an extra method call.
I disagree here. As long as we allow people to throw out cacheable metadata by using the boolean return value, we are letting them live in perfect ignorance of what their access check really meant.
If we start forcing everyone to use AccessResultInterface instead, it will make people aware of what they're dealing with and hopefully that will lower the barrier for people to properly add their access checks' cacheability to whatever they're building.
You could still get away with just calling ->isAllowed(), sure, but at that point you should at least be aware that you're dealing with a value object which can do more than just tell you whether access is granted. So it would then be fairer to say the developer was aware and failed to add the right cacheable metadata. Right now the default $return_as_object is FALSE, so people don't even see that parameter and might not know they could be asking for a proper value object instead.
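For reference, this is the difference in question, using the existing entity access API:

// Today's default: a plain boolean, and all cacheable metadata is thrown away.
if ($node->access('view')) {
  // ...
}

// Asking for the value object instead keeps the metadata available so it can
// be added to whatever you are building.
$access = $node->access('view', NULL, TRUE);
if ($access->isAllowed()) {
  // ...
}
$cacheability->addCacheableDependency($access);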
Does a follow-up exist?
Not yet, I can probably create one later this week. If others want to beat me to it, go ahead.
Also not sure why this is adding a new entry to the phpstan baseline, is it somehow picking up a pre-existing issue it didn't pick up before?
Yes.
Nic pointed me to this issue and I just wanna slap a big fat +1 on dropping the comment.
An example from my conversion in Group:
/**
 * Implements hook_help().
 */
#[Hook('help')]
public function help(string $route_name, RouteMatchInterface $route_match): string {
  return match ($route_name) {
    '...' => '<p>' . '...' . '</p>',
    default => ''
  };
}
I mean, I think it's clear enough judging from the attribute and method name what we're dealing with.
Having said all of that, I agree that making the docs support attributes is probably the best way forward as more systems can benefit from that now that we're moving so much stuff over to attributes (e.g. entity type definitions).
Here's what I did in Group: 📌 Convert hooks to OOP hooks with attributes Active
I looked at which NAME.api.php file hooks were in and grouped them, unless I only used one hook from said file and it was unlikely I would use more hooks. Example being hook_help() from help.api.php. Those hooks I put into a CoreHooks class instead.
I ended up with:
- CoreHooks
- EntityHooks
- FieldHooks
- FormHooks
- QueryHooks
- ThemeHooks
- UserHooks
UserHooks contains the entity hooks for that entity type along with some user-specific hooks. I only put the generic entity hooks in EntityHooks.
I marked all classes as final and added typehinting for everything.
Should go over each hook manually to see if it still works.
Follow-ups:
- Deprecate _group_relation_type_manager() (v2/3) and remove in v4.
- Add aliases to our services so they can be autowired (v2/3/4)
Should probably kill _group_relation_type_manager() now, but that requires a CR.
I think we should merge changes in this issue that already improves the situation and think about how to slowly kill and replace AccessResult in Drupal 14/15.
Exactly. Make it "less worse" now, make it "great" (i.e. dead?) later.
Because the entity group field is limited to the assignment of a group. I want this system to be able to hand out core permissions based on which group is active. Being able to add a new node is merely one of said permissions, albeit one that we will have to accommodate in the UI.