Entity lazy multiple front loading

Created on 3 August 2011, over 13 years ago
Updated 30 December 2024, 8 days ago

Problem/Motivation

Drupal 7 introduced multiple loading of entities and the controller classes. This means that especially when building lists of content, the best case can be a single multiget from cache to get all the fully populated entity objects (when using entitycache module) or that all nodes are loaded with a flat number of database queries regardless of the number of nodes. This improved on the situation in Drupal 6 where each load was loaded individually, and each node_load() could execute a decent number of database queries each.

For this to work, there is a general pattern of "get the IDs, multiple load the nodes based on the IDs, do stuff with them" - this is great for content listing, however it does not work if for example you need to load 7 different nodes in seven different blocks - those still get grabbed one at a time.

For the WSCCI iniatiative (especially after the irc meeting last night and discussions in #1177246: Context values: Keys or objects? ), there is a desire to have the publicly accessible context properties be either literals or objects. In the case of entities, this would be $context['node'] or $context['user'] most likely.

There is also a desire to improve block caching, so that from just context, you can build a cache key and load a block from cache (core already allows you to do this via drupal_render(), #cache, and pre_render callbacks although it is only partly formalized).

This means that if we just got an $nid from somewhere, use the $nid as part of a cache key, then get a cache hit, we won't actually need to load the node (or a 'parent' page won't need to load the node, but an ESI callback might on a cache miss in a different PHP process altogether).

But... passing around $node objects, and yet only needing $nid to generate a cache key seem mutually exclusive I hear you say!

Proposed resolution

There has been a lot of discussion about making entities into proper classes (rather than stdClass) with methods etc. Additionally, that we should keep the Load controller (and associated other controllers as they come into play) as separate classes to the actual entity class (so $entity->save() calls a method from an EntitySave class, the code wouldn't be baked into the Entity class). Details are not fully worked out yet but that is the general direction.

This means I think we should be able to construct an entity object like this:

$entity = new Entity($type, $id); // $id likely optional so that mock entities can be created, potentially new ones.

This means I can do $node = new Entity('node', $nid); $node->id (or $node->id() or whatever), and that will work fine.

What occurred to me yesterday, is there may well be a way to reconcile this with multiple load, would look something like this:

* When you instantiate the class with an $id, the $id gets added to the EntityLoad controller - which maintains a list of $ids of that type that are in use.

* (two options), when you call $entity->load(), or just access a property that is missing (triggering __get() or similar), the $entity object calls back to the EntityLoad controller to fetch the actual loaded node. I am not yet tied to the magic method vs. the explicit ->load() method for this, the internal implementation in terms of this issue could be very similar.

* When ->load() (either directly called or via magic) asks the EntityLoad controller, it inspects the list of entity IDs that it has, vs. the ones that are already loaded. At this point, it is able to multiple load all of the nodes that weren't loaded already - so they're ready for later when requested. (we could also add a method or property to disable this behaviour, allow the default to be overridden or similar).

This would give us the following:

- for lists, multiple load works more or less the same as now, except you could just foreach over $nids, instantiate the class and load it - no get $nids, load, then foreach pattern. So it should be a simpler for people who just want to load nodes and do stuff with them. The front-loading would be encapsulated in the class logic rather than a pattern that has to be copy/pasted around.

- for non lists (like multiple blocks on a page with different nodes based on different relationships), if we build the context for the blocks, then go through and execute the block handlers in sequence (either building the actual render array or rendering the string, doesn't matter for this), multiple load will work for this case too, whereas it currently doesn't.

- For sites using ESI/big pipe or similar, processes that don't need to load entities won't do so just to generate cache keys or similar.

- There are advantages for doing things like mocking the context object too (i.e. a block callback never has to call $node = node_load($nid); it just acts on a class that was passed into it), so dependency injection, consistency etc. It keeps us closer to menu_get_object() and passing parameters vs. node_load(arg(1));

- This approach should very compatible with adding an LRU cache for entities, see #375494: Restrict the number of nodes held in the node_load() static cache and #1199866: Add an in-memory LRU cache .

Remaining tasks

No patch yet, we really need to get #1018602: Move entity system to a module in.

User interface changes

None.

API changes

Probably.

📌 Task
Status

Active

Version

11.0 🔥

Component

entity system

Created by

🇬🇧United Kingdom catch

Live updates comments and jobs are added and updated live.
  • Performance

    It affects performance. It is often combined with the Needs profiling tag.

  • Needs issue summary update

    Issue summaries save everyone time if they are kept up-to-date. See Update issue summary task instructions.

Sign in to follow issues

Merge Requests

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • 🇬🇧United Kingdom catch

    Ten years later...

    The main barrier to pursuing this, apart from ::__get() or anything else is that when you call Entity::load()/EntityStorage::load() you either get an entity or FALSE back, you never get an entity object that might or might not exist in the database. That makes the original approach here very hard to reconcile with entity caching - we never want to run a query just to check if we have an entity or not, when we could just get it from cache anyway.

    📌 Try to replace path alias preloading with lazy generation Active has a proof of concept for changing how path alias preloading works, it relies on Fibers which didn't exist until PHP 8.1. By doing so it doesn't rely on any changes to entity classes, magic methods etc. and I think the same thing can work here.

    Let's say we have three independent entity loading calls in different placeholders (or any code that can run inside a Fiber, which will be most of it after 🌱 Adopt the Revolt event loop for async task orchestration Active .

    // Placeholder 1.
    Node::load(4);
    // Placeholder 2.
    Node::load(3);
    // Placeholder 3.
    Node::loadMultiple(1, 2, 3);
    

    With the Fibers approach, when each of these calls happen, we can check the static cache, and return as normal in those cases.

    When we get a cache miss, we can add the entity ID to a class property that holds 'a list of entities that need to be loaded' (i.e. haven't been returned from static cache), and suspend the Fiber.

    So during the page request, node 4 gets added to the class property first, the fiber is suspended, we go to the next placeholder, same thing happens for node 3, go to the third placeholder, 1 and 2 are added (3 is already in there).

    Then we run out of placeholders to loop through and end up back at the first one. We now have nodes 1, 2, 3, and 4 all in the 'to be loaded' property, then we can run the rest of multiple loading - first check the persistent cache, then the database. When we reach the later placeholders, they check the static cache again, find that the entity was already loaded, and go ahead.

    What would have been three separate database loading operations now gets collapsed into one.

    What this won't affect is the following code in a single placeholder:

    Node::load(1);
    Node::load(2);
    

    e.g. when we Fiber::suspend(), we can only get back to Node::load(1) and complete that before going to Node::load(2).

    But if this happens in two placeholders:

    // Placeholder 1.
    Node::load(4);
    Node::load(3);
    // Placeholder 2.
    Node::load(2);
    Node::load(1);
    

    In this case, the approach would mean that node 4 and 2 are multiple loaded together, and then node 3 and 1 are multiple loaded together, so two multiple loads instead of four single loads.

    And the worst case is that we Fiber::suspend(), there are no other entities of the same type to load, and we end up back where we were - then it's effectively a no-op and will have the overhead of only a couple of cheap function calls.

  • Pipeline finished with Failed
    8 days ago
    Total: 114s
    #381744
  • 🇬🇧United Kingdom catch

    Pushed a branch. It's going to need 📌 Try to replace path alias preloading with lazy generation Active to be able to show any profiling numbers and for manual testing.

    I think kernel test coverage of this should be pretty straightforward. We need a hook_entity_load() test implementation that logs how many times it's called (might already exist), create three entities, then load the three entities inside a fiber each, and see how many times the hook implementation is called.

  • 🇬🇧United Kingdom catch

    📌 Create placeholders for more things Active isn't enough for manual testing at least with stock Umami. I think for it to work, we'd need to be rendering entities in placeholders, then it would help with entity references - that currently doesn't happen. Will open another issue. Can still add test coverage here to demonstrate the concept.

Production build 0.71.5 2024