Entity::save removes the entity from the static cache, resulting in incoherence for all the entity references still in the cache

Created on 17 July 2018
Updated 23 April 2025

Overview

I have noticed a behaviour of the Drupal 8 Entity system that is rather confusing. When an entity (e.g. node1) is saved using Entity::save(), it is evicted from both the static and the persistent cache. Other entities (e.g. node2) that reference that entity through an entity reference field, and that have already been loaded, still hold a reference to the object that has just been evicted. This leads to an incoherent state: any further loads of the referencing entity (node2) keep returning a reference to the already-evicted object. The incoherence becomes visible if we load the referenced entity again and modify it: any changes made to the referenced entity (node1) from then on are not visible through the referencing entity (node2) unless we also evict the referencing entity from the cache.

The problem is better illustrated by the following example.

Example

Model:

Type1:

  • field_integer1: integer

Type2:

  • field_ref_type1: Entity reference to type1

Data:

Nodes:
Node(id=1) of type1
Node(id=2) of type2 with field_ref_type1 referencing Node(id=1)

Code

// type1 node
$n1 = \Drupal::entityTypeManager()->getStorage('node')->load(1);
// type2 node, referencing node 1 through field_ref_type1
$n2 = \Drupal::entityTypeManager()->getStorage('node')->load(2);
$n1->field_integer1 = 100;

print("field_integer1 through n1: {$n1->field_integer1->value}\n");
print("field_integer1 through n2: {$n2->field_ref_type1->entity->field_integer1->value}\n");
print('n1 addr: ' . spl_object_hash($n1) . "\n");
print('n2 addr: ' . spl_object_hash($n2->field_ref_type1->entity) . "\n");

// Saving evicts node 1 from the cache; reload it and change the value in memory.
$n1->save();
$n1 = \Drupal::entityTypeManager()->getStorage('node')->load(1);
$n1->field_integer1 = 200;

print("field_integer1 through n1: {$n1->field_integer1->value}\n");
print("field_integer1 through n2: {$n2->field_ref_type1->entity->field_integer1->value}\n");
print('n1 addr: ' . spl_object_hash($n1) . "\n");
print('n2 addr: ' . spl_object_hash($n2->field_ref_type1->entity) . "\n");

// Reloading node 2 returns the cached object, which still references the old node 1 instance.
$n2 = \Drupal::entityTypeManager()->getStorage('node')->load(2);

print("field_integer1 through n1: {$n1->field_integer1->value}\n");
print("field_integer1 through n2: {$n2->field_ref_type1->entity->field_integer1->value}\n");
print('n1 addr: ' . spl_object_hash($n1) . "\n");
print('n2 addr: ' . spl_object_hash($n2->field_ref_type1->entity) . "\n");

Output

field_integer1 through n1: 100
field_integer1 through n2: 100
n1 addr: 000000004ca1b1840000000013d89c9c
n2 addr: 000000004ca1b1840000000013d89c9c
field_integer1 through n1: 200
field_integer1 through n2: 100
n1 addr: 000000004ca1bcac0000000013d89c9c
n2 addr: 000000004ca1b1840000000013d89c9c
field_integer1 through n1: 200
field_integer1 through n2: 100
n1 addr: 000000004ca1bcac0000000013d89c9c
n2 addr: 000000004ca1b1840000000013d89c9c

Observations

My issue with the above behaviour is that it is not clear how somebody should load an entity, utilising the entity cache whenever possible, without running into the coherency issue above.

If the above synthetic scenario seems far-fetched, think of a reasonably complex backend in which each function takes several entities as arguments. One can opt to pass those arguments either as entity IDs or as entity objects. In the former case, the example above demonstrates that unless the cache is cleared before loading, it is not possible to write a function that always reads the latest value of an entity field, because loading an entity by its ID may return a stale object depending on the code that ran before. In the latter case, some function in the calling hierarchy still has to load the objects by ID for the first time (e.g. from an ID passed in an HTTP request), so the latter case reduces to the former.
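
As a minimal sketch of a possible workaround (not a recommendation, and it only refreshes the object loaded here, not the stale reference already held on node2's field item), the static cache can be bypassed or reset explicitly before reading:

// Bypass the static and persistent caches for a single read.
$storage = \Drupal::entityTypeManager()->getStorage('node');
$fresh = $storage->loadUnchanged(1);

// Or: evict the entity from the static cache first, then load as usual.
$storage->resetCache([1]);
$fresh = $storage->load(1);

print("fresh field_integer1: {$fresh->field_integer1->value}\n");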

Proposal

To my knowledge this behaviour deviates from common practice in other Object/Relational Mapping frameworks (e.g. Hibernate in Java), where within a specific context (session) it is guaranteed that entities are accessed through some sort of proxy object whose id-to-object mapping is constant. This way any entity references are guaranteed to stay valid within that context. To my understanding, such a context is not clearly defined in the Drupal 8 Entity API. The underlying issue is most probably the requirement that Entity::save evict the entity object from the cache. I am not aware of the design constraints that impose this requirement, but if Entity::save did not evict the entity from the cache, this incoherence would not arise.
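
Purely as an illustration of that Hibernate-style guarantee (nothing below exists in core; the class is hypothetical and only covers loads by ID, not references resolved through ->entity), a per-request identity map would hand out the same object for a given ID for the whole context:

// Hypothetical per-request identity map, for illustration only.
class NodeIdentityMap {

  /** @var array<int, \Drupal\node\NodeInterface|null> Loaded nodes keyed by ID. */
  protected array $map = [];

  public function get(int $id): ?\Drupal\node\NodeInterface {
    if (!array_key_exists($id, $this->map)) {
      // First access within this context: load once and keep the instance.
      $this->map[$id] = \Drupal\node\Entity\Node::load($id);
    }
    return $this->map[$id];
  }

}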

Request

Could somebody kindly confirm that the behaviour demonstrated in the example above is expected? If it is, but the example somehow does not follow the Entity API paradigm, could you please indicate ways to avoid the issue without compromising code modularity?

🐛 Bug report
Status

Postponed: needs info

Version

11.0 🔥

Component

entity system

Created by

🇪🇸Spain Jordi Puig

Issue tags

  • Performance


Comments & Activities


  • 🇦🇺Australia acbramley

    "Currently the following code evaluates to FALSE for statically cacheable entity types, but I think that it should evaluate to TRUE both for consistency and performance"

    That would only be the case if nothing at all changed on the entity, right? There is probably a very small set of entity types that this would apply to, and nothing in core afaik.

    You would need to:
    - have no changed date,
    - have no revisioning,
    - have no other data changed between saves.

    I agree that the example in the IS with entity reference fields is probably an edge case that someone could run into, but it seems unlikely in actual runtime code?

  • 🇨🇭Switzerland berdir

    It's not about not having any changes.

    The proposal is essentially to automatically write to the static (and persistent?) entity cache on save, instead of just invalidating it. That would indeed likely result in a performance improvement, but it also comes at a certain risk IMHO. In some cases, we'll never load data from the actual storage, and there *can* be some edge case mismatches. For example, when saving an entity with multiple values in a field that only allows one at the storage level: that's not enforced at runtime, so you can add those values, but they are not persisted. With such a change, those values would remain available for as long as the entity stays cached.

    However, that wouldn't solve the reported issue anyway, because entity reference fields do not use a global context to store their reference; it is stored directly on their field item/property object (see the sketch below). If we want to make this consistent, what we'd have to do is not store the entity on the field item for non-new entities of entity types with a static cache, but that could also have side effects. We cannot know whether the entity that was set there has been changed; it might not be the same instance as the one that was loaded, it could be cloned, and so on. Paragraphs/ERR, for example, have a built-in feature to save referenced entities based on a flag when the host entity is saved.

    Setting back to active, because we don't really need more info, but I'm tempted to make this a won't fix.
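
    A minimal sketch of the point above, reusing $n2 and the field names from the example in the issue summary: the object that ->entity returns is held on the field item itself, so one way to read fresh values is to go back through the stored target ID and the storage.

    $storage = \Drupal::entityTypeManager()->getStorage('node');

    // The instance cached on the field item; it stays the same object even
    // after the referenced node has been saved and evicted elsewhere.
    $held = $n2->field_ref_type1->entity;

    // Re-resolving through the stored target ID reads from storage again,
    // bypassing the static cache and the field item's cached object.
    $fresh = $storage->loadUnchanged($n2->field_ref_type1->target_id);

    print('same instance: ' . var_export($held === $fresh, TRUE) . "\n");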
