referencedEntities: Use loadMultipleRevisions instead of loadRevision

Created on 18 May 2022, about 2 years ago
Updated 20 June 2024, 8 days ago

Problem/Motivation

I have entities with many paragraph references. Using loadMultipleRevisions instead of loadRevision in EntityReferenceRevisionsFieldItemList::referencedEntities improves performance a lot in this situation, since it executes the necessary queries a single time instead of delta+1-times.

Steps to reproduce

Create an entity with an entity_reference_revisions Field (i.e. a paragraph) and many references in it (i.e. 100+).
Watch performance with loadRevision vs loadMultipleRevisions

Proposed resolution

Replace loadRevision with loadMultipleRevisions in EntityReferenceRevisionsFieldItemList::referencedEntities

     // Load and add the existing entities.
     if ($ids) {
       $target_type = $this->getFieldDefinition()->getSetting('target_type');
+      $entities = \Drupal::entityTypeManager()->getStorage($target_type)->loadMultipleRevisions($ids);
       foreach ($ids as $delta => $target_id) {
-        $entity = \Drupal::entityTypeManager()->getStorage($target_type)->loadRevision($target_id);
-        if ($entity) {
-          $target_entities[$delta] = $entity;
+        if (isset($entities[$target_id])) {
+          $target_entities[$delta] = $entities[$target_id];
         }
       }
       // Ensure the returned array is ordered by deltas.

loadMultipleRevisions only exists for RevisionableStorageInterface storages. If that is for some reason a problem one could also check for that

     // Load and add the existing entities.
     if ($ids) {
       $target_type = $this->getFieldDefinition()->getSetting('target_type');
+      $storage = \Drupal::entityTypeManager()->getStorage($target_type);
+      $entities = $storage instanceof RevisionableStorageInterface ? $storage->loadMultipleRevisions($ids) : [];
       foreach ($ids as $delta => $target_id) {
-        $entity = \Drupal::entityTypeManager()->getStorage($target_type)->loadRevision($target_id);
+        $entity = $entities[$target_id] ?? $storage->loadRevision($target_id);
         if ($entity) {
           $target_entities[$delta] = $entity;
         }

Remaining tasks

if, from your point of view, nothing speaks against it, just implement this in your module.

User interface changes

None

API changes

Hopefully none

Data model changes

None

📌 Task
Status

Needs work

Version

1.0

Component

Code

Created by

🇩🇪Germany h1nds1ght

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • 🇪🇪Estonia rang501 Viljandi

    I can add that it seems to work on our projects, no issues so far.

  • 🇨🇭Switzerland Berdir Switzerland

    #12 hasn't been addressed, as suggested there, this should use the optimized approach that assumes that the default revision is referenced and optimize for that and only fallback to loading the revision if that doesn't match.

    loadRevisionMultiple() should only lead to a marginal performance improvement, it just does grouped queries but it does not use a static or persistent cache.

    Also, I recommend to _not_ use this method at all, at least not until it's optimized. just loop over the items, that will be faster assuming you are viewing mostly default revisions, it will do single loads but from cache.

  • 🇮🇳India lhuria94

    Ok we want to use it only specifically JSON:API Search API endpoints, so nothing custom over there.

    And after applying the patch we saw a considerable improvement overall. Like earlier it used to to take 15s and it has gone to half.

  • 🇨🇭Switzerland Berdir Switzerland

    Right, json:api uses that. Not sure if that's a good idea, but out of scope.

    What I'm proposing isn't rocket science, just do a loadMultiple() call first, compare the revision ID and follow-up with a loadRevisionMultiple() for those where it doesn't match. The result, assuming you actually have default revisions, should be a lot better than half of 15s.

  • 🇮🇳India lhuria94

    Gotcha, Thanks for the input. Appreciate it.
    We will try that, compare and contrast and hopefully put up here a optimised patch if it works all good.

  • 🇮🇳India sumit-k

    Hi Team,

    Have done some optimization as suggested by berdir #17. Used loadMultiple(), Earlier API call was taking around 30s, can notice 15-20s improvements after changes. Sharing patch for review.

  • 🇨🇭Switzerland Berdir Switzerland

    You have a typo in there (enity ids), you also need to check the loaded entity if it's matching the revision id.

  • 🇪🇪Estonia rang501 Viljandi

    About #19, doesn't

    $item->entity->id()
    

    cause extra separate entity load?

    Although, it seems to improve saving time, in my case ~200 less db queries (originally ~3600, #7 ~2900, #19 ~2700).

  • 🇨🇭Switzerland Berdir Switzerland

    Yes, that already loads the entity, correct. It should use $item->target_id instead.

  • 🇮🇳India sumit-k

    Thanks for reviewing berdir and rang501. Sharing updated patch. Here trying to load the same entity which has a revision id.

    elseif ($item->target_revision_id !== NULL) {
            $ids[$delta] = $item->target_revision_id;
            $entity_ids[$item->target_revision_id] = $item->target_id;
          }

    Loading the same entity_ids.

    if ($ids) {
          $target_type = $this->getFieldDefinition()->getSetting('target_type');
          $storage = \Drupal::entityTypeManager()->getStorage($target_type);
          $entities = $storage->loadMultiple($entity_ids);
  • 🇪🇪Estonia rang501 Viljandi

    The new patch seem to give same results, dropping from 2900 queries to 2700.

    It seems that it improves validation in ParagraphsLibraryItemHasAllowedParagraphsTypeConstraintValidator which caused 200 queries before because it scans all paragraphs.

  • 🇩🇪Germany h1nds1ght

    loadRevisionMultiple() should only lead to a marginal performance improvement, it just does grouped queries but it does not use a static or persistent cache.
    Also, I recommend to _not_ use this method at all, at least not until it's optimized. just loop over the items, that will be faster assuming you are viewing mostly default revisions, it will do single loads but from cache.

    Well loadRevision executes loadMultipleRevisions, so they share the same drawbacks, except that loadMultipleRevisions on multiple revision ids reduces the amount of queries, which on the other hand has a larger effect the more revisions you reference.

    But using loadMultiple is definitely a point, since it will ensure using the entity cache system, which is unfortunately not available for revisions and I think this is out of scope of this issue.

    So, why not combining both? Loading default revisions with loadMultiple and everything else with loadMultipleRevisions?

  • 🇨🇭Switzerland Berdir Switzerland

    > So, why not combining both? Loading default revisions with loadMultiple and everything else with loadMultipleRevisions?

    That is exactly what I'm saying says, and the patches are moving in that direction, but are not yet correctly checking that.

    The problm is that you don't know if a id/revision id pair is the default revision or not, so it needs to be loaded and then the revision id compared and reloaded if not. See #12/15.

  • 🇩🇪Germany h1nds1ght

    OK, created another patch. The problem is a little bit that loadMultiple has entity_ids as keys and we cannot rely on that entity_id = revision_id (the last patch did so). This makes it somewhat uncomfortable but one can resolve that with another loop. I checked for revision_ids now, and not for isDefaultRevision, since I think we are interested in revisions here and each revision_id should be unique per storage and should belong to a specific entity_id. But maybe this assumption is wrong. If so we need to add the check for isDefaultRevision.

    Also I am not sure if the checks for RevisionableInterface and RevisionableStorageInterface are really necessary. If we can forget about RevisionableStorageInterface we can also remove the last loadRevision.

    One thing that comes in my mind are deleted revisions. With that approach a deleted revision might involve 3 db queries.

  • 🇨🇭Switzerland Berdir Switzerland
    +++ b/src/EntityReferenceRevisionsFieldItemList.php
    @@ -20,26 +21,36 @@ class EntityReferenceRevisionsFieldItemList extends EntityReferenceFieldItemList
    +        if ($entity instanceof RevisionableInterface && in_array($entity->getRevisionId(), $revision_ids)) {
    +          unset($ids[$entity->getRevisionId()]);
    +          $entities[$entity->getRevisionId()] = $entity;
    +        }
    +      }
    

    you need to unset in $revision_ids not the $ids?

  • 🇩🇪Germany h1nds1ght

    Can't see why. Note that i renamed the variable $ids to $revision_ids in the first loop. $revision_ids now holds the delta => revision_id relationship for the field list and $ids is a mapping of revision_id => entity_id, which is only used as temporary variable for loadMultiple and loadMultipleRevisions.

    If I remove entries from $revision_ids they are also removed from the final field list.

  • 🇨🇭Switzerland Berdir Switzerland

    The logic there is to not load the revisions of references that we matched, $ids is not used anymore after that. Right now you still load all revisions again?

  • 🇩🇪Germany h1nds1ght

    Not sure if I get your point.

    $ids is a mapping of revision_id => entity_id. The loadMultiple call should therefore load each entity in its current revision. The corresponding loaded revision_id of the loaded entity is removed from $ids and the entity aka current revision is added to $entities.

    +      foreach ($storage->loadMultiple($ids) as $entity) {
    +        if ($entity instanceof RevisionableInterface && in_array($entity->getRevisionId(), $revision_ids)) {
    +          unset($ids[$entity->getRevisionId()]);
    +          $entities[$entity->getRevisionId()] = $entity;
    +        }
    +      }
    

    $ids is then used one line after that again:

    +      $entities += $storage instanceof RevisionableStorageInterface ? $storage->loadMultipleRevisions(array_flip($ids)) : [];
    

    and loads the remaining revisions, which haven't been loaded in the first step by loadMultiple (the array_flip passes the remaining revision ids). This should be in most cases an empty array, except if the current revision of an loaded entity does not equal an referenced revision. The call for loadMultipleRevisions should then deliver an array with revisions keyed by revision_id, and the result is then added to $entities, which should add the missing revisions since it is also an array of revisions keyed by revision_id.

    Finally the collected $entities (aka revisions keyed by revision_id) is used for the field list, with a fallback to the default loadRevision (= current solution).

    +      foreach ($revision_ids as $delta => $revision_id) {
    +        $entity = $entities[$revision_id] ?? $storage->loadRevision($revision_id);
    

    As mentioned this last loadRevision seems somehow unnecessary.

    Maybe this can be written more reasonable, but I can't see what should be wrong with that approach.

  • 🇪🇪Estonia rang501 Viljandi

    Is this still relevant after 🐛 referencedEntities() causes data loss Needs review ?

  • 🇩🇪Germany h1nds1ght

    Yes I think so, because the topic here is that EntityReferenceRevisionsFieldItemList::referencedEntities loads referenced entities using single entity loads, which still happens via $item->entity.

      public function referencedEntities() {
        $target_entities = [];
        foreach ($this->list as $delta => $item) {
          if ($item->entity) {
            $target_entities[$delta] = $item->entity;
          }
        }
        return $target_entities;
      }
    

    EntityReferenceFieldItemList::referencedEntities i.e. uses loadMultiple...

Production build 0.69.0 2024