Use loadMultiple() and reset memory cache in ContentEntity source

Created on 14 April 2023, over 1 year ago

Problem/Motivation

For larger datasets, the migration can fail because of full memory.

Steps to reproduce

Proposed resolution

This is useful for a large amount of datasource records using Drupal\migrate_drupal\Plugin\migrate\source\ContentEntity

We propose to use LoadMultiple in \Drupal\migrate_drupal\Plugin\migrate\source\ContentEntity::yieldEntities to reduce the load calls, and to empty the cache regularly to improve memory usage.

This issue is split off from 🐛 Give batch_size feature to Drupal\migrate_drupal\Plugin\migrate\source\ContentEntity so it can scale Postponed which also adds batch size.

Remaining tasks

User interface changes

API changes

Data model changes

Release notes snippet

📌 Task
Status

Needs work

Version

10.1

Component
Migration 

Last updated 2 days ago

Created by

🇨🇭Switzerland mathilde_dumond

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Comments & Activities

  • Issue created by @mathilde_dumond
  • Status changed to Needs review over 1 year ago
  • 🇨🇭Switzerland mathilde_dumond

    Fixed the space issue after the while

  • 🇨🇭Switzerland berdir Switzerland
    +++ b/core/modules/migrate_drupal/src/Plugin/migrate/source/ContentEntity.php
    @@ -181,15 +181,22 @@ protected function initializeIterator() {
    +    $i = 0;
    +    while ($ids_chunk = array_slice($ids, $i, 50)) {
    +      foreach ($storage->loadMultiple($ids_chunk) as $entity) {
    

    maybe add a comment to explain what we are doing?

  • 🇬🇧United Kingdom joachim
    +++ b/core/modules/migrate_drupal/src/Plugin/migrate/source/ContentEntity.php
    @@ -181,15 +181,22 @@ protected function initializeIterator() {
    +    while ($ids_chunk = array_slice($ids, $i, 50)) {
    

    I think this would be easier to read with array_splice() (called with an empty replacement) -- because then there is no $i to keep track ok.

  • 🇨🇭Switzerland berdir Switzerland

    See my comment in the batch size issue. array_splice() is *much* slower, at least on a numeric array because it reorders the array I guess. But yes, explaining exactly that is what belongs in the comment I suggested to add.

  • 🇨🇭Switzerland mathilde_dumond

    Thanks for your feedbacks :) Now with a comment explaining why we chose array_slice, and what we do in the following code

  • 🇧🇷Brazil murilohp
    +++ b/core/modules/migrate_drupal/src/Plugin/migrate/source/ContentEntity.php
    @@ -181,15 +181,26 @@ protected function initializeIterator() {
    +      \Drupal::service('entity.memory_cache')->deleteAll();
    

    Don't we need to inject this service on the class? Or maybe just move it outside the while loop(at least this way we avoid calling the service method on each loop)

  • 🇨🇭Switzerland berdir Switzerland
    +++ b/core/modules/migrate_drupal/src/Plugin/migrate/source/ContentEntity.php
    @@ -181,15 +181,26 @@ protected function initializeIterator() {
    +    $i = 0;
    +    // We choose array_slice over array_splice or array_chunk for performance
    +    // reasons.
    +    // We slice the array of all ids in chunks of 50 to loadMultiple entities
    +    // 50 at a time.
    +    while  ($ids_chunk = array_slice($ids, $i, 50)) {
    +      foreach ($storage->loadMultiple($ids_chunk) as $entity) {
    

    Instead of we something, just write it as a fact, I'd also reverse the two sentences:

    Slice the IDs into chunks of 50 and load them together. array_slice() with an incrementing position is used as it is faster than array_splice() and doesn't use extra memory like array_chunk().

    Also, maybe put the 50 in a $chunk_size variable, then you don't need to repeat it and it would be easier to make it configurable.

  • Status changed to Needs work over 1 year ago
  • The Needs Review Queue Bot tested this issue. It fails the Drupal core commit checks. Therefore, this issue status is now "Needs work".

    This does not mean that the patch needs to be re-rolled or the MR rebased. Read the Issue Summary, the issue tags and the latest discussion here to determine what needs to be done.

    Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.

  • 🇮🇳India Akram Khan Cuttack, Odisha

    added updated patch address #9 and #10 both

  • Status changed to Needs review over 1 year ago
  • 🇮🇳India Akram Khan Cuttack, Odisha
  • 🇧🇷Brazil murilohp
    +++ b/core/modules/migrate_drupal/src/Plugin/migrate/source/ContentEntity.php
    @@ -181,16 +181,27 @@ protected function initializeIterator() {
    +    \Drupal::service('entity.memory_cache')->deleteAll();
    

    My idea was juts to properly inject it, or create a variable to load the entity memory cache and then call the delete the cache inside the loop, I think we still need to clean the cache on every loop to avoid memory errors

    +++ b/core/modules/migrate_drupal/src/Plugin/migrate/source/ContentEntity.php
    @@ -248,7 +259,7 @@ public function count($refresh = FALSE) {
    +    // @todo Determine a better way to retrieve a valid count for translations.
    

    Is this part of the scope of this issue?

    +++ b/core/modules/migrate_drupal/src/Plugin/migrate/source/ContentEntity.php
    @@ -181,16 +181,27 @@ protected function initializeIterator() {
    +    // We choose array_slice over array_splice or array_chunk for performance
    +    // reasons.
    +    // We slice the array of all ids in chunks of $chunk_size to
    +    // loadMultiple entities $chunk_size at a time.
    

    We still need to address the #10 comment here.

    Here's a new patch addressing what I mentioned here and an interdiff.

  • Status changed to Needs work over 1 year ago
  • 🇨🇭Switzerland berdir Switzerland

    Injecting it makes sense, yes, but then you need to do the deprecation dance unfortunately. optional constructor argument, check if it's empty, and then get it from the container and do a deprecation message, like this:

    @trigger_error('Calling ' . __METHOD__ . '() without the $current_user argument is deprecated in drupal:9.2.0 and will be required in drupal:10.0.0.', E_USER_DEPRECATED);
    
Production build 0.71.5 2024