File migration slows down and eats more and more memory, eventually stops

Created on 16 March 2016, almost 9 years ago
Updated 25 June 2024, 6 months ago

Problem/Motivation

I am attempting to migrate about 300,000 files from Drupal 7. During the migration import, I hit this:

Memory usage is 435.23 MB (85% of limit 512 MB), reclaiming memory. [warning]
Memory usage is now 439.99 MB (86% of limit 512 MB), not enough reclaimed, starting new batch [warning]

What's interesting is that on the first run I got about 70,000 files in one go before hitting the wall, then it halved, then it halved again, and now I'm down to fewer than 1,000 per run before hitting the memory limit.

Proposed resolution

Figure out what's causing the memory usage to be so high.

Remaining tasks

  1. Figure out what the problem is
  2. Write Patch

User interface changes

N/A

API changes

N/A

Data model changes

N/A

🐛 Bug report
Status

Needs work

Version

11.0 🔥

Component
Migration

Last updated about 12 hours ago

Created by

🇺🇸 United States davidwbarratt

  • Performance

    It affects performance. It is often combined with the Needs profiling tag.

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • 🇫🇷 France andypost

    I re-rolled 📌 Move memory management from MigrateExecutable to an event subscriber (Needs review).

    But is it still a blocker?

  • 🇷🇺 Russia qzmenko Novosibirsk

    This is still a problem, but in our case for nodes migration.

    We need to migrate ~2 million nodes. At the beginning of the migration, ~10 nodes per second are imported. After 50k imported nodes, the speed becomes ~2 nodes per second.

    I tried changing the batch_size in the migration, but it did not affect the migration speed at first glance.

  • 🇪🇸 Spain fjgarlin

    I'm affected by this as well, in this case with a user migration (around 2 million users). Memory keeps creeping up.

    I've tried different options with no luck. The latest thing I am trying comes from this article: the script it suggests runs the migration in a loop using the --limit option.

    This is currently running, so I don't know the result yet. It's still not ideal, because even with the --limit option the import still tries to do some gathering of the previous runs.

    For example, if I run drush migrate:import my_user_migration --limit 100, the output the first time would be

      Migration my_user_migration [100 inserted, 0 updated...]
    

    But then, on the second run, drush migrate:import my_user_migration --limit 100, the output would be

      Migration my_user_migration [0 inserted, 0 updated...]
      Migration my_user_migration [100 inserted, 0 updated...]
    

    Note the 0 inserted, 0 updated.
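
    The loop idea can be sketched roughly as follows. This is a minimal sketch, not the script from the article: run_batches is a hypothetical helper (not part of Drush), and the run count of 20000 simply assumes about 2 million rows at 100 rows per run. Each drush invocation is a separate PHP process, so memory is released between batches, but every run still passes over the rows imported by earlier runs, as in the output above.

    ```shell
    # run_batches MAX_RUNS COMMAND [ARGS...]
    # Hypothetical helper: runs COMMAND up to MAX_RUNS times, each time as a
    # fresh process, and stops at the first run that exits with an error.
    run_batches() {
      _rb_max="$1"
      shift
      _rb_run=1
      while [ "$_rb_run" -le "$_rb_max" ]; do
        echo "Batch ${_rb_run} of ${_rb_max}"
        "$@" || return 1
        _rb_run=$((_rb_run + 1))
      done
    }

    # Example call (assumed numbers: ~2,000,000 rows / 100 rows per run):
    # run_batches 20000 drush migrate:import my_user_migration --limit 100
    ```

    The fixed run count is a deliberate simplification; it avoids parsing drush output to detect when the migration is finished.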

    --

    I even tried a postSave event subscriber where I'd clear some caches, but it still made no difference. This is what I tried:

          // '@config.factory', '@entity.memory_cache', '@entity_type.manager'
    
          $this->memoryCache->deleteAll();
          $this->configFactory->clearStaticCache();
          // Entity storage can blow up with caches so clear them out.
          foreach ($this->entityTypeManager->getDefinitions() as $id => $definition) {
            $this->entityTypeManager->getStorage($id)->resetCache();
          }
    
  • 🇨🇭 Switzerland berdir Switzerland

    There might be some other module that keeps things in memory, due to post processing.

    resetCache() is a persistent cache clear, so it's fairly expensive and adds costs of its own. It will not add anything useful on top of the @entity.memory_cache deleteAll() call, which you do as well.

    However, that can only clear the usage of those objects within the entity storage; if anything else holds on to these objects, they will remain in memory. It is pretty much impossible to say what that would be in your case; it would probably require some kind of profiling with xhprof, blackfire or something like that. If it is specific to users, you could try to look for user presave/insert/update hook implementations.

  • heddn Nicaragua

    For a 2M user migration, I stripped down the user source plugin so it only pulls back uids. Then I moved the actual gathering of data into prepareRow(). It had an amazing effect on the speed and memory usage of the user migration. By default the user source does what is essentially a select * from users. What you want is something more like select uid from users.

  • 🇪🇸 Spain fjgarlin

    @heddn - this is the migration and plugin that I am using:
    - Migration: https://git.drupalcode.org/project/drupalorg_migrate/-/blob/1.0.x/migrat...
    - User plugin: https://git.drupalcode.org/project/drupalorg_migrate/-/blob/1.0.x/src/Pl...

    So, your suggestion would be to override the User::query method to:

      public function query() {
        return $this->select('users', 'u')
          ->fields('u', ['uid'])
          ->condition('u.uid', 0, '>');
      }
    

    Then in the prepareRow, do you do:
    - A select * from users where uid=$uid
    - And then several $row->setSourceProperty for each property?

    I am going to try the above locally but wanted to also ask about the approach to make sure I understood you correctly.

  • 🇪🇸 Spain fjgarlin

    For what it's worth, I am not seeing any significant increase in speed after doing the above.

    Before the change it was around 1100 records per minute.
    After the change it seems to be around 1150 records per minute.

    But this difference might just be the output number of the migration or me just looking a second late/early.

    The code I did:

      public function query() {
        // Query by UID earlier to speed up queries.
        return $this->select('users', 'u')
          ->fields('u', ['uid'])
          ->condition('u.uid', 0, '>');
      }
    
      public function prepareRow(Row $row) {
        // Try to determine early if this row needs to be skipped.
        $prepare_row = SourcePluginBase::prepareRow($row);
        if ($prepare_row) {
          $uid = $row->getSourceProperty('uid');
    
          // Set all properties here as we only queried by UID earlier.
          $row_data = $this->select('users', 'u')
            ->fields('u')
            ->condition('u.uid', $uid)
            ->execute()
            ->fetchAssoc();
          foreach ($row_data as $field => $value) {
            $row->setSourceProperty($field, $value);
          }
    
          return parent::prepareRow($row);
        }
    
        return FALSE;
      }
    
    
  • heddn Nicaragua

    Speed should be about the same, especially at the beginning of the migration. But by the time you get to the 1M row mark, your memory usage should be in a better place. That's where this alternative approach (which you outlined well) really starts to shine.

  • 🇪🇸 Spain fjgarlin

    Great. Thanks for the info.

    I went ahead and committed the above here: https://git.drupalcode.org/project/drupalorg_migrate/-/commit/044bdebd94... I will trigger the full migration again and monitor things.
