Add static cache to Migration FieldableEntity::getFields

Created on 25 January 2022, almost 3 years ago
Updated 7 September 2023, over 1 year ago

Problem/Motivation

When performing a large migration with millions of rows, the getFields() method of the Drupal\migrate_drupal\Plugin\migrate\source\d7\FieldableEntity class is inefficient, because it queries the database on every call.

For example, in the Drupal\user\Plugin\migrate\source\d7\User class, the following occurs in prepareRow().

    // Get Field API field values.
    foreach ($this->getFields('user') as $field_name => $field) {
      // Ensure we're using the right language if the entity and the field are
      // translatable.
      $field_language = $entity_translatable && $field['translatable'] ? $language : NULL;
      $row->setSourceProperty($field_name, $this->getFieldValues('user', $field_name, $uid, NULL, $field_language));
    }

If you have a users table with a million users, you might set a batch size of 10k during a migration. Fetching the million rows of data would then take only 100 SQL queries.

But when prepareRow() is called for each row, getFields() fires another query.

For a table with a million rows, you now incur an additional 1 million SQL queries just to get the field instances.

It's one thing when this occurs in getFieldValues(), since the data will be different per row. (Though there are ways we could optimize that, too.)

But fetching the fields for an entity type/bundle returns the same result every time for a given entity type/bundle combination, so repeating the query gains nothing.

There is no need to incur this overhead on every invocation.

The cost is even higher when the DB is remote, given the latency of the connection.
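
The query counts in the example above work out as follows. This is only a sketch of the arithmetic; the variable names are illustrative and not taken from the migration code.

```php
<?php

$totalRows = 1000000;
$batchSize = 10000;

// One query per batch to fetch the source rows.
$batchQueries = (int) ceil($totalRows / $batchSize);

// Without a cache, getFields() adds one more query for every row.
$getFieldsQueries = $totalRows;

echo $batchQueries, PHP_EOL;     // 100
echo $getFieldsQueries, PHP_EOL; // 1000000
```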

Steps to reproduce

1. Run a migration, large or small.
2. Put a breakpoint in the getFields() method. (Or add a watchdog message, or use whatever method you prefer to track the number of calls.)
3. Observe the number of times this method is called.

Proposed resolution

Add caching, using a class property, to \Drupal\migrate_drupal\Plugin\migrate\source\d7\FieldableEntity::getFields() so that the fields are fetched roughly once per entity type/bundle instead of once per row.

Note that the d6 source plugin, Node, is already using caching for the fields.
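
The idea of caching in a class property can be sketched in a standalone form. This is not the actual Drupal code or the committed patch: the class FieldLookupSketch, the $queryRunner callable, and the $queryCount counter are hypothetical names used only to show the pattern, and the callable stands in for the field instance query.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for a source plugin with a cached getFields().
 */
final class FieldLookupSketch {

  /** Cached results, keyed by entity type and then by bundle. */
  private array $fieldsCache = [];

  /** Number of times the simulated query was executed. */
  public int $queryCount = 0;

  /** @var callable Simulated field instance query (an assumption). */
  private $queryRunner;

  public function __construct(callable $queryRunner) {
    $this->queryRunner = $queryRunner;
  }

  public function getFields(string $entityType, ?string $bundle = NULL): array {
    // A NULL bundle is stored under its own key ('') in the cache.
    $bundleKey = $bundle ?? '';
    if (!isset($this->fieldsCache[$entityType][$bundleKey])) {
      // Cache miss: run the query once and remember the result.
      $this->queryCount++;
      $this->fieldsCache[$entityType][$bundleKey] = ($this->queryRunner)($entityType, $bundle);
    }
    return $this->fieldsCache[$entityType][$bundleKey];
  }

}

// Stand-in query that returns a single made-up field definition.
$lookup = new FieldLookupSketch(
  fn (string $entityType, ?string $bundle): array => ['field_example' => ['translatable' => FALSE]]
);

// Simulate prepareRow() being called for 1000 rows.
for ($i = 0; $i < 1000; $i++) {
  $lookup->getFields('user');
}

echo $lookup->queryCount, PHP_EOL; // 1
```

Because the cache lives on the object, it lasts only as long as the plugin instance, which is enough to cover all rows of a single migration run.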

Type: Task
Status: Fixed
Version: 11.0
Component: Migration
Last updated 3 days ago
Created by TomTech (United States)
