Cumulative improvements

Created on 28 June 2021, about 4 years ago

Updated 11 May 2023, about 2 years ago

Migrate: Cumulative patch

This is a cumulative patch with further improvements we made while developing a very large migration.

Note: The patch also includes the following patches:

- Add an option to exclude records from import and rollback (Needs review), https://www.drupal.org/project/migrate/issues/3183347
- Flush queued messages for skipped row when trackChanges is active (Needs review), https://www.drupal.org/project/migrate/issues/3188939

Changes and features:

Add '--refresh-hash-only' option to drush migrate-import command

Refreshes only the change-tracking hashes in the migration map, based on the current data source.
Useful when the source SQL query has changed but the source data has not, and a large number of records are already imported.
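
To illustrate the idea, here is a minimal standalone sketch, not the patch's code. It assumes the tracking hash is an md5 of the serialized source row and that the map can be pictured as an array of hashes keyed by source ID. The function names sketch_row_hash() and sketch_refresh_hashes() are hypothetical.

```php
<?php

/**
 * Hypothetical helper: hash a source row for change tracking.
 *
 * Assumption: md5 of the serialized row. The patch may hash differently.
 */
function sketch_row_hash(array $row) {
  return md5(serialize($row));
}

/**
 * Hypothetical helper: recompute stored hashes without importing anything.
 *
 * @param array $mapHashes
 *   Stored hashes keyed by source ID (stand-in for the migration map).
 * @param array $sourceRows
 *   Current source rows keyed by source ID.
 *
 * @return array
 *   The map hashes, recomputed for rows that were already mapped.
 */
function sketch_refresh_hashes(array $mapHashes, array $sourceRows) {
  foreach ($sourceRows as $sourceId => $row) {
    // Only rows already present in the map are touched.
    if (array_key_exists($sourceId, $mapHashes)) {
      $mapHashes[$sourceId] = sketch_row_hash($row);
    }
  }
  return $mapHashes;
}

// The query now returns an extra column, so the stored hash is stale
// even though the underlying record did not change.
$map = [101 => sketch_row_hash(['id' => 101, 'title' => 'A'])];
$rows = [
  101 => ['id' => 101, 'title' => 'A', 'region' => 'EU'],
  102 => ['id' => 102, 'title' => 'B', 'region' => 'US'],
];
$refreshed = sketch_refresh_hashes($map, $rows);
```

After the refresh, record 101 no longer looks changed on the next import, and the unmapped record 102 is still left for a regular import.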

Support for rollback with --idlist with multi value source keys

Fix for the rollback() method. It allows you to specify --idlist with multi-value keys for rollback operations, for example:

drush mr Job --idlist=keyA1:keyB1,keyA2:keyB2

The same works for the --exclude-idlist option:

drush mr Job --exclude-idlist=keyA1:keyB1,keyA2:keyB2
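
How such a value breaks down into keys can be sketched as follows. This is only an illustration under assumptions: the function name sketch_parse_idlist() is hypothetical, and the ':' separator is taken from the examples above (the patch exposes the real one through getMultikeySeparator()).

```php
<?php

/**
 * Hypothetical parser for an --idlist value with multi-value keys.
 *
 * @param string $idlist
 *   Comma-separated list of source IDs, e.g. "keyA1:keyB1,keyA2:keyB2".
 * @param string $separator
 *   Separator between the parts of one multi-value key (assumed ':').
 *
 * @return array
 *   One array of key parts per source ID.
 */
function sketch_parse_idlist($idlist, $separator = ':') {
  $keys = [];
  foreach (explode(',', $idlist) as $item) {
    $item = trim($item);
    // Skip empty entries such as a trailing comma.
    if ($item === '') {
      continue;
    }
    $keys[] = explode($separator, $item);
  }
  return $keys;
}

$parsed = sketch_parse_idlist('keyA1:keyB1,keyA2:keyB2');
// $parsed is [['keyA1', 'keyB1'], ['keyA2', 'keyB2']].
```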

It resolves the issue "The --idlist parameter should support multiple source keys on rollback".

Enrich migration progress message

Adds a time estimate to the migration progress message, i.e. the message controlled by
the --feedback option. It tells you roughly how much longer the migration will take.

Example:
Processed 2547 (0 created, 2547 updated, 0 failed, 0 ignored) in 60 sec (2547/min - EST: 0:39) - continuing with 'Migration1'
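
The arithmetic behind such an estimate can be sketched as below. This is an assumption-based illustration, not the patch's implementation: both function names are hypothetical, and the hours:minutes format with rounding to the nearest minute is inferred from the status table example in this post (210948 unprocessed records at 272/min shown as 12:56).

```php
<?php

/**
 * Hypothetical helper: records per minute from a progress sample.
 */
function sketch_rate_per_minute($processed, $elapsedSeconds) {
  if ($elapsedSeconds <= 0) {
    return 0;
  }
  return (int) round($processed * 60 / $elapsedSeconds);
}

/**
 * Hypothetical helper: remaining time as "H:MM".
 *
 * Assumption: rounded to the nearest minute; empty string if speed unknown.
 */
function sketch_estimate($unprocessed, $perMinute) {
  if ($perMinute <= 0) {
    return '';
  }
  $minutes = (int) round($unprocessed / $perMinute);
  return sprintf('%d:%02d', intdiv($minutes, 60), $minutes % 60);
}

// 2547 records in 60 seconds, as in the progress message above.
$rate = sketch_rate_per_minute(2547, 60);
// Numbers from the status table example in this post.
$estimate = sketch_estimate(210948, 272);
```

With these inputs $rate is 2547 and $estimate is '12:56', which matches the figures quoted in the examples.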

Allow extending status table header and data

You can implement your own addMigrateStatusHeaders() and addMigrateStatusData() methods
to add further columns to the status table.

Example of results:

 Group: jobs     Total   Imported  Unprocessed  Status  Last imported        agv speed  estimation  info
 Job             210948  0         210948       Idle    2021-06-15 16:50:30  272/min    12:56       tracking changes

Example of methods:

  /**
   * Get additional columns to migration status table.
   *
   * @param string $groupName
   *   Name of processing group.
   *
   * @return string[]
   *   List of new header columns.
   */
  public function addMigrateStatusHeaders($groupName) {
    return [
      'agv speed',
      'estimation',
      'info',
    ];
  }

  /**
   * Add data to extended migration status table.
   *
   * @param \Migration $migration
   *   Current migration.
   * @param array $tableRow
   *   Info about a migration passed from drush_migrate_status().
   *   Format: migration name, total, imported, unprocessed, status, last date.
   *
   * @return array
   *   Additional data to migration status row.
   */
  public function addMigrateStatusData(Migration $migration, array $tableRow) {
    $trackInfo = $this->getTrackChanges() ? 'tracking changes' : '';
    $speed = $this->getMigrationSpeed($migration);
    // Index 3 means unprocessed records.
    $estimation = $this->getTimeToEndEstimation($tableRow[3], $speed);
    $speedInfo = !empty($speed) ? $speed . '/min' : '';
    return [
      $speedInfo,
      $estimation,
      $trackInfo,
    ];
  }

Initial migration run - speed up batching process

When a migration runs with change tracking enabled, all source records are loaded from the beginning on every run, including records that were already migrated successfully. This is a performance killer when a very large migration is run in manual batches, like:

drush mi Migration1 --limit=10000

Setting the 'initial_run' option to TRUE indicates that this is the initial migration run. The option is used for calculating the batch offset: during the initial run, all already migrated records are skipped (not loaded from the SQL data source). Only records with the status MigrateMap::STATUS_IMPORTED count as migrated.

Example of usage:

$this->source = new MigrateSourceSQL($this->getSourceQuery(), [], $this->getCountQuery(),
    [
      'map_joinable' => FALSE,
      'batch_size' => 5000,
      'track_changes' => TRUE,
      'initial_run' => TRUE,
      'cache_counts' => FALSE,
    ]);
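
One way to picture the offset calculation is the standalone sketch below. It is not the patch's code and rests on several assumptions: the source query has a stable ordering, the imported records form a prefix of that ordering, and the constant value 0 mirrors MigrateMap::STATUS_IMPORTED. The names sketch_initial_run_offset() and sketch_next_batch() are hypothetical.

```php
<?php

// Assumption: mirrors the value of MigrateMap::STATUS_IMPORTED.
const SKETCH_STATUS_IMPORTED = 0;

/**
 * Hypothetical helper: number of source rows to skip on an initial run.
 *
 * @param array $mapStatuses
 *   Map status per source ID (stand-in for the migration map table).
 */
function sketch_initial_run_offset(array $mapStatuses) {
  $offset = 0;
  foreach ($mapStatuses as $status) {
    // Only fully imported rows count as migrated.
    if ($status === SKETCH_STATUS_IMPORTED) {
      $offset++;
    }
  }
  return $offset;
}

/**
 * Hypothetical helper: the next manual batch, skipping migrated rows.
 */
function sketch_next_batch(array $orderedSourceIds, array $mapStatuses, $limit) {
  $offset = sketch_initial_run_offset($mapStatuses);
  return array_slice($orderedSourceIds, $offset, $limit);
}

// Ten source rows, the first four already imported by an earlier batch.
$sourceIds = range(1, 10);
$statuses = [1 => 0, 2 => 0, 3 => 0, 4 => 0];
$batch = sketch_next_batch($sourceIds, $statuses, 3);
// $batch is [5, 6, 7].
```

In a real SQL source the skip would presumably happen in the query itself, so the already migrated rows are never fetched.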

Add importedOnlyCount() method

The current importedCount() method counts not only imported records but also records marked for updating, so a new method was added that returns the number of imported records only.
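
The difference between the two counts can be sketched on a plain array of map statuses. This is an illustration only: the class and function names are hypothetical, and the constant values are assumed to mirror the MigrateMap status constants.

```php
<?php

/**
 * Hypothetical stand-in for the MigrateMap status constants (assumed values).
 */
final class SketchMapStatus {
  const IMPORTED = 0;
  const NEEDS_UPDATE = 1;
  const IGNORED = 2;
  const FAILED = 3;
}

/**
 * Counts imported rows plus rows marked for updating.
 */
function sketch_imported_count(array $statuses) {
  $wanted = [SketchMapStatus::IMPORTED, SketchMapStatus::NEEDS_UPDATE];
  return count(array_filter($statuses, function ($status) use ($wanted) {
    return in_array($status, $wanted, TRUE);
  }));
}

/**
 * Counts rows that are imported and not marked for updating.
 */
function sketch_imported_only_count(array $statuses) {
  return count(array_filter($statuses, function ($status) {
    return $status === SketchMapStatus::IMPORTED;
  }));
}

$statuses = [
  SketchMapStatus::IMPORTED,
  SketchMapStatus::IMPORTED,
  SketchMapStatus::NEEDS_UPDATE,
  SketchMapStatus::FAILED,
];
$imported = sketch_imported_count($statuses);
$importedOnly = sketch_imported_only_count($statuses);
```

Here $imported is 3 and $importedOnly is 2, because one record is marked for updating.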

Add new getters

- isTrackChanges()
- getMultikeySeparator()

✨ Feature request

Status: RTBC

Version: 2.0

Component: Code

Created by: 🇨🇿 Czech Republic, martin_klima

Comments & Activities

Core: 7.x + Environment: PHP 7.4 & MySQL 8
last update about 2 years ago
14 pass
Core: 7.x + Environment: PHP 8.2 & MySQL 8
last update about 2 years ago
14 pass
Comment about 2 years ago →
🇨🇦Canada joseph.olstad
retriggered tests
Status changed to RTBC about 2 years ago, 7:12pm 11 May 2023
Comment about 2 years ago →
🇨🇦Canada joseph.olstad
automated tests are passing for multiple PHP versions
Core: 7.x + Environment: PHP 8.1 & MySQL 5.7
last update about 2 years ago
14 pass
Core: 7.x + Environment: PHP 8.0 & MySQL 5.7
last update about 2 years ago
14 pass
Core: 7.x + Environment: PHP 7.3 & MySQL 8
last update about 2 years ago
14 pass
