Migrate: Cumulative patch
This is a cumulative patch with further improvements we made while developing a very large migration.
Note: The patch also contains the following patches:
Changes and features:
Add '--refresh-hash-only' option to drush migrate-import command
Refreshes only the change-tracking hashes in the migration map, based on the current data source.
Useful when the source SQL query has changed but the source data has not, and a large number of records are already imported.
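To illustrate why a changed query makes stored hashes stale even when the data is the same, here is a minimal sketch. It assumes the change-tracking hash is an MD5 of the serialized source row; example_row_hash() is a hypothetical stand-in and not part of the patch.

```php
<?php

/**
 * Hypothetical helper: hash a source row (assumed: MD5 of the serialized row).
 */
function example_row_hash(array $row) {
  return md5(serialize($row));
}

// The same record, before and after the source query gained a column.
$before = ['id' => 1, 'title' => 'Job A'];
$after = ['id' => 1, 'title' => 'Job A', 'region' => NULL];

// The extra column changes the serialized row, so the hash differs and the
// record would be treated as changed on the next import.
$changed = example_row_hash($before) !== example_row_hash($after);
print $changed ? "hash changed\n" : "hash unchanged\n";
```

Under that assumption, every already imported record would look changed after a query change, which is the situation a hash-only refresh avoids.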
Support for rollback with --idlist with multi value source keys
Fixes the rollback() method so that --idlist accepts multi-value source keys for rollback operations, for example:
drush mr Job --idlist=keyA1:keyB1,keyA2:keyB2
The same applies to the --exclude-idlist option:
drush mr Job --exclude-idlist=keyA1:keyB1,keyA2:keyB2
It resolves the issue "The --idlist parameter should support multiple source keys on rollback".
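The key format used in the commands above can be read as a comma-separated list of keys, each made of colon-separated parts. A minimal sketch of such parsing follows; example_parse_idlist() is a hypothetical helper, and the ':' default is taken from the examples above rather than from the patch code.

```php
<?php

/**
 * Hypothetical parser: split an --idlist value into arrays of key parts.
 */
function example_parse_idlist($idlist, $separator = ':') {
  $keys = [];
  foreach (explode(',', $idlist) as $item) {
    $item = trim($item);
    // Ignore empty items such as a trailing comma.
    if ($item === '') {
      continue;
    }
    // Each item becomes one source key with one or more parts.
    $keys[] = explode($separator, $item);
  }
  return $keys;
}

print_r(example_parse_idlist('keyA1:keyB1,keyA2:keyB2'));
```

A single-value key such as 42 simply yields a key with one part, so the same parsing would cover both cases.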
Enrich migration progress message
Adds a time estimate to the migration progress message, which is the message controlled by the --feedback option. This lets you see how much time the migration still needs.
Example:
Processed 2547 (0 created, 2547 updated, 0 failed, 0 ignored) in 60 sec (2547/min - EST: 0:39) - continuing with 'Migration1'
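The arithmetic behind such an estimate can be sketched as follows. Both functions are hypothetical and are not the patch's implementation; the rounding and the h:mm output format are assumptions. The input figures come from the progress message above and from the status table example in the next section.

```php
<?php

/**
 * Hypothetical helper: records processed per minute.
 */
function example_rate_per_minute($processed, $elapsedSeconds) {
  if ($elapsedSeconds <= 0) {
    return 0;
  }
  return (int) round($processed * 60 / $elapsedSeconds);
}

/**
 * Hypothetical helper: remaining time as h:mm (assumed format).
 */
function example_estimate($remaining, $perMinute) {
  // No speed data yet means no estimate.
  if ($perMinute <= 0) {
    return '';
  }
  $minutes = (int) round($remaining / $perMinute);
  return sprintf('%d:%02d', intdiv($minutes, 60), $minutes % 60);
}

// 2547 records in 60 seconds.
print example_rate_per_minute(2547, 60) . "/min\n";
// 210948 unprocessed records at 272 records per minute.
print example_estimate(210948, 272) . "\n";
```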
Allow extending status table header and data
You can implement custom addMigrateStatusHeaders() and addMigrateStatusData() methods to add more useful columns to the status table.
Example of results:
 Group: jobs  Total   Imported  Unprocessed  Status  Last imported        agv speed  estimation  info
 Job          210948  0         210948       Idle    2021-06-15 16:50:30  272/min    12:56       tracking changes
Example of methods:
/**
 * Get additional columns to migration status table.
 *
 * @param string $groupName
 *   Name of processing group.
 *
 * @return string[]
 *   List of new header columns.
 */
public function addMigrateStatusHeaders($groupName) {
  return [
    'agv speed',
    'estimation',
    'info',
  ];
}

/**
 * Add data to extended migration status table.
 *
 * @param \Migration $migration
 *   Current migration.
 * @param array $tableRow
 *   Info about a migration passed from drush_migrate_status().
 *   Format: migration name, total, imported, unprocessed, status, last date.
 *
 * @return array
 *   Additional data to migration status row.
 */
public function addMigrateStatusData(Migration $migration, array $tableRow) {
  $trackInfo = $this->getTrackChanges() ? 'tracking changes' : '';
  $speed = $this->getMigrationSpeed($migration);
  // Index 3 means unprocessed records.
  $estimation = $this->getTimeToEndEstimation($tableRow[3], $speed);
  $speedInfo = !empty($speed) ? $speed . '/min' : '';
  return [
    $speedInfo,
    $estimation,
    $trackInfo,
  ];
}
Initial migration run - speed up batching process
When a migration runs with change tracking enabled, all source records are loaded from the beginning, including records that were already migrated successfully. This is a performance killer when a very large migration is run in manual batches, like:
drush mi Migration1 --limit=10000
Setting the 'initial_run' option to TRUE indicates the initial migration run. The option is used to calculate the batch offset: during the initial run, all already migrated records are skipped (not loaded from the SQL data source). A record counts as migrated only if its status is MigrateMap::STATUS_IMPORTED.
Example of usage:
$this->source = new MigrateSourceSQL($this->getSourceQuery(), [], $this->getCountQuery(),
  [
    'map_joinable' => FALSE,
    'batch_size' => 5000,
    'track_changes' => TRUE,
    'initial_run' => TRUE,
    'cache_counts' => FALSE,
  ]);
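The effect of the batch offset can be pictured with a small conceptual sketch. It is not the patch's implementation: example_next_batch() is a hypothetical helper, the source is a plain array instead of an SQL query, and the number of already imported records is passed in directly.

```php
<?php

/**
 * Hypothetical helper: pick the rows a batch would load from the source.
 */
function example_next_batch(array $sourceRows, $alreadyImported, $limit, $initialRun) {
  // On an initial run, start after the rows that are already imported.
  // Otherwise start from the beginning, as change tracking normally does.
  $offset = $initialRun ? $alreadyImported : 0;
  return array_slice($sourceRows, $offset, $limit);
}

$rows = range(1, 10);
// Four rows already imported, batch limit of three.
print implode(',', example_next_batch($rows, 4, 3, TRUE)) . "\n";
print implode(',', example_next_batch($rows, 4, 3, FALSE)) . "\n";
```

With the flag set, the batch begins at the first record that has not been imported yet; without it, the same leading records are loaded again on every batch.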
Add importedOnlyCount() method
The current importedCount() method counts not only imported records but also records marked for updating, so a new method was added that returns the number of imported records only.
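The difference between the two counts can be sketched on an in-memory list of map statuses. The constants and both functions below are hypothetical stand-ins; the numeric values are assumed to mirror the MigrateMap status constants.

```php
<?php

// Assumed to mirror the MigrateMap status constants.
const EXAMPLE_STATUS_IMPORTED = 0;
const EXAMPLE_STATUS_NEEDS_UPDATE = 1;
const EXAMPLE_STATUS_IGNORED = 2;
const EXAMPLE_STATUS_FAILED = 3;

/**
 * Hypothetical: counts imported records plus records marked for updating.
 */
function example_imported_count(array $statuses) {
  $wanted = [EXAMPLE_STATUS_IMPORTED, EXAMPLE_STATUS_NEEDS_UPDATE];
  return count(array_filter($statuses, function ($status) use ($wanted) {
    return in_array($status, $wanted, TRUE);
  }));
}

/**
 * Hypothetical: counts records with the imported status only.
 */
function example_imported_only_count(array $statuses) {
  return count(array_filter($statuses, function ($status) {
    return $status === EXAMPLE_STATUS_IMPORTED;
  }));
}

// Two imported, one marked for update, one failed, one ignored.
$statuses = [0, 0, 1, 3, 2];
print example_imported_count($statuses) . "\n";
print example_imported_only_count($statuses) . "\n";
```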
Add new getters
- isTrackChanges()
- getMultikeySeparator()