Allow a migration to be imported concurrently

Created on 21 November 2022, almost 2 years ago
Updated 17 August 2023, about 1 year ago

Problem/Motivation

The import() method of MigrateExecutable contains a check that stops a migration from running if it is already running. I'm sure it is there for a valid reason, such as keeping things predictable and making sure the same items aren't operated on simultaneously. However, it also means that individual source items from the same migration can't be processed concurrently. For example, I have an API that requires me to request a separate URL for every single product. Ideally, I would like to put each URL into a queue as its own item and process the queue items concurrently to speed up the migration. I'm not really sure how, or if, this could work, but I thought I'd put it up for discussion. The check in question:

  /**
   * {@inheritdoc}
   */
  public function import() {
    // Only begin the import operation if the migration is currently idle.
    if ($this->migration->getStatus() !== MigrationInterface::STATUS_IDLE) {
      $this->message->display($this->t('Migration @id is busy with another operation: @status',
        [
          '@id' => $this->migration->id(),
          '@status' => $this->t($this->migration->getStatusLabel()),
        ]), 'error');
      return MigrationInterface::RESULT_FAILED;
    }
    // (Rest of import() omitted from this excerpt.)
  }

Steps to reproduce

N/A

Proposed resolution

Add a new allow_concurrency key to migration definition files that disables the idle-status check. It defaults to false, so current behavior is preserved.

Remaining tasks

User interface changes

API changes

A new allowsConcurrency() method is added to MigrationInterface.
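
To make the proposal concrete, here is a minimal, self-contained sketch of how the idle check could consult the new method. It is not the code from the merge request: SketchMigration and can_begin_import() are hypothetical stand-ins invented for illustration, and only the allow_concurrency key and the allowsConcurrency() name come from this issue.

```php
<?php

declare(strict_types=1);

// Stand-ins for the status constants on MigrationInterface.
// The integer values are illustrative only.
const STATUS_IDLE = 0;
const STATUS_IMPORTING = 1;

/**
 * Hypothetical stand-in for a migration plugin; not Drupal's Migration class.
 */
final class SketchMigration {

  public function __construct(
    private int $status,
    // Assumed to be populated from the allow_concurrency key in the
    // migration definition; FALSE preserves the current behavior.
    private bool $allowConcurrency = FALSE,
  ) {}

  public function getStatus(): int {
    return $this->status;
  }

  // Mirrors the proposed allowsConcurrency() method.
  public function allowsConcurrency(): bool {
    return $this->allowConcurrency;
  }

}

/**
 * Returns TRUE if an import would be allowed to begin.
 */
function can_begin_import(SketchMigration $migration): bool {
  // Proposed behavior: skip the idle check when concurrency is allowed.
  if ($migration->allowsConcurrency()) {
    return TRUE;
  }
  // Current behavior: only begin if the migration is idle.
  return $migration->getStatus() === STATUS_IDLE;
}
```

With the default of FALSE, a busy migration is still rejected exactly as it is today; only migrations that opt in bypass the check.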

Data model changes

N/A

Release notes snippet

✨ Feature request
Status

Closed: works as designed

Version

11.0 🔥

Component
Migration

Last updated about 11 hours ago

Created by

achap 🇦🇺


Comments & Activities

  • last update about 1 year ago
    Custom Commands Failed
  • @achap opened merge request.
  • last update about 1 year ago
    Custom Commands Failed
  • last update about 1 year ago
    29,959 pass
  • achap 🇦🇺

    While this approach allows migrations to run concurrently, the status becomes useless, because the value is stored in key-value storage with the migration ID as the key. So even if you instantiated a migration for each individual item, the status would just keep being overwritten and would not be very helpful. I wonder if there is a way to have a UUID for each instance returned by createInstance()?

    In the Migration plugin class, these two methods use the plugin ID:

      /**
       * {@inheritdoc}
       */
      public function setStatus($status) {
        \Drupal::keyValue('migrate_status')->set($this->id(), $status);
      }
    
      /**
       * {@inheritdoc}
       */
      public function getStatus() {
        return \Drupal::keyValue('migrate_status')->get($this->id(), static::STATUS_IDLE);
      }
    
  • Status changed to Closed: works as designed about 1 year ago
  • achap 🇦🇺

    So I quickly realized that this wasn't going to work, or would at least be very difficult, and that the way I was going about it was sending me down a rabbit hole. Stepping back, I realized that I can achieve the same result by using the deriver key in my migration and passing in the product IDs I need to generate migration plugins for via the MemoryCache service. That way, each product ID gets its own migration plugin, which can be added to a queue. The queue can then process the migrations concurrently!

  • achap 🇦🇺

    Discovered another issue with the derivative approach. It creates a mapping table for each derivative migration, so I think that also won't work, as I would end up with thousands of new tables...
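
The status problem raised in the comments above can be illustrated with a small, self-contained sketch. Nothing here is Drupal code: SketchKeyValue stands in for the migrate_status key-value collection, SketchMigrationInstance stands in for a migration plugin instance, and the per-instance identifier is a hypothetical answer to the "UUID for each instance" question, not something core provides.

```php
<?php

declare(strict_types=1);

/**
 * In-memory stand-in for the 'migrate_status' key-value collection.
 */
final class SketchKeyValue {

  /** @var array<string, int> */
  private array $values = [];

  public function set(string $key, int $value): void {
    $this->values[$key] = $value;
  }

  public function get(string $key, int $default): int {
    return $this->values[$key] ?? $default;
  }

}

/**
 * Hypothetical stand-in for a migration instance.
 */
final class SketchMigrationInstance {

  // Illustrative values only.
  public const STATUS_IDLE = 0;
  public const STATUS_IMPORTING = 1;

  // Hypothetical per-instance identifier; an assumption for this sketch.
  private string $instanceId;

  public function __construct(
    private SketchKeyValue $store,
    private string $pluginId,
    private bool $keyByInstance = FALSE,
  ) {
    $this->instanceId = bin2hex(random_bytes(16));
  }

  // Builds the storage key: the plugin ID alone (as in the methods quoted
  // above) or the plugin ID plus the per-instance identifier.
  private function statusKey(): string {
    return $this->keyByInstance
      ? $this->pluginId . ':' . $this->instanceId
      : $this->pluginId;
  }

  public function setStatus(int $status): void {
    $this->store->set($this->statusKey(), $status);
  }

  public function getStatus(): int {
    return $this->store->get($this->statusKey(), self::STATUS_IDLE);
  }

}
```

When two instances of the same plugin ID share a store and are keyed by plugin ID only, whichever calls setStatus() last determines what both report. With per-instance keys each instance keeps its own status, although anything that looks up status by plugin ID alone would then need some other way to find it.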
