Skip already migrated rows early (performance improvement)

Created on 11 December 2023, 7 months ago
Updated 19 December 2023, 6 months ago

Problem/Motivation

Sometimes we need to run migrations in chunks and use the --limit option.

The migration source plugin loads the source rows from the beginning and checks if they have already been migrated, if they need to be updated, etc.

Most of the time, we can skip already migrated rows very early, saving time and improving migration performance.

This is the case when all the following conditions are met:
- the row has already been migrated
- it doesn't need an update
- the track changes feature is disabled
- the row is not above the high water mark
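
The four conditions above can be sketched as a single predicate. This is a minimal illustration only: the function name can_skip_early() and its parameters are hypothetical and are not part of the Migrate API; the 'sourceid1' key mirrors the ID map entry used in the patch discussed in the comments.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of Drupal core): decides whether a source
 * row can be skipped before any further processing.
 *
 * @param array $idMap          ID map entry for the row; empty if never migrated.
 * @param bool  $needsUpdate    Whether the row is flagged as needing an update.
 * @param bool  $aboveHighWater Whether the row is above the high water mark.
 * @param bool  $trackChanges   Whether the track changes feature is enabled.
 */
function can_skip_early(array $idMap, bool $needsUpdate, bool $aboveHighWater, bool $trackChanges): bool {
  // Assumption: a non-empty first source ID means the row was already migrated.
  $alreadyMigrated = !empty($idMap['sourceid1']);

  // All four conditions must hold for the early skip to be safe.
  return $alreadyMigrated && !$needsUpdate && !$aboveHighWater && !$trackChanges;
}

// A migrated, unchanged row can be skipped.
$skipMigrated = can_skip_early(['sourceid1' => '42'], false, false, false);
// A row with no ID map entry has never been migrated and must be processed.
$skipNew = can_skip_early([], false, false, false);
// With track changes enabled, every row must be loaded and compared.
$skipTracked = can_skip_early(['sourceid1' => '42'], false, false, true);
```

If any single condition fails, the row falls through to the normal processing path, so the check can only save work and never changes which rows end up migrated.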

✨ Feature request
Status

Needs work

Version

11.0 🔥

Component
Migration

Last updated about 16 hours ago

Created by

🇨🇿 Czech Republic martin_klima

  • Needs issue summary update

    Issue summaries save everyone time if they are kept up-to-date. See Update issue summary task instructions.

  • Needs tests

    The change is currently missing an automated test that fails when run with the original code, and succeeds when the bug has been fixed.

  • Needs subsystem maintainer review

    It is used to alert the maintainer(s) of a particular core subsystem that an issue significantly impacts their subsystem, and their signoff is needed (see the governance policy draft for more information). Also, if you use this tag, make sure the issue component is set to the correct subsystem. If an issue significantly impacts more than one subsystem, use needs framework manager review instead.

Comments & Activities

  • Issue created by @martin_klima
  • Status changed to Needs work 7 months ago
  • 🇺🇸 United States cilefen
  • 🇨🇿 Czech Republic martin_klima

    I checked the reason for the failed test.

    Spell-checking found 2 issues:
    SourcePluginBase.php:416:36 - Unknown word (sourceid)
    SourcePluginBase.php:418:25 - Unknown word (Highwater)

    in this part of the code:

          if (!empty($row->getIdMap()['sourceid1'])
            && !$row->needsUpdate()
            && !$this->aboveHighwater($row)
            && !$this->trackChanges) {
            continue;
          }
    

    I have no idea how to fix it. The flagged items are not words: one is an array key and the other is a method name.
    Do you have any hint about what I can do, @cilefen?

  • Status changed to Needs review 7 months ago
  • 🇨🇿 Czech Republic martin_klima

    I am saving the current patch for reference.

  • Environment: PHP 8.1 & MariaDB 10.3.22
    last update 7 months ago
    Custom Commands Failed
  • Status changed to Needs work 6 months ago
  • 🇺🇸 United States smustgrave

    Thanks for reporting.

    The issue summary appears to be missing several sections; I recommend using the standard issue template. It will probably also need statistics to show the performance improvement.

    New feature will require test coverage.

    All of this will most likely need submaintainer approval.

    Thanks.

  • 🇨🇦 Canada Charlie ChX Negyesi 🍁Canada

    The spelling errors can be skipped either by adding the words to core/misc/cspell/dictionary.txt or by using a cspell:ignore line, as core/modules/filter/src/Plugin/migrate/process/FilterID.php does, for example. I am not familiar with current best practice on which of the two is preferred, but a) core/modules/migrate/tests/src/Kernel/HighWaterTest.php already has a Highwater ignore line, and b) repeated ignores do not seem to be a problem, because there are multiple sourceid ignores in migrate. So I think you should ignore them specifically instead of editing the dictionary.
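
    For illustration, an in-file ignore of the two flagged words might look like the sketch below. The cspell:ignore comment is cspell's in-document directive; the function first_source_id() is a hypothetical stand-in for code that uses the flagged identifier and is not taken from the patch or from SourcePluginBase.php.

```php
<?php

// cspell:ignore sourceid Highwater

/**
 * Hypothetical stand-in for code that reads the flagged array key.
 *
 * Returns the first source ID from an ID map entry, or NULL if the row
 * has no entry yet.
 */
function first_source_id(array $idMap): ?string {
  return $idMap['sourceid1'] ?? NULL;
}
```

    The directive applies to the file that contains it, so the shared dictionary stays untouched.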

  • 🇬🇧 United Kingdom jofitz

    Added cspell:ignore, as suggested.

    Status remains at NW because the patch still needs tests, etc.

  • Environment: PHP 8.1 & MariaDB 10.3.22
    last update 6 months ago
    Build Successful