[meta] Preserving auto-increment IDs on migration is fragile

Created on 14 June 2016, over 8 years ago
Updated 10 November 2024, about 1 month ago

Problem/Motivation

A site typically cannot be migrated into D8 in a single run. Most of the site is migrated on one date; then, a few days later, a final cut-over is declared and all content that wasn't migrated previously is caught up. This isn't all that hard to do: run drush ms to see which migrations have unprocessed items, then run those individual migrations again.

There are two problems with this:

1. With the user picture migration, the source database does not have entity IDs, so IDs newly created on the destination site can conflict with those used by other migrations that do have them (like files). This has a dedicated issue at #2826047: 6-8 user picture migration must create new fids, which can conflict with fids used by other migrations.

2. With nodes and similar entities, after the initial migration it's possible for entities to be created on both the source and destination sites. When this happens, IDs can no longer be preserved correctly, since the same numeric ID represents a different entity on each site.
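To make the second problem concrete, here is a toy illustration in Python. It is only a sketch: the Site class and the titles are hypothetical stand-ins, not anything from Drupal. It shows two independent auto-increment counters handing out the same ID once content is created on both sites after the initial run.

```python
class Site:
    """A toy site whose entity IDs come from a private auto-increment counter."""

    def __init__(self):
        self.entities = {}  # entity ID -> title
        self.next_id = 1

    def create(self, title):
        entity_id = self.next_id
        self.entities[entity_id] = title
        self.next_id += 1
        return entity_id


source, destination = Site(), Site()

# Initial migration: IDs are preserved, so both counters end up in step.
for title in ("About us", "Contact"):
    source_id = source.create(title)
    destination.entities[source_id] = title
destination.next_id = max(destination.entities) + 1

# Before the final cut-over, editors keep working on BOTH sites.
new_on_source = source.create("Late news from the old site")
new_on_destination = destination.create("First post on the new site")

# Same numeric ID, two different entities: a later ID-preserving run
# would overwrite the destination's entity with the source's.
collision = new_on_source == new_on_destination
```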

Audiences to consider

  1. Non-technical users of the core UI
  2. Drush novices - simply running an upgrade command
  3. Technical users doing custom migrations

Requirements to consider

  1. Maintain data integrity
  2. No surprises
  3. Provide a path forward
  4. Preserve URLs (e.g., node/17).
  5. Minimize technical debt.
  6. Minimize effort to implement.

Proposed resolution

Possible blockers:

#2818147: Use Migration process plugin in *all* places
#2890690: MigrateLookup plugin has inconsistent return values.
🐛 MigrationLookup doesn't create stub when there's multiple migrations. (Needs work)
📌 Migration Lookup plugin does not return multiple values when matched (Needs work)

Possible solutions

Pre-requisite: thorough documentation on drupal.org, and a clear description on the preparatory screen of the core UI, of the risks associated with maintaining IDs and of the approaches for avoiding them (for example, avoid adding manual content until after all migration is done, or do a custom migration which does not preserve IDs). This is necessary regardless of any additional technical approaches.

Documentation only
We rely entirely on documenting the issues.

  1. Maintain data integrity
    Risk: People don’t pay attention to the documentation and screw themselves.
  2. No surprises
    Risk: If they’ve ignored the documentation, the results will be very surprising.
  3. Provide a path forward
    In the documentation, if they go back and read it.
  4. Preserve URLs (e.g., node/17).
    Yes.
  5. Minimize technical debt.
    No new code to maintain.
  6. Minimize effort to implement.
    No new code to implement.

Pre-upgrade audit
See #2876085: Before upgrading, audit for potential ID conflicts
Run a process before the actual import which identifies conflict risks - cases where an ID in the content to be imported matches an ID that already exists in the destination.

  1. Maintain data integrity
    If this is a required step, the user can choose whether to proceed and overwrite any conflicts, or take an alternative approach (per the docs).
  2. No surprises
    They are told exactly what content could be affected by conflicts.
  3. Provide a path forward
    Yes.
  4. Preserve URLs (e.g., node/17).
    Yes.
  5. Minimize technical debt.
    Audit code should be reasonably isolated and maintainable.
  6. Minimize effort to implement.
    We’d first need to do what the import process normally does: identify and instantiate the necessary migrations. For each migration, we need to identify the destination ID field, see if that field is mapped in the process section, look for any destination ID values which are not in the migration’s map table, and see if any of *those* IDs are in the source. Oh, but we need to consolidate this across all migrations to the same entity type… Ugh.
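The audit logic described above boils down to a few set operations. The following Python sketch is illustrative only: MigrationInfo and audit_conflicts are hypothetical names, not Drupal APIs, and it assumes every listed migration maps the destination ID field so that source IDs are preserved as destination IDs.

```python
from dataclasses import dataclass


@dataclass
class MigrationInfo:
    """Hypothetical stand-in for one migration; not a Drupal class."""
    entity_type: str
    source_ids: set              # IDs present in the source
    mapped_destination_ids: set  # destination IDs recorded in the map table


def audit_conflicts(migrations, destination_ids_by_type):
    """Return {entity_type: [IDs]} where a source ID would overwrite
    destination content that no migration created."""
    conflicts = {}
    for entity_type, destination_ids in destination_ids_by_type.items():
        # Consolidate across all migrations to the same entity type.
        relevant = [m for m in migrations if m.entity_type == entity_type]
        mapped = set().union(*(m.mapped_destination_ids for m in relevant))
        incoming = set().union(*(m.source_ids for m in relevant))
        # Destination IDs that are not in any map table were created manually.
        unmigrated = set(destination_ids) - mapped
        overlap = unmigrated & incoming
        if overlap:
            conflicts[entity_type] = sorted(overlap)
    return conflicts


# Example: node/17 was created manually on the destination, and the
# source has since gained its own node 17.
migrations = [
    MigrationInfo("node", {1, 2, 3, 17}, {1, 2, 3}),  # e.g. articles
    MigrationInfo("node", {4, 5}, {4, 5}),            # e.g. pages
    MigrationInfo("file", {11}, set()),
]
destination_ids_by_type = {"node": {1, 2, 3, 4, 5, 17}, "file": {10}}
report = audit_conflicts(migrations, destination_ids_by_type)
```

In this example the report flags only node 17. File 10 also exists only on the destination, but no incoming source ID collides with it.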

Manipulate auto-increment values
See #2876086: Manipulate auto-increment values to avoid conflicts
The assumption here is that the migration is run on a clean (content-free) site, and that each migration, when complete, sets the auto-increment value of “the” table for the destination entity a “safe” distance beyond the highest migrated ID.

  1. Maintain data integrity
    Maintains integrity, with the small risk of underestimating how far to bump the auto-increment (too much content added on the source system).
  2. No surprises
    Yes.
  3. Provide a path forward
    No path forward needed.
  4. Preserve URLs (e.g., node/17).
    Yes.
  5. Minimize technical debt.
    API addition to database system. Complex and fragile implementation (see below).
  6. Minimize effort to implement.
    Needs a database schema API for retrieving/setting auto-increment. For each migration, we need to identify the destination ID field, see if it’s being mapped, and figure out what table contains the auto-increment ID.
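The arithmetic behind the bump is simple; the hard part is the per-backend database API, which this sketch deliberately leaves out. The function names and the default margin below are assumptions chosen for illustration, not values proposed in this issue.

```python
def next_auto_increment(migrated_ids, margin=1000, current=1):
    """Pick an auto-increment value a 'safe' distance beyond the highest
    migrated ID, without ever lowering the current value."""
    highest = max(migrated_ids, default=0)
    return max(current, highest + margin + 1)


def margin_was_sufficient(new_auto_increment, highest_source_id_at_cutover):
    """True if every ID the source created before cut-over still falls
    below the range handed out on the destination."""
    return highest_source_id_at_cutover < new_auto_increment


bumped = next_auto_increment([1, 2, 17], margin=100)  # 17 + 100 + 1
ok = margin_was_sufficient(bumped, highest_source_id_at_cutover=60)
too_small = not margin_was_sufficient(bumped, highest_source_id_at_cutover=130)
```

The last line shows the risk noted under requirement 1: if the source gains more content than the margin allows for, the ranges overlap again.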

Identify conflicts during import
See #2876090: Warn if migrated content will overwrite manually-created content
Add a process plugin to vulnerable migrations which checks to see if the ID being migrated already exists on the destination side and was not migrated (is not in the migration map table). If the source entity would overwrite a non-migrated entity, throw an error.

  1. Maintain data integrity
    Destination nodes will not be overwritten. Some source records will not be imported.
  2. No surprises
    If someone hasn’t read the docs, they may be surprised that they only got a partial migration.
  3. Provide a path forward
    Difficult path forward - manually migrate the content that got rejected?
  4. Preserve URLs (e.g., node/17).
    Only for the content which was successfully imported. If you’ve manually created, say, node/17, then node/17 on the destination site will show different content from node/17 on the source site.
  5. Minimize technical debt.
    A fairly straightforward process plugin.
  6. Minimize effort to implement.
    A fairly straightforward process plugin.
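As a rough sketch of the check such a process plugin would perform (written in Python rather than as an actual Drupal plugin; IdConflictError, check_destination_id and run_import are hypothetical names), the logic and its partial-migration side effect look like this:

```python
class IdConflictError(Exception):
    """Raised when a source ID would overwrite non-migrated content."""


def check_destination_id(source_id, destination_ids, mapped_destination_ids):
    """Reject an ID that exists on the destination but is not in the map table."""
    if source_id in destination_ids and source_id not in mapped_destination_ids:
        raise IdConflictError(f"ID {source_id} belongs to non-migrated content")
    return source_id


def run_import(rows, destination, mapped):
    """Import (id, title) rows, skipping any row that fails the check."""
    imported, rejected = [], []
    for source_id, title in rows:
        try:
            check_destination_id(source_id, destination.keys(), mapped)
        except IdConflictError:
            rejected.append(source_id)
            continue
        destination[source_id] = title
        mapped.add(source_id)
        imported.append(source_id)
    return imported, rejected


destination = {1: "About us", 17: "Manually created"}
mapped = {1}  # node 1 came from an earlier migration run
rows = [(1, "About us (updated)"), (17, "Source node 17"), (18, "Source node 18")]
imported, rejected = run_import(rows, destination, mapped)
```

Here nodes 1 and 18 are imported, while source node 17 is rejected and the manually created node/17 survives. That is exactly the partial-migration outcome described under requirements 2 to 4.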

Remaining tasks

User interface changes

API changes

Data model changes

🌱 Plan
Status

Active

Version

11.0 🔥

Component

migration system

Created by

heddn Nicaragua
