Add an option to automatically assign unique ids to rows

Created on 15 September 2023, about 1 year ago
Updated 11 June 2024, 5 months ago

Problem/Motivation

Currently the CSV source plugin requires that ids be set to a unique array of fields in source configuration. These are used to determine whether or not a row is "unique" or not, and therefore whether or not to import it.

I have a situation where I always want my rows to be interpreted as unique. I am using Migrate Source UI β†’ , and I want to be able to upload the same CSV multiple times without changing it, and I want new records to be continually produced each time, never matching to ones that were imported previously.

I tried using the create_record_number option:

source:
    plugin: csv
    path: tmp://
    create_record_number: true
    record_number_field: rownum
    ids: [rownum]

This almost works to make rows unique, but importing the same CSV again results in none of them being processed, because they have the same row numbers.

Proposed resolution

I propose that we add a new assign_uuid boolean option to CSV source plugin configurations, which would behave as follows:

If assign_uuid is set to TRUE, then a uuid column will be automatically added to each row, populated with a UUID generated via \Drupal::service('uuid')->generate(), and the ids source configuration will be set to [uuid].

The logic for creating the uuid column and generating the UUID would be in CSV::getGenerator(), alongside the existing logic for generating the record_number_field column. The logic for assigning ids: [uuid] would be in the CSV::_construct() method, before it checks to see if ids is empty.

So, when assign_uuid is set, ids becomes optional.

This would allow for a few interesting possibilities.

Used by itself, assign_uuid: true would accomplish the original goal I described above: the ability to import the same CSV over and over and never find duplicates.

It also would allow someone to create an importer that explicitly adds a uuid column as a normal process pipeline plugin. A CSV that includes UUIDs in that column could be used to update existing records if the uuid column were populated OR create new records if the uuid column were empty.

Notably, the wording "assign UUID" is a bit misleading here, so I'm open to ideas on that, because this does not actually assign the generated UUID to the entity that is created. It only assigns it to the row for uniqueness. This UUID is saved to the migrate_map_* table, but not to the entity itself.

Remaining tasks

  1. Receive/incorporate feedback on the design approach.
  2. Add automated tests.

User interface changes

None.

API changes

None.

Data model changes

None.

✨ Feature request
Status

Needs review

Version

3.0

Component

Code

Created by

πŸ‡ΊπŸ‡ΈUnited States m.stenta

Live updates comments and jobs are added and updated live.
  • Needs release note

    The major change should have a special release note written to summarize the importance of the change. See Write a release note for an issue.

Sign in to follow issues

Merge Requests

Comments & Activities

  • Issue created by @m.stenta
  • Open in Jenkins β†’ Open on Drupal.org β†’
    Core: 9.5.x + Environment: PHP 7.4 & MySQL 8
    last update about 1 year ago
    16 pass
  • Status changed to Needs review about 1 year ago
  • πŸ‡ΊπŸ‡ΈUnited States m.stenta

    Opened a MR for review. It is a relatively simple addition.

  • πŸ‡ΊπŸ‡ΈUnited States m.stenta
  • πŸ‡ΊπŸ‡ΈUnited States m.stenta

    I started to add automated tests for this, but it turned out to be a bit difficult because of #2791041: Migrate source plugins should use dependency injection β†’ .

    Right now my proposed change uses \Drupal::service('uuid') instead of dependency injection because of that, and that makes testing more difficult. I need to learn the proper way to do that...

  • Status changed to Closed: won't fix about 1 year ago
  • πŸ‡ΊπŸ‡ΈUnited States m.stenta

    Update: we are probably not going to end up using this approach after all. It still accomplishes the goal that I described above, so it might be useful for someone else, but it turned out that we needed to do something even more complicated in our context which meant taking a slightly different approach (and extending `CSV` with our own source class).

    So I will close this as "won't fix" - but if anyone else finds it useful and wants to push it forward please feel free to reopen!

  • First commit to issue fork.
  • Pipeline finished with Success
    5 months ago
    Total: 140s
    #196590
  • Status changed to Needs review 5 months ago
  • πŸ‡¬πŸ‡§United Kingdom AndyF

    Thanks, I was looking for just that! I've made some updates:

    • Use create_record_uuid as the boolean and record_uuid_field as the field name for consistency with create_record_number and record_number_field.
    • Don't force the ID to be the UUID. (I figured it's a little more flexible this way.)
    • Add a section to the unit test for this.

    I've updated \Drupal\migrate_source_csv\Plugin\migrate\source\CSV::__construct() in a non-BC way. Per Drupal BC policy I think it's not considered public, and a number of people recommend extending create() instead of construct(), so I thought it might be ok to keep it simpler and not add a deprecation.

    Thanks!

  • Pipeline finished with Success
    5 months ago
    Total: 196s
    #196600
  • πŸ‡¬πŸ‡§United Kingdom AndyF

    We'll want to add something to the release notes about changing the constructor signature.

Production build 0.71.5 2024