The migrate Row class API is incomplete

Created on 19 April 2025, about 2 months ago

Problem/Motivation

There is currently no way to remove an empty destination property from a Row once it has been added. This limits the ability of migration event subscribers to alter the outcome of a migration update.

Steps to reproduce

Create a migration that will null out the contents of a field for a particular entity. Use the PRE_ROW_SAVE event to attempt to undo that, and see that the Row class does not provide a public method to accomplish this.

Proposed resolution

Add public methods to the Row class. Rather than add a method specific to remove an empty property, the MR adds functions that can check for and remove any property, whether it is empty or not, which I think is more useful.

Remaining tasks

None

User interface changes

None

Introduced terminology

None

API changes

New methods added to \Drupal\migrate\Row.

Data model changes

None

Release notes snippet

None

πŸ› Bug report
Status

Active

Version

11.1 πŸ”₯

Component

migration system

Created by

πŸ‡ΊπŸ‡ΈUnited States john.oltman

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Merge Requests

Comments & Activities

  • Issue created by @john.oltman
  • Pipeline finished with Canceled
    about 2 months ago
    Total: 70s
    #477387
  • πŸ‡ΊπŸ‡ΈUnited States john.oltman

    john.oltman β†’ changed the visibility of the branch 3520065-migrate-row-remove-empty to hidden.

  • Pipeline finished with Failed
    about 2 months ago
    Total: 677s
    #477386
  • Pipeline finished with Success
    about 2 months ago
    Total: 1145s
    #477391
  • Pipeline finished with Success
    about 2 months ago
    Total: 550s
    #477403
  • Pipeline finished with Success
    about 2 months ago
    Total: 759s
    #477409
  • πŸ‡ΊπŸ‡ΈUnited States smustgrave

    Small comments. Tagging for sub-maintainer sign off for API changes.

  • Pipeline finished with Success
    about 2 months ago
    Total: 1246s
    #477813
  • πŸ‡ΊπŸ‡ΈUnited States benjifisher Boston area

    Thanks for suggesting this addition to the Migrate API. I think this is more of a feature request than a bug report, so I am updating the category.

    I am trying to understand the use case for this addition. Is it mostly for migrations that update existing entities? If a migration is creating a new entity, then I think it does not matter whether the Row object has a null property or is just missing that property.

    I also wonder whether we need all four of the new methods. Are hasDestinationOrEmptyProperty() and removeDestinationOrEmptyProperty() really useful? Again, what is the use case?

    For both of those questions, it would help to see a code snippet of how you are using (or plan to use) the new methods.

  • πŸ‡ΊπŸ‡ΈUnited States john.oltman

    Yes, typically an update migration. I have a client with a business rule that states that under certain circumstances, data in an important field should not be altered or removed by a migration import, even if it otherwise would be. This is impossible without patching the Row class. Simply put, the class has two protected variables that work in tandem, destination properties and empty properties. The destination has set, get and remove. The empty only has set and get. It needs remove, just like its sibling. Call it a feature request if you will, but it seems to be an oversight.

    Regarding `hasDestinationOrEmptyProperty` and `removeDestinationOrEmptyProperty`, those are convenience methods - strictly speaking they are not needed, but are very useful when trying to implement the "prevent alteration of field in any way" use case. We might as well add them while we are in there.

  • πŸ‡ΊπŸ‡ΈUnited States john.oltman
  • πŸ‡ΊπŸ‡ΈUnited States john.oltman

    Code snippet in an event subscriber:

    public function onPreRowSave(MigratePreRowSaveEvent $event): void {
      $row = $event->getRow();
      if ($field_name = $this->complexBusinessLogicThatDeterminesWhichFieldNotToAlter($row)) {
        if ($row->hasDestinationOrEmptyProperty($field_name)) {
          $row->removeDestinationOrEmptyProperty($field_name);
        }
      }
    }
    
    public static function getSubscribedEvents(): array {
      return [
        MigrateEvents::PRE_ROW_SAVE => ['onPreRowSave'],
      ];
    }
    
  • πŸ‡ΊπŸ‡ΈUnited States benjifisher Boston area

    Thanks for the feedback.

    I think we should not implement the two convenience methods. There are several reasons:

    The has...() convenience method calls hasDestinationProperty() and (if that returns FALSE) hasEmptyDestinationProperty(). If the code next calls the remove... convenience method, as in your snippet, it calls the same has... method(s). That seems sloppy.

    The method names are long, and they do not adequately describe what the methods do. Anyone reading the code will have to look up what the method does. It is convenient to write, but not to read.

    Adding extra methods means we have to add and maintain test coverage for them.

    In simple cases with content entities, you can get the same effect by adding overwrite_properties to the configuration of the destination plugin. (See Drupal\migrate\Plugin\migrate\destination\EntityContentBase.) I think this covers the most common use cases. I do not want to add convenience methods to the API if they are only going to be used by occasional projects that have especially complex business logic. Those projects can add convenience functions (not methods on the Row class) themselves.

  • πŸ‡ΊπŸ‡ΈUnited States benjifisher Boston area

    On second thought, I do not think we need to add hasEmptyDestinationProperty().

    First reason: it is safe to call removeEmptyDestinationProperty() without first testing hasEmptyDestinationProperty().

    Second reason: if code does need to test, it can just use in_array($field_name, $row->getEmptyDestinationProperties()).

    Source and destination properties need the has... methods because they use the NestedArray methods, but $emptyDestinationProperties is just a flat list of strings.

  • Pipeline finished with Success
    21 days ago
    Total: 572s
    #499274
  • Pipeline finished with Success
    21 days ago
    Total: 455s
    #499279
  • πŸ‡ΊπŸ‡ΈUnited States john.oltman

    Thanks for your review, this is ready for another look.

    I think we should not implement the two convenience methods.

    Good points, I removed those two methods.

    This is one more line, but it is fewer characters.

    That is a nice optimization, I applied your suggestion to the MR.

    On second thought, I do not think we need to add hasEmptyDestinationProperty().

    I kept this method in for these reasons:

    * It causes no harm
    * It provides symmetry with the destination "has" method, and thus contributes to the ability to understand the Row class
    * It provides information hiding of the internal implementation, which is an accepted principal of good API design

    I can take it out if required, but would prefer to keep it if you are amenable on this point.

  • πŸ‡ΊπŸ‡ΈUnited States benjifisher Boston area

    Thanks for the updates.

    I think we can compromise and keep the has... method. After all, it is handy to have it in the tests.

    In addition to the unit test, I would like to add a functional test. Basically, it comes down to the question I asked in Comment #12: I would like an example in a test (as well as an answer here on the issue) to "What is the use case?"

    I think the functional test needs a test module with two migrations. Each migration can use the embedded_data source plugin. The first migration can populate two fields, and the second migration can set both fields to empty. (To be honest, I am not sure what it takes to make that happen. Maybe just foo: null in the embedded data and field_foo: foo in the process section?) Then the module should implement an event listener, like the snippet in Comment #15 and remove just one of the two empty destinations.

    Then the test can run the first migration to create an entity, then run the second migration to update the entity. After all that, assert that one of the two fields is empty and the other is not.

    If I got any of that wrong, then I mis-understood the answer in Comments #13, #15. In that case, it proves that we need the functional test.

  • πŸ‡ΊπŸ‡ΈUnited States john.oltman

    Thank you, I'd be happy to create the functional test you describe. And your understanding is perfect. I'll hop on this and get something posted within the next week.

  • Pipeline finished with Failed
    12 days ago
    Total: 189s
    #506071
  • Pipeline finished with Success
    12 days ago
    Total: 1171s
    #506072
  • Pipeline finished with Success
    12 days ago
    Total: 810s
    #506085
  • πŸ‡ΊπŸ‡ΈUnited States john.oltman

    This worked out nicely. I added the test as a Kernel test since a browser is not needed and kernel use fewer GitLab resources and run much faster than functional. I named the test class generically as RowTest in case we or future maintainers want to add more tests for other methods of the class. For now the existing test validates the destination methods with a focus on the new methods. Back to ready for review.

  • πŸ‡ΊπŸ‡ΈUnited States benjifisher Boston area

    Thanks for adding the test, and +1 for making it a Kernel test. I have just a few suggestions on the MR.

  • Pipeline finished with Success
    10 days ago
    Total: 912s
    #507713
  • πŸ‡ΊπŸ‡ΈUnited States john.oltman

    Good suggestions, I applied all of them. Feels lean and mean, see what you think.

  • πŸ‡ΊπŸ‡ΈUnited States benjifisher Boston area

    Looks great! Thanks for putting up with all my requests for tweaks.

    I sometimes have to remind myself that the T in RTBC stands for "tested".

    For testing purposes, I installed Drupal with the Umami demo profile (and sample content). I then hacked the log process plugin, hard-coding a field name:

      public function transform($value, MigrateExecutableInterface $migrate_executable, Row $row, $destination_property) {
        // Hack for testing purposes.
        if (empty($row->getSourceProperty('field_preparation_time'))) {
          $row->removeEmptyDestinationProperty('field_preparation_time');
        }
        // ...
      }
    

    I created a test module with the following migration:

    id: empty_destination_test
    migration_tags:
      - Test
    label: 'Test removeEmptyDestinationProperty()'
    source:
      plugin: embedded_data
      ids:
        nid:
          type: integer
      data_rows:
        - nid: 9
          field_preparation_time: null
          field_number_of_servings: null
    process:
      field_preparation_time: field_preparation_time
      field_number_of_servings: field_number_of_servings
      nid:
        - plugin: log
          source: nid
    destination:
      plugin: entity:node
      default_bundle: recipe
    migration_dependencies: {}
    

    It calls log after the two numeric fields have been processed: the order matters!

    As expected, when I executed the migration (drush mim empty_destination_test), field_number_of_servings was emptied out and field_preparation_time was untouched:

    MariaDB [db]> SELECT entity_id, field_preparation_time_value
        -> FROM node__field_preparation_time
        -> WHERE entity_id = 9;
    +-----------+------------------------------+
    | entity_id | field_preparation_time_value |
    +-----------+------------------------------+
    |         9 |                           10 |
    +-----------+------------------------------+
    1 row in set (0.001 sec)
    
    MariaDB [db]> SELECT entity_id, field_number_of_servings_value
        -> FROM node__field_number_of_servings
        -> WHERE entity_id = 9;
    Empty set (0.001 sec)
    
  • πŸ‡ΊπŸ‡ΈUnited States john.oltman

    Nice! Thanks @benjifisher!

Production build 0.71.5 2024