Do not always update when used with track_changes: true

Created on 27 January 2023, about 2 years ago
Updated 11 July 2024, 9 months ago

Problem/Motivation

When track_changes: true is set in the migration, items that were already imported are always updated, even if they didn't change.

Steps to reproduce

- Create a migration with track_changes: true and a migrate_source_queue source
- Create items in the source queue
- Run the migration: drush migrate:import my_migration => the items are created
- Add the same items to the source queue again, with the same data
- Run the migration again: drush migrate:import my_migration => the items are updated but they shouldn't since they didn't change

Proposed resolution

I have no idea so far. track_changes: true seems to hash the whole queue item, which includes the unchanged data but also item_id and created, both of which change every time an item is queued. Because those two values are part of the hash computation, the row is treated as changed even when data did not change.
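To make the effect concrete, here is a minimal sketch in Python (used purely for illustration; the module itself is PHP). The item values are made up, and hashing a JSON dump is an assumption of this sketch rather than what migrate actually does. It only shows that two queue items with identical data end up with different hashes once item_id and created are included.

```python
import hashlib
import json


def row_hash(source: dict) -> str:
    """Hash the whole source array, volatile keys included."""
    # JSON with sorted keys stands in for the real serialization here;
    # that choice is an assumption made for this sketch only.
    payload = json.dumps(source, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Two queue items carrying identical data, queued at different times.
# All values below are hypothetical.
first = {
    "data": {"url": "https://example.com/a", "name": "A"},
    "item_id": "101",
    "created": "1700000000",
}
second = {
    "data": {"url": "https://example.com/a", "name": "A"},
    "item_id": "102",
    "created": "1700000600",
}

data_unchanged = first["data"] == second["data"]
hash_changed = row_hash(first) != row_hash(second)
```

Here data_unchanged and hash_changed are both true, which matches the behavior described: nothing meaningful changed, but the hash did.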

I quickly tried removing them from the data passed to migrate, but I got stuck because migrate needs item_id to remove the item from the queue.
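One possible way around that constraint, sketched in Python for illustration only, is to leave the item untouched and drop the volatile keys from a copy used just for hashing. The names VOLATILE_KEYS and stable_hash, the sample values, and the JSON-based hashing are all assumptions of this sketch, not existing API.

```python
import hashlib
import json

# Keys that change on every enqueue (hypothetical constant for this sketch).
VOLATILE_KEYS = ("item_id", "created")


def stable_hash(source: dict) -> str:
    """Hash a filtered copy of the source; the original dict is not modified."""
    filtered = {k: v for k, v in source.items() if k not in VOLATILE_KEYS}
    payload = json.dumps(filtered, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Hypothetical queue items: same data, different item_id and created.
first = {
    "data": {"url": "https://example.com/a", "name": "A"},
    "item_id": "101",
    "created": "1700000000",
}
second = {
    "data": {"url": "https://example.com/a", "name": "A"},
    "item_id": "102",
    "created": "1700000600",
}

same_hash = stable_hash(first) == stable_hash(second)
# item_id is still on the item, so it can still be used to delete it
# from the queue after processing.
still_has_id = second["item_id"] == "102"
```

With this approach the hash only reflects the payload, while item_id remains available for queue cleanup.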

Remaining tasks

- Find out if it's possible
- Code it

User interface changes

N/A

API changes

N/A

Data model changes

N/A

Feature request
Status

Needs work

Version

1.0

Component

Code

Created by

🇫🇷France GuillaumeDuveau Toulouse

Comments & Activities

  • Issue created by @GuillaumeDuveau
  • First commit to issue fork.
  • Status changed to Needs review over 1 year ago
  • 🇧🇪Belgium dieterholvoet Brussels

    I changed the implementation so that only keys listed in the fields option of the source plugin are used to calculate the hash. That excludes keys like item_id and created, so it should fix your issue.

  • Pipeline finished with Skipped
    about 1 year ago
    #80867
  • Status changed to Fixed about 1 year ago
  • Automatically closed - issue fixed for 2 weeks with no activity.

  • 🇫🇷France GuillaumeDuveau Toulouse

    Hi, sorry for getting back to this so late. I tried your new version a few months ago and tried it again now. I'm still running into issues, however.

    It looks like the migrations are not properly processed:
    - The content is not created;
    - The leases are created but not removed.

    I'm not sure why. Here's what has been working for me since I posted the issue:
    - migrate_source_queue:1.0.0
    - core patch attached... I found no other way.

  • Status changed to Needs work 9 months ago
  • 🇫🇷France GuillaumeDuveau Toulouse

    Hi Dieter,

    Scratch that, the issue with items not being released seems to occur when I first import content with the core patch and then import it again without the patch, or the other way around.

    Still, the real issue is that, despite your patch, items added to the migration queue are still updated each time, even if they are unchanged.

    In Drupal\migrate\Row::rehash(), where I added my Drupal core patch, I dumped the source without and with your commit:

    Without:

    array(11) {
      ["data"]=>
      array(2) {
        ["url"]=>
        string(76) "REDACTED"
        ["name"]=>
        string(53) "REDACTED"
      }
      ["created"]=>
      string(10) "1720703551"
      ["item_id"]=>
      string(8) "21746920"
      ["url"]=>
      string(76) "REDACTED"
      ["plugin"]=>
      string(5) "queue"
      ["queue_name"]=>
      string(13) "dr_crm.medias"
      ["track_changes"]=>
      bool(true)
      ["fields"]=>
      array(2) {
        [0]=>
        string(3) "url"
        [1]=>
        string(4) "name"
      }
      ["keys"]=>
      array(1) {
        ["url"]=>
        array(1) {
          ["type"]=>
          string(6) "string"
        }
      }
      ["name"]=>
      string(53) "REDACTED"
      ["target_bundles"]=>
      array(0) {
      }
    }
    

    With:

    array(11) {
      ["data"]=>
      NULL
      ["created"]=>
      NULL
      ["item_id"]=>
      NULL
      ["url"]=>
      string(76) "REDACTED"
      ["plugin"]=>
      NULL
      ["queue_name"]=>
      NULL
      ["track_changes"]=>
      NULL
      ["fields"]=>
      NULL
      ["keys"]=>
      NULL
      ["name"]=>
      string(53) "REDACTED"
      ["target_bundles"]=>
      array(0) {
      }
    }

    In both cases, with my patch it does not matter whether item_id and created are present: they are removed before hashing, so the hash always stays the same and the content is not updated each time.

    Without my patch and with your commit, the hash should also stay the same, yet the content is still updated each time. My guess is that your commit removes too much from the row source, which may affect another part of the process that decides whether an update is needed, so the update happens even though the hash is unchanged. Maybe track_changes?
