No messages after the migration.

Created on 28 February 2019, over 5 years ago
Updated 1 July 2024, 5 months ago

Problem/Motivation

During the migration the some data can be migrated as failed. I know that one of the migrate plugin throws an exception with a detailed message. But after the migration is finished only summary is presented (X imported, Y failed). The messages are
empty. After debugging I found out that messages are cleared every time in Drupal\migrate\Plugin\migrate\source\SourcePluginBase::next() file core/modules/migrate/src/Plugin/migrate/source/SourcePluginBase.php:

        // Clear any previous messages for this row before potentially adding
        // new ones.
369: if (!empty($this->currentSourceIds)) {
370:   $this->idMap->delete($this->currentSourceIds, TRUE);
371: }

So, every time as the new row will be migrated the next() method will iterate over previous rows and it will clear their messages.

Proposed resolution

I think that the message should be cleared before the row is scheduled to import.

User interface changes

No visible changes.

API changes

None.

Data model changes

None.

πŸ› Bug report
Status

Needs work

Version

11.0 πŸ”₯

Component
MigrationΒ  β†’

Last updated 1 day ago

Created by

πŸ‡ΊπŸ‡¦Ukraine skoro Chernihiv

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Merge Requests

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • πŸ‡­πŸ‡ΊHungary mxr576 Hungary
  • πŸ‡­πŸ‡ΊHungary mxr576 Hungary
  • Status changed to Needs work over 1 year ago
  • πŸ‡ΊπŸ‡ΈUnited States smustgrave

    This issue is being reviewed by the kind folks in Slack, #needs-review-queue-initiative. We are working to keep the size of Needs Review queue [2700+ issues] to around 400 (1 month or less), following Review a patch or merge request β†’ as a guide.

    Adding specific steps would help to see the issue.

    Hiding the files as the fix is in the MR (which needs updated too)

    Tagging for subsystem review to see if they have any thought about it too.

  • Attached a copy of the latest static patch for reference purposes, and it seems to be an improvement over the existing behaviour on 10.0.x, when used with the migmag_process module.

  • πŸ‡¨πŸ‡ΏCzech Republic martin_klima

    Thank you for the patch, but it does not work well if "Tracking changes" is enabled. There is a logic problem when the record was previously migrated and is stored in idMap.

    Before deleting a record in the migration messages table, it tests if the record has been changed ($this->rowChanged($row)). However, at this point, we cannot evaluate if the record has been changed because there is no 'original_hash' in the 'idMap'.

          // Here $this->rowChanged($row) is always TRUE for existing records.
          if (!empty($this->currentSourceIds) &&
              (!$row->getIdMap() || $row->needsUpdate() || $this->aboveHighWater($row) || $this->rowChanged($row))) {
            $this->idMap->delete($this->currentSourceIds, TRUE);
          }
    

    The original hash is stored a few lines later, at the end of the prepareRow() method, where the rehash() method is called. The current logic always evaluates an existing record in the migration map as changed, because there is no original hash. See \Drupal\migrate\Row::changed and \Drupal\migrate\Row::rehash.

    Thus, it deletes the migration messages for such a record, even if it is later skipped, because on the second test with the $this->rowChanged($row) method, the original hash is already present and the record is correctly evaluated as unchanged. As a result, the record is ignored as unchanged, but all previous migration messages for the record are deleted.

          // Original hash is stored in the idMap at the end of prepareRow($row).
    
          // Preparing the row gives source plugins the chance to skip.
          if ($this->prepareRow($row) === FALSE) {
            continue;
          }
    
          // Here $this->rowChanged($row) works correctly.
    
          // Check whether the row needs processing.
          // 1. This row has not been imported yet.
          // 2. Explicitly set to update.
          // 3. The row is newer than the current high-water mark.
          // 4. If no such property exists then try by checking the hash of the row.
          if (!$row->getIdMap() || $row->needsUpdate() || $this->aboveHighWater($row) || $this->rowChanged($row)) {
            $this->currentRow = $row->freezeSource();
          }
    

    We can solve the problem at least in 2 ways:
    1) Remove $this->rowChanged($row) from the 1st condition.
    2) Modify \Drupal\migrate\Row::changed so that it only evaluates the record as changed if there is the original hash in idMap.

      public function changed() {
        $original_hash = $this->idMap['original_hash'] ?? NULL;
        $current_hash = $this->idMap['hash'] ?? NULL;
        // Until rehash() we cannot say the row has changed,
        // because the new hash was not calculated yet.
        if (is_null($original_hash)) {
          return FALSE;
        }
        return $original_hash != $current_hash;
      }
    

    Both ways work, but I'm not sure which approach is better and safer.

    Steps to reproduce the problem:
    1) Use a migration with trackChanges=TRUE
    2) Migrate some records which generate some migration messages (e.g. drush mim migration_name --limit=10).
    3) Check migration messages in the DB table or via drush mmsg migration_name
    4) Repeat step 2
    5) Repeat step 3. You can see the previous messages are gone.

  • πŸ‡¨πŸ‡ΏCzech Republic martin_klima

    Attached a copy of the latest patch for reference purposes.

  • Re #26, this is currently untested "pseudo-code", but wouldn't the following work a bit better rather than not checking if the row has been changed and always keeping the previous messages?

    if (!empty($this->currentSourceIds)) {
      $clear_messages = (!$row->getIdMap() || $row->needsUpdate() || $this->aboveHighWater($row));
      if (!$clear_messages && $this->trackChanges) {
        if (empty($row->getIdMap()['original_hash'])) {
          // Attempt to calculate the appropriate hash before checking if the row has been changed 
          // since it might be empty.
          // The only disadvangtage with this approach is that if for whatever reason, a module alters the 
          // source ID in the "migrate_prepare_row" hook called by ::prepareRow(), then it might always 
          // trigger as changed. But that's an edge case and more of an issue with the source plugin.
          $row->rehash();
        }
        $delete_source_ids = $this->rowChanged($row);
      }
      if ($clear_messages) {
        $this->idMap->delete($this->currentSourceIds, TRUE);
      }
    }
    

    We could either use this approach or update Row::changed() so that it recalculates the hash if original_hash is NULL, mitigating this issue.

    Might be worth applying this approach and seeing if it addresses your use case. Not sure if rehashing can cause other issues.

  • πŸ‡¨πŸ‡ΏCzech Republic martin_klima

    Re #28
    My approach is not ideal because it prefers to keep messages in case they can be deleted (better to keep than delete).
    But your suggestion will not work either.

    The problem cannot be easily solved because the steps must be executed in a certain order, but this order cannot be defined due to constraints.

    1) Deleting messages cannot be before $row->changed() because we don't know whether the record will be processed.
    2) We cannot check $row->changed() before the end of prepareRow(), because source data is not stable yet, e.g. hook_migrate_prepare_row can modify the source data.
    3) We cannot delete messages after prepareRow(), because prepareRow (and hooks and child prepareRow methods) can save some messages.

    Where to delete messages if we know that:
    - Deleting messages cannot be before $row->changed()
    - Deleting messages cannot be after $row->prepareRow()
    - $row->changed() cannot be before row->prepareRow()

    This logic is unresolvable.

    The only idea I have is:

    - Backup messages for the current row
    - Delete messages for the current row
    - Test if the row will be processed
    - If the row will not be processed ($this->currentRow == NULL), restore messages back.

  • Status changed to Needs review 12 months ago
  • πŸ‡¨πŸ‡ΏCzech Republic martin_klima

    In the last commit, I reverted the 1st condition, so migration messages for a current row are deleted in the original way. But they are stored in the backup variable before deleting. If the row will not be processed (was migrated before and does not need update and is not above the highwater mark and was not changed), all deleted messages are saved back from the backup.

    In this way, we don't need to solve the logic problem with impossible checking if the row has changed before processing prepareRow(). For details, see #29.

    Tested with $trackChanges = TRUE;

  • last update 12 months ago
    30,330 pass, 2 fail
  • last update 12 months ago
    30,330 pass, 2 fail
  • πŸ‡¨πŸ‡ΏCzech Republic martin_klima

    Fixed undefined variable.

  • last update 12 months ago
    30,342 pass
  • Status changed to Needs work 11 months ago
  • πŸ‡ΊπŸ‡ΈUnited States mikelutz Michigan, USA

    This needs to have the merge request updated to 11.x, or a new MR created against 11.x

    My only concern with the approach is that the prepare row method and hooks can also write messages to the table, and if they aren't explicitly skipping the row by returning false then any messages created by the row preparing will start accumulating. Maybe we should check whether any messages exist for the row before we restore instead of just checking if the row will be skipped due to track changes/highwater/needs update etc.

    I was also a little worried about performance, I ran some tests on a 40k row migration without the map table joined. The migration was completed, so no rows were actually processed. I inserted a message for each row and ran a few times with and without the patch. I was seeing ~15seconds or 6% additional runtime on average with the extra database trips per row. It's not insignificant, but it could be worse, it extrapolated to an extra 6ish minutes on a million row run that would have taken an hour and a half already, so maybe it's worth providing a way to opt out? I'd approve the MR without an opt-out, but I'll let others weigh in.

  • πŸ‡¬πŸ‡§United Kingdom joekers UK

    Thanks @martin_klima for the info and patch - I'm in the same situation as you with tracking changes.

    The patch is working well for me but the performance is a bit concerning, though it's more important that I have the messages kept as I run a set of migrations daily and need to know when there are issues.

  • First commit to issue fork.
Production build 0.71.5 2024