- Merge request !2957Issue #3036368: No messages after the migration. β (Open) created by GuyPaddock
- Status changed to Needs work
over 1 year ago 4:27pm 2 March 2023 - πΊπΈUnited States smustgrave
This issue is being reviewed by the kind folks in Slack, #needs-review-queue-initiative. We are working to keep the size of Needs Review queue [2700+ issues] to around 400 (1 month or less), following Review a patch or merge request β as a guide.
Adding specific steps would help to see the issue.
Hiding the files as the fix is in the MR (which needs updated too)
Tagging for subsystem review to see if they have any thought about it too.
Attached a copy of the latest static patch for reference purposes, and it seems to be an improvement over the existing behaviour on 10.0.x, when used with the
migmag_process
module.- π¨πΏCzech Republic martin_klima
Thank you for the patch, but it does not work well if "Tracking changes" is enabled. There is a logic problem when the record was previously migrated and is stored in idMap.
Before deleting a record in the migration messages table, it tests if the record has been changed ($this->rowChanged($row)). However, at this point, we cannot evaluate if the record has been changed because there is no 'original_hash' in the 'idMap'.
// Here $this->rowChanged($row) is always TRUE for existing records. if (!empty($this->currentSourceIds) && (!$row->getIdMap() || $row->needsUpdate() || $this->aboveHighWater($row) || $this->rowChanged($row))) { $this->idMap->delete($this->currentSourceIds, TRUE); }
The original hash is stored a few lines later, at the end of the prepareRow() method, where the rehash() method is called. The current logic always evaluates an existing record in the migration map as changed, because there is no original hash. See \Drupal\migrate\Row::changed and \Drupal\migrate\Row::rehash.
Thus, it deletes the migration messages for such a record, even if it is later skipped, because on the second test with the
$this->rowChanged($row)
method, the original hash is already present and the record is correctly evaluated as unchanged. As a result, the record is ignored as unchanged, but all previous migration messages for the record are deleted.// Original hash is stored in the idMap at the end of prepareRow($row). // Preparing the row gives source plugins the chance to skip. if ($this->prepareRow($row) === FALSE) { continue; } // Here $this->rowChanged($row) works correctly. // Check whether the row needs processing. // 1. This row has not been imported yet. // 2. Explicitly set to update. // 3. The row is newer than the current high-water mark. // 4. If no such property exists then try by checking the hash of the row. if (!$row->getIdMap() || $row->needsUpdate() || $this->aboveHighWater($row) || $this->rowChanged($row)) { $this->currentRow = $row->freezeSource(); }
We can solve the problem at least in 2 ways:
1) Remove$this->rowChanged($row)
from the 1st condition.
2) Modify \Drupal\migrate\Row::changed so that it only evaluates the record as changed if there is the original hash in idMap.public function changed() { $original_hash = $this->idMap['original_hash'] ?? NULL; $current_hash = $this->idMap['hash'] ?? NULL; // Until rehash() we cannot say the row has changed, // because the new hash was not calculated yet. if (is_null($original_hash)) { return FALSE; } return $original_hash != $current_hash; }
Both ways work, but I'm not sure which approach is better and safer.
Steps to reproduce the problem:
1) Use a migration with trackChanges=TRUE
2) Migrate some records which generate some migration messages (e.g.drush mim migration_name --limit=10
).
3) Check migration messages in the DB table or viadrush mmsg migration_name
4) Repeat step 2
5) Repeat step 3. You can see the previous messages are gone. - π¨πΏCzech Republic martin_klima
Attached a copy of the latest patch for reference purposes.
Re #26, this is currently untested "pseudo-code", but wouldn't the following work a bit better rather than not checking if the row has been changed and always keeping the previous messages?
if (!empty($this->currentSourceIds)) { $clear_messages = (!$row->getIdMap() || $row->needsUpdate() || $this->aboveHighWater($row)); if (!$clear_messages && $this->trackChanges) { if (empty($row->getIdMap()['original_hash'])) { // Attempt to calculate the appropriate hash before checking if the row has been changed // since it might be empty. // The only disadvangtage with this approach is that if for whatever reason, a module alters the // source ID in the "migrate_prepare_row" hook called by ::prepareRow(), then it might always // trigger as changed. But that's an edge case and more of an issue with the source plugin. $row->rehash(); } $delete_source_ids = $this->rowChanged($row); } if ($clear_messages) { $this->idMap->delete($this->currentSourceIds, TRUE); } }
We could either use this approach or update
Row::changed()
so that it recalculates the hash iforiginal_hash
is NULL, mitigating this issue.Might be worth applying this approach and seeing if it addresses your use case. Not sure if rehashing can cause other issues.
- π¨πΏCzech Republic martin_klima
Re #28
My approach is not ideal because it prefers to keep messages in case they can be deleted (better to keep than delete).
But your suggestion will not work either.The problem cannot be easily solved because the steps must be executed in a certain order, but this order cannot be defined due to constraints.
1) Deleting messages cannot be before $row->changed() because we don't know whether the record will be processed.
2) We cannot check $row->changed() before the end of prepareRow(), because source data is not stable yet, e.g. hook_migrate_prepare_row can modify the source data.
3) We cannot delete messages after prepareRow(), because prepareRow (and hooks and child prepareRow methods) can save some messages.Where to delete messages if we know that:
- Deleting messages cannot be before $row->changed()
- Deleting messages cannot be after $row->prepareRow()
- $row->changed() cannot be before row->prepareRow()This logic is unresolvable.
The only idea I have is:
- Backup messages for the current row
- Delete messages for the current row
- Test if the row will be processed
- If the row will not be processed ($this->currentRow == NULL), restore messages back. - Status changed to Needs review
12 months ago 6:29pm 21 November 2023 - π¨πΏCzech Republic martin_klima
In the last commit, I reverted the 1st condition, so migration messages for a current row are deleted in the original way. But they are stored in the backup variable before deleting. If the row will not be processed (was migrated before and does not need update and is not above the highwater mark and was not changed), all deleted messages are saved back from the backup.
In this way, we don't need to solve the logic problem with impossible checking if the row has changed before processing prepareRow(). For details, see #29.
Tested with $trackChanges = TRUE;
- last update
12 months ago 30,330 pass, 2 fail - last update
12 months ago 30,330 pass, 2 fail - last update
12 months ago 30,342 pass - Status changed to Needs work
11 months ago 5:43pm 23 December 2023 - πΊπΈUnited States mikelutz Michigan, USA
This needs to have the merge request updated to 11.x, or a new MR created against 11.x
My only concern with the approach is that the prepare row method and hooks can also write messages to the table, and if they aren't explicitly skipping the row by returning false then any messages created by the row preparing will start accumulating. Maybe we should check whether any messages exist for the row before we restore instead of just checking if the row will be skipped due to track changes/highwater/needs update etc.
I was also a little worried about performance, I ran some tests on a 40k row migration without the map table joined. The migration was completed, so no rows were actually processed. I inserted a message for each row and ran a few times with and without the patch. I was seeing ~15seconds or 6% additional runtime on average with the extra database trips per row. It's not insignificant, but it could be worse, it extrapolated to an extra 6ish minutes on a million row run that would have taken an hour and a half already, so maybe it's worth providing a way to opt out? I'd approve the MR without an opt-out, but I'll let others weigh in.
- π¬π§United Kingdom joekers UK
Thanks @martin_klima for the info and patch - I'm in the same situation as you with tracking changes.
The patch is working well for me but the performance is a bit concerning, though it's more important that I have the messages kept as I run a set of migrations daily and need to know when there are issues.
- First commit to issue fork.