Duplicate files in the source remain unprocessed

Created on 6 September 2022
Updated 13 May 2023

Problem/Motivation

Our migration source contains duplicate files: more than one node references the same file. Because the D7 source plugin (media_entity_generator_d7) uses the fid as the source ID, the duplicate rows remain unprocessed, since the migration considers that fid already processed.
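To see why the counts diverge, here is a simplified model in Python, not Drupal's actual migrate API: the source count includes every row, while the ID map holds one entry per source ID, so rows that share an fid can never all be marked as processed. The row data and the `id_map` dict are hypothetical.

```python
# Hypothetical source rows: two nodes reference the same file (fid 12).
rows = [
    {"nid": 1, "fid": 12},
    {"nid": 2, "fid": 12},
    {"nid": 3, "fid": 15},
]


def simulate_import(rows):
    """Model an ID map keyed by the source ID (fid)."""
    id_map = {}
    for row in rows:
        fid = row["fid"]
        if fid in id_map:
            # Same source ID as an earlier row: the row is skipped and
            # never reaches processing, so nothing new is recorded.
            continue
        id_map[fid] = "imported"
    return id_map


id_map = simulate_import(rows)
source_count = len(rows)   # what the source plugin reports
processed = len(id_map)    # one entry per distinct fid
unprocessed = source_count - processed
print(source_count, processed, unprocessed)  # 3 2 1
```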

We tried the check_duplicate plugin, and also a custom process plugin that detects duplicates and throws a MigrateSkipRowException, but the duplicate rows never even reached the process plugins. The problem therefore has to be solved at the query level.

Steps to reproduce

- Find a migration that imports nodes referencing the same file multiple times.
- Go through the YAML generation process and create the media to file migration.
- Run the migration and confirm that unprocessed items remain even though the migration completed successfully.

Proposed resolution

Since the row data is collected in the initializeIterator() method, and the row count is based on that data, I suggest using the fid as the array key and skipping any fid that has already been added.
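A minimal sketch of the proposed fix, modelled in Python rather than the module's PHP: while building the row list (the role initializeIterator() plays in the source plugin), each row is keyed by its fid and any fid already present is skipped, so the reported count equals the number of distinct source IDs. The function name, field names, and keep-first policy are assumptions for illustration.

```python
def build_rows(raw_rows):
    """Collect rows keyed by fid, keeping only the first row per file."""
    rows_by_fid = {}
    for row in raw_rows:
        fid = row["fid"]
        if fid in rows_by_fid:
            # Another node already referenced this file; skip the duplicate.
            continue
        rows_by_fid[fid] = row
    # Dicts preserve insertion order, so rows keep the query order.
    return list(rows_by_fid.values())


raw_rows = [
    {"nid": 1, "fid": 12},
    {"nid": 2, "fid": 12},
    {"nid": 3, "fid": 15},
]
rows = build_rows(raw_rows)
print(len(rows))                  # 2
print([r["nid"] for r in rows])   # [1, 3]
```

With duplicates removed before counting, the source count and the ID map agree, so no rows are left unprocessed. In PHP the equivalent is assigning `$rows[$fid] = $row;` guarded by `isset($rows[$fid])`; dropping the guard would keep the last row per fid instead of the first.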

πŸ› Bug report
Status

Fixed

Version

2.0

Component

Code

Created by

bbombachini (London, ON, Canada)


