2.x branch creating duplicate media entities in Drupal

Created on 16 March 2023, almost 2 years ago
Updated 10 April 2023, over 1 year ago

In some circumstances, the 2.x branch of the Widen integration appears to create two media entities in Drupal: two separate media entities of the same media type, both referencing the same file.

This might be related to the category name settings in the module configuration. Example: if the module is set to bulk-import a category named "test category", and Widen has both a "test category" and a "test category images" category, assets from both are pulled into Drupal instead of only the assets whose category is an exact match. Possibly this results in a duplicate media entity being created?
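
To illustrate the difference between the loose matching described above and an exact match, here is a minimal sketch. It is not the module's code (the module is PHP); the function names and the case-insensitive comparison are assumptions made for illustration only.

```python
def substring_match(requested, categories):
    """Loose matching: keep every category whose name contains the requested name."""
    needle = requested.casefold()
    return [name for name in categories if needle in name.casefold()]


def exact_match(requested, categories):
    """Strict matching: keep only categories whose name equals the requested name."""
    needle = requested.casefold()
    return [name for name in categories if name.casefold() == needle]


# Category names taken from the example in the summary, plus one unrelated name.
categories = ["test category", "test category images", "other"]

loose = substring_match("test category", categories)
strict = exact_match("test category", categories)
```

With loose matching, both "test category" and "test category images" are selected; with strict matching, only "test category" is.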

🐛 Bug report
Status: Needs review
Version: 2.0
Component: Code
Created by: 🇺🇸 United States toddwoof

Comments & Activities

  • Issue created by @toddwoof
  • 🇺🇸 United States Chris Burge

    Chris Burge made their first commit to this issue's fork.

  • @chris-burge opened merge request.
  • Status changed to Needs review almost 2 years ago
  • 🇺🇸 United States Chris Burge

    I opened an MR that also checks for queued assets. It's possible that an asset will be returned by multiple categories. Presently, that would result in the asset being added to the queue once for each category.

  • 🇺🇸 United States Chris Burge

    The bug described in the issue summary is caused by the media_acquiadam module. There's no issue on Drupal.org as Acquia uses Jira internally.

    Summary is below:

    Currently, the getAssetsByCategory() method of the Drupal\media_acquiadam\Client class doesn't use strict search matching. As a result, the query will return all categories with the word(s) in their name. For example, a search for "Collateral Catalog" will match both "Collateral Catalog" and "Collateral Catalog PDF".

    That said, once that bug is fixed in the next release of media_acquiadam, we should still verify the behavior in this module if the same asset is in two categories.

  • Assigned to mglaman
  • Status changed to Needs work over 1 year ago
  • 🇺🇸 United States mglaman WI, USA

    The bug is in the queue worker, which does no validation to check whether a queued item has already been imported by another process.

    This bug has been latent for some time, but it only surfaced because of possible double results from non-strict searching by parallel processes.

  • @mglaman opened merge request.
  • Issue was unassigned.
  • Status changed to Needs review over 1 year ago
  • 🇺🇸 United States mglaman WI, USA

    This updated code will ensure queued jobs do not perform duplicate imports. The acquiadam_asset_import_cron could insert many jobs that are not all executed within one cron process. The queue worker is set to process for 120 seconds, which should be more than enough time. But something could interrupt the process and leave jobs pending, and those assets could then be queued a second time.

  • 🇺🇸 United States Chris Burge

    MR5 looks good. +1 for adding test coverage. MR4 failed to consider existing assets.

    MR5 indirectly addresses the issue that MR4 targeted (by checking whether a queued asset already exists before importing it). I still think that DamImporter->import() should keep track of what it's queuing so as to avoid putting duplicates into the queue.
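
The two safeguards discussed in this thread (not queuing the same asset twice, and having the worker skip assets that already exist) can be sketched together as follows. This is a language-neutral illustration in Python, not the code from either merge request: `ImportQueue`, `process`, and the asset IDs are hypothetical, and the in-memory sets stand in for whatever lookup the module actually performs.

```python
from collections import deque


class ImportQueue:
    """Hypothetical in-memory stand-in for the import queue."""

    def __init__(self):
        self._items = deque()
        self._queued_ids = set()

    def enqueue(self, asset_id):
        """Queue an asset unless it is already waiting; return True if added."""
        if asset_id in self._queued_ids:
            return False
        self._queued_ids.add(asset_id)
        self._items.append(asset_id)
        return True

    def drain(self):
        """Yield queued asset IDs in order, removing each from the queue."""
        while self._items:
            asset_id = self._items.popleft()
            self._queued_ids.discard(asset_id)
            yield asset_id


def process(queue, imported):
    """Worker-side guard: skip assets that already have a media entity."""
    created = []
    for asset_id in queue.drain():
        if asset_id in imported:
            continue  # Already imported by an earlier run or another process.
        imported.add(asset_id)
        created.append(asset_id)
    return created


# Asset "a2" is returned by both categories (hypothetical data).
category_results = {
    "Collateral Catalog": ["a1", "a2"],
    "Collateral Catalog PDF": ["a2", "a3"],
}

queue = ImportQueue()
for assets in category_results.values():
    for asset_id in assets:
        queue.enqueue(asset_id)

imported = {"a1"}  # "a1" already exists as a media entity.
created = process(queue, imported)
```

Here "a2" is queued only once even though two categories return it, and "a1" is skipped by the worker because it already exists. The worker-side check alone is enough to prevent duplicate entities; the enqueue-side check only avoids wasted queue items.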
