Successfully tracks that Media entity already exists, but Files duplicated

Created on 25 June 2024, 11 months ago

Problem/Motivation

When I run Media library import
then I see Media entity already exists. Not creating a new queue item for "file.jpg"
and only new images are added as Media (77 in my case)
but all the existing files are duplicated in Files (all 6716 in my case)

Successive media imports add the files again (my counter stopped at 10000)

Steps to reproduce

  • At /admin/config/media/media-library-importer, disable "Import files to location set in media"
  • Set image files path in import from folder
  • Select media type to import: image
  • Run drush mli
  • View /admin/content/files and observe 6716 new files with 0 uses on the site

Proposed resolution

Don't add files to Files unless the files are associated with new Media
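
The proposed behaviour can be sketched as a look-up-before-create step. This is only an illustration in Python, not the module's PHP code, and it assumes that the file URI is enough to identify an existing record. `FileRegistry`, `get_or_create` and the example URIs are hypothetical names made up for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FileRegistry:
    """Toy stand-in for the managed files list, keyed by file URI (hypothetical)."""
    by_uri: Dict[str, int] = field(default_factory=dict)
    next_fid: int = 1

    def get_or_create(self, uri: str) -> int:
        """Return the existing file ID for this URI; create a record only if none exists."""
        if uri in self.by_uri:
            return self.by_uri[uri]
        fid = self.next_fid
        self.by_uri[uri] = fid
        self.next_fid += 1
        return fid


registry = FileRegistry()
uris = ["public://import/a.jpg", "public://import/b.jpg"]  # made-up example URIs

# Two successive "imports" over the same source files.
first_run = [registry.get_or_create(u) for u in uris]
second_run = [registry.get_or_create(u) for u in uris]
```

With this approach the second run reuses the IDs from the first run instead of adding new records, which is the outcome the resolution asks for.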

---

Is this a bug? Is there a way to prevent files from being added to the files list (file_managed) that I'm just not aware of?

🐛 Bug report
Status

Active

Version

2.0

Component

Miscellaneous

Created by

🇺🇸United States nateb Pittsburgh, Pennsylvania, USA


Comments & Activities

  • Issue created by @nateb
  • 🇺🇸United States nateb Pittsburgh, Pennsylvania, USA

    Issue persists in 2.1.1

  • 🇮🇹Italy itamair

    I tried to replicate the issue / use case you report and I cannot reproduce it with the 2.1.1 release.
    Files listed in the (Drupal 10 default) Files View (/admin/content/files) are not duplicated, as long as the Media Image (using that File) still exists on the website.
    Files get duplicated only when Media Images are removed / deleted before a new import ...
    and that is a Files View issue, as it doesn't properly remove Files that were previously imported and are now unused ...

  • Status changed to Closed: cannot reproduce 7 days ago
  • 🇺🇸United States nateb Pittsburgh, Pennsylvania, USA

    This persists: I reproduced it with 2.1.3 while testing on Drupal 11.2 dev. The issue seems to be related to the setting "Import files to location set in media".

    If Import files to location set in media = true
    when I run Media Import multiple times
    then files are moved to the image media type directory
    no duplicates are created for files.

    I can repeatedly run the Media Library importer and everything works as expected. 1 record in Files, 1 Media entity. No duplicates on successive imports.

    However...

    If Import files to location set in media = false
    when I run Media Import multiple times
    then file ID duplicates are created each time the importer runs
    and each instance of the duplicate gets its own ID
    and each instance is pointing to the same source file.

    I don't think this is an issue with the Files View itself. The view is just showing what is in file_managed. Each and every time the Importer runs, a new record is added to file_managed. Media isn't modified in this example. The hard part to work around is that all the duplicate managed records point to a single file, so deleting any one record deletes the source file, and thus also breaks the media entity using that file.
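
One way to confirm this kind of duplication is to group the managed file records by URI and count them. The sketch below simulates that with an in-memory SQLite table in Python. It assumes a table trimmed down to the `fid` and `uri` columns (the real Drupal table, `file_managed` in core, has more columns), and the URIs and the three simulated runs are made up for illustration.

```python
import sqlite3

# In-memory stand-in for the managed files table, reduced to two columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE file_managed (fid INTEGER PRIMARY KEY, uri TEXT)")

# Simulate three importer runs over the same two source files:
# each run inserts a fresh record for every file, as described above.
for _ in range(3):
    for uri in ("public://import/a.jpg", "public://import/b.jpg"):
        conn.execute("INSERT INTO file_managed (uri) VALUES (?)", (uri,))

# Any URI with more than one record is a duplicate of the same source file.
DUPLICATE_QUERY = """
SELECT uri, COUNT(*) AS copies
FROM file_managed
GROUP BY uri
HAVING COUNT(*) > 1
ORDER BY uri
"""
duplicates = conn.execute(DUPLICATE_QUERY).fetchall()

for uri, copies in duplicates:
    print(f"{uri}: {copies} records")
```

The same `GROUP BY uri HAVING COUNT(*) > 1` query could be run against a site's database to see how many records point to each source file before deciding how to clean up.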

    In both cases, Media never gets duplicated. "You can run import as many times as you like. Once created, media entities will not be duplicated." is true. Unfortunately, when existing files are kept in place, Files lists and views around the site quickly become overwhelmed with incorrect records, especially if the import is tied to a cron job.
