Images imported from an external feed are often attached to the wrong node when several feeds are imported in the same cron run

Created on 18 October 2022

Problem/Motivation

We have found that when several feeds are imported in the same cron run, images imported from WordPress are often associated with the wrong node in Drupal.

Our application imports several RSS feeds from a WordPress instance so we can display teaser content from WordPress on the Drupal site. We have two feed types and several feeds (so we can import the translated RSS feeds), and everything maps to the same content type (with some out-of-the-box rules that categorize content with taxonomy). We are not running any custom PHP to accomplish this: everything is plain Feeds configuration. We are not using the Media module to manage the images once they are imported.

The Bug

The bug is in the Drupal\feeds\Laminas\Extension\Mediarss\Entry class. It is a Drupal service, so only a single instance of the class is created per request (or Drush run). Its ::getMediaContent() and ::getMediaThumbnail() methods attempt a form of static caching that is incompatible with processing multiple feeds in the same request.

It goes something like this:

  • For the first Feed being parsed, Drupal\feeds\Feeds\Parser\SyndicationParser::parse() uses the Laminas package to parse the raw feed using $channel = Reader::importString($raw); (line 41)
  • $channel is now a collection of Drupal\feeds\Laminas\Extension\Mediarss\Entry objects
  • ::getMediaThumbnail() runs normally: the Entry's internal cache ($this->data) is empty, so each thumbnail is parsed and cached. However, the entryKey values used as cache keys are not unique to the feed; they are unique only to the entry's position in the parsed feed:
    • media_thumbnail_0
    • media_thumbnail_1
    • etc.
  • For every feed processed after the first, $this->data on the Entry is already populated. Because the entryKey values repeat from feed to feed, ::getMediaThumbnail() returns early with the first feed's cached data.
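The collision described above can be modelled outside Drupal. The following is a simplified Python sketch, not the actual Feeds or Laminas code: `EntryCache` stands in for the single shared Entry instance, and `feed_id` is a hypothetical per-feed identifier (for example, the feed entity ID or source URL). Keying the cache by entry position alone reproduces the bug; including the feed identifier in the key is one possible fix, not necessarily the approach taken in the patch.

```python
from typing import Callable


class EntryCache:
    """Models one shared, service-like instance whose cache outlives a single feed."""

    def __init__(self, key_fn: Callable[[str, int], str]) -> None:
        self.data: dict[str, str] = {}  # analogous to $this->data
        self.key_fn = key_fn

    def get_media_thumbnail(self, feed_id: str, position: int, thumbnails: list[str]) -> str:
        key = self.key_fn(feed_id, position)
        if key in self.data:
            # Early return: the cached value may belong to a different feed.
            return self.data[key]
        value = thumbnails[position]
        self.data[key] = value
        return value


def import_feeds(cache: EntryCache, feeds: dict[str, list[str]]) -> dict[str, list[str]]:
    """Process every feed with the same cache instance, as in a single cron run."""
    results: dict[str, list[str]] = {}
    for feed_id, thumbnails in feeds.items():
        results[feed_id] = [
            cache.get_media_thumbnail(feed_id, i, thumbnails) for i in range(len(thumbnails))
        ]
    return results


def position_key(feed_id: str, position: int) -> str:
    # Buggy: media_thumbnail_0, media_thumbnail_1, ... repeat across feeds.
    return f"media_thumbnail_{position}"


def feed_scoped_key(feed_id: str, position: int) -> str:
    # Hypothetical fix: scope the key to the feed being parsed.
    return f"media_thumbnail_{feed_id}_{position}"


feeds = {"en-us": ["en-0.jpg", "en-1.jpg"], "fr-fr": ["fr-0.jpg", "fr-1.jpg"]}
buggy = import_feeds(EntryCache(position_key), feeds)
fixed = import_feeds(EntryCache(feed_scoped_key), feeds)
print(buggy["fr-fr"])  # ['en-0.jpg', 'en-1.jpg']
print(fixed["fr-fr"])  # ['fr-0.jpg', 'fr-1.jpg']
```

Clearing $this->data before each feed is parsed would have a similar effect; scoping the key instead keeps cached values reusable within a single feed.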

Steps to reproduce

As mentioned, we import several feeds from WordPress. To reproduce:

  1. As an admin, go to /admin/content/feed
  2. Delete all items from the associated feeds
  3. Select Import in background for each feed whose items you just deleted
  4. Go to /en-us/admin/config/system/cron
  5. Run cron
  6. Go to /en-us/admin/content
  7. Open the content that was just imported. Some of the nodes will have the wrong images attached

Note: I know these steps are vague. I may be able to share specific URLs and other details with maintainers privately.

Remaining tasks

Have the patch reviewed and determine whether this is the correct approach. The fix may also need to be extended to the ::getMediaContent() method.

🐛 Bug report
Status

Needs review

Component

Feeds Import (feature)

Created by

🇺🇸United States kmonty San Francisco, CA


Comments & Activities


  • 🇺🇸United States tyler-durden

    I'm fairly sure I have the same problem, although I am importing CSV files. The wrong images are clearly being applied to the nodes, while all other fields are correct.

    Even when I manually import a feed that creates a single node, it pulls in a previously uploaded image from an old import, clearly from the wrong category (the items are car models, so a wrong model or category is obvious). The only way to get the first import to work properly is to clear the cache, but after that most imports pull the wrong image again.

    Where can I find the CSV import code to check whether the same keys are being used?
