Feed items are constantly updated due to missing entity reference

Created on 6 December 2021, about 3 years ago
Updated 29 March 2023, almost 2 years ago

Problem/Motivation

In https://www.drupal.org/project/feeds/issues/2989789, logic was added that resets the hash of a feed item if an entity reference was not found, forcing the item to update on the next run. The reasoning is that if feeds with dependencies are imported in the wrong order and a referenced item is not present on the initial import, the item will still be updated on the next run (when the referenced item should exist).
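The effect of that hash reset can be modelled in a few lines. This is a minimal Python sketch of the behaviour described above, not the Feeds module's actual PHP code; `item_hash`, `run_import`, the `article` column and the sample rows are all hypothetical names and data chosen for illustration.

```python
import hashlib


def item_hash(item: dict) -> str:
    """Hash a source item so unchanged items can be skipped on later runs."""
    return hashlib.md5(repr(sorted(item.items())).encode()).hexdigest()


def run_import(source, stored_hashes, existing_guids):
    """Run one import; return the GUIDs that were written (created or updated)."""
    written = []
    for item in source:
        new_hash = item_hash(item)
        if stored_hashes.get(item["guid"]) == new_hash:
            continue  # Unchanged since the last run: skip.
        written.append(item["guid"])
        ref = item.get("article")
        if ref and ref not in existing_guids:
            # Assumed behaviour: the reference target was not found, so an
            # empty hash is stored and the item is processed again next run.
            stored_hashes[item["guid"]] = ""
        else:
            stored_hashes[item["guid"]] = new_hash
    return written


# Hypothetical source rows: item 1 references an article that never exists.
source = [
    {"guid": "1", "title": "Page A", "article": "a1"},
    {"guid": "2", "title": "Page B", "article": ""},
]
hashes = {}
first = run_import(source, hashes, existing_guids=set())
second = run_import(source, hashes, existing_guids=set())
print(first, second)
```

In this model the item whose reference can never be resolved is rewritten on every run, while the item without a reference is skipped after the first import.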

However, a missing reference can sometimes be a valid situation in which we don't want to reset the hash. This is the case for me: I'm importing feeds in a strict order, and some entity references are missing because that's the way the API operates. See "Missing Resource Identifiers" in Drupal's JSON:API documentation.

Because of this code, my purge queue was constantly growing faster than it could be purged (see https://www.drupal.org/project/purge/issues/3132514, "your queue exceeded 100 000 items! Purge shut down"), because a significant number of entities with missing reference fields were being re-imported on every run (due to having their hash reset each time).

Steps to reproduce

  1. Install Drupal with the standard profile so you have page and article content types.
  2. Add an entity reference field called field_article to the page content type that targets the article content type.
  3. Create an article feed type that uses a CSV parser. Under processor settings, check the option 'Update existing content items'. Save and add mappings.
  4. Using the data in tests/resources/content.csv as a guide, map source data to target fields.
    1. guid > feeds_item (guid) (unique=TRUE)
    2. title > title
  5. Create a page feed type that uses a CSV parser. Under processor settings, check the option 'Update existing content items'. Save and add mappings.
  6. Using the data in tests/resources/content-with-reference.csv as a guide, map source data to target fields.
    1. guid > feeds_item (guid) (unique=TRUE)
    2. title > title
    3. article > field_article (referenced by feeds_item guid)
  7. Run the page feed type with the content-with-reference.csv file. There should be two page nodes imported. Both should have an empty field_article entity reference.
  8. Change the title of 'Eodem modo typi' to Page 1 and save.
  9. Change the title of 'Aliquam feugiat diam' to Page 2 and save.
  10. Re-import the page feed with the same file.
  11. The titles will have reverted to 'Eodem modo typi' and 'Aliquam feugiat diam', even though the source file did not change, because the hash was reset.

Proposed resolution

I think if the behaviour in issue 2989789 is desired, then we should make a new option that is opt-in to change this behaviour, preserving existing functionality.

Remaining tasks

Create a patch to change the behaviour.

User interface changes

A new config field (probably boolean).

API changes

N/A

Data model changes

A new configuration setting on the entity reference target plugin.

Manual Testing

To test manually, apply the patch and follow the steps to reproduce. The only difference is that when you map the entity reference field in the page feed type, you check the new 'Do not reset hash when an entity reference was not found' option. With that option checked, the page titles will not revert the second time you import the feed.
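The expected outcome of this test can be summarised as a small decision function. This is a hedged Python sketch of the intended behaviour, not the patch itself; `hash_to_store`, `needs_update` and the `keep_hash_on_missing` flag are hypothetical names standing in for the new option.

```python
def hash_to_store(item_hash: str, reference_found: bool,
                  keep_hash_on_missing: bool) -> str:
    """Return the hash saved with the imported item.

    keep_hash_on_missing stands in for the proposed
    'Do not reset hash when an entity reference was not found' option.
    """
    if reference_found or keep_hash_on_missing:
        return item_hash
    return ""  # Reset: the item will be processed again on the next run.


def needs_update(stored_hash: str, item_hash: str) -> bool:
    """An item is re-imported when its stored hash differs from the source hash."""
    return stored_hash != item_hash


ITEM_HASH = "abc123"  # Hypothetical hash of an unchanged source row.
for keep in (False, True):
    stored = hash_to_store(ITEM_HASH, reference_found=False,
                           keep_hash_on_missing=keep)
    print(f"option checked={keep}: "
          f"re-import on next run={needs_update(stored, ITEM_HASH)}")
```

With the option unchecked, an unchanged item with a missing reference is still re-imported, which is what reverts the manually edited titles; with it checked, the item is skipped.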

🐛 Bug report
Status

Active

Version

3.0

Component

Code

Created by

achap 🇦🇺

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • achap 🇦🇺

    I'm back on the project I had this issue on again (yearly festival site) and have solved my own problem using the Feeds parse event to filter out missing resources. See: https://www.drupal.org/docs/contributed-modules/feeds/feeds-howtos/altering-feed-data

    Regarding #26 happy to re-open the issue but I still think my solution isn't the best one because it's either on or off. So it would solve your issue but it would stop entity references that get imported after the main feed from ever getting referenced. So I think we can do better. With that said, it is opt-in...

    Would be interested to know how the Migrate API approaches this.
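One possible reading of "filter out missing resources" is to blank the reference value before it reaches the entity reference target, so that no lookup fails and no hash is reset. The following Python sketch only models that idea; it is not the Feeds parse event API, and `drop_missing_references`, the `article` key and the `"missing"` placeholder value are assumptions made for illustration.

```python
# Assumption: the upstream API marks an unresolvable reference with this
# placeholder ID instead of a real identifier.
MISSING_ID = "missing"


def drop_missing_references(items, reference_keys):
    """Return copies of the parsed items with placeholder references cleared."""
    cleaned = []
    for item in items:
        item = dict(item)  # Copy so the parser result is not mutated.
        for key in reference_keys:
            if item.get(key) == MISSING_ID:
                item[key] = None  # Nothing to look up, so nothing can fail.
        cleaned.append(item)
    return cleaned


# Hypothetical parsed rows.
items = [
    {"guid": "1", "title": "Page A", "article": "missing"},
    {"guid": "2", "title": "Page B", "article": "a1"},
]
cleaned = drop_missing_references(items, ["article"])
print(cleaned)
```

Unlike an on/off option, this only affects references the source itself reports as missing, so references that arrive in a later feed can still be resolved.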

  • Re-created patch against latest 3.0-rc2

  • 🇳🇱Netherlands megachriz

    Would it be better to reverse the option? Instead of it saying "Do not reset hash", let it say "Reset hash". I'm not sure, but I think it's better UX when an enabled option means "do something" rather than "do not do something". Opinions?

    Also, since there is a patch, the status "Needs review" would apply here. It would be better to create an MR from it, though, because the testbot on drupal.org no longer evaluates patches.

    It would be nice indeed to check how Migrate approaches this problem.

  • 🇳🇱Netherlands megachriz

    Updated the remaining tasks.
