Feeds "update existing" - how to avoid wiping out all existing data?

Created on 30 July 2020, over 4 years ago
Updated 26 April 2023, over 1 year ago

Problem/Motivation

Problem: When I run a csv feeds import set to "update existing content" it seems to empty out all previous field data on each node before importing, including fields that are not mapped in the feed nor present as columns in the spreadsheet. I'm guessing a workaround might be to pre-populate the source csv with all field values, even those that aren't changing? However this would be far more work and far riskier (more testing needed) than is practical for us. From what I've read, only one feed can 'own' a content type, so multiple feeds for each use case does not seem to be a solution.

Steps to reproduce

Example:

  1. Content type 'profiles' has 20+ fields
  2. Create a structure > feed, parser=csv, content type=profiles, processor set to "update existing content;" map title (unique), address, experience, and education
  3. Create a content > feed and import a csv containing 200 existing profiles, providing title and address fields. For this particular update, no other fields are changing so they are not included in the csv.
  4. Expected: only the address fields for those 200 nodes would be updated to match the csv file
  5. Actual: while the address fields are updated for those 200 nodes, each node's experience, education, and image are now set to null. Image wasn't even mapped in the feed!

Proposed resolution

An 'update' feeds option that would:

  • Only process rows/nodes found in the source
  • Only process columns/fields found in the source
  • NULL in source cells: This is tricky due to varieties of field types and user intent. It might be easiest to say that a NULL source value should replace the target field's value with NULL. The onus would be on the user to include accurate data (even if it's a NULL/empty value) for each node and column included in the source.
  • Nothing should happen to nodes and fields that are not included in the source.

Remaining tasks

User interface changes

API changes

Data model changes

πŸ’¬ Support request
Status

Active

Component

Code

Created by

πŸ‡ΊπŸ‡ΈUnited States en-cc-org

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Merge Requests

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • I dont think the "set a field value" tamper resolves this and it could be worth a module update -- Thinking back to #3, maybe this is different for non-CSV sources, but for a CSV source couldn't we compare the CSV columns to the mapped columns and not update the content value if the column was omitted?
    If the column is present, but blank, I think that is different and feeds should update the content with empty values as it does now.

    Maybe #3049832 that #7 mentioned could mitigate it in some instances, but I dont think feeds tamper's "set a field value" can set the field value to be whatever is currently stored in the content (just some arbitrary value you provide in the tamper ui). It also seems like a hack to something that should just work that way out of the box.

  • πŸ‡³πŸ‡±Netherlands megachriz

    Hm, ✨ Update target more granularly Active and this issue look now the same, with the only difference that it is reported here that an unmapped source was cleared. I think we should close this or the other one as a duplicate.

  • πŸ‡³πŸ‡±Netherlands megachriz

    Maybe it makes sense to close that other one as a duplicate.

    In there I commented the following:

    If I remember correctly, there are the following challenges in play:

    1. Feeds cannot always make the distinction between empty vs non-existent. This was stated in #1107522-147: Framework for expected behavior when importing empty/blank values + text field fix β†’ .
      We should test whether or not this is still true for the current version of Feeds or if this was the case for the D7 version of Feeds only.
    2. Mapping multiple times to the same target (a way to import multiple values for the same field).
      If the first mapped source is non-existent, will the target still be emptied when the second mapped source does exist?
    3. Expectations of the current behavior.
      Right now it is ensured that all mapped targets will get overwritten. So I think we should at least keep supporting that workflow, in order to not break existing import workflows that rely on that behavior.

    Other things to take in consideration: I'm about to open an issue of moving part of the implementation of clearTarget() to the target plugin. A patch has already been written.

    So if we're going to support this workflow, we should at least make it configurable. For the D7 version of Feeds, there was the module Feeds empty β†’ , that allowed to configure the behavior per target.
    I'm not sure if we should still make this configurable per target or if it should be a one on-off switch.

  • πŸ‡³πŸ‡±Netherlands megachriz
  • First commit to issue fork.
  • πŸ‡³πŸ‡ΏNew Zealand ericgsmith

    ericgsmith β†’ changed the visibility of the branch 3162496-feeds-update-existing to hidden.

  • Pipeline finished with Failed
    6 months ago
    Total: 1754s
    #177559
  • πŸ‡³πŸ‡ΏNew Zealand ericgsmith

    I have taken a look at this today (coming from ✨ Update target more granularly Active )

    Feeds cannot always make the distinction between empty vs non-existent. This was stated in #1107522-147: Framework for expected behavior when importing empty/blank values + text field fix.
    We should test whether or not this is still true for the current version of Feeds or if this was the case for the D7 version of Feeds only.

    I assumed my test for this would fail - as all I could see to check on the item was if it was NULL. I suspected that there would be a difference between NULL and empty columns. But for the CSV at least it is seems to parse ,, to an empty string - so we can for CSV at least tell the difference between an empty item and a missing item.

    Mapping multiple times to the same target (a way to import multiple values for the same field).
    If the first mapped source is non-existent, will the target still be emptied when the second mapped source does exist?

    I have not tested this - but I believe yes, if source 1 => field_1 is missing from the source then the target field_1 is not cleared. If source 2 => field_1 has a value then target is cleared.

    Expectations of the current behavior.
    Right now it is ensured that all mapped targets will get overwritten. So I think we should at least keep supporting that workflow, in order to not break existing import workflows that rely on that behavior.

    Agreed - in my work in progress branch I have added this to the advanced fieldset if update existing is checked.

    I also have an additional question around targets comprised of multiple sources.

    Any feeling for how these should be handled?

    An example I added to the test was mapping to both body:value and body:summary from a CSV field - I'd love to be able to update just the summary and keep the original value but its pretty unpredictable what the output would be here - partially missing / updating fields would likely mean we need further changes in the target itself.

  • Pipeline finished with Success
    6 months ago
    Total: 1699s
    #177624
  • Pipeline finished with Success
    6 months ago
    Total: 1345s
    #178693
  • Pipeline finished with Success
    6 months ago
    Total: 1456s
    #178983
  • Open in Jenkins β†’ Open on Drupal.org β†’
    Core: 9.5.5 + Environment: PHP 7.4 & MySQL 5.7
    last update 6 months ago
    726 pass
  • Status changed to Needs review 6 months ago
  • πŸ‡³πŸ‡ΏNew Zealand ericgsmith

    Ok - I have enough there for an early review.

    I have only tested and focused on this using CSV as a source - I imagine it could be more complicated with other sources depending on how they provide their data - specifically if they are provide NULL for empty values rather than missing values.

    This MR introduces a new option under the Advanced tab:

    The wording is not particularly friendly - but I wanted to distinguish between a field mapping having all source values provided and some source values provided.

    E.g - if one of body:text or body:summary is missing - is it clear that we expect the body field to be skipped, or that we only want the provided source updated (e.g. only update summary, leave text in place).

    Additional to the tests I have given this a go on a site with many fields mapped in a feed - and it works as expected. A few cool things this unlocks alongside skipping fields when they are completely missing:

    • updating alt text via feeds without having to have the file id in the source
    • updating body summary values without having to have the body value in the feed

    I think if the general approach can be reviewed - the follow ups would be testing with other parsers and then improving the UX for the field labels and descriptions.

  • Pipeline finished with Success
    6 months ago
    Total: 1198s
    #179706
  • Open in Jenkins β†’ Open on Drupal.org β†’
    Core: 9.5.5 + Environment: PHP 7.4 & MySQL 5.7
    last update 6 months ago
    726 pass
  • Pipeline finished with Success
    6 months ago
    #179738
  • Open in Jenkins β†’ Open on Drupal.org β†’
    Core: 9.5.5 + Environment: PHP 7.4 & MySQL 5.7
    last update 6 months ago
    726 pass
  • πŸ‡³πŸ‡ΏNew Zealand ericgsmith

    Found an issue with the partially incomplete source values with file / image fields. The target_id set when attempting to merge with current values later caused issues if the reference_by was not the fid.

    I changed the approach slightly to get the target plugin to return the current target values for this situation - although its starting to get a but messy.

  • Pipeline finished with Success
    6 months ago
    Total: 1313s
    #190356
  • Open in Jenkins β†’ Open on Drupal.org β†’
    Core: 9.5.5 + Environment: PHP 7.4 & MySQL 5.7
    last update 6 months ago
    726 pass
  • πŸ‡³πŸ‡ΏNew Zealand ericgsmith

    Have done some more testing with CSVs and its working pretty well.

    I have one remaining issue and that is with fields with that have tampers applied to them.

    When feeds tamper runs for a missing source it passes NULL to the tamper plugins.

    If it either receives NULL back from the tamper plugins on L126 or if we have the SkipOnEmpty plugin causing NULL to be set on L131 here then the item suddenly have a NULL source value an sets this to the field.

    This is more complicated to solve, as I have things like the default value plugin to provide values for things that are always missing, but also have had to use SkipOnEmpty as lot due to working around issues summarised in ✨ Improve handling of empty data Needs review

    I'm still keen for a review on this initial implementation so leaving as needs review, but will need to also think about how this affects feeds tamper.

  • Pipeline finished with Success
    6 months ago
    Total: 1421s
    #191126
  • Open in Jenkins β†’ Open on Drupal.org β†’
    Core: 9.5.5 + Environment: PHP 7.4 & MySQL 5.7
    last update 5 months ago
    435 pass, 118 fail
  • Pipeline finished with Success
    5 months ago
    Total: 1301s
    #197774
  • Open in Jenkins β†’ Open on Drupal.org β†’
    Core: 9.5.5 + Environment: PHP 7.4 & MySQL 5.7
    last update 5 months ago
    708 pass, 1 fail
  • Pipeline finished with Failed
    5 months ago
    Total: 1140s
    #200576
  • Open in Jenkins β†’ Open on Drupal.org β†’
    Core: 9.5.5 + Environment: PHP 7.4 & MySQL 5.7
    last update 5 months ago
    734 pass
  • Pipeline finished with Success
    5 months ago
    Total: 1343s
    #201478
  • Open in Jenkins β†’ Open on Drupal.org β†’
    Core: 9.5.5 + Environment: PHP 7.4 & MySQL 5.7
    last update 5 months ago
    737 pass
  • Pipeline finished with Success
    5 months ago
    Total: 1420s
    #204197
  • Open in Jenkins β†’ Open on Drupal.org β†’
    Core: 9.5.5 + Environment: PHP 7.4 & MySQL 5.7
    last update 5 months ago
    734 pass
  • πŸ‡³πŸ‡ΏNew Zealand ericgsmith
  • πŸ‡³πŸ‡ΏNew Zealand ericgsmith
  • πŸ‡³πŸ‡ΏNew Zealand ericgsmith

    I have given this another round of improvements - most conversation points are on the MR.

    I appreciate you were part way through a review @Megachriz but I hope what I have done has simplified it so there is less to review.

    Summary of recent changes

    I have also updated the issue summary with what is proposed.

  • Pipeline finished with Success
    5 months ago
    Total: 1209s
    #204243
  • πŸ‡³πŸ‡ΏNew Zealand RoSk0 Wellington

    Overall, the MR looks good to me.

    Postponing on ✨ Add has() method to ItemInterface Active based on https://git.drupalcode.org/project/feeds/-/merge_requests/169/diffs#note... .

  • πŸ‡³πŸ‡ΏNew Zealand RoSk0 Wellington

    Adding related.

  • πŸ‡³πŸ‡ΏNew Zealand ericgsmith

    Thanks @stefanvaduva for the review and picking that UI bug up - I have pushed a fix for this.

    @max-ar thanks for the comment. Interesting that the missing json field ends up as an empty array - that warrants further investigation and would be helpful when evaluating ✨ Add has() method to ItemInterface Active

  • Pipeline finished with Failed
    7 days ago
    Total: 294s
    #339127
Production build 0.71.5 2024