Improve handling of empty data

Created on 11 January 2023, about 2 years ago
Updated 17 April 2023, almost 2 years ago

Problem/Motivation

In many practical cases , especially coming from Feeds Tamper usage point of view, handling empty data (like string or array) wouldn't be different to handling NULL value. However strict data type checks in the plugins like

are making it hard to use this functionality .

Proposed resolution

Update mentioned , and possibly other , plugins to gracefully work with the empty or NULL value data.

Remaining tasks

  • Agree on proposed resolution
  • Patch
  • Review
  • Commit

User interface changes

None

API changes

None

Data model changes

None

Feature request
Status

Active

Version

1.0

Component

Code

Created by

🇳🇿New Zealand RoSk0 Wellington

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Merge Requests

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • First commit to issue fork.
  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.5 + Environment: PHP 7.3 & MySQL 5.7
    last update almost 2 years ago
    416 pass, 5 fail
  • 🇳🇿New Zealand ericgsmith

    Just pushing a WIP - not sure I got all the tests.

    I think after working with feeds again after a long time - we should be doing this.

    I think skipping on empty should solve a lot of related issues.

    I suspect we might also want to add follow up issues to relax some of the strictness / casting in other plugins - but I think for now we should make sure we are being consistent and easier to use with empty data.

  • Status changed to Needs work almost 2 years ago
  • 🇳🇿New Zealand ericgsmith

    Setting to needs work to cleanup test failures and coding standards

  • 🇳🇿New Zealand ericgsmith

    I was way too eager changing everything - a bit to revert / cleanup in the branch

  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.5 + Environment: PHP 7.3 & MySQL 5.7
    last update almost 2 years ago
    459 pass
  • 🇳🇿New Zealand ericgsmith

    Alright, reverted my over eager changes.

    We would need additional tests to confirm this behaviour + document its intentions somewhere.

    Are we in agreement @MegaChriz on this approach?

    It might be a better idea for me to fix tests & commit the related issues first given I am sure on those issues, then come back to this parent issue to review implementing in the plugins I've changed here for consistency reasons. I'll try make the Friday feeds meetup to discuss.

  • Open in Jenkins → Open on Drupal.org →
    Core: 10.0.7 + Environment: PHP 8.0 & MySQL 5.7
    last update over 1 year ago
    Composer require failure
  • Open in Jenkins → Open on Drupal.org →
    Core: 10.0.7 + Environment: PHP 8.0 & MySQL 5.7
    last update over 1 year ago
    Composer require failure
  • Status changed to Needs review over 1 year ago
  • 🇳🇿New Zealand ericgsmith

    I've update the MR with a bunch of tests, and tweaks to avoid calls to empty in situations where null is sufficient and safer.

    If I get time, it could do with some tests around some of these plugins.

    I'm also going to open a seperate issue as I was not sure if we should be removing the exception thrown in the Hash plugin.

  • First commit to issue fork.
  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 8.0 & MySQL 5.7
    last update 9 months ago
    PHPLint Failed
  • Pipeline finished with Failed
    9 months ago
    Total: 163s
    #169508
  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 8.0 & MySQL 5.7
    last update 9 months ago
    555 pass
  • Pipeline finished with Success
    9 months ago
    Total: 865s
    #169519
  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 8.0 & MySQL 5.7
    last update 9 months ago
    579 pass
  • 🇯🇵Japan ptmkenny

    Added some more tests for empty strings and cleaned up the comment on the null value tests.

  • Pipeline finished with Success
    9 months ago
    Total: 846s
    #169543
  • Status changed to RTBC 8 months ago
  • 🇧🇷Brazil PabloNicolas

    I was unable to finish executing feeds-import because some data came in as NULL, triggering the exception and interrupting the batch import. I took the MR diff and applied it as a composer patch, it worked perfectly for me!

    Thank you!

  • 🇬🇧United Kingdom robcarr Perthshire, Scotland

    This worked well. Thanks for the patch.

    Worth noting that a related issue was down to bogus whitespace at the end of the file, just after the final closing tag. The XML parser didn't seem to like that either.

  • 🇯🇵Japan ptmkenny

    Unfortunately this no longer applies to the latest dev, so setting to "Needs work" to fix the merge conflicts.

  • Pipeline finished with Canceled
    about 2 months ago
    #355405
  • Pipeline finished with Success
    about 2 months ago
    Total: 859s
    #355406
  • 🇯🇵Japan ptmkenny

    Ok, I've updated the MR and I think this is ready to go again.

  • Status changed to Needs work 25 days ago
  • 🇳🇱Netherlands megachriz

    This is well on its way! Good work!

    1. I've gone through all Tamper plugins that we have and I think that Aggregate and WordCount (relatively new plugins) should also do something about handling empty data.
    2. The Math plugin throws an exception on an empty string. Perhaps we need to return an empty string here too?
    3. For test coverage, I think it would be better if testing with null and an empty string is added to TamperPluginTestBase instead. This way, we'll see how each plugin reacts on that input and we can override the tests for plugins as necessary. I'm working on this one.
    4. Some plugins treat the integer 0 (like ArrayFilter) as empty and return it as is. Other plugins mark it as invalid input data. More of this below.
    5. The Keyword Filter plugin treats the integer 0 as empty. I think that this is wrong. You may want to filter out items whose value is zero. I wonder if Keyword Filter should abort early at all.
    6. FindReplace and FindReplaceRegex accept integers and floats as valid input, but FindReplaceMultiline doesn't. Maybe fixing that is out of scope for this issue.
    7. We may need additional tests for how plugins deal with 0 (as integer or float) and with boolean FALSE. But we could handle that as well in a follow-up, as handling that too makes the scope of this issue bigger.
  • 🇳🇱Netherlands megachriz

    I've made the following changes:

    • Tests testWithNullValue() and testWithEmptyString() are moved to the base class and tests for certain plugins are overridden as necessary.
    • Instead of assertEquals() for testing with null values or empty strings, assertSame() is used.
    • Empty value handling for the Aggregate plugin is added. It returns NULL in case of an empty array, unless the used function is 'count' or 'sum'.
    • Empty value handling for the WordCount plugin is added. NULL is returned in case of a null value. In case of an empty string, the number of words should be zero. The WordCount actually returned 1 in this case, so additional code is added to fix that.
    • The Math plugin now treats null and empty strings as zero. Test coverage is added for empty data.
    • The Keyword Filter doesn't abort in case of empty values, but it does cast the input data to a string.
  • 🇯🇵Japan ptmkenny

    Unfortunately the MR is broken against HEAD. Setting to "Needs work" so that it can be rebased (needs manual rebase; /rebase in GitLab was not sufficient).

  • 🇳🇱Netherlands megachriz

    The merge conflict should be resolved. Conflict was caused by additional test methods in EncodeTest.

  • 🇳🇱Netherlands megachriz

    Tests are passing again. 🙂

  • 🇯🇵Japan ptmkenny

    Thanks!

    I've been using this for about a year on a site which imports about 20 different entities using Tamper, about 10,000 pieces of content, and everything is working great. My code isn't really affected by the changes in #20, so I will not set to RTBC, but I can say that this has made importing a lot easier for me because there are lots of null values that are now handled correctly.

  • 🇳🇱Netherlands megachriz

    @ptmkenny
    Thanks for testing! What do you think of the changes that I made for Aggregate, WordCount, Math and KeywordFilter? For example, do you think it is okay that the Math plugin handles an empty value as if it were a zero? Because this ensures that the Math plugin produces a number. It does error in case the plugin is configured to divide by the source value, as in math dividing by zero is not allowed.

  • 🇯🇵Japan ptmkenny

    @megachriz Are there any cases where the user would not want the calculation to be performed if the value was NULL? For example, if the field is required, that might be a safe assumption. But if the field is not required, I can imagine that a developer might want to do a calculation that only does something if the field has a numeric value (if it is NULL, don't do the calculation).

    Zero is a problem because it could mean "no value" or it could mean "the value is 0"; I have been working on a somewhat related issue in the Field Encrypt module, which uses 0 as a placeholder value for integer fields, but it was been decided to use NULL instead as the placeholder because 0 could represent an important value. ( 📌 We only encrypt fields if they don't match the placeholder value Active )

    I haven't thought through all the implications of this, but my initial thought is that this could be a little dangerous.

Production build 0.71.5 2024