Allow explode process plugin to use any delimiter allowed by explode method

Created on 4 January 2024, 12 months ago
Updated 20 June 2024, 7 months ago

Problem/Motivation

There are at least two delimiter values [(string) 0 and (int) 0] that are legal delimiters for the php explode method but are not allowed by the Drupal explode process plugin. Any delimiter allowed by the explode method should be allowed by the explode plugin. Likewise, we should disallow any value that the explode method does not allow.

Steps to reproduce

Configure a migration with the following:

  dest_id:
    -
      source: source_id
      plugin: explode
      delimiter: '0'

We get a migration exception claiming `delimiter is empty`, even though (string) 0 is a totally valid (if unusual) delimiter.

Proposed resolution

Validate the delimiter in the plugin constructor by calling php's explode method on a test string.

or

Add custom logic that more-or-less duplicates the internal workings of php's explode to validate the delimiter without calling it directly. I think the test is basically, "Is the value a scalar that is cast to a non-empty string?".

Remaining tasks

Provide a patch.
Add test coverage.

User interface changes

n/a

API changes

The explode migration process plugin would allow delimiters that would fail the empty() check but were not null or empty.

Data model changes

n/a

Release notes snippet

TBD

πŸ› Bug report
Status

Needs work

Version

11.0 πŸ”₯

Component
MigrationΒ  β†’

Last updated about 11 hours ago

Created by

πŸ‡ΊπŸ‡ΈUnited States DamienMcKenna NH, USA

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Merge Requests

Comments & Activities

  • Issue created by @DamienMcKenna
  • Status changed to Needs review 12 months ago
  • πŸ‡ΊπŸ‡ΈUnited States DamienMcKenna NH, USA

    For consideration.

  • Status changed to Needs work 12 months ago
  • πŸ‡ΊπŸ‡ΈUnited States smustgrave

    Is something that will need test coverage.

  • πŸ‡ΊπŸ‡ΈUnited States danflanagan8 St. Louis, US

    @DamienMcKenna, you can use a double-quoted "\n" and this should work fine as is.

    plugin: explode
    source: value
    delimiter: "\n"
    
  • πŸ‡ΊπŸ‡ΈUnited States danflanagan8 St. Louis, US

    It is kind of funky though that a string zero cannot be used as the delimiter currently!

    plugin: explode
    source: value
    delimiter: '0'
    

    fails with "delimiter is empty".

    That should clearly be supported.

  • πŸ‡ΊπŸ‡ΈUnited States DamienMcKenna NH, USA

    As it happens the original time I ran into this I had mistakenly spelled the word "delimeter". However, while reviewing the code I noticed that it used empty(), which like you said means that you can't use a zero as the delimiter.

  • πŸ‡ΊπŸ‡ΈUnited States mikelutz Michigan, USA

    This is a bug, The explode plugin should accept any value that PHP's 'explode' function will accept, which is any string not equal to the empty string ("")

  • Pipeline finished with Failed
    7 months ago
    Total: 165s
    #195894
  • Status changed to Needs review 7 months ago
  • πŸ‡ΊπŸ‡ΈUnited States danflanagan8 St. Louis, US

    I opened up an MR with some tests and the fix from @DamienMcKenna in #2.

    The test only job went as planned (https://git.drupalcode.org/issue/drupal-3412309/-/jobs/1828991) but I had a boneheaded cs violation for which I had to push up a new commit. So the normal pipeline is still running on that.

    As kind of a self-review, this reminded me of and old issue about making the exceptions more consistent from process plugins. See #3247950: Throw consistent exceptions on invalid process plugin configurations β†’ . With that mindset, I'd rather move the validation of the delimiter to the constructor. And once it's in the constructor, I'd rather just put a test call to the php explode function within a try/catch. Something like

    try {
      explode($configuration['delimiter'], 'dummy text')
    catch (Throwable $t) {
       // throw whatever new exception we want
    }
    

    Then we're truly honoring whatever explode allows as the delimiter. That feels a bit out of scope though. 'Course we could always change the scope if that's appealing! That would also protect against someone using an array as the delimiter, which probably just results in an uncaught exception currently.

  • Pipeline finished with Success
    7 months ago
    Total: 537s
    #195909
  • πŸ‡ΊπŸ‡ΈUnited States mikelutz Michigan, USA

    Move it to the constructor, since this is an error on the configuration and not the value, keep the MigrateException for now. I'm tempted to build something new with annotations and validators, but no time for that now..

    The correct check is `if (!is_string($x) || $x==='') throw.. `

  • Status changed to Needs work 7 months ago
  • πŸ‡ΊπŸ‡ΈUnited States mikelutz Michigan, USA
  • First commit to issue fork.
  • Pipeline finished with Failed
    7 months ago
    Total: 665s
    #196320
  • Assigned to danflanagan8
  • πŸ‡ΊπŸ‡ΈUnited States danflanagan8 St. Louis, US

    @prabha1997, I'm clearly actively working on the issue, so it would be more productive for you to help move a different issue forward. I'll assign it to myself though so no one has to read between the lines. Further, you didn't even the change that @mikelutz suggested.

    @mikelutz, I'm concerned that your suggested check prohibits the use of integers as the delimiter, which is neither prohibited by the explode method nor by the current explode plugin.

    What would you say about this as the constructor?

    /**
       * {@inheritdoc}
       */
      public function __construct(array $configuration, $plugin_id, $plugin_definition) {
        if (!isset($configuration['delimiter'])) {
          throw new MigrateException('delimiter is empty');
        }
        try {
          // Validate that the delimiter is legal by exploding a test string.
          explode($configuration['delimiter'], '');
        }
        catch (\Throwable $t) {
          throw new MigrateException('delimiter is invalid');
        }
        parent::__construct($configuration, $plugin_id, $plugin_definition);
      }
    
  • Status changed to Needs review 7 months ago
  • πŸ‡ΊπŸ‡ΈUnited States danflanagan8 St. Louis, US
  • Pipeline finished with Success
    7 months ago
    Total: 492s
    #196454
  • πŸ‡ΊπŸ‡ΈUnited States danflanagan8 St. Louis, US
  • Pipeline finished with Success
    7 months ago
    Total: 703s
    #196468
  • Status changed to Needs work 7 months ago
  • πŸ‡ΊπŸ‡ΈUnited States mikelutz Michigan, USA

    I'd prefer not to use a try-catch here, it's certainly not needed, as we know precisely what explode will accept, a string that is not the empty string. Ultimately I would like to require the configuration item to be a string. It's easy enough to force a string in yml by quoting it. So lets set ourselves up to be strict. Go with this

    if (!is_string($configuration['delimiter'])) {
      trigger_error('Using a non-string as a delimiter in the explode plugin is deprecated in 11.1.0 and will throw a MigrateException in 12.0.0. Ensure your value is quoted in your migration yaml if necessary', E_USER_DEPRECATED);
      if (is_array($configuration['delimiter'])) {
        $configuration['delimiter'] = '';
      }
      $configuration['delimiter'] = (string) $configuration['delimiter'];
    }
    if ($configuration['delimiter'] === '') {
      throw new \Drupal\migrate\MigrateException('Delimiter muse be a non-empty string value.....')
    }
    parent::__construct($configuration, $plugin_id, $plugin_definition);
  • πŸ‡ΊπŸ‡ΈUnited States danflanagan8 St. Louis, US

    Thanks, @mikelutz. I really like the suggestion to require a string and add the deprecation. I suppose the deprecation needs a test.

    If I can (continue to) be pedantic, though, the explode method indeed accepts non-strings as long as they can be cast to non-empty strings. That's why I liked the try/catch thing (i.e. Whatever Works). However, I like requiring a string as the delimiter a lot better than that (where "that" could refer to the try/catch or the linked movie).

  • πŸ‡ΊπŸ‡ΈUnited States mikelutz Michigan, USA

    See https://onlinephp.io/c/faedc

    Try catch is like 10x slower than doing most any kind of type checks or conversions. Because explode ends up having to do the same conversions anyway PLUS run the unneeded explode call, and then check the error handler and find the catch block. It's always going to be more efficient to just do our own type checks.

    Admittedly removing the type check to the constructor negates the real world impact of this as we are no longer checking the configuration on every row, but a microsecond is a microsecond..

  • Pipeline finished with Failed
    7 months ago
    Total: 167s
    #197568
  • Status changed to Needs review 7 months ago
  • πŸ‡ΊπŸ‡ΈUnited States danflanagan8 St. Louis, US

    I did the deprecation thing, but I'm not so sure about the tests anymore. I'm just not sure how this fits into the "@group legacy" system. It's not that we want to necessarily remove any of these test cases in 12.x; instead we want to update what we expect.

  • Pipeline finished with Success
    7 months ago
    Total: 504s
    #197575
  • Status changed to Needs work 7 months ago
  • The Needs Review Queue Bot β†’ tested this issue. It no longer applies to Drupal core. Therefore, this issue status is now "Needs work".

    This does not mean that the patch necessarily needs to be re-rolled or the MR rebased. Read the Issue Summary, the issue tags and the latest discussion here to determine what needs to be done.

    Consult the Drupal Contributor Guide β†’ to find step-by-step guides for working with issues.

Production build 0.71.5 2024