Allow process plugins to flag a row to be skipped

Created on 4 November 2021, over 2 years ago
Updated 7 May 2024, about 2 months ago

Problem/Motivation

Currently a process plugin can declare a row to be skipped by throwing a MigrateSkipRowException. Using exceptions as signals in this manner is not ideal. The process plugins have access to the $row. They should simply be able to set a flag on the row and return rather than trying to pass an exception up through the executable.

Steps to reproduce

Proposed resolution

Add flags and methods on the row object to enable it to know and report that it should be skipped rather than passing an exception as a signal. To replicate the functionality of MigrateSkipRowException, add the following methods to the Row class:

  • public function skip(string $message = '', bool $save_to_map = TRUE): void -- Replicate the MigrateSkipRowException constructor. Calling $row->skip flags the row to be skipped with an optional message and flag denoting whether the skipped row should be saved to the migration map
  • public function getSkip(): bool -- getter to access skip flag
  • public function getSkipMessage(): string -- getter to access skip message
  • public function getSaveToMap(): bool -- getter to access save to map flag

Provide mechanisms inside of MigrateExecutable to check this flag anywhere it might catch a MigrateSkipRowException and treat it the same.

For this issue, modify the skip_on_empty plugin to use the new flag instead of throwing a MigrateSkipRowExcpetion to prove that it works.

Follow-up issues will remove the remaining usages of MigrateSkipRowException and deprecate the exception class.

Remaining tasks

  1. If πŸ› When sub_process encounters a row skip, it should skip its internal row, and not bubble up to the outer row RTBC is fixed first, then update this issue to be consistent.

User interface changes

API changes

Data model changes

Release notes snippet

πŸ“Œ Task
Status

Needs work

Version

11.0 πŸ”₯

Component
MigrationΒ  β†’

Last updated about 16 hours ago

Created by

πŸ‡ΊπŸ‡ΈUnited States mikelutz Michigan, USA

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Merge Requests

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • last update about 1 year ago
    29,401 pass, 4 fail
  • Status changed to Needs review about 1 year ago
  • πŸ‡ΊπŸ‡ΈUnited States mikelutz Michigan, USA
  • last update about 1 year ago
    29,438 pass
  • Status changed to Needs work about 1 year ago
  • πŸ‡ΊπŸ‡ΈUnited States benjifisher Boston area

    @mikelutz:

    I had a first look at this issue, and it looks good to me. I suggested a few minor changes on the MR, so NW to address those.

    I need to spend some more time studying the MigrateExecutable class to understand how it all fits together and decide whether anything else needs to be updated. So I am not ready yet to sign off on this issue.

  • last update about 1 year ago
    29,438 pass
  • Issue was unassigned.
  • Status changed to Needs review about 1 year ago
  • πŸ‡ΊπŸ‡ΈUnited States mikelutz Michigan, USA
  • Status changed to Needs work 12 months ago
  • πŸ‡ΊπŸ‡ΈUnited States smustgrave

    Only moving to NW for CR.

    Thanks!

  • Assigned to benjifisher
  • Status changed to Needs review 12 months ago
  • πŸ‡ΊπŸ‡ΈUnited States mikelutz Michigan, USA

    I know, but I really need Benji to do a fuller review so we can see if I need to do any other architecture changes that might affect the CR before I write it.

  • heddn Nicaragua

    We could update the IS to explain what we are planning to do.

  • Status changed to Needs work 9 months ago
  • πŸ‡ΊπŸ‡ΈUnited States smustgrave

    For the IS update.

  • πŸ‡ΊπŸ‡ΈUnited States benjifisher Boston area
  • πŸ‡¨πŸ‡¦Canada Charlie ChX Negyesi 🍁Canada
    grep -c '^-.*catch' 1379.diff
    0
    grep -c 'if.*>shouldSkip' 1379.diff
    5
    
  • Issue was unassigned.
  • πŸ‡ΊπŸ‡ΈUnited States mikelutz Michigan, USA
  • Pipeline finished with Success
    5 months ago
    Total: 514s
    #76708
  • πŸ‡ΊπŸ‡ΈUnited States mikelutz Michigan, USA
  • Status changed to Needs review 5 months ago
  • πŸ‡ΊπŸ‡ΈUnited States mikelutz Michigan, USA
  • Pipeline finished with Failed
    5 months ago
    Total: 638s
    #76714
  • Status changed to Needs work 5 months ago
  • πŸ‡ΊπŸ‡ΈUnited States mikelutz Michigan, USA

    Made some adjustments that broke some tests, and I'm still debating if they are worth keeping at all. Currently, when SourcePluginBase runs all the prepare_row hooks, if a hook return false, the remaining hooks still run, and the row is skipped, while if a hook throws a skip_row exception, the remaining hooks are skipped and we go straight to the catch block after the hook invocations. The original version of this patch left it so that calling $row->skip in a prepare row hook still let all the remaining hooks run before deciding to skip the row. While this difference in behavior from the exception is fine (We don't rely on the behavior in core, and it's okay for the replacement of a deprecated api to work slightly differently) for performance reasons, it seems like it would be nice to stop hook processing once a skip is flagged. The latest commits do that, but broke a few tests that will be a little tricky to rework. I think ultimately we should keep the efficient behavior and eventually deprecate the ability for prepare_row hooks to return FALSE as a signal to skip a row, requiring them to use $row->skip instead, but I don't have time to fix all the tests today.

  • πŸ‡¨πŸ‡¦Canada Charlie ChX Negyesi 🍁Canada

    and eventually deprecate the ability for prepare_row hooks to return FALSE as a signal to skip a row

    This was one of the warts that remained from the Drupal 7 migrate system. It was not part of the design at all.

  • Pipeline finished with Failed
    5 months ago
    Total: 611s
    #76971
  • πŸ‡ΊπŸ‡ΈUnited States benjifisher Boston area
  • Pipeline finished with Failed
    5 months ago
    Total: 510s
    #78290
  • Pipeline finished with Canceled
    5 months ago
    Total: 84s
    #78302
  • Pipeline finished with Failed
    5 months ago
    Total: 605s
    #78303
  • Pipeline finished with Failed
    5 months ago
    Total: 514s
    #78323
  • Pipeline finished with Failed
    5 months ago
    Total: 169s
    #79347
  • Pipeline finished with Success
    5 months ago
    Total: 501s
    #79348
  • πŸ‡ΊπŸ‡ΈUnited States benjifisher Boston area

    @mikelutz:

    The MR needs a rebase after ✨ Allow process plugins to stop further processing on a pipeline RTBC . I also made some small comments on the MR. Please set the status of this issue back to NR when you think it is ready. (I am not sure whether you forgot to do that or if you meant to continue working on the MR before getting more feedback.)

    The refactoring is hard to review, but the result is worth it!

    I think the "Needs subsystem maintainer review" tag was for the general approach. I am removing it now. The general approach is good, and we are now fighting with the smaller details.

  • πŸ‡ΊπŸ‡ΈUnited States mikelutz Michigan, USA

    It doesn't appear to need a rebase. That issue was merged on Jan 12, and I picked this back up and rebased on Jan 13. It's still NW because the refactor needs more tests, and I'm playing with adding the ability to set the row map status on skip, which would let us deprecate using MigrateException inside a process plugin to skip a row and mark it failed. Hoping to get to it this week.

  • Assigned to mikelutz
  • πŸ‡ΊπŸ‡ΈUnited States mikelutz Michigan, USA

    Assigning to me for more tests and features, and requested changes.

  • πŸ‡¨πŸ‡¦Canada Charlie ChX Negyesi 🍁Canada

    Using exceptions as signals in this manner is not ideal.

    I would argue it is exactly what exceptions are made for: to break out from the ordinary code flow. Indeed, SubProcess shows the need for manually propagate instead of the exception automatically doing so. Look at the number of checks for skipping before and after. What use cases are made possible by adding this complexity? What problems are being solved here?

    Also, I am not sure why is it called "getSkip". Let's look at events: the method is called isPropagationStopped so the analogue would be isProcessingSkipped.

  • πŸ‡ΊπŸ‡ΈUnited States mikelutz Michigan, USA

    I would argue it is exactly what exceptions are made for: to break out from the ordinary code flow.

    But we don't want to break out of the normal code flow here. Skipping rows and finalizing pipelines doesn't happen because of an exceptional condition, bad data, or an error. These are operations that are done by process functions in their normal workflow, and expected and supported behaviors. They are operations on an object, and we have the object available, so it should be able to know its status.

    SubProcess shows the need for manually propagate instead of the exception automatically doing so.

    I'm not sure precisely what you mean here. I'll assume that you are referring to the fact that the MR has to set the row to skip when it encounters a condition where it wants the row to skip. This is first off only being done to preserve buggy behavior that was partially exposed by this issue. πŸ› When sub_process encounters a row skip, it should skip its internal row, and not bubble up to the outer row RTBC is attempting to fix that bug. It's a prime example of the issues caused by using exceptions to signal things in the normal flow of execution. Some called function deep inside of subprocess throws a Skip Row exception, and subprocess didn't think to handle it, and it bubbles up and breaks the whole outer row. This makes it more difficult to code those kind of bugs.

    Look at the number of checks for skipping before and after. What use cases are made possible by adding this complexity?

    This code is in a transition state, but the ultimate goal here is to remove the usage of MigrateSkipRowException and MigrateException (the latter of which does the same thing as MigrateSkipRowException, but allows you to set an arbitrary row status while bypassing the row) in process plugins, and replace them with a single set of api calls on the row object itself. This makes logical sense to me, as the row object should know and report its status. As far as the number of skip checks, with the exception of sub_process, these are 1:1 with the existing try-catch blocks which are going away. and πŸ› When sub_process encounters a row skip, it should skip its internal row, and not bubble up to the outer row RTBC currently adds a try-catch block there too, which will be replaced by a skip check either here or there depending on which issue is committed first. The endgame code will be able to replace multiple catch blocks with a single API check, so we are reducing complexity.

    What problems are being solved here?

    See above comments on reducing complexity and preventing unintended bugs

    Also, I am not sure why is it called "getSkip". Let's look at events: the method is called isPropagationStopped so the analogue would be isProcessingSkipped.

    Alwasy happy to debate naming conventions, the PR is a WIP, but initially it as a simple getter and was named as such, but something like isRowSkipped would work too. I'm going to see what it all looks like once the migrateException stuff is included and see what makes the most sense.

  • Pipeline finished with Success
    4 months ago
    Total: 522s
    #97976
  • πŸ‡¬πŸ‡§United Kingdom joachim

    MigrateSkipRowException lets you define whether to save the skipped row to the map or not:

      public function __construct($message = '', $save_to_map = TRUE) {
    

    The new API should allow that distinction too.

    That's potentially out of scope of this issue and left to a follow-up, but it should happen before we deprecate MigrateSkipRowException as otherwise it's a regression.

  • Pipeline finished with Success
    about 2 months ago
    Total: 577s
    #166570
  • Pipeline finished with Success
    about 2 months ago
    Total: 681s
    #166606
Production build 0.69.0 2024