- Merge request !1379Issue #3247718: Allow process plugins to flag a row to be skipped β (Open) created by mikelutz
- last update
over 1 year ago 29,401 pass, 4 fail - Status changed to Needs review
over 1 year ago 5:16pm 7 June 2023 - last update
over 1 year ago 29,438 pass - Status changed to Needs work
over 1 year ago 7:26pm 8 June 2023 - πΊπΈUnited States benjifisher Boston area
@mikelutz:
I had a first look at this issue, and it looks good to me. I suggested a few minor changes on the MR, so NW to address those.
I need to spend some more time studying the MigrateExecutable class to understand how it all fits together and decide whether anything else needs to be updated. So I am not ready yet to sign off on this issue.
- last update
over 1 year ago 29,438 pass - Issue was unassigned.
- Status changed to Needs review
over 1 year ago 1:23pm 24 June 2023 - Status changed to Needs work
over 1 year ago 2:48pm 5 July 2023 - Assigned to benjifisher
- Status changed to Needs review
over 1 year ago 3:14pm 5 July 2023 - πΊπΈUnited States mikelutz Michigan, USA
I know, but I really need Benji to do a fuller review so we can see if I need to do any other architecture changes that might affect the CR before I write it.
- heddn Nicaragua
We could update the IS to explain what we are planning to do.
- Status changed to Needs work
about 1 year ago 6:02pm 13 October 2023 - π¨π¦Canada Charlie ChX Negyesi πCanada
grep -c '^-.*catch' 1379.diff 0 grep -c 'if.*>shouldSkip' 1379.diff 5
- Issue was unassigned.
- Status changed to Needs review
10 months ago 5:05pm 13 January 2024 - Status changed to Needs work
10 months ago 7:05pm 13 January 2024 - πΊπΈUnited States mikelutz Michigan, USA
Made some adjustments that broke some tests, and I'm still debating if they are worth keeping at all. Currently, when SourcePluginBase runs all the prepare_row hooks, if a hook return false, the remaining hooks still run, and the row is skipped, while if a hook throws a skip_row exception, the remaining hooks are skipped and we go straight to the catch block after the hook invocations. The original version of this patch left it so that calling $row->skip in a prepare row hook still let all the remaining hooks run before deciding to skip the row. While this difference in behavior from the exception is fine (We don't rely on the behavior in core, and it's okay for the replacement of a deprecated api to work slightly differently) for performance reasons, it seems like it would be nice to stop hook processing once a skip is flagged. The latest commits do that, but broke a few tests that will be a little tricky to rework. I think ultimately we should keep the efficient behavior and eventually deprecate the ability for prepare_row hooks to return FALSE as a signal to skip a row, requiring them to use $row->skip instead, but I don't have time to fix all the tests today.
- π¨π¦Canada Charlie ChX Negyesi πCanada
and eventually deprecate the ability for prepare_row hooks to return FALSE as a signal to skip a row
This was one of the warts that remained from the Drupal 7 migrate system. It was not part of the design at all.
- πΊπΈUnited States benjifisher Boston area
@mikelutz:
The MR needs a rebase after β¨ Allow process plugins to stop further processing on a pipeline RTBC . I also made some small comments on the MR. Please set the status of this issue back to NR when you think it is ready. (I am not sure whether you forgot to do that or if you meant to continue working on the MR before getting more feedback.)
The refactoring is hard to review, but the result is worth it!
I think the "Needs subsystem maintainer review" tag was for the general approach. I am removing it now. The general approach is good, and we are now fighting with the smaller details.
- πΊπΈUnited States mikelutz Michigan, USA
It doesn't appear to need a rebase. That issue was merged on Jan 12, and I picked this back up and rebased on Jan 13. It's still NW because the refactor needs more tests, and I'm playing with adding the ability to set the row map status on skip, which would let us deprecate using MigrateException inside a process plugin to skip a row and mark it failed. Hoping to get to it this week.
- Assigned to mikelutz
- πΊπΈUnited States mikelutz Michigan, USA
Assigning to me for more tests and features, and requested changes.
- π¨π¦Canada Charlie ChX Negyesi πCanada
Using exceptions as signals in this manner is not ideal.
I would argue it is exactly what exceptions are made for: to break out from the ordinary code flow. Indeed, SubProcess shows the need for manually propagate instead of the exception automatically doing so. Look at the number of checks for skipping before and after. What use cases are made possible by adding this complexity? What problems are being solved here?
Also, I am not sure why is it called "getSkip". Let's look at events: the method is called
isPropagationStopped
so the analogue would beisProcessingSkipped
. - πΊπΈUnited States mikelutz Michigan, USA
I would argue it is exactly what exceptions are made for: to break out from the ordinary code flow.
But we don't want to break out of the normal code flow here. Skipping rows and finalizing pipelines doesn't happen because of an exceptional condition, bad data, or an error. These are operations that are done by process functions in their normal workflow, and expected and supported behaviors. They are operations on an object, and we have the object available, so it should be able to know its status.
SubProcess shows the need for manually propagate instead of the exception automatically doing so.
I'm not sure precisely what you mean here. I'll assume that you are referring to the fact that the MR has to set the row to skip when it encounters a condition where it wants the row to skip. This is first off only being done to preserve buggy behavior that was partially exposed by this issue. π When sub_process encounters a row skip, it should skip its internal row, and not bubble up to the outer row RTBC is attempting to fix that bug. It's a prime example of the issues caused by using exceptions to signal things in the normal flow of execution. Some called function deep inside of subprocess throws a Skip Row exception, and subprocess didn't think to handle it, and it bubbles up and breaks the whole outer row. This makes it more difficult to code those kind of bugs.
Look at the number of checks for skipping before and after. What use cases are made possible by adding this complexity?
This code is in a transition state, but the ultimate goal here is to remove the usage of MigrateSkipRowException and MigrateException (the latter of which does the same thing as MigrateSkipRowException, but allows you to set an arbitrary row status while bypassing the row) in process plugins, and replace them with a single set of api calls on the row object itself. This makes logical sense to me, as the row object should know and report its status. As far as the number of skip checks, with the exception of sub_process, these are 1:1 with the existing try-catch blocks which are going away. and π When sub_process encounters a row skip, it should skip its internal row, and not bubble up to the outer row RTBC currently adds a try-catch block there too, which will be replaced by a skip check either here or there depending on which issue is committed first. The endgame code will be able to replace multiple catch blocks with a single API check, so we are reducing complexity.
What problems are being solved here?
See above comments on reducing complexity and preventing unintended bugs
Also, I am not sure why is it called "getSkip". Let's look at events: the method is called isPropagationStopped so the analogue would be isProcessingSkipped.
Alwasy happy to debate naming conventions, the PR is a WIP, but initially it as a simple getter and was named as such, but something like isRowSkipped would work too. I'm going to see what it all looks like once the migrateException stuff is included and see what makes the most sense.
- π¬π§United Kingdom joachim
MigrateSkipRowException lets you define whether to save the skipped row to the map or not:
public function __construct($message = '', $save_to_map = TRUE) {
The new API should allow that distinction too.
That's potentially out of scope of this issue and left to a follow-up, but it should happen before we deprecate MigrateSkipRowException as otherwise it's a regression.