Process plugin: snippet to re-use YAML config

Created on 30 March 2020, over 4 years ago
Updated 9 September 2024, 2 months ago

Problem/Motivation

We often need to copy/paste YAML config across several migrations.

If we want to apply a specific suite of process plugins to all fulltext fields (e.g: dom process plugins), all you can do ATM is copy/paste the suite of yaml config everywhere.

Same if you have a specific process plugin with specific/complex config.

If different migrations use the same pipeline for the same source and destination fields, then we can use shared configuration provided by the migrate_tools module. (That option did not exist when this issue was created.) But we still need a way to handle different fields.

Proposed resolution

Add the snippet plugin, which reads a process pipeline from a YAML file in the directory mymodule/migrations/process/.

For example, if the file mymodule/migrations/process/clean_whitespace.yml contains this code

- plugin: callback
  callable: htmlentities
- plugin: str_replace
  search:
    - ' '
    - ' '
  replace: ' '
- plugin: str_replace
  regex: true
  search: '@\s+@'
  replace: ' '
- plugin: callback
  callable: trim

then we can use the snippet plugin in mymodule/migrations/test.yml like this:

id: test
label: Test the snippet process plugin.
source:
  plugin: embedded_data
  data_rows:
    - id: 1
      string: 'Taxonomy term  1'
    - id: 2
      string: ' Taxonomy term 2 '
  ids:
    id:
      type: integer
destination:
  plugin: entity:taxonomy_term
  default_bundle: tags
process:
  name:
    - plugin: snippet
      module: mymodule
      path: clean_whitespace
      source: string

Original proposal

Define a "pipeline" process plugin that defines a suite of process plugins to run.

Given the following example in a migration:

  process:
  'body/value':
    - plugin: pipeline
      id: my_process_pipeline
      source: 'body/0/value'

We would create the file migrate_plus.migration_process_pipeline.my_process_pipeline.yml with config:

langcode: en
status: true
id: my_process_pipeline
label: 'My process pipeline'
description: 'Process pipeline for processing fulltext fields'
process:
  -
    plugin: dom
    method: import
  -
    plugin: dom_str_replace
    mode: attribute
    xpath: '//a'
    attribute_options:
      name: href
    search: 'foo'
    replace: 'bar'
  -
    plugin: dom_str_replace
    mode: attribute
    xpath: '//a'
    attribute_options:
      name: href
    regex: true
    search: '/foo/'
    replace: 'bar'
  -
    plugin: dom
    method: export
✨ Feature request
Status

Needs review

Version

6.0

Component

Plugins

Created by

πŸ‡§πŸ‡ͺBelgium herved

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Merge Requests

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • πŸ‡ΊπŸ‡ΈUnited States amaisano Boston

    This doesn't work with sub_process plugins. The snippet is sending the 1st item in the array rather than the whole array to the separate file.

  • πŸ‡ΊπŸ‡ΈUnited States danflanagan8 St. Louis, US

    Would it work to put the single_value plugin in the pipeline ahead of the snippet plugin in that case?

    testing:
      -
        plugin: single_value
        source: my_array
      -
        plugin: snippet
        module: my_module
        path: my_process_pipeline
    
  • πŸ‡ΊπŸ‡ΈUnited States amaisano Boston

    @danflanagan8 that seems to be working! What a nice workaround. Thank you.

    Snippet YML:

    -
      plugin: sub_process
      source: bricks
      process:
        _stub:
          plugin: switch_on_condition
          source: type
          .....
    

    --migrate-debug output:

    ...
        13 => []
        14 => array:3 [
          "_stub" => array:2 [
            0 => "227"
            1 => "274"
          ]
          "target_id" => "227"
          "target_revision_id" => "274"
        ]
    

    It's interesting how the output values contain the pseudo_field, which usually isn't present in the output for Destinations. It has no effect on the import thankfully.

  • πŸ‡ΊπŸ‡ΈUnited States alison

    I just started using #19 to do identical processing on a bunch of rich text fields and I'm in love, this feature is absolutely wonderful, THANK YOU!!

    The status is "Needs work" not "Needs review," so I won't change it to RTBC, but do y'all know what's left needing work, or maybe can we move this to RTBC........?

  • πŸ‡ΊπŸ‡ΈUnited States danflanagan8 St. Louis, US

    The `snippet` plugin for sure needs test coverage. (Added the tag.) And the IS here describes a different approach. I would be in favor of updating the IS to describe the `snippet` plugin. So I added that tag too.

  • Assigned to benjifisher
  • πŸ‡ΊπŸ‡ΈUnited States benjifisher Boston area

    I am updating the title and issue summary, since I think we have consensus on using the snippet plugin.

    To do:

    1. Convert the patch to a MR.
    2. Add tests.

    I am assigning the issue to myself, but if I do not make any progress in a week, then feel free to change that.

  • πŸ‡ΊπŸ‡ΈUnited States benjifisher Boston area
  • πŸ‡ΊπŸ‡ΈUnited States benjifisher Boston area
  • πŸ‡ΊπŸ‡ΈUnited States benjifisher Boston area
  • Merge request !101Add the snippet process plugin β†’ (Open) created by benjifisher
  • πŸ‡ΊπŸ‡ΈUnited States benjifisher Boston area

    I made a MR based on the patch in #19.

    I notice that there is a todo comment in the code:

        // @todo Add error checking: the module is enabled, the file exists, the
        // YAML is valid.
    

    We should handle that as well as add a test.

  • Pipeline finished with Success
    4 months ago
    Total: 234s
    #241170
  • Pipeline finished with Success
    4 months ago
    Total: 240s
    #241175
  • Pipeline finished with Failed
    4 months ago
    Total: 240s
    #243454
  • Pipeline finished with Failed
    4 months ago
    #243681
  • Pipeline finished with Failed
    4 months ago
    Total: 244s
    #243684
  • Pipeline finished with Failed
    4 months ago
    Total: 241s
    #243693
  • Status changed to Needs review 4 months ago
  • πŸ‡ΊπŸ‡ΈUnited States benjifisher Boston area

    I handled the @todo comment and added a kernel test. This issue is ready for review.

  • Pipeline finished with Failed
    4 months ago
    Total: 244s
    #243734
  • Issue was unassigned.
  • πŸ‡ΊπŸ‡ΈUnited States benjifisher Boston area
  • Status changed to Needs work 3 months ago
  • πŸ‡ΊπŸ‡ΈUnited States danflanagan8 St. Louis, US

    I've been working on the "T" part of RTBC and almost everything is amazing. I'm getting helpful exception messages when the snippet of my yaml is bad or when the "path" I configured was incorrect. Or when I forget to put the "module" or when I use a module that isn't installed (which is even a different message!). Really awesome DX.

    I've also used the plugin successfully on multiple projects.

    The only problem I've noticed is that the snippet plugin doesn't attempt to handle stopping a pipeline. ( ✨ Allow process plugins to stop further processing on a pipeline RTBC )

    What should happen if I use this snippet?

    - plugin: callback
      callable: is_numeric
    - plugin: skip_on_empty
      method: process
    - plugin: default_value
      default_value: 'should I ever see this?'
    

    I think I'd want to get null but instead I get that default message when the source is not numeric.

    If we don't support stopping in a snippet, we should document that. I think the way to support it would be to go back to the foreach loop and do something like:

      /**
       * {@inheritdoc}
       */
      public function transform($value, MigrateExecutableInterface $migrate_executable, Row $row, $destination_property) {
        foreach ($this->pipeline as $step) {
          $value = $step->transform($value, $migrate_executable, $row, $destination_property);
          if ($step->isPipelineStopped()) {
            break;
          }
        }
        return $value;
      }
    
  • Status changed to Needs review 3 months ago
  • πŸ‡ΊπŸ‡ΈUnited States benjifisher Boston area

    @danflanagan8:

    Thanks for all those good ideas! I think the plugin will be a lot more robust thanks to this input. Back to NR.

    Since isPipelineStopped() was introduced in Drupal 10.3.0, and migrate_plus declares compatibility with 9.1, I added method_exists($step, 'isPipelineStopped') to the test. With earlier versions of Drupal core, the code will never reach the break line, but a MigrateSkipProcessException will be handled by executable.

    I am disappointed: I do not think I have ever used array_reduce() before, and I thought it was an elegant way to implement transform(). But I do not know how to break out of array_reduce(), except by throwing an exception, so I want back to a foreach loop.

  • Pipeline finished with Failed
    3 months ago
    Total: 828s
    #274234
  • Assigned to benjifisher
  • Status changed to Needs work 3 months ago
  • πŸ‡ΊπŸ‡ΈUnited States benjifisher Boston area

    On second thought, I need to do more: the snippet plugin should implement isPipelineStopped(), not just check that method for the process plugins in the pipeline. Maybe this is a hint that I should change the implementation to be more like the sub_process plugin, so that I am not re-implementing the executable. If I do that, then using source: in a snippet should work normally.

    I am setting the status to NW and assigning this issue to myself. I intend to update it within the next few days.

  • πŸ‡ΊπŸ‡ΈUnited States danflanagan8 St. Louis, US

    On second thought, I need to do more: the snippet plugin should implement isPipelineStopped()

    Oh, this is actually a really subtle and interesting point. What IS the snippet? I think I was imagining that stopping within the snippet pipeline would not result in the main pipeline stopping. Does one imagine the snippet as a branch off of the main pipeline (like subprocess)? Or does one imagine the snippet as being inserted into the main pipeline (basically a syntactic trick)? I think there are good uses for both implementations.

    Maybe this is a hint that I should change the implementation to be more like the sub_process plugin, so that I am not re-implementing the executable.

    It depends on what we're going for. Is the snippet supposed to act more like subprocess? Or are we supposed to imagine the snippet's yaml being inserted into the main pipeline?

    If I do that, then using source: in a snippet should work normally.

    That occurred to me too! Your beautiful new bit of documentation would be for naught. :(

  • Pipeline finished with Failed
    3 months ago
    Total: 260s
    #277667
  • Issue was unassigned.
  • Status changed to Needs review 3 months ago
  • πŸ‡ΊπŸ‡ΈUnited States benjifisher Boston area

    It depends on what we're going for. Is the snippet supposed to act more like subprocess? Or are we supposed to imagine the snippet's yaml being inserted into the main pipeline?

    Personally, I think of it the second way.

    A more important consideration is that the plugin should behave the same way in Drupal 10.2 as it does in 10.3 and 11.0. I added a test: it uses a snippet that includes skip_on_empty. In older versions of Drupal (and with the version of this plugin that several people have been using) that throws a MigrateSkipProcessException exception, which is caught by the code calling the snippet plugin. In order to get the same behavior with newer versions, I stop the pipeline and return NULL.

    I did not quite get it to work, but I think the constructor could create a stub migration using the null source and destination plugins and

    process:
      snippet_result:
        - plugin: get
          source: snippet_value
        - <insert pipeline here>
    

    Then create a MigrateExecutable object with that migration and save that as a class variable instead of the $pipeline array. Then the transform() method would just have to set snippet_value on the Row object, call MigrateExecutable::processRow(), and return the snippet_result destination property.

    I think I will leave well enough alone and set the issue status back to NR.

  • Pipeline finished with Failed
    3 months ago
    Total: 272s
    #277682
  • πŸ‡ΊπŸ‡ΈUnited States danflanagan8 St. Louis, US

    > Personally, I think of it the second way.

    Gotcha. Re-reading the IS, I'm convinced that's what's being asked for anyway.

    > A more important consideration is that the plugin should behave the same way in Drupal 10.2 as it does in 10.3 and 11.0.

    It should be tested manually, but the latest looks good to me. It's right to return $value instead of NULL, which you must have realized after #37

    Leaving at NR for the T part of RTBC, as you say.

  • πŸ‡ΊπŸ‡ΈUnited States alison

    Finally back to re-test!

    FYI, on my current project, I'm only using snippet to do text format mapping on all my rich text fields (so far) -- so, it's only one snippet, and it's only on one type of field/data/whatever -- and only on one use per migration.
    Before today, I was using the 2024-08-01 version of the patch/MR.

    • Today, I applied the latest version of the patch/MR.
    • I ran one of my "upgrade_d7_node_complete_TYPE" migrations that I'd run previously, with the --update flag -- everything worked great.
    • I rolled back that "upgrade_d7_node_complete_TYPE" migration, then re-ran it, just, to simulate doing it "fresh" or something -- everything worked great.
    • I did the same thing with one other "upgrade_d7_node_complete_TYPE" migration -- success with that one, too.

    "Everything worked great" = node data came in, no errors, proper text formats applied to rich text fields πŸŽ‰

    Please let me know if there's any other info I should share. Thank you!!

  • πŸ‡ΊπŸ‡ΈUnited States amaisano Boston

    I was worried when I saw additional notes in the doc comments about needing to explicitly `get` source values inside the snippet, because I have code like this:

    Parent migration:

    ...
    process:
      name: name
      _parent_id: source/parent/id
      description/value: description/value
      description/format:
        plugin: snippet
        module: sw_migrate_content
        path: text_format
        source: description/format
    ...

    Snippet:

    -
      plugin: static_map
      bypass: true
      map:
        uikit: legacy
    -
      plugin: if_condition
      condition: not:empty
      else_get: constants/full_html_format
    -
      plugin: concat
      source:
        - null # null trick current pipeline value
        - '@_parent_id'

    But things APPEAR to still have worked fine, both before (using an earlier version of this patch) and now. Is this comment only for the first step in the pipeline? It is confusing to me.

    Unless I'm missing something? The comment in question:

    * Normally, any process plugin can specify the source key, which implicitly
    * adds the get process plugin to the pipeline. This does NOT WORK inside a
    * snippet:
    *
    * @code
    * # The source will be ignored. The source of the snippet plugin is always
    * # used.
    * - plugin: default_value
    * default_value: default value
    * source: some_field
    * @endcode
    *
    * Instead, explicitly add the get plugin to the pipeline:
    *
    * @code
    * - plugin: get
    * source: some_field
    * - plugin: default_value
    * default_value: default value
    * @endcode

  • πŸ‡ΊπŸ‡ΈUnited States benjifisher Boston area

    @amaisano:

    I tested your example with both the current version of MR 101 and the first commit on the feature branch, which is the same code as Comment #19. Both times, your snippet did not work. Do you have any custom code that might be involved?

    I created the test module sw_migrate_content using drush gen module, and I copied your snippet into modules/sw_migrate_content/migrations/process/text_format.yml. I installed the Migrate Conditions β†’ and Migrate Sandbox! β†’ modules. I copied the process section of the migration from your comment and entered the following for the data row:

    source:
      parent:
        id: 17
    description:
      value: test description from source
      format: uikit
    

    For the constants, I entered

    full_html_format: full_html
    

    When I tried to process the row, I got the error

    (Migrate Message) migrate_sandbox:description/format:snippet: 'legacy' is not an array

    When I modified the snippet as suggested in the doc block (explicitly adding the get plugin), it worked as expected:

    _parent_id: 17
    description:
      value: 'test description from source'
      format: legacy17
    

    I do not see any changes in the code that would affect this behavior. I added error checking, used constructor promotion, and updated the code to check isPipelineStopped() when it is defined. None of that seems relevant.

Production build 0.71.5 2024