Allow callback for Url source, and single item Json plugin

Created on 14 March 2019, over 5 years ago
Updated 4 September 2024, 3 months ago

Problem

Much like the problem outlined in #2608610: Add support for list/item pattern , we have a need to import a bunch of items where the full list of those items (with minimal metadata like id) is defined in an endpoint like:

And each item can be retrieved in full as a single JSON object via:

I have looked into the solution on that issue, and it looks promising, however - the latest patches don't apply. And I feel like the idea of defining uri templates and child item selectors is adding complexity without addressing all use cases where we want to dynamically generate the source URLs.

Also - the item endpoint I am using provides just a single JSON object, not an array as the Json data parser assumes.

Proposed solution

Lets just allow the Url processor to take a callback which can do the work of generating the URLs in advance.

This would have other benefits (in addition to providing a way for you to implentment the ItemList Item pattern), for example, if you wish your list of source URLs to be paginated from a source, for example you are importing a catalog of 10,000 products and you want to do in batches of 200 (to balance memory limitations with the economy of making fewer HTTP requests) your callback could handle generating the urls. Each URL might looks like:

The other part of the solution would require a way to configure the migration so that it knows a source URL might contain a single object. We could explicitly set item_selector to FALSE to indicate this.

To get this working the source configuration would be something like:

source:
  plugin: url
  data_fetcher_plugin: http
  data_parser_plugin: json
  urls:
    callback: my_module_migrate_urls
  item_selector: false
  skip_count: true 
 

Note: is important if you have a large number of URLs, otherwise things like drush migrate-status will iterate through all requested URLs to get a total count.

The current migration is passed to the callback, this will allow some other tweaking, like ensuring only URLs we intend to import in the current migration are imported, like so:

use Drupal\migrate\Plugin\MigrationInterface
use Drupal\migrate\Plugin\MigrateIdMapInterface
 
 function my_module_migrate_urls(MigrationInterface $migration) {
      $ids = my_module_get_all_ids();
      $urls = [];

      // Lets exclude URLs for items already marked as imported.
      $id_map = $migration->getIdMap();
      $imported_ids = $id_map->getDatabase()->select($id_map->mapTableName(), 'm')
         ->fields('m', ['sourceid1'])
         ->condition('source_row_status', MigrateIdMapInterface::STATUS_IMPORTED)
         ->execute()
         ->fetchCol();
      $ids = array_diff($ids, $imported_ids);

      foreach ($ids as $id) {
        $urls[] = "https://example.com/jsonapi/node/article/{$id}";
      }
      return $urls;
   }
Feature request
Status

Needs work

Version

6.0

Component

Plugins

Created by

🇦🇺Australia rollsd

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Merge Requests

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • 🇳🇴Norway steinmb

    Reading through the Drupalspoons issue, with the state we now have on d.o. perhaps we should reopen the issue here and create MR from here? The feature req. it still useful and from what I can see, we still do not have a good alternative?

  • Status changed to Needs review almost 2 years ago
  • 🇳🇴Norway steinmb

    Manual reroll and minor changes just to get the ball rolling. My local tests are still running so partly untested.

  • 🇮🇳India Aaron23

    @Steinmb Agreed. This patch needs to be updated in module. As there are scenario to keep the urls fetch from backend configuration. the custom callback will be the solution to achieve it

  • 🇺🇸United States jcandan

    Am I correct that the list of URLs created for a combination of list/item pattern and pagination would be something like this?

    If the URL callback just does the job of listing all these endpoints, does anyone see a potential source data handling conflict by treating the list and item responses the same? Seems the separate solutions offered by #2608610: Add support for list/item pattern and 📌 Support paging through multiple requests Needs work might solve for that, but I am open to getting a better grasp of this solution.

    Thoughts?

  • 🇺🇸United States jcandan

    Actually, come to think of it, the list/item solution I went with was a variation of the UrlPerRow example in Drupal Answer link from #26 of #2608610: Add support for list/item pattern .

  • 🇺🇸United States amaisano Boston

    We wrote a similar plugin extension that provides a list of URLs back to the main source plugin. There hasn't been any conflict with # of items found vs. imported vs. any other status - it just works like normal.

    We've extended the the migrate_plug\data_parse\JSON plugin, and altered the nextSource() method to build this list.

    We are crawling a JSON:API in our custom plugin until we find no more "next-page" attributes in the JSON response, and then compile the full list of URLs as basically a bunch of offset page requests.

    I can share the code if anyone is interested.

  • @amaisano id very much like to read the code of this custom JSON plugin! :)

  • 🇺🇸United States oivanov

    @amaisano me too! please share the code of this custom JSON plugin!

  • 🇪🇸Spain jcarrasco

    I tried this patch (42, 3040427-42-migrate_plus_multiple_urls.patch ) and code for function my_module_migrate_urls but I have this error:

    In Url.php line 88:
    The URL callback function is not callable.

    What I am missing to make callback function callable?
    Thanks in advance

  • 🇺🇸United States amaisano Boston

    Here is the parser code plugin we made that handles custom/dynamic/multiple source urls:

    https://www.drupal.org/project/migration_jsonapi

  • Status changed to RTBC 9 months ago
  • 🇦🇹Austria hudri Austria

    We are using patch #42 for years now very successfully with many different endpoints. Moving to reviewed & tested.

    Migrate sources are so vastly different that - for me - it is much easier and faster to write a PHP callback than to find and study and update 437 different list/item source plugin modules. And very often those plugins are NOT (re-) usable because endpoint A returns an array [] on root, endpoint B returns an object [] on root, endpoint C returns an HTTP-Code 404 when the last page number was passed and endpoint D just returns an HTTP200 with no content. PHP callbacks just work. For years. (Sorry for the rant, but the more different sources you have to migrate from, the more useful this simple patch is.)

    +1 from me to include this feature into migrate_plus.

  • 🇺🇸United States codechefmarc

    We are also using this patch to great success. I am curious if it's possible to add an argument that can be passed to the callback as well. Our use case is that we have several migrations that all use very similar code, but have different endpoints to get different resources. So, I envision something like this:

    source:
      plugin: url
      data_fetcher_plugin: http
      data_parser_plugin: json
      urls:
        callback: my_custom_callback
        argument: something-passed-here
    

    And then in the callback function:

    function my_custom_callback(MigrationInterface $migration, $argument) {
    
    }
    

    This may be another issue/MR, but wanted to see if anyone else had this kind of use case. At the moment, the workaround is for us to use multiple callback functions, one for each migration.

  • 🇺🇸United States codechefmarc

    I just did a test and I can do something like this and it works - not sure how valid it is though:

    Migration yml:

    source:
      plugin: url
      data_fetcher_plugin: http
      data_parser_plugin: json
      urls:
        callback: my_custom_callback
      callback_arg: 'my amazing argument'
    

    Callback function:

    function my_custom_callback(MigrationInterface $migration) {
      $my_arg = $migration->getPluginDefinition()['source']['callback_arg'];
    }
    
  • Status changed to Needs work 7 months ago
  • 🇬🇧United Kingdom joachim
    1. +++ b/src/Plugin/migrate/source/Url.php
      @@ -32,11 +71,23 @@ class Url extends SourcePluginExtension {
      +        $configuration['urls'] = $configuration['urls']['callback']($migration);
      

      Rather than load up a potentially HUGE list of URLs into the configuration, we should defer this to getSourceUrls(), and the fetchers should call that.

      Also, changing configuration feels a bit ick -- it should remain what it was set to in the plugin definition rather than being spookily changed.

  • Pipeline finished with Success
    7 months ago
    Total: 259s
    #161701
  • 🇺🇸United States codechefmarc

    Hi, finding out what I can do to help to move this forward - I'm planning on needing this functionality in another contrib module I'm working on. I'm not as well versed in these kinds of migrate plugins so I'm trying to follow your comment about deferring to getSourceUrls(). I see that function in there, but it's not being used anywhere. And not sure how to change the configuration piece either?

    What if we made this a separate plugin that is a different name but extends the URL plugin and that way the original class isn't changed but the new one can add this functionality?

    Thanks!

  • Pipeline finished with Failed
    3 months ago
    Total: 452s
    #258220
  • 🇬🇧United Kingdom joachim

    > What if we made this a separate plugin that is a different name but extends the URL plugin and that way the original class isn't changed but the new one can add this functionality?

    That's not going to work because we also need to change the DataParserPluginBase. Currently it lifts the 'urls' property directly from the Url.

    We need to decouple them a little bit and have the parser plugin call a method on the URL plugin.

  • 🇬🇧United Kingdom joachim

    I've done a bit more work on this, but it turns out that adding a method to the Url source plugin class has repercussions.

    We need the parser plugin to have the url source plugin object, and that means:

    - DataParserPluginManager needs to implement createInstance() to receive it and pass it on to the parser plugin
    - the kernel tests will all need tweaking because many of them create an instance of the parser

    Would be good to get input from maintainers before I go further on this.

  • 🇺🇸United States jacobbell84

    Making a quick update to the existing patch to work with the new function format. While the patch was still applying before, the variable names in the getSourceData function had changed so the code wasn't actually doing anything.

Production build 0.71.5 2024