Problem/Motivation
While investigating superfluous requests sent to a remote API, we noticed that they were caused by the regular synchronizations we run via drush migrate:import --update --sync MY_MIGRATION.
Each execution of the command above queries the remote API three times instead of once.
This not only puts extra load on the source server but could also cause inconsistencies if the remote API returns a different response to each call.
My main question is: is there currently a way to prevent Migrate from querying a remote server more than once during a migration?
Steps to reproduce
Configure a migration with the url source plugin, the http data fetcher plugin and the json data parser plugin.
Run this migration with the following options: --update --sync
Note that the above requires migrate_tools.
Placing an Xdebug breakpoint in \GuzzleHttp\Handler\CurlHandler::__invoke() shows that several requests are actually made.
Proposed resolution
I'm not sure exactly what the solution should be.
Maybe there is an option somewhere that I'm missing, or a hook that could be implemented to solve my issue.
My understanding is that, in my case, the source plugin refetches the data every time rewind() is called on it, which triggers a complete retrieval of the source data from the remote API.
A "simple" migration queries the remote API twice (once for counting, once for migrating), but with the --sync option (which also calls rewind()), this goes up to three.
Perhaps the fetcher could offer an extra configuration key to statically cache the remote data.
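To illustrate the idea, here is a minimal, language-neutral sketch in Python. It is not migrate_plus code. It models a source that refetches on every rewind(), then adds a process-wide ("static") cache keyed by URL. The classes CountingRemote and UrlSource, the function simulate_update_sync_run() and the cache_response option are all hypothetical names chosen for this sketch. The three rewinds mirror the count / import / --sync sequence described above.

```python
from typing import Callable, ClassVar, Dict, Iterator, List


class CountingRemote:
    """Stand-in for the remote API; counts how often it is queried."""

    def __init__(self, items: List[dict]) -> None:
        self.items = items
        self.calls = 0

    def fetch(self, url: str) -> List[dict]:
        self.calls += 1
        return [dict(item) for item in self.items]


class UrlSource:
    """Hypothetical model of a URL source that fetches data on every rewind()."""

    # Shared by all instances, like a PHP static property: lives for the whole process.
    _static_cache: ClassVar[Dict[str, List[dict]]] = {}

    def __init__(self, url: str, fetch: Callable[[str], List[dict]],
                 cache_response: bool = False) -> None:
        self.url = url
        self.fetch = fetch
        self.cache_response = cache_response  # hypothetical option, not a real migrate_plus key

    @classmethod
    def clear_cache(cls) -> None:
        cls._static_cache.clear()

    def _rows(self) -> List[dict]:
        if not self.cache_response:
            return self.fetch(self.url)  # current behaviour: every rewind hits the remote
        if self.url not in self._static_cache:
            self._static_cache[self.url] = self.fetch(self.url)  # first call only
        return self._static_cache[self.url]

    def rewind(self) -> Iterator[dict]:
        return iter(self._rows())

    def count(self) -> int:
        return sum(1 for _ in self.rewind())


def simulate_update_sync_run(source: UrlSource) -> int:
    """Rewinds the source three times, as described for --update --sync."""
    total = source.count()          # 1st rewind(): counting
    for _row in source.rewind():    # 2nd rewind(): importing
        pass
    for _row in source.rewind():    # 3rd rewind(): --sync
        pass
    return total


if __name__ == "__main__":
    remote = CountingRemote([{"id": 1}, {"id": 2}])
    simulate_update_sync_run(UrlSource("https://example.com/api/items", remote.fetch))
    print(remote.calls)  # 3

    UrlSource.clear_cache()
    cached_remote = CountingRemote([{"id": 1}, {"id": 2}])
    simulate_update_sync_run(
        UrlSource("https://example.com/api/items", cached_remote.fetch, cache_response=True))
    print(cached_remote.calls)  # 1
```

Keying the cache by URL and storing it at class level means a second source instance for the same URL would also reuse the first response. The trade-off is that data changed on the remote during a run would not be seen. For this use case that is arguably the desired consistency.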
I've seen the following related matters:
API changes
Possibly a new configuration option to statically cache source data, although there may be other approaches I'm not aware of.
Does anyone have a hint on how to avoid multiple queries during a migration?
Thanks :)