- Issue created by @joachim
- 🇬🇧United Kingdom joachim
I think I can see a way to make this work, with some provisos...
Suppose that when an entity is run through the import processor pipeline by ImportService, some of the processors (such as those for links, reference fields, and embedded links) find that it has links or nested entities. Let's call the main entity A1. Currently the list of entities to process looks like this, where the * indicates the item being processed:
> *A1, X, Y, Z
The ImportService collects the list of dependencies. Let's call them B1, B2, B3. It then manipulates the batch queue, so it looks like this:
> *A1, B1, B2, B3, A1, X, Y, Z
It then stops processing A1, allowing the batch/queue system to advance to B1.
If B1 has links/nested entities C1, C2, it does the same, so the queue is:
> *B1, C1, C2, B1, B2, B3, A1, X, Y, Z
The idea is that we stop processing the parent entity, import the dependencies first, then re-encounter the parent entity in the queue.
We obviously still need to respect $entitiesMarkedForImport, which prevents infinite loops of link following.
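The intended ordering can be modelled with plain arrays. This is only a toy sketch, not the real batch queue: `runQueue()` is a made-up function, and `$marked` plays the role of $entitiesMarkedForImport.

```php
<?php

/**
 * Toy model of the proposed re-queueing: when the entity at the front of
 * the queue has dependencies that are not yet marked for import, stop
 * processing it, queue the dependencies in front of it, and re-encounter
 * it afterwards. $dependencies maps an entity ID to its direct
 * dependencies; $marked stands in for $entitiesMarkedForImport and is
 * what prevents infinite loops on circular references.
 */
function runQueue(array $queue, array $dependencies): array {
  // Everything in the initial queue is already marked for import.
  $marked = array_fill_keys($queue, TRUE);
  $imported = [];
  $order = [];
  while ($queue) {
    $current = array_shift($queue);
    // Links to entities already marked for import are not followed again.
    $pending = array_filter(
      $dependencies[$current] ?? [],
      fn ($id) => !isset($marked[$id])
    );
    if ($pending && !isset($imported[$current])) {
      foreach ($pending as $id) {
        $marked[$id] = TRUE;
      }
      // Stop processing $current: its dependencies go to the front of the
      // queue, followed by $current itself.
      $queue = array_merge(array_values($pending), [$current], $queue);
      continue;
    }
    if (!isset($imported[$current])) {
      $imported[$current] = TRUE;
      $order[] = $current;
    }
  }
  return $order;
}

// A1 depends on B1, B2, B3; B1 depends on C1, C2, as in the example above.
$order = runQueue(
  ['A1', 'X', 'Y', 'Z'],
  ['A1' => ['B1', 'B2', 'B3'], 'B1' => ['C1', 'C2']]
);
// Dependencies come out before their parents:
// C1, C2, B1, B2, B3, A1, X, Y, Z.
print implode(', ', $order) . "\n";
```

With a circular reference (A1 ↔ B1), the `$marked` check stops the link from being followed a second time, so the loop still terminates.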
The provisos we need for this all to work are:
1. We need to be able to add items to the front of the batch queue, and I don't yet know how. We can override the queue class in the batch definition, though we may not need to: the docs for the two queue classes used by Batch API say they use FIFO ordering, which is what we want. The other problem is how to access the queue, but since we can set the queue name in the batch definition, we can probably just reach into it with the queue API.
2. We need to split up our batch operations so that each one is atomic. Currently, both whole-channel and selected-UUID pulls pass a URL to the batch operation callback, which fetches the whole JSON:API data and then slices through it over repeated calls to the same operation callback:
```php
if (empty($context['sandbox'])) {
  $response = $import_service->jsonApiRequest('GET', $url);
  $json = Json::decode((string) $response->getBody());
  $entity_list_data = EntityShareUtility::prepareData($json['data']);
  $context['sandbox']['entity_list_data'] = $entity_list_data;
  $context['sandbox']['progress'] = 0;
  $context['sandbox']['max'] = count($entity_list_data);
  $context['sandbox']['batch_size'] = \Drupal::getContainer()->getParameter('entity_share_client.batch_size');
}
if (!isset($context['results']['imported_entity_ids'])) {
  $context['results']['imported_entity_ids'] = [];
}
$sub_data = array_slice($context['sandbox']['entity_list_data'], $context['sandbox']['progress'], $context['sandbox']['batch_size']);
```
This won't allow what we want to do here, as we potentially need to insert items between existing ones. We need to process only one entity per operation, and for clarity and simplicity I think we should remove the entity list data handling and give each batch operation its own JSON:API URL.
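Building the operations list would then look something like this. The callback name and URL format are illustrative only, not the module's actual API:

```php
<?php

/**
 * Sketch: one atomic batch operation per entity, each carrying its own
 * JSON:API individual-resource URL instead of a shared collection URL.
 * Because each operation is self-contained, extra operations can later be
 * inserted between any two of them.
 */
function buildOperations(string $channel_url, array $uuids): array {
  $operations = [];
  foreach ($uuids as $uuid) {
    $operations[] = [
      // Hypothetical per-entity import callback.
      'entity_share_client_import_one',
      [$channel_url . '/' . $uuid],
    ];
  }
  return $operations;
}

$operations = buildOperations(
  'https://example.com/jsonapi/node/article',
  ['abc-123', 'def-456']
);
// Each element is a [callback, arguments] pair, ready for the
// 'operations' key of a batch definition passed to batch_set().
print count($operations) . " operations\n";
```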
3. Tests! PullKernelTestBase currently has very naive handling of running the batch. This will need to be improved.
- 🇬🇧United Kingdom joachim
Hacking extra items into the batch is possible, like this:
```php
// Inside a batch operation callback:
$batch = &batch_get();
// Get the name for the batch queue.
// @see _batch_populate_queue()
// We expect there is only one set.
$queue_name = 'drupal_batch:' . $batch['id'] . ':' . 0;
$queueFactory = \Drupal::service('queue');
/** @var \Drupal\Core\Queue\QueueInterface $queue */
$queue = $queueFactory->get($queue_name);
$queue->createItem([static::class . '::operation', [99]]);
// Hack the count and the total for our batch set, so that _batch_process()
// processes the additional queue item.
$batch['sets'][0]['total']++;
$batch['sets'][0]['count']++;
```
- 🇬🇧United Kingdom joachim
Or alternatively....
We still split the initial batch into one operation per requested entity.
But instead of making the batch operation parameter a URL or some entity data, we introduce an entity pull request (EPR) value object.
This gets passed through the processor plugins, and if there are references, those get added to it. This creates a stack of requests in the EPR.
The import service can then mark the EPR as being incomplete.
The batch operation callback can then report to Batch API that it is unfinished, and so it keeps getting called until the EPR's stack is all done.
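A minimal sketch of what such a value object could look like. The class name and methods are hypothetical, nothing like this exists in the module yet:

```php
<?php

/**
 * Hypothetical "entity pull request" value object: one per requested
 * entity, carrying a stack of pending imports. Processor plugins push
 * references onto the stack as they find them, and the batch operation
 * callback keeps running until the stack is empty.
 */
final class EntityPullRequest {

  /** @var string[] Stack of JSON:API URLs still to import. */
  private array $stack = [];

  public function __construct(string $initial_url) {
    $this->stack[] = $initial_url;
  }

  /** Called by processor plugins when they find a reference. */
  public function push(string $url): void {
    $this->stack[] = $url;
  }

  /** The next URL to import, or NULL when everything is done. */
  public function pop(): ?string {
    return array_pop($this->stack);
  }

  public function isComplete(): bool {
    return $this->stack === [];
  }

}
```

In the batch operation callback, setting `$context['finished']` to a value below 1 while `!$epr->isComplete()` is how Batch API is told to invoke the same operation again.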