Problem/Motivation
I am running into PHP memory limit errors in Search API when I track items and rebuild tracking info with 512M of memory.
There is a lot of content on this site and many fields, so I expect to run into memory issues, and the batch should pick up where it left off. However, it seems to hit a wall around 60k items.
After the first error from rebuilding tracking info, I see a button in the UI, "Track items for index", with the description "Not all items have been tracked for this index. This means the displayed index status is incomplete and not all items will currently be indexed."
So I click that and it immediately errors out when it tries to run the batch. The same occurs with drush search-api:rebuild-tracker (sapi-rt), which hits the memory limit at around 60k items tracked. As far as I know there is no drush command equivalent to the "track now" button, so each run starts over, reaches the memory limit, and errors out around 60k items out of the 300k+ that should be tracked.
There were memory limit errors while indexing as well, but I was able to get past them by reducing the batch size to 10 items at a time. I think the same approach might get past the errors with tracking, but it doesn't look like that number is configurable.
It is hitting the memory limit on this method:
<?php
public function executeSpecificTask(TaskInterface $task) {
  $event = new TaskEvent($task);
  $this->eventDispatcher->dispatch($event, 'search_api.task.' . $task->getType());
  if (!$event->isPropagationStopped()) {
    $id = $task->id();
    $type = $task->getType();
    throw new SearchApiException("Could not execute task #$id of type '$type'. Type seems to be unknown.");
  }
  if ($exception = $event->getException()) {
    throw $exception;
  }
  $task->delete();
}
?>
Specifically this line:
$this->eventDispatcher->dispatch($event, 'search_api.task.' . $task->getType())
Update with some additional information:
This issue seemed to arise after a process ran that imported hundreds of thousands of pieces of content from an API. There was a blip in tracking, so "Rebuild tracking info" was triggered.
What's hitting the memory limit is probably the query that assembles the items into a pager in the task. The higher the page number, the greater the chance of hitting the memory limit. It seems to time out around page 500. I'm thinking that if the batch size were configurable, as it is for cron indexing, this might help.
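
To illustrate what I mean by a configurable batch size, here is a minimal standalone sketch. It is not Search API code: track_in_pages(), $fetchPage, and the fake ID list are all hypothetical stand-ins for whatever the tracking task does internally. The idea is only that each step loads one page of IDs of a chosen size, and that a run can resume from a given page instead of starting over.

```php
<?php

/**
 * Hypothetical helper (not part of Search API): yields item IDs page by page.
 *
 * $fetchPage stands in for the query that loads the IDs of one page.
 * $pageSize is the value I would like to be able to configure.
 * $startPage allows a run to resume where a previous run stopped.
 */
function track_in_pages(callable $fetchPage, int $pageSize, int $startPage = 0): \Generator {
  for ($page = $startPage; ; $page++) {
    $ids = $fetchPage($page, $pageSize);
    if (!$ids) {
      // An empty page means there is nothing left to track.
      return;
    }
    yield $page => $ids;
    // Drop the reference so only one page of IDs is held at a time.
    unset($ids);
  }
}

// Stand-in data source: 25 fake item IDs instead of a real entity query.
$allIds = range(1, 25);
$fetchPage = function (int $page, int $pageSize) use ($allIds): array {
  return array_slice($allIds, $page * $pageSize, $pageSize);
};

// Full run with a page size of 10.
$tracked = 0;
$pages = 0;
foreach (track_in_pages($fetchPage, 10) as $page => $ids) {
  $tracked += count($ids);
  $pages++;
}

// Resumed run: start again at page 2 rather than from the beginning.
$resumed = 0;
foreach (track_in_pages($fetchPage, 10, 2) as $page => $ids) {
  $resumed += count($ids);
}
```

With a smaller page size each step holds fewer IDs in memory, at the cost of more steps. Whether the real tracking task can be made to behave this way is an assumption on my part.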