Rebuild tracking info and track now hitting PHP memory limit

Created on 6 June 2023
Updated 21 June 2023

Problem/Motivation

I am running up against PHP memory limit errors with Search API when I track items or rebuild tracking info, with a 512M memory limit.

There is a lot of content on this site and many fields, so I expect it to run into memory issues, and the batch should then pick up where it left off. However, it seems to hit a wall at around 60k items.

After the first error from "Rebuild tracking info", I see a button in the UI ("Track items for index") with the message "Not all items have been tracked for this index. This means the displayed index status is incomplete and not all items will currently be indexed."

So I hit that button, and it immediately errors out when it tries to run the batch. The same occurs with drush search-api:rebuild-tracker (sapi-rt), which hits the memory limit at around 60k items tracked. As far as I know there is no drush command equivalent to the "Track now" button, so each time it starts over, reaches the memory limit, and errors out at around 60k of the 300k+ items that should be tracked.

There were memory limit errors while indexing as well, but I was able to get past them by reducing the batch item count to 10 at a time. The same approach might get tracking past the errors, but that number doesn't look configurable.

It is hitting the memory limit in this method:

<?php
public function executeSpecificTask(TaskInterface $task) {
  // Dispatch the task to whichever event subscriber knows how to handle this
  // task type. For tracking tasks, this is where the items are loaded and
  // tracked, and where the memory limit is hit.
  $event = new TaskEvent($task);
  $this->eventDispatcher->dispatch($event, 'search_api.task.' . $task->getType());
  // A subscriber that handled the task stops propagation; if none did, the
  // task type is unknown.
  if (!$event->isPropagationStopped()) {
    $id = $task->id();
    $type = $task->getType();
    throw new SearchApiException("Could not execute task #$id of type '$type'. Type seems to be unknown.");
  }
  // Re-throw any exception a subscriber recorded on the event.
  if ($exception = $event->getException()) {
    throw $exception;
  }
  // The task is only deleted once it has executed successfully.
  $task->delete();
}
?>

Specifically this line:

$this->eventDispatcher->dispatch($event, 'search_api.task.' . $task->getType())

Update with some additional information:

This issue seemed to arise after a process that imports hundreds of thousands of pieces of content from an API had run; there was a blip in tracking, so "Rebuild tracking info" was triggered.

What's hitting the memory limit is probably the query that assembles the items into a pager in the task. The higher the page count, the more likely it is to hit the memory limit; it seems to fail around page 500. If the batch total were configurable, as it is for cron indexing, that might help.

💬 Support request
Status

Postponed: needs info

Version

1.29

Component

General code

Created by

🇺🇸 United States wstocker

Comments & Activities

  • Issue created by @wstocker
  • 🇺🇸 United States wstocker
  • 🇺🇸 United States wstocker
  • 🇺🇸 United States wstocker

    Wanted to throw another update here. In order to get around this issue, we are having to rebuild the tracker locally, where we can set the PHP memory limit to -1, and then upload the entire database to each environment on the host where tracking is hitting the memory limit.

    If tracking gets stuck at this memory limit, there is essentially no way to unstick it: rebuilding tracking info entirely hits the memory limit, and then "Track now" starts hitting the memory limit immediately. This also blocks cron from running, since it's trying to run the same task.
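
    (For reference, the local half of the workaround amounts to lifting the PHP memory limit in a local-only settings file; a minimal sketch, assuming a standard settings.local.php is in use:)

    <?php
    // settings.local.php (local environment only): remove the PHP memory
    // limit so the tracker rebuild can run to completion.
    ini_set('memory_limit', '-1');
    ?>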

  • Status changed to Postponed: needs info over 1 year ago
  • 🇦🇹 Austria drunken monkey Vienna, Austria

    Thanks for reporting this problem.

    The tracking page size can be controlled with the `search_api.settings:tracking_page_size` config value (which defaults to 100).
    Does that help?
    Otherwise, adding a Drush command that executes all pending tasks (which the item tracking counts as, internally) would also be a good idea.
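
    For reference, overriding that value could look like this in settings.php (a minimal sketch; 25 is just an arbitrary lower value):

    <?php
    // settings.php: lower the Search API tracking page size from its default
    // of 100 so each tracking batch loads fewer items at once.
    $config['search_api.settings']['tracking_page_size'] = 25;
    ?>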

  • 🇺🇸 United States wstocker

    Thanks! Yes. We have a workaround for now, but good to know for the future.
