Problem/Motivation
Currently, NodeSearch decides which content to index in a given cron run by querying for nodes that have been added or updated, and a setting limits how many of them are indexed per run.
Since the time needed to index a node could depend on the node itself, a fixed count is a poor way to limit the work done in one cron run. Rather than using a query, a queue system seems like it might be better: nodes could be added to the queue as they are created or updated, and then the queue system would index them in order.
However, the current Drupal queue API has no way to tell whether a given node is already in the queue. We would need to know that so that a node is not added to the queue multiple times, which would make indexing terribly inefficient.
Proposed resolution
a) Add functionality to the queue API (like a new type of queue?) so that it has some way of determining "this item is already in the queue". Possibly a tagging system? So that perhaps instead of just calling
QueueInterface::createItem($data);
we would call
TaggedQueueInterface::createTaggedItem($data, $tags)
with
$tags = array('nid' => 3)
and then there could be
TaggedQueueInterface::itemExists($tags)
which would return TRUE or FALSE depending on whether an item with matching tags already exists in the queue.
b) Use this queue system to index nodes in NodeSearch, rather than the existing query/setting system.
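To make the idea in (a) and (b) concrete, here is a minimal in-memory sketch. None of it exists in Drupal core: TaggedQueueInterface, createTaggedItem() and itemExists() are the names proposed above, while MemoryTaggedQueue, claimItem() returning a plain array, and the boolean return value of createTaggedItem() are assumptions made for illustration only. It also assumes that "matching tags" means the full tag set is identical; a real implementation would more likely live in the database queue backend.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical interface from this proposal; not part of Drupal's queue API.
 */
interface TaggedQueueInterface {
  /** Adds an item unless one with the same tags is queued (assumed behavior). */
  public function createTaggedItem($data, array $tags): bool;

  /** Returns TRUE if an item with exactly these tags is already queued. */
  public function itemExists(array $tags): bool;

  /** Removes and returns the oldest item, or NULL if the queue is empty. */
  public function claimItem(): ?array;

  public function numberOfItems(): int;
}

/**
 * Illustration only: keeps items in a PHP array keyed by their tags.
 */
final class MemoryTaggedQueue implements TaggedQueueInterface {

  /** @var array<string, array> Queued items in insertion order. */
  private array $items = [];

  /** Builds a lookup key that does not depend on the order of the tags. */
  private function tagKey(array $tags): string {
    ksort($tags);
    return serialize($tags);
  }

  public function createTaggedItem($data, array $tags): bool {
    if ($this->itemExists($tags)) {
      // Already queued: skip, so that the node is only indexed once.
      return FALSE;
    }
    $this->items[$this->tagKey($tags)] = ['data' => $data, 'tags' => $tags];
    return TRUE;
  }

  public function itemExists(array $tags): bool {
    return array_key_exists($this->tagKey($tags), $this->items);
  }

  public function claimItem(): ?array {
    if ($this->items === []) {
      return NULL;
    }
    // PHP arrays preserve insertion order, so the first key is the oldest.
    $key = array_key_first($this->items);
    $item = $this->items[$key];
    unset($this->items[$key]);
    return $item;
  }

  public function numberOfItems(): int {
    return count($this->items);
  }

}

// Simulate node 3 being saved twice and node 7 once before cron runs.
$queue = new MemoryTaggedQueue();
foreach ([3, 7, 3] as $nid) {
  $queue->createTaggedItem(['nid' => $nid], ['nid' => $nid]);
}
```

With this sketch the second save of node 3 is ignored, so the queue holds two items and a cron worker would claim node 3 first, then node 7. Note that the tag comparison here is type-sensitive: array('nid' => 3) and array('nid' => '3') would count as different tags.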
Remaining tasks
Implement the new queue system, then switch NodeSearch indexing over to it.
User interface changes
We'd get rid of the "Number of items to index per cron run" setting.
API changes
This would be an API addition to the queue system, rather than a change to the existing API.
Original report by @Xano
We've had the queue API since D7, but it's not used for indexing content yet. Besides the well-known benefits of using the queue API for tasks like this, we could also remove the dreaded "Number of items to index per cron run" setting on the Search module configuration page.