Explore ways to increase efficiency in Queue item lookups

Created on 1 February 2024, 5 months ago

Problem/Motivation

In a discussion of Issue 3417119 (🐛 Better handling of potential race conditions between deletion queue and migration queue, now Fixed), we tried to see if there were more efficient ways to find Queue items. The only way we had of querying other Queue items was a direct database query using LIKE conditions against serialized data.

Out of that discussion, the following explorations came:

Yes, there is very good reason to be concerned about the performance of this query, since the number of queued migration items is potentially unbounded and, as you know, LIKE (especially on long data blobs) is not particularly efficient.

Without extending the Queue module's schema to add a column for "origin_id", this is the best we can do without considerably more work (see below). It is certainly better than unserializing each data blob in turn and checking its SystemIdentifier value.
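To make the cost concrete, here is a minimal sketch of the kind of LIKE lookup being discussed. It uses plain PDO with an in-memory SQLite table standing in for the shared queue table; the queue name `orange_dam_migration`, the payload layout (an array with a `SystemIdentifier` key), and the function name `find_items_by_like()` are assumptions for illustration, not the module's actual code.

```php
<?php
// Sketch only: in-memory SQLite stands in for the shared queue table.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE queue (
  item_id INTEGER PRIMARY KEY AUTOINCREMENT,
  name TEXT NOT NULL,
  data TEXT,
  expire INTEGER NOT NULL DEFAULT 0,
  created INTEGER NOT NULL DEFAULT 0
)');

// Queue payloads are stored as serialized PHP values (assumed layout).
$insert = $pdo->prepare('INSERT INTO queue (name, data, created) VALUES (?, ?, ?)');
foreach (['OL-1001', 'OL-1002'] as $id) {
  $insert->execute(['orange_dam_migration', serialize(['SystemIdentifier' => $id]), time()]);
}

/**
 * Finds queue item IDs whose serialized payload contains the origin ID.
 */
function find_items_by_like(PDO $pdo, string $queue_name, string $origin_id): array {
  // Matches the serialized fragment: s:16:"SystemIdentifier";s:N:"<id>";
  // Real code would also have to escape "%" and "_" inside $origin_id.
  $pattern = '%' . serialize('SystemIdentifier') . serialize($origin_id) . '%';
  $stmt = $pdo->prepare('SELECT item_id FROM queue WHERE name = ? AND data LIKE ?');
  $stmt->execute([$queue_name, $pattern]);
  return array_map('intval', $stmt->fetchAll(PDO::FETCH_COLUMN));
}

$matches = find_items_by_like($pdo, 'orange_dam_migration', 'OL-1002');
```

Because the pattern starts with a wildcard, an ordinary index on the data column cannot help, so every row for the queue has to be scanned and its blob compared.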

I did look into what it would take to extend core Queue, and that is where I ran into a problem: I did not see a way to dynamically retrieve another module's schema in order to extend it. The only way to do it is to hard-code the change. I also had little insight into what knock-on effects adding a column to the queue table would have for other modules, since the table is shared by any number of queues. The model Queue uses is quite simple by design, so it might well have been possible with no side effects, but I did not feel confident, nor did I know of a way to become confident :) short of installing every Drupal module possible and testing. At that point, though, I started to feel I was moving far away from the original scope.

The other option would be, during the queueing operation itself, to create a lookup in a table wholly owned by this module that maps our identifiers to rows in the queue table:

orange_dam_queue_lookup:
  - origin_id: {SystemIdentifier}
  - queue_id: queue.item_id

Then we could JOIN orange_dam_queue_lookup to queue on orange_dam_queue_lookup.queue_id = queue.item_id and find our row that way. The effect on other modules' code would be effectively eliminated. QueueInterface::createItem() says this:

/**
 * Adds a queue item and store it directly to the queue.
 *
 * @param $data
 *   Arbitrary data to be associated with the new task in the queue.
 *
 * @return false|int|string
 *   A unique ID if the item was successfully created and was (best effort)
 *   added to the queue, otherwise FALSE. We don't guarantee the item was
 *   committed to disk etc, but as far as we know, the item is now in the
 *   queue.
 */
public function createItem($data);

So we do get an item_id back (on success), which means we could definitely do this with our own lookup table.
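A minimal sketch of the lookup-table idea, again using plain PDO and in-memory SQLite rather than Drupal's database and Queue APIs. The helper names (`create_item()`, `create_item_with_lookup()`, `find_items_by_lookup()`), the queue name, and the payload layout are hypothetical; `create_item()` only stands in for QueueInterface::createItem(), which can also return FALSE and would need a check in real code.

```php
<?php
// Sketch only: in-memory SQLite stands in for the site database.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE queue (
  item_id INTEGER PRIMARY KEY AUTOINCREMENT,
  name TEXT NOT NULL,
  data TEXT,
  expire INTEGER NOT NULL DEFAULT 0,
  created INTEGER NOT NULL DEFAULT 0
)');
// Table wholly owned by this module, as proposed above.
$pdo->exec('CREATE TABLE orange_dam_queue_lookup (
  origin_id TEXT NOT NULL,
  queue_id INTEGER NOT NULL,
  PRIMARY KEY (origin_id, queue_id)
)');

/** Stand-in for QueueInterface::createItem(): returns the new item_id. */
function create_item(PDO $pdo, string $queue_name, array $data): int {
  $stmt = $pdo->prepare('INSERT INTO queue (name, data, created) VALUES (?, ?, ?)');
  $stmt->execute([$queue_name, serialize($data), time()]);
  return (int) $pdo->lastInsertId();
}

/** Queues an item and records origin_id => item_id in the lookup table. */
function create_item_with_lookup(PDO $pdo, string $queue_name, array $data): int {
  $item_id = create_item($pdo, $queue_name, $data);
  $stmt = $pdo->prepare('INSERT INTO orange_dam_queue_lookup (origin_id, queue_id) VALUES (?, ?)');
  $stmt->execute([$data['SystemIdentifier'], $item_id]);
  return $item_id;
}

/** Finds queued item IDs for an origin ID through the lookup table. */
function find_items_by_lookup(PDO $pdo, string $queue_name, string $origin_id): array {
  $stmt = $pdo->prepare('SELECT q.item_id
    FROM orange_dam_queue_lookup l
    INNER JOIN queue q ON l.queue_id = q.item_id
    WHERE l.origin_id = ? AND q.name = ?
    ORDER BY q.item_id');
  $stmt->execute([$origin_id, $queue_name]);
  return array_map('intval', $stmt->fetchAll(PDO::FETCH_COLUMN));
}

$first = create_item_with_lookup($pdo, 'orange_dam_migration', ['SystemIdentifier' => 'OL-1001']);
$second = create_item_with_lookup($pdo, 'orange_dam_migration', ['SystemIdentifier' => 'OL-1002']);
$found = find_items_by_lookup($pdo, 'orange_dam_migration', 'OL-1002');
```

The lookup is an equality match on a keyed column, so no serialized data is inspected. One thing this sketch does not handle is cleanup: lookup rows would need to be removed when their queue items are processed or deleted. The INNER JOIN at least keeps stale lookup rows from producing matches.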

Or, do the research necessary to see if extending the base Queue schema with an origin_id column would be non-destructive.

And

Realizing this could become a Queue contribution:

Either:

Add a column called reference, where a third-party module can add a mymodule:-prefixed value, and where we trust that all modules will cooperate to make sure the values are prefixed correctly so as not to clobber other modules' values.

or

Add two columns: module, for the name of the module that owns this queue item, and reference, for that module to use as its own reference to the item.
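As a rough sketch of the two-column variant, assuming both columns are nullable so that modules which never set them are unaffected. The index name, module names, and sample values are made up, and in-memory SQLite via PDO again stands in for the real queue table.

```php
<?php
// Sketch only: a queue table extended with the proposed module and
// reference columns (both nullable), plus an index covering the pair.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE queue (
  item_id INTEGER PRIMARY KEY AUTOINCREMENT,
  name TEXT NOT NULL,
  data TEXT,
  expire INTEGER NOT NULL DEFAULT 0,
  created INTEGER NOT NULL DEFAULT 0,
  module TEXT,
  reference TEXT
)');
$pdo->exec('CREATE INDEX queue_module_reference ON queue (module, reference)');

$insert = $pdo->prepare(
  'INSERT INTO queue (name, data, created, module, reference) VALUES (?, ?, ?, ?, ?)'
);
// An item from a module that knows nothing about the new columns.
$insert->execute(['other_queue', serialize('payload'), time(), NULL, NULL]);
// Our item, tagged with the owning module and our own reference.
$insert->execute(['orange_dam_migration', serialize(['SystemIdentifier' => 'OL-1001']), time(), 'orange_dam', 'OL-1001']);
// Another module that happens to use the same reference value.
$insert->execute(['third_party_queue', serialize('payload'), time(), 'some_other_module', 'OL-1001']);

/** Finds queue item IDs by owning module and that module's reference. */
function find_items_by_reference(PDO $pdo, string $module, string $reference): array {
  $stmt = $pdo->prepare(
    'SELECT item_id FROM queue WHERE module = ? AND reference = ? ORDER BY item_id'
  );
  $stmt->execute([$module, $reference]);
  return array_map('intval', $stmt->fetchAll(PDO::FETCH_COLUMN));
}

$ours = find_items_by_reference($pdo, 'orange_dam', 'OL-1001');
```

With a separate module column, two modules can use the same reference value without colliding, which is the cooperation the single prefixed column would otherwise have to rely on.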

Queue Purists will probably (theoretically rightly) argue that a Queue item should know nothing about other Queue items, and that they should, therefore, not be inspected. They are right.

What would be required, however, for the genuine near-elimination of the race condition is the following:

1. migrations and deletions go into the _same_ queue.
2. We would need to refactor the incoming fields to include a last-touched timestamp for items.
3. We would need a way to use the Orange Logic timestamp as the Queue item timestamp, but Queue provides no API for setting the created timestamp or for making it mirror the timestamp in the item's data. (Do we need to contribute a special ExternalItemsQueue module?)

So even if we did refactor the module to put deletions and updates into the same queue, we would still need to be able to inspect other items in the queue. Without a way to recreate the correct time sequence of events in Queue, an incoming deletion instruction would have to inspect the queued migration items before we decide whether to insert it.

So we need to do one of the following: extend the Queue API to allow setting the `created` field on the item during `createItem()`; extend Queue with a new column where module references can be written and queried against; or create a relationship map table, managed by this module, that relates objects to queue items as described above.

Proposed resolution

Not sure yet. Need to discuss and research more.

Remaining tasks

- Research the above options.
- Decide on approach
- Code the approach

User interface changes

none

API changes

none

Data model changes

Possibly extensive and possibly requiring contributions to core Queue.

✨ Feature request
Status

Active

Version

1.0

Component

Code

Created by

🇺🇸 United States apotek
