Implement a queue mechanism

Created on 28 September 2023, about 1 year ago
Updated 17 October 2023, about 1 year ago

Problem/Motivation

Regardless of the tier you're on, there will always be a limit of 30 mails per minute. This might sound okay, but if you're sharing the same API key/set-up over a multitude of large websites, 30 mails per minute suddenly isn't that much anymore on a form heavy site where some forms send multiple emails.

https://learn.microsoft.com/en-us/office365/servicedescriptions/exchange...

Aparantly there is a Retry-After response header which indicates how long the application should wait until it resubmits the request. This can maybe set as the back-off value.

Proposed resolution

Implementing a queue system for the mails solves a few issues at the same time:
- If you hit the rate limit, the mail is not lost but sent on the following cron run
- Depending on the error, e.g. a network timeout, the mail is not lost, and can be kept in the queue until successfully sent

Remaining tasks

- Create the queue and queue worker
- Define an offset mechanism so there is a max retries implemented to not have items in the queue forever
- Define which error codes are kept in the queue.

Error Codes

List of all error codes: https://learn.microsoft.com/en-us/graph/errors

Error codes to keep in queue:
- 429 Too many requests

Ok status codes:
- 202 Accepted

Don't store in queue:
- 400 Bad request

User interface changes

N/A

API changes

Queue mechanism implemented

Data model changes

N/A

โœจ Feature request
Status

Fixed

Version

2.0

Component

Code

Created by

๐Ÿ‡ง๐Ÿ‡ชBelgium BramDriesen Belgium ๐Ÿ‡ง๐Ÿ‡ช

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Comments & Activities

  • Issue created by @BramDriesen
  • ๐Ÿ‡ง๐Ÿ‡ชBelgium BramDriesen Belgium ๐Ÿ‡ง๐Ÿ‡ช

    Drafted first list of codes to handle

  • ๐Ÿ‡ง๐Ÿ‡ชBelgium BramDriesen Belgium ๐Ÿ‡ง๐Ÿ‡ช
  • First commit to issue fork.
  • ๐Ÿ‡ง๐Ÿ‡ชBelgium dave.mentens

    dave.mentens โ†’ made their first commit to this issueโ€™s fork.

  • ๐Ÿ‡ง๐Ÿ‡ชBelgium dave.mentens

    Also a patch for version 1.0.x

  • ๐Ÿ‡ง๐Ÿ‡ชBelgium dave.mentens

    Added missing dependencies for 1.0.x patch

  • Status changed to Needs review about 1 year ago
  • ๐Ÿ‡ง๐Ÿ‡ชBelgium BramDriesen Belgium ๐Ÿ‡ง๐Ÿ‡ช
  • @apaderno opened merge request.
  • Status changed to Needs work about 1 year ago
  • ๐Ÿ‡ฎ๐Ÿ‡นItaly apaderno Brescia, ๐Ÿ‡ฎ๐Ÿ‡น

    The issue fork is 10 commits behind, 20 commits ahead of the upstream repository. When I created the MR, it created it against the 1.0.x branch, not the 2.0.x branch.

    The GraphService class should have a different name, as none of the classes that implement a service contains Service in their names.

  • ๐Ÿ‡ง๐Ÿ‡ชBelgium cgoffin

    I think it is also better to catch all exceptions thrown, because we now sometimes see a '503 Service Unavailable'.

  • ๐Ÿ‡ฎ๐Ÿ‡นItaly apaderno Brescia, ๐Ÿ‡ฎ๐Ÿ‡น

    The MR is also doing changes that should be better split on different issues, as they are not strictly necessary to implement a queue mechanism, for example adding the schema for the configuration file.

  • @cgoffin opened merge request.
  • @cgoffin opened merge request.
  • ๐Ÿ‡ง๐Ÿ‡ชBelgium cgoffin

    I tried to refactor, but also messed up some branches. My apologies. I created the new branch 3390319-add_queue_mechanism which started from the 1.0.x version. Here also a patch file to use in your projects.

  • ๐Ÿ‡ง๐Ÿ‡ชBelgium cgoffin

    Fixed a bug in de code changes.

  • ๐Ÿ‡ง๐Ÿ‡ชBelgium cgoffin

    Found another bug with missing use statements.

  • Status changed to Postponed about 1 year ago
  • ๐Ÿ‡ฎ๐Ÿ‡นItaly apaderno Brescia, ๐Ÿ‡ฎ๐Ÿ‡น

    It is better to postpone this, as the changes introduced from the linked issue are slightly different, even if the module will still implement a service class.

  • Assigned to apaderno
  • Status changed to Active about 1 year ago
  • ๐Ÿ‡ฎ๐Ÿ‡นItaly apaderno Brescia, ๐Ÿ‡ฎ๐Ÿ‡น

    I will handle this in the next 6 hours.

  • ๐Ÿ‡ฎ๐Ÿ‡นItaly apaderno Brescia, ๐Ÿ‡ฎ๐Ÿ‡น

    The issue I see is that the queue we require needs to be strictly first in, last out; it should also allow us to always handle items respecting the Retry-After value returned from the Microsoft Graph API. This is not possible with the queue API, which does not even guarantee the default queue is delayable.

  • ๐Ÿ‡ง๐Ÿ‡ชBelgium BramDriesen Belgium ๐Ÿ‡ง๐Ÿ‡ช

    first in, last out

    Hmm I'm not following here. Why would that be needed? Seems more logical to me that the oldest item get's processed first.

    Retry-After value returned from the Microsoft Graph API. This is not possible with the queue API, which does not even guarantee the default queue is delayable.

    I don't really see an issue here. I guess we could set the delay exception to the retry-after value. But in anyway, the queue is depending on how often you're running cron, and thus processing queue's.

    We're running on Acquia where the max cron interval is 5 minutes, so we know it is possible there might be a small delay in the emails being sent. However the trade off of having the email sent VS it being lost forever because there was an exception is for us a major reason to have the queue. We've been running this for a week or so now in production and everything is working fine actually.

  • ๐Ÿ‡ฎ๐Ÿ‡นItaly apaderno Brescia, ๐Ÿ‡ฎ๐Ÿ‡น

    The issue is that the queue API does not guarantee the implemented queue is reliable (the order of items is preserved and every item will be executed at least once) nor delayable (an item can be released on a delay) which are the requisites for the queue the module should use.

    Furthermore, for the queue API, keeping an item in the queue is an exception, not the rule. The module needs also to get from the queue only those items for which the Retry-After delay is already passed.

    I did not say the module will not use any queue, but that queue cannot be the queue implemented by the queue API.

  • ๐Ÿ‡ฎ๐Ÿ‡นItaly apaderno Brescia, ๐Ÿ‡ฎ๐Ÿ‡น

    We've been running this for a week or so now in production and everything is working fine actually.

    It works fine only if the site is using the DatabaseQueue class for queues, but the default queue class can be changed by modules or from site settings, and the used class is not required to implement DelayableQueueInterface nor ReliableQueueInterface.

  • Status changed to Needs work about 1 year ago
  • ๐Ÿ‡ง๐Ÿ‡ชBelgium BramDriesen Belgium ๐Ÿ‡ง๐Ÿ‡ช

    Ah I see, we're not using DelayedRequeueException here

  • ๐Ÿ‡ฎ๐Ÿ‡นItaly apaderno Brescia, ๐Ÿ‡ฎ๐Ÿ‡น

    By default, cron tasks use the queue.database service, which implements a reliable and delayable queue.
    If the site setting select a different queue, which is neither reliable nor delayable, that is an issue with the site settings, not the module. (As far as I can see, modules do not have a way to select a different queue implementation to pass to a QueueWorker plugin.)

    There are still issues with the current patch:

    • The correct way to delay an item is to throw a DelayedRequeueException exception
    • The code checking for the Retry-Delay value does not verify the thrown exception type ($delay = ($e->getResponse() && $e->getResponse()->hasHeader('Retry-After')) ? (int) $e->getResponse()->getHeader('Retry-After')[0] : 600;)

    For the first point, the only way to delay an item is to store in the item itself the delay the Microsoft Graph API required when throwing the exception. When the queue worker handles the item, it must first check the delay and be sure that delay is already passed, before handling the item.

  • Status changed to Active about 1 year ago
  • ๐Ÿ‡ฎ๐Ÿ‡นItaly apaderno Brescia, ๐Ÿ‡ฎ๐Ÿ‡น

    Now the code just needs to back-ported to the 1.0.x branch.

  • Status changed to Fixed about 1 year ago
  • ๐Ÿ‡ฎ๐Ÿ‡นItaly apaderno Brescia, ๐Ÿ‡ฎ๐Ÿ‡น
  • Automatically closed - issue fixed for 2 weeks with no activity.

Production build 0.71.5 2024