ECA Queue Worker should rethrow exceptions

Created on 11 June 2024, 11 months ago

Problem/Motivation

When using the event "ECA Processing Queued Task", ECA provides a queue worker for items added to a specific queue.

If there is an action that throws an exception, ECA catches it so it can continue executing the model -- however, the event never forwards on the exception, and so the queue item is always removed from the queue and never requeued.

For systems that count on queues being reliable, this short-circuits this benefit.

More discussion in Slack here: https://drupal.slack.com/archives/C0287U62CSG/p1718046179972269

Steps to reproduce

  1. Create a custom action that throws a RequeueException of some kind.
  2. Create a model that inserts something in a queue, and an event to process that queue task, and then do the custom action.
  3. Trigger the event.

Expected result: Queue is processed, item gets returned to the queue to be retried the next time the queue is processed.

Actual result: Queue is processed. Item gets removed.

Proposed resolution

In the original event, look for an exception in the ECA context, and if there is one, throw it. This will cause the queue item to remain in the queue, and the queue runner can handle various options through its normal exception classes.

Remaining tasks

User interface changes

API changes

Data model changes

✨ Feature request
Status

Active

Version

2.0

Component

Code

Created by

πŸ‡ΊπŸ‡ΈUnited States freelock Seattle

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Merge Requests

Comments & Activities

  • Issue created by @freelock
  • πŸ‡©πŸ‡ͺGermany jurgenhaas Gottmadingen
  • Status changed to Needs review 10 months ago
  • πŸ‡ΊπŸ‡ΈUnited States freelock Seattle

    Hi,

    First cut at a fix for this -- it turns out that if we use a try/catch block on the event plugin, it never catches an exception because the EcaAction object catches it first.

    Not wanting to hard-code a list of events to bypass exception handling, I was looking at the plugin tags to see if we could leverage that, not something I see implemented anywhere other than just tags on events.

    So I ended up extending the Event plugin interface and base class with a new method, "handleExceptions", which the EcaAction can use to determine whether or not to rethrow exceptions.

    For now I picked the DelayedRequeueException with a 10 minute delay -- this could certainly get more configurable with other exception types and more handling, but this solves what I need for a few different sites already.

    Does this seem like an appropriate solution? Are there event plugins that do not extend the EventBase class?

    I think it does fix queue processing to be reliable, at least more so than ECA does today.

    Cheers,
    John

  • Pipeline finished with Failed
    10 months ago
    Total: 442s
    #232609
  • Status changed to Needs work 10 months ago
  • πŸ‡©πŸ‡ͺGermany jurgenhaas Gottmadingen

    This looks pretty good as a staring point. I've left a couple of comments in the MR.

    Please, don't set this to NR before the pipeline is green. There are currently a few code style issues, that need to be fixed first.

    Are there event plugins that do not extend the EventBase class?

    No. So the approach is perfect.

    The only concern that comes to mind, if the exception is not temporary but permanent, e.g. due to a misconfigured ECA model, then the item remains in the queue forever, and we don't have any mechanism, to clean up such an issue. Should there be something that controls the max number of retries?

  • πŸ‡ΊπŸ‡ΈUnited States freelock Seattle

    The only concern that comes to mind, if the exception is not temporary but permanent, e.g. due to a misconfigured ECA model, then the item remains in the queue forever, and we don't have any mechanism, to clean up such an issue. Should there be something that controls the max number of retries?

    Hmm there are existing mechanisms for clearing out the queue -- drush, queue ui module.

    Are the exception classes that were already being rethrown indicate things that we should pass along as is, instead of using a requeue exception?

    I do think another consideration might be whether the user/developer considers the queue to be a reliable one or not -- if it's reliable, we need to make sure items get requeued regardless of what else happens... if not, it's not as big a deal to absorb the exception and continue.

    We're often using queues so that API calls to remote services are reliable on otherwise unreliable connections (or unreliable services).

  • πŸ‡©πŸ‡ͺGermany jurgenhaas Gottmadingen

    there are existing mechanisms for clearing out the queue -- drush, queue ui module

    Yes, but site owners may not realize that some constantly failing queue tasks are filling up in their system. Not saying, this is something for ECA, at least not for this issue. But a misconfigured ECA model may fill up queues without anyone noticing it.

    I do think another consideration might be whether the user/developer considers the queue to be a reliable one or not

    ECA always uses reliable queues, and we should probably stick to that unless someone asks for an option to use unreliable ones. That would be a new feature request then.

    We're often using queues so that API calls to remote services are reliable on otherwise unreliable connections (or unreliable services).

    Yeah, for that this requeue mechanism is perfect.

    Maybe we should do this: log a critical error before throwing the requeue exception. That way, site owners at least get a chance to notice that something went wrong. That would require us the move the new catch block below the 3 lines in EcaAction that already write an error log.

  • πŸ‡©πŸ‡ͺGermany jurgenhaas Gottmadingen

    @freelock this is close to the finish line, any chance we can move forward with it?

  • Status changed to Needs review 5 months ago
  • πŸ‡©πŸ‡ͺGermany jurgenhaas Gottmadingen

    I've addressed all open threads and also moved the re-throw below the logging to address the final part of #7

    This needs a review now, and hopefully we can get this in to the soon to be published 2.1.0 release.

  • πŸ‡©πŸ‡ͺGermany jurgenhaas Gottmadingen

    Setting this to RTBC according to comments in the MR.

  • Pipeline finished with Skipped
    5 months ago
    #363025
  • πŸ‡©πŸ‡ͺGermany jurgenhaas Gottmadingen
  • Automatically closed - issue fixed for 2 weeks with no activity.

Production build 0.71.5 2024