Use queue for helpdesk requests

Created on 2 February 2024, 5 months ago
Updated 6 March 2024, 4 months ago

Problem/Motivation

Sometimes the helpdesk backend is down, busy, or returns errors even though the request itself is correct. This always results in an exception in Drupal, and the user gets an error message that they cannot understand.

Proposed resolution

  • All update requests (mostly PUT or POST requests) should be put into a queue so that the user can continue immediately.
  • If a GET request results in an error, we provide a proper message that the displayed data cannot be updated.
  • The requests in the queue can be sent using strategies such as retries in error cases.
  • It should be possible to send a proper error message to other parties, e.g. a DevOps team.
  • The entire process should be logged in Drupal as well.
  • More to come...
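
The GET case in the list above can be sketched roughly as follows. This is a minimal illustration in Python, not the module's code: `BackendError`, `FetchResult`, and `fetch_with_fallback` are hypothetical names, and the sketch assumes that the last known data is available to display when the backend cannot be reached.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class BackendError(Exception):
    """Hypothetical error raised when the helpdesk backend is unavailable."""


@dataclass
class FetchResult:
    data: Optional[dict]     # last known data, possibly stale
    message: Optional[str]   # user-facing notice; None when the fetch succeeded


def fetch_with_fallback(fetch: Callable[[], dict], cached: Optional[dict]) -> FetchResult:
    """Try to refresh data from the backend; fall back to cached data on failure."""
    try:
        return FetchResult(data=fetch(), message=None)
    except BackendError:
        # Do not surface the raw exception; explain what the user is looking at.
        return FetchResult(
            data=cached,
            message="The helpdesk is currently unavailable, so the displayed data could not be updated.",
        )
```

The caller renders `data` as usual and shows `message` only when it is set, so a backend outage degrades the page instead of breaking it.
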
Feature request
Status

Fixed

Version

2.0

Component

Code

Created by

🇩🇪Germany danielspeicher Steisslingen

Comments & Activities

  • Issue created by @danielspeicher
  • First commit to issue fork.
  • Merge request !10 Resolve #3418898 "Use queue for" → (Merged) created by jurgenhaas
  • Status changed to Needs review 5 months ago
  • 🇩🇪Germany jurgenhaas Gottmadingen

    I've started the implementation of this.

    The first commit moved the initial sync attempt out of the pre-save phase back to the insert hook, so that we can ensure that the entity gets saved.

    The second commit then deals with the queue: if the initial attempt fails, the insert hook catches the RequeueException and adds the item to the queue. The queue worker tries again later, and if the RequeueException is thrown again, the queue implementation is supposed to keep the task in the queue so that it is retried on the next run.

    This needs some testing, and then we should consider two more things:

    • Do the same for comments
    • Use a short timeout for the first attempt (or skip the first attempt entirely), because this is where the user waits for the feedback in the browser, and that should be quick
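
The flow described in this comment (first attempt in the insert hook, queue on failure, retry in the worker) might look roughly like this. It is a language-neutral sketch in Python rather than the merged Drupal code: the `RequeueException` class here is a local stand-in for the exception named above, and `on_insert`, `process_queue`, and the `sync` callable are hypothetical.

```python
from collections import deque


class RequeueException(Exception):
    """Stand-in: signals that a sync attempt failed and should be retried later."""


def on_insert(entity_id, sync, queue):
    """Runs after the entity has been saved; a failed first attempt is queued."""
    try:
        sync(entity_id)
    except RequeueException:
        queue.append(entity_id)


def process_queue(sync, queue):
    """One worker run: try each item that is currently in the queue exactly once."""
    for _ in range(len(queue)):
        entity_id = queue.popleft()
        try:
            sync(entity_id)
        except RequeueException:
            # Keep the item so that the next worker run tries it again.
            queue.append(entity_id)
```

Because the entity is already saved when `on_insert` runs, a backend failure only delays the sync and never loses the item.
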
  • 🇩🇪Germany jurgenhaas Gottmadingen

    The third commit now implements the same logic for comments and also fixes some bugs from the previous commits.

    The fourth commit removes a deprecated function call that currently blocks the tests.

  • 🇩🇪Germany jurgenhaas Gottmadingen

    Thinking about it again, I guess it is always a problem to sync new issues and comments directly with the backend, unless the backend is really fast. So, direct sync should only be used if the backend always responds in less than a second.

    Therefore, I've now implemented a new setting for helpdesk config entities that enables direct sync; it is disabled by default. With the default, issues and comments are queued immediately and then synced in separate processes, where the user is not waiting for them.

    With this in place, I think the whole topic is fully implemented and ready for testing and review.
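
As an illustration of the direct-sync setting described in this comment, here is a minimal Python sketch. The field name `direct_sync`, the `SyncFailed` exception, and `handle_new_item` are assumptions made for the example; the source only states that the setting exists and is disabled by default.

```python
from dataclasses import dataclass


class SyncFailed(Exception):
    """Hypothetical error raised when a direct sync attempt does not succeed."""


@dataclass
class HelpdeskConfig:
    # Assumed field name; disabled by default, as described in the comment above.
    direct_sync: bool = False


def handle_new_item(item_id, config, sync, queue):
    """Return how the item was handled: 'synced' or 'queued'."""
    if config.direct_sync:
        try:
            sync(item_id)
            return "synced"
        except SyncFailed:
            pass  # fall through and queue the item instead
    queue.append(item_id)
    return "queued"
```

With the default configuration the backend is never contacted while the user waits; the item goes straight to the queue.
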

  • Status changed to RTBC 5 months ago
  • 🇩🇪Germany danielspeicher Steisslingen

    The solution is working as intended.

    Great!

  • 🇩🇪Germany danielspeicher Steisslingen

    For comprehension:

    If we process the queue and the helpdesk system keeps returning errors and never succeeds, do we end up in an endless loop, or am I missing something?

  • 🇩🇪Germany jurgenhaas Gottmadingen

    Processing the queue works as follows: the worker takes the available items/jobs from the queue and processes either all or some of them, depending on how long each one takes.

    Jobs that succeed are removed from the queue. Those that fail remain in the queue and are tried again next time. "Next time" does not mean within the same process, but the next time the queue worker starts.

    So, if a job always fails, it will be retried indefinitely. We may want to consider implementing a threshold for max retries. I think the queue framework already provides a mechanism for that. We could then write an alert to the logs and remove the job from the queue.
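
The max-retries idea from this comment could be sketched as below. This is a hand-rolled Python illustration and does not use whatever mechanism the queue framework may provide: the `Job` class, the `max_attempts` parameter, and the local `RequeueException` stand-in are all hypothetical.

```python
import logging
from collections import deque
from dataclasses import dataclass

logger = logging.getLogger("helpdesk")


class RequeueException(Exception):
    """Stand-in: signals that a sync attempt failed and should be retried later."""


@dataclass
class Job:
    item_id: int
    attempts: int = 0  # failed attempts so far


def process_queue(sync, queue, max_attempts=5):
    """One worker run; returns the jobs that were dropped after too many failures."""
    dropped = []
    for _ in range(len(queue)):
        job = queue.popleft()
        try:
            sync(job.item_id)
        except RequeueException:
            job.attempts += 1
            if job.attempts >= max_attempts:
                # Alert in the logs and do not put the job back into the queue.
                logger.error("Giving up on item %s after %d attempts", job.item_id, job.attempts)
                dropped.append(job)
            else:
                queue.append(job)
    return dropped
```

Each job is tried at most once per run, so a permanently failing job cannot loop within a single run, and the threshold bounds the number of runs it survives.
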

  • Pipeline #97223 finished with Skipped 4 months ago
  • Status changed to Fixed 4 months ago
  • 🇩🇪Germany danielspeicher Steisslingen
  • Automatically closed - issue fixed for 2 weeks with no activity.
