Multiple cron events not firing consistently

Created on 31 July 2024, 4 months ago
Updated 26 August 2024, 3 months ago

Problem/Motivation

My problem is that I have 2 models, both triggering off cron events, and:

1. If the cron frequency for both events is set to * * * * * (always), then only one of the two models fires.
2. If the cron frequency is set for (ostensibly) any other frequency, like midnight 0 0 * * *, neither of them will fire.

It might be that the id of the Cron Event's timestamp in the state. It seems to always be timestamp.cron-eca_base_eca_cron, and there is only one of them. I would expect to see a unique timestamp per Cron Event.

Steps to reproduce

Setup models attached.
Run cron.
Note that only one the first model fires.

Proposed resolution

Fix the bug.

πŸ› Bug report
Status

Fixed

Version

2.1

Component

Code

Created by

πŸ‡ΊπŸ‡ΈUnited States coderdan

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Merge Requests

Comments & Activities

  • Issue created by @coderdan
  • πŸ‡©πŸ‡ͺGermany jurgenhaas Gottmadingen

    This is a great find and your assumption with the state key is exactly right. Instead of using individual keys, which could lead to many orphaned keys eventually, we could get and set that single key only once per request. I'll start an MR on this to see if that resolves the issue.

  • Status changed to Needs review 4 months ago
  • πŸ‡©πŸ‡ͺGermany jurgenhaas Gottmadingen

    The ID is currently always the same. But that may change when another action plugin extends the existing one. That would then introduce a different ID. To be prepared for that, let's read the last run value for each ID only once per PHP request. That should resolve the is-due calculation, even for multiple events with the same ID.

  • Pipeline finished with Success
    4 months ago
    Total: 405s
    #239731
  • Status changed to RTBC 4 months ago
  • πŸ‡ΊπŸ‡ΈUnited States coderdan

    Thanks @jurgenhaas! I tested this on both a fresh D10 site with version 2.0 of ECA, and a production site using 1.1.5.

  • Pipeline finished with Skipped
    4 months ago
    #240294
  • Status changed to Fixed 4 months ago
  • πŸ‡©πŸ‡ͺGermany jurgenhaas Gottmadingen

    Thank you @coderdan, this is now merged into 2.1 and back ported into 2.1 and 1.1

  • Status changed to Needs work 4 months ago
  • πŸ‡©πŸ‡ͺGermany mxh Offenburg

    Sorry but I doubt this is the right solution. Why does the static property solve it?

    For me the commit looks like the first value that will come wins, meaning all others will use the same value (which is wrong because it may vary per configured event). That means, if the first event is not holding a value for being due to processing, all others won't be due too. Can you please verify this? Best would be to cover this with a test.

  • πŸ‡©πŸ‡ͺGermany jurgenhaas Gottmadingen

    Here is why that solves the issue: $lastRun is the key for the determination if the current cron event is due. Before the static variable, subsequent cron events within the same PHP request received the last cron of the previous cron event in the same request. Now, with the static, each cron event receives the same value for $lastRun, just as if it were the first to be triggered.

    +1 for adding a test on this.

  • πŸ‡©πŸ‡ͺGermany mxh Offenburg

    Thanks for the explanation. However, this means that subsequent cron events are using the same timestamp when they have the same event ID. What we have here are multiple event instances - they are different ones but they hold the same ID (machine name), that's why there is a collision.

    We are updating the last-run timestamp on purpose within Drupal\eca_base\Plugin\ECA\Event\BaseEvent::appliesForWildcard and setting a one-time static variable will not respect subsequent updates of that timestamp (within the same PHP process). Yes PHP processes are usually short-lived, however it's still not correct logical-wise.

    The solution path would be to add in a more granular state key, I think ECA ID + event ID combined should work.

  • πŸ‡©πŸ‡ͺGermany jurgenhaas Gottmadingen

    The last run timestamp we save is NOT the last time when each individual cron event was executed. It tells, when then Drupal cron dispatched the cron event the last time.

    More granular details are not required when we work with the cron notation that defines day and time patterns for when a cron job is due. The most granular pattern is * * * * *, i.e. the cron is supposed to run every minute. If it's 11:29 now and the last run was 11:28, then the job is due. But if the last run was 11:29, then it's not due. This special case only caused the issue as there have been more than 1 cron pattern like that, where the second one didn't run because the last run value was updated too soon, i.e. before the second one had a chance to test.

    In detail:

    • Last run was 11:28
    • It's now 11:29
    • The first * * * * * cron checks is due, returns true and updates last run to 11:29
    • Then the second * * * * * cron checks as well and returns false because it has to check against the updated 11:29, and not the 11:28
    • With the static timestamp both of them test against 11:28 and both return true
  • πŸ‡©πŸ‡ͺGermany mxh Offenburg

    The last run timestamp we save is NOT the last time when each individual cron event was executed. It tells, when then Drupal cron dispatched the cron event the last time.

    Yes, it stores the timestamp of dispatch. But that timestamp is supposed to be related to each configured cron event ("modeler event") as documented and implemented within Drupal\eca_base\Plugin\ECA\Event\BaseEvent::appliesForWildcard:

    <?php
      /**
       * {@inheritdoc}
       *
       * Verifies if this event is due for the next execution.
       *
       * This event stores the last execution time for each modeller event
       * identified by $id and determines with the given frequency, if and when
       * this same event triggered cron should be executed.
       */
      public static function appliesForWildcard(Event $event, string $event_name, string $wildcard): bool {
        if ($event instanceof CronEvent) {
          [$id, $frequency] = explode('::', $wildcard, 2);
          if ($event->isDue($id, $frequency)) {
            $event->storeTimestamp($id);
            return TRUE;
          }
        }
    // ...
    }
    ?>

    So what's happening right now is a collision when two configured events share the same ID. Using the static variable does not make sense from a logical view. Instead, we need to resolve that collision - for example by adding the ECA config ID into the state key.

  • πŸ‡©πŸ‡ͺGermany jurgenhaas Gottmadingen

    There are no individual IDs for cron events. And there is no need for individual IDs as I tried to explain above. The given pattern is sufficient to determine if a cron is due. Using an ID at this point seems to be a matter of unintentional overengineering. Unless there is a need for the ID? I just don't see that.

  • πŸ‡©πŸ‡ͺGermany mxh Offenburg

    There are no individual IDs for cron events.

    Please have a look into the code in #14. There is an $id variable extracted from the wildcard. And the wildcard is being generated within Drupal\eca_base\Plugin\ECA\Event\BaseEvent::generateWildcard as the following:

    <?php
          case 'eca_cron':
            return $ecaEvent->getId() . '::' . $ecaEvent->getConfiguration()['frequency'];
    ?>

    Meaning $ecaEvent->getId() -> That's the event ID. It is an individual ID but it may happen that it has the same value within multiple ECA configurations. And as can be seen in #14 - this is being checked for the event instance using that ID. We can avoid that collision using the ECA config ID additionally when the state key is being generated.

  • πŸ‡©πŸ‡ͺGermany mxh Offenburg

    Using an ID at this point seems to be a matter of unintentional overengineering. Unless there is a need for the ID? I just don't see that.

    It has been working until there is the collision with the same ID. When that collision is resolved (for example using additionally the ECA config ID), then we don't run into any danger generating more problems because of changing the main mechanic that is already implemented.

    This is my recommendation and understanding of how we may properly resolve this. You may decide of course what's the right thing.

  • πŸ‡©πŸ‡ͺGermany jurgenhaas Gottmadingen

    $ecaEvent->getId() always returns eca_base_eca_cron for the cron event, as we receive the plugin ID, not the configured ID in the UI. And for cron events, that's sufficient. The functionality does not require individual IDs for this state as described above. Please describe otherwise why individual IDs would be required.

  • πŸ‡©πŸ‡ͺGermany mxh Offenburg

    $ecaEvent->getId() always returns eca_base_eca_cron for the cron event, as we receive the plugin ID, not the configured ID in the UI.

    It's not the plugin ID that is being returned. It's the event ID that can be configured via UI.

    You can check it by creating a sample cron model with BPMN modeler, then debug what is happening within Drupal\eca_base\Plugin\ECA\Event\BaseEvent::appliesForWildcard and which value the $id variable holds.

  • πŸ‡©πŸ‡ͺGermany mxh Offenburg

    The functionality does not require individual IDs for this state as described above. Please describe otherwise why individual IDs would be required.

    Individual IDs are required. Yo need to hold the last run timestamp for each cron event run, because you may have individual timing settings.

    Let's say we have two configured cron events. The first one runs every minute, the other one runs every hour. Let's say they share the same last-run timestamp. Then the consequence is the following:
    * First one will be running, last-run time is set to the last minute.
    * Second one will not be running because the shared timer said it ran the last minute.

    That's not yet involving the static variable that got introduced now. That itself is adding a further problem: it does not get updated on runtime.

    With one timestamp per configured event, we don't have that collision. It is already implemented in the right way, but it needs further entropy, e.g. by adding the ECA config ID.

  • Assigned to danielspeicher
  • πŸ‡©πŸ‡ͺGermany danielspeicher Steisslingen
  • πŸ‡©πŸ‡ͺGermany jurgenhaas Gottmadingen

    You can check it by creating a sample cron model with BPMN modeler, then debug what is happening within Drupal\eca_base\Plugin\ECA\Event\BaseEvent::appliesForWildcard and which value the $id variable holds.

    The plugin ID of the cron event is eca_base:eca_cron. In the attached models of the OP, the event ID is eca_base_eca_cron.

    I stand corrected, thanks for pointing that out. I let myself misled by the eca_base_eca_cron and was guessing that this was the plugin ID without checking. Sorry.

    So, we do get individual IDs, until the user manually creates identical IDs. But since my attempt aims to even support ID collision, that wouldn't matter. Of course, only if my attempt holds true.

    Let's see this example from @mxh:

    Let's say we have two configured cron events. The first one runs every minute, the other one runs every hour. Let's say they share the same last-run timestamp. Then the consequence is the following:
    * First one will be running, last-run time is set to the last minute.
    * Second one will not be running because the shared timer said it ran the last minute (this one has a one hour time range).

    So, the first cron pattern is * * * * * and the second cron pattern is 0 * * * *. What will happen there is that the Drupal cron at 10:59 will only run the first, and the next cron run (either at 11:00 or later) will execute both of them because the last run is stored as 10:59, which is why both will be due next time. Please proof me wrong, I can't find the problem myself.

    That's not yet involving the static variable that got introduced now. That itself is adding a further problem: it does not get updated on runtime.

    That's on purpose and the reason why the ID collision doesn't matter. All events are operating on the same lastRun timestamp because the don't get the updated one. However, the timestamp does get updated so that the next Drupal cron will then operate on the new one.

    With one timestamp per configured event, we don't have that collision. It was already implemented in the right way, but it needs further entropy, e.g. by adding the ECA config ID.

    That could be done, yes. Still, I'd like to validate the logic above, also because this would be important for the test(s) to be written.

    A follow-up task would then be to discuss if and how state values should eventually be cleaned-up so that we don't leave garbage behind.

  • Pipeline finished with Success
    4 months ago
    Total: 478s
    #241772
  • Pipeline finished with Failed
    4 months ago
    Total: 431s
    #241775
  • Issue was unassigned.
  • Status changed to Needs review 4 months ago
  • πŸ‡©πŸ‡ͺGermany mxh Offenburg

    I've taken some time and wrote a test that represents what I described in #20. I added the test into the already merged issue fork branch. For that I created MR !444 that is not meant for merging, just for showcasing the test itself. It shows the following results:

    • In that issue fork, the test fails already in line 155, mainly because of the static property.
    • When checking out tag 2.0.1 and add the same test there, the test fails at line 206 - representing the bug reported by the OP in this issue.

    I have implemented my suggested approach by adding more entropy using the ECA config ID and put it into MR !443. In that MR, basically the same test is there as well, only adjusted by the further added entropy.

    I hope the test makes clear why we need the IDs and using the same last-run timestamp for every cron event won't work.

    Also feel free to try out and review MR !443.

  • Pipeline finished with Success
    4 months ago
    Total: 449s
    #241835
  • Pipeline finished with Failed
    4 months ago
    Total: 451s
    #241840
  • Assigned to danielspeicher
  • πŸ‡©πŸ‡ͺGermany mxh Offenburg

    Sorry @danielspeicher I accidentally unassigned you. Feel free to continue on this if you want. You can of course also add more tests / re-add the tests tag if you think this needs more test coverage.

  • Issue was unassigned.
  • πŸ‡©πŸ‡ͺGermany danielspeicher Steisslingen

    No problem. I focus on the other issues.

  • Status changed to Needs work 4 months ago
  • πŸ‡©πŸ‡ͺGermany jurgenhaas Gottmadingen

    The problem with this test architecture is that the Drupal cron is run multiple times within a single PHP request. That's not what happens during runtime. Therefore, the static variables behave differently. Also, crons can only run once per minute, while each cron is a fresh PHP request. Assumption: we can only run \Drupal::service('cron')->run() once during unit tests to find out if everything works correct. We should be able to have enough cron events in ECA models to find out about each scenario.

  • πŸ‡©πŸ‡ͺGermany mxh Offenburg

    The problem with this test architecture is that the Drupal cron is run multiple times within a single PHP request.

    I'm sorry but I currently disagree with this argument. Testing against multiple cron runs within one runtime is valid. For example, Drupal\Tests\layout_builder\FunctionalJavascript\InlineBlockTest::testDeletion also runs cron multiple times within one test. There are various other tests in Drupal Core that run cron multiple times.

    You're now assuming that cron runs only one time per request / runtime. But there is no assertion which guarantees that. There is even a test in core that tests against running within the same request: Drupal\Tests\system\Kernel\System\CronQueueTest::testLeaseTime

  • πŸ‡©πŸ‡ͺGermany jurgenhaas Gottmadingen

    You're now assuming that cron runs only one time per request / runtime.

    Yes, for runtime context. Not talking about test contexts.

    But there is no assertion which guarantees that.

    That's correct, but there is common sense and there is the actual code. Cron is being executed through a link, through a form submit, through cron.php or with drush, and each of them call that run in the cron service exactly once.

    Anyways, the test is not mainly about the static variable. It's proving that the ID collision is real

    It's not, as cron executes the run method only once per request/runtime while the test behaves differently. Multiple runs per request/runtime are hypothetical, not real. With that in mind, the ID collision isn't an issue.

  • πŸ‡©πŸ‡ͺGermany mxh Offenburg

    I'm 100% sure the ID collision is real and it needs to get properly fixed.
    The reasons why I'm sure about this are already written in my previous comments.

    The added test proves ID collision in the following way:

    The test itself contains the same problematic configuration as provided in the IS: sharing same cron event IDs.
    The test failures are not coming from the static variable or because cron is run multiple times per runtime.
    The test fails because of the ID collision.

    More entropy has been added in MR443 and that is why the test is green.
    I won't rewrite the test because I'm convinced it's proving my arguments properly, like other core tests also run cron multiple times per runtime.

    I'm a bit frustrated that I don't get any support here. When not properly fixed, this will mean the cron events won't trigger reliable anymore.
    At least I know what to do then (which means patching), but I'd like to have this problem be avoided for others. For that I've spent a lot of time already, explaining, giving examples and finally providing a possible fix including a test.

  • Status changed to Needs review 4 months ago
  • πŸ‡©πŸ‡ͺGermany jurgenhaas Gottmadingen

    I don't want to frustrate anyone, just want to be clear.

    I'm not saying the ID collision doesn't happen. I'm saying that the collision doesn't matter if the cron run is only called once per PHP request/runtime. That scenario is described in #22 and that is what currently gets executed in Drupal installations. That description has not (yet) been proven wrong and the test operates under different circumstances, that are not happening in real life, at this point.

    The improvements in your MRs are great, though. I'm happy that we include them as well into future versions of ECA, at least as a safe-guard if anything for the core run scenario would ever be changed in future core versions.

    Therefore, my suggestion is to keep this issue as is, since the bug report is being fixed and confirmed to be working, even with the ID collision.

    The extra's in the MRs should be handled in separate issues as feature requests and then only go into 2.1 without back porting.

  • Status changed to Fixed 3 months ago
  • πŸ‡©πŸ‡ͺGermany jurgenhaas Gottmadingen

    I've created the follow-up issue ✨ Remove potential ID collision in ECA cron events Active for the enhancement to remove the ID collision and mark this one as fixed for the original report.

  • Automatically closed - issue fixed for 2 weeks with no activity.

Production build 0.71.5 2024