Multiple recurring orders get created if cron is badly configured

Created on 17 December 2017, about 7 years ago
Updated 6 August 2024, 6 months ago

Problem/Motivation

Unless cron is very carefully configured, the jobs for order close and renew can get requeued, resulting in multiple copies of a subscription's next recurring order.

For example, if Ultimate Cron is used to configure cron jobs, and Commerce Recurring's cron runs hourly while Advanced Queue's cron runs daily, then every hour, a RecurringOrderClose and RecurringOrderRenew job will be queued for an order that's up for renewal. By the time AQ gets to processing the queue, there will be multiple copies of these jobs. In particular, RecurringOrderRenew will create a new draft recurring order every time it's processed. If these erroneous duplicates are not spotted, customers will be charged multiple times

Proposed resolution

- Fix Advanced Queue so it support unique jobs: Add support for unique jobs Fixed
- Fix our job plugins so they are unique

Original report by Almare

Due to my tests with recurring orders I think I found a misbehavior. I created for testing purposes a billing schedule of 1 hour. My cron is running every 15 minutes. My aim was to create every hour a recurring order for the bought products. The result of the recurring order is now that every 15 minutes recurring orders were created. Please see the attached images for more details. Maybe I made a mistake in my configurations.

🐛 Bug report
Status

Fixed

Version

1.0

Component

Code

Created by

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Merge Requests

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • Open on Drupal.org →
    Core: 10.1.x + Environment: PHP 8.1 & MySQL 5.7
    30:28
    29:48
    Queued
  • Open in Jenkins → Open on Drupal.org →
    Core: 10.1.x + Environment: PHP 8.1 & MySQL 5.7
    last update about 1 year ago
    Patch Failed to Apply
  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 7.4 & MySQL 5.7
    last update about 1 year ago
    Patch Failed to Apply
  • 🇩🇪Germany Emil Stoianov

    Lets make it Compatible with advancedqueue 1.0 also this related issue is now almost merged https://www.drupal.org/project/advancedqueue/issues/2918866 Add support for unique jobs Fixed

  • 🇮🇱Israel jsacksick

    hm I'd say "UNIQUE_DONT_OVERWRITE" as we don't want to queue duplicate jobs.

  • 🇬🇧United Kingdom jonathanshaw Stroud, UK

    Actually in the latest MR on Add support for unique jobs Fixed the annotation is allow_duplicates = false

  • Status changed to Needs work 8 months ago
  • 🇮🇱Israel jsacksick

    Marking this as needs work, we should favor leveraging the Advancedqueue support for unique jobs instead...

  • First commit to issue fork.
  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 7.4 & MySQL 5.7
    last update 8 months ago
    62 pass, 3 fail
  • Pipeline finished with Failed
    8 months ago
    Total: 224s
    #184218
  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 7.4 & MySQL 5.7
    last update 8 months ago
    70 pass
  • Pipeline finished with Success
    8 months ago
    Total: 185s
    #184396
  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 7.4 & MySQL 5.7
    last update 8 months ago
    70 pass
  • Pipeline finished with Success
    8 months ago
    #184398
  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 7.4 & MySQL 5.7
    last update 8 months ago
    PHPLint Failed
  • Pipeline finished with Success
    8 months ago
    Total: 185s
    #184404
  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 7.4 & MySQL 5.7
    last update 8 months ago
    PHPLint Failed
  • Pipeline finished with Success
    8 months ago
    Total: 185s
    #184447
  • 🇺🇦Ukraine tbkot

    Patch with the latest changes in MR
    It should applied together with the patch for advancedqueue in Add support for unique jobs Fixed

  • 🇻🇳Vietnam phannphong Ho Chi Minh City

    I've updated the patch to work with PHP 7.x

  • 🇮🇱Israel jsacksick

    @phannphong: Could you upload an interdiff? Also FYI Commerce core itself has been requiring PHP8+ for a long time already.

  • 🇮🇱Israel jsacksick

    So I think the Advancedqueue patch isn't going to fully help here...
    I wish we had flags on the orders (but as actual fields that we could use to detect whether the recurring order was already queued for closing / renewal).

    We could use the order's data for that, but then we'd still keep querying for the order IDS on each cron run, so this isn't ideal... Perhaps better than nothing... Or we could maybe use the "locked" flag for this?

    We lock the order once it's queued for closing and renewal, the advantage of this solution is that we'd stop refreshing the order. Since it was supposedly queued for closing and renewal, it shouldn't be refreshed in subsequent order loads?

  • Status changed to Needs review 7 months ago
  • 🇮🇱Israel jsacksick

    Could anyone confirm if the attached patch helps? We'd then skip relying on Advancedqueue for that...

  • 🇳🇿New Zealand john pitcairn

    Hmm. We are allowing users with failing payments to put the recurring order back through checkout to update card details and pay it immediately.

    We can unlock it before checkout but if the user abandons the process that might present a problem at next queueing query (every 15 minutes for us).

  • 🇮🇱Israel jsacksick

    We can unlock it before checkout but if the user abandons the process that might present a problem at next queueing query (every 15 minutes for us).

    A job will be requeued in this case yes, but I don't think that is a problem? Because the initial job would be marked as a failure, so technically a "new" job would be requeued and it'd also fail.

    We have other options...

    1. Set data flag on orders (e.g: $order->setData('commerce_recurring_order_close_queued', TRUE), the problem is orders would still be queried for "nothing" and would then be filtered out in the loop
    2. Add a boolean field? Not sure how we'd name it? Just "queued"? "queued_for_closing"?

    In any case, I don't think relying on Advancedqueue is the answer to this problem anymore. Ideally the fix would allow us to filter out orders already queued at the query level.

  • 🇳🇿New Zealand john pitcairn

    Because the initial job would be marked as a failure

    No it's still in dunning retry, ie needs_payment.

  • 🇮🇱Israel jsacksick

    @John Pitcairn: I was referring to the Advancedqueue job state, not the order state? Not really sure what you're reporting here then? Because if the order has the "needs_payment" state, then it won't be re-queued the next time? Since we're only queuing draft orders?

  • 🇳🇿New Zealand john pitcairn

    OK so the state would prevent a requeue for us.

    What else does a locked order imply though? What is that flag actually for?

    I'd worry about further development deciding to assume only locked orders are in the queue. I also think there was some reason we don't save the order prior to putting it into checkout, which we'd have to to unlock it.

    If the orders are already getting loaded and looped through then filtering on order data seems reasonable. If not and we are only querying for IDs then it probably should be a field for performance?

  • 🇳🇿New Zealand john pitcairn

    Sorry cross post ... wouldn't a boolean (or lock) to indicate queueing need an update hook to set that for already queued orders? Is that feasible?

  • 🇮🇱Israel jsacksick

    wouldn't a boolean (or lock) to indicate queueing need an update hook to set that for already queued orders?

    That isn't really feasible, because the data is serialized in the advancedqueue table, if the queue isn't cleaned up for many orders, that potentially means reading millions of rows, so not a good idea.

    A locked order for example isn't refreshed on load. And if you're allowing your customers to resume checkout, you'd have to unlock them.
    The data flag would work as I said, but we'd first query them for nothing.
    Also, we could eventually skip saving the orders and just perform a direct update query to the DB (which would be more performant).

  • 🇬🇧United Kingdom jonathanshaw Stroud, UK

    Using lock for this makes me nervous, as per #88. It seems like an insufficiently robust solution, potentially with unexpected consequences or interactions with custom code, and generally increasing the WTF of an already complex mechanism.

    What about #62:

    I wonder if RecurringOrderManager::createOrder() should refuse to create an order if that would create two orders for the same subscription with the same billing period. Simple to enforce.

    Then we wouldn't be relying on AdvancedQueue. We could still use it's mechanism to avoid duplicates, but we'd not be reliant on it.

  • 🇮🇱Israel jsacksick

    @jonathanshaw: How is that related? This is about ensuring we don't queue twice the job for renewing / closing the recurring order for the current billing cycle.
    In other words, there is code invoked and multiple queue jobs before we even reach RecurringOrderManager::createOrder().
    Whatever fix we pick has to happen from the cron service and should prevent the jobs from being queued more than once. We can pick the data flags, this is the easiest solution but it means we can't filter out orders that shouldn't be renewed / closed at the query level, which would be better performance wise.

  • 🇬🇧United Kingdom jonathanshaw Stroud, UK

    My thinking was the duplicate queue jobs would be harmless if RecurringOrderManager::createOrder() had the check I suggested.
    I agree the order data seems like a natural place to store a flag.
    Regarding performance, I wonder if commerce will one day change the data field to json and then we will be able to query it, something core is working on supporting Add "json" as core data type to Schema and Database API Needs work

  • 🇳🇿New Zealand john pitcairn

    Probably silly idea: add a "queued" state?

  • 🇮🇱Israel jsacksick

    Probably silly idea: add a "queued" state?

    Not a good idea, the workflow can be changed.

    Maybe we'll just go with the data flag... And in theory, if the queue is properly processed, we shouldn't be loading the same orders over and over...

  • 🇬🇧United Kingdom jonathanshaw Stroud, UK

    How about we proceed with the unique queue support here, as it seems desirable even if not a complete solution. And do the data flag in a seperate issue?

  • 🇮🇱Israel jsacksick

    The "unique" job support that was introduced isn't really helping us...
    It wouldn't prevent a job to be queued if it failed...

    See 🐛 In some cases the duplicated jobs are created Active .

  • Status changed to Needs work 6 months ago
  • 🇳🇴Norway zaporylie

    Shall we, at the very least, also include a test to ensure no duplicates are created? I read carefully through this issue comments and it seems to me like #83 is the closest to the perfect test.

    Performance issues are being discussed, especially in #69 - #73 with advice to work on them in a separate issue and not include in the scope here. I am a bit unsure about the latest proposal in #102 as it may close the door to making performance improvements in the future. This is a known limitation of the proposed approach as mentioned several times between #89 and #102.

    This is a big pain, especially for merchants handling all their recurring payments on a frequent schedule (ex. hourly) or all at the same time (ex. first day of the month).

    We could still go with the solution proposed in #102 if we plan to transition data property into json field as mentioned in #97 but given we're probably talking months if not years here I would rather opt for boolean field.

  • 🇮🇱Israel jsacksick

    #83 is leveraging the Advancedqueue patch, but once again, I'm no longer convinced relying on Advancedqueue to fix duplicates is the answer.
    Also, we're collecting subscriptions by passing the order to the RecurringOrderManager, so looks like we can't easily avoid loading the orders without rewriting the cron job completely.

    Additionally, #102 is a pragmatic approach to an issue that was opened 7 years ago in an attempt to not drag this further and revisit this later since we can't really find an agreement on a "perfect" solution.

    I think the boolean makes more sense too, but it seems there were reservations about this approach too.

  • Status changed to Needs review 6 months ago
  • 🇳🇴Norway zaporylie

    I was only thinking about the subset of the test from #83, not the Unique AQ feature implementation. That being said I am not strongly opposed to implementing it as a safety measure, in addition to your proposed solution. It doesn't have to be done here and now though, could be done in a follow-up. That would address some concerns raised between #59 - #62

    In regards to the boolean field, the only concern I see is in #93 and it's about setting a default value. I think this concern is valid whether we use lock, data, or boolean field. So if we don't change the query aspect of cron for now and leave it for a follow-up we may still be better, in a future-proof way, with boolean than data param. At least that's my opinion.

    Anyways. combining #83 with #102 I was able to see that testing for running the cron twice (before running AQ cron) fixes the issue. I can still think of some scenarios where this can help - ex multiple instances of cron running at the same time but that's partially mitigated by cron lock.

    Here are test results:

    zaporylie@drupal-commerce-web:/var/www/html$ ./vendor/bin/phpunit  web/modules/contrib/commerce_recurring/tests/src/Kernel/CronTest.php 
    PHPUnit 9.6.20 by Sebastian Bergmann and contributors.
    
    Runtime:       PHP 8.3.8
    Configuration: /var/www/html/phpunit.xml.dist
    Warning:       Your XML configuration validates against a deprecated schema.
    Suggestion:    Migrate your XML configuration using "--migrate-configuration"!
    
    .....F                                                              6 / 6 (100%)
    
    Time: 00:10.476, Memory: 4.00 MB
    
    There was 1 failure:
    
    1) Drupal\Tests\commerce_recurring\Kernel\CronTest::testJobDuplicates
    Failed asserting that two arrays are equal.
    --- Expected
    +++ Actual
    @@ @@
     Array (
    -    'queued' => 2
    +    'queued' => '4'
     )
    
    /var/www/html/vendor/phpunit/phpunit/src/Framework/Constraint/Equality/IsEqual.php:95
    /var/www/html/web/modules/contrib/commerce_recurring/tests/src/Kernel/CronTest.php:323
    /var/www/html/vendor/phpunit/phpunit/src/Framework/TestResult.php:729
    
    FAILURES!
    Tests: 6, Assertions: 243, Failures: 1.
    
    

    after the patch in #102

    zaporylie@drupal-commerce-web:/var/www/html/web/modules/contrib/commerce_recurring$ cd -
    /var/www/html
    zaporylie@drupal-commerce-web:/var/www/html$ ./vendor/bin/phpunit  web/modules/contrib/commerce_recurring/tests/src/Kernel/CronTest.php 
    PHPUnit 9.6.20 by Sebastian Bergmann and contributors.
    
    Runtime:       PHP 8.3.8
    Configuration: /var/www/html/phpunit.xml.dist
    Warning:       Your XML configuration validates against a deprecated schema.
    Suggestion:    Migrate your XML configuration using "--migrate-configuration"!
    
    ......                                                              6 / 6 (100%)
    
    Time: 00:10.358, Memory: 4.00 MB
    
    OK (6 tests, 243 assertions)
    
    
  • 🇮🇱Israel jsacksick

    In regards to the boolean field, the only concern I see is in #93 and it's about setting a default value. I think this concern is valid whether we use lock, data, or boolean field.

    Regardless of the picked solution, we can't properly backfill I mea, it is ok to me if a "queued" boolean or whatever name we picked has 0 even if the recurring order is placed, what really matters to me is fixing the actual issue and making sure we no longer queue the same recurring order for closing / renewal.

  • 🇳🇴Norway zaporylie

    Do you think we should also change the method of loading the Order entity to OrderStorage::loadForUpdate to avoid race conditions? It kinda makes sense in this scenario and would further fail-safe proof the queuing mechanism. I am basing this on the fact that this issue was initially triggered by how differently people approach setting/splitting cron. That probably is a scope creep and a material for a followup though :)

  • 🇮🇱Israel jsacksick

    Using loadForUpdate() makes sense IMO, I don't really consider this a scope creep. The method didn't previously exist, so why not using it to make this even more robust.

  • Pipeline finished with Failed
    6 months ago
    Total: 186s
    #228150
  • Pipeline finished with Success
    6 months ago
    Total: 183s
    #228153
  • Pipeline finished with Success
    6 months ago
    Total: 215s
    #228200
  • 🇳🇴Norway zaporylie

    MR 31 is based on what we concluded so far:
    - the commerce_recurring_queued boolean field is created - on install or via hook_update_N for existing websites. The field will be set to TRUE via cron
    - Kernel test was introduced to ensure duplication issue is gone
    - commerce requirement is bumped to account for orders now being loaded with OrderStorage::loadForUpdate to ensure the order is not being updated with 2 (or more) processes.

    Some issues that we can work on in follow-up issues given we now have a param we can query against:
    - adjust the order query to avoid orders that are already queued from being processed
    - set a limit on the query to avoid processing more than reasonable number of orders at the same time

  • 🇮🇱Israel jsacksick

    MR looks good to me, but let's see what other people think about it.

    - adjust the order query to avoid orders that are already queued from being processed

    I think we should do that already no? We can probably add a notExists()? Or maybe an OR condition group on notExists() + commerce_recurring_queued != 1.

  • 🇮🇱Israel jsacksick

    I think:

    - set a limit on the query to avoid processing more than reasonable number of orders at the same time

    should be addressed in a followup issue yes.

  • Status changed to RTBC 6 months ago
  • 🇮🇱Israel jsacksick

    Marking this as RTBC still, as we have tests and we've implemented the approach I favored (i.e. the boolean).
    Perhaps we can update the test that was added to confirm the job types that were queued?

    Basically, ensure that we have one "commerce_recurring_order_close" and another job for renewing the order "commerce_recurring_order_renew".

  • Pipeline finished with Success
    6 months ago
    Total: 237s
    #228609
  • Pipeline finished with Success
    6 months ago
    Total: 206s
    #228619
  • 🇳🇴Norway zaporylie

    Perhaps we can update the test that was added to confirm the job types that were queued?

    Added, although we already were testing this in CronTest::testActive(). Nevertheless, I think it makes sense to have an explicit test here and I added something I should have added before - asserting commerce_recurring_queued boolean value - 0 before cron runs, 1 after cron runs.

  • Status changed to Fixed 6 months ago
  • 🇮🇱Israel jsacksick

    Went ahead and merged the request, let's move forward and address what's remaining as followup issues like suggested.

  • 🇳🇴Norway zaporylie

    I added 2 child issues for followups.

  • Automatically closed - issue fixed for 2 weeks with no activity.

Production build 0.71.5 2024