Move cleanup to cron job to avoid deadlocks

Created on 18 June 2020, almost 5 years ago
Updated 1 March 2024, about 1 year ago

Problem/Motivation

This is something about \Drupal\serial\SerialSQLStorage::generateValueFromName() function. It is used for generating new number for serial field value.

If we describe the algorithm the we will be have this:

  1. Insert new record into serial storage table
  2. Get the last insert ID (sid)
  3. If sid % 10 == 0 then do cleanup - DELETE FROM {serial_storage} WHERE sid < :sid_we_just_generated

The problem happens when we're trying to create a lot of entities in parallel transactions. If we have two transactions which are going to execute clean up we can get a deadlock. The scenario is:

  1. Transaction A inserts the row into storage table and gets sid = 100
  2. Transaction B inserts the row into storage table and gets sid = 110
  3. Transaction B tries to do cleanup by executing DELETE FROM {serial_storage} WHERE sid < 110. This query is locked becuase row sid=100 (which is going to be deleted) is locked in transaction A. And moreover this DELETE statement create another lock.
  4. Transaction A tries to do cleanup by executing DELETE FROM {serial_storage} WHERE sid < 100. We already have a lock for rows sid < 110. So this query is also locked. The database considers it as deadlock and this transaction fails, so transaction B can continue.

The problem is - user sees the error. Another big problem, in logs we have completely different error:

SQLSTATE[42000]: Syntax error or access violation: 1305 SAVEPOINT savepoint_2 does not exist: ROLLBACK TO SAVEPOINT savepoint_2;

(or savepoint_1, the number does not matter). This really confuses.

We see this irrelevant error because of this code:

    catch (\Exception $e) {
      $transaction->rollback();
      watchdog_exception('serial', $e);
      throw $e;
    }

In case of deadlock it's not possible to rollback to savepoint. And this piece of code does everyhing possible to hide real error from the developer, becuase rollback fails and watchdog_exception is never reached. I suppose that someone just copypasted it from the core (becuause in entity save we also have such problem). But this is not something which we are going to fix.

I think it's important to improve logic to do not cause deadlocks.

Proposed resolution

I think the cleanup should not happen when we do inserts. Moreover it's completely illogical to execute DELETE statements when we create new entities.

The most obvious solution is moving cleanup to the cron job. In this case we never get deadlocks.

Remaining tasks

Review, another opition is needed.

User interface changes

None.

API changes

None.

Data model changes

None.

πŸ› Bug report
Status

Needs work

Version

2.0

Component

Code

Created by

πŸ‡ΊπŸ‡¦Ukraine mpolishchuck

Live updates comments and jobs are added and updated live.
  • Needs tests

    The change is currently missing an automated test that fails when run with the original code, and succeeds when the bug has been fixed.

Sign in to follow issues

Merge Requests

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • πŸ‡ΊπŸ‡ΈUnited States benjifisher Boston area

    The 8.x-1.x branch of this project is no longer active, so I am moving this issue to the 2.0.x branch.

    I will create a MR from the patch in #2 and see whether the tests pass.

  • Merge request !9Issue #3152955, Comment #2 β†’ (Open) created by benjifisher
  • Status changed to Needs work about 1 year ago
  • πŸ‡ΊπŸ‡ΈUnited States benjifisher Boston area

    Cleaning up during cron seems like a good idea. Thanks for suggesting it.

    I think that #5 is a good point. Let's keep the highest current value in the table. One advantage of this change is that the site owner can use a simple database query to figure out what the next serial will be, and recognize whether the entity with the highest serial has been deleted.

    I would like to get some test coverage for this change. At least test add some entities, delete some entities, run cron, add and delete som more, and make sure that we get the expected serial values at every step. If there is a way to reproduce the deadlock in an automated test, that would be great.

Production build 0.71.5 2024