Strengthen TransactionManager

Created on 2 September 2023, about 1 year ago
Updated 16 September 2023, about 1 year ago

Problem/Motivation

In ๐Ÿ“Œ Refactor transactions Fixed we introduced a new TransactionManager, refactoring the internals of the Database Transaction API.

Now I noticed a couple of edge cases that require fixing:

  1. If you open multiple transactions (e.g. root, savepoint_1, savepoint_2, savepoint_3) and you scope out an intermediate one (e.g. savepoint_1), which is a legitimate operation that translates into a 'RELEASE SAVEPOINT savepoint_1', the later Transaction objects remain hanging and when they get out of scope they try to release their corresponding savepoint but this leads to failures. Also, the transaction stack is left in an inconsistent state.
  2. The same happens if you commit a root transaction while there are still active savepoints. Address that in ๐Ÿ› [PP-1]ย Commiting a transaction while there are still active savepoints leaves the stack in inconsistent state Postponed .
  3. If a DDL statement breaks an active transaction, we currently can clean up the stack (but we will not be able to do it in the mysqli driver that does not have a way to check if a transaction is active), however the Transaction objects remain in scope and can potentially interfere with later transactions when they go out of scope (e.g. because a savepoint_1 Transaction object opened BEFORE the DDL statement may try to release a savepoint_1 opened AFTER the DDL statement).

Steps to reproduce

Proposed resolution

Adjust the internal TransactionManager stack and the Transaction objects to carry a unique ID, and ensure that when a Transaction object is unpiled or rolled back, both Transaction id and name match the info on the stack.

When unpiling a savepoint, remove from the stack the savepoint items opened after that savepoint.

Remaining tasks

User interface changes

API changes

Data model changes

Release notes snippet

๐Ÿ› Bug report
Status

Fixed

Version

11.0 ๐Ÿ”ฅ

Component
Databaseย  โ†’

Last updated about 24 hours ago

  • Maintained by
  • ๐Ÿ‡ณ๐Ÿ‡ฑNetherlands @daffie
Created by

๐Ÿ‡ฎ๐Ÿ‡นItaly mondrake ๐Ÿ‡ฎ๐Ÿ‡น

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Comments & Activities

  • Issue created by @mondrake
  • last update about 1 year ago
    30,138 pass
  • @mondrake opened merge request.
  • Status changed to Needs review about 1 year ago
  • ๐Ÿ‡ฎ๐Ÿ‡นItaly mondrake ๐Ÿ‡ฎ๐Ÿ‡น

    Fixes.

  • last update about 1 year ago
    30,138 pass
  • last update about 1 year ago
    30,138 pass
  • Status changed to Needs work about 1 year ago
  • ๐Ÿ‡ฎ๐Ÿ‡นItaly mondrake ๐Ÿ‡ฎ๐Ÿ‡น

    Need to check if 1) from the IS also applies when you commit the root transaction while there are active savepoints.

  • last update about 1 year ago
    30,129 pass, 1 fail
  • last update about 1 year ago
    30,129 pass, 1 fail
  • ๐Ÿ‡ฎ๐Ÿ‡นItaly mondrake ๐Ÿ‡ฎ๐Ÿ‡น

    Yes, #4 is a case to fix too.

  • last update about 1 year ago
    30,138 pass
  • Status changed to Needs review about 1 year ago
  • ๐Ÿ‡ฎ๐Ÿ‡นItaly mondrake ๐Ÿ‡ฎ๐Ÿ‡น

    So the fix for #4, which is correct IMHO, leads to a failure in existing tests. This means a change in behaviour, which needs separate discussion and assessment of all ramifications. Filed a separate ๐Ÿ› [PP-1]ย Commiting a transaction while there are still active savepoints leaves the stack in inconsistent state Postponed issue for that, and reverted the fix here.

  • last update about 1 year ago
    30,138 pass
  • last update about 1 year ago
    30,136 pass, 2 fail
  • ๐Ÿ‡ฎ๐Ÿ‡นItaly mondrake ๐Ÿ‡ฎ๐Ÿ‡น
  • last update about 1 year ago
    30,138 pass
  • ๐Ÿ‡ณ๐Ÿ‡ฑNetherlands daffie

    @mondrake: When I look at the changes in the MR, we are changing the class variable TransactionManagerBase::stack to be used in order. Only its documentation is saying that it is not a LILO stack and yet after the MR it is going to be a used as a LILO stack. Should we add some order in those save points and changes the class variable to be a real LILO stack?

  • ๐Ÿ‡ฎ๐Ÿ‡นItaly mondrake ๐Ÿ‡ฎ๐Ÿ‡น

    Well in fact we are not changing, the stack is still not a LIFO one.

    A real LIFO (push/pop) stack should be:

    Operation            Result
    ---------            ------
    push A               A
    push B               A > B
    push C               A > B > C
    push D               A > B > C > D
    pop B                fail, B is not last
    pop D                A > B > C
    pop C                A > B
    pop B                A
    pop A                nil
    

    The current API allows to release an item that is in between the stack, but what we have in HEAD now is something like:

    Operation            Result
    ---------            ------
    push A               A
    push B               A > B
    push C               A > B > C
    push D               A > B > C > D
    unpile B             A > C > D
    unpile A             C > D
    

    as you can see, this leaves gaps in the sequence and hanging items once you believe you have release the root item.

    With this MR, the logic is changed to

    Operation            Result
    ---------            ------
    push A               A
    push B               A > B
    push C               A > B > C
    push D               A > B > C > D
    unpile B             A
    unpile A             nil
    

    which ensures no gaps.

    All this is complicated by the fact that the stack does not contain the Transaction objects themselves, but a non binding reference to them. Because of how the Transaction implementation is done, such items can only live in the scope of the code that requires them - we cannot put a pointer to them in the stack otherwise they would not be destructed when they get out of scope in the calling code, which is the trigger for the release/commit.

  • ๐Ÿ‡ณ๐Ÿ‡ฑNetherlands daffie

    @mondrake: According to with this MR the logic changes to:

    Operation            Result
    ---------            ------
    push A               A
    push B               A > B
    push C               A > B > C
    push D               A > B > C > D
    unpile B             A
    unpile A             nil
    

    And I agree that this is the right change.

    My point is that when I look at the implementation of unpile(), it is using the class variable TransactionManagerBase::stack is a LIFO stack.

    The code from the method unpile() is:

      public function unpile(string $name, string $id): void {
        // If the $id does not correspond to the one in the stack for that $name,
        // we are facing an orphaned Transaction object (for example in case of a
        // DDL statement breaking an active transaction), so there is nothing more
        // to do.
        if (!isset($this->stack()[$id]) || $this->stack()[$id]->name !== $name) {
          return;
        }
    
        // If unpiling a savepoint, but that does not exist on the stack, the stack
        // got corrupted.
        if ($name !== 'drupal_transaction' && !$this->has($name)) {
          throw new TransactionOutOfOrderException();
        }
    
        // Release the client transaction savepoint in case the Drupal transaction
        // is not a root one.
        if (
          $this->has($name)
          && $this->stack()[$id]->type === StackItemType::Savepoint
          && $this->getConnectionTransactionState() === ClientConnectionTransactionState::Active
        ) {
          // If we are not releasing the last savepoint but an earlier one, all
          // subsequent savepoints will have been released as well. The stack must
          // be diminished accordingly.
          while (($i = array_key_last($this->stack())) !== $id) {
            $this->removeStackItem($i);
          }
          $this->releaseClientSavepoint($name);
        }
    
        // Remove the transaction from the stack.
        $this->removeStackItem($id);
    
        // If this was the last Drupal transaction open, we can commit the client
        // transaction.
        if (
          $this->stackDepth() === 0
          && $this->getConnectionTransactionState() === ClientConnectionTransactionState::Active
        ) {
          $this->processRootCommit();
        }
      }
    

    The while loop is popping StackItems of the stack until it reaches the required StackItem. It is using the class variable TransactionManagerBase::stack is a LIFO stack! Can we at least document the class variable TransactionManagerBase::stack that it is used as a LIFO stack. Maybe do some more to make sure that something else does not changes the order in the class variable. The transactions themselfs are not LIFO, just the class variable TransactionManagerBase::stack.

    The rest of the patch is great.

  • last update about 1 year ago
    Custom Commands Failed
  • ๐Ÿ‡ฎ๐Ÿ‡นItaly mondrake ๐Ÿ‡ฎ๐Ÿ‡น

    Ah I see you point now @daffie, thanks for that. Tried to clarify in the docblock.

  • last update about 1 year ago
    30,139 pass
  • ๐Ÿ‡ฎ๐Ÿ‡นItaly mondrake ๐Ÿ‡ฎ๐Ÿ‡น

    'descoping' looks like a valid word to me, https://en.wiktionary.org/wiki/descoping

  • last update about 1 year ago
    30,139 pass
  • last update about 1 year ago
    30,139 pass
  • Status changed to RTBC about 1 year ago
  • ๐Ÿ‡ณ๐Ÿ‡ฑNetherlands daffie

    All changes look good to me.
    All my remarks have been addressed.
    For me it is RTBC.

    As the base transactions have already been merged with the D11 branch and without this fix there is the possibility of data loss for a site. I am changing the priority to critical.

  • last update about 1 year ago
    30,149 pass
  • last update about 1 year ago
    30,149 pass
  • last update about 1 year ago
    30,153 pass
  • Status changed to Fixed about 1 year ago
  • ๐Ÿ‡ฌ๐Ÿ‡งUnited Kingdom catch

    This is tricky stuff but it's great to see the documentation and test coverage improve as well as the code clean-up, feels like major refurbishment to the database system of the past year or two.

    Committed/pushed to 11.x, thanks!

    • catch โ†’ committed 70b607c3 on 11.x
      Issue #3384960 by mondrake, daffie: Strengthen TransactionManager
      
  • ๐Ÿ‡ฌ๐Ÿ‡งUnited Kingdom catch
  • Status changed to Needs work about 1 year ago
  • ๐Ÿ‡ฌ๐Ÿ‡งUnited Kingdom catch

    Something wrong here:

        Install site\n
        TypeError:
        Drupal\Core\Database\Transaction\TransactionManagerBase::removeStackItem():
        Argument #1 ($id) must be of type string, int given, called in
        /builds/project/drupal/core/lib/Drupal/Core/Database/Transaction/TransactionManagerBase.php
    

    See https://drupal-gitlab-job-artifacts.s3.us-west-2.amazonaws.com/c0/78/c07...

  • ๐Ÿ‡ฌ๐Ÿ‡งUnited Kingdom catch

    Reverted for now.

  • ๐Ÿ‡ฌ๐Ÿ‡งUnited Kingdom catch

    Rebasing this on 11.x should hopefully show the test failure in the MR pipeline to help track down the differences.

  • last update about 1 year ago
    27,361 pass, 992 fail
  • last update about 1 year ago
    Build Successful
  • ๐Ÿ‡ฎ๐Ÿ‡นItaly mondrake ๐Ÿ‡ฎ๐Ÿ‡น

    Sorry in which job is there a failure? is it possible to get a backtrace? the link does not work for me

  • ๐Ÿ‡ฌ๐Ÿ‡งUnited Kingdom catch

    Sorry I think MR pipelines only allow access if you get push access, this should be changing overnight though according to slack.

    Found a DrupalCI one, different to the test I linked but same error:

    https://dispatcher.drupalci.org/job/drupal8_core_regression_tests/86504/...

    1)  โœ– NightwatchAssertError
    16:29:58    Command failed: php ./scripts/test-site.php install --setup-file "core/tests/Drupal/TestSite/TestSiteOliveroInstallTestScript.php" --install-profile "minimal"  --base-url http://php-apache-jenkins-drupal8-core-regression-tests-86504/subdirectory --db-url mysql://drupaltestbot:drupaltestbotpw@172.18.0.4/jenkins_drupal8_core_regression_tests_86504 --json
    16:29:58 PHP Fatal error:  Uncaught TypeError: Drupal\Core\Database\Transaction\TransactionManagerBase::removeStackItem(): Argument #1 ($id) must be of type string, int given, called in /var/www/html/core/lib/Drupal/Core/Database/Transaction/TransactionManagerBase.php on line 228 and defined in /var/www/html/core/lib/Drupal/Core/Database/Transaction/TransactionManagerBase.php:145
    16:29:58 Stack trace:
    16:29:58 #0 /var/www/html/core/lib/Drupal/Core/Database/Transaction/TransactionManagerBase.php(228): Drupal\Core\Database\Transaction\TransactionManagerBase->removeStackItem(6503259739987)
    16:29:58 #1 /var/www/html/core/lib/Drupal/Core/Database/Transaction.php(88): Drupal\Core\Database\Transaction\TransactionManagerBase->unpile('savepoint_1', '6503259739987')
    16:29:58 #2 /var/www/html/core/lib/Drupal/Core/Menu/MenuTreeStorage.php(810): Drupal\Core\Database\Transaction->__destruct()
    16:29:58 #3 /var/www/html/core/lib/Drupal/Core/Menu/MenuTreeStorage.php(143): Drupal\Core\Menu\MenuTreeStorage->saveRecursive('filter.tips_all', Array, Array)
    16:29:58 #4 /var/www/html/core/lib/Drupal/Core/Menu/MenuLinkManager.php(158): Drupal\Core\Menu\MenuTreeStorage->rebuild(Array)
    16:29:58 #5 /var/www/html/core/lib/Drupal/Core/EventSubscriber/MenuRouterRebuildSubscriber.php(90): Drupal\Core\Menu\MenuLinkManager->rebuild()
    16:29:58 #6 /var/www/html/core/lib/Drupal/Core/EventSubscriber/MenuRouterRebuildSubscriber.php(78): Drupal\Core\EventSubscriber\MenuRouterRebuildSubscriber->menuLinksRebuild()
    16:29:58 #7 [internal function]: Drupal\Core\EventSubscriber\MenuRouterRebuildSubscriber->onRouterRebuild(Object(Drupal\Component\EventDispatcher\Event), 'routing.route_f...', Object(Drupal\Component\EventDispatcher\ContainerAwareEventDispatcher))
    16:29:58 #8 /var/www/html/core/lib/Drupal/Component/EventDispatcher/ContainerAwareEventDispatcher.php(111): call_user_func(Array, Object(Drupal\Component\EventDispatcher\Event), 'routing.route_f...', Object(Drupal\Component\EventDispatcher\ContainerAwareEventDispatcher))
    16:29:58 #9 /var/www/html/core/lib/Drupal/Core/Routing/RouteBuilder.php(197): Drupal\Component\EventDispatcher\ContainerAwareEventDispatcher->dispatch(Object(Drupal\Component\EventDispatcher\Event), 'routing.route_f...')
    16:29:58 #10 /var/www/html/core/lib/Drupal/Core/ProxyClass/Routing/RouteBuilder.php(83): Drupal\Core\Routing\RouteBuilder->rebuild()
    16:29:58 #11 /var/www/html/core/includes/install.core.inc(1878): Drupal\Core\ProxyClass\Routing\RouteBuilder->rebuild()
    16:29:58 #12 /var/www/html/core/includes/install.core.inc(707): install_finished(Array)
    16:29:58 #13 /var/www/html/core/includes/install.core.inc(578): install_run_task(Array, Array)
    16:29:58 #14 /var/www/html/core/includes/install.core.inc(121): install_run_tasks(Array, NULL)
    16:29:58 #15 /var/www/html/core/lib/Drupal/Core/Test/FunctionalTestSetupTrait.php(290): install_drupal(Object(Composer\Autoload\ClassLoader), Array)
    16:29:58 #16 /var/www/html/core/tests/Drupal/TestSite/Commands/TestSiteInstallCommand.php(235): Drupal\TestSite\Commands\TestSiteInstallCommand->doInstall()
    16:29:58 #17 /var/www/html/core/tests/Drupal/TestSite/Commands/TestSiteInstallCommand.php(225): Drupal\TestSite\Commands\TestSiteInstallCommand->installDrupal()
    16:29:58 #18 /var/www/html/core/tests/Drupal/TestSite/Commands/TestSiteInstallCommand.php(123): Drupal\TestSite\Commands\TestSiteInstallCommand->setup('minimal', 'Drupal\\TestSite...', 'en')
    16:29:58 #19 /var/www/html/vendor/symfony/console/Command/Command.php(326): Drupal\TestSite\Commands\TestSiteInstallCommand->execute(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
    16:29:58 #20 /var/www/html/vendor/symfony/console/Application.php(1063): Symfony\Component\Console\Command\Command->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
    16:29:58 #21 /var/www/html/vendor/symfony/console/Application.php(320): Symfony\Component\Console\Application->doRunCommand(Object(Drupal\TestSite\Commands\TestSiteInstallCommand), Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
    16:29:58 #22 /var/www/html/vendor/symfony/console/Application.php(174): Symfony\Component\Console\Application->doRun(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
    16:29:58 #23 /var/www/html/core/scripts/test-site.php(24): Symfony\Component\Console\Application->run()
    16:29:58 #24 {main}
    16:29:58   thrown in /var/www/html/core/lib/Drupal/Core/Database/Transaction/TransactionManagerBase.php on line 145
    16:29:58 
    16:29:58    โœ– 17) Olivero/oliveroSearchFormTest
    16:29:58 
    
  • last update about 1 year ago
    30,148 pass, 2 fail
  • Open in Jenkins โ†’ Open on Drupal.org โ†’
    Environment: PHP 8.1 & MariaDB 10.3.22
    last update about 1 year ago
    30,145 pass, 2 fail
  • last update about 1 year ago
    30,050 pass, 11 fail
  • last update about 1 year ago
    27,667 pass, 970 fail
  • last update about 1 year ago
    30,112 pass, 8 fail
  • last update about 1 year ago
    27,598 pass, 980 fail
  • last update about 1 year ago
    30,127 pass, 8 fail
  • last update about 1 year ago
    30,157 pass
  • last update about 1 year ago
    30,161 pass
  • last update about 1 year ago
    30,161 pass
  • Open in Jenkins โ†’ Open on Drupal.org โ†’
    Environment: PHP 8.1 & MariaDB 10.3.22
    last update about 1 year ago
    30,161 pass
  • last update about 1 year ago
    30,161 pass
  • last update about 1 year ago
    30,153 pass, 1 fail
  • last update about 1 year ago
    30,161 pass
  • last update about 1 year ago
    30,161 pass
  • last update about 1 year ago
    30,161 pass
  • Status changed to Needs review about 1 year ago
  • ๐Ÿ‡ฎ๐Ÿ‡นItaly mondrake ๐Ÿ‡ฎ๐Ÿ‡น

    So this fragment of the backtrace

    16:29:58 #0 /var/www/html/core/lib/Drupal/Core/Database/Transaction/TransactionManagerBase.php(228): Drupal\Core\Database\Transaction\TransactionManagerBase->removeStackItem(6503259739987)
    16:29:58 #1 /var/www/html/core/lib/Drupal/Core/Database/Transaction.php(88): Drupal\Core\Database\Transaction\TransactionManagerBase->unpile('savepoint_1', '6503259739987')

    gave me the hint on the problem: variable type strictness. Either PHP internal storage of array key is not type safe (i.e. if the key is representing a number it is stored as an int even if passed in as a string), or array_last_key() returns an int on the same circumstance. Both cases would be PHP's headaches though, so here I just made comparison looser and ensured the value returned by array_last_key is always cast to string.

    Strict typing can be best friend or worst enemy, even to PHP itself...

  • Status changed to RTBC about 1 year ago
  • ๐Ÿ‡ณ๐Ÿ‡ฑNetherlands daffie

    The problem with the testbot has been addressed.
    Back to RTBC.

  • ๐Ÿ‡ฌ๐Ÿ‡งUnited Kingdom longwave UK

    Re #23 see https://www.php.net/manual/en/language.types.array.php

    Additionally the following key casts will occur:

    Strings containing valid decimal ints, unless the number is preceded by a + sign, will be cast to the int type.

  • Status changed to Fixed about 1 year ago
  • ๐Ÿ‡ฌ๐Ÿ‡งUnited Kingdom catch

    Committed/pushed to 11.x, thanks!

  • ๐Ÿ‡ฎ๐Ÿ‡นItaly mondrake ๐Ÿ‡ฎ๐Ÿ‡น

    @longwave thanks for #25.

    Another puzzle is why uniqid() is returning ints, quite evidently on some PHP configurations only. In any of my testing I was getting an hex string.

    Maybe we should think about using something alternative to uniqid(), there quite some literature about itโ€ฆ I picked it because itโ€™s low in computation and we only need uniqueness within each request.

    Thanks also @catch for #22.

  • ๐Ÿ‡บ๐Ÿ‡ธUnited States mfb San Francisco

    uniqid() is based purely on the time with microseconds, so uniqid() won't be unique if you call it more than once per microsecond or the system time changes due to NTP clock synchronization etc. It should be possible to calculate how often it returns all decimal ints if someone is inclined :)

  • ๐Ÿ‡ฌ๐Ÿ‡งUnited Kingdom catch

    It's already used elsewhere in the database API, but we should use the $more_entropy second argument in TransactionManagerBase.

    *
    Query/SelectExtender.php:    $this->uniqueIdentifier = uniqid('', TRUE);
    Query/SelectExtender.php:    $this->uniqueIdentifier = uniqid('', TRUE);
    Query/Query.php:    $this->uniqueIdentifier = uniqid('', TRUE);
    Query/Query.php:    $this->uniqueIdentifier = uniqid('', TRUE);
    Schema.php:    $this->uniqueIdentifier = uniqid('', TRUE);
    Schema.php:    $this->uniqueIdentifier = uniqid('', TRUE);
    Transaction/TransactionManagerBase.php:    $id = uniqid();
    
    

    Opened ๐Ÿ› Use more entropy in TransactionManager Needs review

  • Automatically closed - issue fixed for 2 weeks with no activity.

Production build 0.71.5 2024