Database ping, close, and reconnect capabilities

Created on 19 December 2024, 3 days ago

This issue outlines a necessity for adding the ability for Drupal to add capabilities to manipulate the database connection, including:

  • Testing if connection is alive, particularly useful for long lived commands & scripts
  • Close the database connection, in order to free up infrastructure resources
  • Open the database connection, aka reconnect. For when detecting the connection was lost, or forcefully closed.

Problem/Motivation

When operating Symfony Messenger , we've found that with MySQL configurations, eventually the connection will fail. With neither Drupal or PHP attempting to reconnect. We've also found the time before failure can be any of 5 minutes, or 8 hours, dependent on the platform code is run on, specifically the wait_timeout configuration.

Not only is this a problem with Messenger, but any long lived CLI command, including custom commands, and migrations.

I expect as we transition to adopting ReactPHP / Revolt, this kind of problem will make itself more apparent. ( 🌱 [PP-1] Adopt a library like amp or provide async management primitives in Drupal core Postponed , #2218651: [meta] Make Drupal compatible with persistent app servers like ReactPHP, PHP-PM, PHPFastCGI, FrankenPHP, Swoole , 🌱 Adopt the Revolt event loop for async task orchestration Active )

Proposed resolution

The way I see it, first we need a way for Drupal to:

  • Ping each database, to test if connection is alive
  • Close the connection
  • Open the connection

This would allow contrib or custom code which know they execute long lived tasks, to be able to ping the connection and reconnect. This strategy is what Symfony Messenger uses. When a task is being fetched and run, a early middleware would check and reconnect if the connection was dropped. Effectively solving connection issues so user/custom code doesnt need to think about it.

The connection pinging action is simply a SQL query known to not have any side effects, but is a proven reliable way of checking for liveness. For most SQL database engines, this query is SELECT 1 (See Doctrine), with DB2/Oracle being outliers.

The ability to close the connection is also helpful (see Messenger), in order to save the number of connections, which are typically limited by infra.

After this is set up, we can consider if we need to automate connection reconnects for core tasks. I can see it being necessary for sites using dblog.

Steps to reproduce

Basic test to make database connection fail is to modify connection settings in `settings.php` with:

```php
$databases['default']['default'] = [
...
'init_commands' => [
'isolation_level' => 'SET SESSION wait_timeout = 1',
],
];
```

In a test controller, add a `sleep()` for greater than 1 second. Then execute an action that talks to the database. You'll see:

> SQLSTATE[HY000]: General error: 2006 MySQL server has gone away

You can also setup and run [Symfony Messenger](sm) project, and experience connection loss after a while.

Remaining tasks

  • Discuss
  • Implement
  • Test

User interface changes

Nil

Introduced terminology

API changes

  • New API surface for \Drupal\Core\Database\Connection, various methods
  • New dummy SQL constant/methods added to database drivers.

Data model changes

None.

Release notes snippet

Feature request
Status

Active

Version

11.1 🔥

Component

database system

Created by

🇦🇺Australia dpi Perth, Australia

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Merge Requests

Comments & Activities

  • Issue created by @dpi
  • 🇦🇺Australia dpi Perth, Australia
  • Merge request !10616Database ping, close, and reconnect capabilities → (Open) created by dpi
  • Pipeline finished with Failed
    3 days ago
    Total: 246s
    #373388
  • Pipeline finished with Success
    3 days ago
    Total: 457s
    #373403
  • 🇦🇺Australia dpi Perth, Australia
  • 🇮🇹Italy mondrake 🇮🇹

    The motivation is clear and sensible, +1.

    However, pinging with a query anytime the client collection is fetched means a lot of db roundtrips, and as such kill performance - if I am not mistaken, getClientConnection() is called anytime a transaction is opened. BTW, that would also mean that non-transactional statement executions will not be covered.

    I think it would be preferable to store the client connection state in the Connection object, being 'active' at opening, then through exception handling change its state to 'connection lost' when an issue occurs. Code executing calls to the db could then check that state before executing and ping/close/reconnect in case. A challenge for sure would be how to determine a clear connection lost exception vs other exceptions.

    Another aspect to take care of is understanding the implications of a reconnect occurring while a db transaction is open - is the db transaction rolled back, how will the db failure reflect in the transaction manager, etc.

    On the positive side, we could think of building on the experience from 📌 Move the on-demand-table creation into the database API Needs work and think of a method that could embed a retry of executing a statement, i.e. in case of failure for disconnect, reconnect and retry.

    Finally - MySql (and possibly PgSql) would lose the connection, but what should be the behavior for Sqlite? It also can fail if the file is not in the same filesystem of the web server.

  • 🇮🇹Italy mondrake 🇮🇹
  • 🇦🇺Australia dpi Perth, Australia

    However, pinging with a query anytime the client collection is fetched means a lot of db roundtrips, and as such kill performance - if I am not mistaken, getClientConnection() is called anytime a transaction is opened. BTW, that would also mean that non-transactional statement executions will not be covered.

    As the code is written right now, this is not correct. ->ping() is called in ->getClientConnection(), but since ->connection is always a PDO, it exits. It would only continue if the PDO object in ->connection was set to null by an explicit call to ->close().

    The behavior is not wrapping every call in a safety net. If the connection was lost, and ping was not called by userland code, then indeed the app would crash. Encapulating all database calls would be quite a scope increase, and fwiw is not the behavior of Symfony+Doctrine+Messenger.

    Another aspect to take care of is understanding the implications of a reconnect occurring while a db transaction is open - is the db transaction rolled back, how will the db failure reflect in the transaction manager, etc.

    I'm not sure about this one.

  • 🇦🇺Australia dpi Perth, Australia
  • 🇳🇱Netherlands daffie

    @dpi: I get why you need to have a database connection that reconnects itself automatically. Doing an extra database call every time will kill performance and most of the time it is unnecessary. Most of the time we are done with the database connection long before the database connection times out. As an alternative we could add a new method called Connection::getConnectionWithReconnect(). Let the new method do the same as Connection::getConnection() only add the reconnect functionality.

  • 🇮🇹Italy mondrake 🇮🇹

    The more I look at this, the more I think we should build on top of 📌 Move the on-demand-table creation into the database API Needs work , and introduce a generic method that takes care of making db calls 'safe' from different perspectives.

    For example, what we have in that issue at the moment is a ::executeEnsuringSchemaOnFailure that in case of failure of a db operation, tries to build missing tables and retries the db operation:

        $execution = $this->connection->executeEnsuringSchemaOnFailure(
          execute: function (): int {
            return do_something();
          },
          schema: $schemaDefinition,
          retryAfterSchemaEnsured: TRUE,
        );
    

    we could make that more generic:

        $execution = $this->connection->safelyExecute(
          execute: function (): int {
            return do_something();
          },
          withSchema: $schemaDefinition,
          withReconnection: TRUE,
          retryOnFailure: TRUE,
        );
    
Production build 0.71.5 2024