The Needs Review Queue Bot → tested this issue. It either no longer applies to Drupal core, or fails the Drupal core commit checks. Therefore, this issue status is now "Needs work".
Apart from a re-roll or rebase, this issue may need more work to address feedback in the issue or MR comments. To progress an issue, incorporate this feedback as part of the process of updating the issue. This helps other contributors to know what is outstanding.
Consult the Drupal Contributor Guide → to find step-by-step guides for working with issues.
- Status changed to Needs review
almost 2 years ago 10:46pm 12 September 2023 - last update
almost 2 years ago Patch Failed to Apply - 🇨🇦Canada joelpittet Vancouver
D10 fails with this patch:
TypeError: Drupal\Core\Database\ReplicaKillSwitchRequest::__construct(): Argument #1 ($requestStack) must be of type Drupal\Core\Http\RequestStack, Symfony\Component\HttpFoundation\RequestStack given, called in /var/www/html/public/core/lib/Drupal/Component/DependencyInjection/Container.php on line 259 in Drupal\Core\Database\ReplicaKillSwitchRequest->__construct() (line 26 of core/lib/Drupal/Core/Database/ReplicaKillSwitchRequest.php).
This patch should work for D11 as well.
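For reference, the TypeError above comes from the old patch type-hinting Drupal's RequestStack wrapper class while the D10 container injects Symfony's. A re-roll along these lines would avoid it; this is an illustrative sketch of the patch's ReplicaKillSwitchRequest class, not committed core code:

```php
<?php

namespace Drupal\Core\Database;

use Symfony\Component\HttpFoundation\RequestStack;

/**
 * Illustrative re-roll: type-hint the Symfony RequestStack, which is what
 * the D10/D11 container actually injects for the request_stack service.
 */
class ReplicaKillSwitchRequest {

  /**
   * Constructs the service with the Symfony request stack.
   */
  public function __construct(
    protected RequestStack $requestStack,
  ) {}

}
```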
- last update
almost 2 years ago 30,151 pass - 🇦🇺Australia kim.pepper 🏄‍♂️🇦🇺Sydney, Australia
Fixes the patch conflicts and adds a kernel test.
I think we need to work out how this would be used instead of the session version? Do we switch to it and deprecate? How do we support both?
- last update
almost 2 years ago Custom Commands Failed - last update
almost 2 years ago Patch Failed to Apply - 🇺🇸United States smustgrave
Are there any BC concerns if we switch?
- Status changed to Needs work
over 1 year ago 6:05pm 19 October 2023 - 🇺🇸United States smustgrave
Moving to NW for an issue summary update, and to answer whether there is a backwards compatibility concern if we switch.
- last update
over 1 year ago Patch Failed to Apply - 🇺🇸United States JonMcL Brooklyn, NY
I have reviewed some of the patches, including MR!25, and I do not think any of them address the fundamental problem: if the session has been ended, the ReplicaKillSwitch service will still attempt to start a session.
In our (extreme edge) case, we are attempting to save some entities in a shutdown function. The session has been closed; our shutdown function then saves a pending entity change. SqlContentEntityStorage::save then calls \Drupal::service('database.replica_kill_switch')->trigger(), which ends up throwing an Exception that causes a transaction rollback (inside SqlContentEntityStorage::save), and our attempt to save changes to an entity (in the shutdown function) is lost.
Incidentally, we don't have a replica server, but we do have another database connection, so our Database::getConnectionInfo count is 2.
I think the right solution here might be to check the session's ::isStarted() before attempting to get or set data in it?
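A sketch of that guard, keeping the session key and setting name that core's ReplicaKillSwitch already uses ('ignore_replica_server', 'maximum_replication_lag'); the ::isStarted() check is the proposed change, and the surrounding method shape is an approximation, not the exact core source:

```php
/**
 * Denies access to the replica for a period after a primary write.
 */
public function trigger() {
  $connection_info = Database::getConnectionInfo();
  // Only relevant when more than one connection target is defined, and
  // only touch the session if one is already started: this avoids
  // implicitly starting a session (and throwing an exception that rolls
  // back the transaction) when called from a shutdown function.
  if (count($connection_info) > 1 && $this->session->isStarted()) {
    $duration = $this->settings->get('maximum_replication_lag', 300);
    // Ignore the replica until replication has had time to catch up.
    $this->session->set('ignore_replica_server', $this->time->getRequestTime() + $duration);
  }
}
```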
- 🇺🇸United States JonMcL Brooklyn, NY
jonmcl → changed the visibility of the branch 3181946--replicakillswitch-uses-non-started-sessions to active.
- 🇬🇧United Kingdom catch
The theory behind starting the session when this was originally added is that if someone has just triggered saving an entity, then while the replica is catching up to the primary database, you want that user to see fresh data while they're browsing around the site, and not potentially stale data from the replica.
We could change that logic, and that would mean that the user potentially sees stale data until things catch up again. However, this was all added before render caching, and if anyone browses a listing query that's from a replica database listing stale data, then it'll get cached for other people too, and then this logic does nothing useful anyway. So I'm not sure it makes any sense any longer. Sites using this feature just have to deal with the possibility that some stale listing data might get cached sometimes.
I think it's completely reasonable to avoid setting anything in the session unless the session is open; an actual user triggering an entity save without an open session would be extremely rare. So that might also be fine.
The MR will need to be targeted against 11.x, and it looks like the branch needs a rebase.
- 🇺🇸United States JonMcL Brooklyn, NY
I pushed up my changes on https://git.drupalcode.org/issue/drupal-3181946/-/tree/3181946--replicak...
I tried to create my new branch off of 11.x, but things seemed to go awry and that's probably not the correct way to get this updated for 11.x. When I created the MR, it showed thousands of changes even though the new MR was targeting 11.x.
Someone with better git.drupalcode.org skills than me is needed to clean things up.
- Merge request !12229: Issue #3181946: Don't use non-started or closed session in ReplicaKillSwitch. → (Open) created by JonMcL
- 🇬🇧United Kingdom catch
When you branch in an issue fork, it uses 11.x from the issue fork, and that can be very old on an old issue - so you would need to rebase the branch with origin 11.x again.
Thinking about alternatives to session - one possible option that would also fix caching would be to write a timestamp to state instead of session, and then not allow anything to query the replica for x seconds after the timestamp. This would also prevent stale data from the replica going into the render cache too.
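That alternative could look roughly like this, using the State API; the 'replica_kill_switch.last_write' state key is hypothetical (not an existing core key), and the setting name is borrowed from the existing kill switch:

```php
// After a primary write, record the timestamp globally via state
// (hypothetical key), rather than per-user in the session.
\Drupal::state()->set('replica_kill_switch.last_write', \Drupal::time()->getRequestTime());

// When choosing a connection for a listing query, refuse the replica for
// every request within the lag window, so stale rows cannot be rendered
// (and render-cached) for anyone.
$last_write = \Drupal::state()->get('replica_kill_switch.last_write', 0);
$max_lag = Settings::get('maximum_replication_lag', 300);
$target = (\Drupal::time()->getRequestTime() - $last_write < $max_lag) ? 'default' : 'replica';
$connection = Database::getConnection($target);
```

The trade-off is an extra state write to the primary on every entity save, but unlike the per-session switch it also covers anonymous traffic and the render cache.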
- 🇺🇸United States JonMcL Brooklyn, NY
@catch I am wondering if my understanding of ReplicaKillSwitch is correct.
This is to make it so that MY queries, in this current request, go to the primary db server instead of any of the replicas. The idea being that my current request just made a change to the database. That change goes through the primary connection. If my current request, being processed in the same thread, has a select query, the select goes through the primary connection again so that it is assured to have the updated data?
Then, because the kill switch is in my session for a period of time, my next request to load and display the updated node (and update entity & render caches) is guaranteed to get fresh data from the primary because the kill switch is sending me there. I suppose there is a small risk that another user's request comes in at that moment and, because they don't have the kill switch, they get stale data from a replica and replace the invalidated cache items with stale data.
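For context on the routing being described here: a replica is just an extra connection target under the same database key in settings.php, and code opts in with Database::getConnection('replica'). An illustrative configuration (hostnames and credentials are placeholders):

```php
// settings.php: the 'default' target is the primary; 'replica' is a list
// of read-only targets that queries may opt in to.
$databases['default']['default'] = [
  'driver' => 'mysql',
  'database' => 'drupal',
  'username' => 'drupal',
  'password' => 'example-password',
  'host' => 'primary.db.example.com',
];
$databases['default']['replica'][] = [
  'driver' => 'mysql',
  'database' => 'drupal',
  'username' => 'drupal_readonly',
  'password' => 'example-password',
  'host' => 'replica1.db.example.com',
];
```

When the kill switch has been triggered, or no replica target is defined, requests for the 'replica' target fall back to the 'default' (primary) connection.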
- 🇬🇧United Kingdom catch
That's the right idea, yes. Except loading a node never goes to the replica; in core I think the only places that support it are search and views, so the only place to get stale data would be listing queries.
However there could be dozens of pages on which any one node appears - taxonomy term listings etc. and it would be easy for another user to warm the cache of those pages.