Node grants rebuild is not atomic to end-users

Created on 14 April 2016, almost 9 years ago
Updated 20 March 2025, 20 days ago

Problem/Motivation

When administrators have to rebuild permissions in batch mode, the process is as follows:
1. Initiate rebuild.
2. Delete permissions.
3. Regenerate permissions. Depending on the site, this can be very slow (I have seen it take hours on D7 sites).
4. Rebuild finished - site is operational.

The problem is that step 2 is relatively fast and removes everything from the node_access table, while step 3 is (VERY) slow. That leaves the system in a broken/invalid state for a period of time that depends on the size of the site.

The current recommendation is to put the site into maintenance mode during the rebuild, as users will otherwise get many "403 Forbidden" responses. The bigger the site, the longer this time frame and the more users will be unhappy, as the site is effectively inaccessible (down).

I am marking this as:
- Bug - because I consider it one. Feel free to change the categorization.
- Major - because the site is not operational while the process takes place. The bigger the site, the bigger the negative impact.

Proposed resolution

The solution comes from the double buffer design pattern:
Do not break the system until the new state is ready on the side, then quickly swap the old state for the new one.

The idea is to have a second database table (e.g. node_access_temp) that is used only for this case. The process would change as follows:
1. Initiate rebuild.
2. Clean everything from node_access_temp, for safety (expected to be empty).
3. Rebuild new permissions in node_access_temp. This will be slow (as it currently is), but the site will be operational with the old permissions.
4. Clean node_access. This is fast, as it currently is.
5. Transfer the node_access_temp state to node_access. I expect this to be much faster, as it is limited only by the storage layer. No high-level APIs are involved. Options:
- INSERT ... SELECT or something similar.
- Drop node_access, rename node_access_temp to node_access, recreate node_access_temp.
- Have a state value that points to the active node_access table used for managing access on the site. The switch can then be atomic, as it only changes the pointer value.
- Other ideas?
6. Clean up node_access_temp, as the data is already active.
7. Rebuild finished - site is operational.
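
The steps above can be sketched roughly as follows, using the INSERT ... SELECT option for step 5. This is only an illustration in Python with SQLite standing in for the real database, not Drupal code: the four-column schema is a simplified assumption, and make_db, rebuild_grants and compute_grants are hypothetical names. The point is that the slow part only ever writes to node_access_temp, while steps 4 to 6 run in one short transaction.

```python
import sqlite3

# Simplified, assumed schema; the real node_access table has more columns.
SCHEMA = "(nid INTEGER, gid INTEGER, realm TEXT, grant_view INTEGER)"


def make_db():
    """Create an in-memory database with the live and the temporary table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE node_access {SCHEMA}")
    conn.execute(f"CREATE TABLE node_access_temp {SCHEMA}")
    conn.commit()
    return conn


def rebuild_grants(conn, compute_grants):
    """Rebuild grants on the side, then swap them in with one short transaction.

    compute_grants is a hypothetical callable returning an iterable of
    (nid, gid, realm, grant_view) rows; it stands for the slow part.
    """
    # Steps 2-3: clean and refill node_access_temp. node_access is not
    # touched here, so the site keeps working with the old grants.
    with conn:
        conn.execute("DELETE FROM node_access_temp")
        conn.executemany(
            "INSERT INTO node_access_temp VALUES (?, ?, ?, ?)",
            compute_grants(),
        )

    # Steps 4-6: replace the live data. "with conn" commits on success and
    # rolls back on error, so the three statements apply together or not at all.
    with conn:
        conn.execute("DELETE FROM node_access")
        conn.execute("INSERT INTO node_access SELECT * FROM node_access_temp")
        conn.execute("DELETE FROM node_access_temp")
```

If compute_grants fails halfway, only node_access_temp is affected and the old grants stay in place. Whether other connections ever observe an empty node_access during the second transaction depends on the database engine and its isolation level, so that part would need checking per supported backend.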

This way the rebuild can take an arbitrarily long time, but the window in which the site's access data is invalid shrinks to the time needed to transfer the data from node_access_temp to node_access. If we manage to make the switch fully atomic, the rebuild will not require any downtime (maintenance mode) at all.
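
For the fully atomic variant, the "state value as pointer" idea could look roughly like the sketch below. Again this is Python with SQLite as a stand-in, and every name in it (node_access_a, node_access_b, the state table, the node_access_active key, and all functions) is a hypothetical choice for illustration, not an existing Drupal API. Access checks read whichever table the pointer names, the rebuild fills the other table, and the switch is a single-row UPDATE.

```python
import sqlite3

# Two physical tables; a state value names the one currently in use.
TABLES = ("node_access_a", "node_access_b")


def open_db():
    """Create both grant tables and the pointer, initially at the first table."""
    conn = sqlite3.connect(":memory:")
    for table in TABLES:
        conn.execute(
            f"CREATE TABLE {table} "
            "(nid INTEGER, gid INTEGER, realm TEXT, grant_view INTEGER)"
        )
    conn.execute("CREATE TABLE state (name TEXT PRIMARY KEY, value TEXT)")
    conn.execute(
        "INSERT INTO state VALUES ('node_access_active', ?)", (TABLES[0],)
    )
    conn.commit()
    return conn


def active_table(conn):
    """Return the name of the table that access checks should read."""
    (name,) = conn.execute(
        "SELECT value FROM state WHERE name = 'node_access_active'"
    ).fetchone()
    if name not in TABLES:  # never interpolate an unexpected table name
        raise ValueError(f"unexpected table name: {name}")
    return name


def can_view(conn, nid, gid, realm):
    """Check a view grant against the currently active table."""
    row = conn.execute(
        f"SELECT 1 FROM {active_table(conn)} "
        "WHERE nid = ? AND gid = ? AND realm = ? AND grant_view = 1",
        (nid, gid, realm),
    ).fetchone()
    return row is not None


def rebuild(conn, grants):
    """Fill the inactive table, then flip the pointer with one UPDATE."""
    inactive = TABLES[1 - TABLES.index(active_table(conn))]
    with conn:  # slow part: the active table is not touched
        conn.execute(f"DELETE FROM {inactive}")
        conn.executemany(f"INSERT INTO {inactive} VALUES (?, ?, ?, ?)", grants)
    with conn:  # the switch: a single-row update
        conn.execute(
            "UPDATE state SET value = ? WHERE name = 'node_access_active'",
            (inactive,),
        )
```

The trade-off is that every access query has to resolve the table name first, and anything that writes grants outside a full rebuild would need to target the active table as well.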

Remaining tasks

Discussion, patch, review, RTBC, commit.

User interface changes

None.

API changes

None, this is an implementation detail.

Data model changes

New temporary DB table. No structural changes to existing systems.

🐛 Bug report
Status

Active

Version

11.0 🔥

Component

node system

Created by

🇧🇬Bulgaria ndobromirov


Comments & Activities


  • 🇦🇺Australia acbramley

    All of these old node_grants/access issues that have gone stale make me wonder how often this system is actually used anymore. We certainly don't interact with it at all on any of our client projects.

    This is, however, still relevant.
