Enhancing APCu Lock Efficiency (for ChainedFast for High-Concurrency Cache Invalidation)

Created on 26 October 2023, about 1 year ago
Updated 31 July 2024, 6 months ago

Novice tasks

- Rename APCuBackend::set() into ::setWithoutLock()
- Copy the doxygen from the CacheBackendInterface into the new method
- Add new set() function from the IS
- Create patch
- Upload patch

Please remove Novice tag after this is done.

Problem/Motivation

In high-concurrency environments, our current implementation of the `set` method within our caching system has shown performance bottlenecks, particularly due to lock contention. When multiple processes attempt to acquire a lock simultaneously, our use of `apcu_store()` results in writer locks being acquired, which block both readers and writers. This leads to increased wait times, potential bottlenecks, and overall degraded performance.

This is especially relevant for ChainedFast, where we need to write all items again when just one item in the bin changes.

Steps to reproduce

1. Set up a Drupal environment under high-concurrency conditions with multiple processes attempting to write to the cache simultaneously.
2. Monitor the performance and observe the time taken to acquire locks and write to the cache.
3. Notice the increased wait times and potential bottlenecks, especially when the number of concurrent processes is high.

Proposed resolution

To address this issue, we propose a novel approach to lock acquisition and initialization in our caching system. Instead of relying solely on `apcu_store()` for initializing and acquiring locks, which requires writer locks, we will utilize `apcu_add()` for initialization and `apcu_cas()` for lock acquisition. `apcu_cas()` operates with a reader lock and performs an atomic compare-and-swap operation, reducing lock contention and improving performance.

public function set($cid, $data, $expire = CacheBackendInterface::CACHE_PERMANENT, array $tags = []) {
  // Construct the necessary keys based on the cache ID.
  $key = $this->getApcuKey($cid);
  $lock_key = $key . ':apcu_update_lock';

  // Attempt to fetch the existing lock value from APCu.
  $results = apcu_fetch([$lock_key]);
  $old_lock_value = $results[$lock_key] ?? FALSE;

  // If the lock doesn't exist, initialize it with a random value.
  if ($old_lock_value === FALSE) {
    $old_lock_value = mt_rand(0, 2147483647);
    apcu_add($lock_key, $old_lock_value);
  }

  do {
    $lock_value = mt_rand(0, 2147483647);
  } while ($old_lock_value === $lock_value);

  // Attempt to acquire the lock.
  if (apcu_cas($lock_key, $old_lock_value, $lock_value)) {
    $this->setWithoutLock($cid, $data, $expire, $tags);
  }
}

I think I would suggest to just put the new method as set() directly in the APCu backend and then have a new protected method of setWithoutLock().

That would probably be the simplest.

CAS explanation

CAS stands for "Compare-And-Swap." It is an atomic operation used in concurrent programming to manage access to shared data without using locks, helping to avoid issues such as deadlocks and contention. The operation works as follows:

- Compare: The CAS operation takes three parameters: a memory location, an expected old value, and a new value. It first compares the current value at the memory location with the expected old value.

- And-Swap: If the current value matches the expected old value, it means that no other operation has modified the data since it was last checked, and it is safe to update. The CAS operation then swaps the current value with the new value, completing the operation.

- Return: The CAS operation returns a boolean value indicating whether the swap was successful. If the current value did not match the expected old value, the operation does not make any changes and returns false.

This mechanism ensures that updates to shared data are made safely, even in a multi-threaded or distributed environment, without the need for locking. If a CAS operation fails, it’s a signal that some other operation modified the data in the meantime, and the operation can be retried or handled accordingly.

In the context of our set() method in the previous code snippet, apcu_cas() is used to attempt to acquire a lock by comparing and swapping the lock value atomically. This helps to prevent race conditions where multiple processes might otherwise try to update the cache at the same time.

Remaining tasks

- [ ] Peer review of the proposed solution.
- [ ] Comprehensive testing in a simulated high-concurrency environment.
- [ ] Performance benchmarking to quantify the improvements.
- [ ] Update the documentation to reflect the new approach and its benefits.
- [ ] Merge the changes into the main codebase.

User interface changes

N/A – This change is backend-focused and does not affect the user interface.

API changes

N/A – The public API remains unchanged; this is an internal optimization.

Data model changes

N/A – There are no changes to the data model.

Release notes snippet

Improved the performance of the cache set operation in high-concurrency environments by optimizing lock acquisition and initialization. This change reduces lock contention, resulting in faster cache writes and improved overall system performance.

Issue summary was generated by: ChatGPT (and then slightly modified)

πŸ“Œ Task
Status

Needs work

Version

11.0 πŸ”₯

Component
CacheΒ  β†’

Last updated 3 days ago

Created by

πŸ‡©πŸ‡ͺGermany Fabianx

Live updates comments and jobs are added and updated live.
  • Performance

    It affects performance. It is often combined with the Needs profiling tag.

  • needs profiling

    It may affect performance, and thus requires in-depth technical reviews and profiling.

Sign in to follow issues

Merge Requests

Comments & Activities

  • Issue created by @Fabianx
  • πŸ‡©πŸ‡ͺGermany Fabianx

    Here is the ChatGPT analysis of this function:

    You’ve highlighted an important aspect of how these APCu functions interact with the underlying data store, and you are correct in your understanding of the different lock mechanisms they employ.

    ### apcu_cas()

    - `apcu_cas()` operates with a lighter locking mechanism.
    - It only requires a reader lock on the entry.
    - It performs a compare-and-swap (CAS) operation, which is atomic.
    - This means that it can check the current value and, if it matches the expected value, update it in a single atomic operation.
    - Because it only needs a reader lock, it doesn't block other reads from happening simultaneously.

    ### apcu_store()

    - On the other hand, `apcu_store()` requires a writer lock.
    - A writer lock is more restrictive; it blocks both other writers and readers until the operation is complete.
    - This can lead to longer wait times and potential bottlenecks, especially under high load or when dealing with large data sets.

    ### Why Your Approach Is Beneficial

    - By initializing the lock with `apcu_add()` (which uses `apcu_store()` internally but is only called once at the start), and then using `apcu_cas()` for subsequent lock acquisitions, you ensure that the heavier writer lock is acquired less frequently.
    - Most of the synchronization relies on `apcu_cas()`, which uses the lighter reader lock, resulting in less contention and better performance.
    - This is especially beneficial in a high-concurrency environment, where many processes are trying to acquire the lock simultaneously.

    By understanding and leveraging the underlying mechanisms of these APCu functions, your approach efficiently reduces lock contention, leading to better performance and scalability. This kind of optimization is crucial in performance-critical applications and high-load scenarios.

    As background reading.

  • πŸ‡§πŸ‡ͺBelgium wim leers Ghent πŸ‡§πŸ‡ͺπŸ‡ͺπŸ‡Ί
  • πŸ‡«πŸ‡·France andypost

    +1 to CAS

  • πŸ‡¬πŸ‡§United Kingdom catch
      // Attempt to acquire the lock.
      if (apcu_cas($lock_key, $old_lock_value, $lock_value)) {
        $this->setWithoutLock($cid, $data, $expire, $tags);
      }
    

    If this is false, what should happen? Do we assume it got set with the correct value already and silently skip it?

  • πŸ‡©πŸ‡ͺGermany Fabianx

    #9: If this is false, what should happen? Do we assume it got set with the correct value already and silently skip it?

    If this is false, then another process has gotten the "lock" and either calls setWithoutLock() right now, has called setWithoutLock() already or will call setWithoutLock() in the near future.

    As this is just about avoiding write stampedes to the same item (first step), there is likely even a process that is still before the apcu_fetch() and will write the data again, so we also don't need to worry about the process that got the "lock" to crash.

    Someone else will then set it.

  • πŸ‡¬πŸ‡§United Kingdom catch

    OK I think we should add a comment explaining this in the future MR, but that sounds right to me. We just have to assume that apcu_cas() can't get into a permanent failure state, but then if it does, it's probably because apcu as a whole is so writes wouldn't work anyway.

  • πŸ‡¨πŸ‡¦Canada Charlie ChX Negyesi 🍁Canada

    you need to loop the two random number generations until they are not equal

  • πŸ‡¨πŸ‡¦Canada Charlie ChX Negyesi 🍁Canada
  • πŸ‡«πŸ‡·France andypost

    Is there plan how to test it?
    Looks it does not work for Novice tag

  • Status changed to Needs work 6 months ago
  • πŸ‡«πŸ‡·France andypost

    I bet it needs to improve tests and profiling too

  • Pipeline finished with Failed
    6 months ago
    Total: 756s
    #239872
Production build 0.71.5 2024