TTL handling broken, always permanent

Created on 29 October 2020, almost 5 years ago
Updated 21 February 2023, over 2 years ago

Problem/Motivation

  protected function getExpiration($expire) {
    if ($expire == Cache::PERMANENT || $expire > $this->permTtl) {
      return $this->permTtl;
    }
    return $expire - \Drupal::time()->getRequestTime();
  }

This looks perfectly reasonable, but there is a subtle bug: it will basically always hit the permanent case and never return the calculated value.

permTtl is equal to LIFETIME_PERM_DEFAULT (31536000, one year in seconds) unless otherwise set through config. So when you set a cache entry to expire in, say, 600 seconds, the call ends up looking something like $this->getExpiration(\Drupal::time()->getRequestTime() + 600); which we would expect to return 600. However, the value passed in is an absolute timestamp, something like 1604003270, which is always larger than the default permTtl of 31536000, so the check always matches and the TTL sent to Redis is the permanent TTL.
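
A quick worked example of that comparison, using the numbers above (illustrative values only, not module code):

  $permTtl = 31536000;              // LIFETIME_PERM_DEFAULT: one year in seconds.
  $requestTime = 1604002670;        // Example \Drupal::time()->getRequestTime() value.
  $expire = $requestTime + 600;     // Absolute timestamp passed in: 1604003270.

  // The check compares an absolute timestamp against a duration, so it is
  // true for any realistic timestamp and the permanent TTL always wins.
  var_dump($expire > $permTtl);     // bool(true)
  var_dump($expire - $requestTime); // int(600), the TTL we actually wanted.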

Steps to reproduce

Set a cache entry to expire a couple of minutes in the future and check the TTL in Redis. You'll see something close to the permanent TTL instead of the provided TTL.
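
For example, something along these lines run with drush scr (a sketch; the key pattern is an assumption about the prefix:bin:cid layout and may need adjusting for your configuration):

  // Write an entry that should live for two minutes.
  \Drupal::cache()->set('ttl_repro_test', 'value', \Drupal::time()->getRequestTime() + 120);

  // Ask Redis directly what TTL it stored for that key.
  $client = \Drupal::service('redis.factory')->getClient();
  foreach ($client->keys('*:default:ttl_repro_test') as $key) {
    // Expected: roughly 120. Observed with this bug: close to the permanent TTL (one year).
    echo $key . ' TTL: ' . $client->ttl($key) . "\n";
  }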

Proposed resolution

Fix the expiration handling so it takes the request time into account instead of comparing the absolute expire timestamp against the default permTtl.
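
A minimal sketch of one way the comparison could be corrected (illustrative only; the committed fix may differ):

  protected function getExpiration($expire) {
    if ($expire == Cache::PERMANENT) {
      return $this->permTtl;
    }
    // $expire is an absolute Unix timestamp; turn it into a remaining
    // lifetime in seconds before comparing it against the permanent TTL.
    $ttl = $expire - \Drupal::time()->getRequestTime();
    return $ttl > $this->permTtl ? $this->permTtl : $ttl;
  }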

Remaining tasks

User interface changes

n/a

API changes

Maybe? This seems like a straight-up bug, but it's possible someone was relying on the permTtl value somehow? It's protected, so I think it's OK to correct the logic.

Data model changes

n/a

πŸ› Bug report
Status

Needs work

Version

1.0

Component

Code

Created by

United States neclimdul Houston, TX


Merge Requests

Comments & Activities


  • Switzerland berdir Switzerland

    This module now has tests on merge requests (they don't show up here), and they are failing, which is expected based on what was discussed in the related GitHub issue.

  • United States lcatlett

    This patch has been required for Pantheon customers hitting the maxmemory limit and is critical to evicting keys as expected using LRU.

  • First commit to issue fork.
  • Pipeline finished with Failed
    over 1 year ago
    Total: 253s
    #125516
  • Pipeline finished with Failed
    over 1 year ago
    Total: 195s
    #126369
  • Pipeline finished with Failed
    over 1 year ago
    Total: 219s
    #128762
  • Pipeline finished with Failed
    over 1 year ago
    Total: 317s
    #128772
  • Pipeline finished with Success
    over 1 year ago
    Total: 202s
    #130874
  • Pipeline finished with Failed
    over 1 year ago
    Total: 253s
    #131928
  • Pipeline finished with Failed
    over 1 year ago
    Total: 195s
    #131933
  • Pipeline finished with Success
    over 1 year ago
    Total: 312s
    #131939
  • United States Ec1ipsis

    I can't tell whether the automated test needs to change or if the behavior itself has a bug. I've merged in the most recent release to the 8.1 branch, and the error that I'm seeing from PHPUnit is:

    1) Drupal\Tests\redis\Kernel\RedisCacheTest::testSetGet
    Failed asserting that false is of type "object".
    /builds/issue/redis-3179757/vendor/phpunit/phpunit/src/Framework/Constraint/Constraint.php:122
    

    I believe that the automated test itself needs to be updated, but I'm not certain. Anyone have a firm answer on that?

  • Switzerland berdir Switzerland

    See the duplicate issue #2877893: Cache Expiration doesn't follow Drupal conventions, where I explained this about 7 years ago :)

    The test is from core. Core expects that the API allows returning expired cache items if you ask for them. Setting a TTL in Redis is incompatible with the expectations that core has for a cache backend implementation.

    The only way to get the tests to pass is to make this behavior configurable. Either a boolean enable/disable or, an idea I just had, a TTL offset setting: set it to e.g. 300 and any TTL we receive is extended by 300s, which means that within that timeframe we can still return expired items, allowing use cases like returning an expired item while one process recalculates it. Then we just need to configure that specific test with a large enough TTL offset for the tests to still work.
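
    Roughly like this, as an illustration only (both the ttl_offset key and its place in settings.php are hypothetical; no such setting exists in the module yet):

    // settings.php, hypothetical setting:
    $settings['redis.settings']['ttl_offset'] = 300;

    // A set() with expire = now + 600 would then get a Redis TTL of 900s,
    // while the stored 'expire' field stays at now + 600, so Drupal can
    // still hand back the already-expired item for up to 300s when asked.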

  • Pipeline finished with Failed
    over 1 year ago
    Total: 250s
    #134637
  • Pipeline finished with Failed
    over 1 year ago
    Total: 200s
    #135660
  • Pipeline finished with Failed
    over 1 year ago
    Total: 271s
    #135667
  • Pipeline finished with Failed
    over 1 year ago
    Total: 209s
    #135676
  • Pipeline finished with Failed
    over 1 year ago
    Total: 189s
    #137794
  • Pipeline finished with Failed
    over 1 year ago
    Total: 180s
    #137861
  • Pipeline finished with Failed
    over 1 year ago
    Total: 178s
    #137873
  • Pipeline finished with Failed
    over 1 year ago
    Total: 229s
    #137955
  • Pipeline finished with Failed
    over 1 year ago
    Total: 225s
    #137980
  • Pipeline finished with Failed
    over 1 year ago
    Total: 217s
    #138006
  • Pipeline finished with Failed
    over 1 year ago
    Total: 194s
    #138034
  • Pipeline finished with Failed
    over 1 year ago
    Total: 225s
    #138043
  • Turkey kyilmaz80

    Any updates?

  • Brazil jribeiro Campinas - São Paulo

    Rerolled the patch for the 1.8 version of the module without the change to this line:

    $this->permTtl = ($iv->y * 31536000 + $iv->m * 2592000 + $iv->days * 86400 + $iv->h * 3600 + $iv->i * 60 + $iv->s);
    

    That line conflicts with the patch from https://www.drupal.org/node/2944938, since we are using both patches on our projects.

  • Switzerland berdir Switzerland

    Again, per #21, this is incompatible with the test expectations that core has for cache backends and it will never pass tests. Don't bother debugging that; it fails by design. Core expects expired items to be returned when asked for, and this change makes it so they aren't.

    πŸ› Allow to remove support for invalidateAll() and treat it as deleteAll() Active is an example of what I'm asking for. Add a setting that allows to make this a configurable opt-in behavior, it absolutely makes sense, but I want to have redis by default correspond to core cache backend expecatations as much as possible.

    I closed one issue as a duplicate; I think I'd prefer to also close Default lifetime for permanent items for Drupal 8 (Needs review) as a duplicate and merge them together.

  • Switzerland berdir Switzerland

    Also, if this behavior is important for your site, feel free to reach out and sponsor a few hours of my time through MD Systems to finish this up with tests and documentation, and I'm happy to commit it.

  • Issue was unassigned.
  • Germany mkalkbrenner

    I agree that "core compatibility" should be configurable.

    But if I understand the issue correctly, "core compatibility" currently only exists because of a bug in the calculation?

  • Pipeline finished with Failed
    3 months ago
    #490016
  • Pipeline finished with Success
    3 months ago
    #490039
  • Pipeline finished with Success
    3 months ago
    #490088
  • Germany mkalkbrenner
  • Switzerland berdir Switzerland

    I don't think that test would fail, because we correctly handle the expiration set on items. You would need a version of the test that passes TRUE so that invalid items are returned, and then expect it to no longer do that; only then could you actually make it behave differently. See the expire check in \Drupal\redis\Cache\CacheBase::expandEntry.

    What I suggested in #21 is that we do this as an integer instead of a boolean flag. NULL would mean the current behavior (also for BC), and any integer, including 0, is the offset added to the provided expiration time. This would only be added to the TTL set in Redis and not to the stored expire value.
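
    As a rough sketch of where that offset would apply (hypothetical property and wiring, not the final implementation):

    // $this->ttlOffset would come from a new, hypothetical setting;
    // NULL keeps the current behavior for BC.
    $ttl = $this->getExpiration($expire);
    if ($this->ttlOffset !== NULL) {
      // Only the TTL sent to Redis is extended; the 'expire' field stored in
      // the hash stays as-is, so expandEntry() still sees the real expiration.
      $ttl += $this->ttlOffset;
    }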

  • Pipeline finished with Failed
    3 months ago
    Total: 369s
    #490277
  • Pipeline finished with Failed
    3 months ago
    #490289
  • Germany mkalkbrenner

    Writing to Redis sometimes takes more than 2s in the tests :-(
    Expire is time() + 2, but

        'created' => 1746536865.132
        'expire' => '1746536864'
    
  • Pipeline finished with Failed
    3 months ago
    #490311
  • Switzerland berdir Switzerland

    Redis used request time

  • Germany mkalkbrenner

    Redis used request time

    That explains why the tests are still failing.
    So I leave it to you to improve the test.
    For us, the patch is good enough to fix the critical issue in our production environment.

  • Pipeline finished with Failed
    3 months ago
    #490442
  • Status changed to Needs review 16 days ago
  • Switzerland berdir Switzerland

    For us, the patch is good enough to fix the critical issue in our production environment.

    It's not really clear to me where the critical issue is. expire is working, it's just checked at runtime in Drupal. I don't see how this can possibly be anything but an optimization: skipping expired cache entries at read time and letting Redis use the available memory better (as it frees up expired entries more efficiently).

    I was looking into this a bit, also to better understand the difference between maxmemory-policy volatile vs allkeys.

    I created this script to understand how many cache items actually have expire set:

    
    use Drupal\Core\Site\Settings;
    
    $redis = \Drupal::service('redis.factory')->getClient();
    
    $prefix = Settings::get('cache_prefix', '');
    
    $total = 0;
    $with_expire = 0;
    $no_expire = 0;
    $expired = 0;
    $it = NULL;
    while ($keys = $redis->scan($it, '*', 1000)) {
      foreach ($keys as $key) {
        if (!str_contains($key, ':cachetags:')) {
          $expire = $redis->hget($key, 'expire');
          $total++;
          if ($expire != -1) {
            $ttl = $expire - \Drupal::time()->getCurrentTime();
            echo substr($key, strlen($prefix) + 1) . ": " . $ttl . "\n";
            $with_expire++;
    
            if ($ttl < 0) {
              $expired++;
            }
          }
          else {
            $no_expire++;
          }
        }
      }
      break;
    }
    
    echo 'Total: ' . $total . "\n";
    echo 'No expire: ' . $no_expire . "\n";
    echo 'With expire: ' . $with_expire . "\n";
    echo 'Expired: ' . $expired . "\n";
    

    You can put this in a file and run it with drush scr, preferably on production after it has been running for an extended period of time.

    This will list all cache items that have a non-permanent expiration set, plus a summary of what it finds, which for me looks like this:

    Total: 327150
    No expire: 319165
    With expire: 7985
    Expired: 2352
    

    That means ~2.4% of my cache items have an expiration set, and of those, ~30% have expired (0.7% of the total) and could be deleted. That's a pretty tiny amount, but depending on what you are doing, this number could be a lot bigger. Also, this doesn't account for custom per-bin permTtl configuration: you could for example set those bins to ~1d or so, so that large entries are frequently pruned and not kept for a long time (which is what this issue fixes); see the sketch at the end of this comment.

    For me, expiring items are mostly page caches (this will only show up if you use a workaround for [pp-3] Bubbling of elements' max-age to the page's headers and the page cache (Postponed)) and preload paths, something that will go away with Try to replace path alias preloading with lazy generation (Active).
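
    For reference, a per-bin override could look roughly like this in settings.php (the perm_ttl setting keys here follow the module's README; double-check them against the version you run):

    // Let "permanent" page cache entries expire after a day instead of a year.
    $settings['redis.settings']['perm_ttl_cache_page'] = 86400;
    // A relative date string (e.g. '1 days') may also be accepted for these
    // settings; check the README for the exact format.
    $settings['redis.settings']['perm_ttl_cache_render'] = '1 days';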

  • United States neclimdul Houston, TX

    It's been a long time since I've looked at this, but for us it wasn't critical, though it was getting that way. From memory, expire wasn't working and we were getting a massive, constantly growing store. Adding this patch and then wiping the store seemed to resolve the issue.

    Unfortunately, I don't currently have an environment to test this on. This script definitely would have been useful for debugging at the time, though, so hopefully someone else can dig in further.

    I will say, even if expiry is working, the permTtl day typo is probably mildly critical, since the value currently misses the day component of the calculation, so it's definitely not working for all cases.
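
    For what it's worth, the day pitfall is easy to see in plain PHP: DateInterval::$days is only populated for intervals produced by DateTime::diff(), so if the interval comes from DateInterval::createFromDateString(), a calculation that uses $iv->days silently drops the day component. Standalone illustration, not module code:

    $iv = \DateInterval::createFromDateString('2 days');
    var_dump($iv->d);    // int(2): the day component lives here.
    var_dump($iv->days); // bool(false): only set for DateTime::diff() results.

    // Using $iv->days contributes 0 seconds for the day component:
    echo $iv->y * 31536000 + $iv->m * 2592000 + $iv->days * 86400
      + $iv->h * 3600 + $iv->i * 60 + $iv->s; // 0 instead of 172800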

  • Switzerland berdir Switzerland

    It's been a long time since I've looked at this, but for us it wasn't critical, though it was getting that way. From memory, expire wasn't working and we were getting a massive, constantly growing store. Adding this patch and then wiping the store seemed to resolve the issue.

    Still struggling to follow this. In my experience, cache on a non-trivial Drupal site is pretty much guaranteed to grow (not sure what exactly "massive" is), due to things like search, personal tracking links and so on. So this is more about optimizing evictions when Redis is full as opposed to preventing growth.

    As mentioned, there is a possibility that some sites have specific patterns that result in a much larger percentage of expiring cache items than usual. Maybe every page contains a block with a low max-age and something actually bubbles that up into the page cache. But in such a case, it might be more useful to focus on optimizing that rather than relying on better expire behavior, because it's likely also going to result in a low cache hit rate.

  • Switzerland berdir Switzerland

    berdir changed the visibility of the branch 3179757-ttl-handling-broken--testing to hidden.

  • Pipeline finished with Skipped
    11 days ago
    #555363
  • Switzerland berdir Switzerland

    This took some work.

    Fixed and extended the tests; it took a while to eliminate random failures, since time handling is tricky, especially as this needs to account for both the fake internal request time and the real time that the Redis server uses.

    Did a lot of work on the docs as well, not just on the offset but also on the very outdated TTL docs in general, and documented things around the maxmemory policy. Created two follow-ups and merged.
