Add a core FileCache implementation that is database-backed

Created on 9 November 2024, about 1 month ago

Problem/Motivation

See notes in 📌 Add a filecache-backed attribute parser Active .

apcu isn't shared between cli and web processes, nor is it shared between cli processes - therefore whenever YAML (or soon attribute) parsing tries to use FileCache on the cli, it won't work.

Steps to reproduce

Proposed resolution

Add a database-backed filecache implementation. However there's a complication that some YAML and attribute parsing happens within container rebuild, so we need figure out if we can use a service there, need to hard-wire something into the bootstrap container etc.

Remaining tasks

User interface changes

Introduced terminology

API changes

Data model changes

Release notes snippet

📌 Task
Status

Active

Version

11.0 🔥

Component

base system

Created by

🇬🇧United Kingdom catch

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Merge Requests

Comments & Activities

  • Issue created by @catch
  • 🇬🇧United Kingdom alexpott 🇪🇺🌍

    Eek... but yeah this has always been a downside. This brings me back to a micro/precompiled container that that has essential services. FWIW we do have something similar already with \Drupal\Core\Config\BootstrapConfigStorageFactory...

  • 🇬🇧United Kingdom catch

    Pipeline is green.

    There is one remaining thing, which is we need a new FileCacheGarbageCollectionInterface, implement it in the new class, call the ::garbageCollection() method in system_cron() if the service implements the interface. That will complete the expires logic added so the database table doesn't have stale and un-removable cache entries in it forever.

  • Pipeline finished with Failed
    28 days ago
    Total: 851s
    #345203
  • 🇬🇧United Kingdom catch

    Pushed a commit to implement #4.

  • Pipeline finished with Failed
    28 days ago
    Total: 730s
    #345237
  • 🇺🇸United States nicxvan

    Unfortunately lots of failures that seem relevant, have not reviewed any.

  • 🇬🇧United Kingdom catch

    It's my attempt at garbage collection breaking. Pushed a commit which fixes at least one test.

  • Pipeline finished with Success
    28 days ago
    Total: 1632s
    #345266
  • 🇬🇧United Kingdom catch

    MR is green again now. Updated the issue summary with a summary of the implementation.

  • 🇺🇸United States nicxvan

    Did a quick review, a few questions where I feel like I'm missing something obvious.

  • 🇺🇸United States nicxvan

    My questions were answered I'll do a more thorough review later unless someone else gets to it.

  • 🇺🇸United States nicxvan

    Just one final comment that I think needs addressing.

  • 🇬🇧United Kingdom catch

    Addressed the last comment. Did some manual testing of this with 📌 Add a filecache-backed attribute parser Active and got what I would expect running the UI installer with the standard profile. Comments over there.

    When doing that testing I realised another thing.

    FileCache forces use of the apcu_prefix in the cache key. I think we should move that down to the apcu backend so the database doesn't use it - this would enabling persisting file_cache entries across deployments which is potentially a few hundred milliseconds downtime and dozens/hundreds of mb of peak memory usages removed from every deployment. However will start a new issue to explore that and we can either merge it back into here or do it on two issues depending on what it looks like.

  • 🇬🇧United Kingdom alexpott 🇪🇺🌍

    I think we want to add the table to the schema only suggestion in \Drupal\Core\Command\DbDumpCommand::configure() - we also need an issue against Drush to update the schema only defaults stuff there for db dumping.

  • Pipeline finished with Canceled
    27 days ago
    Total: 281s
    #345793
  • Pipeline finished with Canceled
    27 days ago
    Total: 165s
    #345802
  • 🇬🇧United Kingdom catch

    I think we want to add the table to the schema only suggestion in \Drupal\Core\Command\DbDumpCommand::configure()

    This is done.

    we also need an issue against Drush to update the schema only defaults stuff there for db dumping.

    Happy to open this one once we've finalised the table name.

  • 🇬🇧United Kingdom catch

    Also notable is that I'm only seeing hook implementations go into the cache, nothing from YAML parsing - need to refresh my memory on what we do there but looks like we might need to either fix or add some caching to library/info/config YAML parsing.

    This was bad testing on my part - I already had these cached in APCu from a previous installation attempt. Did a ddev restart, emptied the database, reinstalled, and there were 317 rows just from hitting the first page of the UI installer with a pre-filled settings.php. Shows that there are definitely advantages to the APCu cache for situations like repeatedly reinstalling Drupal.

    After install and hitting the front page, there were 1226 rows from various discovery (library, info etc.) - this was without the hook discovery caching MR.

  • Pipeline finished with Failed
    27 days ago
    Total: 624s
    #345804
  • 🇬🇧United Kingdom catch

    Opened 📌 Move the FileCache apcu prefix handling into the apcu implementation or allow the APCu prefix to be customised Active and in the process of writing the issue summary, decided it definitely needs to be its own issue if we do it at all.

  • 🇺🇸United States nicxvan

    Failures look random, but I cannot rerun.

  • Pipeline finished with Canceled
    25 days ago
    Total: 196s
    #348093
  • Pipeline finished with Failed
    25 days ago
    #348094
  • 🇺🇸United States nicxvan

    Couple of relevant failures, they seem the same type, expecting no tables but db cache exists

  • 🇬🇧United Kingdom catch

    Moving discussion from the MR to here.

    @alexpott brought up the test expectation change in InstallerExistingSettingsTest.

    We added that in #3103529: Drupal 8.8.1+ and 9 can fail to install in the web browser due to cache pollution to prevent cache pollution when hitting index.php then getting redirected to the installer.

    I had a look at what changes would be necessary to prevent the table from being created in the early installer phase (when settings.php is pre-filled), and I wasn't able to find a good way to do this - because of the singleton approach in FileCacheFactory as opposed to it being a service (which is necessary for it being a dependency for container building), finding somewhere that's early enough and also being able to detect the installer state is not really doable. One possible option I didn't try yet (because it seems like a huge hack) would be checking for the install state globals in the implementation. Our entire 'early installer detection state' relies on 'has a table in system_schema() been created yet' which itself is a massive hack that needs to be replaced too.

    However, the bug we added that test for is for the regular cache backends which don't do mtime invalidation, so I think it's fine for this table to be created when settings.php is filled, the content of the cache doesn't vary by the install state or not, it just varies by the contents of the file - so why not rely on it in that very early install state.

  • 🇬🇧United Kingdom catch

    Tests are back to green.

  • 🇺🇸United States nicxvan

    One more comment on the mr once that is addressed I think this is ready. I think @alexpott should confirm his concern has been addressed.

  • 🇬🇧United Kingdom catch

    Fixed the one comment on the MR. Also realised we can/should take the database connection out of the constructor - theoretically at least this can be used without needing one, so might as well get the connection as late as possible.

  • 🇺🇸United States nicxvan

    I can't rerun the failed jobs, but only one looks like it might be relevant.

  • 🇬🇧United Kingdom catch

    Re-ran the three failing jobs - all green now.

  • 🇺🇸United States nicxvan

    From my perspective this is RTBC, @alexpott mentioned on slack he wants to review it again.

  • Pipeline finished with Failed
    22 days ago
    Total: 120s
    #350432
  • Pipeline finished with Failed
    22 days ago
    Total: 762s
    #350435
  • Pipeline finished with Success
    22 days ago
    Total: 1856s
    #350514
  • Pipeline finished with Success
    22 days ago
    Total: 2366s
    #350836
  • 🇺🇸United States nicxvan

    This is ready I think.

    There are two open comments by @daffie, I think both have been addressed appropriately.
    The check does need to be for default not active.
    And the next time we need normalizeCid it should be factored out, but not yet.

    Alex's concerns have also been answered.

  • 🇺🇸United States nicxvan

    Based on @alexpott's comment @daffie's comment was not actually resolved.

  • 🇬🇧United Kingdom catch

    I looked a bit more and I'm not really sure about @daffie's comment either.

    With ::getConnectionInfo() we ensure the default database connection is configured, and then get a connection if it is.

    With ::isActiveConnection() we check if there's an active connection (which will be the default database connection if it's active whenever this would be called), but then if it's not, we won't try to start a new connection.

    To me ::getConnectionInfo() seems more robust, because it's OK if we start a new database connection here, we just need to know we have one to start.

    In practice this might not actually make any difference to a site.

  • 🇬🇧United Kingdom alexpott 🇪🇺🌍

    @catch and I discussed #28 and decided to keep getConnectionInfo('default') but then do getConnection('default', 'default') to be super explicit about what connection to use. This is because of how $key and $target work for those two methods, this class can create tables and we have good evidence that file cache is used by run-tests.sh when rendering test results!

  • 🇬🇧United Kingdom alexpott 🇪🇺🌍
  • Pipeline finished with Failed
    21 days ago
    Total: 2376s
    #351667
  • 🇺🇸United States nicxvan

    This has the same distribution failure 📌 Add a core FileCache implementation that is database-backed Active

  • 🇬🇧United Kingdom catch

    The fix that just got committed for 📌 Add a core FileCache implementation that is database-backed Active should mean we get a green run again, restoring the RTBC since no code changes here.

  • Pipeline finished with Failed
    21 days ago
    Total: 558s
    #352365
  • 🇺🇸United States nicxvan

    core/profiles/standard/tests/src/FunctionalJavascript/StandardPerformanceTest.php

  • Pipeline finished with Failed
    21 days ago
    Total: 682s
    #352535
  • Pipeline finished with Success
    21 days ago
    Total: 968s
    #352546
  • 🇬🇧United Kingdom catch

    This test failure (not in the most recent run) is very strange, and i can reproduce it locally:

    https://git.drupalcode.org/project/drupal/-/jobs/3513488
    
    
    +····2·=>·Binary·String:·0x494e5345525420494e544f202264617461626173655f66696c655f636163686522202822636964222c2022657870697265222c202263726561746564222c202264617461222c202273657269616c697a656422292056414c55455320282264727570616c2e66696c655f63616368652e31312e322d6465762e2e6c3833444a62637057506668645372695a775a657268432d79736b54374139566b72624c736b705053494d3a6174747269627574655f646973636f766572793a44727570616c5f76696577735f4174747269627574655f56696577734163636573733a4e58687363526530343430504670493564537a6e4556676d61754c32354b6f6a443775346539615a774f4d3a44727570616c5f76696577735f416e6e6f746174696f6e5f56696577734163636573733a2f6275696c325836714e3969314c3759675149686247414776365f314e354f6f656468746264586237544e4d54623973222c20313734303530393230332c20313733323733333230332e3638332c2022613a333a7b733a353a226d74696d65223b693a313733323733333136333b733a383a2266696c6570617468223b733a35303a22636f72652f6d6f64756c65732f757365722f7372632f506c7567696e2f76696577732f6163636573732f526f6c652e706870223b733a343a2264617461223b613a323a7b733a323a226964223b733a343a22726f6c65223b733a373a22636f6e74656e74223b733a3534333a22613a393a7b733a353a22636c617373223b733a33363a2244727570616c5c757365725c506c7567696e5c76696577735c6163636573735c526f6c65223b733a383a2270726f7669646572223b733a343a2275736572223b733a323a226964223b733a343a22726f6c65223b733a353a227469746c65223b4f3a34383a2244727570616c5c436f72655c537472696e675472616e736c6174696f6e5c5472616e736c617461626c654d61726b7570223a333a7b733a393a22002a00737472696e67223b733a343a22526f6c65223b733a31323a22002a00617267756d656e7473223b613a303a7b7d733a31303a22002a006f7074696f6e73223b613a303a7b7d7d733a31313a2273686f72745f7469746c65223b4e3b733a343a2268656c70223b4f3a34383a2244727570616c5c436f72655c537472696e675472616e736c6174696f6e5c5472616e736c617461626c654d61726b7570223a333a7b733a393a22002a00737472696e67223b733a36343a224163636573732077696c6c206265206772616e74656420746f207573657273207769746820616e79206f66207468652073706563696669656420726f6c65732e223b733a31323a22002a00617267756d656e7473223b613a303a7b7d733a31303a22002a006f7074696f6e73223b613a303a7b7d7d733a31333a22646973706c61795f7479706573223b4e3b733a343a2262617365223b613a303a7b7d733a353a226e6f5f7569223b623a303b7d223b7d7d222c203129204f4e204455504c4943415445204b455920555044415445202263696422203d2056414c554553282263696422292c202265787069726522203d2056414c554553282265787069726522292c20226372656174656422203d2056414c55455328226372656174656422292c20226461746122203d2056414c55455328226461746122292c202273657269616c697a656422203d2056414c554553282273657269616c697a65642229,
    

    There are multiple queries that just say 'Binary String' with the same long string. I don't know if they are errors, or informational, but they are weird and being picked up by the performance test query logger (apparently only sometimes).

  • 🇬🇧United Kingdom catch

    Realised thinking about that - we can't log these queries in performance tests as database queries, because an item may or may not exist in APCu depending on whether it gets evicted or is ever set etc. So we would need to ignore them. We could also ignore 'Binary string' queries but I'd like to know what those are ideally.

    Discussed this issue a fair bit with @berdir in slack, and he suggested possibly only opting into the database-backed cache for attribute parsing, since YAML parsing may be faster without it (and shouldn't result in unfreeable memory usage).

  • 🇨🇭Switzerland berdir Switzerland

    catch credited berdir .

  • 🇬🇧United Kingdom catch
  • 🇨🇭Switzerland berdir Switzerland

    Link to the slack thread (it's long): https://drupal.slack.com/archives/C079NQPQUEN/p1732735819875639.

    TLDR: This adds around 1700 database queries on my current setup (core + a bunch of contrib modules + test module discovery. The test module discovery won't be on on real sites, but sites with 100+ contrib projects will not be far off from that). And that's not even counting Hooks discovery yet, because that's part of the container and not part of my test scenario. For Yaml discovery and parsing of those small files, the database lookups seem to take more time than the actual parsing.

    Also created this issue 📌 Remove ExtensionParser::scanDirectory() FileCache usage or share it with InfoParser Active

    Another semi-related thing: also could reproduce reports about massive increase of installer memory usage. from 36MB on 10.4 to 260MB on 11.x, that is with drush. with drush, because it all happens in a single process, none of the current issues are able to make any real improvements to that. No issue created about that yet.

  • 🇬🇧United Kingdom catch

    So from that slack thread, about 800 database queries were due to 📌 Remove ExtensionParser::scanDirectory() FileCache usage or share it with InfoParser Active , we should postpone this on that issue.

    The weird binary string 'queries' are also a blocker here I think at least until we know what they are, and so is excluding queries from the backend from performance tests since we can't guarantee whether anything comes from apcu or the db by design, so it'll need some work once that issue is done. And maybe being more selective about which file caches we apply it to (can be done with $default_config param to FileCacheFactory).

    We could explore a performance logging decorator in performance_test module in a follow-up, probably could only catch things outside container building but that is fine for most of our use-cases, so even if we exclude this from performance tests which can later bring it back.

    Berdir is right that the drush installer currently wouldn't benefit from this, however I would expect e.g. a drush cr after install to benefit because the entries will be cached by then (from the installer or previous drush crs).

    However also from the pure memory reduction standpoint, I was seeing preprocess hooks in system module discovered as hooks and added to the container, which is a side-effect of not having Explore ways to mark hook conversion status per module or file Active .

    So maybe the order should be:
    📌 Add a filecache-backed attribute parser Active - already in as of yesterday.

    Explore ways to mark hook conversion status per module or file Active - try to reduce the amount of parsing we do.

    I also noticed when checking what gets cached vs not, and it's visible in the performance test results too, that we are doing a lot of file scanning and parsing for plugins that aren't plugins.

    e.g.

    core/modules/system/src/Plugin/ImageToolkit/GDToolkit.php
    core/modules/system/src/Plugin/ImageToolkit/Operation/gd/ScaleAndCrop.php
    core/modules/system/src/Plugin/ImageToolkit/Operation/gd/GDImageToolkitOperationBase.php
    core/modules/system/src/Plugin/ImageToolkit/Operation/gd/Crop.php
    core/modules/system/src/Plugin/ImageToolkit/Operation/gd/Resize.php
    core/modules/system/src/Plugin/ImageToolkit/Operation/gd/Rotate.php
    core/modules/system/src/Plugin/ImageToolkit/Operation/gd/Convert.php
    core/modules/system/src/Plugin/ImageToolkit/Operation/gd/Scale.php
    core/modules/system/src/Plugin/ImageToolkit/Operation/gd/Desaturate.php
    core/modules/system/src/Plugin/ImageToolkit/Operation/gd/CreateNew.php
    

    Only GDToolkit is the actual plugin, all the operations are not, so that is nine 'NULL' cache entries where there should be one.

    If I run:

    SELECT cid, data FROM database_file_cache WHERE data LIKE '%"data";a:1:{i:0;N;}%';
    

    (which is looking for a filecache data NULL entry - what we use in plugin attribute parsing)
    I get 139 results.

    We could have an issue which adds an exception when a non-plugin is found in a plugin directory, and gradually move these out. This would save directory traversal, memory, and space in APCu. And then if/when we return here, that's 139 less database queries too. Will open a meta and a sub-issue for the GD toolkit to start.

    An issue against drush to use its own batch system in drush si also sounds like a good idea, I'm a bit surprised it's not already doing that. However thinking about that, we might want to do that in co-ordination with 📌 Add the ability to install multiple modules and only do a single container rebuild to ModuleInstaller Active , that issue also results in a lot less reflection/YAML parsing, although eventually the same number of files will be parsed, just not as many times.

  • 🇬🇧United Kingdom catch

    Actually another thought which would be a change here but not postponed on the info parser issue.

    There are two kinds of things we're using file cache for - extension, services, hook parsing which are all done during container build and therefore need to use the singleton.

    However config parsing, plugin discovery, route attributes when we have them, libary parsing - these are all from services, so could depend on a service.

    So... I think we could change approach here and do the following:

    - add a new file_cache.persistent service, and inject this into the various places we want to use it.

    - this service can be linked to a file_cache.persistent_cache_backend service which uses an actual cache backend, we'll need a no-op tag invalidator to pass in is the only additional thing.

    - this can use BackendChain (not chained fast because we don't need the invalidation support) with the APCu and database backends.

    - that will in turn make it easier to swap out the database backend with redis or memcache

    - cache backends support both multiple get and multiple set. Because this is a new service with a new interface, we can copy those aspects of the cache backend interface (+ mtime invalidation support) and that will make it easier to do a query-per-directory instead of query-per-file and similar.

    - for YAML parsing we'd need to have a non-filecache option for YAML parsing then, don't want to cache things twice..

    This would not help us with hook and service discovery then, but since route and plugin attributes also load PHP classes, and there'll be as many of those potentially, we can save the memory somewhere both on web requests and cli that way.

  • 🇬🇧United Kingdom alexpott 🇪🇺🌍

    @catch not all Yaml parsing is from services... the ExtensionDiscovery object is used from a few places for example. But also the config importer is not a service - it's complex there because caching config during imports and installs doesn't feel right - this stuff is never read multiple times (well apart from during tests). Unlike config schema yaml where there cache is definitely worth it.

  • 🇬🇧United Kingdom catch

    @alexpott yeah I think this is more interesting for plugin discovery to start with.

    Config - it's not read often, but also you can end up with 1000+ config files in a sync directory and maybe 30 changing when you do a sync, so a persistent cache with a 3 month TTL and no invalidation apart from mtime might still help - but probably need to check.

  • 🇬🇧United Kingdom catch
Production build 0.71.5 2024