Add a filecache-backed attribute parser

Created on 3 October 2024, 4 months ago

Problem/Motivation

With πŸ“Œ OOP hooks using event dispatcher Needs review and πŸ“Œ [meta] Convert all core plugin types to attribute discovery Postponed (and potentially others in the future like autowiring) we're going to be doing a lot of scanning for attributes in core.

In general, each thing-that-looks-for-attributes implements iterating over a load of files, getting the attributes via reflection, then pulling out whatever information it needs, either into the symfony container or into a cache entry.

We won't have reliable profiling numbers on this until lots of conversions are done and possibly not until someone profiles a 150+ module site in the wild, but it's likely to be expensive.

Steps to reproduce

Proposed resolution

I think we can possibly use filecache to mitigate most or all performance overhead from scanning attributes.

We can implement something that:
- given a file
- collects all the attributes from that file at once
- puts them into a structure that we can store in filecache (which is based on the mtime of the file so persists across cache clears)
- allows a single attribute value to be returned from a method, possibly all of them.

If we use this in all/most of the places we are parsing attributes, then files that potentially use more than one type of attribute (like a service that provides a route using route attributes, whenever that lands, as well as a hook implementation), would only be parsed once instead of for each service that needs to check.

This should be easy when Drupal is doing the attribute parsing, it may be more complicated for the container itself, but maybe we can swap that out too.

I don't yet know what the API or data structure might look like, only have the very rough outline above so far.

Remaining tasks

User interface changes

Introduced terminology

API changes

Data model changes

Release notes snippet

πŸ“Œ Task
Status

Active

Version

11.0 πŸ”₯

Component

base system

Created by

πŸ‡¬πŸ‡§United Kingdom catch

Live updates comments and jobs are added and updated live.
  • Performance

    It affects performance. It is often combined with the Needs profiling tag.

Sign in to follow issues

Merge Requests

Comments & Activities

  • Issue created by @catch
  • πŸ‡ΊπŸ‡ΈUnited States nicxvan

    Thinking through this a bit, wouldn't the ideal solution be rather than iterating over files the cache would just return the attributes with mappings back to the classes / methods needed?

    I mean if we need it per file we could cache both maps.

    So you might end up with two caches, one with the attributes in a given file, one with files and methods for a given attribute.

  • πŸ‡¬πŸ‡§United Kingdom catch

    It would be good to be able to cache the file scanning more but I'm not sure we'll be able to do that more than we already do in the object cache. For example we'd have to invalidate it when modules are installed/uninstalled because the potential directories to scan will change.

  • πŸ‡ΊπŸ‡ΈUnited States nicxvan

    So this is more about scanning a given file once for all attributes to prevent scanning a file more than once.

  • πŸ‡¬πŸ‡§United Kingdom catch

    Re #4 yes but also the individual file scans are able to persist across drush cr - drush cr might change which files we scan, but not the contents of those files. The mtime checking and deployment_identifier keys in filecache should handle file content/logic changes.

  • πŸ‡¬πŸ‡§United Kingdom catch

    Took a quick look at this in HookCollectorPass only, not trying to do something generic yet (and it might not be possible, or at least it wouldn't help the procedural bc layer).

    I got it to the point where a drush cr doesn't explode, not sure if it actually 'works' yet.

    Something that is seriously going to affect the utility of this is that the apcu cache is not available to/shared with cli requests.

    I think we need a Drupal\Core\FileCache implementation that takes configuration from settings.php and can use a cache backend (but one that is not tagged for cache invalidation), or something like that. Might need a spin-off issue for that, but pushing where I got to.

  • Merge request !10115Add FileCache to hook collector pass. β†’ (Open) created by catch
  • πŸ‡¬πŸ‡§United Kingdom catch

    Tests are passing.

    This might make a difference with the UI installer, but I think we need πŸ“Œ Add a core FileCache implementation that is database-backed Active to get the real benefit of this, (and not just for this issue).

  • πŸ‡ΊπŸ‡ΈUnited States nicxvan

    This looks pretty straightforward. Does this need dedicated test coverage to ensure it's actually using the file cache and purging the file cache properly on module install and uninstall?

    Unless I am missing something this is missing the invalidation layer.
    I don't see how this file gets deleted when it needs to be deleted:

    As far as I can see:

    • during development
    • after running composer install or update ( an update or dev version might have new attributes to find)
    • after install
    • after uninstall
  • πŸ‡¬πŸ‡§United Kingdom catch

    It shouldn't be necessary to purge on module install and uninstall - the module list determines discovery so any information cached for an uninstalled module will be ignored. We need add the deployment identifier to the cache key though to handle core updates.

    During development the file mtime will change and that makes the cache invalid each time anyway.

  • πŸ‡ΊπŸ‡ΈUnited States nicxvan

    Ah yes, forgot about mtime.

    During development the file mtime will change and that makes the cache invalid each time anyway.

    That's true!

    However, I don't follow why we need to add a deployment identifier then?

  • πŸ‡¬πŸ‡§United Kingdom catch

    However, I don't follow why we need to add a deployment identifier then?

    I was thinking if the logic for collecting attributes itself changes. Maybe we don't need it though and could change the prefix directly in that case? It would make the cache work a lot better for sites that use $deployment_identifier without relying on the core version fallback.

  • πŸ‡ΊπŸ‡ΈUnited States nicxvan

    Ah that makes sense! I'll leave it in needs work then for now.

  • πŸ‡¬πŸ‡§United Kingdom catch

    Added in the deployment_identifier to the cache key.

  • Pipeline finished with Failed
    2 months ago
    Total: 169s
    #335374
  • πŸ‡ΊπŸ‡ΈUnited States nicxvan

    PHPCS failed due to $filename not being used.

    I don't want to update it so I can preserve RTBC

  • Pipeline finished with Failed
    2 months ago
    Total: 529s
    #335461
  • πŸ‡ΊπŸ‡ΈUnited States nicxvan
  • πŸ‡ΊπŸ‡ΈUnited States nicxvan

    We can't use the Settings::get('deployment_identifier') for the file cache since it relies on the actual file to get the mtime.

  • πŸ‡¬πŸ‡§United Kingdom catch

    Feels obvious now that you've pointed it out - the file cache depends on the filename and can't use arbitrary cids.

    However, when I went to remove it, I realised we can probably add it to the namespace instead - seems like this should work but let's see what the pipeline says.

  • Pipeline finished with Failed
    2 months ago
    Total: 301s
    #339130
  • Pipeline finished with Success
    2 months ago
    Total: 897s
    #339133
  • πŸ‡ΊπŸ‡ΈUnited States nicxvan

    That is pretty clever!

    Tagging this RTBC with the assumption we don't need to test individual cache sets and gets.

    If we need that let me know.

    This does have a lot of implicit coverage, if the cache wasn't storing it properly everything would break.

  • πŸ‡¬πŸ‡§United Kingdom catch

    The original issue title was overly optimistic, maybe we can come up with a generic filecache-backed attribute parser but I think we should do a couple of one-off issues like this before that to understand the requirements better.

  • πŸ‡¬πŸ‡§United Kingdom longwave UK

    The changes make sense, there appears to be a further optimisation we can do but otherwise this looks good.

  • πŸ‡¬πŸ‡§United Kingdom catch

    Good point - pushed a commit for this, since it's a trivial change, moving back to RTBC.

  • Pipeline finished with Failed
    2 months ago
    Total: 147s
    #339647
  • πŸ‡¬πŸ‡§United Kingdom longwave UK

    Fixed PHPCS, only rearranging lines and deleting some blank ones.

  • Pipeline finished with Canceled
    2 months ago
    Total: 84s
    #339699
  • πŸ‡¬πŸ‡§United Kingdom longwave UK

    The "validatable config" job is failing on drush cr, unsure if it's directly related to this but looks relevant.

    https://git.drupalcode.org/project/drupal/-/jobs/3366605

    $ vendor/bin/drush cr --quiet
    Error: Call to undefined function \ckeditor5_config_schema_info_alter() in /builds/project/drupal/core/lib/Drupal/Core/Extension/ModuleHandler.php on line 459 #0 /builds/project/drupal/core/lib/Drupal/Core/Plugin/DefaultPluginManager.php(386): Drupal\Core\Extension\ModuleHandler->alter('config_schema_i...', Array)
    #1 /builds/project/drupal/core/lib/Drupal/Core/Config/TypedConfigManager.php(406): Drupal\Core\Plugin\DefaultPluginManager->alterDefinitions(Array)
    
  • πŸ‡ΊπŸ‡ΈUnited States nicxvan

    Oh that looks like we missed a direct call to a hook, I can take a look in a bit.

    It could also be in drush or another vendor package.

  • πŸ‡¬πŸ‡§United Kingdom catch

    I think that is specific to this MR and that job. The 11.x branch tip here was from before OOP hooks so it was getting very confused. I've pushed a rebase of the 11.x branch here and will re-run the job.

  • πŸ‡¬πŸ‡§United Kingdom catch

    OK it's not that, might be to do with config_inspector but I can't see any direct hook invocations, which leaves 'something weird' or drush.

  • πŸ‡ΊπŸ‡ΈUnited States nicxvan

    I would just require drush locally and search for ckeditor5_config_schema_info_alter. If that doesn't find it I'll investigate further when I'm at my desk in about an hour

  • Pipeline finished with Success
    2 months ago
    Total: 3874s
    #339700
  • πŸ‡¬πŸ‡§United Kingdom catch

    No it is that, I just failed to push the correct branch.

  • πŸ‡ΊπŸ‡ΈUnited States nicxvan

    Oh great! Looks good to me still and you addressed @longwave's comment on the mr.

    Rtbc from me again

    • longwave β†’ committed f6f705f8 on 11.1.x
      Issue #3478621 by catch, longwave, nicxvan: Add filecache to OOP hook...
    • longwave β†’ committed 6aca2acf on 11.x
      Issue #3478621 by catch, longwave, nicxvan: Add filecache to OOP hook...
  • πŸ‡¬πŸ‡§United Kingdom longwave UK

    Let's squeeze this into 11.1.0-beta1.

    Committed and pushed 6aca2acfd19 to 11.x and f6f705f8a29 to 11.1.x. Thanks!

    • longwave β†’ committed bcee82d6 on 11.1.x
      Revert "Issue #3478621 by catch, longwave, nicxvan: Add filecache to OOP...
    • longwave β†’ committed dafc1fb9 on 11.x
      Revert "Issue #3478621 by catch, longwave, nicxvan: Add filecache to OOP...
  • πŸ‡¬πŸ‡§United Kingdom longwave UK

    Reverted, this broke Experience Builder and will break many other modules that implement hooks on behalf of others.

    Experience Builder implements two hooks on behalf of core modules: datetime_range_storage_prop_shape_alter and media_library_storage_prop_shape_alter - this is so the changes only take effect if those modules are installed.

    During a test, we first of all install without media_library or datetime_range enabled. The hook discovery runs and these two hook implementations are not cached, because they aren't in the modules list and therefore aren't included in the function name regex. Later in the test, we install these modules, but the hook implementation cache is not cleared; the code believes it's already discovered all the hooks for experience_builder.module, the cached result is used, and the hooks are never found or executed at runtime.

    This will presumably affect any module that implements hooks on behalf of other modules, the incomplete set of hooks may be cached and then never cleared. Not sure what the fix is yet, maybe we need to hash the regex as part of the cache key?

  • πŸ‡ΊπŸ‡ΈUnited States nicxvan

    That might work, but can we also just dump it on module install?

    We won't see as much benefit in that case as we'd want, but that should solve this case for now right?

  • πŸ‡¬πŸ‡§United Kingdom catch

    catch β†’ changed the visibility of the branch 3478621-add-a-filecache-backed to active.

  • πŸ‡¬πŸ‡§United Kingdom catch

    Not sure what the fix is yet, maybe we need to hash the regex as part of the cache key?

    The concept of implementing a hook on behalf of another module doesn't exist with OOP hooks (you can check moduleExists() inside a hook implementation of course still), so we only need to cover this with procedural hooks, so I tried implementing it only for them - seems simple enough but see what the pipeline says. The advantage if we do it this way is we only take the hit on the bc layer and only when the module list changes, so regular drush crs still get the benefit as do converted modules.

  • πŸ‡ΊπŸ‡ΈUnited States nicxvan

    Just a quick note, you can implement on behalf of other modules using #[Hook('toolbar_alter', module: 'module_on_behalf_of')]

    You can't implement on behalf of a theme though.

  • πŸ‡¬πŸ‡§United Kingdom catch

    Ahh #43 is a good point, however I think we're OK here because those implementations will still be discovered by the OOP attribute discovery (just not invoked when hooks are invoked), whereas the problem we have with experience_builder is the procedural implementation not being discovered at all until the module is installed. There's no way to tell which module foo_bar_foo_bar_foo_bar() is implemented on behalf of, until foo, foo_bar or foo_bar_foo module is installed, but the attribute is explicit.

  • πŸ‡ΊπŸ‡ΈUnited States nicxvan

    These changes look good to me actually, can we test this against XB before going through the commit process again?

  • πŸ‡ΊπŸ‡ΈUnited States nicxvan

    I created πŸ“Œ [ignore] Test Filecache Active we can apply this there

  • Pipeline finished with Success
    2 months ago
    Total: 555s
    #340170
  • πŸ‡¬πŸ‡§United Kingdom catch
  • πŸ‡ΊπŸ‡ΈUnited States nicxvan
  • πŸ‡¬πŸ‡§United Kingdom longwave UK

    Given we broke contrib let's add a test to ensure we don't regress again in the future

  • πŸ‡ΊπŸ‡ΈUnited States nicxvan

    I'm taking a crack at this.

  • Pipeline finished with Failed
    2 months ago
    Total: 111s
    #344154
  • Pipeline finished with Success
    2 months ago
    Total: 636s
    #344157
  • πŸ‡ΊπŸ‡ΈUnited States nicxvan

    Ok this is ready for review.

    I ran the test only, but that won't fail cause the current code and previous code both work for the situation, but it will prevent the regression.

  • Pipeline finished with Failed
    2 months ago
    Total: 130s
    #344169
  • πŸ‡ΊπŸ‡ΈUnited States nicxvan

    I added a procedural test too.

  • Pipeline finished with Success
    2 months ago
    Total: 868s
    #344170
  • πŸ‡¬πŸ‡§United Kingdom catch

    The new test coverage looks good, marked threads on the MR resolved because I agree with the answers given.

    Tagging beta target because this should significantly help mitigate the increased memory usage due to the procedural hook bc layer.

  • πŸ‡¬πŸ‡§United Kingdom catch

    Did some manual testing of this + together.

    This is what the file_cache table looks like after running the UI installer with the standard profile:

    MariaDB [db]> SELECT COUNT(*) FROM file_cache WHERE cid LIKE '%\:hook_implementations%';
    +----------+
    | COUNT(*) |
    +----------+
    |       62 |
    +----------+
    1 row in set (0.001 sec)
    
    MariaDB [db]> SELECT COUNT(*) FROM file_cache WHERE cid LIKE '%:procedural_hook_implementations%';
    +----------+
    | COUNT(*) |
    +----------+
    |     1591 |
    +----------+
    1 row in set (0.002 sec)
    

    As you can see there are far fewer entries for the non-procedural implementations, this is because we don't have to use the module regexp in the cache key so they persist across module installs, whereas procedural implementations are discovered each time to allow for implementing hooks from other modules.

    I pushed one additional commit:

    1. Gave the procedural hook implementation items their own namespace, this made it easier to run the queries above to differentiate the prevalence of the two. Due to the module preg sharing the namespace was fine, but nice to see what things look like.

    2. Removed the deployment identifier from the cache key - it's already in apcu_prefix so this would mean it's in the cid twice.

    Leaving RTBC because these are very small changes.

  • Pipeline finished with Success
    2 months ago
    Total: 614s
    #345947
  • πŸ‡ΊπŸ‡ΈUnited States nicxvan

    Keeping an eye on tests.

  • Pipeline finished with Success
    about 2 months ago
    Total: 560s
    #351052
  • πŸ‡¬πŸ‡§United Kingdom alexpott πŸ‡ͺπŸ‡ΊπŸŒ

    My review was minor code-style nits - no reason to not leave rtbc.

    • alexpott β†’ committed 6d78e0cf on 11.1.x
      Issue #3478621 by catch, nicxvan, longwave: Add filecache to OOP hook...
    • alexpott β†’ committed 59598e57 on 11.x
      Issue #3478621 by catch, nicxvan, longwave: Add filecache to OOP hook...
  • πŸ‡¬πŸ‡§United Kingdom alexpott πŸ‡ͺπŸ‡ΊπŸŒ

    Committed and pushed 59598e57ab3 to 11.x and 6d78e0cf06e to 11.1.x. Thanks!

    • alexpott β†’ committed 89c81430 on 11.1.x
      Revert "Issue #3478621 by catch, nicxvan, longwave: Add filecache to OOP...
    • alexpott β†’ committed 28c7686a on 11.x
      Revert "Issue #3478621 by catch, nicxvan, longwave: Add filecache to OOP...
  • πŸ‡¬πŸ‡§United Kingdom alexpott πŸ‡ͺπŸ‡ΊπŸŒ

    This broke core/tests/Drupal/FunctionalTests/Installer/DistributionProfileExistingSettingsTest.php https://git.drupalcode.org/project/drupal/-/pipelines/351677/test_report...

  • πŸ‡¬πŸ‡§United Kingdom alexpott πŸ‡ͺπŸ‡ΊπŸŒ

    This is because combined with πŸ“Œ Convert system_theme to OOP or remove it Active we break core/tests/Drupal/FunctionalTests/Installer/DistributionProfileExistingSettingsTest.php

    I've rebased the MR on top of 11.x head so we'll see the failure here.

  • Pipeline finished with Failed
    about 2 months ago
    Total: 642s
    #351763
  • Pipeline finished with Failed
    about 2 months ago
    Total: 736s
    #352266
  • Pipeline finished with Failed
    about 2 months ago
    Total: 558s
    #352288
  • Pipeline finished with Success
    about 2 months ago
    Total: 564s
    #352304
  • πŸ‡¬πŸ‡§United Kingdom alexpott πŸ‡ͺπŸ‡ΊπŸŒ

    For any follow-up that tries to fix this - the hook collector pass that runs before system classes are autoloadable is triggered by install_begin_request() from

      // Override the module list with a minimal set of modules.
      $module_handler = \Drupal::moduleHandler();
      if (!$module_handler->moduleExists('system')) {
        $module_handler->addModule('system', 'core/modules/system');
      }
    
  • πŸ‡¬πŸ‡§United Kingdom catch
  • πŸ‡ΊπŸ‡ΈUnited States nicxvan

    Looks good now.

  • πŸ‡¬πŸ‡§United Kingdom alexpott πŸ‡ͺπŸ‡ΊπŸŒ

    Committed and pushed c79a622227e to 11.x and e01542fd879 to 11.1.x. Thanks!

    • alexpott β†’ committed e01542fd on 11.1.x
      Issue #3478621 by catch, nicxvan, longwave, alexpott: Add filecache to...
    • alexpott β†’ committed c79a6222 on 11.x
      Issue #3478621 by catch, nicxvan, longwave, alexpott: Add filecache to...
  • Status changed to Fixed about 1 month ago
  • Automatically closed - issue fixed for 2 weeks with no activity.

Production build 0.71.5 2024