Limit iterator in HookCollectorPass

Created on 4 December 2024, 14 days ago

Problem/Motivation

HCP currently recursively iterates over all files in a module.

It only returns a file for processing if it is in src/Hook or has the following extensions.
in_array($extension, ['inc', 'module', 'profile', 'install'])

We also exclude some files
return !in_array($fileInfo->getFilename(), ['tests', 'js', 'css']) && !array_filter(scandir($key), fn ($filename) => str_ends_with($filename, '.info.yml'));

This also means that if a module has submodules they get scanned at least twice, once for each parent and once for itself. Assuming both are enabled.

We could likely optimize this to scan top level (get ['inc', 'module', 'profile', 'install'])
Scan src/Hooks

Steps to reproduce

N/A

Proposed resolution

TBD

Remaining tasks

Determine if we want to do this since we might be allowing Hook attributes elsewhere.
Do we need a way to opt into scanning everything if a module put hooks in a subfolder

User interface changes

N/A

Introduced terminology

N/A

API changes

N/A

Data model changes

N/A

Release notes snippet

πŸ“Œ Task
Status

Active

Version

11.0 πŸ”₯

Component

base system

Created by

πŸ‡ΊπŸ‡ΈUnited States nicxvan

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Comments & Activities

  • Issue created by @nicxvan
  • πŸ‡¬πŸ‡§United Kingdom catch

    Determine if we want to do this since we might be allowing Hook attributes elsewhere.

    I think if we do that, we can undo or modify a change here. It is hard to tell without numbers on a real site, but my feeling is what we're currently doing is very, very i/o intensive and it should speed things up a lot if we can check less files.

  • πŸ‡¬πŸ‡§United Kingdom catch
  • First commit to issue fork.
  • πŸ‡¨πŸ‡­Switzerland berdir Switzerland

    I can't see files in submodules actually getting scanned.

    When enabling navigation, this is the list of files it actually scans for that module:

    core/modules/navigation/src/Hook/NavigationHooks.php
    core/modules/navigation/navigation.install
    core/modules/navigation/navigation.module
    

    For paragraphs it's

    modules/contrib/paragraphs/paragraphs.module
    modules/contrib/paragraphs/paragraphs.install
    

    Neither hook classes nor module files of submodules are being scanned. I used navigation because it has the navigation_top_bar module, which is AFAIK actually against core policy of not having submodules that aren't tests)

    Thy are traversed, but they are filtered out.

    *Maybe* it would be faster to use a separato recursive iterator just on the Hooks namespace and a non-recursive one on the module folder, but I suspect that's microoptimization and will not result in measurable performance improvements.

  • πŸ‡ΊπŸ‡ΈUnited States nicxvan

    Yes, that is accurate, traversed is a better word.

  • πŸ‡¨πŸ‡¦Canada Charlie ChX Negyesi 🍁Canada

    beware, include files can be in subdirectories

  • πŸ‡ΊπŸ‡ΈUnited States nicxvan

    Ah from hook hook info? I thought they had to be top level.

    Any other includes would have to be manually included so they would be picked up.

  • πŸ‡¨πŸ‡¦Canada Charlie ChX Negyesi 🍁Canada

    HookCollectorPass currently indeed loads module which loads includes as well but it parses files return by the iterator to avoid reflection on those functions, who knows whether this is best practice but that's what it currently does so you need the iterator to return every include file wherever the module author has put them.

  • πŸ‡¬πŸ‡§United Kingdom catch

    Yes I think we need to recursively scan for includes, but could still exclude /src. If you put a procedural include in /src... then you win a prize.

  • πŸ‡¨πŸ‡¦Canada Charlie ChX Negyesi 🍁Canada

    Hrmmmm yes that could be restricted. I have a feeling I wanted to based on how the code reads but I didn't, sorry. It's quite a bit of a fizzbuzz problem :) so there can be many ways to express this. Here's one

        $subPathname = $iterator->getSubPathname();
        $extension = $fileInfo->getExtension();
        // Under src/ only consider php files which are also under src/Hook.
        $inHook = str_starts_with($subPathname, 'src/Hook/');
        $isDir = $iterator->isDir();
        if ($isDir && ($subPathname === 'src' || $subPathname === 'src/Hook' || $inHook)) {
          return TRUE;
        }
        if ($inHook && $extension === 'php') {
          return TRUE;
        }
        if (str_starts_with($subPathname, 'src/')) {
          return FALSE;
        }
        // Otherwise traverse subdirectories as long as they are not submodules.
        return $isDir ?
          // glob() doesn't support streams but scandir() does.
          !in_array($fileInfo->getFilename(), ['tests', 'js', 'css']) && !array_filter(scandir($key), fn ($filename) => str_ends_with($filename, '.info.yml')) :
          in_array($extension, ['inc', 'module', 'profile', 'install']);
    
Production build 0.71.5 2024