Use alternate parsing method for attribute-based plugin discovery to avoid errors

Created on 27 November 2024, 5 months ago

Problem/Motivation

Using the nikic/php-parser library was mentioned by @ghost of drupal past as a possible alternative to parsing attributes via PHP reflection, because substantive use of reflection can result in performance and memory usage concerns.

From comment #12 ๐Ÿ“Œ Investigate possibilities to parse attributes without reflection Active on ๐Ÿ“Œ Investigate possibilities to parse attributes without reflection Active

A problem that's come up in ๐Ÿ“Œ Convert MigrateSource plugin discovery to attributes Active and ๐Ÿ› Changing plugins from annotations to attributes in contrib leads to error if plugin extends from a missing dependency Needs review is that attributes can't be read via Reflection on classes that implement missing interfaces, extend missing classes, or use missing traits, in the case that the missing dependency comes from an uninstalled module. While Reflection will throw an exception for a missing interface or class, PHP will fatal on a missing trait. The exception thrown during plugin discovery was handled in 3458177, but a solution for handling missing traits gracefully is still outstanding.

Using something like nikic/php-parser to retrieve attribute values without might be a viable way forward to retrieve at least some attribute values in the case of missing dependencies.

Steps to reproduce

For example, the migrate source plugin Drupal\block_content\Plugin\migrate\source\d7\BlockCustomTranslation in the Block Content module uses Drupal\content_translation\Plugin\migrate\source\I18nQueryTrait from the Content Translation module. Block Content does not have a module dependency on Content Translation, so it migrate source plugins were using attributes, discovery would lead to a PHP fatal error when Migrate and Block Content were installed, but not Content Translation.

Proposed resolution

  • Move nikic/php-parser from a dev composer requirement to runtime
  • Use nikic/php-parser on plugin classes during discovery before reflection to identify interface, class, and trait dependencies the class has that may be missing
  • As an additional measure, provide a Dependencies attribute that can be applied to classes. Its only property is modules, which is a list of module machine names that need to be installed in order for the plugin to be discovered
  • Use nikic/php-parser to check whether the plugin class has the attribute and to extract the list of module dependencies
  • Avoid reflection on classes with missing dependencies and do not save to the discovery file cache
  • Developers can then manually declare on their plugin classes which modules need to be installed for the plugin to be discovered like so:
    #[MigrateSource(
      id: 'd7_block_custom_translation',
      ...
    ]
    #[Dependencies(['content_translation', 'migrate_drupal'])]
    

    to ensure that discovery ignores the class and does not fatal error.

    Specifying the dependencies likely should be unnecessary, since the missing trait should have already been identified, but complex cases where a missing trait is somewhere higher in the class hierarchy would not be handled without the Dependencies attribute.

    Note that the nikic parser is not used here to replace Reflection for attribute-based plugin discovery. For now, it is only making attribute-based plugin discovery more error-resistant.

    This approach also provides a start on how attribute parsing via nikic can be done. Dependencies is a simple attribute with only one array property. Plugin attribute classes can be a lot more complex, with nested object properties, so that will need further investigation in ๐Ÿ“Œ Investigate possibilities to parse attributes without reflection Active .

Remaining tasks

User interface changes

Introduced terminology

API changes

Data model changes

Release notes snippet

๐Ÿ“Œ Task
Status

Active

Version

11.0 ๐Ÿ”ฅ

Component

plugin system

Created by

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Merge Requests

Comments & Activities

  • Issue created by @godotislate
  • Merge request !10373Resolve #3490322 "Use alternate parsing" โ†’ (Open) created by godotislate
  • Pipeline finished with Failed
    5 months ago
    Total: 750s
    #352484
  • Pipeline finished with Success
    4 months ago
    Total: 2119s
    #352497
  • MR is up with tests.

    The Dependencies attribute stuff may not be strictly needed, but I think it can help make plugin discovery more robust. If it ends up staying in, there will need to be a CR documenting its use.

  • ๐Ÿ‡ฌ๐Ÿ‡งUnited Kingdom catch
  • ๐Ÿ‡จ๐Ÿ‡ฆCanada Charlie ChX Negyesi ๐ŸCanada

    /me listens to https://youtu.be/6fgudq4CcH8 for vibes

    Thanks for coding what I have dreamed about for some time now -- or was it a nightmare? I haven't been this torn about something for a long while. But hey! luckily I no longer have even an informal leadership position or whatever so the burden of deciding whether to go down on this path is not on my shoulders. I will post a few thoughts which cause or allay fears.

    1. nikic/php-parser is an amazing work and most importantly it's true to the PHP source code because it's generated from it. Of course, one part of the generating code, that being phpyacc, has one of the most hair raising or most badass comments I've ever seen, I can't decide which: "How does it work? I don't know. I just ported the code until it worked correctly" (Anthony is awesome, always has been, pity his interaction with Drupal was brief). But, despite it has not seen a single commit for four plus years, the code generated by phpyacc has indeed been proven correct. rector depends on php-parser and it's successful. By now, I have faith in php-parser. Phew, was that hard to admit :) (Now of course kmyacc itself has been dead for quite some time now and PHP is using Bison not kmyacc but let's not go into those woods. Let's just be thankful it worked out.)
    2. This needs all the caching goodness catch is cooking because userspace parsing is not a speedy adventure especially not when NameResolver gets involved. But that cooking is going rather well so we can put aside all the performance concerns that stopped the hook oop patch being built on top of php-parser -- the original issue still has some traces of the proof concepts I coded for that.
    3. (string) $attribute->name. This is ... naive, not in any negative ways of course. It just doesn't consider the absolutely wild selection of PHP features attributes support. While https://github.com/nikic/PHP-Parser/blob/master/doc/component/Constant_e... of course exists but it needs helpers for resolving constants (and other things as described under Unsupported expressions and evaluator fallback) which very well might appear here would require some work.... this is one area where I am happy not to work on this or indeed the wider effort to move all attributes (or even just hook attributes) to php-parser. If the decision is to stay with this naive implementation, which is, again, totally fine, it at least needs to be precisely documented to only support literal scalars, no constants, no property fetch expressions, none of that, kthxbai.
    4. hasMissingClassDependencies is again a bit naive, this is still not some harsh criticism it's just this topic has been giving me worse nightmares than what follows a late night visit to Taco Bell. A depends on B depends on C where C doesn't exist , this code parses A and runs class_exists(B) which happily throws a Fatal error: Uncaught Error: Class "C" not found. Yay. Is this a real world problem? Does this lead to the requirement of a soft-class-exists-based-on-php-parser? Maybe, what do I know, I know nothing any more.

    Please carry on. It's awesome to see taking shape.

  • Thanks for the review, @ghost of drupal past. I'll address the MR comments and update it sometime tomorrow or later, but here's a couple quick replies to #6 ๐Ÿ“Œ Use alternate parsing method for attribute-based plugin discovery to avoid errors Active

    I think these limitations need to be documented on the Dependencies::__construct $modules argument. Especially the lack of constants.

    This is a fair point, and I will add the documentation.

    A depends on B depends on C where C doesn't exist , this code parses A and runs class_exists(B) which autoloads and happily throws a Fatal error: Uncaught Error: Class "C" not found.

    Yeah, I tried to acknowledge that in this comment, but it's probably a little opaque:

      // There is a risk that a class or trait uses a missing trait somewhere
      // in its hierarchy and will cause a fatal error.
    

    Let me give some background first. My initial idea was only to introduce the Dependencies attribute, and leave it to plugin developers to purposely designate which modules the plugin depends on, in order to avoid the fatal error on missing traits. While it is an additional DX hurdle, my thought was that this amount of friction is relatively small. It also really only applies to classes using missing traits, since we are already catching the exception for missing interfaces and classes.

    While I was working with nikic/php-parser on this, I realized that the interfaces, classes, and trait dependencies were parsed and available, so I thought it was worth identifying those specifically as dependencies as well. Running the _exists() functions does run the risk of some trait in the class hierarchy being missing and causing the fatal. The chance of this happening is lower than a missing trait being used directly by the plugin class, and even in case there is a fatal error, that's when a plugin developer would use the Dependencies attribute.

    There is a gap if a missing dependency in the hierarchy comes from some PHP library that is not a module, but I think this is a fringe use case.

  • ๐Ÿ‡จ๐Ÿ‡ฆCanada Charlie ChX Negyesi ๐ŸCanada

    So yes, someone needs to make a decision whether

    1. No class/trait/interface checks. Require explicit dependencies and maintain less core code.
    2. Middle ground: immediate classes checked with direct class_exists & friends
    3. Whole hog. Make DrupalKernel::$classLoader public readonly, use $kernel->classLoader->findFile() to find any classes/traits/interfaces being depended on, read 'em, parse 'em and recurse with a static cache.

    I have no opinion which one is best, I merely outlined how #3 could look.

    And/or use statements could easy be parsed for non-existent modules.

  • Whole hog. Make DrupalKernel::$classLoader public readonly, use $kernel->classLoader->findFile() to find any classes/traits/interfaces being depended on, read 'em, parse 'em and recurse with a static cache.
    ...
    And/or use statements could easy be parsed for non-existent modules.

    Migrate source discovery by annotations does this. Trying to provide BC for this annotated discovery specifically for migrate source, on top of adding attribute discovery, results in some complex code. And recreating this functionality, again only for migrate source, makes the code even more complex. Handling recursive dependencies in the attribute discovery base class would eliminate the specific complexity for migrate source. In fact, an issue about moving this functionality from migrate source to core was created, even before attribute-based discovery was a thing, but it's been unresolved for years: https://www.drupal.org/project/drupal/issues/2786355 โ†’ .

    That being said, I think the scope of the missing dependencies issue will expand as attribute-based discovery expands to routes ( ๐Ÿ“Œ Use PHP attributes for route discovery Needs review ), and services ( โœจ Directory based automatic service creation Needs work , ๐Ÿ“Œ Support #[AsEventListener] attribute on classes and methods Active , โœจ Support hooks (Hook attribute) in components Active ), etc. It might actually make sense to provide some sort of missing-dependency-safe wrapper for Reflection to address all these use cases. Recursively iterating through all the files of all the classes in the hierarchies might be really slow, so that could eliminate it as an option.

  • And/or use statements could easy be parsed for non-existent modules.

    Migrate source discovery by annotations does this.

    Adding a correction that migrate source discovery does not quite do this. It does the recursion, but it checks the namespaces of only parent and ancestor classes and does not look at namespaces of traits or interfaces.

  • ๐Ÿ‡ฌ๐Ÿ‡งUnited Kingdom longwave UK

    I haven't seen it mentioned here but https://github.com/Microsoft/tolerant-php-parser is another possible alternative to nikic/php-parser, not looked into it at all, I only know of its existence.

  • Pipeline finished with Failed
    4 months ago
    Total: 727s
    #364503
  • Pipeline finished with Success
    4 months ago
    Total: 708s
    #364533
  • Rebased and addressed comments on the MR. I also moved all the "providers" logic to the core AttributeClassDiscovery class, since that's where it should happen.

    I'm continuing with the hybrid approach (option 2) for now. Hopefully can get this pushed forward and help get ๐Ÿ“Œ Convert MigrateSource plugin discovery to attributes Active unblocked. It should be fairly easy to remove the *_exists() checks and rely solely on the Dependencies attribute if requested to go with option 1. I think option 3 of going through all the class hierarchies is too slow.

    Along those lines, if nickic/php-parser is adopted here, and since it looks like BC support for plugin discovery by Annotations will continue for a while, plugin classes will be essentially parsed three times during discovery: once via the Doctrine-based StaticReflectionParser, then with nikic PhpParser, and then with Reflection. I think a reasonable follow up would be to replace or refactor StaticReflectionParser with PhpParser, and then somehow pass the parsed result into parseClass(), so that would remove one parsing pass. An eventual end goal could be parsing plugin Attributes with PhpParser as well, but as mentioned before, there is a lot of complexity in some of the Attribute classes.

    Only gave a cursory investigation into https://github.com/Microsoft/tolerant-php-parser, which documents itself as "early-stage":

    This is an early-stage PHP parser designed, from the beginning, for IDE usage scenarios (see Design Goals for more details). There is still a ton of work to be done, so at this point, this repo mostly serves as an experiment and the start of a conversation.

    good to see the constant expression evaluator here but without the helpers described in "Unsupported expressions and evaluator fallback" under https://github.com/nikic/PHP-Parser/blob/master/doc/component/Constant_e... it is quite limited and I think these limitations need to be documented on the Dependencies::__construct $modules argument. Especially the lack of constants.

    Added this documentation for the modules parameter in the Dependencies constructor.

  • Pipeline finished with Success
    4 months ago
    Total: 901s
    #366319
  • ๐Ÿ‡ฆ๐Ÿ‡บAustralia larowlan ๐Ÿ‡ฆ๐Ÿ‡บ๐Ÿ.au GMT+10
  • ๐Ÿ‡ฌ๐Ÿ‡งUnited Kingdom catch

    Haven't reviewed the MR yet, but now wondering if we might need something like this for hook discovery too, e.g. if a hook class uses a trait from a module that's not installed or similar?

  • @catch Yes, that was part of my thinking of whether there'd need to be something like this for Hook discovery, or as attribute discovery may expand to services or routes, such as ๐Ÿ“Œ Use PHP attributes for route discovery Needs review , โœจ Directory based automatic service creation Needs work , ๐Ÿ“Œ Support #[AsEventListener] attribute on classes and methods Active , and โœจ Support hooks (Hook attribute) in components Active .

  • Pipeline finished with Success
    3 months ago
    Total: 820s
    #397904
  • Pipeline finished with Failed
    3 months ago
    Total: 568s
    #400420
  • Pipeline finished with Failed
    3 months ago
    Total: 655s
    #400447
  • Pipeline finished with Success
    3 months ago
    Total: 809s
    #400454
  • Refactored this to try to make the functionality available to be used outside plugin discovery.

  • ๐Ÿ‡ฌ๐Ÿ‡งUnited Kingdom oily Greater London

    Edited the issue summary.

  • ๐Ÿ‡ฌ๐Ÿ‡งUnited Kingdom oily Greater London

    More updates to the issue summary and updated the title.

  • ๐Ÿ‡ฎ๐Ÿ‡นItaly mondrake ๐Ÿ‡ฎ๐Ÿ‡น

    My 0.02$

    1. If this is driven by performance concerns, note that Reflection has improved substantially since PHP5; it somehow still bears the bad name but some performance benchmark shows that currently it's not even efficient to cache reflection results: A benchmark of reflection API performance in PHP
    2. If this is driven by the need not to load the classes, then roave/better-reflection would be IMHO a better solution; php-parser is very low level to build PHP code into AST, but then would need all the logic to interpret the AST. Better reflection does that and is used by e.g. PHPStan and Reflector.
  • Pipeline finished with Failed
    3 months ago
    Total: 521s
    #404059
  • Pipeline finished with Success
    3 months ago
    Total: 786s
    #404071
  • Read the php-parser docs some, and they recommend reusing the Parser class (https://github.com/Roave/BetterReflection/blob/6.53.x/docs/compatibility...), so I updated the MR to keep the Parser statically after instantiation. I'm not sure what memory or speed impact it has. Running drush si demo_umami -vvv consecutively multiple times did not produce consistent enough metrics to gauge a consistent difference to HEAD.

    If this is driven by performance concerns

    The performance concern is ๐Ÿ“Œ Investigate possibilities to parse attributes without reflection Active . This one is about avoiding fatal errors when using reflection to read class attribute data. Leveraging a library here to do that can maybe provide a complete alternative to reflection in the parent issue, but that's not the goal here.

    If this is driven by the need not to load the classes, then roave/better-reflection would be IMHO a better solution

    I investigated this a little. The current approach here is not to replace Reflection, but to use static analysis (via PhpParserr) to determine whether the class is safe for Reflection, meaning no exceptions or fatal errors because of missing dependencies. Once the class is determined to be safe to reflect, then Reflection is used to get the attribute values. PhpParser works fine for this, and given that BetterReflection is built on top of PHP Parser, it seems unnecessary to use. I did look at whether BetterReflection could be used to replace Reflection in a way that is safe from missing dependencies. Unfortunately, it BetterReflection does not allow creating instances of the attributes (https://github.com/Roave/BetterReflection/blob/6.53.x/docs/compatibility...), so it's not really a fit for this issue.

  • Status changed to Needs review 2 months ago
  • ๐Ÿ‡ฌ๐Ÿ‡งUnited Kingdom catch

    Just had an idea for traits while writing up ๐Ÿ“Œ Add a classloader that can handle class moves Active .

    Could we, only for attribute discovery:

    1. Register an additional classloader
    2. This classloader only deals with classes that can't be autoloaded.
    3. If a class can't be autoloaded, see if it looks like a trait, and if so alias it to Drupal\Core\Something\MissingTrait
    4. Unregister the autoloader at the end of discovery again.

  • ๐Ÿ‡ฌ๐Ÿ‡งUnited Kingdom catch

    Opened an issue to implement #25 with an MR: ๐Ÿ“Œ Add a fallback classloader that can handle missing traits Active

  • I gave the suggestions in #25 a shot.

    First, I tried registering an autoloader that just throws an exception. That way, all missing classes, interfaces, and traits throw the same exception that can be caught. Unfortunately this does not work for traits. While the exception is thrown, PHP still fatal errors, which looks something like this: PHP Fatal error: During class fetch: Uncaught Drupal\Component\Discovery\DiscoveryException: Plugin class dependency Drupal\a_module_that_does_not_exist\Plugin\CustomTrait is missing. in /var/www/html/core/lib/Drupal/Component/Discovery/DiscoveryClassLoader.php:19

    Next, I tried aliasing any missing trait to some placeholder trait. This kind of works, but there are issues:

    • The only way I could figure to do trait detection is to see whether the $class string being passed to the autoloader callback ends in "Trait". While there are only one trait in core that does not end in "Trait" (Drupal\Core\Menu\MenuLinkFieldDefinitions, though there are two other test traits: Drupal\Tests\dblog\Functional\FakeLogEntries, Drupal\Tests\workspaces\Functional\WorkspaceTestUtilities), there's nothing preventing people from creating traits in contrib or custom modules that do not end in Trait. Nor anything preventing people from unfortunately creating classes or interfaces with names that end in Trait, though likely very edge-casey
    • Trait use statements with method aliases still fatal error, because the placeholder trait does not have the method:
      use CustomTrait {
          theMethod as theTraitMethod;
      }
      
    • When the fatal error is avoided by aliasing, the plugin can then be erroneously discoverable. We can get around this by add a constant in the placeholder trait, then using Reflection to detect whether the plugin class has the constant.
      trait PlaceholderTrait {
      
        const PLUGIN_HAS_MISSING_DEPENDENCY = TRUE;
      
        public function __call(string $name, array $arguments) {}
      
      }
      
      // in AttributeClassDiscovery.
      protected function parseClass(string $class, \SplFileInfo $fileinfo): array {
          // @todo Consider performance improvements over using reflection.
          // @see https://www.drupal.org/project/drupal/issues/3395260.
          $reflection_class = new \ReflectionClass($class);
      
          if ($reflection_class->hasConstant('PLUGIN_HAS_MISSING_DEPENDENCY')) {
            // Throw an exception or something.
          }
      

      There would still be an issue in the case where the module containing the trait is installed, so the plugin is discovered. But when the module is uninstalled, the plugin definition would still exist in the file cache, and any unfortunate attempt to instantiate the plugin class would hit a fatal error.

  • Just saw the comment about the new issue ๐Ÿ“Œ Add a fallback classloader that can handle missing traits Active . Will cross-post abbreviated thoughts from #27 there.

  • The Needs Review Queue Bot โ†’ tested this issue. It no longer applies to Drupal core. Therefore, this issue status is now "Needs work".

    This does not mean that the patch necessarily needs to be re-rolled or the MR rebased. Read the Issue Summary, the issue tags and the latest discussion here to determine what needs to be done.

    Consult the Drupal Contributor Guide โ†’ to find step-by-step guides for working with issues.

Production build 0.71.5 2024