Use alternate parsing method for attribute-based plugin discovery to avoid errors

Created on 27 November 2024, 22 days ago

Problem/Motivation

Using the nikic/php-parser library was mentioned by @ghost of drupal past as a possible alternative to parsing attributes via PHP reflection, because substantive use of reflection can result in performance and memory usage concerns.

From comment #12 πŸ“Œ Investigate possibilities to parse attributes without reflection Active on πŸ“Œ Investigate possibilities to parse attributes without reflection Active

A problem that's come up in πŸ“Œ Convert MigrateSource plugin discovery to attributes Active and πŸ› Changing plugins from annotations to attributes in contrib leads to error if plugin extends from a missing dependency Needs review is that attributes can't be read via Reflection on classes that implement missing interfaces, extend missing classes, or use missing traits, in the case that the missing dependency comes from an uninstalled module. While Reflection will throw an exception for a missing interface or class, PHP will fatal on a missing trait. The exception thrown during plugin discovery was handled in 3458177, but a solution for handling missing traits gracefully is still outstanding.

Using something like nikic/php-parser to retrieve attribute values without might be a viable way forward to retrieve at least some attribute values in the case of missing dependencies.

Steps to reproduce

For example, the migrate source plugin Drupal\block_content\Plugin\migrate\source\d7\BlockCustomTranslation in the Block Content module uses Drupal\content_translation\Plugin\migrate\source\I18nQueryTrait from the Content Translation module. Block Content does not have a module dependency on Content Translation, so it migrate source plugins were using attributes, discovery would lead to a PHP fatal error when Migrate and Block Content were installed, but not Content Translation.

Proposed resolution

  • Move nikic/php-parser from a dev composer requirement to runtime
  • Use nikic/php-parser on plugin classes during discovery before reflection to identify interface, class, and trait dependencies the class has that may be missing
  • As an additional measure, provide a Dependencies attribute that can be applied to classes. Its only property is modules, which is a list of module machine names that need to be installed in order for the plugin to be discovered
  • Use nikic/php-parser to check whether the plugin class has the attribute and to extract the list of module dependencies
  • Avoid reflection on classes with missing dependencies and do not save to the discovery file cache
  • Developers can then manually declare on their plugin classes which modules need to be installed for the plugin to be discovered like so:
    #[MigrateSource(
      id: 'd7_block_custom_translation',
      ...
    ]
    #[Dependencies(['content_translation', 'migrate_drupal'])]
    

    to ensure that discovery ignores the class and does not fatal error.

    Specifying the dependencies likely should be unnecessary, since the missing trait should have already been identified, but complex cases where a missing trait is somewhere higher in the class hierarchy would not be handled without the Dependencies attribute.

    Note that the nikic parser is not used here to replace Reflection for attribute-based plugin discovery. For now, it is only making attribute-based plugin discovery more error-resistant.

    This approach also provides a start on how attribute parsing via nikic can be done. Dependencies is a simple attribute with only one array property. Plugin attribute classes can be a lot more complex, with nested object properties, so that will need further investigation in πŸ“Œ Investigate possibilities to parse attributes without reflection Active .

Remaining tasks

User interface changes

Introduced terminology

API changes

Data model changes

Release notes snippet

πŸ“Œ Task
Status

Active

Version

11.0 πŸ”₯

Component

plugin system

Created by

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Merge Requests

Comments & Activities

  • Issue created by @godotislate
  • Merge request !10373Resolve #3490322 "Use alternate parsing" β†’ (Open) created by godotislate
  • Pipeline finished with Failed
    22 days ago
    Total: 750s
    #352484
  • Pipeline finished with Success
    22 days ago
    Total: 2119s
    #352497
  • MR is up with tests.

    The Dependencies attribute stuff may not be strictly needed, but I think it can help make plugin discovery more robust. If it ends up staying in, there will need to be a CR documenting its use.

  • πŸ‡¬πŸ‡§United Kingdom catch
  • πŸ‡¨πŸ‡¦Canada Charlie ChX Negyesi 🍁Canada

    /me listens to https://youtu.be/6fgudq4CcH8 for vibes

    Thanks for coding what I have dreamed about for some time now -- or was it a nightmare? I haven't been this torn about something for a long while. But hey! luckily I no longer have even an informal leadership position or whatever so the burden of deciding whether to go down on this path is not on my shoulders. I will post a few thoughts which cause or allay fears.

    1. nikic/php-parser is an amazing work and most importantly it's true to the PHP source code because it's generated from it. Of course, one part of the generating code, that being phpyacc, has one of the most hair raising or most badass comments I've ever seen, I can't decide which: "How does it work? I don't know. I just ported the code until it worked correctly" (Anthony is awesome, always has been, pity his interaction with Drupal was brief). But, despite it has not seen a single commit for four plus years, the code generated by phpyacc has indeed been proven correct. rector depends on php-parser and it's successful. By now, I have faith in php-parser. Phew, was that hard to admit :) (Now of course kmyacc itself has been dead for quite some time now and PHP is using Bison not kmyacc but let's not go into those woods. Let's just be thankful it worked out.)
    2. This needs all the caching goodness catch is cooking because userspace parsing is not a speedy adventure especially not when NameResolver gets involved. But that cooking is going rather well so we can put aside all the performance concerns that stopped the hook oop patch being built on top of php-parser -- the original issue still has some traces of the proof concepts I coded for that.
    3. (string) $attribute->name. This is ... naive, not in any negative ways of course. It just doesn't consider the absolutely wild selection of PHP features attributes support. While https://github.com/nikic/PHP-Parser/blob/master/doc/component/Constant_e... of course exists but it needs helpers for resolving constants (and other things as described under Unsupported expressions and evaluator fallback) which very well might appear here would require some work.... this is one area where I am happy not to work on this or indeed the wider effort to move all attributes (or even just hook attributes) to php-parser. If the decision is to stay with this naive implementation, which is, again, totally fine, it at least needs to be precisely documented to only support literal scalars, no constants, no property fetch expressions, none of that, kthxbai.
    4. hasMissingClassDependencies is again a bit naive, this is still not some harsh criticism it's just this topic has been giving me worse nightmares than what follows a late night visit to Taco Bell. A depends on B depends on C where C doesn't exist , this code parses A and runs class_exists(B) which happily throws a Fatal error: Uncaught Error: Class "C" not found. Yay. Is this a real world problem? Does this lead to the requirement of a soft-class-exists-based-on-php-parser? Maybe, what do I know, I know nothing any more.

    Please carry on. It's awesome to see taking shape.

  • Thanks for the review, @ghost of drupal past. I'll address the MR comments and update it sometime tomorrow or later, but here's a couple quick replies to #6 πŸ“Œ Use alternate parsing method for attribute-based plugin discovery to avoid errors Active

    I think these limitations need to be documented on the Dependencies::__construct $modules argument. Especially the lack of constants.

    This is a fair point, and I will add the documentation.

    A depends on B depends on C where C doesn't exist , this code parses A and runs class_exists(B) which autoloads and happily throws a Fatal error: Uncaught Error: Class "C" not found.

    Yeah, I tried to acknowledge that in this comment, but it's probably a little opaque:

      // There is a risk that a class or trait uses a missing trait somewhere
      // in its hierarchy and will cause a fatal error.
    

    Let me give some background first. My initial idea was only to introduce the Dependencies attribute, and leave it to plugin developers to purposely designate which modules the plugin depends on, in order to avoid the fatal error on missing traits. While it is an additional DX hurdle, my thought was that this amount of friction is relatively small. It also really only applies to classes using missing traits, since we are already catching the exception for missing interfaces and classes.

    While I was working with nikic/php-parser on this, I realized that the interfaces, classes, and trait dependencies were parsed and available, so I thought it was worth identifying those specifically as dependencies as well. Running the _exists() functions does run the risk of some trait in the class hierarchy being missing and causing the fatal. The chance of this happening is lower than a missing trait being used directly by the plugin class, and even in case there is a fatal error, that's when a plugin developer would use the Dependencies attribute.

    There is a gap if a missing dependency in the hierarchy comes from some PHP library that is not a module, but I think this is a fringe use case.

  • πŸ‡¨πŸ‡¦Canada Charlie ChX Negyesi 🍁Canada

    So yes, someone needs to make a decision whether

    1. No class/trait/interface checks. Require explicit dependencies and maintain less core code.
    2. Middle ground: immediate classes checked with direct class_exists & friends
    3. Whole hog. Make DrupalKernel::$classLoader public readonly, use $kernel->classLoader->findFile() to find any classes/traits/interfaces being depended on, read 'em, parse 'em and recurse with a static cache.

    I have no opinion which one is best, I merely outlined how #3 could look.

    And/or use statements could easy be parsed for non-existent modules.

  • Whole hog. Make DrupalKernel::$classLoader public readonly, use $kernel->classLoader->findFile() to find any classes/traits/interfaces being depended on, read 'em, parse 'em and recurse with a static cache.
    ...
    And/or use statements could easy be parsed for non-existent modules.

    Migrate source discovery by annotations does this. Trying to provide BC for this annotated discovery specifically for migrate source, on top of adding attribute discovery, results in some complex code. And recreating this functionality, again only for migrate source, makes the code even more complex. Handling recursive dependencies in the attribute discovery base class would eliminate the specific complexity for migrate source. In fact, an issue about moving this functionality from migrate source to core was created, even before attribute-based discovery was a thing, but it's been unresolved for years: https://www.drupal.org/project/drupal/issues/2786355 β†’ .

    That being said, I think the scope of the missing dependencies issue will expand as attribute-based discovery expands to routes ( πŸ“Œ Use PHP attributes for route discovery Needs review ), and services ( ✨ Directory based automatic service creation Needs review , πŸ“Œ Support #[AsEventListener] attribute on classes and methods Active , ✨ Support hooks (Hook attribute) in components Active ), etc. It might actually make sense to provide some sort of missing-dependency-safe wrapper for Reflection to address all these use cases. Recursively iterating through all the files of all the classes in the hierarchies might be really slow, so that could eliminate it as an option.

  • And/or use statements could easy be parsed for non-existent modules.

    Migrate source discovery by annotations does this.

    Adding a correction that migrate source discovery does not quite do this. It does the recursion, but it checks the namespaces of only parent and ancestor classes and does not look at namespaces of traits or interfaces.

  • πŸ‡¬πŸ‡§United Kingdom longwave UK

    I haven't seen it mentioned here but https://github.com/Microsoft/tolerant-php-parser is another possible alternative to nikic/php-parser, not looked into it at all, I only know of its existence.

  • Pipeline finished with Failed
    9 days ago
    Total: 727s
    #364503
  • Pipeline finished with Success
    9 days ago
    Total: 708s
    #364533
  • Rebased and addressed comments on the MR. I also moved all the "providers" logic to the core AttributeClassDiscovery class, since that's where it should happen.

    I'm continuing with the hybrid approach (option 2) for now. Hopefully can get this pushed forward and help get πŸ“Œ Convert MigrateSource plugin discovery to attributes Active unblocked. It should be fairly easy to remove the *_exists() checks and rely solely on the Dependencies attribute if requested to go with option 1. I think option 3 of going through all the class hierarchies is too slow.

    Along those lines, if nickic/php-parser is adopted here, and since it looks like BC support for plugin discovery by Annotations will continue for a while, plugin classes will be essentially parsed three times during discovery: once via the Doctrine-based StaticReflectionParser, then with nikic PhpParser, and then with Reflection. I think a reasonable follow up would be to replace or refactor StaticReflectionParser with PhpParser, and then somehow pass the parsed result into parseClass(), so that would remove one parsing pass. An eventual end goal could be parsing plugin Attributes with PhpParser as well, but as mentioned before, there is a lot of complexity in some of the Attribute classes.

    Only gave a cursory investigation into https://github.com/Microsoft/tolerant-php-parser, which documents itself as "early-stage":

    This is an early-stage PHP parser designed, from the beginning, for IDE usage scenarios (see Design Goals for more details). There is still a ton of work to be done, so at this point, this repo mostly serves as an experiment and the start of a conversation.

    good to see the constant expression evaluator here but without the helpers described in "Unsupported expressions and evaluator fallback" under https://github.com/nikic/PHP-Parser/blob/master/doc/component/Constant_e... it is quite limited and I think these limitations need to be documented on the Dependencies::__construct $modules argument. Especially the lack of constants.

    Added this documentation for the modules parameter in the Dependencies constructor.

  • Pipeline finished with Success
    7 days ago
    Total: 901s
    #366319
  • πŸ‡¦πŸ‡ΊAustralia larowlan πŸ‡¦πŸ‡ΊπŸ.au GMT+10
Production build 0.71.5 2024