Improve Performance

Created on 10 April 2020, over 4 years ago
Updated 25 January 2024, 10 months ago

I work on an enterprise site that heavily leverages paragraphs, media, and general entity references. This site also makes use of many taxonomy vocabularies, terms, and taxonomy reference fields. While permissions_by_term + permissions_by_entity does work exactly as described, we are experiencing very noticeable performance issues due to the number of recursive field/entity checks and non-cached database queries. This is despite that fact that we only set permissions on a single access_roles vocab, and reference that with a single field_limit_access_roles on a few content types.

These modules are, quite noticeably, decreasing our site's performance. Queries selecting from the following tables are consistently giving us trouble:

  • taxonomy_index
  • node_access
  • permissions_by_term_role_select (with ~50 entries)
  • permissions_by_term_user_select (with 0 entries)

I'd love to open up a discussion about how we might be able to mitigate these issues. Some ideas I've had involve heavy use of additional caching and/or config data, paired with storage of situations where permissions need to be checked instead of checking everything on the fly when hitting an entity. These ideas include:

  • Storing the information from Drupal\permissions_by_term\Service\AccessCheck::isAnyPermissionSetForTerm() in cache/config, and updating it as needed
  • Leveraging \Drupal::service('entity_field.manager')->getFieldDefinitions('') to keep track of fields/entities that reference taxonomy terms/vocabs that assign permissions, and using that information to limit the fields/entities that are checked in Drupal\permissions_by_entity\Service\AccessChecker::isAccessAllowed() and Drupal\permissions_by_entity\Service\AccessChecker::isAccessControlled()
  • Do something similar to track fields that reference entities noted above to limit the need to recursively every entity in every field in Drupal\permissions_by_entity\Service\AccessChecker::isAccessAllowed() and Drupal\permissions_by_entity\Service\AccessChecker::isAccessControlled()
  • Determine whether or not is really necessary to recursively check entities in the above methods (or at all) given that the recursive checks never occur at all for fields/entities that would be checked after a field that has see a TRUE result from either isAccessControlled or isAccessAllowedByDatabase(). The main reason for the kernalsubscriber(s) appears to be to serve a 404, which only needs to happen if access is denied to the parent entity. Nested entities would, I believe, be handled by the two modules' hook_entity_access checks, mitigating the need for the recursive checks.
  • Use the cached/stored data relating to relevant terms/vocabs to limit the amount of DB hits to the taxonomy_index table. This one seems like it'll be the hardest to pin down given that it's referenced in a number of places.

Does anybody (ideally maintainers? :D) foresee issues with the above or have any other ideas for how we might be able to optimize the performance of these modules? I'm marking this as Major due to how much these issues are affecting our site in particular, but if that's an inappropriate priority then of course it can be downgraded to Normal.

🌱 Plan
Status

Needs review

Version

3.1

Component

Code

Created by

πŸ‡ΊπŸ‡ΈUnited States mrweiner

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • πŸ‡¦πŸ‡ΊAustralia alex.skrypnyk Melbourne

    I've validated patch #8 and the whole approach: everything said above makes sense and works. Thank you @slashrsm for working on this.

    I'm attaching a re-rolled #8 with prefixed cache id name and code formatted according to Drupal coding standards.

    Unfortunately, this module in the current state will most likely have a significant performance impact on a site with more than 1000 nodes, especially for content editors.

  • πŸ‡ΊπŸ‡ΈUnited States scottsawyer Atlanta

    I think this patch breaks \Drupal\permissions_by_entity\Service\AccessChecker since the constructor changed. Anyway, it blew up on me, but I made a lot of changes, so need to do some additional testing.

  • πŸ‡ΊπŸ‡ΈUnited States scottsawyer Atlanta

    FYI, there is nothing to test against. Please review.

  • πŸ‡ΊπŸ‡ΈUnited States scottsawyer Atlanta

    Even with these changes applied, looking at New Relic, \Drupal\permissions_by_entity\Service\AccessChecker::isAccessControlled is still taking too much time.

    Looking at the code:

     public function isAccessControlled(FieldableEntityInterface $entity, bool $clearCache = TRUE): bool {
        if ($clearCache) {
          $this->checkedEntityCache->clear();
        }
    
        if ($entity->getEntityTypeId() == 'node') {
          // Make sure to leave nodes to the permissions_by_term module.
          return FALSE;
        }
    
    

    Maybe rearranging these two so that cache is not cleared before the node check might help. Also, I am wondering if we always need to clear all the cache for every entity every time access is checked?

    Also it looks like:

    // Iterate over the fields the entity contains.
        foreach ($entity->getFields() as $field) {
          // We only need to check for entity reference fields
          // which references to a taxonomy term.
          if (
            $field->getFieldDefinition()->getType() == 'entity_reference' &&
            $field->getFieldDefinition()->getSetting('target_type') == 'taxonomy_term'
          ) {
            // Iterate over each referenced taxonomy term.
            /** @var \Drupal\Core\Field\FieldItemInterface $item */
            foreach ($field->getValue() as $item) {
              if(!empty($item['target_id']) && $this->isAnyPermissionSetForTerm($item['target_id'], $entity->language()->getId())) {
                return TRUE;
              }
            }
          }
    ...
    

    We never cache this result whether true or not (and by extension, we never check cache).

    Maybe changing to something like:

    
      /**
       * {@inheritdoc}
       */
      public function isAccessControlled(FieldableEntityInterface $entity, bool $clearCache = TRUE): bool {
        if ($entity->getEntityTypeId() == 'node') {
          // Make sure to leave nodes to the permissions_by_term module.
          return FALSE;
        }
    
        if ($clearCache) {
          $this->checkedEntityCache->clear();
        }
        elseif ($this->checkedEntityCache->isChecked($entity)) {
          return TRUE;
        }
    
        // Iterate over the fields the entity contains.
        foreach ($entity->getFields() as $field) {
          // We only need to check for entity reference fields
          // which references to a taxonomy term.
          if (
            $field->getFieldDefinition()->getType() == 'entity_reference' &&
            $field->getFieldDefinition()->getSetting('target_type') == 'taxonomy_term'
          ) {
            // Iterate over each referenced taxonomy term.
            /** @var \Drupal\Core\Field\FieldItemInterface $item */
            foreach ($field->getValue() as $item) {
              if(!empty($item['target_id']) && $this->isAnyPermissionSetForTerm($item['target_id'], $entity->language()->getId())) {
                // Add entity to cache.
                $this->checkedEntityCache->add($entity);
                return TRUE;
              }
            }
          }
    ...
    

    This way we can short circuit the entire recursive access check.

  • First commit to issue fork.
  • πŸ‡―πŸ‡΄Jordan Ahmad Khader

    Added langcode to cache tags to support multilingual.

  • πŸ‡―πŸ‡΄Jordan Ahmad Khader

    fix to commit #13 patch

  • πŸ‡―πŸ‡΄Jordan alaa abbad

    Reroll the patch to ensure compatibility with the most recent version.

  • Tried the patch from #16, minimal performance gains on our site with thousands of nodes. Switched to Taxonomy Access Control Lite β†’
    instead with minimal configuration changes required. Works the same, just faster!

Production build 0.71.5 2024