EntityQueryAlter slowness in large site

Created on 15 February 2024, 9 months ago

Problem/Motivation

In the process of altering the query, we in turn perform a query of the plugins in use https://git.drupalcode.org/project/group/-/blob/1d12bada0c146b8f31fb74fd...

In large enough sites, this query can take a significant amount of time to complete (~2 seconds each invocation (profiled with php-spx) w/ ~3.7 million nodes and a few dozen users), and across altering many entity queries, can cause page loads to increase significantly.

Steps to reproduce

Create many nodes associated to groups.

Alternatively, could artificially induce by including a short `sleep()` when the "plugin-in-use" query is performed.

Then in a somewhat complex node display, one should see that the page takes a significant amount of time to build/load.

Proposed resolution

Implement some additional caching.

Remaining tasks

User interface changes

API changes

Data model changes

πŸ“Œ Task
Status

Active

Version

2.2

Component

Code

Created by

πŸ‡¨πŸ‡¦Canada adam-vessey PE, Canada

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Merge Requests

Comments & Activities

  • Issue created by @adam-vessey
  • πŸ‡¨πŸ‡¦Canada adam-vessey PE, Canada

    Slapped together a patch introducing a naive bit of memoization/caching which seems to do the trick; however, not entirely a fan of it. Half-thinking that this query and caching thereof could make sense being moved up to a/the plugin manager?

    Also, unsure of particular cache bins to use (went with `cache.data`), and cache key/id, but was sufficient to get from >30 second pages loads down to ~3 seconds (still more to pursue elsewhere).

  • πŸ‡·πŸ‡΄Romania ion.macaria Bucharest, πŸ‡·πŸ‡΄

    My investigation sent me to this issue. We are expecting the same problem with our very large site. The patch is doing it job, but I don't understand why we are altering query for entities which are not related with groups at all?

  • πŸ‡ΊπŸ‡ΈUnited States mettasoul

    I believe the query is do to the entity access check for permissions around the 'View any content item entities'. Dissabling SQL rewriting in the view will remove it. Assuming you don't need the access check in your views.

  • πŸ‡©πŸ‡ͺGermany mxh Offenburg

    Same slow query is happening within group_entity_access function so that part may be optimized as well using the cache.

  • πŸ‡¨πŸ‡¦Canada adam-vessey PE, Canada

    @ion.macaria: The alter is invoked for all entity queries, but contains logic to exit early without modifying the query when there are no relevant entity types. It is in determining if there are any relevant entity types that the slow query is located, which can be improved by the memoization/caching in the patch from #2

    @stevenjay: Disabling SQL rewriting could affect/suppress other access control mechanism that might be expected to also alter the queries. If the view in question is exclusively for administrative uses, perhaps? Otherwise, can be risky.

    @mxh: Meaning the `$plugin_ids_used` bit on https://git.drupalcode.org/project/group/-/blob/2.3.x/group.module?ref_t... / https://git.drupalcode.org/project/group/-/blob/3.3.x/group.module?ref_t... ? I vaguely recall looking at it, but I'm wanting to say that it was less of an issue with the additional `entity_id` condition it contains and possibly further mitigated by the expectation of the access check being cacheable; however, it has been a while since I've looked at it.

  • πŸ‡¨πŸ‡¦Canada adam-vessey PE, Canada

    Rerolled patch for/against group 2.3.0.

    Will put together an MR.

  • Merge request !192Memoize/cache plugins in use. β†’ (Open) created by adam-vessey
  • Pipeline finished with Success
    10 days ago
    Total: 1178s
    #330453
  • πŸ‡·πŸ‡΄Romania ion.macaria Bucharest, πŸ‡·πŸ‡΄

    @adam-vessey entity type check is ok, but I think we need to check also bundle if exist, because we are linking bundles to the groups, not just entity types.

  • πŸ‡¨πŸ‡¦Canada adam-vessey PE, Canada

    @ion.macaria: Tagged queries are not necessarily originating from views, but could be being filtered by other subqueries in entity queries.

    I feel there is a balance to extracting those set-reducing statements from the primary query to push into the secondary/plugin-in-use query, where the results of the secondary query would no longer be generally reusable, effectively preventing the caching I have implemented?

    Following along that path, it would probably make sense to get rid of the secondary/plugin-in-use query entirely, to instead add additional joins/subqueries to the primary query; however, such would end up affecting _ALL_ entity queries, instead of allowing us to exit early to avoid making the query more complex where it is not necessary to do so?

    Taking from the other direction: The criteria of bundles is more a matter of the functionality of the plugin used to relate content to groups. To know which bundles are relevant for a given entity type, would it not be necessary to instantiate the relevant plugins? Probably requiring expanding the interface of the plugins such that they could be made to provide/apply their specific conditions in some manner? Such is sounding more like there might be a separate issue, with slow queries as a result of many entity types or bundles thereof being relatable to groups? I have not encountered such myself.

Production build 0.71.5 2024