EntityQueryAlter adds unnecessary extra join to group_relationship_field_data causing the query to time out

Created on 7 November 2023, about 1 year ago
Updated 21 February 2024, 10 months ago

Problem/Motivation

After the upgrade from 1.x to 2.x for Drupal 10 we have a media overview for groups that can no longer be accessed. This seems to be caused by the fact that our group media view already has a relationship joining the media_field_data table and the group_relationship_field_data table. The EntityQueryAlter::doAlter() method adds an extra join between the same tables, which somehow causes the query to hang. When this extra join is removed, the query runs perfectly fine.

Proposed resolution

If a view already has a join between the base table and group_relationship_field_data, reuse that join and don't add another one.

Remaining tasks

User interface changes

None

API changes

None

Data model changes

None

πŸ’¬ Support request
Status

Fixed

Version

2.2

Component

Code

Created by

πŸ‡³πŸ‡±Netherlands seanB Netherlands

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Comments & Activities

  • Issue created by @seanB
  • Status changed to Needs review about 1 year ago
  • Open on Drupal.org β†’
    Core: 10.1.x + Environment: PHP 8.1 & MySQL 5.7
    last update about 1 year ago
    Waiting for branch to pass
  • πŸ‡³πŸ‡±Netherlands seanB Netherlands

    Patch is attached.

  • Status changed to Needs work about 1 year ago
  • πŸ‡§πŸ‡ͺBelgium kristiaanvandeneynde Antwerp, Belgium

    We can't reuse queries the way this patch suggests as it's critical that we add those conditions on the join. If we were to reuse another join, we might not get those conditions and then we have a security issue on our hands.

  • πŸ‡³πŸ‡±Netherlands seanB Netherlands

    In the end we couldn't get the query to run with all the added joins from the query alters. Attached patch fixes a notice to allow us to remove the query alters of the groups module completely so we can implement our own (lightweight) version that works for our specific case.

    If anyone else needs this, to remove the group module hooks we do this:

    /**
    * Implements hook_module_implements_alter().
     */
    function mymodule_module_implements_alter(&$implementations, $hook) {
      // We specifically unset the query alters from the groups module because they
      // add a bunch of joins that make the query super slow. Since we add the
      // required filters in our views already, we don't need the ones from the groups
      // module.
      $query_alter_hooks = [
        'query_alter',
        'query_entity_query_alter',
        'views_query_alter',
      ];
      if (isset($implementations['group']) && in_array($hook, $query_alter_hooks, TRUE)) {
        unset($implementations['group']);
      }
    }
    
  • Status changed to Fixed about 1 year ago
  • πŸ‡§πŸ‡ͺBelgium kristiaanvandeneynde Antwerp, Belgium

    Moving this to a support request as it has been made clear this is not a bug.

    BE VERY CAREFUL when implementing #5 as you are effectively killing all query access checks offered by Group.

  • πŸ‡³πŸ‡±Netherlands seanB Netherlands

    What we implemented now is a workaround. I think there is still a bug/issue somewhere. The regular views query, in combination with the group's module query alters, simply causes timeouts. This seems to be caused by the amount of added joins and this wasn't the cause in groups v1.

    Not sure yet how to fix it or how to further optimize the query.

  • πŸ‡§πŸ‡ͺBelgium kristiaanvandeneynde Antwerp, Belgium

    Well in Group 1, we would process all of the access logic in PHP and then slap a giant "where gid IN ()" onto the query, leading to timeouts too depending on how big that array was.

    Could you turn on Views query debugging, grab the query and then try to run that manually using EXPLAIN? Also could you check the indexes? They should be configured correctly but in case they somehow aren't, that would explain why your query breaks.

  • Automatically closed - issue fixed for 2 weeks with no activity.

  • Status changed to Fixed 10 months ago
  • πŸ‡ΊπŸ‡ΈUnited States adriancotter

    Just upgraded to Group 2.0. So I am seeing Group add a huge group related query to many content views that we have -- whether or not it is related to group. I haven't gone back to compare this to how 1.x was working, but it does seem to be negatively effecting our site compared to 1.x.

    Many of our content types are not used by group, and many views do not including any group content.
    And in those views, I am seeing giant "gcfd"."gid" IN () for every group node plugin, even though they are not involved in a particular view. Our site speed is way down.

  • πŸ‡§πŸ‡ͺBelgium kristiaanvandeneynde Antwerp, Belgium

    Many of our content types are not used by group

    Those do not make it into the query unless you installed them on a group type.

    and many views do not including any group content.

    Doesn't matter, if the view contains nodes and nodes could be shown/hidden by Group then Group will attach itself to the query. If it's a View for an admin UI where you know the people viewing it have full access, turn off query rewriting for the view.

    And in those views, I am seeing giant "gcfd"."gid" IN () for every group node plugin

    This means you have a large amount of "individual" group roles assigned, see if you can get rid of many of these memberships in favor of synchronized roles (outsider/insider). They will drastically speed up your queries.

    Finally, it would really help to see these slower sites in action, because I've got a feeling that a large part of the performance hit is due to a misconfiguration of roles and plugins.

  • πŸ‡ΊπŸ‡ΈUnited States adriancotter

    Those do not make it into the query unless you installed them on a group type.

    That is not my experience. These views where I am seeing the query altered are looking at content that are NOT installed on a group type. Have a query below

    This means you have a large amount of "individual" group roles assigned, see if you can get rid of many of these memberships in favor of synchronized roles (outsider/insider). They will drastically speed up your queries.

    The list is of installed group node plugins and group IDs... I'm not sure how that relates to our member group roles. We do have about a dozen. Most people only have 1 or 2 roles assigned.

    I have a very simple view that looks at the published pages of a certain article content type (sierraarticle). It is not part of group, the article content type is not installed on any group type.

    This is a shortened version of the query that I see just what is added on:
    The query does include any of the nodes in that first Outer Join

    LEFT OUTER JOIN {group_relationship_field_data} "gcfd" ON node_field_data.nid=gcfd.entity_id AND gcfd.plugin_id IN ('group_node:program_page', 'group_node:sierra_club_entity', 'group_node:sierra_club_entity_blog', 'group_node:sierra_club_entity_index_page', 'group_node:sierra_club_entity_legislation', 'group_node:sierra_club_entity_slideshow', 'group_node:webform')
    LEFT OUTER JOIN {group_relationship_field_data} "gcfd_2" ON gcfd.gid=gcfd_2.gid AND gcfd_2.plugin_id='group_membership' AND gcfd_2.entity_id='1'
    WHERE (("node_field_data"."status" = '1') AND ("node_field_data"."type" IN ('sierraarticle'))) AND (("gcfd"."entity_id" IS NULL) OR (("node_field_data"."status" = '0') AND ((("gcfd"."gid" IN ('28867', '629942', '489', '1983', '1992', '2001', '2030', '2041', '2042', '7505', '7891', '8435', '8633', '11805', '13483', '18307', '18776', '18835', '19019', '24185', '24532', '24857', '24871', '25041', '25353', '25370' ...
    

    Thos numbers are the group IDs, and are repeated again for each group plugin node type.

    For the moment, I have mitigated some of our problems by turning off query rewriting. But don't think I want to leave that as a permanent solution. If there's a way I can show you the setup, I'd be happy to try.

  • πŸ‡§πŸ‡ͺBelgium kristiaanvandeneynde Antwerp, Belgium

    I have a very simple view that looks at the published pages of a certain article content type (sierraarticle). It is not part of group, the article content type is not installed on any group type.

    Which is correct because group does not add that group_node plugin to the query.

    However, it seems like you do have about 7 other node types installed on your group types and those do need to be added to the query, even if it doesn't make sense to you. All Group gets from core is "this is a node query", not "this is an article query", so Group has to run its access checks on the node types that are grouped.

    Thos numbers are the group IDs, and are repeated again for each group plugin node type.

    Are you using any extra module that hands out permissions on individual group levels? Because it seems like your calculated permissions have a lot of entries for the individual scope. if someone only has maybe one or two memberships with individual roles assigned, they should only see one or two group IDs in that query.

    I am not seeing any group type checks in your query at all, indicating no synchronized permissions (inside or outsider) have been handed out whatsoever.

  • πŸ‡ΊπŸ‡ΈUnited States adriancotter

    Ok, I went back to your group permissions youtube video and understood it a bit better than the last time, at least in terms of what I can do to switch up my existing set of memberships -- and see there you mention performance hit, which I totally missed. I don't think I understand why it impacts performance so much when the traffic is mostly anonymous, but I can figure out that later.
    https://www.youtube.com/watch?v=xo2z8NuKEH4

    All Group gets from core is "this is a node query", not "this is an article query", so Group has to run its access checks on the node types that are grouped.

    As to the queries, conceptually I understand what you are saying here. I also think the queries I'm seeing in the views admin were for the admin user -- which I definitely didn't have properly set up as an outside member.
    I guess I'd hope it'd be more efficient when it wasn't necessary (outside of any group context in particular) but I also know this is a complicated tool -- and appreciate all your work here.
    Haste makes wastes as they say -- I should have watched the video a few more times. Thanks for being so responsive in these threads.

  • πŸ‡§πŸ‡ͺBelgium kristiaanvandeneynde Antwerp, Belgium

    No worries, hopefully you'll see a significant performance boost once you have your roles set up properly!

  • πŸ‡ΊπŸ‡ΈUnited States adriancotter

    Hey Kristiaan, unfortunately, we've not seen any great gains with performance after the change. All of our Group users now have an Outsider or Insider role of one type or another. There are still some individual roles that some percentage of user have (hoping still to reduce one or two more individual roles).

    Overall, through Blackfire profiling, comparing before the upgrade to after,

    • The overall site memory use (I think the graph is an average) has gone from roughly 14-17MB to 16-20MB (comparing about a weeks worth of time). The latter might have been 1 or 2 MB higher when I first made the switch, so there might have been some gain, but it is not as clear in the graph.
    • Looking at the NodeViewController information -- its gone from a 53% impact to a 72% impact (memory 36.4 to 52.5 MB)
    • Our overall cache use seems to have gone up (in terms of CPU processing)

    I also excluded user.group_permissions from cache as per this page:
    https://www.drupal.org/docs/contributed-modules/group/turning-off-cachin... β†’
    (we have ~300 groups averaging 2-3 active members)

    We've been working to try and eliminate any other processes of concern. Obviously there's a lot to any particular site and group configuration. Our particular use of Group, custom code, there might be some other things we should be doing differently now.

    If there's a better place to bring this up, or you'd prefer I start a separate ticket, please let me know. Appreciate any thoughts or additional pointers.

  • πŸ‡§πŸ‡ͺBelgium kristiaanvandeneynde Antwerp, Belgium

    I also excluded user.group_permissions

    That was for Group v1, for v2/v3 I wouldn't recommend turning it off but rather switching to more optimized use of insider/outsider roles, like you did.

    I feel like this might become very specific to your project and I am curious as to what might be causing your performance issues. We are currently running a large website ourselves with Group and the query performance isn't that bad. However, my bandwidth for these specific type of problems is rather limited when it comes to Factorial-sponsored time.

    If you can convince your managers, Factorial could do a more in-depth review of your installation. If it turns out to be Group related we can then contribute a fix back to the community, if it's specific to your installation we can try to fix it. Either way, we could free up some dedicated time then to help solve your performance issues.

    Feel free to send a mail to hello@factorial.io and/or kristiaan@factorial.io if you want us to do an in-depth review of your installation. If your managers don't feel like that, then you can always open a separate support request with as much detail you can provide and we can try to fix this at an admittedly far slower pace.

  • πŸ‡ΊπŸ‡ΈUnited States annie2512

    I had this extra join causing issues in one of my views.
    I could get around it by, going to Advanced->Query Settings and checking off Disable SQL rewriting. This disables node access checks as well as override hook_query_alter() implementations. Hope this helps.

Production build 0.71.5 2024