Entity queries querying the latest revision very slow with lots of revisions

Created on 7 March 2018, over 6 years ago
Updated 18 September 2024, about 2 months ago

After upgrading to 8.4 we noticed a performance decrease on one of our sites. The page that is taking minutes to load is a multilanguage page with several field collections and revisions enabled.

Some debugging lead me to the change in issue 2864995 β†’ .

This change introduces a latestRevision() method which is used by QuickEdit (this explains why it is only when logged in). On a field collection that has about 4000 revisions, this query takes seconds to run.

If I understand correctly the change introduces a self-join to the base revision table. Because of this, for each revision in the table, a set of revisions + next revisions is added to the result. From this result only the row that has no next revision (is null) is used (in my case 7000000 rows are searched for the one that has no next revisions).

Maybe we can think of a more efficient way of querying the latest revision?

Please do not create patches or re-roll previous patches. Use the Merge Request functionality β†’ .

πŸ› Bug report
Status

Needs review

Version

11.0 πŸ”₯

Component
EntityΒ  β†’

Last updated about 11 hours ago

Created by

πŸ‡³πŸ‡±Netherlands corneboele

Live updates comments and jobs are added and updated live.
  • Performance

    It affects performance. It is often combined with the Needs profiling tag.

  • Needs reroll

    The patch will have to be re-rolled with new suggestions/changes described in the comments in the issue.

Sign in to follow issues

Merge Requests

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • πŸ‡ΊπŸ‡ΈUnited States smustgrave

    This issue is being reviewed by the kind folks in Slack, #needs-review-queue-initiative. We are working to keep the size of Needs Review queue [2700+ issues] to around 400 (1 month or less), following Review a patch or merge request β†’ as a guide.

    @jonathan1055 is correct. If an issue is being worked on in the MR it just adds noise to continue with patches. MRs are also easier to review.

    At this time we will need a MR for D10 though addressing the open threads in the current. Recommend starting a new one vs updating the existing.

    Also starting in #134 the patch size almost tripled.

  • First commit to issue fork.
  • @lucassc opened merge request.
  • Status changed to Needs review almost 2 years ago
  • πŸ‡§πŸ‡·Brazil lucassc Rio de Janeiro

    Hi! I opened a MR for 10.1.x, which was not available first from the "Create new branch" link at the top of the issue. So I followed this doc β†’ to add 10.1.x as the basis for my issue branch 2950869-optimizing-revision-queries.

    Not sure if this is the correct way, I'm novice.

    Please, review.

  • πŸ‡§πŸ‡·Brazil lucassc Rio de Janeiro

    We had comments in the old MR that still need to be discussed.

    core/lib/Drupal/Core/Entity/Query/Sql/Query.php:

    // Add a self-join to the base revision table if we're querying only the
    // latest revisions.
    

    @daffie said "This comment needs an update as it no longer reflects what we are doing."

    core/tests/Drupal/KernelTests/Core/Entity/EntityQueryTest.php:

    $results = $this->queryResults = $this->storage
      ->getQuery()
      ->latestRevision()
      ->notExists("$figures.color")
      ->accessCheck(TRUE)
      ->execute();
    $expected = [16 => '4', 8 => '8', 20 => '12'];
    $this->assertSame($expected, $results);
    
    // Update an entity.
    $entity = EntityTestMulRev::load(4);
    $entity->setNewRevision();
    $entity->$figures->color = 'red';
    $entity->save();
    
    $results = $this->queryResults = $this->storage
      ->getQuery()
      ->latestRevision()
      ->notExists("$figures.color")
      ->accessCheck(TRUE)
      ->execute();
    $expected = [8 => '8', 20 => '12'];
    $this->assertSame($expected, $results);
    

    @daffie said: "We need to do a change that keeps it in the result and therefore we see the changed revision id. The current change removes it from the result."

    @neclimdul replied "I'm not sure what this is asking. What result would we be able to see the revision ID in?"

  • πŸ‡¬πŸ‡§United Kingdom catch

    Now that quickedit is no longer in core, this is less likely to cause timeouts during normal site operation - so moving back down to major, but of course we should still fix it.

  • Status changed to Needs work almost 2 years ago
  • πŸ‡ΊπŸ‡ΈUnited States smustgrave

    Removing reroll as @lucassc you seemed to get it correct.

    But there does appear to be a test failure.

  • πŸ‡¬πŸ‡·Greece magtak

    I work for Acquia Support and I came across this issue while debugging a use case where all the revisionable entities (Site Studio layout_canvas entities) were set to be sequentially re-saved (Site Studio rebuild).

    Looking at the query,

    ...LEFT OUTER JOIN "cohesion_layout_revision" "base_table_2" ON "base_table"."id" = "base_table_2"."id" AND "base_table"."revision" < "base_table_2"."revision"...
    

    The LEFT OUTER JOIN and the '<' conditional operator will match rows proportionate to the number of revisions an ID has, this will create new rows in an exponential way.

    For the case I was looking at in particular, a 10h+ operation that resulted in 1000+ second SQL queries at times which locked the revisions table and thus blocked any content editing, finished in about 35 minutes with the patch applied with barely any queries taking over 1 second. The items generating those 1000+ second queries were processed instantly.

    I checked our fleet's logs and there are multiple sites impacted by this as I can see thousands of slow ( > 1s) queries generated by the code above.

    Patch #133 worked for the client I was working on given their Drupal version (9.5.x).

    While we will be advising clients to use patches from this thread if they encounter this issue, this fix should be high priority and should be backported to all affected Drupal branches (8, 9, 10).

  • πŸ‡§πŸ‡ͺBelgium Dries

    Great to see we are getting closer to a solution.

    In addition to improving the queries, it might not be a bad idea to show a warning on /admin/reports/status when there are entities with more than $number revisions. At a minimum, we should inform a site administrator about this.

    As a next step, I would not be opposed to a new feature in core to prune revision after $number days/weeks/months/year/never. We should tackle that in a separate issue though.

  • πŸ‡©πŸ‡ͺGermany Anybody Porta Westfalica

    Hey @Dries, cool to have your feedback here! :)
    +1 on that (while the limit should be configurable in config, at least in settings.php)

    I guess that should be combined with the work from: #2925532: Provide bulk action to delete all old revisions of an entity β†’
    #53 β†’ there please.

    So if you create or find a task for that revision prune in core, please link both. It would really make a lot of sense to at least have a full-featured Entity revisioning API in core. So contrib can implement more specific concepts, using a consistent API for all entity types. Like bulk deletion manually (checkboxes in revisions tab and as views bulk operation), by age, by limit, etc. if that's too specific for core.

    Thanks :)

  • heddn Nicaragua

    As FYI, https://www.drupal.org/project/node_revision_delete β†’ has this feature in contrib. The hardest part of removing revisions is when the site has multiple languages installed. The way revisions are stored w/ multiple languages is complicated to say the least. You can't just remove the last 5 revisions of a node. Because you'll loose a revision that is published as an alternative language.

  • πŸ‡¨πŸ‡¦Canada joel_osc

    Also as an FYI we are experiencing performance issues due to revisions and wanted to use the well-written node_revision_delete module, but found that not only was multilingual problematic, things got worse with content moderation and trying to ensure that you don't prune forward revisions. Also, we had a requirement to preserve all historic 'published' revisions in all languages. We have contributed this module: https://www.drupal.org/project/cm_revision_delete β†’ to help meet these requirements. Cheers!

  • πŸ‡¬πŸ‡·Greece magtak

    I second @heddn 's concern about multilingual sites.
    While working on this bug I used both

    - node_revision_delete and
    - entity_reference_revisions (*)

    ... to prune both node revisions ( (*) but also "orphaned entities" that I believe do *not* get deleted when the node revision does) and it did remove revisions in a way that wouldn't make sense for the governance of multilingual sites.

  • πŸ‡§πŸ‡ͺBelgium rp7

    FWIW, I work on a high traffic Drupal installation & the DB CPU usage skyrocketed to 100% after applying the patch from https://git.drupalcode.org/project/drupal/-/merge_requests/715. It brought the DB (and the site) to a halt eventually. So be aware :-)

    This was (a part of) the query causing the high CPU usage (for some reason the AWS RDS UI won't show me the whole query):

    SELECT "base_table"."vid" AS "vid", "base_table"."nid" AS "nid"
    FROM
    "node_revision" "base_table"
    INNER JOIN "node" "node" ON "node"."nid" = "base_table"."nid"
    LEFT JOIN "content_moderation_state_field_data" "cmsfd" ON cmsfd.content_entity_type_id = 'node' AND cmsfd.content_entity_id = base_table.nid AND cmsfd.content_entity_revision_id = base_table.vid
    WHERE ("base_table"."vid" IN (SELECT MAX(base_table.vid) AS "expression"
    FROM
    "node_revision" "base_table"
    GROUP BY base_table.nid)) AND ("node"

    The project has about ~500K node revisions. It's multilingual & uses content moderation.

  • First commit to issue fork.
  • Pipeline finished with Failed
    about 2 months ago
    Total: 551s
    #285816
  • Pipeline finished with Failed
    about 2 months ago
    Total: 743s
    #285824
  • Pipeline finished with Success
    about 2 months ago
    Total: 545s
    #285883
  • Status changed to Needs review about 2 months ago
  • πŸ‡¨πŸ‡¦Canada pavlosdan

    Added merge request targeting 11.x and addressed the comments in #147.

  • πŸ‡§πŸ‡ͺBelgium wim leers Ghent πŸ‡§πŸ‡ͺπŸ‡ͺπŸ‡Ί

    @rp7 in #159:

    So probably there's an incompatibility that I haven't figured out yet. Just wanted to post this to inform people (which are doing custom stuff like above) to be cautious.

    Any chance you did figure that out? 🀞 If you were able to narrow it down to your hook_query_entity_reference_alter() implementation, then we should be able to proceed with the current MR (with caution), but if the root cause was the MR, then we'd need to rethink it.

    Additional information would be very valuable! πŸ™πŸ˜Š

  • πŸ‡ΊπŸ‡ΈUnited States smustgrave

    Issue summary should be using the standard issue template so it's clear for reviewers

    There are 3 MRs open which is to be reviewed?

  • It's really annoying, especially when using Scheduler Module.
    So, does anyone have any experience to share? Which MR should be considered?
    Any advice , please ? Thanks

Production build 0.71.5 2024