Entity queries querying the latest revision very slow with lots of revisions

Created on 7 March 2018, over 7 years ago
Updated 24 January 2023, over 2 years ago

After upgrading to 8.4 we noticed a performance decrease on one of our sites. The page that is taking minutes to load is a multilanguage page with several field collections and revisions enabled.

Some debugging lead me to the change in issue 2864995 β†’ .

This change introduces a latestRevision() method which is used by QuickEdit (this explains why it is only when logged in). On a field collection that has about 4000 revisions, this query takes seconds to run.

If I understand correctly the change introduces a self-join to the base revision table. Because of this, for each revision in the table, a set of revisions + next revisions is added to the result. From this result only the row that has no next revision (is null) is used (in my case 7000000 rows are searched for the one that has no next revisions).

Maybe we can think of a more efficient way of querying the latest revision?

Please do not create patches or re-roll previous patches. Use the Merge Request functionality β†’ .

πŸ› Bug report
Status

Needs work

Version

10.1 ✨

Component
EntityΒ  β†’

Last updated 10 days ago

Created by

πŸ‡³πŸ‡±Netherlands corneboele

Live updates comments and jobs are added and updated live.
  • Performance

    It affects performance. It is often combined with the Needs profiling tag.

  • Needs reroll

    The patch will have to be re-rolled with new suggestions/changes described in the comments in the issue.

Sign in to follow issues

Merge Requests

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • πŸ‡ΊπŸ‡ΈUnited States smustgrave

    This issue is being reviewed by the kind folks in Slack, #needs-review-queue-initiative. We are working to keep the size of Needs Review queue [2700+ issues] to around 400 (1 month or less), following Review a patch or merge request β†’ as a guide.

    @jonathan1055 is correct. If an issue is being worked on in the MR it just adds noise to continue with patches. MRs are also easier to review.

    At this time we will need a MR for D10 though addressing the open threads in the current. Recommend starting a new one vs updating the existing.

    Also starting in #134 the patch size almost tripled.

  • First commit to issue fork.
  • @lucassc opened merge request.
  • Status changed to Needs review over 2 years ago
  • πŸ‡§πŸ‡·Brazil lucassc Rio de Janeiro

    Hi! I opened a MR for 10.1.x, which was not available first from the "Create new branch" link at the top of the issue. So I followed this doc β†’ to add 10.1.x as the basis for my issue branch 2950869-optimizing-revision-queries.

    Not sure if this is the correct way, I'm novice.

    Please, review.

  • πŸ‡§πŸ‡·Brazil lucassc Rio de Janeiro

    We had comments in the old MR that still need to be discussed.

    core/lib/Drupal/Core/Entity/Query/Sql/Query.php:

    // Add a self-join to the base revision table if we're querying only the
    // latest revisions.
    

    @daffie said "This comment needs an update as it no longer reflects what we are doing."

    core/tests/Drupal/KernelTests/Core/Entity/EntityQueryTest.php:

    $results = $this->queryResults = $this->storage
      ->getQuery()
      ->latestRevision()
      ->notExists("$figures.color")
      ->accessCheck(TRUE)
      ->execute();
    $expected = [16 => '4', 8 => '8', 20 => '12'];
    $this->assertSame($expected, $results);
    
    // Update an entity.
    $entity = EntityTestMulRev::load(4);
    $entity->setNewRevision();
    $entity->$figures->color = 'red';
    $entity->save();
    
    $results = $this->queryResults = $this->storage
      ->getQuery()
      ->latestRevision()
      ->notExists("$figures.color")
      ->accessCheck(TRUE)
      ->execute();
    $expected = [8 => '8', 20 => '12'];
    $this->assertSame($expected, $results);
    

    @daffie said: "We need to do a change that keeps it in the result and therefore we see the changed revision id. The current change removes it from the result."

    @neclimdul replied "I'm not sure what this is asking. What result would we be able to see the revision ID in?"

  • πŸ‡¬πŸ‡§United Kingdom catch

    Now that quickedit is no longer in core, this is less likely to cause timeouts during normal site operation - so moving back down to major, but of course we should still fix it.

  • Status changed to Needs work over 2 years ago
  • πŸ‡ΊπŸ‡ΈUnited States smustgrave

    Removing reroll as @lucassc you seemed to get it correct.

    But there does appear to be a test failure.

  • πŸ‡¬πŸ‡·Greece magtak

    I work for Acquia Support and I came across this issue while debugging a use case where all the revisionable entities (Site Studio layout_canvas entities) were set to be sequentially re-saved (Site Studio rebuild).

    Looking at the query,

    ...LEFT OUTER JOIN "cohesion_layout_revision" "base_table_2" ON "base_table"."id" = "base_table_2"."id" AND "base_table"."revision" < "base_table_2"."revision"...
    

    The LEFT OUTER JOIN and the '<' conditional operator will match rows proportionate to the number of revisions an ID has, this will create new rows in an exponential way.

    For the case I was looking at in particular, a 10h+ operation that resulted in 1000+ second SQL queries at times which locked the revisions table and thus blocked any content editing, finished in about 35 minutes with the patch applied with barely any queries taking over 1 second. The items generating those 1000+ second queries were processed instantly.

    I checked our fleet's logs and there are multiple sites impacted by this as I can see thousands of slow ( > 1s) queries generated by the code above.

    Patch #133 worked for the client I was working on given their Drupal version (9.5.x).

    While we will be advising clients to use patches from this thread if they encounter this issue, this fix should be high priority and should be backported to all affected Drupal branches (8, 9, 10).

  • πŸ‡§πŸ‡ͺBelgium Dries

    Great to see we are getting closer to a solution.

    In addition to improving the queries, it might not be a bad idea to show a warning on /admin/reports/status when there are entities with more than $number revisions. At a minimum, we should inform a site administrator about this.

    As a next step, I would not be opposed to a new feature in core to prune revision after $number days/weeks/months/year/never. We should tackle that in a separate issue though.

  • πŸ‡©πŸ‡ͺGermany Anybody Porta Westfalica

    Hey @Dries, cool to have your feedback here! :)
    +1 on that (while the limit should be configurable in config, at least in settings.php)

    I guess that should be combined with the work from: #2925532: Provide bulk action to delete all old revisions of an entity β†’
    #53 β†’ there please.

    So if you create or find a task for that revision prune in core, please link both. It would really make a lot of sense to at least have a full-featured Entity revisioning API in core. So contrib can implement more specific concepts, using a consistent API for all entity types. Like bulk deletion manually (checkboxes in revisions tab and as views bulk operation), by age, by limit, etc. if that's too specific for core.

    Thanks :)

  • heddn Nicaragua

    As FYI, https://www.drupal.org/project/node_revision_delete β†’ has this feature in contrib. The hardest part of removing revisions is when the site has multiple languages installed. The way revisions are stored w/ multiple languages is complicated to say the least. You can't just remove the last 5 revisions of a node. Because you'll loose a revision that is published as an alternative language.

  • πŸ‡¨πŸ‡¦Canada joel_osc

    Also as an FYI we are experiencing performance issues due to revisions and wanted to use the well-written node_revision_delete module, but found that not only was multilingual problematic, things got worse with content moderation and trying to ensure that you don't prune forward revisions. Also, we had a requirement to preserve all historic 'published' revisions in all languages. We have contributed this module: https://www.drupal.org/project/cm_revision_delete β†’ to help meet these requirements. Cheers!

  • πŸ‡¬πŸ‡·Greece magtak

    I second @heddn 's concern about multilingual sites.
    While working on this bug I used both

    - node_revision_delete and
    - entity_reference_revisions (*)

    ... to prune both node revisions ( (*) but also "orphaned entities" that I believe do *not* get deleted when the node revision does) and it did remove revisions in a way that wouldn't make sense for the governance of multilingual sites.

  • πŸ‡§πŸ‡ͺBelgium rp7

    FWIW, I work on a high traffic Drupal installation & the DB CPU usage skyrocketed to 100% after applying the patch from https://git.drupalcode.org/project/drupal/-/merge_requests/715. It brought the DB (and the site) to a halt eventually. So be aware :-)

    This was (a part of) the query causing the high CPU usage (for some reason the AWS RDS UI won't show me the whole query):

    SELECT "base_table"."vid" AS "vid", "base_table"."nid" AS "nid"
    FROM
    "node_revision" "base_table"
    INNER JOIN "node" "node" ON "node"."nid" = "base_table"."nid"
    LEFT JOIN "content_moderation_state_field_data" "cmsfd" ON cmsfd.content_entity_type_id = 'node' AND cmsfd.content_entity_id = base_table.nid AND cmsfd.content_entity_revision_id = base_table.vid
    WHERE ("base_table"."vid" IN (SELECT MAX(base_table.vid) AS "expression"
    FROM
    "node_revision" "base_table"
    GROUP BY base_table.nid)) AND ("node"

    The project has about ~500K node revisions. It's multilingual & uses content moderation.

  • First commit to issue fork.
  • Pipeline finished with Failed
    about 1 year ago
    Total: 551s
    #285816
  • Pipeline finished with Failed
    about 1 year ago
    Total: 743s
    #285824
  • Pipeline finished with Success
    about 1 year ago
    Total: 545s
    #285883
  • Status changed to Needs review about 1 year ago
  • πŸ‡¨πŸ‡¦Canada pavlosdan

    Added merge request targeting 11.x and addressed the comments in #147.

  • πŸ‡§πŸ‡ͺBelgium wim leers Ghent πŸ‡§πŸ‡ͺπŸ‡ͺπŸ‡Ί

    @rp7 in #159:

    So probably there's an incompatibility that I haven't figured out yet. Just wanted to post this to inform people (which are doing custom stuff like above) to be cautious.

    Any chance you did figure that out? 🀞 If you were able to narrow it down to your hook_query_entity_reference_alter() implementation, then we should be able to proceed with the current MR (with caution), but if the root cause was the MR, then we'd need to rethink it.

    Additional information would be very valuable! πŸ™πŸ˜Š

  • πŸ‡ΊπŸ‡ΈUnited States smustgrave

    Issue summary should be using the standard issue template so it's clear for reviewers

    There are 3 MRs open which is to be reviewed?

  • It's really annoying, especially when using Scheduler Module.
    So, does anyone have any experience to share? Which MR should be considered?
    Any advice , please ? Thanks

  • πŸ‡΅πŸ‡±Poland dmitry.korhov Poland, Warsaw
  • πŸ‡ΊπŸ‡ΈUnited States smustgrave

    There are multiple MRs up, not super clear which is to be reviewed.

    Appears to still need an issue summary update too.

  • πŸ‡ͺπŸ‡ΈSpain nicobot Granada

    I encountered the same issue and initially created a duplicate here: Issue #3491274 πŸ› Improve performance of queries fetching latest node revisions by optimizing self-join logic in Entity\Query\Sql\Query::prepare() Active .

    After investigating, I reached the same conclusion regarding updating \Drupal\Core\Entity\Query\Sql\Query::prepare() with the following implementation:

    // Add a self-join to the base revision table if we're querying only the
    // latest revisions.
    if ($this->latestRevision && $revision_field) {
      $this->sqlQuery->where("base_table.$revision_field = (
        SELECT MAX($revision_field)
        FROM $base_table
        WHERE $id_field = base_table.$id_field
      )");
    }
    

    The merge request (MR) containing my changes is available here: MR #10437.

    This change has been applied to a live site and has significantly improved query performance, reducing execution time from nearly 10 seconds to 0.007 seconds when working with nodes that have over 13,000 revisions.

    However, I think the implementation in the proposed MR #9523 is cleaner and more maintainable. That said, I am uncertain about the connection between the primary performance issue and the changes in ContentEntityStorageBase.php in MR #9523.

    It might be worth exploring whether these changes could be addressed in a separate issue, as their relationship to the main performance problem isn't entirely clear (to me). IMHO this could help avoid blocking progress on resolving the current issue.

    Thank you all

  • Status changed to Needs work 8 months ago
  • Has anyone updated the patch for Drupal 10.4.x?
    2950869-139.patch πŸ› Entity queries querying the latest revision very slow with lots of revisions Needs work fails on 10.4.1.

    Thanks

  • πŸ‡¬πŸ‡§United Kingdom alexpott πŸ‡ͺπŸ‡ΊπŸŒ

    I'm hiding the old 10.x and earlier branches as they are not ever going to land and focussing on the 11.x branch.

  • πŸ‡¬πŸ‡§United Kingdom alexpott πŸ‡ͺπŸ‡ΊπŸŒ

    alexpott β†’ changed the visibility of the branch 2950869-optimizing-revision-queries to hidden.

  • πŸ‡¬πŸ‡§United Kingdom alexpott πŸ‡ͺπŸ‡ΊπŸŒ

    alexpott β†’ changed the visibility of the branch 10.1.x to hidden.

  • πŸ‡¬πŸ‡§United Kingdom alexpott πŸ‡ͺπŸ‡ΊπŸŒ

    alexpott β†’ changed the visibility of the branch 2950869-entity-queries-querying to hidden.

  • πŸ‡¬πŸ‡§United Kingdom alexpott πŸ‡ͺπŸ‡ΊπŸŒ

    Updated issue credit.

  • πŸ‡¬πŸ‡§United Kingdom alexpott πŸ‡ͺπŸ‡ΊπŸŒ

    Fixed the issue summary.

  • Pipeline finished with Failed
    about 1 month ago
    Total: 949s
    #566020
  • Pipeline finished with Failed
    about 1 month ago
    Total: 499s
    #566055
  • πŸ‡¬πŸ‡§United Kingdom alexpott πŸ‡ͺπŸ‡ΊπŸŒ

    Given we can't do the optimisation in \Drupal\Core\Entity\ContentEntityStorageBase::getLatestRevisionId() because of workspaces. We can just do the better approach in \Drupal\Core\Entity\Query\Sql\Query::prepare()

    Updated the issue summary to reflect this.

  • πŸ‡¬πŸ‡§United Kingdom alexpott πŸ‡ͺπŸ‡ΊπŸŒ

    Also given this affects param conversion and is used by workspaces I get the feeling this is actually a critical - we can also see the query in the performance tests.

  • Pipeline finished with Success
    about 1 month ago
    Total: 1809s
    #566081
  • πŸ‡¬πŸ‡§United Kingdom alexpott πŸ‡ͺπŸ‡ΊπŸŒ

    Editing a node with 4000 revisions... before the GET on node/N/edit takes 976ms afterwards 226ms... we also have autosave enabled... before the posts to autosave take 730ms and afterwards they take 50ms... YES nearly the entire time on an autosave is spent doing param conversion to get the latest revision ID... lolz...

    In my opinion this is RTBC and given the code is @fabianx's and only the test is mine I think I'm in a good place to RTBC this.

  • πŸ‡¬πŸ‡§United Kingdom alexpott πŸ‡ͺπŸ‡ΊπŸŒ
  • πŸ‡¬πŸ‡§United Kingdom alexpott πŸ‡ͺπŸ‡ΊπŸŒ
    • catch β†’ committed 6528a0f3 on 11.2.x
      Issue #2950869 by amateescu, alexpott, fabianx, lucassc, abhishek-anand...
    • catch β†’ committed b450d189 on 11.x
      Issue #2950869 by amateescu, alexpott, fabianx, lucassc, abhishek-anand...
  • πŸ‡¬πŸ‡§United Kingdom catch

    Opened πŸ“Œ Revisit latest revision / ::getActive() param conversion behaviour Active too to revisit the logic that causes this to be called so often (and also causes various other problems).

    Committed/pushed to 11.x and cherry-picked to 11.2.x, thanks!

  • πŸ‡¬πŸ‡§United Kingdom alexpott πŸ‡ͺπŸ‡ΊπŸŒ

    Going to do a 10.x backport - I think this is worth it.

  • Merge request !12973Backport 2950869 β†’ (Closed) created by alexpott
  • πŸ‡¬πŸ‡§United Kingdom alexpott πŸ‡ͺπŸ‡ΊπŸŒ

    I've backported the fix to 10.x by cherry-picking the 11.x commit.

  • Pipeline finished with Failed
    about 1 month ago
    #570988
    • catch β†’ committed ad6d4462 on 10.5.x
      Issue #2950869 by amateescu, alexpott, fabianx, lucassc, abhishek-anand...
    • catch β†’ committed 88413b86 on 10.6.x
      Issue #2950869 by amateescu, alexpott, fabianx, lucassc, abhishek-anand...
  • πŸ‡¬πŸ‡§United Kingdom catch

    I've gone ahead and backported this to 10.6.x and 10.5.x, was borderline on 10.5.x but there's literally no API change or behaviour change here it's just improving the performance of a very specific database query.

  • πŸ‡³πŸ‡ΏNew Zealand quietone

    mondrake reported in Slack that testing is failing on HEAD for PostgreSQL and SQLite.

    I git bisected with PostgreSQL and found that this issue caused the problem.

  • πŸ‡¬πŸ‡§United Kingdom alexpott πŸ‡ͺπŸ‡ΊπŸŒ

    Opened πŸ› Fix EntityQueryTest on SQLite and Postgres Active to fix Postgres and SQLite

  • πŸ‡¬πŸ‡§United Kingdom alexpott πŸ‡ͺπŸ‡ΊπŸŒ

    @quietone thanks for the bisect. Went ahead with πŸ› Fix EntityQueryTest on SQLite and Postgres Active so we can close this one again.

  • Automatically closed - issue fixed for 2 weeks with no activity.

  • Pipeline finished with Failed
    1 day ago
    Total: 643s
    #602254
Production build 0.71.5 2024