Improve performance of queries fetching latest node revisions by optimizing self-join logic in Entity\Query\Sql\Query::prepare()

Created on 3 December 2024, 23 days ago

Problem/Motivation

When saving or editing content with many revisions, especially for nodes with high activity (e.g., homepage of an editorial site), a query used to fetch the latest node revision performs poorly.

Example of the query observed in the slow query log:

SELECT "base_table"."vid" AS "vid", "base_table"."nid" AS "nid"
FROM
"node_revision" "base_table"
LEFT OUTER JOIN "node_revision" "base_table_2" ON "base_table"."nid" = "base_table_2"."nid" AND "base_table"."vid" < "base_table_2"."vid"
INNER JOIN "node_field_data" "node_field_data" ON "node_field_data"."nid" = "base_table"."nid"
WHERE ("base_table_2"."nid" IS NULL) AND ("node_field_data"."nid" = '1000029');

Performance metrics:

Query time: ~9.36 seconds
Rows examined: 77,943,855

Investigation revealed that even with proper indexes, the query remains inefficient due to the LEFT OUTER JOIN with a NULL check.

Steps to reproduce

  1. Use a Drupal site with nodes having a large number of revisions (e.g., 10,000+).
  2. Edit or save a highly revised node.
  3. Observe the slow performance and query execution times using MySQL's general_log or EXPLAIN.

Proposed resolution

Replace the LEFT OUTER JOIN with a NOT EXISTS pattern or use a subquery for fetching the latest revision.

Optimized query:

SELECT base_table.vid AS vid, base_table.nid AS nid
FROM node_revision base_table
INNER JOIN node_field_data ON node_field_data.nid = base_table.nid
WHERE node_field_data.nid = '1000029'
  AND base_table.vid = (
      SELECT MAX(vid)
      FROM node_revision
      WHERE nid = base_table.nid
  );

Performance metrics with the new query:

Query time: ~0.007 seconds
Rows examined: Reduced drastically

Proposed patch:
Modify \Drupal\Core\Entity\Query\Sql\Query::prepare() as follows:

// Add a self-join to the base revision table if we're querying only the
// latest revisions.
if ($this->latestRevision && $revision_field) {
  $this->sqlQuery->where("base_table.$revision_field = (
    SELECT MAX($revision_field)
    FROM $base_table
    WHERE $id_field = base_table.$id_field
  )");
}

Remaining tasks

  1. Generate patch.
  2. Test the patch in different scenarios:
    • Nodes with a small number of revisions.
    • Nodes with a large number of revisions.
  3. Evaluate the impact on sites with and without custom hook_query_alter() implementations.
  4. Review the patch for coding standards compliance.

User interface changes

None.

Introduced terminology

None.

API changes

None.

Data model changes

None.

Release notes snippet

Optimized queries fetching the latest node revisions to improve performance for nodes with many revisions. Replaced an inefficient LEFT OUTER JOIN with a subquery in \Drupal\Core\Entity\Query\Sql\Query::prepare().

🐛 Bug report
Status

Active

Version

10.5

Component

entity system

Created by

🇪🇸Spain nicobot Granada

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Merge Requests

Comments & Activities

Production build 0.71.5 2024