Problem/Motivation
When saving or editing content with many revisions, especially for nodes with high activity (e.g., homepage of an editorial site), a query used to fetch the latest node revision performs poorly.
Example of the query observed in the slow query log:
SELECT "base_table"."vid" AS "vid", "base_table"."nid" AS "nid"
FROM
"node_revision" "base_table"
LEFT OUTER JOIN "node_revision" "base_table_2" ON "base_table"."nid" = "base_table_2"."nid" AND "base_table"."vid" < "base_table_2"."vid"
INNER JOIN "node_field_data" "node_field_data" ON "node_field_data"."nid" = "base_table"."nid"
WHERE ("base_table_2"."nid" IS NULL) AND ("node_field_data"."nid" = '1000029');
Performance metrics:
Query time: ~9.36 seconds
Rows examined: 77,943,855
Investigation revealed that even with proper indexes, the query remains inefficient due to the LEFT OUTER JOIN with a NULL check.
Steps to reproduce
- Use a Drupal site with nodes having a large number of revisions (e.g., 10,000+).
- Edit or save a highly revised node.
- Observe the slow performance and query execution times using MySQL's general_log or EXPLAIN.
Proposed resolution
Replace the LEFT OUTER JOIN with a NOT EXISTS pattern or use a subquery for fetching the latest revision.
Optimized query:
SELECT base_table.vid AS vid, base_table.nid AS nid
FROM node_revision base_table
INNER JOIN node_field_data ON node_field_data.nid = base_table.nid
WHERE node_field_data.nid = '1000029'
AND base_table.vid = (
SELECT MAX(vid)
FROM node_revision
WHERE nid = base_table.nid
);
Performance metrics with the new query:
Query time: ~0.007 seconds
Rows examined: Reduced drastically
Proposed patch:
Modify \Drupal\Core\Entity\Query\Sql\Query::prepare() as follows:
// Add a self-join to the base revision table if we're querying only the
// latest revisions.
if ($this->latestRevision && $revision_field) {
$this->sqlQuery->where("base_table.$revision_field = (
SELECT MAX($revision_field)
FROM $base_table
WHERE $id_field = base_table.$id_field
)");
}
Remaining tasks
- Generate patch.
- Test the patch in different scenarios:
- Nodes with a small number of revisions.
- Nodes with a large number of revisions.
- Evaluate the impact on sites with and without custom hook_query_alter() implementations.
- Review the patch for coding standards compliance.
User interface changes
None.
Introduced terminology
None.
API changes
None.
Data model changes
None.
Release notes snippet
Optimized queries fetching the latest node revisions to improve performance for nodes with many revisions. Replaced an inefficient LEFT OUTER JOIN with a subquery in \Drupal\Core\Entity\Query\Sql\Query::prepare().