Problem/Motivation
Our site uses drush to call scheduler's lightweight cron every minute, to publish scheduled content.
We have recently noticed that we are getting entries in our mysql slow query log every time it runs, due to the query in SchedulerManager::publish().
Example of the slow query:
# Query_time: 12.723781 Lock_time: 0.000127 Rows_sent: 17 Rows_examined: 35198933
SET timestamp=1671124649;
SELECT "base_table"."vid" AS "vid", "base_table"."nid" AS "nid", "node_field_revision_2"."publish_on" AS "publish_on"
FROM
"node_revision" "base_table"
LEFT OUTER JOIN "node_revision" "base_table_2" ON "base_table"."nid" = "base_table_2"."nid" AND "base_table"."vid" < "base_table_2"."vid"
INNER JOIN "node_field_revision" "node_field_revision" ON "node_field_revision"."vid" = "base_table"."vid"
INNER JOIN "node_field_data" "node_field_data" ON "node_field_data"."nid" = "base_table"."nid"
LEFT JOIN "node_field_revision" "node_field_revision_2" ON "node_field_revision_2"."vid" = "base_table"."vid"
WHERE ("base_table_2"."nid" IS NULL) AND ("node_field_revision"."publish_on" IS NOT NULL) AND ("node_field_revision"."publish_on" <= '1671124636') AND ("node_field_data"."type" IN ('announcement', 'campaign', 'campaign_case_study', 'consultation', 'detailed_guide', 'guide', 'html_document', 'html_document_contents', 'page', 'publication', 'stats_announcement', 'stats_background', 'stats_html_data', 'stats_release', 'stats_series'))
ORDER BY "node_field_revision_2"."publish_on" ASC, "base_table"."nid" ASC;
Running the query through EXPLAIN gives us:
+----+-------------+-----------------------+------------+--------+--------------------------------------------------------------------------+-----------+---------+---------------------------------+---------+----------+----------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------------------+------------+--------+--------------------------------------------------------------------------+-----------+---------+---------------------------------+---------+----------+----------------------------------------------+
| 1 | SIMPLE | node_field_revision | NULL | ALL | PRIMARY | NULL | NULL | NULL | 1346298 | 30.00 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | base_table | NULL | eq_ref | PRIMARY,node__nid | PRIMARY | 4 | drupal8.node_field_revision.vid | 1 | 100.00 | NULL |
| 1 | SIMPLE | node_field_revision_2 | NULL | ref | PRIMARY | PRIMARY | 4 | drupal8.node_field_revision.vid | 1 | 100.00 | NULL |
| 1 | SIMPLE | node_field_data | NULL | ref | PRIMARY,node__id__default_langcode__langcode,node_field__type__target_id | PRIMARY | 4 | drupal8.base_table.nid | 1 | 100.00 | Using where |
| 1 | SIMPLE | base_table_2 | NULL | ref | PRIMARY,node__nid | node__nid | 4 | drupal8.base_table.nid | 17 | 100.00 | Using where; Not exists; Using index |
+----+-------------+-----------------------+------------+--------+--------------------------------------------------------------------------+-----------+---------+---------------------------------+---------+----------+----------------------------------------------+
Notice there is no key used on node_field_revision. I tried adding a key on publish_on, which then gets used by the query, but it doesn't seem to make a noticeable improvement in query speed.
Can we optimise the query or tables somehow to reduce the query time?
Steps to reproduce
Run drush scheduler-cron on a site with more than 100,000 revisions.