Simple "DELETE from cache_render" causing database deadlock

Created on 10 February 2025, 5 months ago

I've got a problem I'm trying to diagnose, hoping to get a little help.

We had a website outage over the weekend. When the server died, the database SHOW FULL PROCESSLIST; showed the server locked on a DELETE FROM cache_render, followed by a number of waiting "INSERT INTO cache_data" queries:

Id	User	Host	db	Command	Time	State	Info	Progress
48	zabbix	localhost	NULL	Sleep	34		NULL	0.000
1234482	REDACTED	localhost	REDACTED	Query	6702	Updating	DELETE FROM "cache_render" \nWHERE "created" <= '1739073503.459'	0.000
1234485	REDACTED	localhost	REDACTED	Query	6698	Update	INSERT INTO "cache_data" ("cid", "expire", "created", "tags", "checksum", "data", "serialized") VALUES ('route:[language]=en:[query_parameters]=:/REDACTED', '-1', '1739077810.561', 'route_match', '12327', 'a:3:{s:4:\\"path\\";s:33:\\"/REDACTED\\";s:5:\\"query\\";a:0:{}s:6:\\"routes\\";O:41:\\"Symfony\\\\Component\\\\Routing\\\\RouteCollection\\":4:{s:49:\\"\\0Symfony\\\\Component\\\\Routing\\\\RouteCollection\\0routes\\";a:0:{}s:50:\\"\\0Symfony\\\\Component\\\\Routing\\\\RouteCollection\\0aliases\\";a:0:{}s:52:\\"\\0Symfony\\\\Component\\\\Routing\\\\RouteCollection\\0resources\\";a:0:{}s:53:\\"\\0Symfony\\\\Component\\\\Routing\\\\RouteCollection\\0priorities\\";a:0:{}}}', '1') ON DUPLICATE KEY UPDATE "cid" = VALUES("cid"), "expire" = VALUES("expire"), "created" = VALUES("created"), "tags" = VALUES("tags"), "checksum" = VALUES("checksum"), "data" = VALUES("data"), "serialized" = VALUES("serialized")	0.000
1234486	REDACTED	localhost	REDACTED	Query	6697	Statistics	SELECT "cid", "data", "created", "expire", "serialized", "tags", "checksum" FROM "cache_data" WHERE "cid" IN ( 'route:[language]=en:[query_parameters]=REDACTED' ) ORDER BY "cid"	0.000
1234487 REDACTED	localhost	REDACTED	Query	6696	Statistics	SELECT "cid", "data", "created", "expire", "serialized", "tags", "checksum" FROM "cache_data" WHERE "cid" IN ( 'route:[language]=en:[query_parameters]=REDACTED' ) ORDER BY "cid"	0.000

And it goes on and on like that with about 25 stuck inserts into the cache_data table only.

We're running MariaDB, and the database is in READ COMMITTED mode.

The end of the innodb_engine_status.txt is interesting...

---TRANSACTION 544647116, ACTIVE 6933 sec fetching rows
mysql tables in use 1, locked 1
782 lock struct(s), heap size 90232, 5344 row lock(s), undo log entries 2672
MariaDB thread id 1234482, OS thread handle 140537398113984, query id 265541955 localhost REDACTED Updating
DELETE FROM "cache_render" 
WHERE "created" <= '1739073503.459'
---TRANSACTION (0x7fd179119b80), not started
0 lock struct(s), heap size 1128, 0 row lock(s)
--------
FILE I/O
--------
Pending flushes (fsync): 0
32817243 OS file reads, 33500754 OS file writes, 15031130 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s
---
LOG
---
Log sequence number 2921588133075
Log flushed up to   2921588133075
Pages flushed up to 2921563678894
Last checkpoint at  2921563678894
----------------------
BUFFER POOL AND MEMORY
----------------------
Total large memory allocated 167772160
Dictionary memory allocated 1304616
Buffer pool size   8064
Free buffers       0
Database pages     8059
Old database pages 2964
Modified db pages  6209
Percent of dirty pages(LRU & free pages): 77.035
Max dirty pages percent: 90.000
Pending reads 0
Pending writes: LRU 8, flush list 7
Pages made young 31507660, not young 445152644
0.00 youngs/s, 0.00 non-youngs/s
Pages read 32817037, created 7325289, written 19677775
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 8059, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
21 read views open inside InnoDB
2 tablespace extents now reserved for B-tree split operations
state: sleeping
----------------------------
END OF INNODB MONITOR OUTPUT
============================

As the DB is in this locked status, the CPU usage on the DB is 0%.

I'm a bit stumped as to how to start diagnosing this. The cache_render and cache_data DBs don't appear to be overly large - maybe 10,000 entries. It's not clear why cache_data queries appear to be queueing up when cache_render is the only table that's locked.

My Drupal gut says "This is a simple query, against a table with light indexing, it shouldn't be an issue and therefore must be a DB stability issue", but I'm also assured the DB is stable and it's not the problem either.

Does anyone have ideas how to continue to diagnose this issue?

Are there any quick fixes we can do to decrease the chances of it happening again?

💬 Support request
Status

Active

Version

10.3

Component

other

Created by

🇨🇦Canada TrevorBradley

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Comments & Activities

Production build 0.71.5 2024