- Issue created by @TrevorBradley
- 🇨🇦Canada TrevorBradley
**MIGHT** be related to https://jira.mariadb.org/browse/MDEV-35886 , we're investigating.
I've got a problem I'm trying to diagnose, hoping to get a little help.
We had a website outage over the weekend. When the server died, the database SHOW FULL PROCESSLIST; showed the server locked on a DELETE FROM cache_render, followed by a number of waiting "INSERT INTO cache_data" queries:
Id User Host db Command Time State Info Progress
48 zabbix localhost NULL Sleep 34 NULL 0.000
1234482 REDACTED localhost REDACTED Query 6702 Updating DELETE FROM "cache_render" \nWHERE "created" <= '1739073503.459' 0.000
1234485 REDACTED localhost REDACTED Query 6698 Update INSERT INTO "cache_data" ("cid", "expire", "created", "tags", "checksum", "data", "serialized") VALUES ('route:[language]=en:[query_parameters]=:/REDACTED', '-1', '1739077810.561', 'route_match', '12327', 'a:3:{s:4:\\"path\\";s:33:\\"/REDACTED\\";s:5:\\"query\\";a:0:{}s:6:\\"routes\\";O:41:\\"Symfony\\\\Component\\\\Routing\\\\RouteCollection\\":4:{s:49:\\"\\0Symfony\\\\Component\\\\Routing\\\\RouteCollection\\0routes\\";a:0:{}s:50:\\"\\0Symfony\\\\Component\\\\Routing\\\\RouteCollection\\0aliases\\";a:0:{}s:52:\\"\\0Symfony\\\\Component\\\\Routing\\\\RouteCollection\\0resources\\";a:0:{}s:53:\\"\\0Symfony\\\\Component\\\\Routing\\\\RouteCollection\\0priorities\\";a:0:{}}}', '1') ON DUPLICATE KEY UPDATE "cid" = VALUES("cid"), "expire" = VALUES("expire"), "created" = VALUES("created"), "tags" = VALUES("tags"), "checksum" = VALUES("checksum"), "data" = VALUES("data"), "serialized" = VALUES("serialized") 0.000
1234486 REDACTED localhost REDACTED Query 6697 Statistics SELECT "cid", "data", "created", "expire", "serialized", "tags", "checksum" FROM "cache_data" WHERE "cid" IN ( 'route:[language]=en:[query_parameters]=REDACTED' ) ORDER BY "cid" 0.000
1234487 REDACTED localhost REDACTED Query 6696 Statistics SELECT "cid", "data", "created", "expire", "serialized", "tags", "checksum" FROM "cache_data" WHERE "cid" IN ( 'route:[language]=en:[query_parameters]=REDACTED' ) ORDER BY "cid" 0.000
And it goes on and on like that with about 25 stuck inserts into the cache_data table only.
We're running MariaDB, and the database is in READ COMMITTED mode.
The end of the innodb_engine_status.txt is interesting...
---TRANSACTION 544647116, ACTIVE 6933 sec fetching rows
mysql tables in use 1, locked 1
782 lock struct(s), heap size 90232, 5344 row lock(s), undo log entries 2672
MariaDB thread id 1234482, OS thread handle 140537398113984, query id 265541955 localhost REDACTED Updating
DELETE FROM "cache_render"
WHERE "created" <= '1739073503.459'
---TRANSACTION (0x7fd179119b80), not started
0 lock struct(s), heap size 1128, 0 row lock(s)
--------
FILE I/O
--------
Pending flushes (fsync): 0
32817243 OS file reads, 33500754 OS file writes, 15031130 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s
---
LOG
---
Log sequence number 2921588133075
Log flushed up to 2921588133075
Pages flushed up to 2921563678894
Last checkpoint at 2921563678894
----------------------
BUFFER POOL AND MEMORY
----------------------
Total large memory allocated 167772160
Dictionary memory allocated 1304616
Buffer pool size 8064
Free buffers 0
Database pages 8059
Old database pages 2964
Modified db pages 6209
Percent of dirty pages(LRU & free pages): 77.035
Max dirty pages percent: 90.000
Pending reads 0
Pending writes: LRU 8, flush list 7
Pages made young 31507660, not young 445152644
0.00 youngs/s, 0.00 non-youngs/s
Pages read 32817037, created 7325289, written 19677775
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 8059, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
21 read views open inside InnoDB
2 tablespace extents now reserved for B-tree split operations
state: sleeping
----------------------------
END OF INNODB MONITOR OUTPUT
============================
As the DB is in this locked status, the CPU usage on the DB is 0%.
I'm a bit stumped as to how to start diagnosing this. The cache_render and cache_data DBs don't appear to be overly large - maybe 10,000 entries. It's not clear why cache_data queries appear to be queueing up when cache_render is the only table that's locked.
My Drupal gut says "This is a simple query, against a table with light indexing, it shouldn't be an issue and therefore must be a DB stability issue", but I'm also assured the DB is stable and it's not the problem either.
Does anyone have ideas how to continue to diagnose this issue?
Are there any quick fixes we can do to decrease the chances of it happening again?
Active
10.3 ✨
other
**MIGHT** be related to https://jira.mariadb.org/browse/MDEV-35886 , we're investigating.