Active trail gets corrupted in cache_menu for menus with numeric machine names

Created on 4 November 2023
Updated 16 February 2024

Problem/Motivation

We have a site where some menus have numeric machine names (e.g. the menu title is 'Transport', but its machine name is '583'). I don't know the reason for this (they were probably migrated, or created via code to match some legacy system), but such menus can also easily be created via the menu UI.
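PHP stores any array key that is a canonical decimal integer string as an integer, which is why a machine name such as '583' (or 2 and 4 in the steps below) shows up as an integer key in the cached data. The Python sketch below models that casting rule. `php_array_key` is a hypothetical helper written for illustration, not a PHP or Drupal API, and the sketch assumes a 64-bit PHP build.

```python
import re
from typing import Union

# Canonical decimal integers: "0", "583", "-5" (but not "08", "+5", "-0" or "1.5").
_PHP_INT_KEY = re.compile(r"0|-?[1-9][0-9]*")
# Assumption: a 64-bit PHP build, where PHP_INT_MAX is 2**63 - 1.
PHP_INT_MIN, PHP_INT_MAX = -(2**63), 2**63 - 1


def php_array_key(key: str) -> Union[int, str]:
    """Hypothetical helper: return the key as PHP would store it in an array."""
    if _PHP_INT_KEY.fullmatch(key):
        value = int(key)
        if PHP_INT_MIN <= value <= PHP_INT_MAX:
            return value  # e.g. the machine name "583" becomes the int key 583
    return key  # e.g. "main" or "Transport" stays a string key


for machine_name in ["main", "account", "583", "2", "4", "08"]:
    print(repr(machine_name), "->", repr(php_array_key(machine_name)))
```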

We noticed occasional problems with these menus, which caused them to disappear from pages where they would normally appear. After much debugging, I found that the problem comes from corrupted entries in cache_menu (whose cid begins with active-trail:...), which affect menus with number-like machine names.

This issue has happened to us on multiple Drupal 9.x versions. I was able to replicate and debug it on Drupal 9.5.11, and then replicate it on 10.1.6 as well.

In the following steps to reproduce this bug, I used two menus with the machine names 2 and 4. Note that the bug does not depend on these particular values.

Steps to reproduce

1. install a fresh Drupal site (standard profile is enough)
2. create a basic page (e.g. /node/1)
3. create two menus with numeric machine names (2 and 4, with any titles you want), each containing one link:

  2
    link 2.1 (pointing to any url)
  4
    link 4.1 (pointing to any url)

4. place two blocks (e.g. in the content region) to display these two menus on /node/1 (i.e. with Restrict to certain pages set to /node/1)
5. go to /node/1 and confirm the menus are showing
6. check the cache_menu table and look at the active-trail entry for that page:

drush sql-query "select data from cache_menu where cid='active-trail:route:entity.node.canonical:route_parameters:a:1:{s:4:\"node\";s:1:\"1\";}'"

You should see something like this:

a:4:{s:4:"main";a:2:{s:54:"menu_link_content:53f35910-253f-4c3f-9089-abd5884416a3";s:54:"menu_link_content:53f35910-253f-4c3f-9089-abd5884416a3";s:0:"";s:0:"";}s:7:"account";a:1:{s:0:"";s:0:"";}i:2;a:1:{s:0:"";s:0:"";}i:4;a:1:{s:0:"";s:0:"";}}

Note that the last two entries in this serialized data are i:2 and i:4 (i.e. the machine names of the menus, converted to integer keys).
7. rebuild the cache using drush cr, or at least clear these bins: drush cc bin menu render page dynamic_page_cache
8. send multiple simultaneous requests for the /node/1 page, either from a browser by pressing F5 several times very quickly or (better) with commands like these:

wget -qO /dev/null https://drupal.sandbox.local/node/1 &
wget -qO /dev/null https://drupal.sandbox.local/node/1 &
wget -qO /dev/null https://drupal.sandbox.local/node/1 &
wget -qO /dev/null https://drupal.sandbox.local/node/1 &
wget -qO /dev/null https://drupal.sandbox.local/node/1 &
wget -qO /dev/null https://drupal.sandbox.local/node/1 &

9. check the cache_menu table again:

drush sql-query "select data from cache_menu where cid='active-trail:route:entity.node.canonical:route_parameters:a:1:{s:4:\"node\";s:1:\"1\";}'"

This time you will see something like this:

a:6:{s:4:"main";a:1:{s:0:"";s:0:"";}s:7:"account";a:1:{s:0:"";s:0:"";}i:0;a:1:{s:0:"";s:0:"";}i:1;a:1:{s:0:"";s:0:"";}i:2;a:1:{s:0:"";s:0:"";}i:3;a:1:{s:0:"";s:0:"";}}

Note that i:2 and i:4 have been renumbered (by array_merge here https://github.com/drupal/core/blob/11.x/lib/Drupal/Core/Cache/CacheColl...) into i:0 and i:1, and also duplicated as i:2 and i:3. This is the documented behavior of array_merge (https://www.php.net/manual/en/function.array-merge.php): "Values in the input arrays with numeric keys will be renumbered with incrementing keys starting from zero in the result array." Basically, two copies of the active trail (one created by the current request and another cached microseconds earlier by another request) got merged together: the menus with text-like machine names kept a single entry each, but the menus with numeric machine names were renumbered and duplicated.

If you don't see this, clear the cache again and rerun the wget commands (perhaps adding a few more to simulate a busier site).
Depending on the site load (i.e. the number of simultaneous requests that don't find the active-trail cache entry) and the number of menus with numeric machine names, you might see tens or hundreds of such entries.
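The corruption in step 9 can be modeled in a few lines of Python. This is a sketch, not Drupal code: `php_array_merge` is a hypothetical helper that imitates PHP's array_merge() on insertion-ordered dicts, and the scenario assumes that two requests both missed the active-trail cache, resolved all four menus, and saved one after the other.

```python
from typing import Dict, Union

Key = Union[int, str]


def php_array_merge(*arrays: Dict[Key, object]) -> Dict[Key, object]:
    """Model of PHP's array_merge(): a later string key overwrites the earlier
    value in place, while every integer-keyed value is appended under a fresh
    key 0, 1, 2, ... (Python dicts keep insertion order, like PHP arrays)."""
    result: Dict[Key, object] = {}
    next_index = 0
    for array in arrays:
        for key, value in array.items():
            if isinstance(key, int):
                result[next_index] = value
                next_index += 1
            else:
                result[key] = value
    return result


EMPTY_TRAIL = {"": ""}  # a trail holding only the root entry, as in the dumps

# Entry written by request A, which missed the cache and resolved all four menus.
cached = {"main": EMPTY_TRAIL, "account": EMPTY_TRAIL, 2: EMPTY_TRAIL, 4: EMPTY_TRAIL}
# Request B also missed the cache, resolved the same keys and saves after A.
request_b = {"main": EMPTY_TRAIL, "account": EMPTY_TRAIL, 2: EMPTY_TRAIL, 4: EMPTY_TRAIL}

merged = php_array_merge(cached, request_b)
print(list(merged))  # ['main', 'account', 0, 1, 2, 3]
```

In this model key 4 no longer exists after the merge, so a later request asking for menu 4 would miss the cache again.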

Moreover, the problem grows even further whenever entries in the dynamic page cache expire: each request that runs into this case copies the numeric entries again. The steps to replicate this are (run them multiple times):

1. Use the steps from Phase 1 (the steps to reproduce above) to corrupt the cache.
2. Clear the page caches (don't clear the entire cache, as that would undo the previous step):

drush cc bin dynamic_page_cache page

3. Load the /node/1 page (a single request is enough; parallel requests aren't needed):

wget -qO /dev/null https://drupal.sandbox.local/node/1

4. View the cached entry:

drush sql-query "select data from cache_menu where cid='active-trail:route:entity.node.canonical:route_parameters:a:1:{s:4:\"node\";s:1:\"1\";}'"

You will see this:

a:7:{s:4:"main";a:1:{s:0:"";s:0:"";}s:7:"account";a:1:{s:0:"";s:0:"";}i:0;a:1:{s:0:"";s:0:"";}i:1;a:1:{s:0:"";s:0:"";}i:2;a:1:{s:0:"";s:0:"";}i:3;a:1:{s:0:"";s:0:"";}i:4;a:1:{s:0:"";s:0:"";}}

5. Repeat the previous three steps and you will see this:

a:8:{s:4:"main";a:1:{s:0:"";s:0:"";}s:7:"account";a:1:{s:0:"";s:0:"";}i:0;a:1:{s:0:"";s:0:"";}i:1;a:1:{s:0:"";s:0:"";}i:2;a:1:{s:0:"";s:0:"";}i:3;a:1:{s:0:"";s:0:"";}i:4;a:1:{s:0:"";s:0:"";}i:5;a:1:{s:0:"";s:0:"";}}

6. Keep repeating the previous step and you will see the cached entry accumulate more and more numeric entries.
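The a:6 to a:7 transition in step 4 can be sketched with a small Python model of PHP's array_merge() (`php_array_merge` is a hypothetical helper, not Drupal code). The sketch assumes that the single request finds key 2 in the corrupted entry but misses key 4, so key 4 is the only one it resolves and persists.

```python
from typing import Dict, Union

Key = Union[int, str]


def php_array_merge(*arrays: Dict[Key, object]) -> Dict[Key, object]:
    """Model of PHP's array_merge(): string keys are overwritten in place,
    integer-keyed values are re-appended under keys 0, 1, 2, ..."""
    result: Dict[Key, object] = {}
    next_index = 0
    for array in arrays:
        for key, value in array.items():
            if isinstance(key, int):
                result[next_index] = value
                next_index += 1
            else:
                result[key] = value
    return result


EMPTY = {"": ""}

# The corrupted a:6 entry from Phase 1: keys 2 and 4 renumbered to 0..3.
cached = {"main": EMPTY, "account": EMPTY, 0: EMPTY, 1: EMPTY, 2: EMPTY, 3: EMPTY}

# Assumption: the next request still finds key 2 (now a renumbered copy) but
# misses key 4, so the only key it resolves and persists is 4.
persisted = {4: EMPTY}

merged = php_array_merge(cached, persisted)
print(len(merged), list(merged))  # 7 ['main', 'account', 0, 1, 2, 3, 4]
```

This sketch only covers that first step; it does not attempt to explain every detail of the further growth reported in step 5.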

Eventually, if the site runs long enough with a corrupted cache, the bad entries start to overwrite the correct entries of the numeric menus, causing various problems with them (e.g. menus disappearing from site pages). This can be replicated like this:

1. edit the Basic page content type to allow adding its nodes to menus 2 and 4
2. edit node/1 and add a menu entry for it in menu 4
3. edit menu 4 to look like this (link 4.1 is a child of the node 1 link)

  4
    node 1 (added in step 2 before)
      link 4.1 (pointing to any url)

4. edit the block that displays menu 4 and set "Initial visibility level" to 2
5. clear the cache (drush cr)
6. visit /node/1 and check that link 4.1 (from menu 4) shows on that page
7. Look at the cached entry:

drush sql-query "select data from cache_menu where cid='active-trail:route:entity.node.canonical:route_parameters:a:1:{s:4:\"node\";s:1:\"1\";}'"

You will see something like this (with different UUIDs):

a:4:{s:4:"main";a:1:{s:0:"";s:0:"";}s:7:"account";a:1:{s:0:"";s:0:"";}i:2;a:1:{s:0:"";s:0:"";}i:4;a:2:{s:54:"menu_link_content:e7935b9c-1992-4752-931c-e017efba0be0";s:54:"menu_link_content:e7935b9c-1992-4752-931c-e017efba0be0";s:0:"";s:0:"";}}

Notice that the value for i:4 (menu 4) is
i:4;a:2:{s:54:"menu_link_content:e7935b9c-1992-4752-931c-e017efba0be0";s:54:"menu_link_content:e7935b9c-1992-4752-931c-e017efba0be0";s:0:"";s:0:"";}
because the node 1 link (added in step 2) is in the active trail of menu 4.
8. corrupt the cache using steps 7-8 from Phase 1 above, but with more wget commands (I used 30)
9. visit /node/1 and check that link 4.1 has disappeared from the page (you might need to run the previous step multiple times to make this happen, or just use more wget commands).
10. Look at the cached entry:

drush sql-query "select data from cache_menu where cid='active-trail:route:entity.node.canonical:route_parameters:a:1:{s:4:\"node\";s:1:\"1\";}'"

You will see something like this:

a:12:{s:4:"main";a:1:{s:0:"";s:0:"";}s:7:"account";a:1:{s:0:"";s:0:"";}i:0;a:1:{s:0:"";s:0:"";}i:1;a:2:{s:54:"menu_link_content:e7935b9c-1992-4752-931c-e017efba0be0";s:54:"menu_link_content:e7935b9c-1992-4752-931c-e017efba0be0";s:0:"";s:0:"";}i:2;a:1:{s:0:"";s:0:"";}i:3;a:2:{s:54:"menu_link_content:e7935b9c-1992-4752-931c-e017efba0be0";s:54:"menu_link_content:e7935b9c-1992-4752-931c-e017efba0be0";s:0:"";s:0:"";}i:4;a:1:{s:0:"";s:0:"";}i:5;a:2:{s:54:"menu_link_content:e7935b9c-1992-4752-931c-e017efba0be0";s:54:"menu_link_content:e7935b9c-1992-4752-931c-e017efba0be0";s:0:"";s:0:"";}i:6;a:1:{s:0:"";s:0:"";}i:7;a:2:{s:54:"menu_link_content:e7935b9c-1992-4752-931c-e017efba0be0";s:54:"menu_link_content:e7935b9c-1992-4752-931c-e017efba0be0";s:0:"";s:0:"";}i:8;a:1:{s:0:"";s:0:"";}i:9;a:2:{s:54:"menu_link_content:e7935b9c-1992-4752-931c-e017efba0be0";s:54:"menu_link_content:e7935b9c-1992-4752-931c-e017efba0be0";s:0:"";s:0:"";}}

Note that the same values were copied over and over again. Also notice that the entry for i:4 has changed to
i:4;a:1:{s:0:"";s:0:"";} (i.e. a trail containing only the empty root entry), which means the node 1 link is no longer in the active trail of menu 4 (which causes the menu block to be hidden instead of visible).
11. clear the cache (drush cr or drush cc bin menu dynamic_page_cache page)
12. visit /node/1 and check that link 4.1 (from menu 4) shows again on that page
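Why key 4 ends up pointing at the wrong trail can be sketched with a small Python model of PHP's array_merge(). `php_array_merge` and `fresh_trails` are hypothetical helpers, and the scenario is an assumption: five requests missed the cache at the same time, each computed all four trails from scratch, and they saved one after the other. Because every merge renumbers the integer keys and appends the new copies, the integer entries alternate between menu 2's and menu 4's trails, and key 4 lands on a copy of menu 2's trail.

```python
from typing import Dict, Union

Key = Union[int, str]


def php_array_merge(*arrays: Dict[Key, object]) -> Dict[Key, object]:
    """Model of PHP's array_merge(): string keys are overwritten in place,
    integer-keyed values are re-appended under keys 0, 1, 2, ..."""
    result: Dict[Key, object] = {}
    next_index = 0
    for array in arrays:
        for key, value in array.items():
            if isinstance(key, int):
                result[next_index] = value
                next_index += 1
            else:
                result[key] = value
    return result


NODE_LINK = "menu_link_content:e7935b9c-1992-4752-931c-e017efba0be0"
MENU_2_TRAIL = {"": ""}                        # nothing from menu 2 is active
MENU_4_TRAIL = {NODE_LINK: NODE_LINK, "": ""}  # node 1's link is active in menu 4


def fresh_trails() -> Dict[Key, object]:
    """Hypothetical: the trails one request computes from scratch for /node/1."""
    return {"main": {"": ""}, "account": {"": ""}, 2: MENU_2_TRAIL, 4: MENU_4_TRAIL}


cache = fresh_trails()  # written by the first request that missed the cache
for _ in range(4):      # four more racing requests save their copies afterwards
    cache = php_array_merge(cache, fresh_trails())

print(len(cache))                # 12, the same count as the a:12 dump in step 10
print(cache[4] == MENU_2_TRAIL)  # True: key 4 now holds a copy of menu 2's trail
print(cache[4] == MENU_4_TRAIL)  # False: node 1's link is gone for menu 4
```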

Proposed resolution

Apply the provided patch, which changes MenuActiveTrail.php to use the set() method instead of modifying the storage property directly.

Remaining tasks

Test this MR https://git.drupalcode.org/project/drupal/-/merge_requests/6121 against 11.x

User interface changes

None

API changes

None

Data model changes

None

Release notes snippet

Fix cache_menu bug affecting menus with numeric machine names

๐Ÿ› Bug report
Status

Needs work

Version

11.0

Component
Menu system


Created by

abautu (Romania)
