- 🇬🇧United Kingdom catch
✨ Make serializer customizable for Cache\DatabaseBackend RTBC just landed so we're unblocked here.
However, I'm still stuck on how we'll deal with the issues from #2/#4. Could we maybe put this in $settings and write it out in the installer? That way, if you install on an environment with igbinary enabled, you would get that serializer, and it's then up to you to make sure that other environments your site gets migrated to also have igbinary available (which is not unlike a lot of other issues with PHP extensions). Existing sites would then not have things changed under their feet.
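Something like this hypothetical settings.php sketch illustrates the idea; the 'cache_serializer' key is made up for illustration and nothing in core reads it:

// Hypothetical settings.php entry: the key name is invented for illustration
// and is not read by core. The idea above is that the installer would write a
// line like this when igbinary is available, and it is then up to the site
// owner to keep it accurate across environments.
$settings['cache_serializer'] = 'igbinary';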
But also, do we want to provide some kind of way to migrate?
Or instead of a settings flag, should we just bring igbinary module into core - but prevent installing it/uninstalling it on existing sites at least until a migration path is worked out?
Should we have a fallback serializer that can read PHP string serialization but only writes igbinary?
- 🇫🇷France andypost
If the module is enabled then all caches should be cleared when it is installed, or the dump will contain data serialized with it.
So instead of a container parameter it could be a call to moduleExists().
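A minimal sketch of that approach, assuming some factory or similar code picks the serializer at runtime; the 'serialization.igbinary' service name is an assumption, while serialization.phpserialize exists in core:

<?php

// Sketch only: 'serialization.igbinary' is an assumed service name for
// whatever the igbinary module would register. A runtime moduleExists()
// check replaces a container parameter, so enabling or uninstalling the
// module takes effect right away (after the caches are cleared, as noted
// above).
$serializer = \Drupal::moduleHandler()->moduleExists('igbinary')
  ? \Drupal::service('serialization.igbinary')
  : \Drupal::service('serialization.phpserialize');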
- 🇫🇷France andypost
By the way, when the cache is stored in APCu it can use igbinary without core support: https://www.php.net/manual/en/apcu.configuration.php#ini.apcu.serializer
So the new question is how to deal with this option when the cache is in chained-fast...
- 🇨🇭Switzerland berdir Switzerland
Looking at the igbinary module, it contains a check if the returned value is igbinary: https://git.drupalcode.org/project/igbinary/-/blob/2.0.x/src/Component/S...
It won't work for other serializer implementations, but what if we introduce a new IgBinaryIfAvailableSerializer that on encode has a function_exists() check, and on decode checks whether the returned value is igbinary-encoded and the function exists. The problem is if it's igbinary and the function doesn't exist. We might need to extend the interface with an isValid() option or something, or allow it to throw an exception.
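A rough sketch of what that could look like, assuming core's SerializationInterface (static methods) and an igbinary header check similar to the contrib module's; the class name, the exact header bytes and the exception behaviour are all assumptions:

<?php

namespace Drupal\Core\Cache;

use Drupal\Component\Serialization\SerializationInterface;

/**
 * Serializes with igbinary when available, falling back to serialize().
 *
 * Sketch only: not an existing core or contrib class.
 */
class IgBinaryIfAvailableSerializer implements SerializationInterface {

  /**
   * {@inheritdoc}
   */
  public static function encode($data) {
    // Use igbinary when the extension is loaded, otherwise fall back to
    // PHP's native string serialization.
    if (function_exists('igbinary_serialize')) {
      return igbinary_serialize($data);
    }
    return serialize($data);
  }

  /**
   * {@inheritdoc}
   */
  public static function decode($raw) {
    // igbinary payloads start with a binary version header, assumed here to
    // be "\x00\x00\x00\x02" (format version 2).
    if (strpos($raw, "\x00\x00\x00\x02") === 0) {
      if (!function_exists('igbinary_unserialize')) {
        // The problematic case described above: the data is igbinary but the
        // extension is missing. Throwing lets the caller treat it as a cache
        // miss instead of returning corrupted data.
        throw new \RuntimeException('Cache data is igbinary-encoded but the igbinary extension is not available.');
      }
      return igbinary_unserialize($raw);
    }
    return unserialize($raw);
  }

  /**
   * {@inheritdoc}
   */
  public static function getFileExtension() {
    return 'serialized';
  }

}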
- 🇨🇭Switzerland berdir Switzerland
I wanted to review the possible benefits of using igbinary based on real-world examples, including gz compression (the redis module offers that as an option based on cache data size, and the igbinary module does too as a separate serializer, but then always).
Based on an Umami demo install, I picked a few example cache entries (views_data, entity types, module list and some small ones) and compared serialize, serialize with compression level 1, igbinary, and also that with gz compression levels 1, 6 and 9. Both redis and igbinary default to 1; redis has it as a configurable setting. All of that in terms of size and speed of serialize and unserialize. For speed, I ran each operation 1000 times and reported the total in ms (microtime() * 1000); absolute numbers aren't meant to be meaningful, just a baseline for the relative speed. Doing it 1000x seemed useful to even out random stuff; reported times seem to vary +/- 10% (views_data:en unserialize was between ~90 and ~110). Note that compression numbers always *include* the respective serialize/unserialize call.
The script I used is attached. Results probably vary quite a bit between different systems, and I directly accessed the cache entries, so it relies on having warm caches. Use
select cid, length(data) as length from cache_default order by length asc;
to get a list of the available cache entries and their size.
compact strings on: https://gist.githubusercontent.com/Berdir/e0bbdbf3922fdc9c8ae905fd80ac2d...
compact strings off: https://gist.githubusercontent.com/Berdir/c35f0007efba8c50cd896689290758...
This is on DDEV PHP 8.3, igbinary 3.2.16, WSL2 on Windows 10, i9-11900K @ 3.50GHz.
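For readers who want to reproduce the methodology, here is a minimal sketch (not the attached script) of the kind of comparison described above; it assumes the igbinary extension is loaded and that $data already holds one decoded cache entry, e.g. the views_data:en row pulled from cache_default:

<?php

// Minimal benchmark sketch, not the attached script. $data is assumed to be a
// decoded cache entry; each variant is timed over 1000 iterations and the
// total is reported in ms, mirroring the methodology described above.
$iterations = 1000;

$variants = [
  'serialize'       => [fn($v) => serialize($v), fn($s) => unserialize($s)],
  'serialize + gz1' => [fn($v) => gzcompress(serialize($v), 1), fn($s) => unserialize(gzuncompress($s))],
  'igbinary'        => [fn($v) => igbinary_serialize($v), fn($s) => igbinary_unserialize($s)],
  'igbinary + gz1'  => [fn($v) => gzcompress(igbinary_serialize($v), 1), fn($s) => igbinary_unserialize(gzuncompress($s))],
];

foreach ($variants as $label => [$encode, $decode]) {
  $encoded = $encode($data);

  $start = microtime(TRUE);
  for ($i = 0; $i < $iterations; $i++) {
    $encode($data);
  }
  $encode_ms = (microtime(TRUE) - $start) * 1000;

  $start = microtime(TRUE);
  for ($i = 0; $i < $iterations; $i++) {
    $decode($encoded);
  }
  $decode_ms = (microtime(TRUE) - $start) * 1000;

  printf("%-16s size: %8d bytes, serialize: %7.1f ms, unserialize: %7.1f ms\n",
    $label, strlen($encoded), $encode_ms, $decode_ms);
}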
Takeaways:
* The reduction, especially on large cache entries, is massive. views_data on igbinary is only 18% of serialize. The compact strings setting is doing the heavy lifting here; it's 60% with that turned off. Combined with gz1, it's only 4% of the serialize size (gz1 on serialize is 9%); views data is extremely repetitive. For most other cache entries, it's around 30% with igbinary and 10% combined with gz1.
* igbinary serialize speed is almost the same, up to 10% slower on most sizes with compact strings. It's up to 30% faster with compact strings off, but the size benefit seems to easily outweigh this; see also the unserialize performance next.
* compression is fairly expensive
* igbinary unserialize is not quite as much faster as the issue title claims, but it seems to be a fairly stable 6x% of unserialize() time on most sizes. It's actually slower with compact strings off, probably because there's a lot more data to work through?
* Combined with uncompression, igbinary unserialize is about the same as plain unserialize(). I haven't done any comparisons with how much we save in network/communication with the cache backend, but I would assume that igbinary + uncompression having the same speed as unserialize() is a huge win when we only have to transfer 4-10% of the data from redis instead, and can essentially store up to 10x as much in the cache (maybe 1.5x as much if already using the redis compression setting, considering the existing compression size and the overhead of hashes).
* I didn't think too much about the absolute numbers, but the compression cost increases a lot in relative terms on small cache entries: around 8x (3x for unserialize) for that 1300-length ckeditor cache and 25x (5x for unserialize) for that tiny 300-length locale cache. For comparison, empty/small/redirect render caches start around 250 length. Redis currently documents a length of 100 for the compress flag; I just picked that number fairly randomly and documented that people should do their own tests. I doubt any/many did. Anyway, I think that is clearly too low. Maybe 1000? Or higher? Maybe someone who is better at math than me could calculate the network vs CPU overhead and what value makes sense (a rough sketch of such a threshold follows after this list).
* As expected, higher compression levels only result in minor improvements in size, with a massive cost in serialize speed (level 6 is 2x as slow as 1), so it doesn't make sense to go higher. Unserialize on higher compression levels is actually slightly faster, but not enough to justify it, I think.
* Compression and uncompression are faster on igbinary than on serialize, probably because the input string is already way shorter. For example, views_data unserialize is 87 vs 32 overhead.
* When I was about to submit, I realized that having a mostly-string cache, specifically page cache, could also be interesting. Not for igbinary, because serialize() and igbinary are pretty much identical there, but for compression. So I added that as well (Umami front page) and reran the script. Compression in relative numbers looks very expensive there, but that's just because serialize of a very long string is very fast. The HTML also gets reduced to ~18% of the size, at a cost of 35 (compress) and 17 (uncompress). That's pretty consistent with other caches in terms of size, unsurprisingly.
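As mentioned in the size-threshold bullet above, here is a rough sketch of what a minimum size for compression could look like; the function name, the one-byte prefix and the 1000-byte default are all illustrative assumptions, not how the redis module actually implements its compress flag:

<?php

// Sketch only: names, the prefix marker and the 1000-byte threshold are
// assumptions for illustration.
function cache_encode_with_threshold($data, int $threshold = 1000): string {
  $serialized = function_exists('igbinary_serialize')
    ? igbinary_serialize($data)
    : serialize($data);

  // Only pay the compression cost when the payload is large enough for the
  // size reduction to plausibly outweigh the CPU overhead. A one-byte prefix
  // tells the decode side whether to gzuncompress first.
  if (strlen($serialized) >= $threshold) {
    // Level 1, matching the default used by the redis and igbinary modules.
    return 'C' . gzcompress($serialized, 1);
  }
  return 'U' . $serialized;
}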
- 🇬🇧United Kingdom catch
6x% of unserialize() time on most sizes
Should this be 60%?
- 🇨🇭Switzerland berdir Switzerland
Yes, I meant to say 60-70% with that x, edited to make that clearer.
- 🇬🇧United Kingdom catch
Thanks that makes sense.
I would assume that igbinary + uncompression having the same speed as unserialize() seems like a huge win when we only have to transfer 4-20% of the data instead from redis and can store much more in the cache
I think this is very likely to be the case for the database cache backend too.
Also, when looking at memory issues I often see quite a lot from database queries and unserialize for large cache items. It's possible that uncompression means that cost just gets moved elsewhere, but we might get lucky and it ends up as a net reduction.
Also I wonder if it's worth looking into gzip compressing tags?
- 🇨🇭Switzerland berdir Switzerland
FWIW, looking at blackfire data does show that in some cases, the cost of a gzuncompress is considerable, specifically on page cache hits:
Not sure, but I would assume that blackfire doesn't add a huge overhead to a function call like that. That's 3.8ms. That's still with plain serialize; I'm about to compare that with igbinary and also without compression.
I've tried to do some testing with either blackfire or just plain ab on all combinations of serialize/igbinary/compression, but it's tricky; variations are too high between runs to really see a clear pattern. It's also not always that high, I think.
- 🇨🇭Switzerland berdir Switzerland
As expected, variation is too high for page cache to really compare in terms of speed/IO wait. Without compression the response time was a bit lower, but it also claims to have less IO wait, which obviously doesn't make sense.
Network is probably the most reliable metric change:
Network: +547 kB (+419%)
131 kB → 678 kB
Memory also went up a bit, but only 2%.
On a dynamic page cache hit, the gzuncompress is at 2% or so, so way less visible and comparing with and without compress gives me:
Network: -1.37 MB (-86%)
1.58 MB → 215 kB
That's pretty neat.
One random but completely unrelated thing that I saw pop up is Drupal\help\HelpTopicTwigLoader, which adds all modules as extension folders and does an is_dir() check on them. That's enough to account for about 5ms in Blackfire, and it seems to happen even on a dynamic page cache hit. Will try to create a separate issue for that.
Last unrelated side note: if I'm seeing this correctly, then all the performance things I've been working on in redis and core (where we've included patches so far) resulted in a reduction of Redis::hgetAll() calls from 54 to 27 in this project, and combined with the switch to igbinary, from 400kb network to 200kb. And I'm working on more, such as route preloading.
- 🇫🇷France fgm Paris, France
I think there is one important issue here: is the Redis under test remote from the Drupal instance, or local to it?
In most - if not all - enterprise setups I see doing audits, the Redis/Valkey servers are always either on the DB instances or completely standalone, not on the web servers (unlike memcached). This means that the impact of bandwidth reduction is much more relevant to overall performance in those setups than it is when benchmarking on a local instance.
These tests being run on DDEV make me suspect the measurements are for a local instance, however. Maybe it would be useful to try the same on two separate AWS/GCP/Azure instances instead? Or even a container and something like ElastiCache for Redis in the same AZ, which will likely not co-locate the Redis on the same instance.
- 🇬🇧United Kingdom catch
Adding 🐛 Json and PHP serializers should throw exception on failure Active as related. Didn't review that issue properly yet, but from the summary it looks like we should already be catching and ignoring that exception in the cache backends.
- 🇨🇭Switzerland berdir Switzerland
Re #27: Yes, redis being on the same server or not can make a huge difference. FWIW, all data in comment #27 is based on tests on a regular platform.sh project. The question I'm trying to answer/provide data for in #25/#26 is whether core should use compression by default or not.
We also have a dedicated Gen2 project on platform.sh that uses multiple servers and would be more interesting for the impact/advantage of compression in such a scenario, but I don't have enough insights/blackfire there to be able to do that kind of profiling.
Either way, our default configuration probably shouldn't be optimized for that scenario at the cost of a less "enterprise" setup. That said, at the other end of the scale you have "classic" webhosting that often also has separate database servers.
- 🇬🇧United Kingdom catch
For the database cache there's also the issue that cache tables can end up holding more data than the rest of the database itself. We did https://www.drupal.org/node/2891281 → but there was someone in slack actually trying to use that with what sounded like a medium-traffic site and running into lots of problems with the delete queries and similar. So if it's neutral or a very small regression with the database cache, it might be worth it anyway.
- 🇫🇷France fgm Paris, France
I think at some point Platform tried to maximize co-locating containers in a project on close instances, from what DamZ told me long ago. I wonder if that is the case in these experiments.
- 🇦🇷Argentina hanoii 🇦🇷UTC-3
I recently stumbled upon igbinary and this great thread, and I wanted to share some recent real-life improvements that really surprised me.
This is a very VERY heavy setup (in terms of entity relationships, site building, lots of Layout Builder and custom stuff) with significant technical debt, so Drupal is doing a lot. For example: 8k block plugins, ~80 MB of RAM just for PHP deserializing this. (The site has a moderate amount of content — not tiny, but not massive either. It used to run as a multisite with 12 sites on a Platform.sh large plan, and the sites frequently stalled and timed out. I recently moved each site to its own medium plan on Platform.sh — 1 GB RAM Redis/Valkey, 1.25 GB RAM MySQL, 256 MB RAM app container with only 2 FPM processes.)
On Redis I only keep the entity and render cache bins plus the smaller ones. I had to move page_cache and dynamic_page_cache to the DB because Redis was evicting too many keys (before igbinary; I might revisit this after things stabilize). I’m also being very lax about when I clear the page rendering–related bins (they don’t normally get cleared on every deploy).
The figures below are from php.access.log on Platform.sh, which logs response time, RAM, and CPU. I collected these over a few days — I specifically wanted to see how it behaved on a non-weekend day. This sample is from one of the sites with the most traffic. Even after a fully cleared cache (everything, including an empty Redis service) it performed A LOT better.
I honestly wasn’t expecting this improvement. Response times improved — less dramatically, since most of the traffic is anonymous and once pages are cached they already respond fast enough (<100 ms). But there’s still a clear shift of the distribution towards the lower end. RAM, on the other hand, shows an even greater improvement. Before igbinary I was seeing cached responses take over 150 MB of RAM. I’m not exactly sure why, but I think it was mostly due to large deserializations. CPU is less conclusive — I don’t think the peak info from the logs is a very meaningful metric. CPU will vary a lot depending on many other factors, especially in a containerized app like this.
I’m open to any suggestions, questions, or follow-up!
============================================
TABLE 1: Column 6 Analysis (Memory Used)
============================================
Range              2025-08-13  2025-08-14  2025-08-15  2025-08-16  2025-08-17  2025-08-18
--------------------------------------------------------------------------------------------
1-10000kb                3645        3832        3668        7047       11021       12146
10001-20000kb            1744        1432        1608        2753        3369        3468
20001-30000kb             852         729         795        1711        1884        2110
30001-40000kb            1088        1006        1050         694         493        1341
40001-50000kb             433         394         457        1572        1669        1310
50001-60000kb             719         711         768         508         321         594
60001-70000kb             693         633         685         622         458        1928
70001-80000kb             434         340         416        3569        2416        2627
80001-90000kb             254         260         285        1140         525         336
90001-100000kb            660         424         562          63          14          27
100001-125000kb          2537        2779        2974          37           4          10
125001-150000kb           654         630         590           7           0           1
150001-175000kb           317         255         287           1           0           0
175001-200000kb           159         119         159           3           1           2
200001-225000kb           727         476         528           4           0           0
225001-250000kb          1129         972        1156           0           0           0
250001-275000kb             1           0           0           0           0           0
275001-300000kb             0           1           0           0           0           0
--------------------------------------------------------------------------------------------
TOTAL                   16046       14993       15988       19731       22175       25900

============================================
TABLE 2: Column 4 Analysis (Time)
============================================
Range              2025-08-13  2025-08-14  2025-08-15  2025-08-16  2025-08-17  2025-08-18
--------------------------------------------------------------------------------------------
1-500ms                 11310       10546       11097       15079       19190       22563
501-1000ms                550         513         571        1545        1059        2371
1001-1500ms              1244        1779        2324        2781        1827         797
1501-2000ms               523         276         225         130          42          56
2001-2500ms               227         171          55          53          14          27
2501-3000ms               203         129          39          44          10          28
3001-3500ms               118         103          63          25           5          14
3501-4000ms               247         249         434          20           5           9
4001-4500ms               460         663         932          10           4           4
4501-5000ms               347         144         133           8           3           2
5001-6000ms               336         101          48           8           8          14
6001-7000ms               149          91          13           2           3          12
7001-8000ms               107          74          10           4           2           1
8001-9000ms               112          68          10           1           0           1
9001-10000ms               58          28           6           1           0           0
10001-15000ms              47          36          19           7           2           1
15001-20000ms               3           4           5           4           1           0
20001-25000ms               2           1           2           2           0           0
25001-30000ms               0           2           0           2           0           0
30001-35000ms               1           0           1           1           0           0
35001-40000ms               0           0           1           0           0           0
40001-45000ms               1           1           0           1           0           0
45001-50000ms               1           1           0           1           0           0
50001-55000ms               0           1           0           0           0           0
70001-75000ms               0           1           0           0           0           0
75001-80000ms               0           5           0           0           0           0
80001-85000ms               0           5           0           0           0           0
85001-90000ms               0           1           0           2           0           0
--------------------------------------------------------------------------------------------
TOTAL                   16046       14993       15988       19731       22175       25900

============================================
TABLE 3: Column 8 Analysis (CPU%)
============================================
Range              2025-08-13  2025-08-14  2025-08-15  2025-08-16  2025-08-17  2025-08-18
--------------------------------------------------------------------------------------------
0-25%                    1774        1458        1607        1689        2795        2936
25-50%                   9180        8169        8590       11137       11742       14177
50-75%                   4712        4911        5266        5386        7474        8543
75-100%                   380         455         525        1519         164         244
--------------------------------------------------------------------------------------------
TOTAL                   16046       14993       15988       19731       22175       25900