Make igbinary the default serializer if available: it saves 50% of unserialize time and memory footprint

Created on 17 November 2018, almost 7 years ago
Updated 5 February 2024, over 1 year ago

Problem/Motivation

Drupal 8 core's philosophy has always been to improve performance automatically _WITHOUT_ configuration.

Making use of igbinary gives us performance for free (similar to how APCu transparently improves performance if detected).

Proposed resolution

  1. Add an igbinary serialization component (can copy from the igbinary module)
  2. Use a factory for the serialization.phpserialize service
  3. If igbinary is present, use it; otherwise use the standard component

Because of concerns that igbinary could be switched on/off on the fly and hence lead to invalid data, Plan B is (rough sketch below the list):

  1. Detect igbinary during the container build
  2. Add a DefaultPhpSerialize class, which extends Drupal\Component\Serialization\PhpSerialize
  3. Replace the serialization.phpserialize service with the new class (only if it is still the default class, so it can still be overridden)
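
A rough, purely illustrative sketch of how the Plan B service swap could look. The IgbinaryServiceProvider name and the wiring are assumptions for this example, not actual core code:

  // Sketch of step 3: during the container build, swap the class of the
  // serialization.phpserialize service, but only if it is still the default.
  namespace Drupal\Core;

  use Drupal\Component\Serialization\PhpSerialize;
  use Drupal\Core\DependencyInjection\ContainerBuilder;
  use Drupal\Core\DependencyInjection\ServiceProviderBase;

  class IgbinaryServiceProvider extends ServiceProviderBase {

    public function alter(ContainerBuilder $container) {
      // Step 1: detect igbinary while the container is being built.
      if (!function_exists('igbinary_serialize')) {
        return;
      }
      $definition = $container->getDefinition('serialization.phpserialize');
      // Step 3: only swap the class when nobody has overridden the service.
      if ($definition->getClass() === PhpSerialize::class) {
        $definition->setClass('Drupal\Component\Serialization\DefaultPhpSerialize');
      }
    }

  }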

Remaining tasks

- Create patch
- Review
- Commit

User interface changes

- None

API changes

- None

Data model changes

- None

📌 Task

Status: Active

Version: 11.0 🔥

Component: Cache

Last updated: 5 days ago

Created by: 🇩🇪Germany Fabianx

Comments & Activities

  • 🇬🇧United Kingdom catch

    Make serializer customizable for Cache\DatabaseBackend RTBC just landed, so we're unblocked here.

    However, I'm still stuck on how we'll deal with the issues from #2/#4. Could we maybe put this in $settings and write it out in the installer? That way, if you install on an environment with igbinary enabled, you would get that serializer, and it's up to you to then make sure that other environments your site gets migrated to also have igbinary available (which is not unlike a lot of other issues with php extensions). But then existing sites would not have things changed under their feet.
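
    Purely illustrative (the setting name doesn't exist anywhere), but the installer-written flag could look like this in settings.php:

      // Hypothetical setting the installer would write out when igbinary is
      // detected; 'cache_serializer' is a made-up name for this example.
      $settings['cache_serializer'] = extension_loaded('igbinary') ? 'igbinary' : 'phpserialize';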

    But also, do we want to provide some kind of way to migrate?

    Or instead of a settings flag, should we just bring igbinary module into core - but prevent installing it/uninstalling it on existing sites at least until a migration path is worked out?

    Should we have a fallback serializer that can read PHP string serialization but only writes igbinary?

  • 🇫🇷France andypost

    If the module is enabled, then all caches should be cleared when it gets installed, or the dump will contain data serialized with it.

    So instead of a container param it could be a call to moduleExists().
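
    i.e. roughly (the igbinary.serializer service name is made up for illustration):

      // Sketch of the moduleExists() alternative instead of a container parameter.
      if (\Drupal::moduleHandler()->moduleExists('igbinary')) {
        $serializer = \Drupal::service('igbinary.serializer');
      }
      else {
        $serializer = \Drupal::service('serialization.phpserialize');
      }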

  • 🇫🇷France andypost

    BTW, when the cache is stored in APCu it can use igbinary without core support: https://www.php.net/manual/en/apcu.configuration.php#ini.apcu.serializer

    So the new question is how to deal with this option when the cache is in chained-fast...
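
    For reference, that is a php.ini directive (assuming the directive keeps the legacy apc. prefix, as the manual page above suggests):

      ; php.ini: let APCu serialize cached values with igbinary instead of serialize()
      apc.serializer = igbinary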

  • 🇨🇭Switzerland berdir Switzerland

    Looking at the igbinary module, it contains a check for whether the returned value is igbinary: https://git.drupalcode.org/project/igbinary/-/blob/2.0.x/src/Component/S...

    It won't work for other serializer implementations, but what if we introduce a new IgBinaryIfAvailableSerializer that on encode does a function exists check, and on decode checks whether the returned value is igbinary-encoded and the function exists. The problem is if it's igbinary-encoded and the function doesn't exist. We might need to extend the interface with an isValid() method or something, or allow it to throw an exception.
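
    Roughly what I have in mind, as a sketch only (the class name and the header check are assumptions here, not the contrib module's code):

      namespace Drupal\Component\Serialization;

      // Sketch: falls back to PhpSerialize when the extension is missing.
      class IgBinaryIfAvailableSerializer extends PhpSerialize {

        public static function encode($data) {
          return function_exists('igbinary_serialize')
            ? igbinary_serialize($data)
            : parent::encode($data);
        }

        public static function decode($raw) {
          // igbinary output starts with a 4-byte version header (currently
          // "\x00\x00\x00\x02"); native serialize() output never starts with NUL.
          if (is_string($raw) && strncmp($raw, "\x00\x00\x00", 3) === 0) {
            if (!function_exists('igbinary_unserialize')) {
              // The problem case described above: data is igbinary-encoded but
              // the extension is gone. Throw, or expose it via an isValid() method.
              throw new \RuntimeException('igbinary-encoded data but ext/igbinary is not loaded.');
            }
            return igbinary_unserialize($raw);
          }
          return parent::decode($raw);
        }

      }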

  • 🇨🇭Switzerland berdir Switzerland

    I wanted to review the possible benefits of using igbinary based on real world examples, including gz compression (the redis module offers that as an option based on cache data size, and the igbinary module does too as a separate serializer, but then always).

    Based on an umami demo install, I picked a few example cache entries (views_data, entity types, module list and some small ones) and compared serialize, serialize with compression level 1, igbinary, and igbinary with gz compression levels 1, 6 and 9. Both redis and igbinary default to level 1; redis has it as a configurable setting. All of that in terms of size and speed of serialize and unserialize. For speed, I ran each operation 1000 times and reported the total in ms (microtime() * 1000); absolute numbers aren't meant to be meaningful, just a baseline for relative speed. Doing it 1000x seemed useful to even out random variation; reported times seem to vary +/- 10% (views_data:en unserialize was between ~90 and ~110). Note that compression numbers always *include* the respective serialize/unserialize call.

    The script I used is attached. Results probably vary quite a bit between different systems, and I directly accessed the cache entries, so it relies on having warm caches. Use select cid, length(data) as length from cache_default order by length asc; to get a list of available cache entries and their sizes.

    compact strings on: https://gist.githubusercontent.com/Berdir/e0bbdbf3922fdc9c8ae905fd80ac2d...
    compact strings off: https://gist.githubusercontent.com/Berdir/c35f0007efba8c50cd896689290758...

    This is on DDEV PHP 8.3, igbinary 3.2.16. WSL2 on Windows 10, i9-11900K @ 3.50GHz.
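
    Not the attached script, but roughly the shape of the timing loop, in case anyone wants to reproduce this ($value is one unserialized cache item loaded from cache_default):

      // Each operation is run 1000x and the total is reported in ms;
      // compression timing includes the serialize call, as described above.
      $runs = 1000;

      $start = microtime(TRUE);
      for ($i = 0; $i < $runs; $i++) {
        $serialized = serialize($value);
      }
      printf("serialize: %.1f ms, %d bytes\n", (microtime(TRUE) - $start) * 1000, strlen($serialized));

      $start = microtime(TRUE);
      for ($i = 0; $i < $runs; $i++) {
        $ig = igbinary_serialize($value);
      }
      printf("igbinary_serialize: %.1f ms, %d bytes\n", (microtime(TRUE) - $start) * 1000, strlen($ig));

      $start = microtime(TRUE);
      for ($i = 0; $i < $runs; $i++) {
        $compressed = gzcompress(igbinary_serialize($value), 1);
      }
      printf("igbinary + gz1: %.1f ms, %d bytes\n", (microtime(TRUE) - $start) * 1000, strlen($compressed));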

    Takeaways:
    * The reduction, especially on large cache entries, is massive. views_data on igbinary is only 18% of serialize. The compact strings setting is doing the heavy lifting here; it's 60% with that turned off. Combined with gz1, it's only 4% of the serialize size (gz1 on serialize is 9%). views data is extremely repetitive. For most other cache entries, it's around 30% with igbinary and 10% combined with gz1.
    * igbinary serialize speed is almost the same, up to 10% slower on most sizes with compact strings. It's up to 30% faster with compact strings off, but the size benefit seems to outweigh this easily; see also unserialize performance next.
    * Compression is fairly expensive.
    * igbinary unserialize is not quite as much faster as the issue title claims, but it seems to be a fairly stable 6x% of unserialize() time on most sizes. It's actually slower with compact strings off, probably because there's a lot more data to work through?
    * Combined with uncompression, igbinary unserialize is about the same as unserialize(). I haven't done any comparisons with how much we save in network/communication with the cache backend, but I would assume that igbinary + uncompression having the same speed as unserialize() is a huge win when we only have to transfer 4-10% of the data from redis and can essentially store up to 10x as much in the cache (maybe 1.5x as much if already using the redis compression setting, considering the existing compression size and the overhead of hashes).
    * I didn't think too much about absolute numbers, but the compression cost increases a lot in relative terms on small cache entries: around 8x (3x for unserialize) for that 1300-length ckeditor cache and 25x (5x for unserialize) for that tiny 300-length locale cache. For comparison, empty/small/redirect render caches start around 250 length. Redis currently documents a length of 100 for the compress flag; I just picked that number fairly randomly and documented that people should do their own tests. I doubt any/many did. Anyway, I think that is clearly too low. Maybe 1000? Or higher? Maybe someone who is better at math than me could calculate the network vs cpu overhead and what value makes sense (a size-gated sketch follows after this list).
    * As expected, higher compression levels only result in minor improvements in size, with a massive cost in serialize speed (6 is 2x as slow as 1), so it doesn't make sense to go higher. unserialize on higher compression levels is actually slightly faster, but not enough to justify it I think.
    * Compression and uncompression are faster on igbinary than on serialize, probably because the input string is already way shorter. For example, views_data unserialize is 87 vs 32 overhead.
    * When I was about to submit, I realized that a mostly-string cache, specifically page cache, could also be interesting. Not for igbinary, because serialize() and igbinary are pretty much identical there, but for compression. So I added that as well (umami frontpage) and reran the script. Compression in relative numbers looks very expensive there, but that's just because serialize of a very long string is very fast. The html too gets reduced to ~18% of the size, at a cost of 35 (compress) and 17 (uncompress). That's pretty consistent in terms of size with other caches, unsurprisingly.
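
    To make the threshold idea above concrete, a minimal sketch of size-gated compression (the function name and the 1000-byte default are made up; the decode side would additionally need a flag or marker to know whether to gunzip, like the redis module uses):

      // Only compress entries above a size threshold; small entries are not
      // worth the CPU cost per the numbers above.
      function cache_encode($data, int $threshold = 1000): string {
        $encoded = igbinary_serialize($data);
        if (strlen($encoded) >= $threshold) {
          // Level 1: cheapest compression, still most of the size win.
          $encoded = gzcompress($encoded, 1);
        }
        return $encoded;
      }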

  • 🇬🇧United Kingdom catch

    6x% of unserialize() time on most sizes

    Should this be 60%?

  • 🇨🇭Switzerland berdir Switzerland

    Yes, I meant to say 60-70% with that x, edited to make that clearer.

  • 🇬🇧United Kingdom catch

    Thanks that makes sense.

    I would assume that igbinary + uncompression having the same speed as unserialize() seems like a huge win when we only have to transfer 4-20% of the data instead from redis and can store much more in the cache

    I think this is very likely to be the case for the database cache backend too.

    Also, when looking at memory issues I often see quite a lot from database queries and unserialize for large cache items. It's possible that uncompression means that cost would just get transferred elsewhere, but we might get lucky and it ends up a net reduction.

    Also I wonder if it's worth looking into gzip compressing tags?

  • 🇨🇭Switzerland berdir Switzerland

    FWIW, looking at blackfire data does show that in some cases, the cost of a gzuncompress is considerable, specifically on page cache hits:

    Not sure, but I would assume that blackfire doesn't add a huge overhead to a function call like that. That's 3.8ms. That's still with plain serialize; about to compare that with igbinary and also without compression.

    I've tried to do some testing with either blackfire or just plain ab on all combinations of serialize/igbinary/compression, but it's tricky, variations are too high between runs to really see a clear pattern. It's also not always that high I think.

  • 🇨🇭Switzerland berdir Switzerland

    As expected, variation is too high for page cache to really compare in terms of speed/IO wait. Without compression the response time was a bit lower, but it also claims to have less IO wait, which obviously doesn't make sense.

    Network is probably the most reliable metric change:
    Network: +547 kB (+419%), 131 kB → 678 kB

    Memory also went up a bit, but only 2%.

    On a dynamic page cache hit, the gzuncompress is at 2% or so, so way less visible and comparing with and without compress gives me:

    Network: -1.37 MB (-86%), 1.58 MB → 215 kB

    That's pretty neat.

    One random but completely unrelated thing that I saw pop up is Drupal\help\HelpTopicTwigLoader, which adds all modules as extension folders and does an is_dir() check on them. That's enough to account for about 5ms in Blackfire and happens even on a dynamic page cache hit, it seems. Will try to create a separate issue for that.

    Last unrelated side note: if I'm seeing this correctly, then all the performance things I've been working on in redis and core (where we've included patches so far) resulted in a reduction of Redis::hgetAll() calls from 54 to 27 in this project, and combined with the switch to igbinary, from 400kb network to 200kb. And I'm working on more, such as route preloading.

  • 🇫🇷France fgm Paris, France

    I think there is one important issue here: is the Redis under test remote from the Drupal instance, or local to it?

    In most - if not all - enterprise setups I see doing audits, the Redis/Valkey servers are always either on the DB instances or completely standalone, not on the web servers (unlike memcached). This means that the impact of bandwidth reduction is much more relevant to overall performance in those setups than it is when benchmarking on a local instance.

    These tests being run on DDEV make me suspect the measurements are for a local instance, however. Maybe it would be useful to try the same on two separate AWS/GCP/Azure instances instead? Or even a container and something like Elasticache for Redis in the same AZ, which will likely not co-locate the Redis on the same instance.

  • 🇬🇧United Kingdom catch

    Adding 🐛 Json and PHP serializers should throw exception on failure Active as related. Didn't review that issue properly yet, but from the summary it looks like we should already be catching and ignoring that exception in the cache backends.

  • 🇨🇭Switzerland berdir Switzerland

    Re #27: Yes, redis being on the same server or not can make a huge difference. FWIW, all data in comment #27 is based on tests on a regular platform.sh project. The question I'm trying to answer/provide data for in #25/#26 is whether core should use compression by default or not.

    We also have a dedicated Gen2 project on platform.sh that uses multiple servers and would be more interesting for the impact/advantage of compression in such a scenario, but I don't have enough insights/blackfire there to be able to do that kind of profiling.

    Either way, our default configuration probably shouldn't be optimized for that scenario at the cost of a less "enterprise" setup. That said, at the other end of the scale you have "classic" webhosting that often also has separate database servers.

  • 🇬🇧United Kingdom catch

    For the database cache there's also the issue that cache tables can end up holding more data than the rest of the database itself. We did https://www.drupal.org/node/2891281 but there was someone in slack actually trying to use that with what sounded like a medium-traffic site and running into lots of problems with the delete queries and similar. So if it's neutral or a very small regression with the database cache, it might be worth it anyway.

  • 🇫🇷France fgm Paris, France

    I think at some point Platform tried to maximize co-locating a project's containers on close instances, from what DamZ told me long ago. I wonder if that is the case in these experiments.

  • 🇦🇷Argentina hanoii 🇦🇷UTC-3

    I recently stumbled upon igbinary and this great thread, and I wanted to share some recent real-life improvements that really surprised me.

    This is a very VERY heavy setup (in terms of entity relationships, site building, lots of Layout Builder and custom stuff) with significant technical debt, so Drupal is doing a lot. For example: 8k block plugins, ~80 MB of RAM just for PHP deserializing this. (The site has a moderate amount of content — not tiny, but not massive either. It used to run as a multisite with 12 sites on a Platform.sh large plan, and the sites frequently stalled and timed out. I recently moved each site to its own medium plan on Platform.sh — 1 GB RAM Redis/Valkey, 1.25 GB RAM MySQL, 256 MB RAM app container with only 2 FPM processes.)

    On Redis I only keep the entity and render cache bins plus the smaller ones. I had to move page_cache and dynamic_page_cache to the DB because Redis was evicting too many keys (before igbinary; I might revisit this after things stabilize). I’m also being very lax about when I clear the page rendering–related bins (they don’t normally get cleared on every deploy).

    The figures below are from php.access.log on Platform.sh, which logs response time, RAM, and CPU. I collected these over a few days — I specifically wanted to see how it behaved on a non-weekend day. This sample is from one of the sites with the most traffic. Even after a fully cleared cache (everything, including an empty Redis service) it performed A LOT better.

    I honestly wasn’t expecting this improvement. Response times improved — less dramatically, since most of the traffic is anonymous and once pages are cached they already respond fast enough (<100 ms). But there’s still a clear shift of the distribution towards the lower end. RAM, on the other hand, shows an even greater improvement. Before igbinary I was seeing cached responses take over 150 MB of RAM. I’m not exactly sure why, but I think it was mostly due to large deserializations. CPU is less conclusive — I don’t think the peak info from the logs is a very meaningful metric. CPU will vary a lot depending on many other factors, especially in a containerized app like this.

    I’m open to any suggestions, questions, or follow-up!

    ============================================
    TABLE 1: Column 6 Analysis (Memory Used)
    ============================================
    Range               2025-08-13  2025-08-14  2025-08-15  2025-08-16  2025-08-17  2025-08-18  
    --------------------------------------------------------------------------------------------
    1-10000kb           3645        3832        3668        7047        11021       12146       
    10001-20000kb       1744        1432        1608        2753        3369        3468        
    20001-30000kb       852         729         795         1711        1884        2110        
    30001-40000kb       1088        1006        1050        694         493         1341        
    40001-50000kb       433         394         457         1572        1669        1310        
    50001-60000kb       719         711         768         508         321         594         
    60001-70000kb       693         633         685         622         458         1928        
    70001-80000kb       434         340         416         3569        2416        2627        
    80001-90000kb       254         260         285         1140        525         336         
    90001-100000kb      660         424         562         63          14          27          
    100001-125000kb     2537        2779        2974        37          4           10          
    125001-150000kb     654         630         590         7           0           1           
    150001-175000kb     317         255         287         1           0           0           
    175001-200000kb     159         119         159         3           1           2           
    200001-225000kb     727         476         528         4           0           0           
    225001-250000kb     1129        972         1156        0           0           0           
    250001-275000kb     1           0           0           0           0           0           
    275001-300000kb     0           1           0           0           0           0           
    --------------------------------------------------------------------------------------------
    TOTAL               16046       14993       15988       19731       22175       25900       
    
    ============================================
    TABLE 2: Column 4 Analysis (Time)
    ============================================
    Range               2025-08-13  2025-08-14  2025-08-15  2025-08-16  2025-08-17  2025-08-18  
    --------------------------------------------------------------------------------------------
    1-500ms             11310       10546       11097       15079       19190       22563       
    501-1000ms          550         513         571         1545        1059        2371        
    1001-1500ms         1244        1779        2324        2781        1827        797         
    1501-2000ms         523         276         225         130         42          56          
    2001-2500ms         227         171         55          53          14          27          
    2501-3000ms         203         129         39          44          10          28          
    3001-3500ms         118         103         63          25          5           14          
    3501-4000ms         247         249         434         20          5           9           
    4001-4500ms         460         663         932         10          4           4           
    4501-5000ms         347         144         133         8           3           2           
    5001-6000ms         336         101         48          8           8           14          
    6001-7000ms         149         91          13          2           3           12          
    7001-8000ms         107         74          10          4           2           1           
    8001-9000ms         112         68          10          1           0           1           
    9001-10000ms        58          28          6           1           0           0           
    10001-15000ms       47          36          19          7           2           1           
    15001-20000ms       3           4           5           4           1           0           
    20001-25000ms       2           1           2           2           0           0           
    25001-30000ms       0           2           0           2           0           0           
    30001-35000ms       1           0           1           1           0           0           
    35001-40000ms       0           0           1           0           0           0           
    40001-45000ms       1           1           0           1           0           0           
    45001-50000ms       1           1           0           1           0           0           
    50001-55000ms       0           1           0           0           0           0           
    70001-75000ms       0           1           0           0           0           0           
    75001-80000ms       0           5           0           0           0           0           
    80001-85000ms       0           5           0           0           0           0           
    85001-90000ms       0           1           0           2           0           0           
    --------------------------------------------------------------------------------------------
    TOTAL               16046       14993       15988       19731       22175       25900       
    
    ============================================
    TABLE 3: Column 8 Analysis (CPU%)
    ============================================
    Range               2025-08-13  2025-08-14  2025-08-15  2025-08-16  2025-08-17  2025-08-18  
    --------------------------------------------------------------------------------------------
    0-25%               1774        1458        1607        1689        2795        2936        
    25-50%              9180        8169        8590        11137       11742       14177       
    50-75%              4712        4911        5266        5386        7474        8543        
    75-100%             380         455         525         1519        164         244         
    --------------------------------------------------------------------------------------------
    TOTAL               16046       14993       15988       19731       22175       25900     