Managed db performance documentation

Created on 27 October 2023, 11 months ago
Updated 9 August 2024, about 2 months ago

Problem/Motivation

Azure's managed db with MariaDB/MySQL is being reported as 3x slower than a non managed container db.

I've come up with a document to challenge this issue.

Steps to reproduce

Compare Drupal performance on Azure managed db with Drupal performance on a normal container db.

In a fairly simple comparison we've noticed the Azure managed db to be 3x slower performance on average.

Proposed resolution

Let's find a winning recipe to maximize performance.

Remaining tasks

Review and discuss.

Contents

AMAZON AWS – 1

Microsoft AZURE 1

Choosing the right version of MySQL/MariaDB 1

Caching 2

DB Monitoring and Profiling: 3

AZURE conclusion to challenge 3

Drupal Application Performance 3

Upgrade Drupal to 10.1.5, skip 10.0.x 3

Latest new core includes some performance enhancements 3

Table Indexes 3

Table indexes could be added in some tables that may benefit. This can significantly speed up query performance. 3

Application Performance Patches 3

Performance profiling to identify pain points 4

Node revision delete 4

AMAZON AWS –

MySQL Performance considerations within an AWS managed db (RDS)

  • To get more iops we had to go to a better class of storage. This vastly increased cost but also performance.

Microsoft AZURE

MySQL Performance considerations within an Azure managed db system.

Choosing the right version of MySQL/MariaDB
  • ⁠**Upgrade MySQL/MariaDB Versions**: Ensure that you are running the latest version of MySQL/MariaDB supported by Azure for potential performance improvements - (upgrade from 10.3 MariaDB to a newer release)

Increase IOPS (Input / Output Per Second)

  • **Choose the Right Service Tier**: Azure offers different service tiers with varying levels of performance. Consider upgrading to a higher tier if you are experiencing performance issues. Higher tiers offer more resources like CPU and memory.  This is often the simplest way to improve performance. You can upgrade to a higher service tier with more CPU, memory, and better storage performance.

  • **Faster and costlier** selecting a higher service tier can increase IOPS, it may also come with a higher cost.

  • **Read Replicas**: For read-heavy workloads, you can consider using read replicas to offload read traffic from the primary database, which can help improve performance.

  • **Monitoring and Tuning**: Continuously monitor your database's performance and make adjustments as needed based on actual usage patterns.

Caching

innodb_buffer_pool_size:

The `innodb_buffer_pool_size` determines the size of the InnoDB buffer pool, which is a critical component of MySQL's InnoDB storage engine.

  • Increasing `innodb_buffer_pool_size` allocates more memory for caching data and index pages. This means that more of your frequently accessed data can be stored in memory, reducing the need to read from disk. This can lead to significantly faster SELECT queries, as data retrieval from memory is much faster than disk I/O.

  • Reduced Disk I/O - A larger buffer pool reduces the frequency of disk reads

  • Improved Concurrency - When data is readily available in memory, there is less contention for disk resources.

some considerations and potential drawbacks

  • ⁠**Memory Usage**: A larger buffer pool consumes more memory.

  • ⁠**Diminishing Returns**: Increasing the buffer pool size beyond a certain point may yield diminishing returns.  The optimal size depends on the workload and the available system resources.

  • ⁠**InnoDB Locking**: A large buffer pool doesn't directly affect write operations, which are influenced more by factors like the InnoDB log and transaction configuration. So, for write-heavy workloads, optimizing other parameters is necessary.

  • ⁠**Configuration Impact**: Modifying the `innodb_buffer_pool_size` requires a server restart to take effect. It's essential to carefully plan and test any changes

To determine the optimal `innodb_buffer_pool_size` for your specific workload, you should monitor your database's performance, analyze its behavior, and consider factors like the available system memory, the size of your dataset, and the nature of your queries. It's a good practice to start with a conservative setting, monitor performance, and gradually increase it as needed, all while keeping an eye on system memory usage. Consulting with a database administrator or MySQL performance expert can also be helpful in making these decisions.

DB Monitoring and Profiling:

  • Use database monitoring tools to keep track of query performance and system resource utilization. Regularly review performance metrics and adjust configurations accordingly.

AZURE conclusion to challenge
  • To get the best cost , performance, do not use Azure managed db

  • Consider switching to Amazon AWS or continue without managed db.

  • Azure managed db currently 3x slower than container based db

Drupal Application Performance Upgrade Drupal to 10.1.5, skip 10.0.x
  • Latest new core includes some performance enhancements

Table Indexes
  • Table indexes could be added in some tables that may benefit. This can significantly speed up query performance.

Application Performance Patches
  • Review core and contrib module performance tags on drupal.org for performance patches and related information. Example:

Performance profiling to identify pain points
  • Use XHProf for Application performance and logic profiling to target our efforts

Node revision delete
  • Clean up and optimize existing content.

  • This will improve db dump times, possibly other performance benefits



 

User interface changes

N/A

API changes

TBD

Data model changes

TBD

📌 Task
Status

Active

Version

5.0

Component

Documentation

Created by

🇨🇦Canada joseph.olstad

Live updates comments and jobs are added and updated live.
  • Performance

    It affects performance. It is often combined with the Needs profiling tag.

Sign in to follow issues

Comments & Activities

  • Issue created by @joseph.olstad
  • 🇨🇦Canada sylus

    Just a bit of background that I have more information in this regard in relation to Azure. :)

    The problem is with the difference btw Single Server and Flexible Server.

    Single Server

    Single Server is on a retirement path w/Azure and all future installs are asked to use Flexible Server for both MySQL and PostgreSQL.

    This is where most of the performance issues arise due to the fact that Single Server is on a Shared pool of backend windows node behind a gateway. Performance can be slightly improved by using redirection which reduces network latency between client applications and MySQL servers by allowing applications to connect directly to backend server nodes. Please note no functionality like this was added for the PostgreSQL Single Server.

    https://learn.microsoft.com/en-us/azure/mysql/single-server/how-to-redir...

    Flexible Server

    With Flexible Server you get pretty much a managed virtual machine of which you get complete control over the IOPS and it is actually running under Linux rather then Windows. With this setup I have found performance to be acceptable. I would say with all the proper configuration you can probably in most cases get similar to 75-80% of the performance you would get with local containerized db that would obviously have less latency then a remote db even if in the same availability zone.

    https://github.com/canada-ca-terraform-modules/terraform-azurerm-flex-mysql

  • 🇨🇦Canada joseph.olstad

    @sylus, thanks so much for the explanation, this is a huge leap forward for us!

    I know at least one skilled devops resource who hit a wall on this for quite some time, this is key information that will bring redemption!

    This information was not easy to find. It's good to know that Azure has fixed the problem with the Flexible server option.

  • 🇨🇦Canada joseph.olstad

    I host my own server that absolutely blows away anything AWS and Azure has to offer.

    I run two RAID arrays.

    The first RAID array is for the the system drive is the fastest currently available nand devices with an M2.1 interface on the motherboard.

    The second RAID array is mounted to /data and is a conventional array. I knock off one of the main system devices out of the first array, so that it's unmounted and make an image and store the image on the /data partition.

    After making an image , re-add the device back to the array. This way easy to make a fresh clone to secondary warm spare live without down time.

    From there, a weekly or daily rsync.

    For high availability situations, two warm spares where the second warm spare is a couple weeks cold.

    I have yet to see a proper high performance Azure server or AWS server.

    Very disappointing performance from both Azure and Amazon AWS and extremely high cost.

  • 🇨🇦Canada joseph.olstad

    @sylus,

    We're continuing ongoing evaluation of Azure. The flexible server option is in use.

    I'm going to paraphrase a colleague on this.

    He says this:

    "IOPS is not even the issue here, I still believe. The issue is whatever Azure did to secure connections etc. such as logging, auditing, intrution detection overhead. It is not a Drupal issue, it is an azure design issue.

    Profiling with AWS, only 9 seconds vs 71 seconds with Azure"

    time (for i in {1..1000}; do mysql -h hostname_for_mysql -u username --password='password_password' database_name -e "select 1 from dual;" 2>&1; done)>/dev/null 
    
    real	0m7.653s
    user	0m3.513s
    sys	0m3.704s
    

    7.653 seconds with my personal laptop running the wxt-docker-scaffold from the application docker container to the mariadb container.

    However run the same test in an Azure environment and we're seeing between 32seconds and 47 seconds for the exact same test.

    It's between 6 and 8 times slower than a 15 watt low power laptop running a full stack of 6 docker containers.

    AWS at least is able to deliver decent performance but Azure is not.

  • 🇨🇦Canada joseph.olstad

    An update on this, I ran another benchmark directly on a server grade db host

    Using a socket connection (localhost)
    we found a real result of 3.8 seconds

    Compared to the loop back 127.0.0.1 came in at over 7.8 seconds (very surprising result)

    Compare this to the real result of between 32 and 48 seconds for the azure db from the application side

    So, it appears that there's up to 44 seconds of network lag with an azure managed db on 1000 simple select calls.

    Surprising to see that a socket is over twice as fast as the loopback 127.0.0.1 when testing directly from the db server host.

  • 🇨🇦Canada joseph.olstad

    ok, ran some more tests, @sylus, you're correct with saying the MariaDB is significantly better, for some reason MySQL has much more TCP lag than MariaDB.

    a side note:
    if using a socket connection (localhost instead of 127.0.0.1) for example, MySQL 8 compares to MariaDB in the above mentioned test metric. MariaDB 10.6.1 however appears to handle TCP much more efficiently than MySQL 8.

    Still evaluating.

  • 🇨🇦Canada joseph.olstad

    @sylus, you mentioned MariaDB in the last paragraph of comment #2, how was this provisioned in the context of azure managed db? Is MariaDB a special option , which option did you choose to provision the azure managed db which enabled you to get MariaDB?

  • 🇨🇦Canada joseph.olstad

    Yesterday I ran some additional tests:

    installed MariaDB 10.6.1

    ran tests

    MARIADB 10.6.1 TCP (127.0.0.1) test

    time (for i in {1..1000}; do mysql -h 127.0.0.1 -u username --password='password_password' database_name -e "select 1 from dual;" 2>&1; done)>/dev/null
    real 0m6.084s

    user	0m3.403s
    sys	0m2.546s
    

    MARIADB 10.6.1 SOCKET test

    time (for i in {1..1000}; do mysql -h localhost -u username --password='password_password' database_name -e "select 1 from dual;" 2>&1; done)>/dev/null
    real 0m5.946s

    user 0m3.224s
    sys   0m2.608s

    then on the same server -
    purged MariaDB then:

    MySQL 8 installed

    MySQL v8 TCP (127.0.0.1) test

    time (for i in {1..1000}; do mysql -h 127.0.0.1 -u username --password='password_password' database_name -e "select 1 from dual;" 2>&1; done)>/dev/null 
    

    real 0m15.260s

    user	0m9.498s
    sys	0m4.024s

    MySQL v8 SOCKET test

    time (for i in {1..1000}; do mysql -h localhost -u username --password='password_password' database_name -e "select 1 from dual;" 2>&1; done)>/dev/null
    

    real 0m6.649s

    user 0m3.492s
    sys   0m2.909s

    Conclusion: MariaDB 10.6.1 handles TCP much more efficiently than MySQL v8 given a default configuration with only one option set
    The only option set was:
    max_allowed_packets=64M

    Same option set for both installations and test runs.

  • 🇨🇦Canada joseph.olstad

    Hi @sylus, I've been told that MariaDB is no longer available as a flexible server option.

    With my own benchmarks I've noticed IP lag issues with MySQL compared to what MariaDB is delivering.

    https://azure.microsoft.com/en-us/updates/azure-database-for-mariadb-wil...

    Wondering if you had any idea how this will affect those being forced into the managed db and what might be our options to mitigate this?

    It seems very strange that MySQL would have a marked IP lag issue compared to MariaDB, wondering if it is doing something or have a configuration option enabled by default that possibly could be disabled or ajusted to mitigate the performance issue?

  • 🇨🇦Canada joseph.olstad

    ongoing performance work trying to get MySQL 8 to perform fast in Azure has not yet been fruitful. Have cranked up IOPS as high as they go, cranked up the cost of the instance to 16000$ per month or more , still slow, cranked up memory, cpu cores, filesystem options, still slow as molasses.

    Yet when testing a db instance on a 200$ option with MariaDB in azure, it's nearly 3x faster than anything we've gotten out of Azure+MySQL 8.

  • 🇨🇦Canada joseph.olstad

    Additional analysis comparing MariaDB to MySQL:

    Measuring the /var/lib/mysql folders respectively from a fresh install and loading the same wxt based drupal dump file into both MySQL and MariaDB.

    1. Install db server
    2. load dump file
    3. rebuild caches
    4. turn off db service
    5. measure size of /var/lib/mysql folder(s)
    • MySQL folder contents size uncompressed approximately 5x more data compared to MariaDB uncompressed
    • MySQL folder contents compressed: ~ 250mb
    • MariaDB folder contents compressed: ~ 59 mb

    Conclusion:

    MySQL spreads it's filesystem out 5x larger than MariaDB with the exact same number of data rows and tables.
    MariaDB performance is vastly superior in comparison.

    Implications:

    This means 5x more IO operations for the same data on MySQL compared to MariaDB which has a profound effect on performance.

  • 🇨🇦Canada joseph.olstad

    Another note on the TCP performance.

    MariaDB has ProxySQL built-in which vastly improves tcp performance as seen from the socket vs localhost test results comparing MariaDB and MySQL.

    MySQL does not have ProxySQL - TCP performance is very poor in comparison to MariaDB.

    Exploring a new option now, PostgreSQL with Azure.

Production build 0.71.5 2024