[Meta] The quest for optimizations of Drupal's installer

Created on 7 September 2023, about 1 year ago

Problem/Motivation

This past week I dove into the performance of Open Social's installation. Installation generally requires a lot of RAM and makes a lot of database queries, which takes a lot of time. After an initial reduction, achieved by removing a trigger that rebuilt things on every entity action, I was able to bring the number of queries down from ~1.25 million to 521,789. While I understand that Open Social's situation of installing 309 modules is perhaps unique, and people's initial reaction might be to "use fewer modules", I strongly believe that with Drupal powering ever more creative and ambitious projects, its installer should live up to this task more easily, and we hopefully won't be unique for long.

This issue does not offer solutions per se, but I do want to share my week-long journey in the hope that something useful can come of it. My initial goal was to reduce the amount of memory used (with run-time being a nice-to-have but not necessarily important). However, an initial profile mostly showed that the entire database layer was consuming memory, so to figure out what data was actually needed when, my first side quest was to reduce the number of database queries.

The top-10 queries when starting out are listed below. It was surprising to see that key_value look-ups were 60% of the queries. [Text continues after the table].

Of course those 336,011 queries were not all exactly the same query; they had slightly different arguments. For brevity I'll omit the arguments table I made, but the large majority loaded some value from the entity.storage_schema.sql collection, with slight variations in the key being loaded.

Looking at the KeyValue Store's DatabaseStorage class, I saw that almost all operations could be cached in memory and did not actually require database queries (yes, this increases the memory usage of the process, but it also allows other data fetching routes to surface for analysis).

This already reduced the number of queries for the key value store by roughly two thirds (from 336,812 to 110,947).
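The in-memory caching idea can be sketched as a read-through, write-through wrapper around a key-value backend. This is a minimal, language-agnostic model in Python, not Drupal's DatabaseStorage code: `CountingBackend` and `MemoryCachedStore` are hypothetical names, and the query counter only simulates database round trips.

```python
from typing import Any, Dict, Optional


class CountingBackend:
    """Hypothetical stand-in for a database-backed key-value table."""

    def __init__(self) -> None:
        self.rows: Dict[str, Any] = {}
        self.queries = 0  # number of simulated database round trips

    def get(self, key: str) -> Optional[Any]:
        self.queries += 1
        return self.rows.get(key)

    def set(self, key: str, value: Any) -> None:
        self.queries += 1
        self.rows[key] = value

    def delete(self, key: str) -> None:
        self.queries += 1
        self.rows.pop(key, None)


class MemoryCachedStore:
    """Read-through, write-through cache in front of a backend (a sketch)."""

    def __init__(self, backend: CountingBackend) -> None:
        self.backend = backend
        self.cache: Dict[str, Any] = {}

    def get(self, key: str) -> Optional[Any]:
        if key not in self.cache:
            # Misses are cached too (as None), so repeated look-ups of an
            # absent key do not hit the backend again.
            self.cache[key] = self.backend.get(key)
        return self.cache[key]

    def set(self, key: str, value: Any) -> None:
        self.backend.set(key, value)
        self.cache[key] = value

    def delete(self, key: str) -> None:
        self.backend.delete(key)
        self.cache[key] = None
```

The trade-off is the one noted above: every value read or written stays in process memory, and the sketch treats a stored None the same as a missing key. It also assumes nothing else writes to the backend while the cache is alive.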

One thing I noticed in the arguments list that was left over was that the number of queries against the entity.storage_schema.sql collection was now reduced to once per module being installed; however, all the individual fields were being loaded separately.

This is something that's done in SqlContentEntityStorage. I don't know whether at run-time it needs the entire schema, or whether loading the individual fields is more optimised in those cases, but at least during an extension install it seems to need the entire schema every time. I found that calling getAll when the collection was instantiated reduced the number of queries significantly.
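The difference between loading keys one by one and loading the whole collection up front can be illustrated with a small model. This is a sketch in Python under assumed names (`CountingCollection`, `PreloadedCollection`, `get_all`), not the SqlContentEntityStorage or key-value API; it only counts simulated queries.

```python
from typing import Any, Dict, Optional


class CountingCollection:
    """Hypothetical backend for one key-value collection."""

    def __init__(self, rows: Dict[str, Any]) -> None:
        self.rows = dict(rows)
        self.queries = 0  # number of simulated database round trips

    def get(self, key: str) -> Optional[Any]:
        self.queries += 1
        return self.rows.get(key)

    def get_all(self) -> Dict[str, Any]:
        # One query returns every row in the collection.
        self.queries += 1
        return dict(self.rows)


class PreloadedCollection:
    """Loads the whole collection once, when it is instantiated."""

    def __init__(self, backend: CountingCollection) -> None:
        self.cache = backend.get_all()

    def get(self, key: str) -> Optional[Any]:
        # Served from memory; no further queries are issued.
        return self.cache.get(key)


# Usage: 50 field schemas read individually versus preloaded.
rows = {f"node.field_schema_data.field_{i}": {"columns": i} for i in range(50)}

per_key = CountingCollection(rows)
for key in rows:
    per_key.get(key)

bulk = CountingCollection(rows)
preloaded = PreloadedCollection(bulk)
for key in rows:
    preloaded.get(key)
```

In this model `per_key.queries` ends at 50 and `bulk.queries` at 1. Preloading only pays off when most of the collection is actually needed, which is the open question about run-time behaviour raised above.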

It's important to note a pattern, though: it seemed that something higher up was actually being cleared from the memory cache on every module install. Catch noted that "the container gets reset by DrupalKernel::updateModules() which is called from ModuleInstaller::install()", which could explain why, before they were aggregated with getAll, the individual fields showed a pattern of being loaded once per module installed (with modules installed earlier in the process having a higher query count than modules installed later).
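A toy model makes this pattern easier to see. It assumes, purely for illustration, that the in-memory cache is emptied on every module install and that each install then re-reads the schema keys of every module installed so far; `simulate_install` and its parameters are hypothetical and do not reproduce Drupal's actual behaviour or numbers.

```python
from typing import Dict, List


def simulate_install(module_names: List[str], fields_per_module: int = 1) -> Dict[str, int]:
    """Count simulated key loads per owning module.

    Assumption: the memory cache is reset on every install, and the whole
    schema known at that point is needed again afterwards.
    """
    loads: Dict[str, int] = {}
    known_keys: List[str] = []
    for module in module_names:
        known_keys.extend(f"{module}.field_{i}" for i in range(fields_per_module))
        cache: Dict[str, bool] = {}  # simulated reset: the cache starts empty
        for key in known_keys:
            if key not in cache:
                cache[key] = True  # cache miss, counts as one query
                owner = key.split(".", 1)[0]
                loads[owner] = loads.get(owner, 0) + 1
    return loads


loads = simulate_install(["a", "b", "c"])
```

Here `loads` is `{"a": 3, "b": 2, "c": 1}`: keys belonging to the first module are loaded on every later install, so earlier modules accumulate more queries. Under these assumptions the total grows quadratically with the number of modules, n(n+1)/2 loads for n modules with one key each.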

This change of loading the entire collection reduced the number of key_value table queries from 110,947 to 13,953, a reduction of roughly 87%. Almost half of that was now entity.storage_schema.sql at 6,257 queries, but that's now only 3% of all database queries in the installation process. The total number of queries for the process was brought down from 295,924 to 198,930, another reduction of roughly 33%.
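The figures quoted so far can be cross-checked with some arithmetic. The sketch below assumes that 521,789 is the total query count at the start of this analysis and that the only savings came from the key_value table; the variable names are mine, and the numbers are taken from the text.

```python
def reduction_pct(before: int, after: int) -> float:
    """Percentage by which `after` is lower than `before`."""
    return (before - after) / before * 100


total_start = 521_789        # total queries at the start (assumed baseline)
kv_start = 336_812           # key_value queries at the start
kv_after_cache = 110_947     # after caching operations in memory
kv_after_preload = 13_953    # after loading the whole collection up front
schema_queries = 6_257       # remaining entity.storage_schema.sql queries

# Totals implied by subtracting the key_value savings from the baseline.
total_after_cache = total_start - (kv_start - kv_after_cache)
total_after_preload = total_after_cache - (kv_after_cache - kv_after_preload)

kv_share_start = kv_start / total_start * 100
cache_saving = reduction_pct(kv_start, kv_after_cache)
preload_saving = reduction_pct(kv_after_cache, kv_after_preload)
total_saving = reduction_pct(total_after_cache, total_after_preload)
schema_share = schema_queries / total_after_preload * 100

for label, value in [
    ("key_value share at start", kv_share_start),
    ("saving from in-memory cache", cache_saving),
    ("saving from preloading", preload_saving),
    ("saving on total queries", total_saving),
    ("schema share of final total", schema_share),
]:
    print(f"{label}: {value:.1f}%")
```

The implied totals come out at 295,924 and 198,930, matching the totals reported above, which suggests the key_value changes account for the entire drop in query count.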

Although this is where I ran out of time to dive further, this was the first time where the top contenders for the number of database queries changed. The new top 10 is listed below.

The top two queries are now reading configuration, at about 25% of the queries, followed by three menu_tree reading queries at about 20% of the total queries.

Steps to reproduce

Proposed resolution

I don't purport to have any ready-to-use solutions, but I do hope to provide people smarter than me and more knowledgeable than me about Drupal's installation process with insights into installing a project with hundreds of modules. It looks like, for safety reasons, we throw out pretty much all the data we've loaded after we install something. For single modules or themes this generally works fine; however, in an installer, which knows it's going to iterate on the state it just created, this produces a lot of waste.

Remaining tasks

User interface changes

API changes

Data model changes

Release notes snippet

📌 Task
Status

Active

Version

11.0 🔥

Component
Install 

Last updated 2 days ago

No maintainer
Created by

🇳🇱Netherlands Kingdutch
