Problem/Motivation
It would be useful to have automated performance testing for Drupal core. Manual performance testing is sometimes required for issues, but it has a number of limitations.
Much like test coverage for bugs, without a test it's easier to introduce a regression than to fix one, partly because performance regressions don't always look like 'performance' issues.
Even where we do manual performance testing, it can be hard to determine what and how to test - benchmarks, xhprof, blackfire, devtools, EXPLAIN etc. People often struggle to produce useful performance data, e.g. ensuring that before/after comparisons are done on a site in exactly the same state (such as whether the cache is warm or not). It's also not easy to present performance data back to issues - links to blackfire go stale before issues get fixed, xhprof screenshots aren't accessible, etc.
If we had performance testing built into our CI framework, we'd be able to see the impact of some performance improvements, and catch some performance regressions, automatically. This would also provide examples for people to apply to manual testing, or to expand coverage when new improvements are added or regressions are found.
Steps to reproduce
Some recently fixed issues introduced what should be measurable improvements or regressions. We can use these to check whether performance testing shows a difference once they're reverted.
📌 Leverage the 'loading' html attribute to enable lazy-load by default for images in Drupal core (Fixed)
🐛 Stampedes and cold cache performance issues with css/js aggregation (Fixed)
🐛 Aggregation creates two extra aggregates when it encounters {media: screen} in a library declaration (Fixed)
🐛 Performance regression introduced by container serialization solution (Fixed)
Proposed resolution
There are broadly two types of performance tests we can do:
1. Absolute/objective/hard-coded/deterministic - write a PHPUnit test that ensures a certain thing (database queries, network requests) only happens a certain number of times on a certain request (see the sketch after this list). An example of this that we already have in core is #2120457: Add test to guarantee that the Standard profile does not load any JavaScript for anonymous users on critical pages →. These allow us to fail commits on regressions, but the number of things we can check like this is extremely limited - the result needs to be consistent across hardware and configurations. Tests will also need to be adjusted for functional changes as well as actual regressions (e.g. an extra block on the Umami front page, 'vegetable of the day', could mean an extra HTTP request for an image, but this wouldn't be a 'regression' as such, just a new UX element in Umami).
2. Relative/subjective/dynamic/non-deterministic - metrics which are useful, but which vary with hardware, network, and whatever else the machine is doing (such as running other PHPUnit tests). For these, we can collect certain metrics (time to first byte, largest contentful paint, entire xhprof runs), store them permanently outside the test itself, e.g. with OpenTelemetry, then graph them over time, compare runs, show traces from specific pages, etc. This might allow us to do things like compare runs between a known state such as 10.0.0 and an MR, if we can find a way to show diffs.
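For illustration, a rough sketch of an 'absolute' test using the PerformanceTestBase from the remaining tasks below might look like the following. This is only a sketch: collectPerformanceData(), getStylesheetCount() and getScriptCount() are assumed helper names for the chromedriver request counting, not necessarily the final API, and the expected numbers are placeholders.

<?php

namespace Drupal\Tests\standard\FunctionalJavascript;

use Drupal\FunctionalJavascriptTests\PerformanceTestBase;

/**
 * Sketch only: helper method names and expected counts are assumptions.
 */
class FrontPagePerformanceTest extends PerformanceTestBase {

  protected $defaultTheme = 'olivero';

  public function testAnonymousFrontPage(): void {
    // Record the network requests chromedriver makes while loading the page.
    $performance_data = $this->collectPerformanceData(function () {
      $this->drupalGet('<front>');
    });
    // Hard-coded expectations: an unexpected extra CSS/JS aggregate fails the
    // build and has to be explicitly accounted for in the test.
    $this->assertSame(2, $performance_data->getStylesheetCount());
    $this->assertSame(1, $performance_data->getScriptCount());
  }

}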
Remaining tasks
✨ Add PerformanceTestBase for allowing browser performance assertions within FunctionalJavaScriptTests (Fixed): adds PerformanceTestBase and allows counting of actual network requests via chromedriver.
✨ Add OpenTelemetry Application Performance Monitoring to core performance tests (Fixed): sends various non-deterministic data to OpenTelemetry for graphs/trends and possibly alerts.
📌 Add open-telemetry/sdk and open-telemetry/exporter-otlp as dev dependencies (Active)
Add more data collection for both PHPUnit assertions and OpenTelemetry:
📌 Allow assertions on the number of database queries run during tests (RTBC)
📌 Add xhr and BigPipe assertions to PerformanceTestTrait (Active)
Needs issue: add support for database query logging - we could count the number of queries by type (SELECT/UPDATE/INSERT/DELETE), query time, etc. A possible 'absolute' test would assert the number of database queries executed on a warm page cache request (see the sketch below).
Needs issue: consider adding a trait that handles instrumentation for unit/kernel/functional tests.
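As a rough illustration of the warm page cache assertion mentioned above - again only a sketch: getQueryCount() assumes the database query logging described above has landed (it is not an existing core API), and the expected count is a placeholder.

<?php

namespace Drupal\Tests\standard\FunctionalJavascript;

use Drupal\FunctionalJavascriptTests\PerformanceTestBase;

/**
 * Sketch only: getQueryCount() and the expected count are assumptions.
 */
class WarmPageCachePerformanceTest extends PerformanceTestBase {

  protected $defaultTheme = 'olivero';

  public function testFrontPageWarmPageCache(): void {
    // Prime the page cache so the measured request is served warm.
    $this->drupalGet('<front>');
    $performance_data = $this->collectPerformanceData(function () {
      $this->drupalGet('<front>');
    });
    // The expected count is illustrative; the real value would be pinned to
    // whatever a warm page cache hit actually executes, and could later be
    // broken down by query type (SELECT/INSERT/UPDATE/DELETE).
    $this->assertSame(4, $performance_data->getQueryCount());
  }

}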
User interface changes
API changes
Data model changes
Release notes snippet