Switch the performance job to fail

Created on 30 December 2024

Problem/Motivation

Follow-up from πŸ“Œ Add a (daily?) job against development versions of dependencies Active , πŸ“Œ Add performance testing Active , πŸ“Œ Run performance tests in their own job Active etc.

The performance job currently allows failure, so a build won't fail if performance tests do.

I think this runs the risk of problems being introduced that are then harder to track down, so it would be good to have the job fail properly.

Generally, when performance tests fail due to a change (whether a contrib update or a recipe change), the only thing that needs to happen is to adjust the assertions to the new value. Usually one extra cache get, or query, or 1kb of CSS is fine. Having the test fail and be updated just means we know what's happening when it happens. If it's unexpected, we can always open a follow-up to investigate - this is how we handle the majority of changes with core.

If something suddenly adds 20 database queries or 30 cache gets or 50kb of CSS, there's probably a problem though - and the only way to find this out is to have precise assertions in the first place.
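To make that concrete, here is a minimal sketch (Python for brevity; the real Drupal CMS tests are PHPUnit, and every name and number below is illustrative, not from the actual test suite) of how exact assertions surface any drift by name the moment it happens:

```python
# Illustrative sketch: compare recorded performance metrics against exact
# expected values, so any change - one extra query or 50kb of CSS - is
# reported by name immediately. All names/numbers are hypothetical.

EXPECTED = {
    "database_queries": 50,
    "cache_gets": 120,
    "css_bytes": 90_000,
}

def check_metrics(recorded: dict) -> list[str]:
    """Return a human-readable report of every metric that drifted."""
    return [
        f"{name}: expected {expected}, got {recorded[name]}"
        for name, expected in EXPECTED.items()
        if recorded[name] != expected
    ]

# One extra query is visible right away and trivially fixed by updating
# the expected value - but a +20 jump would stand out just as fast.
drift = check_metrics(
    {"database_queries": 51, "cache_gets": 120, "css_bytes": 90_000}
)
```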

Upcoming releases of contrib modules can also affect performance tests (something that doesn't happen with core). This should be somewhat mitigated by πŸ“Œ Add a (daily?) job against development versions of dependencies Active . If a contrib module adds one extra query in a dev release, a performance test can be adjusted to use assertGreaterThan and assertLessThan instead of assertSame so it passes with both versions, with a follow-up to use assertSame for the new number once the release is tagged.
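That bridging pattern could be sketched like this (Python for brevity; the real tests would use PHPUnit's assertGreaterThan/assertLessThan, and the query counts here are made up):

```python
# Sketch of the "bridge" assertion pattern: a contrib dev release adds one
# query, taking the count from 50 to 51, and the test temporarily accepts
# both. Function name and numbers are illustrative.

def assert_query_count_bridged(count: int) -> None:
    # Accept both the released version (50) and the dev version (51)...
    assert 49 < count < 52, f"unexpected query count: {count}"

assert_query_count_bridged(50)  # released version passes
assert_query_count_bridged(51)  # dev version passes too
# Follow-up: once the new release is tagged, tighten back to an exact
# assertion on the new number (the assertSame equivalent).
```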

Steps to reproduce

Proposed resolution

Remaining tasks

User interface changes

API changes

Data model changes

πŸ“Œ Task
Status

Active

Component

General

Created by

πŸ‡¬πŸ‡§United Kingdom catch

  • Performance



Comments & Activities

  • Issue created by @catch
  • πŸ‡ΊπŸ‡ΈUnited States phenaproxima Massachusetts

    I really don't wanna use the strict asserts on query counts, and I admit I don't understand the reasons why we're doing that. What's wrong with setting some upper bounds on queries?

  • πŸ‡¬πŸ‡§United Kingdom catch

@phenaproxima so for example, let's say hitting the front page with a cold render cache takes 50 queries. To avoid a strict query count, you do assertLessThanOrEqual(60, $query_count).

    Issue A adds 10 extra queries, because it adds a poorly optimized block from a contrib module to the front page (or whatever).

    Issue B adds 1 extra query because it installs redirect module, which does one extra database query on every page to look up the redirects.

    Now issue B is responsible for upping the limit, and issue A went in undetected.
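The masking effect in this example can be shown directly (a Python sketch for brevity; the helper name is illustrative and the numbers are the ones from the comment):

```python
# Baseline: 50 queries, with a padded upper bound of 60.

def assert_within_bound(count: int, bound: int = 60) -> None:
    assert count <= bound, f"{count} queries exceeds bound of {bound}"

assert_within_bound(50)  # baseline passes

# Issue A adds 10 poorly optimised queries - still under the bound,
# so the regression lands undetected:
assert_within_bound(60)

# Issue B adds 1 legitimate query and is the one that trips the bound,
# so B gets blamed for A's regression:
# assert_within_bound(61)  # would fail
```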

  • πŸ‡ΊπŸ‡ΈUnited States phenaproxima Massachusetts

    That makes sense. But I don't see how it prevents contributors (or hell, me) from mindlessly changing the query count in order to get tests passing in the name of delivering some feature or bug fix. It's not realistic to expect me, or any contributor, to do a performance deep dive when we add a module, or change some configuration, and the query count goes up (or down) by some amount. We'll just adjust the test and move on.

    So I guess it's just hard to see the benefit of fine-grained performance testing when our product is a constellation of modules we don't really control. I'm willing to be convinced, though; what am I missing?

  • πŸ‡ΊπŸ‡ΈUnited States phenaproxima Massachusetts
  • πŸ‡¬πŸ‡§United Kingdom catch

    will always be the higher priority for Drupal CMS, given its mission and development philosophy.

Drupal CMS is intended to be used by small-to-medium-sized sites and organisations. This should mean:

1. Good front end performance out of the box, without needing to write custom code or install a bunch of extra modules on top. I think πŸ› Privacy policy link goes to a 403 Active , saving over 200kb of JavaScript and CSS for every site visitor out of the box, is a great example of this working. There is no point providing search engine optimisation recipes if Core Web Vitals are bad; it undoes all the effort.

2. Snappy admin experience: people evaluating Drupal CMS via a demo or a local install will be put off if basic admin and editorial tasks mean waiting seconds for pages to load. πŸ“Œ Add render caching for the navigation render array Active was found by the Drupal CMS tests because Navigation wasn't included in core's tests yet, and that's already fixed in 11.2 (we should still be able to backport it to 11.1, I think).

    3. It should be able to run on cheaper hosting plans. This doesn't mean having to scale to hundreds of thousands of entities out of the box, but it should mean using render caching effectively and things like that. Look at https://www.bluehost.com/wordpress/wordpress-hosting or https://www.dreamhost.com/wordpress/#wp-plans

    It's unrealistic to expect me, or any contributor, to do a performance deep dive when we add a module, or change some configuration, and the query count goes up (or down) by some amount. We'll just adjust the test and move on.

If module/recipe X is added or updated and it causes a regression in performance tests, there are two easy things that can be done:

    1. Open a Drupal CMS follow-up to look into it more while going ahead for now.

    2. Open an issue against the contrib module linking back to the performance test results, suggesting there might be a problem. πŸ› Klaro library seems way too large? Active is an example of this working (the issue existed two days before I would have opened it).

    Both of these make it infinitely easier for either the contrib maintainer or someone like me to have a look later.

So as a compromise... what if we adopted upper bounds, but made them fairly marginal? For example, what if we gave the query count a 5-query margin of error (or even less than that)?

    So there's another reason not to do this, but I forgot to mention it above. If you look at https://api.drupal.org/api/drupal/core%21profiles%21standard%21tests%21s... it includes the actual queries being executed. This means as soon as there's a change, you can see exactly why/what changed, without having to do any kind of additional profiling. This has saved a lot of time and messing about in core issues already (once we got it working). It's basically devel query log for tests. But for that to work, you need to assert on the actual queries.
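A rough sketch of what asserting on the query log buys you (Python for brevity; the real mechanism is core's performance test infrastructure, and the queries below are invented): when the assertion fails, the diff itself tells you which query appeared or disappeared, with no extra profiling step.

```python
# Illustrative sketch: assert on the actual query log rather than a bare
# count, so a failure prints exactly which queries changed.
import difflib

EXPECTED_QUERIES = [
    'SELECT "name", "route" FROM "router" WHERE "pattern_outline" IN (:p)',
    'SELECT * FROM "users_field_data" WHERE "uid" = 1',
]

def assert_same_queries(actual: list[str]) -> None:
    if actual != EXPECTED_QUERIES:
        diff = "\n".join(
            difflib.unified_diff(EXPECTED_QUERIES, actual, lineterm="")
        )
        raise AssertionError(f"query log changed:\n{diff}")
```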

Overall, I think having exact query counts plus query log assertions with allow-failure would be better than require-fail with fuzzy assertions.

(Another thought occurs to me, which is more of a general performance testing thing and therefore not in scope - I don't know if number of queries is as useful a metric as time spent waiting for queries. Is there any way to measure the latter?)

There's an inherent problem adding PHPUnit test coverage for anything based on timings: even running the same test on the same machine results in different times, let alone running it on different machines. So we've avoided any kind of time-based assertions, because they can never be reliable enough to pick a pass/fail threshold without causing problems in one direction or the other.

    However, the grafana dashboard that you can get locally with ddev gander, and which we're hoping to set up for Drupal CMS, does include the timings: https://gander.tag1.io/explore?panes=%7B%22Tkt%22:%7B%22datasource%22:%2... (hopefully the link works)

    If not, going to https://gander.tag1.io/?orgId=1&refresh=30s and clicking on an 'Umami front page cold cache' will show the same trace.

So it's easy to see the timings, it's just not a good idea to assert on them. There's also a secondary issue: almost any database query on a nearly-empty database is going to be fast, so to flush out slow queries we'd need to test against a lot of content. I have thought about running EXPLAIN on every query and then failing if there's a filesort or similar, but haven't got anywhere near that yet.
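For what it's worth, that EXPLAIN idea might look roughly like the following - a hypothetical sketch, since nothing like it exists yet. `run_explain` is an injected callable standing in for a real database connection, returning MySQL-style EXPLAIN rows as dicts:

```python
# Hypothetical sketch: run EXPLAIN on every recorded query and flag any
# whose plan shows markers associated with slow execution. The markers are
# the ones MySQL reports in the EXPLAIN "Extra" column.

BAD_MARKERS = ("Using filesort", "Using temporary")

def find_slow_queries(queries, run_explain):
    """Return (query, marker) pairs for queries whose plan looks slow."""
    flagged = []
    for query in queries:
        for row in run_explain(query):
            extra = row.get("Extra") or ""
            for marker in BAD_MARKERS:
                if marker in extra:
                    flagged.append((query, marker))
    return flagged

# Fake EXPLAIN output standing in for a real database connection:
def fake_explain(query):
    if "ORDER BY" in query:
        return [{"Extra": "Using where; Using filesort"}]
    return [{"Extra": "Using index"}]

flagged = find_slow_queries(
    ["SELECT * FROM node ORDER BY created", "SELECT uid FROM users"],
    fake_explain,
)
```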

  • πŸ‡ΊπŸ‡ΈUnited States phenaproxima Massachusetts

    That's all pretty persuasive. Thanks for taking the time to explain it.

    Here's what I propose:

    • We make the performance tests a must-pass job.
    • The exact query counts become assertLessThanOrEqual() assertions instead. That makes it okay for there to suddenly be fewer queries, but you need to explicitly adjust the tests if more queries are added.
    • We put some lengthy and helpful comments in the test (with documentation links, if there are any) about how to alter the tests when needed, and what's appropriate and what's not. So at least contributors have some guidance.
  • πŸ‡¬πŸ‡§United Kingdom catch

    The exact query counts become assertLessThanOrEqual() assertions, but keep their current numbers. That makes it okay for there to suddenly be fewer queries, but you need to explicitly adjust the tests if more queries are added.

    This also causes problems unfortunately, however as I was writing up a long response I might have found a way to mitigate most of those problems:

    The first problem:

- Let's say you use assertLessThanOrEqual(60). The actual query count is 60.

    - One issue/contrib update removes a query, the assertion stays the same, the real query count goes down to 59, unknown to anyone.

    - Another issue/contrib update adds a query, now you're back to 60 - but this additional query is due to a bug. Again it's not recorded.

    - Another issue adds one more query, this one is necessary, but now you have to increase the query count to 61, but you didn't learn anything from the previous two issues.

The second problem: as discussed above, performance tests allow asserting on the exact queries being run. I didn't implement this in the Drupal CMS performance tests yet, mainly because πŸ“Œ Get coffee data only when the search box is opened Active made that incredibly noisy (it's now fixed, but not in a tagged release yet).

    However once you add assertions on actual queries, which is ideal for debugging/understanding changes, you can't assertLessThan() any more if you're asserting that two arrays are identical.

    However, mayyyybe instead of asserting the exact queries we could use an array_intersect()/array_diff() or something, so that if queries are removed from the expected queries it's fine, but not if they're added.

    This would solve the inability to assert on exact queries with some degree of fuzziness, it would make it easier for tests to take into account the canary job (can add a query without it failing on the non canary job), and it would prevent new queries from slipping through even if the overall count is way under.
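That array_intersect()/array_diff() idea could be sketched like this (Python for brevity, using a multiset difference where the PHP version would use array_diff()/array_count_values(); names and queries are illustrative): removed queries are tolerated, but any query not in the expected list fails the test.

```python
# Sketch of the "no new queries" assertion: a multiset difference of
# actual minus expected, so removals pass silently but additions fail.
from collections import Counter

def assert_no_new_queries(actual: list[str], expected: list[str]) -> None:
    new = Counter(actual) - Counter(expected)  # multiset difference
    assert not new, f"unexpected new queries: {sorted(new)}"

expected = ["SELECT a", "SELECT b", "SELECT c"]
assert_no_new_queries(["SELECT a", "SELECT c"], expected)   # removal: fine
# assert_no_new_queries(["SELECT a", "SELECT d"], expected) # would fail
```

Using Counter rather than a plain set also catches the case where an expected query starts running twice, which a set difference would miss.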

    The only thing that leaves then is that eventually there will be crufty queries in the assertions that never run, but it's easy to adjust the test locally to exact assertions, delete a load of stuff from the array/lower thresholds, then change the assertions back and push an MR to clean things up.

We can switch to assertLessThan() before πŸ“Œ Get coffee data only when the search box is opened Active is fully resolved in Drupal CMS, but sorting out the specific query assertions really needs it IMO, to avoid a lot of noise/cruft that immediately needs to be removed.
