Switch the performance job to fail

Created on 30 December 2024

Problem/Motivation

Follow-up from πŸ“Œ Add a (daily?) job against development versions of dependencies Active , πŸ“Œ Add performance testing Active , πŸ“Œ Run performance tests in their own job Active etc.

The performance job currently allows failure, so a build won't fail if performance tests do.

I think this runs the risk of problems being introduced that are then harder to track down, so it would be good to have the job fail properly.

Generally, when performance tests fail due to a change (whether a contrib update or a recipe change), the only thing that needs to happen is to adjust the assertions to the new value. Usually one extra cache get, or query, or 1kb of CSS is fine. Having the test fail and be updated just means we know what's happening when it happens. If it's unexpected, we can always open a follow-up to investigate - this is how we handle the majority of changes with core.

If something suddenly adds 20 database queries or 30 cache gets or 50kb of CSS, there's probably a problem though - and the only way to find this out is to have precise assertions in the first place.
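To make that concrete, here is a minimal sketch (Python for brevity; the real Drupal CMS tests are PHPUnit, and every name and number below is illustrative, not from the actual test suite) of how exact assertions surface any drift by name the moment it happens:

```python
# Illustrative sketch: compare recorded performance metrics against exact
# expected values, so any change - one extra query or 50kb of CSS - is
# reported by name immediately. All names/numbers are hypothetical.

EXPECTED = {
    "database_queries": 50,
    "cache_gets": 120,
    "css_bytes": 90_000,
}

def check_metrics(recorded: dict) -> list[str]:
    """Return a human-readable report of every metric that drifted."""
    return [
        f"{name}: expected {expected}, got {recorded[name]}"
        for name, expected in EXPECTED.items()
        if recorded[name] != expected
    ]

# One extra query is visible right away and trivially fixed by updating
# the expected value - but a +20 jump would stand out just as fast.
drift = check_metrics(
    {"database_queries": 51, "cache_gets": 120, "css_bytes": 90_000}
)
```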

Upcoming releases of contrib modules can also affect performance tests (something that doesn't happen with core). This should be somewhat mitigated by πŸ“Œ Add a (daily?) job against development versions of dependencies Active . If a contrib module adds one extra query in a dev release, a performance test can be adjusted to use assertGreaterThan and assertLessThan instead of assertSame so it passes with both versions, with a follow-up to use assertSame for the new number once the release is tagged.
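That bridging pattern could be sketched like this (Python for brevity; the real tests would use PHPUnit's assertGreaterThan/assertLessThan, and the query counts here are made up):

```python
# Sketch of the "bridge" assertion pattern: a contrib dev release adds one
# query, taking the count from 50 to 51, and the test temporarily accepts
# both. Function name and numbers are illustrative.

def assert_query_count_bridged(count: int) -> None:
    # Accept both the released version (50) and the dev version (51)...
    assert 49 < count < 52, f"unexpected query count: {count}"

assert_query_count_bridged(50)  # released version passes
assert_query_count_bridged(51)  # dev version passes too
# Follow-up: once the new release is tagged, tighten back to an exact
# assertion on the new number (the assertSame equivalent).
```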

Steps to reproduce

Proposed resolution

Remaining tasks

User interface changes

API changes

Data model changes

πŸ“Œ Task
Status

Active

Component

General

Created by

πŸ‡¬πŸ‡§United Kingdom catch

  • Performance



Comments & Activities

  • Issue created by @catch
  • πŸ‡ΊπŸ‡ΈUnited States phenaproxima Massachusetts

    I really don't wanna use the strict asserts on query counts, and I admit I don't understand the reasons why we're doing that. What's wrong with setting some upper bounds on queries?

  • πŸ‡¬πŸ‡§United Kingdom catch

@phenaproxima so for example, let's say hitting the front page with a cold render cache takes 50 queries. To avoid a strict query count, you do assertLessThanOrEqual(60, $query_count).

    Issue A adds 10 extra queries, because it adds a poorly optimized block from a contrib module to the front page (or whatever).

    Issue B adds 1 extra query because it installs redirect module, which does one extra database query on every page to look up the redirects.

    Now issue B is responsible for upping the limit, and issue A went in undetected.
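The masking effect in this example can be shown directly (a Python sketch for brevity; the helper name is illustrative and the numbers are the ones from the comment):

```python
# Baseline: 50 queries, with a padded upper bound of 60.

def assert_within_bound(count: int, bound: int = 60) -> None:
    assert count <= bound, f"{count} queries exceeds bound of {bound}"

assert_within_bound(50)  # baseline passes

# Issue A adds 10 poorly optimised queries - still under the bound,
# so the regression lands undetected:
assert_within_bound(60)

# Issue B adds 1 legitimate query and is the one that trips the bound,
# so B gets blamed for A's regression:
# assert_within_bound(61)  # would fail
```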

  • πŸ‡ΊπŸ‡ΈUnited States phenaproxima Massachusetts

    That makes sense. But I don't see how it prevents contributors (or hell, me) from mindlessly changing the query count in order to get tests passing in the name of delivering some feature or bug fix. It's not realistic to expect me, or any contributor, to do a performance deep dive when we add a module, or change some configuration, and the query count goes up (or down) by some amount. We'll just adjust the test and move on.

    So I guess it's just hard to see the benefit of fine-grained performance testing when our product is a constellation of modules we don't really control. I'm willing to be convinced, though; what am I missing?

  • πŸ‡ΊπŸ‡ΈUnited States phenaproxima Massachusetts
  • πŸ‡¬πŸ‡§United Kingdom catch

    will always be the higher priority for Drupal CMS, given its mission and development philosophy.

Drupal CMS is intended to be used by small-to-medium-sized sites and organisations. This should mean:

1. Good front end performance out of the box, without needing to write custom code or install a bunch of extra modules on top. I think πŸ› Privacy policy link goes to a 403 Active , saving over 200kb of JavaScript and CSS for every site visitor out of the box, is a great example of this working. There is no point providing search engine optimisation recipes if Core Web Vitals are bad; it undoes all the effort.

2. Snappy admin experience: people evaluating Drupal CMS via a demo or a local install will be put off if basic admin and editorial tasks mean waiting seconds for pages to load. πŸ“Œ Add render caching for the navigation render array Active was found by the Drupal CMS tests because Navigation wasn't included in core's tests yet, and that's already fixed in 11.2 (we should still be able to backport it to 11.1, I think).

    3. It should be able to run on cheaper hosting plans. This doesn't mean having to scale to hundreds of thousands of entities out of the box, but it should mean using render caching effectively and things like that. Look at https://www.bluehost.com/wordpress/wordpress-hosting or https://www.dreamhost.com/wordpress/#wp-plans

    It's unrealistic to expect me, or any contributor, to do a performance deep dive when we add a module, or change some configuration, and the query count goes up (or down) by some amount. We'll just adjust the test and move on.

If module/recipe X is added or updated and it causes a regression in performance tests, there are two easy things that can be done:

    1. Open a Drupal CMS follow-up to look into it more while going ahead for now.

    2. Open an issue against the contrib module linking back to the performance test results, suggesting there might be a problem. πŸ› Klaro library seems way too large? Active is an example of this working (the issue existed two days before I would have opened it).

    Both of these make it infinitely easier for either the contrib maintainer or someone like me to have a look later.

So as a compromise... what if we adopted upper bounds, but made them fairly marginal? For example, what if we gave the query count a 5-query margin of error (or even less than that)?

    So there's another reason not to do this, but I forgot to mention it above. If you look at https://api.drupal.org/api/drupal/core%21profiles%21standard%21tests%21s... it includes the actual queries being executed. This means as soon as there's a change, you can see exactly why/what changed, without having to do any kind of additional profiling. This has saved a lot of time and messing about in core issues already (once we got it working). It's basically devel query log for tests. But for that to work, you need to assert on the actual queries.
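A rough sketch of what asserting on the query log buys you (Python for brevity; the real mechanism is core's performance test infrastructure, and the queries below are invented): when the assertion fails, the diff itself tells you which query appeared or disappeared, with no extra profiling step.

```python
# Illustrative sketch: assert on the actual query log rather than a bare
# count, so a failure prints exactly which queries changed.
import difflib

EXPECTED_QUERIES = [
    'SELECT "name", "route" FROM "router" WHERE "pattern_outline" IN (:p)',
    'SELECT * FROM "users_field_data" WHERE "uid" = 1',
]

def assert_same_queries(actual: list[str]) -> None:
    if actual != EXPECTED_QUERIES:
        diff = "\n".join(
            difflib.unified_diff(EXPECTED_QUERIES, actual, lineterm="")
        )
        raise AssertionError(f"query log changed:\n{diff}")
```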

Overall, I think having exact query counts plus query log assertions with allow-failure would be better than require-fail with fuzzy assertions.

(Another thought occurs to me, which is more of a general performance testing thing and therefore not in scope - I don't know if number of queries is as useful a metric as time spent waiting for queries. Is there any way to measure the latter?)

There's an inherent problem adding PHPUnit test coverage for anything based on timings: even running the same test on the same machine results in different times, let alone running it on different machines. So we've avoided any kind of time-based assertions, because they can never be reliable enough to pick a pass/fail threshold without causing problems in one direction or the other.

    However, the grafana dashboard that you can get locally with ddev gander, and which we're hoping to set up for Drupal CMS, does include the timings: https://gander.tag1.io/explore?panes=%7B%22Tkt%22:%7B%22datasource%22:%2... (hopefully the link works)

    If not, going to https://gander.tag1.io/?orgId=1&refresh=30s and clicking on an 'Umami front page cold cache' will show the same trace.

So it's easy to see the timings, it's just not a good idea to assert on them. There's also a secondary issue: almost any database query on a nearly-empty database is going to be fast, so to flush out slow queries we'd need to test against a lot of content. I have thought about running EXPLAIN on every query and then failing if there's a filesort or similar, but haven't got anywhere near that yet.
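For what it's worth, that EXPLAIN idea might look roughly like the following - a hypothetical sketch, since nothing like it exists yet. `run_explain` is an injected callable standing in for a real database connection, returning MySQL-style EXPLAIN rows as dicts:

```python
# Hypothetical sketch: run EXPLAIN on every recorded query and flag any
# whose plan shows markers associated with slow execution. The markers are
# the ones MySQL reports in the EXPLAIN "Extra" column.

BAD_MARKERS = ("Using filesort", "Using temporary")

def find_slow_queries(queries, run_explain):
    """Return (query, marker) pairs for queries whose plan looks slow."""
    flagged = []
    for query in queries:
        for row in run_explain(query):
            extra = row.get("Extra") or ""
            for marker in BAD_MARKERS:
                if marker in extra:
                    flagged.append((query, marker))
    return flagged

# Fake EXPLAIN output standing in for a real database connection:
def fake_explain(query):
    if "ORDER BY" in query:
        return [{"Extra": "Using where; Using filesort"}]
    return [{"Extra": "Using index"}]

flagged = find_slow_queries(
    ["SELECT * FROM node ORDER BY created", "SELECT uid FROM users"],
    fake_explain,
)
```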

  • πŸ‡ΊπŸ‡ΈUnited States phenaproxima Massachusetts

    That's all pretty persuasive. Thanks for taking the time to explain it.

    Here's what I propose:

    • We make the performance tests a must-pass job.
    • The exact query counts become assertLessThanOrEqual() assertions instead. That makes it okay for there to suddenly be fewer queries, but you need to explicitly adjust the tests if more queries are added.
    • We put some lengthy and helpful comments in the test (with documentation links, if there are any) about how to alter the tests when needed, and what's appropriate and what's not. So at least contributors have some guidance.
  • πŸ‡¬πŸ‡§United Kingdom catch

    The exact query counts become assertLessThanOrEqual() assertions, but keep their current numbers. That makes it okay for there to suddenly be fewer queries, but you need to explicitly adjust the tests if more queries are added.

    This also causes problems unfortunately, however as I was writing up a long response I might have found a way to mitigate most of those problems:

    The first problem:

- Let's say you use assertLessThanOrEqual(60). The actual query count is 60.

    - One issue/contrib update removes a query, the assertion stays the same, the real query count goes down to 59, unknown to anyone.

    - Another issue/contrib update adds a query, now you're back to 60 - but this additional query is due to a bug. Again it's not recorded.

    - Another issue adds one more query, this one is necessary, but now you have to increase the query count to 61, but you didn't learn anything from the previous two issues.

The second problem: as discussed above, performance tests allow asserting on the exact queries being run. I didn't implement this in the Drupal CMS performance tests yet, mainly because πŸ“Œ Get coffee data only when the search box is opened Active made that incredibly noisy (it's now fixed, but not in a tagged release yet).

    However once you add assertions on actual queries, which is ideal for debugging/understanding changes, you can't assertLessThan() any more if you're asserting that two arrays are identical.

    However, mayyyybe instead of asserting the exact queries we could use an array_intersect()/array_diff() or something, so that if queries are removed from the expected queries it's fine, but not if they're added.

    This would solve the inability to assert on exact queries with some degree of fuzziness, it would make it easier for tests to take into account the canary job (can add a query without it failing on the non canary job), and it would prevent new queries from slipping through even if the overall count is way under.
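That array_intersect()/array_diff() idea could be sketched like this (Python for brevity, using a multiset difference where the PHP version would use array_diff()/array_count_values(); names and queries are illustrative): removed queries are tolerated, but any query not in the expected list fails the test.

```python
# Sketch of the "no new queries" assertion: a multiset difference of
# actual minus expected, so removals pass silently but additions fail.
from collections import Counter

def assert_no_new_queries(actual: list[str], expected: list[str]) -> None:
    new = Counter(actual) - Counter(expected)  # multiset difference
    assert not new, f"unexpected new queries: {sorted(new)}"

expected = ["SELECT a", "SELECT b", "SELECT c"]
assert_no_new_queries(["SELECT a", "SELECT c"], expected)   # removal: fine
# assert_no_new_queries(["SELECT a", "SELECT d"], expected) # would fail
```

Using Counter rather than a plain set also catches the case where an expected query starts running twice, which a set difference would miss.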

    The only thing that leaves then is that eventually there will be crufty queries in the assertions that never run, but it's easy to adjust the test locally to exact assertions, delete a load of stuff from the array/lower thresholds, then change the assertions back and push an MR to clean things up.

We can switch to assertLessThan() before πŸ“Œ Get coffee data only when the search box is opened Active is fully resolved in Drupal CMS, but sorting out the specific query assertions really needs it IMO, to avoid a lot of noise/cruft that immediately needs to be removed.
