Allow assertions on the number of database queries run during tests

Created on 7 April 2023, over 1 year ago
Updated 6 December 2023, about 1 year ago

Problem/Motivation

Once Add PerformanceTestBase for allowing browser performance assertions within FunctionalJavaScriptTests (Fixed) lands, it will be feasible to add database query logging to PerformanceTestTrait. This would allow us to count the number of database queries on certain pages (with hot and cold caches, etc.), add assertions for them, tighten the tests when the number of queries drops, and be warned if it is going to increase (which might be fine, or might be a regression).

Steps to reproduce

Proposed resolution

Add a performance_test module to core.

Enable this module in PerformanceTestBase.

Add a query logging implementation using the database API's new query events system.

In a post response task, designed to run last, write the logged query events to a key/value entry.

In PerformanceTestTrait, check the key value and add the query count to the PerformanceData value object.

Queries are collected incrementally until they're checked - this means that if a page causes asset, image style, or AJAX controllers to be requested too, the queries from those will be added to the log.
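
For illustration, a minimal sketch of what the logging side could look like (class, method, and key/value collection names here are illustrative rather than the exact ones in the MR; the event class comes from the database events issue referenced below, and statement events also need to be enabled on the database connection): a subscriber to the statement execution events counts queries and flushes the total to key/value in a post-response (destructable) task.

    <?php

    namespace Drupal\performance_test;

    use Drupal\Core\Database\Event\StatementExecutionEndEvent;
    use Drupal\Core\DestructableInterface;
    use Drupal\Core\KeyValueStore\KeyValueFactoryInterface;
    use Symfony\Component\EventDispatcher\EventSubscriberInterface;

    /**
     * Counts database queries and persists the count after the response.
     *
     * Sketch only: the real collector also records query strings and
     * arguments so the data can be sent to OpenTelemetry later.
     */
    class QueryCountCollector implements EventSubscriberInterface, DestructableInterface {

      protected int $queryCount = 0;

      public function __construct(protected KeyValueFactoryInterface $keyValue) {}

      public static function getSubscribedEvents(): array {
        return [StatementExecutionEndEvent::class => 'onStatementExecutionEnd'];
      }

      public function onStatementExecutionEnd(StatementExecutionEndEvent $event): void {
        $this->queryCount++;
      }

      /**
       * Runs late as a post-response task (service tagged 'needs_destruction').
       */
      public function destruct(): void {
        // Accumulate rather than overwrite, so queries from asset, image style
        // and AJAX requests triggered by the same page are counted too.
        $store = $this->keyValue->get('performance_test');
        $store->set('query_count', $store->get('query_count', 0) + $this->queryCount);
      }

    }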

The number of queries is not 100% deterministic in the current test coverage; this may be due to interactions between things like cache collectors, which can vary with the order in which HTTP requests finish on the tested site. We can add greater-than and less-than assertions to start with and potentially address this later. Asserting on a range also still lets us adjust that range if core drops below or rises above it due to a change.
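
Once that's in place, a test built on PerformanceTestTrait should be able to assert on the count roughly like this (a sketch only: the counts are placeholders, and the getter name follows the PerformanceData addition described above):

    $performance_data = $this->collectPerformanceData(function () {
      $this->drupalGet('node');
    });
    $query_count = $performance_data->getQueryCount();
    // Counts vary slightly between runs for now, so assert a range rather
    // than an exact number; tighten once the variation is understood.
    $this->assertGreaterThanOrEqual(50, $query_count);
    $this->assertLessThanOrEqual(55, $query_count);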

Remaining tasks

This is already being used to verify 📌 Make POST requests render cacheable (Needs work), and it found a missing change in that issue.

Additionally, while working on the two issues I think I've found a caching bug/shortcoming in Views: 📌 Views adds cache contexts inconsistently (Active).

Once this is committed, we can build on it (probably without any significant changes, just additions) to add database queries to OpenTelemetry traces too: 📌 Add database query spans to otel traces (Needs review). They'll then be browsable at http://grafana.tag1demo.tag1.io/d/teMVIdjVz/umami?orgId=1&refresh=30s&fr...

User interface changes

API changes

Data model changes

Release notes snippet

📌 Task
Status: Fixed

Version: 10.2

Component: PHPUnit


Created by: 🇬🇧United Kingdom catch

Tags: Performance (It affects performance. It is often combined with the Needs profiling tag.)


Merge Requests

Comments & Activities


  • Issue created by @catch
  • 🇬🇧United Kingdom catch
  • 🇬🇧United Kingdom catch

    We can probably use the brand new API from Allow the database query log to be dispatched as log events Fixed to collect the queries. Bit of a question on how best to get that data into phpunit from the tested site, also whether we only implement this for functional tests or also kernel tests.

  • Merge request !5268: Add query events to performance testing → (Open) created by catch
  • Status changed to Needs work about 1 year ago
  • Status changed to Needs review about 1 year ago
  • 🇬🇧United Kingdom catch

    Couple of things to figure out, but the general concept is here.

    1. Need to see whether the full backtrace was intentional in the database logger; if it was, add a workaround here.

    2. There are always variations of a handful of queries within these tests, so I've had to use >=/<= comparisons for the assertions. Will open a follow-up to investigate exactly why. Already opened 🐛 Umami content is all created in the same second (Active), but the same problem occurs with the standard profile, so it's not only that.

    3. We're currently writing more to the log than we need just to get a count, but doing so enables sending data to OpenTelemetry later. I will probably do that in a separate issue/MR to see how much work is involved, and maybe move it here if it turns out to be easy.

  • Status changed to Needs work about 1 year ago
  • 🇮🇹Italy mondrake 🇮🇹

    One inline comment in the MR.

  • Status changed to Needs review about 1 year ago
  • 🇬🇧United Kingdom catch

    @mondrake I think if we do that, it should be its own issue since it's making changes to the actual database event API then (even if smaller ones), the question then becomes should it be a blocker to this issue (making this PP-1) or a follow-up to make things more flexible after this lands?

  • 🇮🇹Italy mondrake 🇮🇹

    #9 I certainly would not be blocking this for that; followup (if wanted) is OK. Rather fringe anyway.

  • 🇬🇧United Kingdom catch

    Updated the issue summary.

  • 🇬🇧United Kingdom catch

    This can be added to the existing CR, so tagging for needs CR updates: https://www.drupal.org/node/3366904

  • Status changed to RTBC about 1 year ago
  • 🇩🇪Germany Fabianx

    RTBC - Code looks good to me, fixes an important oversight from the last MR related to this and overall enhances the performance testing capabilities.

    The only caveat is that only pages loaded via the Mink driver can be performance tested in this way right now, but that is by design, hence the perfect scope, and fine for this issue.

  • 🇺🇸United States xjm
  • Status changed to Needs work about 1 year ago
  • 🇺🇸United States xjm

    Nifty. Bumping to major -- this is an important gap in our test coverage currently and historically.

    Posted some CS suggestions on the MR. Additionally, I think the code changes should document both the `@todo` and the "why" behind the specific ranges of query counts used. Seems very arbitrary ATM.

    I think the CR updates are also outstanding? The relevant CR(s) should also be attached to this issue.

    Thanks!

  • 🇬🇧United Kingdom catch

    @xjm we can't update already-published CRs until issues are committed, otherwise the CRs would be misleading. Or that's always been my understanding of 'needs change record updates'.

    I applied the suggestions.

    Will add the @todo and the 'why' comments next time I'm properly looking at this. I don't like the 'why' very much, but it's the reality of HEAD and the reason we need stuff like this in the first place.

  • Status changed to Needs review about 1 year ago
  • 🇬🇧United Kingdom catch

    Added the additional docs/todos and opened 📌 Investigate variable database query counts in performance tests Active .

  • Status changed to RTBC about 1 year ago
  • 🇩🇪Germany Fabianx

    Back to RTBC!

  • 🇬🇧United Kingdom catch

    Added the CR to this issue, and since it's only a one line addition to the code examples, went ahead and updated it now.

  • Status changed to Needs work about 1 year ago
  • 🇸🇰Slovakia poker10

    Seems like the last pipeline job (https://git.drupalcode.org/project/drupal/-/jobs/359970) failed with:

        There was 1 failure:
        
        1)
        Drupal\Tests\standard\FunctionalJavascript\StandardPerformanceTest::testNoJavaScript
        Failed asserting that 508 is equal to 507 or is less than 507.
    

    We probably need to increase the max. number of expected queries in StandardPerformanceTest::testNoJavaScript().

    Otherwise I think the MR looks good!

  • Status changed to RTBC about 1 year ago
  • 🇬🇧United Kingdom catch

    Oof yeah that is the drawback of the current state, but the follow-up to this in 📌 Add database query spans to otel traces Needs review should help us diagnose what the differences are since you'll be able to see which queries ran between different requests in the traces.

    Since this is a one character change, moving back to RTBC. Having to specify the range means we can end up with random failures once this is committed too - but we can always easily fix them by adjusting the range up or down. And I hope we can make things a lot more predictable in follow-ups with the tests or whatever's causing the variation in core too.

  • 🇬🇧United Kingdom catch

    Now have a working implementation on the follow-up to this at 📌 Add database query spans to otel traces Needs review - only a dozen or so additional lines of code to get devel-esque query reports into the tempo traces (see screenshot in the issue if that sentence doesn't make much sense).

  • Status changed to Needs work about 1 year ago
  • 🇬🇧United Kingdom alexpott 🇪🇺🌍

    Discussed with @catch. We had a think about asserting on ranges of query numbers and think this can maybe be improved by separating cache queries from other queries. Then we can assert against a fixed count for other queries and a range for cache queries.

  • 🇬🇧United Kingdom catch

    catch changed the visibility of the branch 3352851-db-query-assertions to hidden.

  • 🇫🇷France andypost

    There's a common approach to collecting these numbers: https://symfony.com/doc/current/profiler.html

    Moreover, it's used by the webprofiler contrib module; the implementation that collects queries is DatabaseDataCollector.php.

    An extra dev dependency could be added even in a minor release.

  • 🇬🇧United Kingdom catch

    I've pushed a commit that does #24, but working on that made me realise that with only a small bit of extra work, we can cover all of 📌 Separate cache operations from database queries in OpenTelemetry and assertions (Fixed). We'll eventually want to do something similar when running Redis as the cache backend, but that will only help custom performance tests on real sites and won't be made harder by splitting the queries like this; it might even be easier, since we could probably swap out the logic while keeping the API the same.

    So leaving at needs work a bit more to try that out.

    @andypost the way that webprofiler collects the queries and the way that the performance_test module added here collects the queries are the same - it relies on the database statement events (although it has a nifty way to enable events on all connections which is a good idea we should borrow).

    The difference is that it's using the Symfony profiler to store and render the events, as you point out. We could re-use the storage, but that would result in more code rather than less.

    Rendering the profiles won't help us much - I guess we could store them as artifacts, but that doesn't give you cross branch/MR/time comparisons in the same way that the grafana stack does.

    So I'd need a bit of convincing that it'd actually be a net benefit.

  • 🇬🇧United Kingdom catch

    Discussed this with @alexpott in slack.

    The ranges on the database queries mean it's going to be hard to show regressions in the current state. While I think we could have deferred that to a follow-up, we had an idea to separate cache queries and other database queries from each other, since about 98% of the variation is from cache collector writes. This lets us assert on smaller ranges everywhere, in some cases using assertSame(), although I haven't switched from ranges to assertSame() for those cases yet in case results are different on GitLab.

    This pre-empts 📌 Separate cache operations from database queries in OpenTelemetry and assertions Fixed a bit, but we can keep the same API for assertions if we do that and just change how the data is collected (i.e. via a cache backend decorator, so it works for memcache and redis too). The basic logic of filtering out cache queries would need to stay if we do that, so that we don't duplicate between the two and regress the more specific assertions here.

    I've added the is_a() check, slightly awkward due to the double backslashes in the backtrace output.
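
    Roughly, the filter is along these lines (simplified, with illustrative names rather than the exact code in the MR):

        use Drupal\Core\Cache\CacheBackendInterface;

        /**
         * Decides whether a logged query came from a cache backend.
         *
         * Illustrative sketch: the real check works from the caller class
         * recorded in the logged backtrace for each statement event.
         */
        function is_cache_query(string $caller_class): bool {
          // The stored backtrace output contains escaped backslashes, so
          // normalise them before the class check.
          $caller_class = str_replace('\\\\', '\\', $caller_class);
          return is_a($caller_class, CacheBackendInterface::class, TRUE);
        }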

  • Status changed to Needs review about 1 year ago
  • 🇬🇧United Kingdom catch

    I think I've got things as precise as I can for the initial pass:

    - now more aggressively checks that nothing from previous requests is finishing off before it starts collecting data

    - added some pre-warming drupalGet() calls to try to stabilise the site a bit more before making requests. This means we're testing slightly different things (i.e. not a completely cold cache), but that's fine.

    - switched to assertSame() for the assertions that seem to be actually predictable now, which makes it easier to see which still have to assert a range.

  • Status changed to RTBC about 1 year ago
  • 🇩🇪Germany Fabianx

    Back to RTBC!

    One thing is that we should open a follow-up issue to generally track cache events in the caching layer and not in the database layer.

    As the performance data is neutral to where it comes from, this is straightforward (rough sketch after the list):

    - Collect cache events
    - Subscribe to it
    - Add to performance data
    - Ignore DB cache things instead
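
    Sketching it very roughly (a hypothetical decorator, not code from this MR), with every other CacheBackendInterface method delegating the same way:

        use Drupal\Core\Cache\CacheBackendInterface;

        /**
         * Hypothetical decorator counting cache operations for any backend.
         */
        class CacheOperationCountingBackend {

          public int $gets = 0;
          public int $sets = 0;

          public function __construct(protected CacheBackendInterface $decorated) {}

          public function get($cid, $allow_invalid = FALSE) {
            $this->gets++;
            return $this->decorated->get($cid, $allow_invalid);
          }

          public function set($cid, $data, $expire = CacheBackendInterface::CACHE_PERMANENT, array $tags = []) {
            $this->sets++;
            $this->decorated->set($cid, $data, $expire, $tags);
          }

          // The remaining CacheBackendInterface methods delegate in the same
          // way, incrementing a matching counter before calling the inner
          // backend.

        }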

  • 🇬🇧United Kingdom catch

    That issue is here: 📌 Separate cache operations from database queries in OpenTelemetry and assertions Fixed .

    Once we have that (we don't have cache events, but we can decorate the services to get this), it'll be a case of completely skipping the database queries to avoid duplication, which just means we can simplify the cache special-casing added here. I was originally worried about duplication between the two things, which was the reason for tackling this first, but this issue has answered how to handle that, so it should be easy enough.

  • 🇬🇧United Kingdom catch

    Now using this on 📌 Make POST requests render cacheable Needs work and it's successfully picking up additional cache gets after submitting the login form (also additional database queries, at least one of which I've got a theory about but need to investigate more).

  • Status changed to Fixed about 1 year ago
  • 🇬🇧United Kingdom alexpott 🇪🇺🌍

    Committed and pushed b1a8736022 to 11.x and fdf0228fb6 to 10.2.x. Thanks!

    Backported to 10.2.x as this is a test-only change and could prove useful to contrib and custom testing in 10.2.x.

    • alexpott committed fdf0228f on 10.2.x
      Issue #3352851 by catch, Fabianx, mondrake, xjm, alexpott: Allow...
    • alexpott committed b1a87360 on 11.x
      Issue #3352851 by catch, Fabianx, mondrake, xjm, alexpott: Allow...
  • Automatically closed - issue fixed for 2 weeks with no activity.
