Add PerformanceTestTrait::assertMetrics() so it is easier to write performance tests

Created on 27 September 2024, 3 months ago

Problem/Motivation

The existing performance tests include several assertions, like

    $this->assertSame(0, $performance_data->getQueryCount());
    $this->assertSame(1, $performance_data->getCacheGetCount());
    $this->assertSame(0, $performance_data->getCacheSetCount());
    // ...

This leads to a lot of cut-and-paste when creating new tests, and filling in the numeric values takes some work.

Another problem is that a failing test produces output like this:

Drupal\Tests\demo_umami\FunctionalJavascript\OpenTelemetryFrontPagePerformanceTest::testFrontPagePerformance
Failed asserting that 1 is identical to 3.

/var/www/html/core/profiles/demo_umami/tests/src/FunctionalJavascript/OpenTelemetryFrontPagePerformanceTest.php:51

and then you have to look at line 51 in the test class to figure out which metric changed.

Steps to reproduce

Proposed resolution

Add a helper method to the PerformanceTestTrait class. The order of the parameters is to be decided, but the function signature should be something like this:

  public function assertMetrics(
    PerformanceData $performance_data,
    ?int $stylesheet_count = NULL,
    ?int $script_count = NULL,
    ?int $stylesheet_bytes = NULL,
    ?int $script_bytes = NULL,
    ?int $query_count = NULL,
    ?int $cache_get_count = NULL,
    ?int $cache_set_count = NULL,
    ?int $cache_delete_count = NULL,
    ?int $cache_tag_checksum_count = NULL,
    ?int $cache_tag_is_valid_count = NULL,
    ?int $cache_tag_invalidation_count = NULL,
  ): void

This method would make an assertion for any non-null argument, and use assertEqualsWithDelta() or PerformanceTestTrait::assertCountBetween() for the byte counts.

Add the optional $message parameter to each assertion so that a failing test tells you which metric is out of range.
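
A minimal sketch of how the helper's body might look, assuming a ±50 delta for the byte counts and illustrative $message strings (neither is decided yet):

  public function assertMetrics(
    PerformanceData $performance_data,
    ?int $stylesheet_count = NULL,
    ?int $stylesheet_bytes = NULL,
    ?int $query_count = NULL,
    // ... remaining nullable parameters elided ...
  ): void {
    // Assert only the metrics the caller supplied.
    if ($stylesheet_count !== NULL) {
      $this->assertSame($stylesheet_count, $performance_data->getStylesheetCount(), 'Stylesheet count');
    }
    if ($stylesheet_bytes !== NULL) {
      // A tolerance lets small CSS changes land without breaking the test.
      $this->assertEqualsWithDelta($stylesheet_bytes, $performance_data->getStylesheetBytes(), 50, 'Stylesheet bytes');
    }
    if ($query_count !== NULL) {
      $this->assertSame($query_count, $performance_data->getQueryCount(), 'Query count');
    }
  }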

Remaining tasks

User interface changes

None

Introduced terminology

None

API changes

Add the public method assertMetrics() to Drupal\Tests\PerformanceTestTrait.

Data model changes

None

Release notes snippet

N/A

📌 Task
Status

Active

Version

11.0 🔥

Component

phpunit

Created by

🇺🇸 United States benjifisher Boston area


Merge Requests

Comments & Activities

  • Issue created by @benjifisher
  • 🇺🇸 United States benjifisher Boston area

    The order of the parameters matters. The metrics that are most stable should come first, because some tests will want to check those but not the less stable metrics. See the discussion on 📌 Add assertions to OpenTelemetryFrontPagePerformanceTest::testFrontPageColdCache() Active.

  • 🇺🇸 United States benjifisher Boston area
  • 🇳🇿 New Zealand quietone
  • 🇧🇾 Belarus pavel.bulat

    pavel.bulat → made their first commit to this issue's fork.

  • 🇫🇷 France goz

    I like the idea, but we should also handle range assertions for cases where we expect some variation in the counts.
    I suggest allowing null, int, and array, so we can pass a minimum/maximum range and then call assertCountBetween().

    protected function assertMetrics(
        PerformanceData $performance_data,
        int|array|null $stylesheet_count = NULL,
        int|array|null $script_count = NULL,
        int|array|null $stylesheet_bytes = NULL,
        int|array|null $script_bytes = NULL,
        int|array|null $query_count = NULL,
        int|array|null $cache_get_count = NULL,
        int|array|null $cache_set_count = NULL,
        int|array|null $cache_delete_count = NULL,
        int|array|null $cache_tag_checksum_count = NULL,
        int|array|null $cache_tag_is_valid_count = NULL,
        int|array|null $cache_tag_invalidation_count = NULL,
      ): void {
    

    Here is the example call for NodeAddTestPerformance 📌 Add performance test coverage for the node/add page Active

    $this->assertMetrics($performance_data, 1, 1, [30500, 30500], [212000, 213000], 124, 227, 190, 1, [30, 127], [30, 39], 0);
    

    To help find the expected values when we write tests, I also added a getMetrics() method which returns an array with all metrics.
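
    A possible shape for that helper, reusing the existing PerformanceData getters (the key names are illustrative):

      protected function getMetrics(PerformanceData $performance_data): array {
        return [
          'stylesheet_count' => $performance_data->getStylesheetCount(),
          'script_count' => $performance_data->getScriptCount(),
          'stylesheet_bytes' => $performance_data->getStylesheetBytes(),
          'script_bytes' => $performance_data->getScriptBytes(),
          'query_count' => $performance_data->getQueryCount(),
          'cache_get_count' => $performance_data->getCacheGetCount(),
          'cache_set_count' => $performance_data->getCacheSetCount(),
          // ... and so on for the remaining cache and cache tag metrics.
        ];
      }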

  • Pipeline finished with Success
    3 months ago
    Total: 739s
    #294520
  • 🇫🇷 France goz
  • 🇺🇸 United States benjifisher Boston area

    @goz:

    Is it always the same metrics that need ranges? If so, then I think it would be simpler to pass, say stylesheet_bytes: 30550 and have assertMetrics() automatically use a range:

              $this->assertCountBetween($stylesheet_bytes - 50, $stylesheet_bytes + 50, $performance_data->getStylesheetBytes(), "Asserting $stylesheet_bytes");
    

    If different tests need different ranges, then we need something more flexible, like your proposal. But if we end up always using +50/-50, and always for the same metrics (byte counts, cache tag counts) then we should keep it simple.

    If we are going to use an array for a range, then I think we should use a base value and a delta, like [30550, 50], and let assertMetrics() convert that into the range [30500, 30600]. This will make it easier to write the tests (Let the computer do the arithmetic) and it will be clear what actual values we found while we were writing the test.
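
    In code, the normalization could be as simple as this sketch (hypothetical; $expected holds the [base, delta] pair):

      // [30550, 50] means 30550 ± 50, i.e. the range [30500, 30600].
      [$base, $delta] = $expected;
      $this->assertCountBetween($base - $delta, $base + $delta, $performance_data->getStylesheetBytes(), "Asserting $base ± $delta");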

    Again, I think we should agree on the order of parameters. (See Comment #2.)

  • 🇫🇷 France goz

    @benjifisher

    Is it always the same metrics that need ranges? If so, then I think it would be simpler to pass, say stylesheet_bytes: 30550 and have assertMetrics() automatically use a range:

    No, it's not: in some cases, like 📌 Add assertions to OpenTelemetryFrontPagePerformanceTest::testFrontPageColdCache() Active, we also have variations in the cache counts.

    If different tests need different ranges, then we need something more flexible, like your proposal. But if we end up always using +50/-50, and always for the same metrics (byte counts, cache tag counts) then we should keep it simple.

    Agreed, except the range is not always the same, depending on the complexity of the loaded page, I guess. That's why I chose to define the range manually.

    If we are going to use an array for a range, then I think we should use a base value and a delta, like [30550, 50], and let assertMetrics() convert that into the range [30500, 30600]. This will make it easier to write the tests (Let the computer do the arithmetic) and it will be clear what actual values we found while we were writing the test.

    You will still have to do arithmetic to find the base and the delta. If the first test gives 120000, the second 130000, and the third 180376, we still have to figure out which base value to use and which delta. Anyway, in most cases the first count we get plus a delta of 50 or 100 should be enough.

    In this case, though, assertEqualsWithDelta() seems more appropriate than assertCountBetween().

    Again, I think we should agree on the order of parameters. (See Comment #2.)

    That depends on the relevance of those metrics, and I don't know which is more relevant than another. In any case, if we always need them all, the order is not a big deal. If I had to choose, maybe the query count is the one I would put last.

  • 🇧🇾 Belarus pavel.bulat

    Hi @goz, @benjifisher.

    Again, I think we should agree on the order of parameters. (See Comment #2.)

    Maybe let's use a "$metrics" array as a parameter instead?

       $expected_metrics = [
          'stylesheet_count' => 2,
          'script_count' => 1,
          'stylesheet_bytes' => 40150,
          'script_bytes' => 7250,
          'query_count' => 0,
          'cache_get_count' => 1,
          'cache_set_count' => 0,
          'cache_delete_count' => 0,
          'cache_tag_checksum_count' => 0,
          'cache_tag_is_valid_count' => 1,
          'cache_tag_invalidation_count' => 0,
        ];
        $this->assertMetrics($performance_data, $expected_metrics);
    

    I assume the number of metrics will keep growing, so using them as parameters won't be optimal. Also, I think it is more readable, since you can immediately understand which number to change in case of a failed test.

    Just summarising the options that we have:

    1. parameters
      cons:
      - the number of them may grow
      - it is harder to read the numbers if the name of each parameter is not highlighted by the IDE at the test level
      - you need to pass all the preceding parameters to assertMetrics() to be able to check a later one. When a new important metric is introduced, it will be hard to re-order the parameters, since a lot of tests will already be using assertMetrics(), so new parameters always have to go at the end.
      pros:
      - each parameter has a type
      - can be highlighted by the IDE
    2. array
      cons:
      - it will probably also be copy-pasted from other tests, since it takes a bit more effort to create. Potentially we can add an example of the array to the description of the assertMetrics() helper.
      - the values are not typed
      pros:
      - more readable at the test level, so easier to maintain
      - no problems when new metrics are added

    I will be happy to hear your thoughts. Thanks in advance.

  • 🇬🇧 United Kingdom catch

    In general the range for stylesheets/scripts is by design to give contributors leeway to make small changes without breaking the tests.

    In most cases, needing ranges for cache/query counts points to a resolvable issue in the test coverage, although we might sometimes want to commit them temporarily.

    Is it worth thinking about passing in two $performance_data objects instead of multiple args or an array? We could add extra things like acceptable ranges in their own properties.

  • 🇫🇷 France goz

    Is it worth thinking about passing in two $performance_data objects instead of multiple args or an array? We could add extra things like acceptable ranges in their own properties.

    @catch, I'm not sure I understand what you mean here. Here is what I think you mean:
    For each metric in the performance data object definition, we hard-code whether that metric allows a range (or not). For example, stylesheet bytes have an acceptable range of 50, so if you assert 1000, it will by default allow values between 950 and 1050.
    But for the cache query count, we do not want a range, so if the range configuration is set to 0, the value has to match exactly.
    Of course, this range configuration can be overridden if a test needs it.
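
    One way to express those hard-coded defaults (the constant name and values here are assumptions for illustration):

      // Default tolerance per metric; 0 means the value must match exactly.
      // A test could override these when it needs a wider range.
      protected const DEFAULT_METRIC_RANGES = [
        'StylesheetBytes' => 50,
        'ScriptBytes' => 50,
        'QueryCount' => 0,
        'CacheGetCount' => 0,
      ];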

    @pavel.bulat, the point of an assertMetrics() with all the parameters was to make performance tests faster to write. I agree that it is hard to know what each parameter is for without a good IDE.
    However, building an array of metric names/values is equivalent to asserting each value, except that you have to supply the exact key, which is not auto-completed, even by an IDE.

    If we want a clean message and not only the breaking line, we can prepare the test with trait methods like expectMetricsForQueryCount(10), expectMetricsForCacheGetCount(0), (...) and then call assertMetrics($performance_data), which will run the assertions based on the "range properties" defined. That allows adding other metrics later.

    In any case, I don't see any solution that helps developers write tests AND keeps the tests readable AND avoids copy/paste.

  • 🇬🇧 United Kingdom catch

    @catch, I'm not sure I understand what you mean here. Here is what I think you mean:
    For each metric in the performance data object definition, we hard-code whether that metric allows a range (or not). For example, stylesheet bytes have an acceptable range of 50, so if you assert 1000, it will by default allow values between 950 and 1050.
    But for the cache query count, we do not want a range, so if the range configuration is set to 0, the value has to match exactly.
    Of course, this range configuration can be overridden if a test needs it.

    Yes, it's not very well developed, but I was thinking of something like that.

    Then $this->assertMetrics($performance_data, $expected_performance_data)

  • 🇫🇷 France goz

    #14 sounds great. I'll open a new branch to work on it.

  • Pipeline finished with Failed
    3 months ago
    Total: 144s
    #296241
  • Pipeline finished with Success
    3 months ago
    Total: 1978s
    #296242
  • 🇫🇷 France goz

    This does not reduce the number of lines needed to write tests, but it makes the metric assertions easier to read.

    Here are the changes:

    • Add an ExpectedPerformanceData object that inherits from PerformanceData.
    • Add an optional range parameter when setting expected data. By default, the byte metrics are configured with a range of 50.
    • If a value is not set in the ExpectedPerformanceData object, it will not be asserted.
    • To skip asserting queries, use $expected_performance_data->disableQueries().

    If new metrics are added, we will have to add them to PerformanceData and extend the setters in ExpectedPerformanceData to accept the range parameter, plus add the range getter and property.
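
    A hypothetical usage sketch of that API (the setter signatures are assumptions based on the description above):

      $expected = new ExpectedPerformanceData();
      // Byte metrics default to a range of 50; pass a range to override it.
      $expected->setStylesheetBytes(30550);
      $expected->setScriptBytes(212000, 500);
      // Count metrics must match exactly unless a range is given.
      $expected->setCacheGetCount(227);
      // Skip the query count assertion entirely.
      $expected->disableQueries();
      $this->assertMetrics($performance_data, $expected);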

  • 🇺🇸 United States benjifisher Boston area

    I like @pavel.bulat's suggestion (Comment #11), with a few modifications (see below).

    One thing I do not like about the proposed ExpectedPerformanceData class is that it is an extra level of complication added to the integer values we expect, which can simply be specified in an array. If we do have an ExpectedPerformanceData class, then it should be initialized with an array of values in the constructor … but then why not just work with the array of values?

    Another thing I do not like is that using the ExpectedPerformanceData class forces us to create integer-valued class variables like $stylesheetCount and $queryCount. That is an implementation detail that we should be able to change. The current implementation has both of those, but we might decide to implement getQueryCount() as count($this->queries). We might also decide to keep the full array of stylesheets and implement getStylesheetCount() the same way.

    Similar objection: I do not like being able to set $this->queryCount so that it is not the same as count($this->queries).

    The modifications I suggest are

    1. Use the public methods of the PerformanceData class, without the "get", as keys.
    2. Follow the usual pattern of assertions: the first parameter is the expected value, and the second parameter is the actual value.

    For example:

    $expected = [
      'StylesheetCount' => 1,
      'StylesheetBytes' => 4913,
      // ...
    ];
    $this->assertMetrics($expected, $performance_data);
    

    The assertMetrics() method should be responsible for setting ranges when appropriate. In the example above, it would create these assertions:

    $this->assertSame(1, $performance_data->getStylesheetCount(), 'helpful message');
    $this->assertCountBetween(4913 - 50, 4913 + 50, $performance_data->getStylesheetBytes(), 'another helpful message');
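
    Since the array keys mirror the getter names, assertMetrics() could dispatch dynamically. A sketch, assuming only the byte metrics get the ±50 tolerance:

      foreach ($expected as $key => $value) {
        $actual = $performance_data->{'get' . $key}();
        if (str_ends_with($key, 'Bytes')) {
          // Allow small CSS/JS changes without breaking the test.
          $this->assertCountBetween($value - 50, $value + 50, $actual, "Metric $key");
        }
        else {
          $this->assertSame($value, $actual, "Metric $key");
        }
      }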
    
  • 🇫🇷 France goz

    The assertMetrics() method should be responsible for setting ranges when appropriate. In the example above, it would create these assertions

    What do you mean by "when appropriate"?
    Even if we define that stylesheetsCount has to be in a -50/+50 range, how do you deal with tests which need another range?

    Why I like the array + assertMetrics(): just one array to define plus one assertMetrics() call.
    Why I dislike it: you need to copy/paste from other tests or look up the key strings, whereas an object would provide auto-completion in IDEs. And how do we deal with range expectations in the array?

  • 🇺🇸 United States benjifisher Boston area

    Even if we define that stylesheetsCount has to be in a -50/+50 range, how do you deal with tests which need another range?

    If a "count" metric is different from one test to the next, then the test should not assert it. If we pick a range that fits the observed variation, we cannot be confident that the test will continue to pass, and if the test fails then we will not be sure that anything is wrong.

    If there are occasional exceptions, then we can use individual calls to assertCountBetween(). We should probably have custom $message parameters in such cases, with a code comment explaining the reason for the variation.
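
    For example (hypothetical values and message):

      // This count varies between runs; assert a range instead of an exact value.
      $this->assertCountBetween(30, 40, $performance_data->getCacheTagIsValidCount(), 'Cache tag is-valid count');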

    The purpose of the range on the "byte" metrics is so that we can make small changes to CSS and JS files without constantly updating the tests. If we make a big change to one of those tests, or if there is an accumulation of small changes, then we will have to update the tests.

    See the discussion on the parent issue (#3476416) about replacing assertLessThan() with assertCountBetween(). Also see the good work that @pfrenssen did on the sibling issue 📌 Add assertions to OpenTelemetryFrontPagePerformanceTest::testFrontPageColdCache() Active to get stable "count" metrics.

    @catch made some of the same points in the first two sentences of Comment #12, but less explicitly.

  • 🇬🇧 United Kingdom catch

    If there are occasional exceptions, then we can use individual calls to assertCountBetween().

    Yes I think this is fine.

    The ranges are only there to allow a certain amount of refactoring/bugfixing to take place without having to update the numbers, but not so much that we miss significant size reductions or increases. There should never be variation unless the actual CSS or JavaScript changes, and we can always tweak the global range settings if they turn out to be too strict or too lax.

  • The Needs Review Queue Bot → tested this issue. It fails the Drupal core commit checks. Therefore, this issue status is now "Needs work".

    This does not mean that the patch necessarily needs to be re-rolled or the MR rebased. Read the Issue Summary, the issue tags and the latest discussion here to determine what needs to be done.

    Consult the Drupal Contributor Guide → to find step-by-step guides for working with issues.

  • Pipeline finished with Failed
    2 months ago
    Total: 203s
    #311561
  • Pipeline finished with Success
    2 months ago
    Total: 1321s
    #311571
  • 🇫🇷 France goz

    The latest MR implements the approach discussed in #20 and #21.

    An implementation example:

        $expected = [
          'QueryCount' => 124,
          'CacheGetCount' => 227,
          'CacheSetCount' => 190,
          'CacheDeleteCount' => 1,
          'CacheTagIsValidCount' => 37,
          'CacheTagInvalidationCount' => 0,
          'ScriptCount' => 1,
          'ScriptBytes' => 212313,
          'StylesheetCount' => 1,
          'StylesheetBytes' => 29911,
        ];
        $this->assertMetrics($expected, $performance_data);
    
  • 🇺🇸 United States smustgrave

    This issue has 3 open MRs, so it is not clear which one should be reviewed.

    2 of them should be closed or hidden.

  • 🇫🇷 France goz

    goz → changed the visibility of the branch 3477191-add-assertmetrics-expecteddata to hidden.

  • 🇫🇷 France goz

    goz → changed the visibility of the branch 3477191-add-performancetesttraitassertmetrics-so to hidden.

  • 🇫🇷 France goz

    The first two branches have been hidden.

  • 🇺🇸 United States smustgrave

    Wow, sorry, I closed those MRs and meant to review but got sidetracked.

    Closed the 2 hidden MRs and verified there are no open threads on them.
    The remaining MR appears to line up with the issue summary.

    Assuming there are existing performance tests that need to be updated to use this new approach, could at least one be converted and a follow-up opened for the others?

    Rest LGTM.

  • 🇫🇷 France goz

    I have updated the previous asserts to replace them with the new assertMetrics().

    The exception is core/profiles/demo_umami/tests/src/FunctionalJavascript/AssetAggregationAcrossPagesTest::testFrontAndRecipesPagesAuthenticated(), where I kept assertLessThan() for ScriptBytes because the value changes on each call, and fixing that is not the purpose of this issue:
    $this->assertLessThan(250000, $performance_data->getScriptBytes());

  • Pipeline finished with Failed
    about 1 month ago
    Total: 626s
    #339557
  • 🇫🇷 France goz

    Tests fail, but nothing related to the current changes.

    In any case, the branch had to be rebased.

    Metric values have been updated after the rebase.

  • Pipeline finished with Success
    about 1 month ago
    Total: 570s
    #339854
  • 🇫🇷 France goz

    The branch has been rebased and the tests are green.
