- Issue created by @catch
- Status changed to Needs work
over 1 year ago 6:37am 12 October 2023 - 🇬🇧United Kingdom catch
Unpostponing, initial hosted otel/grafana stack is up. Going to try to get a working job first, then add the actual OTEL_COLLECTOR bit.
Starting with a manual job to test the actual gitlab integration, but will then need to figure out the scheduling bit - we might need separate env vars for daily vs hourly schedules for this I think which should be doable by adding custom variables in the schedule UI.
- last update
over 1 year ago 30,393 pass - @catch opened merge request.
- last update
over 1 year ago 30,393 pass - last update
over 1 year ago 30,393 pass - last update
over 1 year ago 30,393 pass - last update
over 1 year ago 30,394 pass - last update
over 1 year ago 30,394 pass - last update
over 1 year ago 30,394 pass - last update
over 1 year ago 30,394 pass - 🇬🇧United Kingdom catch
I think this is working, but lost access to the grafana instance overnight so waiting for people to wake up...
This is only so far a proof of concept to populate the graph vs. manual runs, various @todos:
1. We should run this only on one database to start with, probably MySQL 8. Later on we could add database type to traces and add it as a selector in the dashboard similar to version. So need to hard-code that somehow, possibly it should live in the parent job/gitlab-ci.yml
2. We need an hourly scheduled pipeline, that only runs this job, this means adding custom variables to the daily pipelines so that daily jobs can add checks for those in a rules: section to not also run hourly.
3. The OTEL_COLLECTOR endpoint definition can go in a custom variable in the scheduled job, and then we can look for that and export it as an environment variable.
- last update
over 1 year ago 30,394 pass - @catch opened merge request.
- last update
over 1 year ago Custom Commands Failed - last update
over 1 year ago Custom Commands Failed - last update
over 1 year ago Custom Commands Failed - last update
over 1 year ago Custom Commands Failed - last update
over 1 year ago Custom Commands Failed - last update
over 1 year ago Custom Commands Failed - last update
over 1 year ago Custom Commands Failed - last update
over 1 year ago Custom Commands Failed - last update
over 1 year ago Custom Commands Failed - last update
over 1 year ago Custom Commands Failed - last update
over 1 year ago Custom Commands Failed - last update
over 1 year ago 30,394 pass - last update
over 1 year ago 30,394 pass - last update
over 1 year ago 30,394 pass - last update
over 1 year ago 30,394 pass - last update
over 1 year ago 30,394 pass - last update
over 1 year ago 30,394 pass - last update
over 1 year ago 30,394 pass - last update
over 1 year ago 30,390 pass, 2 fail - last update
over 1 year ago 30,394 pass - last update
over 1 year ago 30,394 pass - last update
over 1 year ago 30,394 pass - last update
over 1 year ago 30,394 pass - last update
over 1 year ago 30,394 pass - last update
over 1 year ago 30,394 pass - last update
over 1 year ago 30,394 pass - Status changed to Needs review
over 1 year ago 3:44pm 12 October 2023 - 🇬🇧United Kingdom catch
Ready for review more or less.
- export OTEL_COLLECTOR=http://otel-collector.tag1demo.tag1.io/v1/traces
This should be set in the gitlab schedule, but for that to work, we need 1. a schedule and 2. for this to be committed, so a bit chicken-and-egg.
- if: $_TARGET_DB == "mysql-8" && $_TARGET_PHP == "8.2" when: manual
This should be moved to a follow-up, which will depend on getting the branch name from gitlab, and the otel/grafana dashboard actually being stable, but for now it's the only way to run the pipeline to test it, without a schedule being configured, so again, chicken and egg.
There is probably a better way to tie this to the default database and php combination than hard-coding the values, but I don't know what that is yet.
- last update
over 1 year ago 30,387 pass, 2 fail - last update
over 1 year ago 30,387 pass, 2 fail - last update
over 1 year ago 30,387 pass, 2 fail - last update
over 1 year ago 30,387 pass, 2 fail - last update
over 1 year ago 30,387 pass, 2 fail - last update
over 1 year ago 30,387 pass, 4 fail - last update
over 1 year ago 30,394 pass - last update
over 1 year ago 30,394 pass - 🇬🇧United Kingdom catch
This is working properly now, you can see the graphs and drill down into traces on http://grafana.tag1demo.tag1.io/d/teMVIdjVz/umami?orgId=1&refresh=30s&fr...
- last update
over 1 year ago 30,397 pass - last update
over 1 year ago 30,397 pass - 🇬🇧United Kingdom catch
There's a chicken and egg with this issue and core commits, because we can't validate scheduled pipeline changes until after commit since the variables aren't set on MRs.
To work around this, I created a Drupal core fork on d.o, and configured a schedule there the same way it would be configured for core.
Tests are now running on that fork for both the 11.x and 10.2.x branches every 20 minutes, rule changes means that only composer + yarn + performance tests run on that schedule, and everything except performance tests run on commit.
https://git.drupalcode.org/project/core_performance_testbed/-/pipelines
Because this also includes the OTEL_COLLECTOR change, it's feeding data into the graphs too, so you get a better idea what they'll look like:
http://grafana.tag1demo.tag1.io/d/teMVIdjVz/umami?orgId=1&refresh=30s&fr...
This means I've removed the manual job from this MR, since while it would be nice to add that eventually, it'll be a follow-up once we've got branches and commit hashes into traces from gitlab (which depends on this MR...).
- last update
over 1 year ago 30,397 pass - last update
over 1 year ago 30,397 pass - last update
over 1 year ago 30,397 pass - last update
over 1 year ago 30,397 pass - last update
over 1 year ago 30,397 pass - last update
over 1 year ago 30,397 pass - last update
over 1 year ago 30,397 pass - 🇬🇧United Kingdom catch
Self review.
There are some fixes to PerformanceTestTrait in here.
The change from$_ENV
togetenv()
- this is because $_ENV can't see the cli argument passed in to phpunit.Checking we have a first contentful paint before exiting polling, and ensuring at least one iteration. This was to fix some bad data (about 2 or 3 requests) I saw in traces from runs on gitlab which I haven't seen locally even once - we were logging a largestContentfulPaint event without logging a firstContentfulPaint event. It's possible that when they are the same browser event (the first is also the largest), that chrome can put LCP in the logs before FCP, then if we stop polling at that moment, we never get the corresponding FCP event.
I will probably open a new issue to add some assertions in here (like is lcp greater or equal to fcp) so we can hard fail if we get bad data which should help to pick things up and also stop chromedriver log race conditions (or quasi race conditions) from adding bad data to the graphs. However, also don't want to introduce new random test failures to core, so would rather see how things go once this is actually up and running before doing that.
Was also debating whether to split the PerformanceTestTrait fixes here out to a separate issue, but these are massively intertwined at the moment - we need the gitlab runs to push data to Grafana, which will in turn show us how reliable the data on Grafana is, which will likely result in making the data collection more robust, which will then allow further improvements to Grafana (like potentially show mean requests per hour in the graph instead of per-data point).
I've added a skipped deprecation for the @return thing from Symfony which is in-line with what we're doing elsewhere.
The last thing is that currently core's daily test runs use "1" and the new performance jobs I added on my fork use "TRUE". Gitlab rules only accept string comparisons, no int or bool, so whatever we use it has to be a string. I do not have a strong preference, but we might want to pick one.
Have also set up a fake daily test job on the fork (it actually runs monthly, but you can press 'play' to simulate), to ensure that the daily tests will continue to work with these pipeline changes.
- last update
over 1 year ago Custom Commands Failed - last update
over 1 year ago 30,397 pass - 🇬🇧United Kingdom catch
The graphs were showing an issue I thought I had found a workaround for testing locally with the original MR.
The chrome performance log uses (at least) four different timestamp formats, two of which count from OS boot, and appear to start a few milliseconds away from each other.
This meant it was incredibly difficult to compare entries from the network vs. timeline logs and I had to do a rosetta stone kludge with a common event to calculate the offset from the two supposedly identical timestamps. The symptom was sometimes the request timestamp would be 'after' the response timestamp, giving you negative length spans, which Open Telemetry doesn't understand and Grafana graphs as lasting hundreds of years, looks funny but not very useful. However, I was getting the original symptom back when running the tests from gitlab, suggesting my workaround was not complete, might be an OS difference, or might be I just didn't have enough sample runs locally, or the tests were slow enough on my laptop it didn't hit negative span lengths.
When looking through to see if there was a better workaround, I found entries in the timeline log itself for the initial request and response, we are still dealing with four almost undocumented timestamp formats and some offset calculation, but I think it is at least four different kinds of fruit instead of a mixture of real and wooden fruit (apples to apples doesn't cover the improved situation). With this change, no weird negative timestamps and all the timings look about what I'd expect. I have a bit of a question mark over the node/1 hot cache being so much slower than the front page one, but given every is ~10% consistent that looks like something to investigate manually in a follow-up and not a symptom of the performance log parsing to me.
I still would prefer to commit this in one go, because the graphs validate the logic changes and the logic changes produce the graphs, but happy to split the test vs. YAML changes if it helps this land easier.
- Status changed to Needs work
about 1 year ago 5:38pm 23 October 2023 - 🇺🇸United States smustgrave
So confirmed the pipeline runs on intervals by checking https://git.drupalcode.org/project/core_performance_testbed/-/pipeline_s...
See the results get sent to http://grafana.tag1demo.tag1.io/d/teMVIdjVz/umami?orgId=1&refresh=30s
But think something is off as the metrics appear to be empty.
- Status changed to Needs review
about 1 year ago 8:06pm 23 October 2023 - 🇬🇧United Kingdom catch
It's because I reduced the frequency of the pipeline from every 20 minutes to one hour. Bumped it back to 20 minutes and the graphs are populated again.
- Status changed to RTBC
about 1 year ago 8:10pm 23 October 2023 - last update
about 1 year ago 30,427 pass - last update
about 1 year ago 30,438 pass - last update
about 1 year ago 30,455 pass, 1 fail - Status changed to Needs review
about 1 year ago 7:29am 28 October 2023 - 🇬🇧United Kingdom catch
Found a way to keep the graphs populated without running the pipeline every 20 minutes - the pipeline now repeats the tests three times in a loop (this is enough for prometheus to think it's got enough for a histogram is my theory), and this appears to allow us to then run the pipeline at any frequency. This should help us not blow the DA gitlab budget.
Also detecting when the chromedriver log is funky (which seems to be about 1/30 runs) and skipping the test when that happens instead of failing hard. Back to needs review for those couple of changes.
- Status changed to RTBC
about 1 year ago 3:50pm 30 October 2023 - last update
about 1 year ago 30,481 pass - Open on Drupal.org →Environment: PHP 8.2 & MySQL 8last update
about 1 year ago Not currently mergeable. - First commit to issue fork.
- 🇬🇧United Kingdom catch
+1 to the YAML changes, reduces repetition and removes a double-negative.
- 🇬🇧United Kingdom alexpott 🇪🇺🌍
Re splitting this into trait fixes and yaml fixes. I don't think that that is worth it because as you point out - everything is going hand in hand. I think that performance testing in core is young enough to have a slightly looser scope than other core changes. Also this does not affect a running site so we should not be as cautious here.
- 🇬🇧United Kingdom alexpott 🇪🇺🌍
@catch tested the gitlab updates on the fork of core and they work as expected. Let's get this in.
Committed and pushed e242a5aad6c to 11.x and c66bb852c6d to 10.2.x. Thanks!
Backported to 10.2.x because this is not runtime code and having this in 10.2.x will allow us to compare 10.2 and 11.x (and 10.3.x)
-
alexpott →
committed e242a5aa on 11.x
Issue #3391689 by catch, alexpott, smustgrave: Add a performance tests...
-
alexpott →
committed e242a5aa on 11.x
- Status changed to Fixed
about 1 year ago 3:18pm 3 November 2023 -
alexpott →
committed c66bb852 on 10.2.x
Issue #3391689 by catch, alexpott, smustgrave: Add a performance tests...
-
alexpott →
committed c66bb852 on 10.2.x
Automatically closed - issue fixed for 2 weeks with no activity.