[investigation] get metrics of request to see which settings need to be changed for random failures

Created on 21 November 2023, 7 months ago
Updated 1 December 2023, 7 months ago

Problem/Motivation

The parent issue is self-explanatory: 🌱 [meta] Known intermittent, random, and environment-specific test failures (Active).
I will use this issue to investigate and gather metrics on requests, to see which settings (DNS, see #3397749: gitlab pod DNS settings, or others) might need adjusting.

Steps to reproduce

Proposed resolution

Remaining tasks

User interface changes

API changes

Data model changes

Release notes snippet

✨ Feature request
Status

Needs work

Version

11.0 🔥

Component
PHPUnit

Last updated about 10 hours ago

Created by

🇪🇸 Spain fjgarlin

Merge Requests

Comments & Activities

  • Issue created by @fjgarlin
  • 🇪🇸 Spain fjgarlin

    Findings:
    - Pods become overloaded when many pipelines run at the same time. We might need to reduce CPU_REQUEST on some jobs.
    - Some failures might be because the code runs too fast, not too slow as we originally thought.
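
    One plausible reading of the "too fast" case is that a test asserts before the page has reached the expected state. A minimal sketch of the usual remedy, polling for a condition instead of asserting immediately, is shown below. It is written in Python purely for illustration (Drupal's tests are PHP), and `wait_until` and `element_is_visible` are hypothetical names, not code from the MR.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float = 10.0,
               interval: float = 0.05) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Hypothetical page state that only becomes ready after a short delay,
# standing in for an element that the browser renders asynchronously.
ready_at = time.monotonic() + 0.2


def element_is_visible() -> bool:
    return time.monotonic() >= ready_at


# Asserting immediately would race the page; waiting first avoids that.
assert wait_until(element_is_visible, timeout=2.0)
```

    The timeout bounds how long a genuinely broken test can hang, while the short polling interval keeps the happy path fast.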

    I've managed to reduce the random failures to the point where I barely see any now. I'll be putting together an MR today to see if we can move forward with those changes.

  • Merge request !5614 Fix some random tests (Open) created by fjgarlin
  • 🇪🇸 Spain fjgarlin

    fjgarlin changed the visibility of the branch 3403162-investigation-get-metrics to hidden.

  • 🇪🇸 Spain fjgarlin

    I'm adding screenshots and html-shots of the step right where it fails in the most common places. This should help identify some of the usual offenders.
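
    The general shape of "capture debug output at the point of failure" can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the PHP code from the MR: `capture_html_on_failure`, `run_demo`, the artifact name, and the sample HTML are all hypothetical.

```python
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Callable, Iterator


@contextmanager
def capture_html_on_failure(get_html: Callable[[], str], artifact_dir: Path,
                            name: str) -> Iterator[None]:
    """Write the current page HTML to `artifact_dir` if the wrapped step fails."""
    try:
        yield
    except AssertionError:
        artifact_dir.mkdir(parents=True, exist_ok=True)
        (artifact_dir / f"{name}.html").write_text(get_html(), encoding="utf-8")
        raise  # Re-raise so the test still fails after the snapshot is saved.


def run_demo(artifact_dir: Path) -> Path:
    """Run a deliberately failing step and return the snapshot path."""
    page_html = "<html><body><p>Block list is empty</p></body></html>"
    try:
        with capture_html_on_failure(lambda: page_html, artifact_dir,
                                     "move-block-step"):
            assert "Powered by Drupal" in page_html, "expected block not found"
    except AssertionError:
        pass  # The failure is expected in this demo.
    return artifact_dir / "move-block-step.html"


snapshot = run_demo(Path(tempfile.mkdtemp()))
print(snapshot.read_text(encoding="utf-8"))
```

    Because the wrapper re-raises, the test outcome is unchanged; the only side effect is an extra file that CI can upload as an artifact.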

  • 🇪🇸 Spain fjgarlin

    After many tries/fixes/debug... we are at a point where only 2 tests fail a bit more often (ManageFieldsTest and MoveBlockFormTest), but all the others are passing a bit more consistently.

    I'm adding screenshots and html-shots of the moments where things fail so we can debug further; they're available as artifacts.
    - MoveBlockFormTest tries to add the same block 25 times.
    --- It usually fails on the 4th attempt. Do we need 25?
    --- Does it always need to be the same block? Could we instead add the first 25 blocks that we find?

    Worst case, we can mark those two as skipped, which is better than the 11 that are being unskipped in that MR.

  • Status changed to Needs review 7 months ago
  • 🇪🇸 Spain fjgarlin

    See comment #8, and also the MR https://git.drupalcode.org/project/drupal/-/merge_requests/5614/diffs

    It'd be great to know if something like this (i.e. adding extra debug output on failure) can be added to core.

    Whilst testing the MR, only 1 or 2 recurring tests would fail, out of the 11 initially unskipped (+3 new ones that triggered random fails).

    I am still marking these two to be skipped for now and will follow up later.

    Marking this as "Needs review" in case adding the extra debug in known places is a good addition. I know that this code would only be temporary and should be removed in the long run, but it's the only way to capture additional information.

  • Status changed to Needs work 7 months ago
  • The Needs Review Queue Bot tested this issue. It no longer applies to Drupal core. Therefore, this issue status is now "Needs work".

    This does not mean that the patch needs to be re-rolled or the MR rebased. Read the Issue Summary, the issue tags and the latest discussion here to determine what needs to be done.

    Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.
