Service not known with OPT_IN_TEST_NEXT_MINOR

Created on 14 November 2024, 5 months ago

Problem/Motivation

This happens in https://git.drupalcode.org/project/rdf_sync/-/jobs/3356183

This project needs an additional "virtuoso" service, an RDF triplestore backend that the tests require. The service is defined in .gitlab-ci.yml. The main pipeline setup succeeds and the OPT_IN_TEST_PREVIOUS_MAJOR jobs are also OK, but when running with OPT_IN_TEST_NEXT_MINOR the service is somehow not available.
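
For context, a minimal sketch of the kind of project-level override involved, loosely following the pattern discussed later in this thread for extending the phpunit services (the virtuoso image name and alias are illustrative assumptions, not necessarily what rdf_sync actually uses):

  # .gitlab-ci.yml (sketch): add an RDF triplestore next to the services the
  # templates normally provide for the phpunit job.
  phpunit:
    services:
      - !reference [.with-database, services]
      - !reference [.with-chrome-legacy, services]
      # Extra backend these tests need; image and alias are assumptions.
      - name: tenforce/virtuoso:latest
        alias: virtuoso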

Steps to reproduce

See the https://git.drupalcode.org/project/rdf_sync/-/pipelines/338807 pipeline

Proposed resolution

?

Remaining tasks

User interface changes

API changes

Data model changes

🐛 Bug report
Status

Active

Component

gitlab-ci

Created by

🇷🇴Romania claudiu.cristea Arad 🇷🇴

Merge Requests

Comments & Activities

  • Issue created by @claudiu.cristea
  • 🇬🇧United Kingdom jonathan1055

    The phpunit 'next minor' job currently has to redefine the services, because the regular phpunit job uses 'with-chrome-legacy' whereas the next minor uses 'with-chrome':
    https://git.drupalcode.org/project/gitlab_templates/-/blob/main/includes...
    So your custom override is only affecting the 'current' phpunit job.

    You can probably fix this in multiple ways, either by repeating the customisation in 'next minor', or re-using the services from your 'current' job.
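
    A hedged sketch of the first option, repeating the customisation under the 'next minor' job (the extra service is a placeholder for whatever the project already defines on its 'current' phpunit job; note this variant uses 'with-chrome' rather than 'with-chrome-legacy'):

    phpunit (next minor):
      services:
        - !reference [.with-database, services]
        - !reference [.with-chrome, services]
        # Placeholder for the project's extra service definition.
        - name: tenforce/virtuoso:latest
          alias: virtuoso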

  • 🇪🇸Spain fjgarlin

    True! It's 100% that. We are in a weird state where some versions of core 11 use an older yarn version and others use a newer one, and they require different chromedrivers.

    If this is backported we'd be able to remove that override and your code would work, but that's not possible right now, so for the moment I suggest following @jonathan1055's advice.

  • 🇷🇴Romania claudiu.cristea Arad 🇷🇴

    @jonathan1055, @fjgarlin, thank you!

    You can probably fix this in multiple ways, either by repeating the customisation in 'next minor', or re-using the services from your 'current' job.

    I don't think I understand how to repeat the same service definition under OPT_IN_TEST_NEXT_MINOR.

  • 🇬🇧United Kingdom jonathan1055

    As it seems your tests do not need the chrome driver at all, you could try (untested):

    phpunit (next minor):
      services: !reference [phpunit, services]
    

    so that whatever you have in the 'current' phpunit services will be re-used in the 'next minor'. Or you could just duplicate your definition.

  • 🇷🇴Romania claudiu.cristea Arad 🇷🇴

    @jonathan1055, #6 works. Thank you!

  • 🇬🇧United Kingdom jonathan1055

    Good to hear that. Just looked at your MR and I am pleased that the concise first option in #6 worked for you. It means that if you ever add another service to the phpunit job, the 'next minor' will automatically stay in sync with it.

  • 🇺🇸United States cmlara

    The phpunit 'next minor' job currently has to redefine the services, because the regular phpunit uses 'with-chrome-legacy' whereas the next minor has 'with-chrome'

    Did anyone ever test launching both CI images and just choosing which image phpunit actually connects to, based on environment variables dynamically setting the container name? Presumably the cost is minimal (higher base RAM usage, and standby CPU usage for an idle container) for avoiding BC breaks until gitlab_templates creates a 2.x branch.

    I did raise BC concerns in #3462681: Add W3C compliant JS testing, however I did not follow up as I thought it was a full swap, not a partial swap, of images between jobs.
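
    A rough sketch of that idea, reusing the service aliases, hostnames, ports and variable names that appear later in this thread; the image tags and exact template keys are assumptions:

    phpunit:
      services:
        # Start both webdriver backends everywhere.
        - name: drupalci/webdriver-chromedriver:production   # tag is an assumption
          alias: chrome
        - name: selenium/standalone-chrome:latest            # tag is an assumption
          alias: selenium
      variables:
        # 'current' keeps talking to the legacy chromedriver.
        DRUPAL_TEST_WEBDRIVER_HOSTNAME: localhost
        DRUPAL_TEST_WEBDRIVER_PORT: "9515"

    phpunit (next minor):
      variables:
        # The variant only flips the variables to point at the selenium service.
        DRUPAL_TEST_WEBDRIVER_HOSTNAME: selenium
        DRUPAL_TEST_WEBDRIVER_PORT: "4444"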

  • 🇬🇧United Kingdom jonathan1055

    Did anyone ever test launching both CI images and just choosing which image is actually connected to

    I don't recall ever seeing that suggestion. It is definitely worth investigating.

  • Merge request !291#3487525 Load both chrome services → (Merged) created by jonathan1055
  • Pipeline finished with Success
    5 months ago
    Total: 53s
    #339611
  • 🇬🇧United Kingdom jonathan1055

    Here's an initial test
    https://git.drupalcode.org/project/scheduler/-/pipelines/339622

    PHPUnit jobs for 'current' and 'next minor' both passed. At the job start-up they both show:

    [service:selenium/standalone-chrome-selenium] 
    [service:drupalci/webdriver-chromedriver-chrome]
    [service:drupalci/webdriver-chromedriver-chrome] 2024-11-15T10:14:50.401430414Z ChromeDriver was started successfully.
    

    The 'next minor' job has more debug output during runtime, and that shows [service:selenium/standalone-chrome-selenium] is being used during the tests. I'm not sure why there is no detailed services debug during the runtime tests on 'current'; maybe the drivers do it differently. But the initial testing of this one JavaScript test shows that this is worth further investigation.

    Now I need to test on a project that has Nightwatch tests. Any suggestions?

  • 🇪🇸Spain fjgarlin

    Oh, this is a good suggestion. Starting services that we won't need is not ideal, but it's a consequence of the changes in core. The code looks much simpler and it also means we don't need to override that part for variants.

    Downstream pipelines for decoupled_pages which has nightwatch tests: https://git.drupalcode.org/issue/gitlab_templates-3487525/-/pipelines/33... (you can always open an issue/MR there so you don't depend on me to run the downstream pipelines in this issue).

  • 🇬🇧United Kingdom jonathan1055

    Thanks for triggering the downstream pipeline. Yes, I will open a draft [ignore] MR to add CI_DEBUG_SERVICES: true, but it looks promising as all three Nightwatch variants passed.
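
    For reference, CI_DEBUG_SERVICES is a standard GitLab CI variable; a minimal sketch of turning it on for a test pipeline:

    variables:
      # Stream the service containers' own logs into the job log.
      CI_DEBUG_SERVICES: "true"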

  • 🇬🇧United Kingdom jonathan1055

    Tested on https://git.drupalcode.org/project/decoupled_pages/-/merge_requests/8
    Pipeline https://git.drupalcode.org/issue/decoupled_pages-3422594/-/pipelines/340026
    All three Nightwatch jobs pass.

    Current

    Core Drupal: 11.0.7, Nightwatch 2.4.2
    DRUPAL_TEST_WEBDRIVER_HOSTNAME='localhost'
    DRUPAL_TEST_WEBDRIVER_CHROME_ARGS='--disable-dev-shm-usage --disable-gpu --headless'
    DRUPAL_TEST_WEBDRIVER_W3C=false
    DRUPAL_TEST_WEBDRIVER_PORT='9515'
    

    Previous Major

    Core Drupal: 10.3.8, Nightwatch 2.4.2
    

    The four variables have the same values as above.

    Next Minor

    Core Drupal: 11.1.0-dev, Nightwatch 3.7.0
    DRUPAL_TEST_WEBDRIVER_HOSTNAME='selenium'
    DRUPAL_TEST_WEBDRIVER_CHROME_ARGS='--disable-dev-shm-usage --disable-gpu --headless'
    DRUPAL_TEST_WEBDRIVER_W3C=true
    DRUPAL_TEST_WEBDRIVER_PORT='4444'
    

    I did not know we could load both services (and did not even think to try); I assumed there would be some clash or conflict. This is such a simple solution, and we already have the environment variables that make it work. It clashes with, but will simplify, 📌 Update to new driver when D11 "current" uses Nightwatch 3 (Postponed).

    The documentation pages for PHPUnit and Nightwatch will need updating if we are going to do this change.

  • 🇪🇸Spain fjgarlin

    Yeah, there is no clash because the names are different, so that made it easier. We didn't think of this because, to be honest, it doesn't make much sense to start several services knowing that you'll only pick up one. On the other hand, having core 11.0.7 and 11.1.0-dev using different drivers was something we didn't anticipate, so it's a good workaround/fix.

    Also, it means we are overriding one less part per variant, so the "inheritance/extension" will work better: we no longer need to re-define whole sections per variant ("services" in this case) when they differ from the norm.

    From my point of view, this is RTBC. I'm only curious about the performance impact: how much longer, if at all, job creation takes now that we are adding one more service to the mix. It'd be great to have a before/after. I don't think the impact will be big, but we'd need to take it into account somehow. I guess that enabling CI_DEBUG_SERVICES can give you useful info on how long it takes to create the service.

  • 🇬🇧United Kingdom jonathan1055

    The tests already have CI_DEBUG_SERVICES: 1 and there are lots of timestamps in the log. They are not all in chronological order, but I will extract the times and try to see what they show us, if anything.

  • 🇬🇧United Kingdom jonathan1055

    If I'm reading the logs correctly, the new selenium service takes two seconds to load. The earliest and latest timestamps, for example, are:

    [service:selenium/standalone-chrome-selenium] 2024-11-15T16:42:14
    [service:selenium/standalone-chrome-selenium] 2024-11-15T16:42:16
    

    The legacy driver is much quicker, mostly loading within one second:

    [service:drupalci/webdriver-chromedriver-chrome] 2024-11-15T16:42:14.27
    [service:drupalci/webdriver-chromedriver-chrome] 2024-11-15T16:42:14.28
    
  • 🇪🇸Spain fjgarlin

    Thanks for checking.

    Downstream pipelines: https://git.drupalcode.org/issue/gitlab_templates-3487525/-/pipelines/33...

    The code looks good and there are tests on #12, #15 and the downstream pipelines. Setting to RTBC.

  • 🇺🇸United States cmlara

    Technically you're not seeing all the factors in those jobs, such as whether the image pull takes longer due to bandwidth constraints, disk I/O, etc.

    All that measures is the time between the first two container log messages, which occur after the pod has started, and that is the most variable aspect.

    Trying to performance check pod startup without raw access to the CI environment on dedicated runners where all environmental factors can be controlled is nearly pointless.

    The jobs run with a minimum of 2 CPUs, with (to my knowledge) burst authority up to the maximum CPU and no limits on memory consumption. In other words, extra containers have almost zero impact unless the node is already near its constraints from every other job pushing to the limits.

    To my knowledge the images are downloaded in cached mode, not always-pull mode, meaning other jobs running on the same runner will affect time to first log message and bandwidth usage. Image download is the biggest factor on a cold node, but even then I’ve generally seen remote servers have speed to spare, and you need parallel downloads to max out the pipes. If we ever get to the point where we are worrying about image download times, then infra has reached the point where they should have a local shared cache in the same VPC, or at least the same AWS region, as the runners.

    TL;DR: don’t worry about the performance of a change like this.

    Side note: I realize this won’t be a fix for everyone who has had to override containers, if they use the documented method of expanding services via per-container references. In the future we may want to consider promoting a !reference against the testing base job's services instead of the current documented suggestion of referencing each container by name. Still worth committing, as it’s cleaner and I believe it would set the base for such reference changes (I’m working with that style over in quasar).
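
    A hedged illustration of that point (the base job name is an assumption, not necessarily the templates' actual key). Per-container style, where each default service is re-listed by name and so has to be kept in sync with the templates by hand:

    phpunit:
      services:
        - !reference [.with-database, services]
        - !reference [.with-chrome-legacy, services]
        # Project-specific addition (placeholder).
        - name: my/extra-backend:latest
          alias: extra

    Versus the whole-list style, which inherits everything the base job defines and only appends the extra container:

    phpunit:
      services:
        - !reference [.phpunit-base, services]   # hypothetical base job name
        # Project-specific addition (placeholder).
        - name: my/extra-backend:latest
          alias: extra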

  • Pipeline finished with Skipped
    5 months ago
    #344371
  • 🇪🇸Spain fjgarlin

    Thanks for the extra checks and explanations. Merged!

  • 🇬🇧United Kingdom jonathan1055

    I just discovered that I had one extra commit locally for this MR. It was only to re-write the comments to explain what to do in MR257 on 📌 Update to new driver when D11 "current" uses Nightwatch 3 (Postponed).

    Here's a patch so you can see what I had. Maybe this could be committed? I can open a new MR. Given that the changes may be postponed for a while, and I have already done the thinking about it and written the details, it would be good to have accurate @todo comments in the code ready.

  • 🇪🇸Spain fjgarlin

    Yup, no problem. I'll commit the patch. Thanks!

  • 🇬🇧United Kingdom jonathan1055

    Thank you, that's really helpful. Sorry I missed it before.

  • Automatically closed - issue fixed for 2 weeks with no activity.
