[experiment] Explore paratest to run our phpunit tests in parallel

Created on 9 August 2016, over 8 years ago
Updated 10 January 2024, about 1 year ago

Problem/Motivation

  • run-tests.sh is hard-to-maintain code
  • It's a wrapper around phpunit, which has caused many bugs in the past

Proposed resolution

Try to run https://github.com/brianium/paratest with kernel tests and browser tests. It could also be worth finding out whether running unit tests in parallel is worthwhile on its own.

Note: This is an exploration issue.

Remaining tasks

User interface changes

API changes

Data model changes

📌 Task
Status

Needs work

Version

11.0 🔥

Component
PHPUnit 

Last updated 1 day ago

Created by

🇩🇪Germany dawehner


Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • 🇪🇸Spain fjgarlin

    I was investigating the usage of "paratest" (initially for contrib) via GitLab CI (https://git.drupalcode.org/project/gitlab_templates).

    One of the biggest differences I saw was that we cannot specify the "--printer" option, which Drupal uses to generate the HTML in the "browser_output" folder. This option is not supported in any of the current versions (6.x and 7.x).

    I managed to get _some_ improvements, going from around 30 min to around 20 min to run the unit tests of the module, getting the exact same results, but as mentioned before, not having the "browser_output" folder.

    I created this issue in the "gitlab_templates": #3370952: Run phpunit tests from a single job in parallel

  • 🇦🇺Australia mstrelan

    #27 FWIW once we have #3346242: PHPUnit 10 with Drupal 10 we will not need the --printer option.

  • 🇬🇧United Kingdom catch

    Some of our test classes take anything up to 5-6 minutes to complete.

    paratest theoretically would help with that, by allowing multi-method classes to be run in parallel.

    However, for that to be an overall benefit, we'd still need to be able to affect which tests are run by which runner and in which order, see 📌 Distribute @group #slow tests between test runners and mark more tests RTBC for the run-tests version of this. If we do that, we'd start the slowest methods first, so that if an individual method takes 5 minutes (and some do), it starts as early as possible.
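
A minimal sketch of the "slowest first" idea, assuming per-method durations are known from an earlier run. The test names, the durations, and both helper functions are hypothetical; the greedy assignment to the least-loaded worker is one common heuristic, not something run-tests.sh or paratest is known to do.

```php
<?php
// Hypothetical per-method durations in seconds. A real list would come
// from a previous test run, not from a hard-coded array.
$durations = [
  'FooTest::testQuick' => 4,
  'BarTest::testInstall' => 300,
  'BazTest::testForm' => 45,
  'QuxTest::testUpgrade' => 120,
];

/**
 * Returns test names ordered from slowest to fastest.
 */
function order_slowest_first(array $durations): array {
  // arsort() sorts by value, descending, and keeps the keys.
  arsort($durations);
  return array_keys($durations);
}

/**
 * Assigns each test, slowest first, to the least-loaded worker.
 */
function assign_to_workers(array $durations, int $workers): array {
  $queues = array_fill(0, $workers, ['total' => 0, 'tests' => []]);
  foreach (order_slowest_first($durations) as $test) {
    // Pick the worker with the smallest accumulated duration so far.
    $totals = array_column($queues, 'total');
    $target = array_search(min($totals), $totals, true);
    $queues[$target]['tests'][] = $test;
    $queues[$target]['total'] += $durations[$test];
  }
  return $queues;
}

$queues = assign_to_workers($durations, 2);
foreach ($queues as $worker => $queue) {
  echo "Worker $worker ({$queue['total']}s): " . implode(', ', $queue['tests']) . "\n";
}
```

With these made-up numbers the 5-minute method gets a worker to itself while the other three methods share the second worker, so the overall run is bounded by the single slowest method rather than by the order of discovery.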

  • 🇧🇪Belgium wim leers Ghent 🇧🇪🇪🇺

    #29 sounds totally sensible!

  • 🇪🇸Spain fjgarlin

    #3374070: Experiment with concurrency package

    I investigated briefly the usage of this tool for core and gitlabci.

  • 🇨🇦Canada deviantintegral

    +1 to the idea of starting slowest tests first!

    It's been a long time since I looked at this, but I've since been working on a project where we set up functional tests with playwright, fully supporting parallel tests with separate databases. The project starts from an install profile, so it's not that different than what core does. I would like to refactor this to be public, but in lieu of sharing code I'll share some lessons we learned:

    • It turns out even for a small-sized install profile similar to standard that the bulk of test time can be taken during setup. We initially improved this by creating a database early in the CI job for all tests using the same installer setup, and doing a mysql import instead of a site install.
    • But, it turns out even that starts to be slow. While our project uses mariadb in production, we determined that everything worked fine with sqlite. Switching to that for tests turned a slow mysql import into a simple cp call to copy the pristine sqlite database into a test-specific instance. While core needs to test with a variety of database backends, perhaps tests could be marked as either requiring tests against all database types, or as not requiring any specific database backend to allow greater use of sqlite in tests.
    • Once you start getting your tests in parallel and fast, you'll start to get surprising random failures. That's probably because you'll hit various request limits in the web server config, database config (if not using sqlite), unpredictable memory use from multiple browsers adding up, and so on. We worked around this by sharding our tests across multiple runners instead of growing vertically, since the costs are nearly the same for us.
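
The "cp instead of import" step from the second lesson might look roughly like the sketch below. It is not the project's actual code (which was not shared): the function name, the paths, and the file naming scheme are all hypothetical, and a stand-in text file takes the place of a real installed SQLite database.

```php
<?php
/**
 * Copies a pristine SQLite database to a test-specific file.
 *
 * Hypothetical helper: the naming scheme is an assumption for illustration.
 */
function provision_test_database(string $pristine, string $workDir, string $testId): string {
  if (!is_file($pristine)) {
    throw new RuntimeException("Pristine database not found: $pristine");
  }
  // Turn the test identifier into a safe file name.
  $target = $workDir . '/test_' . preg_replace('/[^A-Za-z0-9_]/', '_', $testId) . '.sqlite';
  if (!copy($pristine, $target)) {
    throw new RuntimeException("Could not copy database to $target");
  }
  return $target;
}

// Demonstration with a stand-in file instead of a real installed site.
$dir = sys_get_temp_dir() . '/sqlite_demo_' . getmypid();
if (!is_dir($dir)) {
  mkdir($dir, 0777, true);
}
file_put_contents("$dir/pristine.sqlite", 'stand-in for an installed site database');

// Each test gets its own copy, so parallel tests never share a database file.
$a = provision_test_database("$dir/pristine.sqlite", $dir, 'NodeTest::testCreate');
$b = provision_test_database("$dir/pristine.sqlite", $dir, 'UserTest::testLogin');
```

The point of the design is that the expensive install happens once per CI job, and every test afterwards pays only for a file copy.
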
  • 🇬🇧United Kingdom catch

    @fgarlin looked at paratest in #3374070: Experiment with concurrency package and didn't see an improvement in performance, in fact saw a performance hit.

    I had a quick play with paratest locally, only got so far because I had trouble getting it to do 'run all tests from a specific test suite in parallel', it seems to always pick up all of the tests, probably missing an argument somewhere.

    Paratest has broadly two modes:

    1. Run test classes in parallel (this is what run-tests.sh does)
    2. Run test methods in parallel

    Running test methods in parallel is interesting because it would mean an 18-method functional test isn't inherently slow any more, and those are our longest running tests at the moment. However, I think we should make an explicit decision for now, that we won't rely on per-method parallelism for optimisation of core tests, for the following reasons:

    1. Our tests fail with per-method parallelism according to @fgarlin's investigations, so we'd have to do a lot of work just to get compatible at all
    2. per-method parallelism at the runner level precludes (or at least makes a lot more complicated) implementing other optimisations like 📌 Improve performance of functional tests by caching Drupal installations Needs work
    3. We'd then be locked into paratest

    If we decide to defer that, we can optimise for per-class parallelism for now, without waiting on something that doesn't exist yet, and see how far we get.

    We could still switch to paratest for per-class concurrency if we want to move away from run-tests.sh, but then it's a roughly 1-1 exchange, not a completely different approach.

    Note also that HEAD adds support for gitlab parallel runners on top of concurrency, which is the difference between 30 minute and 17 minute test runs. And 📌 Distribute @group #slow tests between test runners and mark more tests RTBC adds support for distributing known-slow tests between runners at the start of each job, which is the difference between 17 minute and 11 minute test runs. We'd need to replicate those features via paratest somehow in order to keep the same performance.

  • 🇳🇱Netherlands bbrala Netherlands

    I'd say we shouldn't use paratest. Since it is not a drop-in replacement, this will probably surface weird issues in the test suite. Combined with blocking other innovations like the install caching, it seems like we should not do this in the foreseeable future. There is enough going on in testland right now.

  • 🇧🇪Belgium wim leers Ghent 🇧🇪🇪🇺

    Our tests fail with per-method parallelism according to @fgarlin's investigations, so we'd have to do a lot of work just to get compatible at all

    Why?

    And: is this true for all types of tests?

    run-tests.sh is notoriously difficult to modify, update and, frankly, use (for example: for testing MySQL one needs to pass in --sqlite … --dburl … 😳 — being clarified at 🐛 clarify db settings for run-tests.sh example command Needs work ).

    Wouldn't a good middle ground be to start adopting paratest for those types of test where there's no concurrency issues? IOW: wouldn't it be a good middle ground to have run-tests.sh start using paratest itself for types of tests where it makes sense, to allow us to gradually move away from it?

    We'd need to replicate those features via paratest somehow in order to keep the same performance.

    Could you explain (also for posterity sake) how those (awesome!) performance benefits are easy to achieve with run-tests.sh but difficult using paratest? 🙏

  • 🇳🇱Netherlands bbrala Netherlands

    Well, the issue is pretty much this:

    https://github.com/paratestphp/paratest/issues/532

    Paratest is running in parallel based on a single command, but does not allow chunking. So a lot of the control you have right now through run-tests.sh (using offset+limit) and also ordering tests based on group (#slow first) is something that is not trivial if we add paratest to the mix.

    To clarify:
    There are 2 kinds of concurrency on gitlab + drupal tests right now.

    PARALLEL in Gitlab: running commands on multiple runners.
    CONCURRENCY in run-tests.sh: running multiple processes on a single runner in gitlab.

    This makes this discussion a little harder. We are currently running multiple runners in PARALLEL while running tests CONCURRENTLY on those runners.

    We have too many dials right now imo, but that is a different discussion I think :)

  • 🇺🇸United States moshe weitzman Boston, MA

    Thanks for giving us some better vocab. IMO, virtually no contrib module needs CONCURRENCY. That's why I favor straight phpunit for the default implementation. Core and a few contrib projects can override from there.

  • 🇬🇧United Kingdom catch

    run-tests.sh is notoriously difficult to modify, update and, frankly, use

    Generally only gitlab, or people working on gitlab integration, would use run-tests.sh - I always use phpunit locally because I can never get the cli arguments right, better to have phpunit.xml and then use phpunit with no arguments at all.

    Wouldn't a good middle ground be to start adopting paratest for those types of test where there's no concurrency issues?

    From gitlab experimentation in 📌 [PP-2] Speed up gitlab ci runs Postponed parallel jobs make a massive difference for functional, functional javascript, and kernel tests. This is the difference between 10-15 minute test runs vs. 30-35 minute test runs. What we're doing is running ~180 functional tests at a time.

    We may end up using parallel jobs even more assuming we downsize the runners from 32 cpu AWS instances too. I think I have things (with about 10 MRs applied) where we can tweak concurrency vs parallelism vs. CPUs and have a pretty good idea of the effects, however like @bbrala says there are so many variables it gets very complicated/overwhelming very fast. If we only use paratest for unit and build tests, we haven't really gained anything except a new dependency (and possibly more edge case failures) though.

    Thanks for giving us some better vocab. IMO, virtually no contrib module needs CONCURRENCY

    I'm not sure about this. If a contrib module has one unit test, one kernel test, and one functional test, then concurrency gets them nothing at all because the separate jobs by test type run each suite in parallel.

    However, if they have 10 functional tests, running them at 10 concurrency will get the results back approximately ten times as fast. This still might only be a minute or two faster for a developer, which is not a huge difference, but for other test runs, if the runner is only needed for say 2 minutes instead of 4 minutes, we can run double the number of contrib tests with the same overall amount of runner time. But this might not turn out to be an issue if we only allocate say 2 CPUs to a runner and eight or sixteen of them can co-exist on an instance. Right now though we'd be running 10 tests in serial on a 32 cpu AWS instance which seems overkill.
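
The "approximately ten times as fast" estimate can be checked with an idealised model, sketched below. It assumes every test takes the same time and ignores setup overhead and CPU contention; the function name and the 24 seconds per test are made up for illustration.

```php
<?php
/**
 * Estimates wall-clock time when equally long tests run in batches.
 *
 * Idealised: no setup cost, no contention between concurrent processes.
 */
function estimated_wall_time(int $tests, int $concurrency, float $secondsPerTest): float {
  // Tests run in batches of $concurrency; a partial last batch still costs a full slot.
  return ceil($tests / $concurrency) * $secondsPerTest;
}

// Hypothetical: 10 functional tests at 24 seconds each.
$serial = estimated_wall_time(10, 1, 24.0);
$concurrent = estimated_wall_time(10, 10, 24.0);
$partial = estimated_wall_time(10, 4, 24.0);

printf("serial: %.0fs, concurrency 10: %.0fs, concurrency 4: %.0fs\n", $serial, $concurrent, $partial);
```

Under this model the runner is occupied for 24 seconds instead of 240, which is where the argument about fitting more contrib test runs into the same runner time comes from. Real runs will fall short of the ideal because of the overheads the model ignores.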

    Wouldn't a good middle ground be to start adopting paratest for those types of test where there's no concurrency issues?

    IMO our best option for maintainability is dropping the --group etc. arguments from run-tests.sh and slimming it down to only what we need for gitlab-ci. That gives us less run-tests.sh code to maintain and makes it clearer what we'd need to port to any replacement.

    Could you explain (also for posterity sake) how those (awesome!) performance benefits are easy to achieve with run-tests.sh but difficult using paratest?

    These are the relevant bits from the GitLab CI configuration and run-tests.sh:

    
      CI_PARALLEL_NODE_INDEX: $CI_NODE_INDEX
      CI_PARALLEL_NODE_TOTAL: $CI_NODE_TOTAL
    
    [....]
      if ((int) $args['ci-parallel-node-total'] > 1) {
        $slow_tests_per_job = ceil(count($slow_tests) / $args['ci-parallel-node-total']);
        $tests_per_job = ceil(count($test_list) / $args['ci-parallel-node-total']);
        // Each job runs its share of the slow tests first, then its share of the rest.
        $test_list = array_merge(
          array_slice($slow_tests, ($args['ci-parallel-node-index'] - 1) * $slow_tests_per_job, $slow_tests_per_job),
          array_slice($test_list, ($args['ci-parallel-node-index'] - 1) * $tests_per_job, $tests_per_job)
        );
      }
    

    What this does:
    1. .gitlab-ci/pipeline.yml sets certain jobs to run in parallel
    2. When jobs run in parallel, gitlab sets index and total environment variables so you know you're being run in parallel and which of the parallel jobs you are.
    3. In run-tests.sh we check those variables, and then slice up the jobs based on the total and index. Additionally, via @group #slow we've got a list of the slower/est tests and we distribute those evenly to run at the very beginning of each parallel job.
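
To make step 3 concrete, here is a standalone sketch of the same slicing applied to a small, made-up test list. The function name and the test names are hypothetical, and the behaviour for a single job (run everything, slow tests first) is an assumption, since the snippet above only covers the parallel case.

```php
<?php
/**
 * Returns the tests one parallel job should run, slow tests first.
 *
 * Mirrors the slicing in the run-tests.sh snippet above. $index is 1-based,
 * like GitLab's CI_NODE_INDEX.
 */
function slice_for_job(array $slowTests, array $testList, int $index, int $total): array {
  if ($total <= 1) {
    // Assumption: without parallel jobs, run everything, slow tests first.
    return array_merge($slowTests, $testList);
  }
  $slowPerJob = (int) ceil(count($slowTests) / $total);
  $testsPerJob = (int) ceil(count($testList) / $total);
  return array_merge(
    array_slice($slowTests, ($index - 1) * $slowPerJob, $slowPerJob),
    array_slice($testList, ($index - 1) * $testsPerJob, $testsPerJob)
  );
}

// Hypothetical test names.
$slow = ['SlowA', 'SlowB', 'SlowC'];
$rest = ['T1', 'T2', 'T3', 'T4', 'T5'];

foreach ([1, 2] as $index) {
  echo "Job $index: " . implode(', ', slice_for_job($slow, $rest, $index, 2)) . "\n";
}
// Job 1: SlowA, SlowB, T1, T2, T3
// Job 2: SlowC, T4, T5
```

Because the per-job counts are rounded up, the last job can end up with fewer tests than the others, as job 2 does here.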

    paratest doesn't understand gitlab index and total. So far the only idea I had to potentially recreate that would be to embed those directly into test discovery itself (so that if the environment variables are set, we only return the slice, and also interleave the corresponding slow tests at the beginning of the list) - that might be doable but it's not a 1-1 swap.

  • 🇺🇸United States mile23 Seattle, WA

    run-tests.sh is notoriously difficult to modify, update and, frankly, use

    See this meta: #2626986: [meta] Improvements to run-tests.sh

    I generally support @catch's idea of culling out stuff from run-tests.sh that isn't in use by CI processes. But I also think it's not a huge lift to make it more maintainable, minimized or not, as per #2624926: Refactor run-tests.sh for Console component. (which won't be getting a re-roll by me until there's some consensus). It should be even easier after the work for #3057420: [meta] How to deprecate Simpletest with minimal disruption

    The question here is not that there are tradeoffs between paratest and run-tests.sh, and which will we choose? The problem is that no one's really maintaining run-tests.sh, for fearful reasons rather than engineering ones.

    Put it into a component, and for realsies, run-tests could be the coolest thing we export to the PHP world at large, but instead it languishes in a fake shell script.

  • 🇬🇧United Kingdom alexpott 🇪🇺🌍

    Last time I tried to use paratest on a client project it did not play nice at all with Symfony deprecation reporting.

  • First commit to issue fork.