[policy, no patch] Better scoping for bug fix test coverage

Created on 14 May 2018, over 6 years ago
Updated 6 May 2024, 6 months ago

Problem/Motivation

There is more discussion about this in Drupal Slack, #contribute. The links to those threads are in #90.

We have broadly three types of issues committed against core:

1. Features, introducing new functionality.
2. Tasks, changing or refactoring existing functionality.
3. Bug fixes, fixing a bug with existing functionality, usually but not always without significant changes to behaviour.

For features we always require test coverage, although in general the bigger the feature being added, the harder it can be to determine whether the test coverage is sufficient.

For tasks, the amount of new or changed test coverage depends on the kind of change being made. However, refactoring that doesn't break tests, or requires only minor changes to tests, is generally assumed to have coverage.

For bug fixes we assume that if there's a bug, it must always be possible to write a test that fails without the bug fix applied, so more or less every bug fix requires an accompanying test. There are rare occasions where multiple issues are being worked on at once and test coverage is added in a dependent issue.

While it's not the intention, this means that the test requirements for fixing a bug are stricter than they are for refactoring or adding new features.

The types of bug we have are obviously very disparate, from single character typos in exception message placeholders to upgrade path data loss and fatal errors.

A test written for a specific bug fix will ensure it doesn't regress over time, and if the code path was otherwise completely untested it may prevent other regressions in the future, but in general it's less likely that a bug fix causes a regression than a task or new feature.

Bug fixes are sometimes complex issues where multiple code paths need testing, but sometimes they're straightforward developer mistakes that weren't caught in review, and we'd never have thought to add explicit test coverage if they'd been spotted during patch review instead of after commit. See the case studies below.

Bug fixes are also often the place where new or irregular core contributors interact with the core queue most, because they're contributing back fixes from client sites (where by definition they're manually testing the bugfixes they contribute, and will often have them applied via composer.patches.json). In general there is a high incentive to improve the bug fix itself, but often neither the knowledge nor the incentive to write comprehensive test coverage - if you have the patch in composer.patches.json it's already fixed for you, and apart from having to re-roll every year or so, that might be the last time you think about it. Even if you do 8 re-rolls of the same 1-5 line bugfix, it will take you less time than 8 hours writing and rewriting a 50-line test and responding to reviews - and you might get reviews after you've already moved on from the project that needed the bug fix.

On top of this, it can be very hard to know where to add test coverage in core due to the sheer number of tests that we have, and unclear responsibilities and boundaries for when to add new assertions to an existing test, new methods on an existing test class, new test classes altogether, etc. This represents a high barrier to contribution with an easy opt-out, and quite significant consequences for the rest of the community when those issues aren't fixed (possibly not for any specific issue, but in aggregate across hundreds of small bugfixes).

There is a further consequence: often the tests we require for small bugfixes aren't well scoped at all - they're one-off tests for very specific developer mistakes, or simulations of very specific contrib or custom code path interaction edge cases, not well-structured generic tests of functionality (like the JSON:API/REST coverage) or high-impact coverage like upgrade path tests. By being extremely specific, they don't necessarily provide good coverage of the area they're testing, or they might be very tied to the implementation and result in maintenance burden over time. This often leaves us with a choice between committing poor quality tests to get the bug fix applied, or ever-expanding the scope of a tiny issue to introduce better test coverage whose absence was never the issue's fault in the first place.

And with PHPStan, test coverage is sometimes not the best cure anyway - we can look for general types of developer mistakes across core and contrib, rather than testing one specific case that we already fixed. This is only just starting, but it gives us better options than we had in 2010.

Case study 1

#3223302: Changed Method label() in Taxonomy Terms results in Field/Name Changes is a still-open issue, with a one-line bug fix and ~100 lines of test coverage. The bug was opened in July 2021, a merge request was opened in November 2021, and it was marked RTBC in January 2022. It was marked needs work in April 2022 for one change to the bugfix and to add test coverage; tests were added, and it was then marked needs work again for documentation issues with the test coverage.

The test added ensures that the specific developer mistake (calling ::label() in a getEntityLabelFieldValue() method) won't happen again for taxonomy terms. However, it does not add test coverage for other entity types in core - this would be out of scope for that issue, even though they're equally prone to a similar regression being introduced. The bug can't be reproduced with just core; you need contrib or custom code altering the term label to run into it.

Further, it's not even clear that test coverage is the best way to stop this happening again: the most likely cause would be a new entity type copying and pasting this pattern (in contrib or core) and making the same mistake, and for that PHPStan seems more appropriate.

So to fix a one-line bug, we're adding a 100-line test that doesn't address the issue for other entity types or stop new entity types making the same mistake, and the bug still isn't fixed in any version of core.

My suggestion in this case would be to open a new issue ('Document and/or add test coverage/code style rules to ensure that entity label field getters don't call ::label()') and just commit the one-liner, but our current policy works against that kind of solution because it would be committing the bugfix 'without tests'.
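
As an illustration, here is a minimal sketch of what such a check could look like as a custom PHPStan rule. The class name, namespace and the method-name heuristic are hypothetical - nothing like this exists in core or phpstan-drupal today:

  <?php

  namespace Drupal\PHPStan\Rules;

  use PhpParser\Node;
  use PhpParser\Node\Expr\MethodCall;
  use PHPStan\Analyser\Scope;
  use PHPStan\Rules\Rule;
  use PHPStan\Rules\RuleErrorBuilder;

  /**
   * Hypothetical rule: flags $this->label() calls inside label-field getters.
   */
  final class NoLabelCallInLabelFieldGetterRule implements Rule {

    public function getNodeType(): string {
      return MethodCall::class;
    }

    public function processNode(Node $node, Scope $scope): array {
      // Only inspect methods whose name suggests a label field value getter.
      $method = $scope->getFunction();
      if ($method === NULL || stripos($method->getName(), 'LabelFieldValue') === FALSE) {
        return [];
      }
      // Report $this->label() calls; the getter should read the field value directly.
      if ($node->var instanceof Node\Expr\Variable && $node->var->name === 'this'
        && $node->name instanceof Node\Identifier && $node->name->toString() === 'label') {
        return [RuleErrorBuilder::message('Entity label field getters should not call ::label().')->build()];
      }
      return [];
    }

  }

Registered in phpstan.neon, a rule along these lines would cover every entity type in core and contrib at once, rather than pinning down a single occurrence with a regression test.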

Case study 2

#2561639: Wrong placeholder for exception shown when new fields are created during module installation fixes a typo in an exception message resulting in a placeholder not being replaced. It was opened in September 2015 with a patch. Marked needs work for tests in November 2015. Tests were written in the same month. It has gone between needs work, needs review, and RTBC for seven years due to various issues with the test coverage. It was eventually closed as a duplicate because the issue was fixed in #3258969: Wrong argument for @message in ModuleInstaller::install call to watchdog_exception , and committed without tests, meaning that all the effort on the original issue was wasted. And of course in the meantime the bug was still occurring.

Heuristics

One risk of adopting this approach is that it increases the likelihood of discussions on issues about whether they need test coverage or not, which has the potential to delay commits and burn people out too. To try to prevent this, we can start with a reasonably strict set of heuristics to apply to issues: unless the answer to the majority of these is 'yes', or there is some other specific reason not to include test coverage, we'd require tests by default. See the proposed text below.

Original report:
Writing tests is a good approach. It helps to reduce the number of bugs as we add new features. However, for patches that do not bring any new functionality but fix bugs in existing features, requiring tests is a destructive policy because it slows down the process of fixing bugs in Drupal core.
It is a quite typical case when a patch for a bug is submitted and even reviewed, but then it gets declined by a core committer because of lack of tests. After this the issue may get stuck for a very long time (months/years), and there is a great chance the patch will never be committed. Furthermore, it also happens that a fix that was already committed is reverted because of this testing policy. Which is kind of crazy, because reverting a commit that fixes a bug is like deliberately committing this bug.

Another point here is that writing tests may require a different level of skill and area of expertise than fixing a bug. The Drupal testing sub-system is rather complicated. At this moment it consists of 5 (?) different types of tests. On the other hand, the patches for many bugs are trivial and could be processed as novice issues.

Proposed resolution

Documentation changes

1. Upload a test case that fails

Details (existing text)

Bug fixes should be accompanied by changes to a test (either modifying an existing test case or adding a new one) that demonstrate the bug. This can be shown by uploading a patch with only the test changes, which Drupal.org's automated testing system should show as a fail, alongside a patch containing both the tests and the bug fix itself. If a bug fix is not accompanied by tests, there should be an explanation in the issue for why no tests are needed (usually only for trivial bugs where testing would be convoluted) before it is marked RTBC. Patches that need tests written can have the 'Needs tests' tag added. If the issue might affect specific environments differently (such as a change affecting how database queries are built), the test-only patch should be tested against all supported environments in the automated testing infrastructure.

Details (updated text for the GitLab merge request workflow)

Bug fixes should be accompanied by changes to a test (either modifying an existing test case or adding a new one) that demonstrate the bug. This can be shown by running the 'tests only' job in the GitLab merge request pipeline interface. If a bug fix is not accompanied by tests, there should be an explanation in the issue for why no tests are needed (usually only for trivial bugs where testing would be convoluted) before it is marked RTBC. Patches that need tests written can have the 'Needs tests' tag added. If the issue might affect specific environments differently (such as a change affecting how database queries are built), the test-only patch should be tested against all supported environments in the automated testing infrastructure.
Add the following to the 'Upload a test case that fails' details section, below the existing paragraph.

An automated test for a bug fix may not be needed if both of the following are true and the majority of answers to the questions below is 'Yes'.

  1. The issue has clear 'Steps to reproduce' in the Issue Summary.
  2. The fix is 'trivial', with small, easy to understand changes.

Questions:

  1. Is the fix easy to verify by manual testing?
  2. Is the fix in self-contained/@internal code where we expect minimal interaction with contrib? Examples are plugins, controllers, etc.
  3. Is the fix achieved without adding new, untested, code paths?
  4. Is an explicit 'regression' test needed?
  5. Is it easy for someone who did not work on the original bug report to add the test coverage in a followup issue?
  6. Does the issue expose a general lack of test coverage for the specific subsystem? If so, is it better to add generic test coverage for that subsystem in a separate issue?
  7. If this fix is committed without test coverage but then later regresses, is the impact likely to be minimal or at least no worse than leaving the bug unfixed?

Tests may still be required at the discretion of core committers.

Remaining tasks

Framework manager review

Done. Update the Testing core gate.
Done. Create a 'Needs no tests' tag, as suggested in #93, #104.
Done. Follow-up to document when an update path test is needed (see #100): 🌱 Document when update tests are needed Active .
Done. A follow-up has been made for the suggestions about testing: 🌱 [policy, no patch] Improving the test suite Active

🌱 Plan
Status

Fixed

Version

11.0 🔥

Component
Other 

Created by

🇷🇺Russia Chi

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • 🇬🇧United Kingdom catch

    Another example: 🐛 Entity storage exception during module install missing @message parameter in watchdog_exception() call Fixed . For that one, it was RTBC with test coverage; I committed only the fix and dropped the tests.

  • 🇳🇿New Zealand quietone

    The change proposed here would go into the Testing core gate. So, I have added suggested text for that change to the Issue Summary. That includes simplifying the wording of the 9 heuristic questions and re-ordering them to put the simplest ones to answer first (I think).

  • 🇳🇿New Zealand quietone

    Updating the title to meet the 'Special issue titles'.

    Adding a date to set a defined review period because this affects the Testing Core Gate. Any points not agreed to in that time frame will be moved to a follow up issue. The review time of three weeks is experimental. In another organization I was in, a policy was open for review and feedback for one month. But this issue is not a full review of the policy, it is about adding a new section specific for Bug Reports. And since this issue already has two rounds of discussion and a clear proposal, two weeks seems sufficient.

  • 🇳🇿New Zealand quietone

    Fix review date in the IS and add item to test the heuristics.

  • 🇺🇸United States smustgrave

    Not sure how I feel about the idea of skipping tests. Definitely can see the argument but could be a dangerous can of worms where people start pushing for more changes without tests.

    If this were something we did I think we should incorporate
    a) A crystal clear list of what qualifies for allowing tests to be skipped.
    b) I think 2 sign-offs from either framework managers or product managers.

  • 🇸🇰Slovakia poker10

    I would say that we are going to offer the possibility of skipping tests for many bug fixes if we use this wording:

    If the answer to the majority of the following questions is 'no', an automated test is required.

    Personally I would tighten this to at least 2/3 of "YES" answers needed to allow skipping tests. Currently the Testing core gate allows skipping tests for some specific trivial fixes, see #39. But I see the added value of this policy change in providing explicit questions, so that all contributors would know whether tests are needed or not.

    Looking at this as a provisional committer, I would say that for me it is way better/safer to have a failing test as proof than going through the steps to reproduce and trying to "decipher" what the reporter meant. Certainly there are bug fixes where the fix is trivial, but the steps to reproduce are not and are tricky.

    So besides the change in ratio I have proposed, I think we should define which questions must have a mandatory "YES" to allow tests to be skipped, for example questions 1, 2 and 3. I cannot imagine how the process will be better if there is a "NO" answer to question no. 1 (so the issue summary will not be clear / complete) but the issue still qualifies for tests to be skipped, so there will be no failing test. Very risky to me.

  • 🇺🇸United States neclimdul Houston, TX

    I definitely have concerns about loosening testing too much. The point of testing is to prove that a bug doesn't exist, so IMO it's pretty important that the majority of bug fixes have a test to prove they have a fix.

    There are definitely cases where we should allow bypassing tests, but they should be pretty exceptional: we don't have a framework for testing it, it requires elaborate conditions or impractical environments, etc.

    Talking in Slack, it came up that a lot of features don't have tests, so simple bug fixes require entire boilerplate just to add tests. I was running into this so many times that I spent a large part of the D8 release cycle just picking subsystems and writing unit tests. But let's be real: if we don't add the test, it's just going to get a follow-up that no one pays attention to and never gets committed.

    Looking directly at the current heuristics, I don't really understand a lot of them. Others seem like things that should _always_ have tests, so I don't understand how they would count toward bypassing testing.

  • 🇬🇧United Kingdom catch

    @neclimdul did you look through case studies 1 and 2? And also the example from #47?

  • 🇷🇺Russia Chi

    I definitely have concerns about loosening testing too much.

    We can't lose something that we don't have. If we block a bug fix with the "Needs tests" tag, the total number of tests in Drupal core will not increase.

  • 🇷🇺Russia Chi

    So far the main argument in favor of updating the policy is the amount of effort sometimes needed to create a test for a simple bug fix. This has blocked a huge number of bug fixes from being committed.

    However, that's not the only reason. Regression tests are hard to design well because they have a limited scope. Imagine a case where you need to write a single test for a component that does not have any tests at all. Even a simple test quite often requires creating a base class, helper modules, fixtures, etc. Such work would definitely be out of scope for a bug report.

    The proposed heuristics look quite complex. I am afraid they will complicate the issue workflow a great deal. What if we just replace the "Needs tests" tag with "Needs test follow-up"? So any bug fix that doesn't have a test should provide a link to a follow-up issue.

    The only exception when "Needs tests" is really needed is the case when a bug is hard to reproduce.

  • 🇺🇸United States smustgrave

    I see what you mean about when tests don't exist at all, so adding a quick assertion isn't possible.

    But if we pushed tests to follow-ups what are the odds they'll get worked on?

  • 🇷🇺Russia Chi

    Follow-ups may have a wider scope. Multiple bug reports can reference a single test issue.

  • 🇸🇰Slovakia poker10

    This has blocked a huge number of bug fixes from being committed.

    But how many new bugs could these fixes cause if they were committed without tests? I have seen many D7 issues where a trivial bugfix patch was ready, but eventually it was found that it was not fixing the issue mentioned in the IS. And only the test-only patch discovered that.

    Not to mention which is the more efficient/safer approach - setting up a test site and trying to simulate the steps to reproduce, vs. checking a failing test where the exact steps to reproduce the bug are present? Yes, ideally you want both, but without a test it is way worse.

    The only exception when "Needs tests" is really needed is the case when a bug is hard to reproduce.

    I disagree and would completely turn it around. Tests should be needed every time, except when the answers to selected questions (not a majority) from the heuristics are YES (because I cannot imagine both a "NO" answer to question no. 1 and skipped tests, for example). Tests are a guarantee that the same bug will not appear in the future. If we decide to skip tests in a decent number of bug fixes, then we can easily end up with more bugs than we have now, because of regressions.

    But if we pushed tests to follow-ups what are the odds they'll get worked on?

    +1. I would say these issues are not so attractive, even if they can have a wider scope.

    I see the point of skipping tests when adding tests would require very big effort and consume a lot of time, but these cases should be rare.

  • 🇷🇺Russia Chi

    But how many new bugs could these fixes cause if they were committed without tests?

    The exact same amount as with tests. Regression tests have a very limited scope. They must not test anything besides the bug they were created for, so they won't help catch any other bugs introduced by the fix.

  • 🇺🇸United States neclimdul Houston, TX

    First, I love this discussion. I've scrapped an entire comment I was thinking about because @smustgrave nailed it. To expand though, the bugs exist because there wasn't a test. The bug almost _is_ not having a test, so skipping the test means we didn't really fix the bug. Also I 100% agree the follow-up will just languish and we'll revert or re-introduce the bug. Chi is also hitting a lot of the other points I'm thinking about, though maybe from a different perspective, so I'll talk more about that.

    I've been thinking about this all day in response to catch's comment, because I had in fact read the case studies and they seem like issues that definitely need tests to me. So the implication that they would change my mind (2nd time since they came up in Slack) made me take a step back. Clearly we're looking at the problem differently and I needed to do some soul searching.

    So... here goes.

    I'd been starting to resolve on a conclusion, and reading Chi's comments I think is reinforcing what I was thinking. I think the problem at the heart of at least the case studies is that writing tests is too hard and slow to iterate on.

    Basically, fixing a string should not require deep understanding of core subsystems and weeks or months fighting with testbot. So if it took 2, 3, or more years to make tests pass because of that, I don't think that's a "we bypass testing" problem but a "we do testing badly" problem.

    From personal experience, writing this sort of test is several hours of digging through subsystems to understand requirements on how I could test the code, then guessing at what might pass on testbot, then submitting it to testbot and waiting an hour or two for it to complete, meaning I'll probably get feedback the next day. And that means when I do get feedback I've probably moved on to other things, and because my job is not working on core all day I probably won't be able to look at it again for a week or two and...

    If that aligns with most people's experience, and it's that hard for us, then imagine what it's like for someone outside the core developer community, and I think we've reached the root of the problem. So if we still think testing is important (🙏), I think that means we should be focusing our efforts on making it all-around easier and faster to write tests, not removing requirements for when tests are needed.

    Things that stand out and have been in the back of my mind for a while are:

    • Easier testing
      • Out of the box ways to quickly create consistent testing environments with ddev, lando, et al. so people can quickly test things in a way that _will_ pass on testbot.
      • Easier ways to test from local and remote IDEs. For example, run-test serves us well but often it leads to different results, drawing out test development.
    • Faster tests. I know the importance of this is controversial, but it's a serious burden to creating tests.
      • Push tests out of browser tests. I think a lot of people default to them because it doesn't require any understanding of core systems to have something that "works". I think we can provide test assertions and mock tools, like we do for logging, for key systems like entities, plugins and the installer, along with documentation we can point people to.
      • Make browser tests faster and easier. I know there's a long-standing issue to provide mock DBs etc.; we should figure that out, because one is slow and many are miserable.
    • Can we export a failed test list in a way we can iterate on locally? I don't know how, but it's so hard to parse the current output and recreate failures; that should be front and center, along with documentation on how to run them.

    There's probably more that could go into this but it's already a monster information dump so I'll stop there and hope for the best.

  • 🇬🇧United Kingdom catch

    @Chi

    We can't lose something that we don't have. If we block a bug fix with "Needs tests" tag the total number of tests in Drupal core will not increase.

    Yes this is my position as well. We are constantly adding test coverage to core, and this issue wouldn't change that, what it would mean is some bugfixes getting committed faster and the same amount of tests getting committed as we do currently.

    Since I added case study #2 to the issue summary, the bug was fixed (without a corresponding test) in #3258969: Wrong argument for @message in ModuleInstaller::install call to watchdog_exception , in five comments and less than one month in a duplicate issue. Has that resulted in less test coverage than leaving the other issue open for five years?

    @neclimdul I do agree with a lot of points in #70, but we already have issues like 📌 [meta] Convert some tests to Build tests Active and 📌 Convert LoadMultipleTest into a kernel test Fixed , which notably people do work on even though those issues aren't prerequisites for people's bug reports. So I think the idea that people don't work on tests unless we refuse to commit their bugfixes is not really the case. In fact I think we'd be much, much better off if our testing effort was spent on refactoring the test coverage we already have, than adding more one-off tests for incorrect placeholders in error messages, or per my example from #47 adding a test that we called trigger_error() with an arbitrary string in code that's going to be deleted in a few months, along with the test we added for it.

    There is so much refactoring we could do - from converting more functional tests to kernel or unit tests, to making some entity-specific tests generic to all entity types (node revision UI tests for example?), to making it possible to write upgrade path tests without having to import database dumps into specific old tags of Drupal and re-export and compress them. This would then make adding new test coverage easier in the long run, but in the meantime we have one-character bugfixes that we can either commit or hold up.

  • 🇬🇧United Kingdom catch

    To expand though, the bugs exist because there wasn't a test. The bug almost _is_ not having a test so skipping the test means we didn't really fix the bug.

    So this is true in a way, and it was the motivation behind the current policy when simpletest was added to core, but I think it's missing an important component:

    We commit three categories of issue, bugs, tasks and features.

    Sometimes tasks and features are expected to come with new test coverage, but often for smaller tasks and even some features, not breaking the existing test coverage is sufficient.

    When large new features or tasks are committed, because we don't have any code coverage reports for core, we can't see what code is untested or under-tested.

    However for bug reports, we assume that more or less every line of the bug fix should have test coverage, and in some cases require adding test coverage for much more than the code that's changed, because it didn't previously exist.

    This means the bar for fixing bugs is consistently higher than it is for at least a subset of tasks and features. There are other things that make tasks and features hard to work on, it's not that they get an easier time overall, but it's by definition easier to introduce a bug without test coverage than to fix it.

    Also, although we revert a lot more issues than we used to, the nature of core development is that if a bug was introduced more than a few weeks ago, the chance of a revert is pretty low - because we stack issues on top of each other which can make straight reverts impossible in a lot of cases.

    Typing this out did remind me that code coverage reports on MRs would be really nice - i.e. when looking at lines of code changed, also see the code coverage report, and ideally a diff of that from the target branch.

  • 🇷🇺Russia Chi

    Removed unrelated text from the issue summary.

  • 🇺🇸United States neclimdul Houston, TX

    Sometimes tasks and features are expected to come with new test coverage, but often for smaller tasks and even some features, not breaking the existing test coverage is sufficient.

    Is there an issue to fix that? It seems like a big problem, kicking this from the original feature developer who understood the code to someone who just wants to fix their problem.

    When large new features or tasks are committed, because we don't have any code coverage reports for core, we can't see what code is untested or under-tested.

    I used to run coverage reports daily but it became a big pain and was constantly breaking because run-test isn't phpunit and it broke on the regular. The bigger problem though was the heavy use of browser tests which will never trigger useful coverage. @see better testing tools outside browser tests.

    This means the bar for fixing bugs is consistently higher than it is for at least a subset of tasks and features.

    Still sounds like we found the problem upstream. The feature should have had more tests. Bugs should still need tests; it's the right thing.

    the nature of core development is that if a bug was introduced more than a few weeks ago, the chance of a revert is pretty low

    You don't have to tell me... but that's a rant for a different time.

  • 🇺🇸United States neclimdul Houston, TX

    Just opened my issue dashboard to find a trivial issue that got kicked back to NW last year because I added an entire unit test with almost full coverage. It's now RTBC because someone removed the test. Maybe I'm just doing this all wrong. :(

  • 🇬🇧United Kingdom catch

    Is there an issue to fix that? It seems like a big problem kicking this off the original feature developer that understood the code to someone that just wants to fix their problem.

    I don't think there's an explicit issue. We are not usually knowingly committing stuff with gaping holes in test coverage; it's just that without code coverage reports, it's hard to review (especially with already large patches) exactly how much of the code is covered. If an issue is adding loads of new code and not many tests you can see it; when existing code is being refactored it's harder.

    Once again the chaotic nature of much of our existing test coverage makes all of this difficult, because it's hard to tell what's already covered vs. not. A lot of these issues date all the way back to the original addition of simpletest where we tried to add as much coverage as possible without the experience of organising tests, and to a ten year old, huge, untested code base. We still have some of those original functional tests in core more or less unchanged.

    I still remember writing bits of TermTest - some hunks in there were committed in 036a026433, which is from 2008 and was itself a refactor of test coverage that had been added previously; it was probably one of the first simpletests I worked on. It predates terms as entities and terms as entity reference fields. ::testTaxonomyNode might possibly have made sense at the time, but now it's doing... what... testing that entity reference fields work? I'll open an issue to delete that method now. We've probably installed Drupal and run that test hundreds of thousands of times, maybe millions.

    This is partly what I want to avoid adding to - we probably have dozens of tests that are not actually adding 'code coverage' to core, they're duplicating coverage in other tests. When we have to add an entirely new test to assert very small bug fixes, it's sometimes a sign that we have no coverage at all, but sometimes it's a symptom of not being able to find the appropriate test to add a couple of assertions to, even though that test exists in core somewhere. If we spent more of our time on refactoring test coverage, and less on adding new functional tests for specific regressions, we might end up in a better state overall because we'd know where to add those assertions to and duplicate less.

    A recent example where I broke something is 📌 Allow AJAX to use GET requests Fixed - the existing test coverage caught multiple bugs in iterations of the patch, and the patch updated some tests for new expectations. It also removed one assertion it shouldn't have (mea culpa), which provided a hint as to the regression. 🐛 Ajax state leaking to Views destination paths Fixed was opened two months later - fortunately it's an easy fix and came with tests (although they're FunctionalJavascript tests, but we don't have an easy other way to test ViewsAjaxController), and it was all discovered and fixed before the 10.1.0 release candidate. If we'd found the regression a day or so after the original patch landed, we'd probably have reverted and recommitted with extra tests added.

    The original issue adding ViewsAjaxController to core was probably... Views in core. Overall we have a lot more Views test coverage in core now (and a lot more core APIs tested via Views) than we did before it was added, but it's also one where we were adding coverage for years-old untested code. So there is not a moment in that issue where we can point to a specific event where code was added without coverage (except perhaps me removing one assertion, or before automated testing existed); it's more of a cumulative thing over time.

    I used to run coverage reports daily but it became a big pain and was constantly breaking because run-test isn't phpunit and it broke on the regular. The bigger problem though was the heavy use of browser tests which will never trigger useful coverage.

    I think unit and kernel test coverage would give us a lot of useful information, and it would be useful to have them as separate reports (or at least somehow filterable). We could potentially never add functional test coverage reports; they seem a bit pointless anyway.

  • 🇺🇸United States danflanagan8 St. Louis, US

    I'm an example of a community member who would be interested in working on follow-up issues to add missing test coverage and/or issues to refactor test coverage in general. We'd just need some kind of tag so they'd be easy to locate.

    And I'm not all talk here. I've posted test-only fail patches to dozens of bugs. I coordinated a large part of the effort to refactor tests that used Classy to use Stark leading up to D10. I've taken time to refactor tests on contrib modules I maintain. I've added test coverage for contrib modules I don't maintain that never had test coverage.

    I give a big +1 to the heuristics proposed in this issue. When it comes to the actual implementation of the heuristics, I think it would be nice to have a tag that could be added to an issue that indicates that the community believes no tests are needed. Kind of like the opposite of the "Needs tests" tag. That would allow interested or concerned parties to review such issues from time to time.

    The term "testing theatre" just occurred to me. We've heard of "security theatre" before and agree that some security measure provide no value. There are some tests that provide no value too. They're all for show. "Testing theatre" is part of what would be addressed by this issue.

  • 🇬🇧United Kingdom catch

    I think it would be nice to have a tag that could be added to an issue that indicates that the community believes no tests are needed.

    Yep I think we're also talking about at least two things here:

    1. Cases where we actually don't want to add a test at all - for me there are not many cases like that, but I do think deprecation tests where the only logic tested is a call to trigger_error() are an example.

    i.e. where the tested code looks more or less like

    function old_function($foo, $bar) {
      // Warn that the old function is deprecated, then forward to the replacement.
      trigger_error('...', E_USER_DEPRECATED);
      new_function($foo, $bar);
    }

    Partly because what it's testing is... not doing much, partly because the test has to be removed along with the deprecated code, so we're 'digging a hole and filling it in again', to quote Keynes. This doesn't include BC layers that have actual logic involved, which I'd generally want tests for even if we're deleting them later, because there is something to go wrong that we have to maintain in the interim with those.
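
    For context, the corresponding deprecation test is usually little more than the following sketch (hypothetical function and message names, using core's Drupal\Tests\Traits\ExpectDeprecationTrait; @group legacy marks it as a deprecation test for the Symfony PHPUnit bridge):

    use Drupal\Tests\Traits\ExpectDeprecationTrait;
    use Drupal\Tests\UnitTestCase;

    /**
     * @group legacy
     */
    class OldFunctionDeprecationTest extends UnitTestCase {

      use ExpectDeprecationTrait;

      public function testOldFunctionTriggersDeprecation(): void {
        // The only behaviour asserted is that trigger_error() ran with this message.
        $this->expectDeprecation('old_function() is deprecated...');
        old_function('foo', 'bar');
      }

    }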

    2. Cases where we would like to have test coverage, but adding test coverage is particularly onerous compared to the actual bugfix and/or the test coverage that we add just to have a regression test for the bug is likely to be of lower quality than if we wrote a test for the functionality in general.

    You can say we should write a test for the functionality in general, but that is expanding the scope of the bug fix way beyond what it should be, where other similar bugfixes might involve adding a single additional assertion to an existing test (or even data provider). You could then say we should open a prerequisite issue to add the test coverage, and postpone the bugfix on that - and in most cases I think that is worse than committing the bugfix and adding the tests in a follow-up.

  • 🇬🇧United Kingdom catch

    Actually there's a #3, which is 'covered by a PHPStan rule' - we are not quite there with PHPStan levels, we're just starting to get that from the levels we're enabling now, but there will be more in the future.

  • 🇳🇿New Zealand quietone

    I have read all the comments and summarized as follows.

    This is the for/against tally on the proposal, which shows support for it. Comment if I have made a mistake.

    for: poker10, danflanagan8, catch, larowlan, webchick, dww, alexpott, effulgentsia, chi
    against: smustgrave, neclimdul, anavarre, dww
    unknown: dawehner

    Next are the cases mentioned where tests are not needed.

    1. Deprecation tests where the only logic tested is a call to trigger_error().
    2. Where the impact of a regression is minimal and the fix is obvious or a test adds maintenance overhead.
    3. Where we would like to have test coverage, but adding test coverage is particularly onerous compared to the actual bugfix and/or the test coverage that we add just to have a regression test for the bug is likely to be of lower quality than if we wrote a test for the functionality in general.
    4. Covered by a PHPStan rule.

    Here are other suggestions.

    1. There was concern that the heuristics were complex
    2. Tighten the number of YES responses to the heuristic questions from a majority to 2/3
    3. Identify the heuristic questions that must be a YES
    4. Make use of a new tag 'Needs test followup'
    5. There were at least 3 people who wanted to spend time improving the testing suite instead of making this change. dww proposed the following:
      • Make writing tests easier.
      • Better document the existing tests.
      • Incentivize test writing.
  • 🇳🇿New Zealand quietone

    I suggest that the next step is to work through the suggestions in #80.

    1. Are the questions themselves complex, or are there too many? How can they be made simpler?
    2. What is your opinion on changing the criterion?
    3. Should we identify the questions that must be a YES? If this was done, then those questions could be changed to set criteria.
    4. Is a new tag needed?
    5. I guess this one needs a followup. I have not looked. Catch mentions some issues in #71 but that does not cover the documentation and incentive point.

    I am removing the date from the title because there was no comment that would block making this change.

  • 🇦🇺Australia larowlan 🇦🇺🏝.au GMT+10

    Adding an anchor for Heuristics heading

  • Status changed to Needs review over 1 year ago
  • 🇳🇿New Zealand quietone

    I have made a followup for the suggestions regarding testing.
    I have updated the Issue Summary with proposed text change to the Core Testing gate.

    Can anyone explain how the proposed tag 'Needs test followup' will be used compared to 'Needs tests'?

  • 🇳🇿New Zealand quietone

    Update IS to clarify next steps.

    This got a bit behind schedule as a result of my holiday. I have updated the proposed resolution with what I think is a summary of the discussion. This needs to be reviewed, possibly tweaked, and approved so the Core Gate documentation can be updated.

  • Status changed to Needs work over 1 year ago
  • 🇺🇸United States mikelutz Michigan, USA

    Some grammar and punctuation fixes:

    The fix 'trivial' with small, easy to understand changes

    The fix is 'trivial' with small, easy to understand changes

    Is explicit 'regression' test needed?

    Is an explicit 'regression' test needed? / Are explicit 'regression' tests needed?

    If this fix is committed with test coverage but then later regresses, is the impact likely to be minimal or at least no worse than leaving the bug unfixed. For example, few sites were applying workarounds or patches that were removed when the fix was committed?

    If this fix is committed with test coverage but then later regresses, is the impact likely to be minimal or at least no worse than leaving the bug unfixed? For example: few sites were applying workarounds or patches that were removed when the fix was committed.

  • Status changed to Needs review over 1 year ago
  • 🇳🇿New Zealand quietone

    @mikelutz, thanks!

    Fixes made for #86. Back to NR

  • 🇳🇿New Zealand quietone

    In Slack, mikelutz said he didn't like this sentence:
    For example, few sites were applying workarounds or patches that were removed when the fix was committed?
    I too found it not very clear or helpful, so we agreed to remove it.

    I am updating the Issue Summary.

  • 🇺🇸United States xjm
  • 🇬🇧United Kingdom catch

    Took the original 'heuristics' out of the issue summary, since I got confused which was the proposed text.

  • 🇸🇰Slovakia poker10

    I think that the selected questions for mandatory *yes* are well chosen, so +1 from me for the current proposal.

  • 🇳🇱Netherlands bbrala Netherlands

    After reading this full issue and IS, I want to give my +1 to this proposal. Combined with a tag which helps us identify if a test is needed.

    Wish I had the time, writing tests is not too bad imo and fun even at times.

  • 🇬🇧United Kingdom catch

    Another issue where a test wasn't helpful or would be nearly impossible to write 🐛 claro.jquery.ui css assets may be added the page multiple times Fixed . Might have been caught by generic CSS regression testing, but otherwise we'd be testing the exact structure of Claro's library definitions.

  • Status changed to RTBC about 1 year ago
  • 🇳🇿New Zealand quietone

    There haven't been further suggestions/tweaks here so I am going to set this to RTBC.

    This still needs feedback from the framework managers, release managers and the testing subsystem maintainers.

  • 🇮🇹Italy mondrake 🇮🇹

    @quietone asked me to comment here.

    In general, +1.

    TBH I think this could be debated forever, so the sheer fact that some consensus was built in five years deserves recognition. If we were quicker in this sort of thing, my approach would be 'let's go and see how it works, and fix the rough edges along the way'. But we're generally not quick, so I think more reviews are needed to agree this is robust enough for the next few years to come.

    Couple of points.

    1) PHPStan - I suggest issues that are fixing the PHPStan baseline (are they bugs or tasks?) are exempt from adding tests if the existing tests do not fail in the process. We have around 1k baselined errors on levels 0 and 1 alone, and we have been on PHPStan for 2 years already IIRC. Fixing them is sometimes a really painful process. Level 2 will add 2k more. IMHO we need a sort of fast track here or we will remain on the lower levels for a long time. Note: the lower the PHPStan level, the higher the risk of type-related bugs in code - so while we stay on L1, more and more bug issues will be opened.

    2) Follow-ups - IMHO we should also add to the commit gate that a follow-up issue EXISTS with the description of the additional test required, when heuristics tell us that one is needed.

  • 🇬🇧United Kingdom catch

    Agree with #1 and #2 of @mondrake's points.

    PHPStan is like free test coverage for some types of issue, so if we can fix a phpstan baseline error without breaking existing tests, then phpstan will at least protect against us reverting the specific fix we made, if not necessarily regressing the functional bug that resulted from it.

    The follow-up needing to exist and have a legible issue summary also makes sense.

  • 🇬🇧United Kingdom catch

    Since @mondrake has +1d, removing 'needs subsystem maintainer review' tag.

  • 🇦🇺Australia acbramley

    I'm wondering if we can also document when an update path test is needed?

    Take 🐛 Reduce the number of field blocks created for entities (possibly to zero) Fixed for example - a critical bug that greatly affects performance of large Drupal installs.

    The update hook is trivial - it sets a boolean value in a new config.

    Writing an update path test for that is not trivial - even for someone extremely familiar with writing tests, and someone that has written upgrade path tests in the past (like me).

    This one in particular is harder because layout_builder needs to be installed on top of the generic db dump file before we run updates. I spent ~60 minutes this morning getting this test working, and it would've been MUCH longer if I hadn't had multiple resources to help me (one being the existing MenuLinksetSettingsUpdateTest to copy from, and the layout-builder.php stub file that I stumbled upon to do the module install for me).

    We then run that update hook on every single MR going forward, all to test that a one-liner runs correctly? Something that any number of contributors can look at and visually confirm is doing the right thing.
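
    For reference, the resulting test is roughly the following shape. This is a sketch only; the fixture paths and the expose_all_field_blocks config key are based on the issue referenced above and may not match it exactly:

    namespace Drupal\Tests\layout_builder\Functional\Update;

    use Drupal\FunctionalTests\Update\UpdatePathTestBase;

    /**
     * Sketch of an update path test for the new boolean setting.
     */
    class ExposeAllFieldBlocksUpdateTest extends UpdatePathTestBase {

      protected function setDatabaseDumpFiles() {
        $this->databaseDumpFiles = [
          // Generic bare dump, plus a stub that installs layout_builder on top of it.
          DRUPAL_ROOT . '/core/modules/system/tests/fixtures/update/drupal-9.4.0.bare.standard.php.gz',
          __DIR__ . '/../../../fixtures/update/layout-builder.php',
        ];
      }

      public function testUpdate(): void {
        $this->assertNull($this->config('layout_builder.settings')->get('expose_all_field_blocks'));
        $this->runUpdates();
        // The trivial update hook just sets the new boolean to TRUE for existing sites.
        $this->assertTrue($this->config('layout_builder.settings')->get('expose_all_field_blocks'));
      }

    }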

  • 🇦🇺Australia larowlan 🇦🇺🏝.au GMT+10

    I realise in comment #1 I said solid -1 but that was before the criteria were established.

    So +1 from me from a framework manager perspective.

    Leaving the tag so @alexpott and @effulgentsia can also opine

  • 🇳🇿New Zealand danielveza Brisbane, AU

    I have a couple of issues that I think are worth raising in regards to this:

    Prevent auto-adding new fields to LB layouts Needs review is in the exact same situation that @acbramley raised in #100. The new functionality is tested, the only part that is missing a test is the update hook, which is simply setting a new boolean config value to TRUE.

    🐛 Contextual links can be used to drag and drop Layout Blocks. Needs review is a one-line fix on a 5+ year old issue. Testing the dragging of Layout Builder blocks appears to be an existing issue, because the existing tests around this simulate the element moving in the DOM rather than performing an actual drag.

    Testing this particular issue requires us to be able to click and hold on an element. We don't have this in core and I can't think of a way to do this in a test without writing code that I feel like may be prone to random failures.

    Considering this is a very old issue with a one-liner fix that can be easily tested manually, I think it may be a good candidate for this.

  • 🇺🇸United States dww

    +1 to #100 and #102.

    Here's another to consider: what tests could / should we add for 🐛 Active toolbar tray has weak affordance and fails WCAG color criteria Needs work ? That issue has already been so painful, I don't think I could handle another 2 years for the tests. 😬 I'm not really sure what we'd be testing. It's a major accessibility bug with an entirely CSS fix. Are we supposed to have automated coverage for the accessibility of our CSS (beyond the lint job, which is already passing)?

  • 🇺🇸United States dww

    P.s. we’ve already got the “Needs tests” tag. Do we want to introduce something like “Needs no tests” to indicate we’ve considered it and it’s not worth it, undesirable, can be punted to follow-up, etc?

  • 🇬🇧United Kingdom longwave UK

    I have been watching this issue for a long time and never commented because I am very on the fence about it. On the one hand I see that we would move faster and fix more bugs if we committed some fixes without tests, but on the other I am worried about introducing regressions later on. But after sitting on it for a while I think the proposal here is a good compromise to both those arguments.

    I also agree with #100/102 that update tests are one of the most painful things to write, and concur that if the update hook is trivial then we should not require explicit testing; we do have test coverage that ensures that all updates are at least successful, and we have test coverage for e.g. the config/state APIs, so there seems little value in ensuring that an update hook can correctly make a minor change via those APIs. I further agree with #103 that CSS fixes and some accessibility fixes really can't be tested.

    Overall, the suggestion in the IS that the majority of the questions must be answered yes is a good failsafe. However:

    1. Is the fix is easy to verify by manual testing?
    2. Is the fix in self-contained/@internal code where we expect minimal interaction with contrib? Examples are plugins, controllers etc.
    3. Is the fix achieved without adding new, untested, code paths?
    4. Is an explicit 'regression' test needed?
    5. Is it easy for someone who did not work on the original bug report to add the test coverage in a followup issue?
    6. Does the issue expose a general lack of test coverage for the specific subsystem? If so, is it better to add generic test coverage for that subsystem in a separate issue?
    7. If this fix is committed with test coverage but then later regresses, is the impact likely to be minimal or at least no worse than leaving the bug unfixed?

    Question 4 stands out to me here. It's either the wrong way round - Is no explicit regression test needed? Yes means we can proceed without a test - or perhaps even it can be removed as a duplicate of question 7?

    The first part of question 6 also suffers from a similar problem, and I think "with" should be "without" in question 7.

    Suggested rewording where, to me, primarily "yes" answers point towards not requiring test coverage:

    1. Is the fix easy to verify by manual testing?
    2. Is the fix in self-contained/@internal code where we expect minimal interaction with contrib? Examples are plugins, controllers etc.
    3. Is the fix achieved without adding new, untested, code paths?
    4. If this fix is committed without test coverage but then later regresses, is the impact likely to be minimal or at least no worse than leaving the bug unfixed?
    5. Is it easy for someone who did not work on the original bug report to add the test coverage in a followup issue?
    6. If the issue exposes a general lack of test coverage for the specific subsystem, is it better to add generic test coverage for that subsystem in a separate issue?

    I moved the final regression question higher as to me that is more important than the latter two questions.

    I also agree with #104 in that we could use "Needs no tests" when a decision has been made after answering the questions, or "Needs tests" if we decide the opposite, and also in conjunction with "Needs followup" if we decide to push test coverage to a separate issue (which should be opened and the tag removed before the bug fix is committed).

  • 🇬🇧United Kingdom catch

    What I was thinking of when I wrote 'is a specific regression test needed' is that we have in some cases written one-off tests that are very specifically written for 'is this bug that we used to have not there any more', and not 'does this system work as it's intended to', and that over time makes the overall test suite harder to maintain and reduces our test vs. coverage ratio (i.e. we've had several cases of multiple tests testing more or less exactly the same thing, added years apart in different issues). But I think the rewording in #105 is plenty here and covers that situation well enough.

  • 🇺🇸United States smustgrave

    Also want to add an issue type to the discussion. What about PHP warnings? Usually I push back on those tickets because someone just puts an isset() or empty() to mask an issue. But if that's the fix, does it need a test?

    Also agree with most of the suggestions here, but would request some post be made in Slack #contribute, #bugsmash, and #needs-review-queue-initiative when a final decision is made.

  • 🇬🇧United Kingdom catch

    Per #106 here's an example of what would be a pure regression test if added: 🐛 Content Moderation should only validate its own workflow type Fixed - we'd only be asserting that something that used to happen doesn't happen any more, probably with quite a lot of test code involved to do that, but the actual code flow is already tested so it wouldn't add actual additional code coverage.

  • 🇳🇿New Zealand quietone

    I have updated the Issue Summary with the suggestions in #105. Thank you, @longwave, for rewriting the questions.

    There is support in the later comments for a 'Needs no tests' tag that was suggested in #104.

    And I think a follow-up is needed to document when an update path test is needed. See #93, #100.

    I also agree with this change so I will remove the 'needs release manager' tag.

  • Status changed to Needs work 9 months ago
  • 🇮🇹Italy mondrake 🇮🇹

    From the IS

    This can be shown by uploading a patch with only the test changes, which Drupal.org's automated testing system should show as a fail, alongside a patch containing both the tests and the bug fix itself.

    This was the flow before the introduction of MRs, which is now going away. It needs to be adjusted to current reality.

  • Status changed to RTBC 9 months ago
  • 🇬🇧United Kingdom catch

    That text was pre-existing and not changed by the proposal here, but yeah we might as well update it while we're doing this. Changed to:

    This can be shown by running the 'tests only' job in the gitlab merge request pipeline interface.

  • 🇺🇸United States nicxvan

    I updated the final bullet point with -> without, I think that was the intention.

  • 🇺🇸United States nicxvan

    I have been thinking about this all day as well. Having a patch sit stale for a long time (particularly waiting for tests) can be disheartening, especially for the user who created it.

    I think the practical reality for most projects is that, as long as people know about patching and follow best practices with build processes, having a patch that fixes the bug can be nearly as good as having it committed. (There are obvious exceptions to this, but I'm specifically talking about the simple fixes to logic or typos that are mentioned in this issue.)

    I think in general having a patch only get committed once a test is there to ensure no future regression is essential to stability.

    I think having regressions is more damaging than having to re-roll patches every so often.
    I do think that finding a way to increase velocity is also essential for these types of fixes.

  • 🇦🇺Australia pameeela

    Updated IS to note that the 'Case study 2' issue was eventually closed as a duplicate because the issue was fixed in another ticket, committed without tests :|

    Certainly drives home the point a bit.

  • 🇳🇿New Zealand quietone

    @nicxvan, thanks for the feedback. Yes, there is risk with this change but there are benefits. We can trial this change for about a year or so and adjust as needed.

    There is support for this and I have begun to work on the remaining tasks. I updated the core gates and then noticed this is tagged for framework manager review. Oops. I will ping the framework managers.

  • 🇬🇧United Kingdom alexpott 🇪🇺🌍

    I think the case studies outline why this is a good idea. And I think it is worth being as pragmatic as possible when committing changes. This is a framework manager +1 for the documentation updates. We will definitely make some mistakes due to this - but we make mistakes already and I don't think this will make it worse and has a good chance of making some simple things better.

    One interesting case is 🐛 Page title should contextualize the local navigation Needs work ... that issue got caught out by 🐛 PrepareModulesEntityUninstallForm::formTitle does not exist Needs work . I think that with this policy we might have committed a 1 line fix for 🐛 PrepareModulesEntityUninstallForm::formTitle does not exist Needs work , which means that 🐛 Page title should contextualize the local navigation Needs work would not have caused a break and then might not have ended up so robust. I'm not sure what this tale tells us, BUT the fact that we've had a callback point to non-existent code for years and fixing it has proved hard should tell us something (maybe that Drupal's titling system is very complex).

  • 🇳🇿New Zealand quietone

    Making tag

  • 🇳🇿New Zealand quietone

    Added this description for the tag:

    Indicates that tests are not needed according to the 'Upload a test case that fails' section of the Testing core gate. Intended to be used sparingly.

  • Status changed to Fixed 7 months ago
  • 🇳🇿New Zealand quietone

    The next steps here were discussed in committer Slack a few days ago. One committer expressed a desire to give this a closer review, but that was well over a month ago. So, to close this, we agreed to open a new issue to review this change in 6 months. That issue is 📌 [policy review 2024-10] Testing core gate, heuristics section Active .

    Therefore I am changing this to fixed.

  • Automatically closed - issue fixed for 2 weeks with no activity.
