- Issue created by @acbramley
- 🇳🇿 New Zealand quietone
On commit of the issue that originated this one, Replace REQUEST_TIME in services (Fixed), catch made the following comment, which supports the existing policy:
I'm going to go ahead and commit this despite #66 because I've already reviewed it in its current state once and it's looking good now. However the re-roll hell this has been in since December validates that it probably would have been easier in smaller chunks - e.g. splitting cache changes out to their own issue, or core/modules changes or both. It's always hard to balance between too many changes in one issue vs. too many issues for one change, and can be hard to tell when you start working on an issue how big the eventual MR is going to end up, so very tricky to figure out.
The comment being referred to, #66, is the one where it was asked that the issue be split into smaller issues.
- 🇳🇿 New Zealand quietone
Being forced to split a large MR up just because it touches more than X number of files/lines/KB is just busy work to me, especially with these types of issues where the changes are mostly the same thing across every file.
On the other hand, the policy is based on research showing that the quality of a code review declines as the change size increases. I do think Drupal should follow that, unless there is newer research. And I know from my own experience as a reviewer that the research is correct.
Or is it more that this policy isn't a hard rule and we can decide on an issue-by-issue basis as a community whether something is too large?
There have been exceptions to the policy about the size of the change, such as when it is mostly boilerplate or when a large change was automated. I think one of the recent OO hook issues was one, but I could be wrong. Coding standards issues would be another case where large patches were committed. So, yes, there is some flexibility.
The language of the policy is already flexible in that it takes into consideration the type of change and only suggests a 'good target size'. I do not see a need for a change.
- 🇨🇦 Canada Charlie ChX Negyesi
Yes, hook-to-OOP went in a single go. However, it contained no functional changes, and while the MR itself was over 25,000 lines, it was just shuffling code around automatically, plus a very small amount manually, and even that small amount was split into six patches. Zero functional changes. (The problems being found are coming from the new module system, not the conversion. People have abused ModuleHandler in truly creative ways I couldn't possibly have expected.)
- 🇺🇸 United States smustgrave
I think it would be good to consider upping the limit. I really get both sides.
Move helpers in contact_storage_test.module and delete it (Active) is a piece of a meta converting all test modules to hooks. The fact that it was broken out into this smaller issue made it super quick to review. If the developer had tried to group, say, 6 modules' worth of test conversions, that could have taken a little longer. This may be a bad example, but grouping could hold up good-to-go pieces from getting in.
So I'd be +1 for upping it slightly but also recognizing and suggesting that breaking things up is always the fastest approach.
- 🇺🇸 United States nicxvan
I think it's funny that issues I created were used both for and against increasing the patch size.
I think it should be increased a bit from where it is, and while it can get hard to review after a certain size, tools like hiding sections do make it easier.
Here is a third example.
I am working through an issue that I was asked to break up now too: Mark core modules as converted to OOP hooks (Active), split out into [pp-1] Mark several more modules as converted (Active).
I think the first break-up makes sense, because the first issue was an attempt to mitigate the memory leak and was really doing two separate things within that approach. The second issue is now being asked to be broken up into 3 new issues.
With the tools in GitLab it feels possible to review that as one issue, especially since it was RTBC already in the parent issue.
I'll likely break it up when I get some time, just because it was requested, but I think that issue should be considered acceptable. As for the meta smustgrave mentioned, I did combine a couple, but I quickly realized the remaining ones had too much to do in each, so I will end up splitting them up as I work on them as well.
- 🇳🇿 New Zealand quietone
For those that want this to change, can you add suggested wording to the issue summary so it can be evaluated? Showing before and after text would help.
- 🇺🇸 United States smustgrave
Do we just want to increase the number and leave the language?
- 🇺🇸 United States smustgrave
A good target size for a merge request is a diff of 10-40 KB, or about 400 changed lines of code. The ideal merge request size varies with the type of fix.
Good conceptual scope is more important than size alone. Large merge requests with many instances of the same simple change can still be easy to create and review, while small merge requests with complex code or documentation can be difficult to create and review.
These limits are based on 11 Best Practices for Peer Code Review. That analysis shows that reviewers are roughly twice as likely to miss problems when reviewing more than about 400 lines of code, and the miss rate keeps increasing as the change grows beyond that.
Thoughts?
- 🇦🇺 Australia acbramley
I think 400 lines seems reasonable; the main change being that we're no longer judging by patch "size", since that metric is not readily available for an MR (AFAIK).
I wonder if we could find some examples of >400 line MRs that are considered too hard to review.
- 🇫🇷 France andypost
Sometimes it's easier to chunk commits into ~400 LOC each while the overall change is 10K lines.
- 🇺🇸 United States smustgrave
Let's see what committers think about whether this change is good to make.
Agreed, sometimes the number can be in the 1000s while only a snippet of code really changed.
- 🇳🇿 New Zealand quietone
Ah, there is no proposed resolution section here; that should be completed. And if this is to change the wording in a policy, then that needs to be included as well. Thanks.
- 🇬🇧 United Kingdom catch
Yeah, diff size is possible to figure out by downloading the raw diff and looking at the file size, but no one is doing that, so this definitely needs changing.
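For reference, a minimal command-line sketch of that check (the MR number here is made up; GitLab serves a plain-text diff when `.diff` is appended to a merge request URL):

```shell
# Sketch only: fetch an MR's raw diff and measure it (the MR URL is hypothetical).
curl -sL "https://git.drupalcode.org/project/drupal/-/merge_requests/1234.diff" -o mr.diff

# Diff size in bytes, for the old 10-40K guideline:
wc -c < mr.diff

# Changed lines of code, for the proposed ~400-line guideline:
# strip the '+++'/'---' file headers, then count added/removed lines.
grep -Ev '^(\+\+\+|---)' mr.diff | grep -Ec '^[+-]'
```

So the information is obtainable, but it takes a deliberate trip to the command line rather than being visible in the MR itself.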
I wonder if we could find some examples of >400 line MRs that are considered too hard to review.
I don't think this is only about 'too hard to review'; it's also about trying to optimise RTBC throughput, which is in itself a very fine balance.
Some issues are reviewable in 15 minutes, and if I am lucky, when I have a 15-minute gap, I will open an issue that 'looks like' a 15-minute issue and either commit it or move it to needs work with comments. If the issue could have been a 15-minute issue but for some reason isn't, then we get into webchick's now nearly 20-year-old 'please stop eating baby kittens' post https://webchick.net/please-stop-eating-baby-kittens - but I think most regular core contributors have internalised the issue scoping guidelines to the point where random out-of-context fixes almost never happen.
When we get into issues that are more like 1-4 hours to review, it's a lot trickier.
If we take a hypothetical issue with a diffstat of 1600 lines, it could potentially be split into four different issues of 400 lines each; say a 400-line diffstat = 1 hour of review time.
Using MRs definitely means that someone could review those 1600 lines in four one-hour chunks, but I don't think reviews really work like this. Any reasonably complex issue goes through multiple rounds of review from multiple different people. In the meantime, other issues get reviewed and committed, which means a rebase needs to happen; sometimes this results in logic changes or a bit of reorganisation, which means another re-review.
If it's possible to take a 1600-line MR, extract 400 lines from it, quickly review those 400 lines and get them committed, then for me that is great - because once those lines are committed there are only 1200 lines of code left to review. If I review 400 lines of a 1600-line MR and think they're good, there is no guarantee that another issue somewhere else won't cause changes that mean they need to be reviewed again before the entire thing can be committed. So there are still 1600 lines that potentially need to be reviewed and committed before it's actually in. If I review 400 lines and think they're great, then review the next 800 lines and find loads of issues, that's the point where I would ask for an issue to be split, if I think parts of it can be committed without the others; it saves holding the entire issue up that way.
But on the other hand, sometimes issues are split into two for what seems like arbitrary reasons, and it's hard to review the first half without also reviewing the second half - but... this is OK if there are actually two separate MRs that are written up, because I can refer to both at the same time, concentrating on one but using the second for reference. Or worst case, it's easy to recombine the issues for a single commit if that turns out to be easier.
And then we have automated or semi-automated MRs that do hundreds of the same kind of change at once, and those are great because it saves dozens of individual review/RTBC/commit cycles for what is essentially the same thing.
No idea what all this means for the policy though.