Authored-by should use email address from commits

Created on 9 August 2025, about 1 month ago

Problem/Motivation

Review https://new.drupal.org/contribution-record/11413644 and the associated MR https://git.drupalcode.org/project/drupal/-/merge_requests/12403

Observe that the contribution record suggests:

📌 Remove FileSystemInterface::basename() and use PHP native basename() (Active)

feat: Remove FileSystemInterface::basename() and use PHP native basename()

Authored-by: 54534-cmlara@users.noreply.drupalcode.org
Authored-by: 22609-kimpepper@users.noreply.drupalcode.org

Observe that the author address is:
From: Conrad Lara <cmlara@cmlara.com>

The commit author address is a conscious choice by developers that should be respected by default.

Steps to reproduce

Create an issue with an MR. Submit commits to the MR with an Author email address that is not the no-reply address. Credit the user who wrote the commits, then observe that the suggested commit message uses the no-reply address.

Proposed resolution

Use the email address as included in the commit.

Remaining tasks

User interface changes

Commit messages will now suggest the address as submitted by the commit author.

API changes

None expected.

Data model changes

None expected.

🐛 Bug report
Status

Active

Version

1.0

Component

User interface

Created by

🇺🇸United States cmlara

Comments & Activities

  • Issue created by @cmlara
  • 🇺🇸United States dww

    +1. For folks who push commits to the MR, yes.

    I also opened an issue so that even if I’m credited with reviewing something, I still get identified by my preferred email.

    Thanks for opening this,
    -Derek

  • 🇧🇪Belgium BramDriesen Belgium 🇧🇪

    +1 But this is key!

    The commit author address is a conscious choice by developers that should be respected by default.

    As described on your git instructions page (https://www.drupal.org/user/UID/git), one has the choice to use an anonymized address.

  • First commit to issue fork.
  • 🇪🇸Spain fjgarlin

    A few issues/questions:
    - Note that MRs could potentially have commits from the same person coming from different emails (e.g. this MR). Whatever we do in this case (take the first, take the last, put both) will be correct in some cases and wrong in others.
    - If a user decides to remove their account and their personal email was used on a commit (by that user's choice or through an unintentional local git setup), we won't have a way to remove that PII from the commit history.
    - If an issue had multiple MRs (e.g. this one: "New contribution records system"), where some were merged, some were closed, and some might be ignored, what's the source of truth? This would only add to the previously raised points, but it's another source of potential conflicts/issues.

    If we were to do it, I don't think we need a setting/field on d.o or even in gitlab, as we have the information on the patch files for the MRs.
    But the above questions are relevant, especially as we haven't heavily used conventional commits yet.

  • 🇪🇸Spain fjgarlin

    Related issue.

  • 🇺🇸United States cmlara

    Note that MRs could potentially have commits from the same person coming from different emails

    I had originally thought "take last" (allowing users to push an update to correct bad data); however, "take all" is likely the better choice.

    Committers can clean up the data at commit time if need be. This also reflects the fact that authors may be working for different organizations on the same issue (copyright in work for hire is owned by the hiring company), which aligns with the D.O. credit policy of crediting everyone involved.

    we won't have a way to remove that PII from the commit history.

    This is covered by https://www.drupal.org/docs/develop/git/setting-up-git-for-drupal/drupal...

    The no-reply addresses themselves (since they identify a specific user by username and user ID) are also likely PII. There appears to be no real new concern here.
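
To illustrate why a no-reply address still identifies a specific user, here is a minimal Python sketch that splits one into its parts. The pattern is an assumption inferred from the two addresses quoted in the issue summary (user ID, hyphen, username, then @users.noreply.drupalcode.org), and parse_noreply is a hypothetical helper, not an existing Drupal.org function.

```python
import re
from typing import NamedTuple, Optional

# Assumed format, inferred from the addresses quoted in this issue:
# "<user id>-<username>@users.noreply.drupalcode.org".
NOREPLY_RE = re.compile(
    r"^(\d+)-([^@]+)@users\.noreply\.drupalcode\.org$", re.IGNORECASE
)


class NoReplyIdentity(NamedTuple):
    user_id: int
    username: str


def parse_noreply(email: str) -> Optional[NoReplyIdentity]:
    """Return the user ID and username embedded in a no-reply address.

    Returns None for any address that does not match the assumed format.
    """
    match = NOREPLY_RE.match(email.strip())
    if match is None:
        return None
    return NoReplyIdentity(int(match.group(1)), match.group(2))


print(parse_noreply("54534-cmlara@users.noreply.drupalcode.org"))
print(parse_noreply("cmlara@cmlara.com"))
```

Under that assumption, the first call recovers both identifiers from the address alone, while the second returns nothing because the address carries no embedded account data.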

  • 🇺🇸United States drumm NY, US

    I think I generally like this idea. This has a good chance of being doable without additional API calls or increased page load time, since we’re already loading commits to help maintainers make crediting decisions. If this does take more API calls, then we should look at whether we need to consider other options.

    And this does let the contributor choose what they want per-issue, as long as they’re making a code contribution.

    Clarifying the proposed resolution to

    Use email address as included in the most recent commit, from any MR if there are multiple

    The most recent commit has the best chance of being what the contributor wants, and can be amended with an easier force push. Most issues won’t have multiple MRs. For ones that do, I don’t think it practically matters too much which one wins; readable, efficient JS can be the priority over extra logic around prioritizing/choosing multiple MRs.
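
The "most recent commit wins" rule can be sketched as follows. This is illustrative Python rather than the JS that generates the commit message. The Commit record and latest_email_per_author are hypothetical names, and authors are keyed by commit author name on the assumption that no better identifier is available in the commit data.

```python
from datetime import datetime
from typing import Dict, Iterable, NamedTuple, Tuple


class Commit(NamedTuple):
    author_name: str
    author_email: str
    authored_date: str  # ISO 8601 with UTC offset, e.g. "2025-08-09T10:00:00+00:00"


def latest_email_per_author(commits: Iterable[Commit]) -> Dict[str, str]:
    """Pick each author's email from their most recent commit.

    Commits may come from any number of MRs; only the authored date decides.
    """
    latest: Dict[str, Tuple[datetime, str]] = {}
    for commit in commits:
        when = datetime.fromisoformat(commit.authored_date)
        current = latest.get(commit.author_name)
        if current is None or when >= current[0]:
            latest[commit.author_name] = (when, commit.author_email)
    return {name: email for name, (_, email) in latest.items()}


# Hypothetical commits gathered from two MRs on the same issue.
commits = [
    Commit("Alice", "alice@old.example.com", "2025-08-01T09:00:00+00:00"),
    Commit("Bob", "bob@example.com", "2025-08-02T09:00:00+00:00"),
    Commit("Alice", "alice@new.example.com", "2025-08-05T09:00:00+00:00"),
]
print(latest_email_per_author(commits))
```

Because only the date is compared, commits from several MRs can be concatenated in any order before calling the function, which matches the point that it rarely matters which MR wins.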

  • 🇪🇸Spain fjgarlin

    Great. I can work with that.

    1. We will load by default the anonymous GitLab address (as some of the contributors listed might not have participated in the code)
    2. We will make a map of user emails from the patch of the MRs: https://git.drupalcode.org/project/drupalorg/-/merge_requests/378/diffs....
    3. We will replace existing emails with the newly found ones.
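
The three steps above might look roughly like the following Python sketch (the real implementation would be JS). It assumes the .patch URL of an MR returns git format-patch style output, where each commit starts with a "From <sha>" separator line followed by a "From: Name <email>" header, and that commits are listed oldest first. The function names and the choice to match contributors by display name are hypothetical.

```python
import re
from email.utils import parseaddr
from typing import Dict, Iterable, List, Tuple

# Separator line that opens each commit in format-patch output.
COMMIT_START = re.compile(r"^From [0-9a-f]{40,64} ")


def authors_from_patch(patch_text: str) -> List[Tuple[str, str]]:
    """Return (name, email) for every commit in the patch, in patch order."""
    authors: List[Tuple[str, str]] = []
    expect_author = False
    for line in patch_text.splitlines():
        if COMMIT_START.match(line):
            expect_author = True
        elif expect_author and line.startswith("From: "):
            authors.append(parseaddr(line[len("From: "):]))
            expect_author = False  # ignore later "From: " text in the message
    return authors


def apply_commit_emails(
    defaults: Dict[str, str], authors: Iterable[Tuple[str, str]]
) -> Dict[str, str]:
    """Start from the no-reply defaults, then overlay emails found in commits."""
    result = dict(defaults)
    for name, email in authors:
        if name in result and email:
            result[name] = email  # later commits overwrite earlier ones
    return result


SAMPLE_PATCH = """\
From 1111111111111111111111111111111111111111 Mon Sep 17 00:00:00 2001
From: Conrad Lara <cmlara@cmlara.com>
Subject: [PATCH] Example change

---
 file.txt | 1 +
"""

# Step 1: defaults (the second entry is a made-up reviewer with no commits).
defaults = {
    "Conrad Lara": "54534-cmlara@users.noreply.drupalcode.org",
    "Example Reviewer": "1-example@users.noreply.drupalcode.org",
}
# Steps 2 and 3: build the map from the patch and replace matching defaults.
print(apply_commit_emails(defaults, authors_from_patch(SAMPLE_PATCH)))
```

Contributors who never pushed a commit keep the no-reply default, which is why step 1 loads it first. Matching on display name is the weakest part of this sketch, since commits carry only a name and an email.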

  • 🇪🇸Spain fjgarlin

    Investigation/progress so far:
    - https://git.drupalcode.org/project/drupalorg/-/merge_requests/147.patch returns name and email, not username nor user ID.
    - https://docs.gitlab.com/api/merge_requests/#get-single-merge-request-com... returns name and email, not username nor user ID
    - https://docs.gitlab.com/api/merge_requests/#get-single-merge-request-par... returns user id, username, and name, but no email

    The only way to make sure to get the correct user from an email is to do a user search in gitlab (https://docs.gitlab.com/api/users/#list-users), and this will only work for the public_email, not secondary emails, and it will require a call per user (if using REST, which is what we are using so far).

    The glue is "username", which is not present in the list of commits from the above endpoints/URLs.

    Also, you can technically set any email via git config --global user.email "EMAIL". That's the one that will be linked to the commit, regardless of whether you have that email in your emails setting: https://git.drupalcode.org/-/profile/emails

  • 🇺🇸United States dww

    If getting everything via API is a pain, can we iterate through all the commits, and for every one, whatever is in Author: we add it to a list and add all unique values as Authored-by: footers in the default commit message? Not as ideal, but seems like it’d cover the 80% case really well with no data beyond the commits in the issue forks.
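
A minimal Python sketch of that fallback, assuming the commit authors have already been collected as (name, email) pairs. The helper name and the exact "Authored-by: Name <email>" footer layout are assumptions for illustration, and duplicates are detected by email address so that one person using two addresses gets two footers.

```python
from typing import Iterable, List, Tuple


def authored_by_footers(authors: Iterable[Tuple[str, str]]) -> List[str]:
    """Build one Authored-by footer per unique commit author email.

    Order of first appearance is preserved; no API lookups are involved.
    """
    seen = set()
    footers: List[str] = []
    for name, email in authors:
        key = email.strip().lower()
        if not key or key in seen:
            continue
        seen.add(key)
        footers.append(f"Authored-by: {name} <{email.strip()}>")
    return footers


# Hypothetical authors taken from every commit in the issue forks.
authors = [
    ("Alice", "alice@example.com"),
    ("Bob", "bob@example.com"),
    ("Alice", "alice@example.com"),
    ("Alice", "alice@work.example.com"),
]
print("\n".join(authored_by_footers(authors)))
```

This is effectively the "take all" behavior: nothing is resolved to a Drupal.org account, so a committer would still need to prune or correct the list before committing.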

  • 🇺🇸United States cmlara

    Also, you can technically set any email via git config --global user.email "EMAIL". That's the one that will be linked to the commit, regardless of whether you have that email in your emails setting: https://git.drupalcode.org/-/profile/emails

    That starts to touch on the often-avoided issue of properly acknowledging copyright holders. This may be better as a separate issue; however, it is a point of compliance we eventually need to address.

    We cannot assume that every author is a D.O. user when formatting the Authored-by lines. These should generally include everyone who has a legal claim in the code being committed, which can be oversimplified as anyone who authored a change requiring original thought.

    This was a bit less of an issue when the commit template did not assert that the included names were authors. It is a burden that was already required, and it became more deeply accepted when 🌱 "[policy] Decide on format of commit message" was adopted, especially since the original discussions assumed GitLab would populate that data for us as part of its template system, before the need to adhere to Drupalisms pulled this back into the contribution UI.

    This starts to go back to my point in #7 regarding authors who work for multiple organizations on the same issue, and the fact that the commit email could be indicative of corporate ownership (work for hire copyright law).

  • 🇺🇸United States cmlara

    and this will only work for the public_email, not secondary emails

    Has this been validated? https://gitlab.com/gitlab-org/gitlab/-/issues/26110 implies secondary and private emails are searchable if done using an account with sufficient privileges.

    You can also use ?search= to search for users by name, username, or email. For example, /users?search=John. When you search for a:
    https://docs.gitlab.com/api/users/#as-an-administrator

  • 🇪🇸Spain fjgarlin

    It seems to also accept secondary addresses.

    If we look for @catch's 3 email addresses listed in https://git.drupalcode.org/project/drupal/-/graphs/11.x?ref_type=heads
    2 of them work, 1 doesn't. The one not matched seems to be automatically generated by gitlab and not set in a profile (I checked the same scenario with my user).

    Going back to #8. We'd need to do an API call for every commit listed on every MR for an issue. Then get a map of email addresses per user and select the most recent.

  • 🇺🇸United States cmlara

    Going back to #8. We'd need to do an API call for every commit listed on every MR for an issue

    The Commits API can return multiple commits per call.

    If this does take more API calls, then we should look if we need to consider other options.

    This can also be hooked into the webhook system so that, the majority of the time, this is done when a commit is made or a comment is posted, leaving only a rare need to hit the “sync” button (which does not happen on every page load).

  • 🇪🇸Spain fjgarlin

    The Commits API can return multiple commits per call.

    And for each of those commits, we'd need to do an API call to search for the email (we can keep a map so we don't do duplicate calls). It's an API call for each individual email.

    The "sync" button won't make any difference with the above calls as all of this (the message generation) is JS driven.
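
The "keep a map so we don't do duplicate calls" idea can be sketched like this in Python. The lookup argument stands in for whatever performs the per-email user search. It is a hypothetical callable injected by the caller, not a real GitLab client function, and the cache is a plain dictionary that could equally live in the browser or on a backend.

```python
from typing import Callable, Dict, Iterable, Optional


def resolve_usernames(
    emails: Iterable[str],
    lookup: Callable[[str], Optional[str]],
    cache: Dict[str, Optional[str]],
) -> Dict[str, Optional[str]]:
    """Map each email to a username, calling lookup once per unseen address.

    Misses (None) are cached too, so unknown addresses are not retried.
    """
    resolved: Dict[str, Optional[str]] = {}
    for email in emails:
        key = email.strip().lower()
        if key not in cache:
            cache[key] = lookup(key)  # one remote call per new address
        resolved[key] = cache[key]
    return resolved


# Stand-in for the remote user search, recording how often it is called.
calls = []


def fake_lookup(email: str) -> Optional[str]:
    calls.append(email)
    return {"alice@example.com": "alice"}.get(email)


cache: Dict[str, Optional[str]] = {}
commit_emails = ["alice@example.com", "Alice@example.com", "bob@example.com"]
print(resolve_usernames(commit_emails, fake_lookup, cache))
print(len(calls))
```

The cost stays at one call per distinct address rather than one per commit. Passing the cache in explicitly means the same function works whether the map lasts for a single page view or is retained across requests.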

  • 🇺🇸United States cmlara

    (we can keep a map so we don't do duplicate calls). It's an API call for each individual email.

    Considering this is a privileged call and would have to be made by D.O. infra, I don't see a reason that there can't be a backend cache that retains these mappings. We do have "drive-by" contributors who are only ever seen once in their lifetime; however, are those the majority of ecosystem commits? This feels like a non-barrier to me.

    I have to imagine that with #3300281: Have git.drupalcode.org manage secondary emails, replacing multiple_email module, we knew that at some point we would have to have D.O. query G.D.O., and accepted those calls.
