Migrate drupal.org issues to gitlab issues

Created on 11 July 2022, over 2 years ago
Updated 28 March 2023, almost 2 years ago

Problem/Motivation

#3265096: Move issues from www.drupal.org to git.drupalcode.org in this namespace to be able to create fork / MR.
* We need to migrate issues from drupal.org to gitlab.
* We will do it in batches, so it should be driven by project_name.
* We might need to undo the changes.
* Redirects should be created from drupal.org issues to the migrated gitlab issues.

Deployment instructions

  • Merge the code and run drush en drupalorg_gitlab to enable the new module.
  • Check that features are in the "Default" state. The page "/admin/config/content/formats/1" should list a filter called "Gitlab/Drupal.org issue to link filter", and it should be enabled. The same applies to the "Full HTML" format. If the filter does not appear, the "drupalorg_wysiwyg" feature probably needs reverting.
📌 Task
Status

Needs review

Version

3.0

Component

GitLab integration

Created by

🇪🇸Spain fjgarlin

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • 🇺🇸United States dww

    Coming here from a #gitlab Slack meeting thread about issue IDs. When you run git blame and find a commit, you see things like “Issue #12345: whatever”. It’s going to very seriously harm DX if I have to check both drupal.org/node/12345 and GitLab issue 12345 to figure out which one is actually it.

    According to Moshe, GitLab allows us to set the ID when creating new issues programmatically. I strongly believe we should use the d.o issue NID to specify the gitlab issue ID during the migration. Then, after all the issues are in GitLab, we know that we can always go to GitLab/$project/issues/ID and there won’t be any collisions.

  • 🇪🇸Spain fjgarlin

    I investigated the API possibilities and there is a bit of a conflict here.

    It is possible to set the IID when creating an issue (https://docs.gitlab.com/ee/api/issues.html#new-issue), however this "requires administrator or project owner rights". This means that we'd need to run the creation as admin, and this creates a new problem with the "author" property, which would be the user who created the issue, which is a property that can NOT be set on edit (https://docs.gitlab.com/ee/api/issues.html#edit-issue). So this creates a new problem as we'd lose the context of who created the issue.

    The scripts are designed to create the new issue and then create all the needed URL redirects. So somebody going to drupal.org/i/12345 for an old issue will be redirected to the new place. These redirects will be migrated to the new D10 site, so I think it's just a matter of agreeing to this convention, which I think is even easier than git.drupalcode.org/project/PROJECT/issues/IID
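
The issue-creation call discussed in this comment could look roughly like the sketch below. It only builds the request and does not send it. The API base URL, the token, and the helper name `build_create_issue_request` are assumptions for illustration; the `iid` parameter is the one the GitLab documentation says requires administrator or project owner rights.

```python
import json
import urllib.parse
import urllib.request

# Assumed API base URL for the Drupal GitLab instance (not confirmed by the thread).
GITLAB_API = "https://git.drupalcode.org/api/v4"


def build_create_issue_request(project_path, nid, title, description, token):
    """Build (but do not send) a 'new issue' request that pins the IID to the d.o NID."""
    # GitLab accepts the URL-encoded project path in place of the numeric project ID.
    project_id = urllib.parse.quote(project_path, safe="")
    payload = {
        "title": title,
        "description": description,
        # Setting iid requires administrator or project owner rights.
        "iid": nid,
    }
    return urllib.request.Request(
        url=f"{GITLAB_API}/projects/{project_id}/issues",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "PRIVATE-TOKEN": token},
        method="POST",
    )
```

Because the token must belong to an admin or owner, the resulting issue's author would be that account, which is the conflict described above.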

  • 🇺🇸United States dww

    I’m 💯 for the redirects. I was hoping to have both.

    That’s really too bad that you lose who created the issues if you set the ids. What a sad limitation of GL. Maybe there’s a work-around?

    Apparently I’m having trouble explaining my concern. I believe it’s going to be confusing and annoying to have to check two places (the d.o redirect and the GL link), and not really know if a given ID refers to one or the other. I guess we’ll have to start checking dates in commits, too, and memorize the date the migration ran live. There won’t be d.o redirects for new issues, so if I see “12345” in a post-GL commit, I can’t use the “easier URLs”, anyway.

  • 🇫🇷France fgm Paris, France

    Or we could change conventions and ask that issues created on GL be called something else than issue, e.g. "ticket". That way any "issue #xx" would be on d.o. and "ticket #yy" would be on GL.
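
A minimal sketch of how the "issue" versus "ticket" convention proposed here might be resolved mechanically. The function name `resolve_references` and the regular expression are assumptions; the two URL shapes (drupal.org/i/NID and git.drupalcode.org/project/PROJECT/issues/IID) are the ones mentioned earlier in the thread.

```python
import re

# Matches "issue #123" or "ticket #123", case-insensitively.
REFERENCE = re.compile(r"\b(issue|ticket)\s+#(\d+)", re.IGNORECASE)


def resolve_references(message, project):
    """Map each reference in a commit message to the site it would live on."""
    urls = []
    for kind, number in REFERENCE.findall(message):
        if kind.lower() == "issue":
            # "issue #N" would always mean a legacy drupal.org node ID.
            urls.append(f"https://www.drupal.org/i/{number}")
        else:
            # "ticket #N" would always mean a GitLab IID within the given project.
            urls.append(f"https://git.drupalcode.org/project/{project}/issues/{number}")
    return urls
```

This only works if every contributor adopts the new wording, which is the re-training cost raised in the replies.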

  • 🇸🇰Slovakia poker10

    Redirects are great, but I personally feel that they are not enough and that, without another connection, this will be very fragile. Imagine that we somehow lose one or more redirects (which I think can happen if these are editable on d.o. like other content redirects); then there will be no way for users to find that particular issue in GitLab.

    I think there are many places where issue IDs are mentioned just as plain text (like #123456, without square brackets), and for less experienced contributors it could be hard to find the referenced issue.

    I am also wondering if keeping the ID would help directly in the GitLab UI, because we still need to resolve/convert strings like [#XXX] to the new GitLab issue IDs, so there would have to be a conversion in place in addition to the redirects (if I understand that correctly). If we keep the IDs the same, wouldn't we match the GitLab method of referencing, and would it be possible to omit the conversion?

    So I agree with @dww and prefer to keep IDs, so we can hopefully find some workaround for that issue with authors (as I think that losing the information about the issue author is a no-go).

  • 🇺🇸United States dww

    fgm: indeed, I asked/proposed that in the Slack meeting. Sorry I didn’t mention it here. It’s still a bit of cognitive load, but it’s less than checking dates, for sure. But then we have to re-train everyone on commit message conventions…

  • 🇺🇸United States moshe weitzman Boston, MA

    The redirect controller could forward unknown issue ids to the gitlab url, without any mapping. That way there is one url to check.

  • 🇸🇰Slovakia poker10

    Looking at the code of the MR here, redirects seem to be created as standard redirects via the redirect module.

    Another point - we also have links comment/XX, which are redirecting to the concrete comment in the issue. Is this redirect solved for referencing migrated comments as well?

  • 🇪🇸Spain fjgarlin

    https://www.drupal.org/comment/15302889 redirects to https://www.drupal.org/project/drupalorg/issues/3295357#comment-15302889.
    These redirects are something that we will need to decide if we are going to migrate or not. So far, nothing was written for it.

    Right now, when migrating comments to notes, we add "Migrated from [comment #%s (#%s)](%s)" as part of the description of the new note, which carries the internal comment order and the CID.
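
For illustration, the note prefix quoted in this comment could be filled in as follows. The thread only states that the template carries the internal comment order and the CID, so the argument order, the shape of the link target, and the helper name `migrated_note_prefix` are assumptions.

```python
# Template as quoted in the thread.
NOTE_PREFIX = "Migrated from [comment #%s (#%s)](%s)"


def migrated_note_prefix(comment_number, cid, issue_url):
    """Fill the template with the comment order, the CID, and a link to the comment."""
    # Assumed link shape: the d.o issue URL plus a #comment-CID fragment.
    comment_url = f"{issue_url}#comment-{cid}"
    return NOTE_PREFIX % (comment_number, cid, comment_url)


print(migrated_note_prefix(
    10, 15302889, "https://www.drupal.org/project/drupalorg/issues/3295357"
))
```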

  • 🇺🇸United States dww

    @moshe re:

    "The redirect controller could forward unknown issue ids to the gitlab url, without any mapping. That way there is one url to check."

    All kinds of flaws with that proposal:

    1. That only works if we assume that Drupal Core is the only project for which this matters. The GL URLs need to include a project name. "12345" could be from any project.
    2. If we don't preserve the IDs, "12345" could be both a valid d.o legacy issue ID, and a rolled over new GitLab ID. So d.o/i/12345 will redirect me to the GL issue for that old NID, but I still might end up on some irrelevant issue if I really needed to be at GL/$project/issues/12345 instead. The only ways for me to know are to compare dates, or re-train every Drupal contributor on commit message conventions (which we should do for other reasons, but that's out of scope here 😅).
    3. It sounds like there isn't a redirect controller, just a bunch of redirect content being generated, so there's not (currently) a way to implement your idea, even if it could work.

    Meanwhile, @poker10's concern about the longevity of all that redirect content is a great one. Makes me think that instead of just generating redirects directly, we should have a redirect controller, and a {drupalorg_gitlab_issue_map} table (or whatever) with all the legacy d.o NIDs -> project + GL IID values. Then no one can break the redirects via the UI. And we'll have a canonical remap table to use going forward in case we need it for other things. I can't imagine the cost of having such a table in our DB is too great for all the potential benefits that would come from doing it that way. It'd probably be a more efficient form of DB storage than storing all the redirects separately, in fact.

    If there's no way to work around GL's limitation that we can't programmatically set both IID + Author to what we want, how about we create a new field called "Original Author" or something? So the formal Author on migrated GL issues would be the "admin" user, but we can at least know the regular d.o user that created the legacy issue? Going forward, "Original Author" wouldn't be set on new GL issues, or we automatically set it to the "Author", or whatever. Sort of a PITA, but IMHO less painful than having issue IDs colliding.
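
A minimal sketch of the map table and lookup proposed in this comment, using SQLite in place of the Drupal.org database. The table name comes from the comment; the column names, the uniqueness constraint, and the helper functions are assumptions for illustration.

```python
import sqlite3


def create_issue_map(conn):
    """Create the legacy NID -> (project, GitLab IID) map table."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS drupalorg_gitlab_issue_map (
            nid INTEGER PRIMARY KEY,   -- legacy drupal.org node ID
            project TEXT NOT NULL,     -- project machine name
            iid INTEGER NOT NULL,      -- GitLab issue IID within that project
            UNIQUE (project, iid)
        )
        """
    )


def record_migration(conn, nid, project, iid):
    """Record one migrated issue."""
    conn.execute(
        "INSERT INTO drupalorg_gitlab_issue_map (nid, project, iid) VALUES (?, ?, ?)",
        (nid, project, iid),
    )


def redirect_target(conn, nid):
    """Return the GitLab URL for a legacy NID, or None if it was never migrated."""
    row = conn.execute(
        "SELECT project, iid FROM drupalorg_gitlab_issue_map WHERE nid = ?", (nid,)
    ).fetchone()
    if row is None:
        return None
    project, iid = row
    return f"https://git.drupalcode.org/project/{project}/issues/{iid}"
```

A redirect controller would call something like `redirect_target` for each incoming legacy URL and fall back to the normal drupal.org page when it returns `None`.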

  • 🇺🇸United States drumm NY, US

    "Makes me think that instead of just generating redirects directly, we should have a redirect controller, and a {drupalorg_gitlab_issue_map} table (or whatever) with all the legacy d.o NIDs -> project + GL IID values. Then no one can break the redirects via the UI. And we'll have a canonical remap table to use going forward in case we need it for other things."

    In my experience, the risk of a redirect controller breaking in code updates is higher. Unless we get better at test-driven-development for Drupal.org, it will break at some point and go unnoticed. Using the common redirect module ensures we have less code to maintain and upgrade. And I don’t expect people to be editing these redirects in the UI.

  • 🇺🇸United States dww

    "And I don’t expect people to be editing these redirects in the UI."

    It's not about what we expect people to do, it's about preventing the possibility of changing these, either by accident or malice. This is about data integrity.

    It's like the map tables from Drupal migrations. Even if you don't intend to write any code to consume it, the cost of generating that table as part of this migration is almost nil. The possibility that it will really come in handy exists.

    If, 3 years from now, we want to check if all the redirects still exist, the table would let us. If we decide we prefer a custom controller for some reason(s), we could.
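
The audit described here could be sketched as below, assuming the map and the live redirects have already been loaded into dictionaries. The `node/NID` source path, the dictionary shapes, and the function name `missing_redirects` are all hypothetical.

```python
def missing_redirects(issue_map, live_redirects):
    """Return legacy NIDs whose redirect is absent or points somewhere unexpected.

    issue_map:      {nid: (project, iid)} loaded from the map table
    live_redirects: {source_path: target_url} loaded from the redirect storage
    """
    problems = []
    for nid, (project, iid) in sorted(issue_map.items()):
        expected = f"https://git.drupalcode.org/project/{project}/issues/{iid}"
        # Assumed source path format for the redirect of a legacy issue node.
        if live_redirects.get(f"node/{nid}") != expected:
            problems.append(nid)
    return problems
```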

  • 🇪🇸Spain fjgarlin

    I have a task in my backlog to adapt the IID of the migrated issue to that of the NID. From what we've read (see links in #16), it is possible, and I will try to achieve it. I just haven't coded it yet.

  • 🇸🇰Slovakia poker10

    I have mentioned this on Slack, but will post it here too.

    Not sure what the exact status of the MR is (I have not reviewed it), but from what I remember, there were some open concerns about keeping the issue IDs, how issue metadata should be migrated (see "Using GitLab labels for issues on Drupal projects"), and similar. @fjgarlin when you have time to work on this again, it would be great to update the current status (what is done, what decisions are needed and what work is needed).

    Given that this will be a one-time migration (for each project), without an option to revert (unlike the GitlabCI migration), I think it would be great if we could see at least one "demo" project migrated first. We would be able to evaluate and test whether everything went correctly, whether all references to issues, comments, etc. are kept and working, how the meta info is migrated, etc. Some feedback could be collected this way before the real migrations via the opt-in process start. Thanks!

  • 🇪🇸Spain fjgarlin

    Re IID, there is some code that needs to be tested. I just added it to the MR as a comment for clarity.

    We will have an opt-in for projects, and initially, we have mechanisms to revert the migration in case something goes really bad. But yeah, we will need to test with a handful of projects and then test references and a few other things before doing more projects.

    Once I come back to work on this, I'll mention it here and we can see what the next steps are.
