
Moving Python's bugs to GitHub

 2 years ago
source link: https://lwn.net/SubscriberLink/885854/bb107c53bdebc248/


Over the past seven years or so, Python has slowly been moving its development infrastructure to GitHub; we covered some of the early discussions at the end of 2014. One piece of that infrastructure, bug tracking, has not been moved from bugs.python.org, but plans are underway to make that happen soon. It is not a simple or straightforward process to do so, however, so the transition will take up to a week to complete; there are a number of interesting facets to the switch, as it entails clearing some technical, and even legal, hurdles.

The plan

Python's developer-in-residence, Łukasz Langa, announced the plan and schedule on the Python Discourse forum on February 18. As described in PEP 581 ("Using GitHub Issues for CPython"), the Roundup-based bugs.python.org (often abbreviated as "bpo" or "BPO") will be retired, but live on in read-only mode so that the existing URLs still work. Each of the entries on BPO will be enhanced with information about where the corresponding issue lives on GitHub; after the transition, all new issues will be added to GitHub.
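As a rough illustration of the cross-referencing described above, the sketch below looks up where a BPO entry would point after the transition. The mapping table, the function name, and the issue numbers are hypothetical, and the two URL patterns are assumptions rather than details taken from PEP 581.

```python
# Hypothetical mapping from BPO issue IDs to GitHub issue numbers.
# In a real migration this table would come from the migration tool.
BPO_TO_GITHUB = {1234: 45678, 1235: 45679}

# Both URL patterns are assumptions made for this sketch.
BPO_URL = "https://bugs.python.org/issue{bpo_id}"
GITHUB_URL = "https://github.com/python/cpython/issues/{number}"


def cross_reference(bpo_id):
    """Return (old_url, new_url); new_url is None if not yet migrated."""
    old_url = BPO_URL.format(bpo_id=bpo_id)
    number = BPO_TO_GITHUB.get(bpo_id)
    new_url = GITHUB_URL.format(number=number) if number is not None else None
    return old_url, new_url
```

Because the old URL is always returned, existing links keep working in this model even for entries that have no GitHub counterpart yet.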

After the discussions in 2014, the move to GitHub got rolling with PEP 512 ("Migrating from hg.python.org to GitHub"); as the title indicates, it was a move away from the Mercurial-based repositories to Git and GitHub. (Mercurial uses "hg" as its nickname and the name of its main binary, after Hg, the chemical symbol for mercury.) Brett Cannon, who authored PEP 512 and has been one of the driving forces behind the workflow changes that came with the move to GitHub, reported on the progress of the project at the Python Language Summits in 2016 and in 2017. At the summit in 2018, Mariatta Wijaya proposed switching to GitHub Issues for bug tracking, which resulted in PEP 581; it was approved in 2019 and is coming to fruition now.

As Langa noted, though, there are various difficulties in making the switch:

Unfortunately, this is not an easy task technically, procedurally, or legally, as it involves coordinating with several external actors and solving technical challenges mostly unique to our current circumstances. As a result, while progress was steady, it took a long while to get to this point. I was asked by the Steering Council to take over project management on the migration.

He has been working with CPython developer Ezio Melotti and "our friends on the Github side" to push the task forward. His announcement marked the beginning of a two-week feedback-gathering phase. Then, on March 4, a test migration will be done using 10% of the bugs on BPO; if that is successful, and no show-stopping problems are encountered, the migration will start by making BPO read-only on March 10 and beginning the transfer of everything on March 11.

The migration is estimated to take anywhere from 3 to 7 days, depending on the load on Github.com. This is why we will be performing the bulk of it during the weekend to speed things up.

During that time, no new issues can be opened in either place, but GitHub pull requests (PRs) can be created and used as normal. As issues are migrated from BPO and start showing up at GitHub, which will be ongoing during the process, they can be edited there, "but destructive actions (changing issue titles, editing comment content, deleting comments, removal of labels) are HIGHLY DISCOURAGED". Making those kinds of changes will make it more difficult to audit the completeness of the migration.
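To see why destructive edits complicate an audit, consider a minimal sketch of one possible check. It assumes the audit compares each live GitHub issue with the form the migration tool expected to produce; the announcement does not describe the actual audit, so the function and field names here are hypothetical.

```python
def is_intact(expected, live):
    """Check that a migrated issue still carries everything that was imported.

    Additions (new comments, extra labels) pass; the discouraged
    destructive actions (retitling, editing or deleting comments,
    removing labels) fail.
    """
    same_title = live["title"] == expected["title"]
    # The imported comments must still be present, unchanged, at the start.
    count = len(expected["comments"])
    comments_kept = live["comments"][:count] == expected["comments"]
    # Every imported label must still be attached.
    labels_kept = set(expected["labels"]) <= set(live["labels"])
    return same_title and comments_kept and labels_kept
```

Under this model, adding a comment to a migrated issue is harmless, while editing an imported comment makes the issue indistinguishable from one that was migrated incorrectly.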

There is a contingency plan should things stretch out too long: "In the unlikely case that the migration cannot be completed in 7 days, the Steering Council decided that we would abort it and re-enable BPO again." Further details on the plan, its risks, and possible mitigations for them can be found in a GitHub issue. That issue is part of the gh-migration repository, which is where problems should be reported as part of the feedback process: "You can treat it as exercise in using Github issues 😉". There are also example migrated issues available on GitHub for Python developers and others to examine, as well as documentation updates (coming from this PR).
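The schedule and the seven-day abort rule can be laid out with a little date arithmetic. This is only a sketch: the year is an assumption (the announcement as quoted gives none), and counting the seven days from the start of the transfer rather than from the read-only date is also an assumption.

```python
from datetime import date, timedelta

YEAR = 2022  # assumption: the quoted dates do not include a year

announcement = date(YEAR, 2, 18)
test_migration = date(YEAR, 3, 4)
read_only = date(YEAR, 3, 10)
transfer_start = date(YEAR, 3, 11)

# Length of the feedback-gathering phase before the test migration.
feedback_days = (test_migration - announcement).days  # 14

# The estimate is 3 to 7 days; the count is assumed to start
# when the transfer starts.
earliest_finish = transfer_start + timedelta(days=3)
latest_finish = transfer_start + timedelta(days=7)


def should_abort(today, finished):
    """Abort and re-enable BPO if the migration overruns the 7-day limit."""
    return not finished and today > latest_finish
```

With the assumed year, the transfer starts on a Friday, which is consistent with the plan to do the bulk of the work over the weekend.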

The main legal question to resolve was whether the Python Software Foundation (PSF) is able to move the user-generated content, with its potentially personally identifiable information (PII), from BPO to GitHub. The steering council and PSF lawyers determined that no user consent was required to do so:

Both BPO and Github are public-facing systems. Users actively placed their information (including PII) in the BPO system, which actively grants consent for that information to be stored, publicly accessible, and distributed on-demand. Changing our backend to Github does not revoke that permission. At the same time, the migration will not be surfacing any new user information that wasn't previously publicly accessible in the BPO system.

Concerns

As might be guessed, one of the concerns expressed in the forum regarded the multi-day pause in the use of the bug tracker. Eric Snow wondered if the older closed and inactive issues could be locked and migrated first. "Assuming the remaining issues would be much fewer than the inactive ones, I'd expect the disruptive part of the migration would be much (proportionally?) shorter." Langa said that the difficulty with doing that is there is no way to disable GitHub Issues during the migration; as soon as some issues are migrated, there would be two trackers in operation, in effect. "The idea to have two issue trackers open at the same time is making me nervous."

In the announcement, Langa noted that Python and GitHub were able to learn from the experience of the LLVM project, which migrated from Bugzilla to GitHub Issues back in December. That migration took 21 days, so the hope is that experience will lead to a smoother (and quicker) transition for Python. Snow said that the estimate of three to seven days "feels like the end of the world" in terms of its impact on core workflow, but, obviously, 21 days is far worse. Melotti said that he has been in contact with LLVM and others:

If I understand correctly the actual transfer eventually took them a couple of days, but it had a few false starts and issues. I've been talking with the project manager of the LLVM project and a few other people that performed similar migrations in the past, so that we could learn from their mistakes and avoid them.

Irit Katriel suggested that post-migration would make for a good time "to review old issues and close them if they are no longer relevant". Langa agreed with that idea, as did Melotti, who added it to the issue that tracks the notification for BPO users. A notification email about the change will be sent to BPO contributors, listing the issues they have submitted, been assigned, or were following, along with a link to the corresponding new GitHub Issue for each.
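A per-contributor notification of this kind amounts to grouping issues by the people involved in them. The sketch below shows one way to do that; the record layout (`id`, `submitter`, `assignee`, `nosy`) and the function name are hypothetical, not taken from the actual BPO data model.

```python
from collections import defaultdict


def build_digests(issues, bpo_to_github):
    """Group migrated issue links by contributor, one digest per person."""
    digests = defaultdict(set)
    for issue in issues:
        link = bpo_to_github.get(issue["id"])
        if link is None:
            continue  # not migrated, so there is nothing to link to
        # Everyone who submitted, was assigned, or followed the issue.
        involved = {issue["submitter"], issue["assignee"], *issue["nosy"]}
        involved.discard(None)  # issues may have no assignee
        for user in involved:
            digests[user].add(link)
    return {user: sorted(links) for user, links in digests.items()}
```

Each contributor ends up with a single list of links, no matter how many issues they were involved in.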

Victor Stinner asked about a related concern; normally an update to a BPO issue will send an email to those people who have added themselves to the "nosy" list for it. He wondered if the update of the BPO entries to add the new GitHub link would generate said emails; "I'm in the nosy list of 885 BPO issues. Should I expect 885 emails [...]?" Melotti sympathized (at least in part because he is on the nosy list for over 4000 entries), but did not directly address whether the BPO change emails would be generated. He did say that he was still hoping to be able to automatically convert the nosy list subscriptions to their GitHub equivalent. Otherwise, active contributors will need to go into each new bug, one by one, and add themselves.

Steve Dower wondered whether it made sense to migrate the closed issues at all. "While there's always some amount of further discussion on closed issues, the vast majority are never going to be touched again. Why recreate them?" But Katriel said that the closed issues still have useful information, which is best kept in one place:

If you want to search closed tickets for some error message, for instance, you want to search in only one place.

There are issues where the problem is not fixed, but the ticket has relevant discussion and workarounds.

The conversation is still ongoing as of this writing, and presumably will be for another week or more. None of the concerns raised so far seem like they will be all that hard to deal with, though it may still be a pretty painful transition, especially for active, longtime contributors. Whether that all gets worked out on the timeline laid out remains to be seen; it would not be a huge shock if the final transition had to be pushed back a time or two. There are quite a number of moving parts that need to be in alignment for this kind of a transition. Hopefully, it all goes off without a hitch—though that may be a tad overoptimistic.

Just as Python has learned from LLVM's experience, other projects can watch this transition with interest. That is one of the strengths of open source and openly developed software; there is much to be learned from the experience of other projects. In fact, the whole transition from self-hosted to GitHub can be found in the Python mailing lists, forum posts, PEPs, and so on; projects thinking about making a switch like that can prepare themselves better by standing on the shoulders of the projects that have gone before.

The switch away from Roundup also largely completes Python's transition of its development infrastructure from open-source, Python-based tools (Mercurial, Roundup) to the proprietary GitHub "software as a service" offering, which is certainly sad in some ways. But Python has always been a fairly pragmatic project—something it seems to have inherited from former benevolent dictator for life Guido van Rossum—and the intent of these moves was geared toward attracting new developers who are familiar with and comfortable using GitHub. Over the last few years, the project does seem to have picked up some steam—and lots of new faces—so it looks like that effort may be paying off. That, too, may be instructive to other projects.



