1

Whatever happened to SHA-256 support in Git?

 1 year ago
source link: https://lwn.net/SubscriberLink/898522/9cf50ee3f96f90c1/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Welcome to LWN.net

The following subscription-only content has been made available to you by an LWN subscriber. Thousands of subscribers depend on LWN for the best news from the Linux and free software communities. If you enjoy this article, please consider subscribing to LWN. Thank you for visiting LWN.net!

The news has been proclaimed loudly and often: the SHA-1 hash algorithm is terminally broken and should not be used in any situation where security matters. Among other things, this news gave some impetus to the longstanding effort to support a more robust hash algorithm in the Git source-code management system. As time has passed, though, that work seems to have slowed to a stop, leaving some users wondering when, if ever, Git will support a hash algorithm other than SHA-1.

Hash functions are, of course, at the core of how Git works. Every object in its data store — every version of every file, among other things — is hashed, with the resulting value serving as the key under which that object is stored. Commits, too, are represented by a hash of the current state of the tree, the commit message, and the hash(es) of the parent commit(s). The security of the hash function is a key part of the integrity of a repository as a whole. If an attacker could replace a commit with another having the same hash value, they could perhaps inject malicious code into a repository without risking detection. That prospect is worrisome to anybody who depends on the security of code stored in Git repositories — everybody, in other words.

The Git project has long since chosen SHA-256 as the replacement for SHA-1. Git was originally written with SHA-1 deeply wired into the code, but all of that code has since been refactored and can handle multiple hash types, with SHA-256 being the second supported type. It is now possible to create a Git repository using SHA-256 (just use the --object-format=sha256 flag) and most local operations will work just fine. The foundation for support of alternative hash algorithms in Git was part of the 2.29 release in 2020 and appears to be solid.

That 2.29 release, though, is the last one that features alternative-hash work in any significant way; there has been no mention of this work in the project's release notes since a fix showed up in 2.31, released in March 2021. The 2.29 work marked SHA-256 as experimental and warned that "that there is no interoperability between SHA-1 and SHA-256 repositories yet". There was some work toward interoperability posted in 2020, but those patches do not appear to have ever been merged into the Git mainline.

In other words, work on supporting the use of a hash algorithm other than SHA-1 in Git appears to have ground to a halt. That recently led Stephen Smith to post a query about its status to the development list. This response from Ævar Arnfjörð Bjarmason is illuminating and, for those looking forward to full SHA-256 support, potentially discouraging:

I wouldn't recommend that anyone use it for anything serious at the moment, as far as I can tell the only users (if any) are currently (some) people work on git itself.

Bjarmason pointed out that there is still no interoperability between SHA-1 and SHA-256 repositories, and that none of the Git hosting providers appear to be supporting SHA-256. That support (or the lack thereof) matters; a repository that cannot be pushed to a Git forge will be essentially useless to many people. There is also the risk (which cannot really be made to go away) that the longer hashes used with SHA-256 may break tools developed outside of the Git project. The overall picture is one of a feature that is not yet ready for real-world use.

That said, it is worth noting that brian m. carlson, who has done the bulk of the hash-transition work so far, disagrees with Bjarmason's assessment. In his view, the only "defensible" reason to use SHA-1 at this point is interoperability with the Git forge providers. Otherwise, he said, SHA-1 is obsolete, and performance with SHA-256 can be "substantially faster". But he agrees that the needed interoperability does not exist, and nobody has said that it is coming anytime soon.

What has happened here looks, to an extent at least, like a story that has played out numerous times over the course of free-software history. A problem has been identified, and a great deal of core foundational work has been done to solve it. That solution appears to be well considered and solidly implemented. In a sense, the job is 90% done. All that is left is the hard work of making the transition to a new hash easy for users — what could be thought of as "the other 90%" of the job.

This sort of interface and compatibility development is hard and developers often do not find it particularly rewarding, so it tends to be neglected by our community. The Git project, one might argue, is especially prone to user-interface challenges, but the problem is wider than that. There are certain sorts of tasks that volunteers are often uninclined to pick up, and that companies may not feel the need to fund.

Given the threat that the SHA-1 hash poses, one might think that there would be a stronger incentive for somebody to support this work. But, as Bjarmason continued, that incentive is not actually all that strong. The project adopted the SHA-1DC variant of SHA-1 for the 2.13 release in 2017, which makes the project more robust against the known SHA-1 collision attacks, so there does not appear to be any sort of imminent threat of this type of attack against Git. Even if creating a collision were feasible for an attacker, Bjarmason pointed out, that is only the first step in the development of a successful attack. Finding a collision of any type is hard; finding one that is still working code, that has the functionality the attacker is after, and that looks reasonable to both humans and compilers is quite a bit harder — if it is possible at all.

So few people are losing sleep over the possibility that a Git repository could be deliberately corrupted by way of an SHA-1 hash collision anytime soon. The combination of a lack of urgency and little apparent interest in doing the work has seemingly brought the SHA-256 transition to a halt. Perhaps that is how the situation will remain until another SHA-1 weakness turns up and brings attention back to the situation. But, as Randall Becker pointed out, there is a cost to this inaction:

Adding my own 0.02, what some of us are facing is resistance to adopting git in our or client organizations because of the presence of SHA-1. There are organizations where SHA-1 is blanket banned across the board - regardless of its use. [...] Getting around this blanket ban is a serious amount of work and I have very recently seen customers move to older much less functional (or useful) VCS platforms just because of SHA-1.

It is a bit of a stretch to imagine that remaining with SHA-1 will threaten Git's dominance in the near future. But it could, perhaps, give a toehold to a competitor that would lead to trouble in the longer term, especially if the security of SHA-1 crumbles further.

Given that, one might think that companies that are dependent on Git would see some value in solving this particular problem. Many companies use Git, but some have based their entire business model around it. The latter companies have benefited greatly from the community's investment in Git, and they have a lot to lose if Git loses its prominence. It would seem to make sense for one or more of these companies to make the relatively small investment needed to push this transition to completion; that would be good for the community — and for their own future as well.


(Log in to post comments)


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK