The Git project selects SHA-256 as the next-gen hash function to migrate to

Git Rev News: Edition 42 (August 22nd, 2018)

Welcome to the 42nd edition of Git Rev News, a digest of all things Git. For our goals, the archives, the way we work, and how to contribute or to subscribe, see the Git Rev News page on git.github.io.

This edition covers what happened during the month of July 2018.

Discussions

General

SHA-256 has been selected as Git’s next-generation hash function

Last month’s edition discussed the state of NewHash work, i.e. the process of selecting Git’s next-generation hash function. This discussion has concluded with the selection of SHA-256. An update to hash-function-transition.txt to change NewHash to SHA-256 is queued in the next branch.

Support

Git clone and case sensitivity

Paweł Paruzel reported that he found some test files in his repo appeared modified just after a clone because he had files like “boolStyle_t_f” and “boolStyle_T_F” that differ only in case and was cloning on a case-insensitive Mac.

He suggested having git clone throw an exception when files that differ only in case are cloned to a case-insensitive system.

Brian Carlson replied that this would make it impossible to clone such a repository on a case-insensitive system while the current behavior might still result in a functional repo.

Brian also suggested using something like test $(git status --porcelain | wc -l) -eq 0 to check that a repo is unmodified after clone.
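Brian's one-liner can be wrapped in a small helper for automated checks. The following is a minimal Python sketch, not something posted in the thread; the function names `porcelain_is_clean` and `repo_is_clean` are hypothetical, and it assumes `git` is available on the PATH.

```python
import subprocess


def porcelain_is_clean(porcelain_output: str) -> bool:
    """Return True when `git status --porcelain` printed no entries."""
    return not any(line.strip() for line in porcelain_output.splitlines())


def repo_is_clean(repo_path: str) -> bool:
    """Run `git status --porcelain` in repo_path and report whether it is empty.

    Hypothetical helper: assumes repo_path is a Git working tree.
    """
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_path,
        capture_output=True,
        text=True,
        check=True,
    )
    return porcelain_is_clean(result.stdout)
```

Calling `repo_is_clean(path)` right after a clone would flag the situation Paweł described, since the colliding files show up as modified.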

Duy Nguyen agreed with Brian and proposed a patch that uses sparse checkout to hide all the paths that fail to check out because of the filesystem. Duy’s patch also prints a warning to tell the user what happened.

Jeff King, alias Peff, replied to Duy suggesting just warning and advising the user. And Duy followed up with a modified patch that does just that.

Simon Ruderich commented that the advice message in Duy’s patch should list the problematic file names to help users.

Peff agreed with Simon and wondered if it was better to detect at checkout time if a file already exists on the filesystem rather than checking before the checkout. Peff also noted that Duy’s patch used strcasecmp() to check if filenames differ only in case, but in some cases, especially with UTF-8 names, a filesystem could use complex folding rules which we would need to follow.

Brian replied to Peff saying that it was indeed possible to detect the issue at checkout time, and Duy replied that it was actually what his patch was doing.

Duy, Peff and Jeff Hostetler then agreed that it would be difficult to follow complex folding rules that a filesystem might use.
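To illustrate why simple case comparison and filesystem folding can disagree, here is a minimal Python sketch. It is not taken from any of the patches: `fold_ascii` only approximates a C-locale strcasecmp(), `fold_unicode` is just one possible folding (Unicode normalization plus case folding) and does not reproduce the rules of any particular filesystem, and all function names are hypothetical.

```python
import string
import unicodedata
from collections import defaultdict
from typing import Callable, Iterable, List

# Map only A-Z to a-z, roughly what a C-locale strcasecmp() treats as equal.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def fold_ascii(path: str) -> str:
    """Fold ASCII letters only (an approximation of strcasecmp())."""
    return path.translate(_ASCII_LOWER)


def fold_unicode(path: str) -> str:
    """One possible richer folding: NFC normalization plus Unicode case folding.

    Real filesystems may apply different rules; this is only an illustration.
    """
    folded = unicodedata.normalize("NFC", path).casefold()
    return unicodedata.normalize("NFC", folded)


def colliding_groups(paths: Iterable[str],
                     fold: Callable[[str], str]) -> List[List[str]]:
    """Group paths that become identical under the given folding function."""
    groups = defaultdict(list)
    for path in paths:
        groups[fold(path)].append(path)
    return [sorted(group) for group in groups.values() if len(group) > 1]


paths = ["boolStyle_t_f", "boolStyle_T_F", "README",
         "caf\u00e9.txt", "CAF\u00c9.TXT"]
ascii_collisions = colliding_groups(paths, fold_ascii)
unicode_collisions = colliding_groups(paths, fold_unicode)
```

With these sample paths, the ASCII-only folding reports just the two boolStyle files, while the Unicode folding also groups the two accented names. That gap is the kind of mismatch Peff was pointing at.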

Duy then started sending a real patch in its own email.

Junio Hamano chimed in to suggest a different implementation and a long discussion thread involving Torsten Bögershausen, Elijah Newren, Duy, Junio, Peff and Jeff Hostetler followed about how to best find all the colliding paths.

Duy sent a version 2 of his patch.

The earlier long discussion thread nevertheless continued after this patch.

Duy sent a version 3 that actually tries to find all the colliding paths on “UNIXy platforms”.

Szeder Gábor found small issues in the patch, so Duy sent a version 4.

Comments from Torsten started a discussion about clarifying the documentation of the core.checkStat config option which was addressed by a separate patch from Duy and Junio.

Duy then recently sent a version 5 which tries to find all the colliding paths on Windows too, and a version 6 to address a few more comments from Junio and Torsten.

It looks like the latest version will be merged to the “next” branch soon.

Developer Spotlight: Derrick Stolee

Who are you and what do you do?

I’m a software engineer at Microsoft working on the version control client team. My team includes the Git contributors from Microsoft as well as most of the developers for VFS for Git (formerly GVFS). We also work on other version control clients, such as Team Explorer for Visual Studio.

I joined this team after a couple years on the Git Server team for Visual Studio Team Services (VSTS), where I work generally on performance features, such as the history algorithm, reachability bitmaps, and other scale issues. While I was on the team, we onboarded the Windows development group to Git, which was a very exciting time to be part of the team. After they were using VSTS, the place that needed the most performance improvement was in the client experience, so I switched teams.

Before Microsoft, I was an academic. I got my Ph.D in Mathematics and Computer Science from the University of Nebraska and was an assistant professor for a few years, working in computational graph theory and combinatorics. I found that being a faculty was not nearly as much fun as being a graduate student, and I couldn’t find enough time to write code for my computational experiments. Luckily, I was able to find a role at Microsoft that could use some of those skills.

What would you name your most important contribution to Git?

My most important contribution to Git has been the commit-graph feature. This feature enables significant performance boosts for Git repos of almost every size. I built a similar feature while rewriting the history algorithm for VSTS, and I changed teams from server to client so I could contribute this feature to core Git.

What are you doing on the Git project these days, and why?

In addition to working on more fallout from the commit-graph feature (including speeding up git log --graph), I’ve been working on the multi-pack-index feature. This allows indexing a list of objects across several packfiles. I’m the first to admit that this feature is not as necessary as the commit-graph until you get to incredibly large repositories. When git repack stops being a realistic option, then the multi-pack-index can keep your lookup times sane. I hope that there are some more future integrations that are beneficial to other repos, such as tying the reachability bitmap to the multi-pack-index instead of a single packfile.

If you could get a team of expert developers to work full time on something in Git for a full year, what would it be?

THE INDEX! The .git/index file is a linear list of every path available in the working directory, even if those files are not checked out (due to a sparse-checkout). This is one of the most central features to all of Git, and is used in many places with a very transparent API.

The index is the single biggest bottleneck left to tackle for super-large repos like the Windows repo. These enormous repos are too large for most developers to need the entire thing, so they work in a small cone. VFS for Git virtualizes the files until they are read or written, and we use the sparse-checkout feature to limit Git’s interaction to be in that cone. However, we still need to read and write the entire index file whenever we interact with the staging area. If the index was hierarchical or could prune the list for entire subtrees, then we could drastically reduce this cost.

The biggest problem with this direction is that it requires refactoring almost all of the consumers to use a new API that is not as coupled to the layout of the index. Only after that change happens can we drastically replace the file format. It’s a bit of a chicken-or-the-egg problem, because we can’t refactor the API unless we know the behavior the new format can present, but we can’t test format options until we refactor the API.

The task is pretty daunting, but I think someone could get there with enough focus and determination.

If you could remove something from Git without worrying about backwards compatibility, what would it be?

I would rebuild revision.c from scratch. I’m going to need to do a lot of replacement to make git log --graph use generation numbers, but it would be easier if I could start over and add features one at a time.
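As background for readers: a commit's generation number is one more than the largest generation number among its parents, with parentless commits at 1, so a commit can only reach commits with a strictly smaller generation number. The following toy Python sketch shows how that property lets a reachability walk stop early. It is an illustration under simplifying assumptions (an in-memory dict of hypothetical commit names), not the code in revision.c or the commit-graph file.

```python
from typing import Dict, List


def generation_numbers(parents: Dict[str, List[str]]) -> Dict[str, int]:
    """Compute generation numbers for a DAG given as commit -> list of parents."""
    gen: Dict[str, int] = {}
    for start in parents:
        stack = [start]
        while stack:
            commit = stack[-1]
            if commit in gen:
                stack.pop()
                continue
            missing = [p for p in parents[commit] if p not in gen]
            if missing:
                # Parents must be numbered before the commit itself.
                stack.extend(missing)
            else:
                gen[commit] = 1 + max((gen[p] for p in parents[commit]), default=0)
                stack.pop()
    return gen


def can_reach(parents: Dict[str, List[str]], gen: Dict[str, int],
              start: str, target: str) -> bool:
    """Return True if target is start itself or one of its ancestors."""
    stack, seen = [start], set()
    while stack:
        commit = stack.pop()
        if commit == target:
            return True
        # A commit whose generation is not above the target's cannot reach it.
        if commit in seen or gen[commit] <= gen[target]:
            continue
        seen.add(commit)
        stack.extend(parents[commit])
    return False


# Hypothetical history: A is the root, D merges B and C, E is a side branch.
history = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"], "E": ["A"]}
gens = generation_numbers(history)
```

In this toy history, asking whether E can reach B stops immediately because both have generation 2, without walking any parents.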

This is probably a boring answer, but I have found that every single command-line option is someone’s favorite feature, so I don’t want to take that away from anyone. One of Git’s strengths is that it is so flexible for different use cases and different repos.

What is your favorite Git-related tool/library, outside of Git itself?

It’s rather new, but I’ve been enjoying using GitGitGadget to send patches to the mailing list. Too often I make a mistake sending patches upstream, or lose track of which commit I used in v2, and things like that. Starting from a GitHub pull request (that also ran builds on the code for Windows, Mac, & Linux) is much easier (to me) than going through the hoops of git format-patch and git send-email. I think that getting started submitting patches via email is one of the biggest barriers to entry for our community, and I really believe we are losing quality contributors due to that friction. Hopefully, GitGitGadget can be one way that we can attract and retain more contributors.

Releases

libgit2 v0.27.4
Gerrit Code Review v2.15.3
GitHub Enterprise v2.14.2, v2.13.8, v2.12.16, v2.11.22, v2.14.1, v2.13.7, v2.12.15, v2.11.21
GitLab 11.1.4; 11.1.2, 11.0.5, and 10.8.7; 11.1.1; 11.1.0; 11.0.4, 10.8.6, and 10.7.7
Bitbucket Server v5.13
GitKraken v4.0.2, v4.0.1, v4.0.0
GitHub Desktop v1.3.3, v1.3.2, v1.3.1, v1.3.0
tig v2.4.0

Other News

Various

Introducing freedesktop.org GitLab by Daniel Stone
dev.to is now open source announcement by Ben Halpern (source available on GitHub)
Image Diffing & Reflog in [new] Tower (multiplatform proprietary Git client) by Tobias Günther
13 new Bitbucket Cloud features by Justine Davis

Light reading

Git workflow for individuals by Gabor Szabo
GitOps: A Path to More Self-service IT by Thomas A. Limoncelli
Kubernetes anti-patterns: Let’s do GitOps, not CIOps! by Ilya on Weaveworks blog
Splitting SSH and git configs (with the help of git-config’s includeIf directive), by Jonathon Fry
10 Common Git Problems and How to Fix Them by Michael Kohl (with caveats about some advice)
An automatic interactive pre-commit checklist, in the style of infomercials by Vicky Lai
Do your commits pass this simple test? (Frequent. Descriptive. Atomic. Decentralized. Immutable.) by Jonathan Irvin
Follow these [three] simple rules and you’ll become a Git and GitHub master by Ariel Camus
Working with large files and repositories with Git LFS on GitHub, on GitHub Community Forum
10 Git quiz questions to test your distributed version control skills by Cameron McKenzie
Data Version Control Tutorial by Dmitry Petrov (uses DVC software)
The 10:1 rule of writing and programming by Yevgeniy Brikman; it uses git-quick-stats and cloc to derive its conclusions, and inspired creation of hofs-churn script
Code Review Review is the Manager’s Job by John Barton
Using Microsoft Word with git by Martin Fenner (2014); advises using pandoc to convert MS Word documents to Markdown format in a diff filter (via gitattributes)

Git tools and sites

Web applications for sending patches via email:
- GitGitGadget (homepage) is a bot (gadget) that serves as glue between GitHub Pull Requests and the Git mailing list, allowing contributors to submit patch series and to iterate on them.
- submitGit is an older Heroku app that sends GitHub Pull Requests to the Git mailing list, correctly formatting the patches. Its creation was extensively covered in Git Rev News Edition #4.
Version control for Data Science:
- Meltano by GitLab is an open source convention-over-configuration product for the whole data lifecycle, all the way from loading data to analyzing it. Meltano stands for the steps of the data science life-cycle: Model, Extract, Load, Transform, Analyze, Notebook, and Orchestrate. Announced in Hey, data teams - We’re working on a tool just for you.
- DVC: Open-source Version Control System for Data Science Projects. It is available on the Python Package Index.
lazygit is a simple [windowed] terminal UI for git commands, written in Go with the gocui library (compare tig, an ncurses-based text-mode interface for Git).
Gitrob: Putting the Open Source in OSINT is a tool to help find potentially sensitive files pushed to public repositories on GitHub
git-quick-stats is a simple and efficient way to access various statistics in a git repository (with a simple homepage)
hofs-churn is a small bash script to approximate code churn for a Git repo as described by Brikman’s article: “The 10:1 rule of writing and programming”
Atlas by O’Reilly, is a tool/site for publishing books (which can be written in AsciiDoc, with layout and structure defined in HTML and CSS files), based on Git.
Keep a Changelog, a site with a proposed changelog format and a motto: “Don’t let your friends dump git logs into changelogs”.

Credits

This edition of Git Rev News was curated by Christian Couder, Jakub Narębski, Markus Jansen and Gabriel Alcaras with help from Derrick Stolee and Ævar Arnfjörð Bjarmason.

