

GitHub - bloomberg/repofactor: Tools for refactoring history of git repositories
source link: https://github.com/bloomberg/repofactor

Finding the causes of repository bloat
This project contains a bunch of tools to help analyse the largest blobs (by "on disk" storage) in a repository.
Here is a sample sequence of commands showing typical usage:
- Typically start with a clean clone of the repository that you want to analyse. It can be bare. For reasonable performance it should be cloned onto "local" disk on a reasonably fast Linux machine.
- Add these tools to your PATH or use a full path to each script or executable.
- Run these tools from the repository undergoing analysis and cleaning.
- Work out a suitable threshold size by running generate-larger-than with experimental parameters. 50000 might be a good starting point. The size is "average bytes after compression by Git".
- Generate a sorted list of objects with file information:
      generate-larger-than 50000 | sort -k3n | add-file-info >../largeobjs.txt
- Make a report showing the summary of each commit together with the paths which introduce the large objects, their uncompressed size and file information:
      report-on-large-objects ../largeobjs.txt
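For a rough idea of what the threshold in these steps measures, the sketch below lists blobs by on-disk size using only stock Git. It is not the implementation of generate-larger-than. The helper names list_objects and larger_than are hypothetical, the output column order (id, uncompressed size, on-disk size) is chosen here only so that sort -k3n sorts by on-disk size, and Git's per-object on-disk size is assumed to be a reasonable stand-in for the "average bytes after compression" figure.

```sh
# Hypothetical helpers, not part of repofactor.

# Print one line per object in the repository:
#   <type> <id> <on-disk size> <uncompressed size>
list_objects() {
    git cat-file --batch-all-objects \
        --batch-check='%(objecttype) %(objectname) %(objectsize:disk) %(objectsize)'
}

# Read list_objects-style lines on stdin and keep only blobs whose
# on-disk size is greater than the threshold given as $1.
# Prints: <id> <uncompressed size> <on-disk size>
larger_than() {
    awk -v min="$1" '$1 == "blob" && $3 + 0 > min + 0 { print $2, $4, $3 }'
}

# Usage, from inside the repository being analysed:
#   list_objects | larger_than 50000 | sort -k3n
```

For an object stored as a delta, the on-disk size reported by Git is the size of the delta, so a blob can look small here even when its uncompressed content is large.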
Filtering out large blobs
- Create a temporary work directory and export RFWORK_DIR to point to this directory (defaults to the current directory).
- Again, run all commands from the repository being analysed.
- From the above report, edit down a list of blob ids that can be eliminated. Call this large-objects.txt.
- Generate a remove script:
      make-remove-blobs large-objects.txt >"$RFWORK_DIR"/remove-blobs.pl
      chmod +x "$RFWORK_DIR"/remove-blobs.pl
- Optionally edit the remove script to filter out any paths that are not required at the same time.
- Run the filter branch:
      run-filter-branch
- Create a new "easy rebase" script for moving work-in-progress branches from the old history to the new history:
      make-mtnh >"$RFWORK_DIR"/move-to-new-history
- Push the rewritten refs and the rewrite-commit-map branch to all central repositories.
- Deploy move-to-new-history for users to use.
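Before pushing, one way to sanity-check the rewrite is to confirm that none of the listed blob ids is still reachable from the rewritten refs. This is a minimal sketch and not part of repofactor. still_reachable is a hypothetical helper, and it assumes large-objects.txt holds one blob id as the first field of each line.

```sh
# Hypothetical helper, not part of repofactor.
# $1: file with one blob id as the first field of each line (assumed format).
# stdin: "<id> <path>" lines, as printed by `git rev-list --objects`.
# Prints every stdin line whose id is listed in the file.
still_reachable() {
    awk -v ids="$1" '
        BEGIN {
            # Load the ids to look for; blank lines are skipped.
            while ((getline line < ids) > 0) {
                split(line, f, " ")
                if (f[1] != "") want[f[1]] = 1
            }
        }
        ($1 in want) { print }
    '
}

# Usage, from inside the rewritten repository:
#   git rev-list --objects --branches --tags | still_reachable large-objects.txt
```

Empty output means none of the listed blobs is reachable from the refs passed to git rev-list. Refs that still point at the old history will keep the blobs reachable, so limit the arguments to the rewritten refs.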