
GitHub - bloomberg/repofactor: Tools for refactoring history of git repositories

source link: https://github.com/bloomberg/repofactor

Finding the causes of repository bloat

This project contains a set of tools to help analyse the largest blobs (measured by "on disk" storage) in a repository.

Here is a sample sequence of commands showing typical usage:

  • Typically start with a clean clone of the repository that you want to analyse. It can be bare. For reasonable performance it should be cloned onto "local" disk on a reasonably fast Linux machine.

  • Add these tools to your PATH or use a full path to each script or executable.

  • Run these tools from the repository undergoing analysis and cleaning.

  • Work out a suitable threshold size by running generate-larger-than with experimental parameters. 50000 might be a good starting point. The size is "average bytes after compression by Git".

  • Generate a sorted list of objects with file information

    generate-larger-than 50000 | sort -k3n | add-file-info >../largeobjs.txt

  • Make a report showing the summary of each commit together with the paths which introduce the large objects, their uncompressed size and file information

    report-on-large-objects ../largeobjs.txt
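To get a feel for what "on disk" size means before choosing a threshold, the following sketch lists blobs by their compressed size using plain Git plumbing only. It is an illustration, not the way generate-larger-than is implemented: the function names list_all_objects and filter_large_blobs are hypothetical, and the figure compared here is Git's per-object on-disk size rather than the average described above.

```shell
# Sketch only: inspect on-disk blob sizes with plain Git.
# Assumption: these helper names are illustrative, not part of repofactor.

# Emit "<object-id> <type> <uncompressed-size> <on-disk-size>" for every
# object in the repository in the current directory.
list_all_objects() {
    git cat-file --batch-all-objects \
        --batch-check='%(objectname) %(objecttype) %(objectsize) %(objectsize:disk)'
}

# Read lines in the format above on stdin and keep only blobs whose
# on-disk size is at least the threshold given as $1, smallest first.
filter_large_blobs() {
    awk -v min="$1" '$2 == "blob" && $4 >= min' | sort -k4n
}

# Usage, from inside the repository being analysed:
#   list_all_objects | filter_large_blobs 50000
```

For objects stored as deltas, Git reports the size of the delta, so a blob can look small here even when its uncompressed content is large.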

Filtering out large blobs

  • Create a temporary work directory, then set and export RFWORK_DIR to point to it. If RFWORK_DIR is not set, the current directory is used.

  • Again, run all commands from the repository being analysed.

  • From the above report, edit down a list of blob ids that can be eliminated. Call this large-objects.txt.

  • Generate a remove script

    make-remove-blobs large-objects.txt >"$RFWORK_DIR"/remove-blobs.pl
    chmod +x "$RFWORK_DIR"/remove-blobs.pl
    
  • Optionally edit the remove script so that the same rewrite also filters out any paths that are no longer required

  • Run the filter branch

    run-filter-branch

  • Create a new "easy rebase" script for moving work-in-progress branches from the old history to the new history

    make-mtnh >"$RFWORK_DIR"/move-to-new-history

  • Push the rewritten refs and the rewrite-commit-map branch to all central repositories

  • Deploy move-to-new-history for users to use
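The filtering steps above can be collected into a small wrapper, sketched below. It assumes the repofactor tools are on PATH and that it is run from the repository being rewritten. The function names setup_workdir and rewrite_history are hypothetical, and using mktemp for the work directory is one choice among many. The optional manual edit of the remove script, the push and the deployment are left out.

```shell
# Sketch only: chain the documented filtering commands.
# Assumption: setup_workdir and rewrite_history are illustrative names.

# Create a scratch directory and export it for the repofactor tools.
setup_workdir() {
    RFWORK_DIR=$(mktemp -d) || return 1
    export RFWORK_DIR
}

# Run the filtering steps in order and stop at the first failure.
# $1 is the edited list of blob ids (large-objects.txt in the steps above).
rewrite_history() {
    make-remove-blobs "$1" >"$RFWORK_DIR"/remove-blobs.pl &&
    chmod +x "$RFWORK_DIR"/remove-blobs.pl &&
    run-filter-branch &&
    make-mtnh >"$RFWORK_DIR"/move-to-new-history
}

# Usage, from inside the repository being rewritten:
#   setup_workdir && rewrite_history large-objects.txt
```

Chaining with && means a failure while generating the remove script prevents the history rewrite from starting.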

