Git got big files or keys? Break out BFG

Written by mikefettis | Published 2018/11/09


Everybody messes up. Today’s mistake was adding a big file to git before a .gitignore was in place to handle it. As a result, GitHub is rejecting the push, even after “removing” the file from git. The reason is that the file still exists in the git history. Time to clean up the mess: break out BFG and nuke it from orbit. Sadly this means Java is involved, but it is a necessary demon. BFG is linked below, and a Java runtime (a JDK works fine) needs to be installed.
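To see why the push keeps getting rejected, here is a small sketch that reproduces the problem in a throwaway repo. The file name big.bin, the demo identity, and the temp directory are made up for illustration; everything else is plain git.

```shell
#!/bin/sh
# Demonstrates that `git rm` does not remove a file from history.
set -eu

demo=$(mktemp -d)                 # throwaway repo, location is arbitrary
cd "$demo"
git init -q .
# Local identity so the commits work on a fresh machine (placeholder values).
git config user.email "demo@example.com"
git config user.name "Demo"

# Commit a "big" file (hypothetical name), then "remove" it.
head -c 1048576 /dev/zero > big.bin
git add big.bin
git commit -q -m "oops: add big file"
git rm -q big.bin
git commit -q -m "remove big file"

# The working tree no longer has the file, but the blob is still
# reachable from the first commit, so it still gets pushed.
history_hits=$(git rev-list --objects --all | grep -c 'big.bin' || true)
echo "objects in history matching big.bin: $history_hits"
```

A non-zero count here is the whole problem: the push carries every reachable object, not just the files in the latest commit.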

BFG Repo-Cleaner by rtyley: A simpler, faster alternative to git-filter-branch for deleting big files and removing passwords from Git history. (rtyley.github.io)

First things first, take a look at BFG Repo-Cleaner. Welcome back, hopefully there was some reading involved. BFG Repo-Cleaner will be used to clean up the big files. It can also clean up sensitive data that someone accidentally added to a repo (“cough cough” AWS keys). It does this by rewriting the git history and removing all traces of the file. Like many things git, sometimes it is better not to explain the wizardry and just dive right in.

TL;DR: oh my git… just do this… black magic ensues.

Welcome back from blindly running commands found on the internet. Everything worked correctly, right? Time to break down what just happened.

The prework is setting up BFG and getting it loaded into the environment. A folder structure is created in the home folder to store the jar. The jar is then downloaded and a symlink is created, so that when a new version is added the old symlink can be deleted and pointed at the new jar. This is not strictly needed, but it certainly helps. Next, the folder is added to the PATH environment variable in the .bash_profile file, and .bash_profile is sourced to pick up the new path. None of this is required but, let’s be honest, this is going to happen more than once, and it is better to have it in place for the future.

After that, the repo is cloned (most likely it already exists, so don’t worry). Then git garbage collection is run. Next, move out of the directory, because BFG is run from outside the repo rather than from inside it. Fire the BFG, passing in the file name or wildcard that should be nuked. Drop back into the folder. Expire the git reference log (reflog), which clears out the references to the old commits that BFG left behind. Finally, run git garbage collection again to clean up the rest of the cruft.
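The steps above can be sketched as a pair of shell functions. This is a reconstruction under assumptions, not the original snippet: the install folder (~/tools/bfg), the BFG version, and the Maven download URL should be checked against the BFG site, and install_bfg and clean_repo are hypothetical helper names. The --delete-files flag and the reflog and gc commands are the ones the BFG documentation uses.

```shell
#!/bin/sh
# Sketch of the BFG workflow. Paths, version, and URL are assumptions.
set -eu

BFG_VERSION="${BFG_VERSION:-1.14.0}"        # assumed version, check the BFG site
BFG_HOME="${BFG_HOME:-$HOME/tools/bfg}"     # hypothetical install folder

# Prework: download the jar and point a stable symlink at it.
install_bfg() {
  jar="bfg-$BFG_VERSION.jar"
  mkdir -p "$BFG_HOME"
  curl -fsSL -o "$BFG_HOME/$jar" \
    "https://repo1.maven.org/maven2/com/madgag/bfg/$BFG_VERSION/$jar"
  # Re-point the symlink whenever a newer jar is downloaded.
  ln -sfn "$jar" "$BFG_HOME/bfg.jar"
  # Optional, as in the article: add the folder to PATH in ~/.bash_profile.
  #   echo 'export PATH="$PATH:$HOME/tools/bfg"' >> ~/.bash_profile
}

# Cleanup: clone, gc, run BFG from outside the repo, expire reflog, gc again.
# $1 = repo URL or path, $2 = local directory, $3 = file name or glob to delete
clean_repo() {
  repo_url=$1 repo_dir=$2 pattern=$3
  [ -d "$repo_dir" ] || git clone "$repo_url" "$repo_dir"   # skip if it exists
  git -C "$repo_dir" gc
  # BFG is pointed at the repo directory rather than run inside it.
  java -jar "$BFG_HOME/bfg.jar" --delete-files "$pattern" "$repo_dir"
  # Drop reflog entries that still reference the old commits, then prune.
  git -C "$repo_dir" reflog expire --expire=now --all
  git -C "$repo_dir" gc --prune=now --aggressive
}

# Example (not run here; the repo address is a placeholder):
#   install_bfg
#   clean_repo git@github.com:you/your-repo.git your-repo '*.zip'
```

One thing worth knowing before firing: by default BFG leaves the contents of the latest commit alone, so the file has to be removed from the current commit first (which is exactly the “removed it but the push still fails” situation above).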

That’s that: the files have been removed and all history of them existing has been wiped. This type of process can be especially useful when combined with a git hook and a regex for specific things in files, like keys and whatnot. It can also easily be tied into a Jenkins build pipeline to protect people from themselves. Good luck, and when in doubt, break out the Big “Friggin” Gun.
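As a rough illustration of the git hook idea, here is a minimal pre-commit sketch. The regex only matches the usual AWS access key ID shape (AKIA followed by 16 uppercase letters or digits) and is an assumption for illustration, and check_staged is a hypothetical helper name. A real setup would want more patterns than this.

```shell
#!/bin/sh
# Pre-commit sketch: save as .git/hooks/pre-commit and make it executable.
# The pattern below is illustrative and covers AWS access key IDs only.
KEY_PATTERN='AKIA[0-9A-Z]{16}'

# Returns 1 if any added line in the staged diff looks like a key.
check_staged() {
  if git diff --cached -U0 | grep -Eq "^\+.*$KEY_PATTERN"; then
    echo "Possible AWS key in staged changes; commit blocked." >&2
    return 1
  fi
  return 0
}

# Only act when git runs this file as the pre-commit hook.
case "$0" in
  */pre-commit) check_staged || exit 1 ;;
esac
```

Catching the key before it is committed is a lot cheaper than rewriting history afterwards, so the hook and BFG work well as a pair: one for prevention, one for cleanup.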

BONUS: there is a fantastic zine from Julia Evans that talks about some other great git things.

New zine: Oh shit, git!: Hello! Last week Katie Sylor-Miller and I released a new zine called "Oh shit, Git!". It has a bunch of common git… (jvns.ca)

(Links for everything mentioned:)

Removing sensitive data from a repository - User Documentation: If you commit sensitive data, such as a password or SSH key into a Git repository, you can remove it from the history… (help.github.com)

BFG Repo-Cleaner by rtyley: A simpler, faster alternative to git-filter-branch for deleting big files and removing passwords from Git history. (rtyley.github.io)

Git - git-reflog Documentation: The "show" subcommand (which is also the default, in the absence of any subcommands) shows the log of the reference… (git-scm.com)

Git - git-gc Documentation: If the number of packs exceeds the value of gc.autoPackLimit, then existing packs (except those marked with a .keep file… (git-scm.com)


Written by mikefettis | hacker and janitor building platforms and systems that when they work no one knows they are there.
Published by HackerNoon on 2018/11/09