Fix Git repository after Subversion conversion

After converting a Subversion repository to Git with svn2git, all was well. At least until I wanted to squash some of the oldest commits with the excellent interactive rebase. After failing spectacularly, I realized that I might have to do some cleanup before going on such a bold expedition. Here are a couple of tricks for “post-processing” a converted repository.

  1. Make a backup! Ideally, do this work on a clone of the original repository, then use a tool like Meld (or simply diff -r repo_backup new_repo) to check that the resulting files are the same as before.
  2. Remove empty commits: git filter-branch --commit-filter 'if [ z$1 = z`git rev-parse $3^{tree}` ]; then skip_commit "$@"; else git commit-tree "$@"; fi' HEAD
  3. Garbage collect and prune: git gc --prune
  4. Start interactive rebase from the first commit: git rebase -i $(git log --format=%H | tail -1)
  5. Squash all commits with empty messages. These are shown as multiple commit IDs on the same line with a space and > between them, like this:
    pick 1111111 >2222222 >3333333 Message

    Just split these up and remove the commits, like this:

    fixup 1111111 nothing
    fixup 2222222 nothing
    pick 3333333 Message

    In Vim, you can do this by repeating the following command until it reports no hits: :%s/^pick \([a-f0-9]\{7\}\) >\([a-f0-9]\{7\}\)/fixup \1 nothing\rpick \2/g

  6. Exit the editor, and the rebase should complete on its own.
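If a converted history has many of these lines, the Vim substitution in step 5 can also be run as a filter over the rebase todo list. The following is a minimal sketch, assuming abbreviated commit IDs of at least seven lowercase hex digits and that no commit message itself contains text of the form “pick <id> ><id>”. The function name split_empty_commits is made up for this example.

```sh
#!/bin/sh
# split_empty_commits (hypothetical name): read a rebase todo list on stdin
# and rewrite lines such as
#     pick 1111111 >2222222 >3333333 Message
# into one line per commit, turning every commit except the last into a fixup:
#     fixup 1111111 nothing
#     fixup 2222222 nothing
#     pick 3333333 Message
# Lines without " >" between commit IDs are passed through unchanged.
split_empty_commits()
{
    # ":split" ... "t split" repeats the substitution on the same input line
    # until it no longer matches, like rerunning the Vim command by hand.
    # The backslash-newline in the replacement inserts a line break.
    sed -e ':split' \
        -e 's/pick \([0-9a-f]\{7,\}\) >\([0-9a-f]\{7,\}\)/fixup \1 nothing\
pick \2/' \
        -e 't split'
}
```

For example, split_empty_commits < todo.txt > todo-split.txt writes the split lines to a new file, which can then be pasted back into the editor.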

Warning, as always: YMMV and RTFM.
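Before rewriting history in step 2, it can be reassuring to see which commits the filter would drop. This sketch (the function name list_empty_commits is hypothetical) applies the same test as the commit filter, comparing each commit’s tree with its first parent’s tree, but only prints the matching commit IDs and changes nothing.

```sh
#!/bin/sh
# list_empty_commits (hypothetical name): print the IDs of commits reachable
# from HEAD whose tree is identical to their first parent's tree, i.e. the
# commits the commit filter in step 2 would skip. Run inside the repository.
list_empty_commits()
{
    # Each line is "<commit> <first parent> [<other parents>...]"
    git rev-list --parents HEAD |
        while read -r commit parent _others
        do
            # Root commit: there is no parent to compare with
            [ -n "$parent" ] || continue
            if [ "$(git rev-parse "${commit}^{tree}")" = "$(git rev-parse "${parent}^{tree}")" ]
            then
                echo "$commit"
            fi
        done
}
```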

cvs2git2svn

After discovering Ohloh, I thought it would be a good idea to clean up and publish some repositories of yore. One of them was established back in the CVS newbie days, and contained lots of external binaries – not the kind of thing you want to version control. Having used CVS, Subversion and Git (in that order), I knew there was only one choice: interactive rebase with Git. Also, the software was created while I was at CERN, so it should continue to be hosted there. And since they had started a Subversion service in the meantime, it was time to upgrade as well.

These instructions should work for any CERN project, and can easily be modified to fit any repository. The usual warnings apply: YMMV and RTFM.

  1. Set some variables to avoid typing: svn_repo=Repository_name
    svn_user=User_with_edit_access
  2. Install the tools: sudo apt-get install cvs2svn git-core git-svn
  3. Create the cvs2git working directory: cvs2git_wd=$(mktemp -dt cvs2git.XXXXXXXXXX)
  4. Copy the contents of the repository (not a working copy) to the working directory: scp -r $svn_user@lxplus.cern.ch:/afs/cern.ch/project/svn/reps/${svn_repo}/* $cvs2git_wd. Don’t worry if /hooks is not copied – you don’t need it. If you don’t have filesystem access to the repository, you can try cvssuck. Be warned: it’s really slow.
  5. Set cvs2git global options:
    1. zcat /usr/share/doc/cvs2svn/examples/cvs2git-example.options.gz > $cvs2git_wd/cvs2git.options
    2. Modify at least ctx.username and author_transforms in $cvs2git_wd/cvs2git.options.
  6. Make the new Git repository: git_wd=$(mktemp -dt git.XXXXXXXXXX) && git init $git_wd
  7. Convert to Git (repeat for each module):
    1. Modify run_options.set_project in $cvs2git_wd/cvs2git.options
    2. Create Git import files: cd $cvs2git_wd && cvs2git --options=cvs2git.options. If you get any warnings or errors you might have to change the options again.
    3. Import to Git: cd $git_wd && cat $cvs2git_wd/cvs2svn-tmp/git-blob.dat $cvs2git_wd/cvs2svn-tmp/git-dump.dat | git fast-import
  8. Make a backup in case the rest goes hairy.
  9. If you need to (which was kind of the point of this exercise), do an interactive rebase from the first commit: git rebase -i $(git log --format=%H | tail -1).
  10. git-svn needs at least one commit to be in the Subversion repository: svn_wd=$(mktemp -dt svn.XXXXXXXXXX) && svn co --username $svn_user svn+ssh://${svn_user}@svn.cern.ch/reps/${svn_repo} $svn_wd && cd $svn_wd && touch .temp && svn add .temp && svn ci -m "git-svn dummy commit"
  11. Convert to Subversion:
    1. Prepare git-svn repository: git2svn_wd=$(mktemp -dt git2svn.XXXXXXXXXX) && git svn clone --username $svn_user svn+ssh://${svn_user}@svn.cern.ch/reps/${svn_repo} $git2svn_wd && cd $git2svn_wd
    2. Get Git commits: git fetch $git_wd
    3. Apply Git commits as master branch: git branch tmp $(cut -b-40 .git/FETCH_HEAD) && git tag -am "Last fetch" last tmp && first_commit=$(git log --format=%H | tail -1) && git checkout $first_commit . && git commit -C $first_commit
    4. Apply Git commits: git rebase master tmp && git branch -M tmp master
    5. Check whether this works: git svn dcommit --rmdir --find-copies-harder --dry-run
    6. If it does, you’re good to go: git svn dcommit --rmdir --find-copies-harder

If the last step fails, the easiest way to continue is just to remove all commits from the Subversion repository, fix the Git repository, and restart at step 10.
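Step 5.2 needs an author_transforms entry for every CVS login, so it helps to know all of them up front. One way is to scan the RCS ,v files in the copied repository. The sketch below is only a helper under assumptions: the name cvs_author_transforms is hypothetical, it expects the usual RCS layout where each revision has a line starting with date that also contains author <login>;, and it assumes the ('Full Name', 'email') tuple format of the example options file. Full Name and example.org are placeholders to replace by hand.

```sh
#!/bin/sh
# cvs_author_transforms (hypothetical name): print one template entry per
# CVS login found in the RCS ",v" files below directory $1, e.g.
#     'jrandom' : ('Full Name', 'jrandom@example.org'),
# Assumes revision lines like "date <timestamp>; author <login>; state Exp;".
# File contents that happen to look like such a line would be picked up too.
cvs_author_transforms()
{
    find "$1" -type f -name '*,v' \
        -exec sed -n 's/^date[[:space:]].*[[:space:]]author[[:space:]]\{1,\}\([^;]*\);.*/\1/p' {} + |
        sort -u |
        sed "s/.*/    '&' : ('Full Name', '&@example.org'),/"
}
```

For example, cvs_author_transforms "$cvs2git_wd" prints the entries to edit and paste into the author_transforms dictionary in $cvs2git_wd/cvs2git.options.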

N-way Git synchronization with extra cheese

Index

  1. Background
  2. Converting Subversion to Git
  3. Generate and version .gitignore files
  4. Git via proxy
  5. Setting up pull everywhere
  6. References

Background

I’ve got a desktop and a server behind a router with a dynamic IP address at home, a desktop at work, and a laptop that floats around. I’d very much like to have the same settings on all of them, and to be able to synchronize them as easily as possible. I’ve been using Subversion for this, but I’ve recently had trouble with symlinks, and I’ve long been concerned that storing the revision history centrally (even with backups now and then) is a Bad Move. So when I had to start using Git at work and realized that it could (at least in theory) solve both problems, I set out to figure out how. After lots of tries followed by rm -rf settings/, I think I’ve got a working setup. Of course, I don’t guarantee that any of this will work for you.

Converting Subversion to Git

Install the necessary software:
sudo apt-get install git-svn

Copy the following code into a file named svn2git.sh, and run it as documented below.

svn2git.sh

#!/bin/sh
#
# NAME
#    svn2git.sh - Convert a Subversion repository to Git
#
# SYNOPSIS
#    svn2git.sh [options] <Subversion URL> 
#
# OPTIONS
#    --authors=path  Authors file
#    -v,--verbose    Verbose output
#
# EXAMPLE
#    /path/to/svn2git.sh https://example.org/foo
#
#    Create authors file for repository
#
#    /path/to/svn2git.sh -v --authors=authors.txt https://example.org/foo
#
#    Get Subversion repository to ./foo.git
#
# DESCRIPTION
#    Two-part script to migrate from Subversion to Git. First it tries to get
#    a list of the Subversion authors, so it can be formatted to fit the Git
#    commit structure. When running with the authors file, it will fetch the
#    entire Subversion revision history.
#
# BUGS
#    Email bugs to victor dot engmark at gmail dot com. Please include the
#    output of running this script in verbose mode (-v).
#
# COPYRIGHT AND LICENSE
#    Copyright (C) 2009 Victor Engmark
#
#    This program is free software: you can redistribute it and/or modify
#    it under the terms of the GNU General Public License as published by
#    the Free Software Foundation, either version 3 of the License, or
#    (at your option) any later version.
#
#    This program is distributed in the hope that it will be useful,
#    but WITHOUT ANY WARRANTY; without even the implied warranty of
#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#    GNU General Public License for more details.
#
#    You should have received a copy of the GNU General Public License
#    along with this program.  If not, see <http://www.gnu.org/licenses/>.
#
################################################################################

# Output error message with optional error code
error()
{
    if [ -z "$2" ]
    then
        error_code=$EX_UNKNOWN
    else
        error_code=$2
    fi
    echo "$1" >&2
    exit $error_code
}

usage()
{
    error "Usage: ${cmdname} [-v|--verbose] [--authors=path] <Subversion URL>" $EX_USAGE
}

verbose_echo()
{
    if [ $verbose ]
    then
        echo "$*"
    fi
}

# Use for mandatory directory checks
# $1 is the directory path
# $2 is the (optional) error message
directory_exists()
{
    if [ ! -d "$1" ]
    then
        error "No such directory '${1}'
$2" $EX_NO_SUCH_DIR
    fi
}

# Make sure an executable is available
# $1 is the path to the executable
# $2 is the (optional) error message
executable_exists()
{
    if [ ! -x "$1" ]
    then
        error "No such executable '${1}'
$2" $EX_NO_SUCH_EXEC
    fi
}

PATH="/usr/bin:/bin"
cmdname=`basename $0`
directory=$PWD

# Exit codes from /usr/include/sysexits.h, as recommended by
# http://www.faqs.org/docs/abs/HTML/exitcodes.html
EX_OK=0           # successful termination
EX_USAGE=64       # command line usage error
EX_DATAERR=65     # data format error
EX_NOINPUT=66     # cannot open input
EX_NOUSER=67      # addressee unknown
EX_NOHOST=68      # host name unknown
EX_UNAVAILABLE=69 # service unavailable
EX_SOFTWARE=70    # internal software error
EX_OSERR=71       # system error (e.g., can't fork)
EX_OSFILE=72      # critical OS file missing
EX_CANTCREAT=73   # can't create (user) output file
EX_IOERR=74       # input/output error
EX_TEMPFAIL=75    # temp failure; user is invited to retry
EX_PROTOCOL=76    # remote error in protocol
EX_NOPERM=77      # permission denied
EX_CONFIG=78      # configuration error

# Custom errors
EX_UNKNOWN=1
EX_NO_SUCH_DIR=91
EX_NO_SUCH_EXEC=92

# Process parameters
until [ $# -eq 0 ]
do
    case $1 in
        -v|--verbose)
            verbose=1
            shift
            ;;
        --authors=*)
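            authors_file="${1#--authors=}"
            # Make a relative path absolute, because the script later changes
            # into the new repository before using the authors file
            case $authors_file in
                /*) ;;
                *) authors_file="${directory}/${authors_file}" ;;
            esac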
            shift
            ;;
        *)
            if [ -z "$svn_url" ]
            then
                svn_url=$1
                shift
            else
                # Unknown parameter
                usage
            fi
            ;;
    esac
done

if [ -z "$svn_url" ]
then
    # No Subversion URL provided
    usage
fi

repository_name=`basename "$svn_url"`

verbose_echo "Running $cmdname at `date`."

# Preliminary checks
executable_exists "/usr/bin/git"
executable_exists "/usr/bin/svn"
# On newer Git versions git-svn lives in Git's exec path, not in /usr/bin
git svn --version > /dev/null 2>&1 || error "git-svn is not available" $EX_UNAVAILABLE

verbose_echo "Source repository: '${svn_url}'"

if [ -z "$authors_file" ]
then
    # Get authors file
    authors_file="${directory}/${repository_name}-authors.txt"
    if [ -e "$authors_file" ]
    then
        error "Authors file '${authors_file}' already exists"
    fi
    verbose_echo "Authors file: ${authors_file}"

    # "svn log --quiet" lines look like "r123 | user | date"; keep the trimmed user
    svn log --quiet "${svn_url}" | grep '^r[0-9]' | cut -d '|' -f 2 | sed -e 's/^ *//' -e 's/ *$//' | sort -u > "${authors_file}"

    author="$(head -1 "$authors_file")"
    echo "Please modify ${authors_file} to a format like"
    echo "${author} = Full Name <${author}@example.org>"
    echo "and rerun $cmdname with --authors=${authors_file}"
else
    if [ ! -e "$authors_file" ]
    then
        error "Authors file '${authors_file}' doesn't exist"
    fi

    git_target="${directory}/${repository_name}.git"
    if [ -e "$git_target" ]
    then
        error "Target repository '${git_target}' already exists"
    fi
    verbose_echo "Target repository: '${git_target}'"

    # Clone
    git svn clone --no-metadata --authors-file="${authors_file}" --revision 1:1 "$svn_url" "$git_target" || error "Clone failed"

    # Fetch
    cd "$git_target"
    batch_start=2
    revisions=$(svn info "$svn_url" | grep '^Revision:' | awk '{print $2}')
    while [ $batch_start -le $revisions ]
    do
        batch_end=$(expr $batch_start + 990)
        if [ $batch_end -gt $revisions ]
        then
            batch_end=$revisions
        fi

        verbose_echo "Fetching revisions $batch_start through $batch_end"
        git svn fetch --authors-file="${authors_file}" --revision $batch_start:$batch_end || error "Fetch failed"
        
        batch_start=$(expr $batch_end + 1)
    done

    git rebase git-svn

    verbose_echo "Applying svn:ignore properties"
    git svn show-ignore >> .git/info/exclude

    verbose_echo "Removing references to Subversion"
    git config --remove-section svn-remote.svn
    rm --recursive --force .git/svn/
fi

verbose_echo "Cleaning up."
cd "$directory"

verbose_echo "${cmdname} completed at `date`."
exit $EX_OK

Now do a directory diff between a Subversion working copy and the new Git repository to make sure the conversion succeeded.
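The check can be wrapped in a small helper that ignores each tool’s own metadata directories. This is a minimal sketch, assuming GNU diff (for its -x, also spelled --exclude, option) and a Subversion working copy and Git repository in separate directories; compare_trees is a hypothetical name.

```sh
#!/bin/sh
# compare_trees (hypothetical name): recursively compare directory $1 (e.g. a
# Subversion working copy) with directory $2 (e.g. the converted Git
# repository), skipping every .svn and .git directory at any depth.
# Exits 0 only if the remaining files are identical; differences go to stdout.
compare_trees()
{
    diff -r -x .svn -x .git "$1" "$2"
}
```

For example, compare_trees ~/settings-svn ~/settings.git lists any files that differ or exist on only one side.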

Now you can get this on other machines using
git clone --origin example ssh://example.org/~/settings

Generate and version .gitignore files

This is an optional step in case you would like to version the old svn:ignore properties as .gitignore files:

exclude2gitignore.sh

#!/bin/sh
#
# NAME
#    exclude2gitignore.sh - Convert $GIT_DIR/info/exclude to corresponding
#    .gitignore files
#
# SYNOPSIS
#    exclude2gitignore.sh [options] /path/to/repository
#
# OPTIONS
#    -v,--verbose    Verbose output
#
# EXAMPLE
#    /path/to/exclude2gitignore.sh ~/foo
#
#    Create .gitignore files for the Git repository in ~/foo
#
# DESCRIPTION
#    Based on the format generated by `git-svn show-ignore`, where non-comment
#    lines indicate ignored files. Will try to put the .gitignore as close as
#    possible to the ignored file(s).
#
# BUGS
#    Email bugs to victor dot engmark at gmail dot com. Please include the
#    output of running this script in verbose mode (-v).
#
# COPYRIGHT AND LICENSE
#    Copyright (C) 2009 Victor Engmark
#
#    This program is free software: you can redistribute it and/or modify
#    it under the terms of the GNU General Public License as published by
#    the Free Software Foundation, either version 3 of the License, or
#    (at your option) any later version.
#
#    This program is distributed in the hope that it will be useful,
#    but WITHOUT ANY WARRANTY; without even the implied warranty of
#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#    GNU General Public License for more details.
#
#    You should have received a copy of the GNU General Public License
#    along with this program.  If not, see <http://www.gnu.org/licenses/>.
#
################################################################################

# Output error message with optional error code
error()
{
    if [ -z "$2" ]
    then
        error_code=$EX_UNKNOWN
    else
        error_code=$2
    fi
    echo "$1" >&2
    exit $error_code
}

usage()
{
    error "Usage: ${cmdname} [-v|--verbose] /path/to/repository" $EX_USAGE
}

verbose_echo()
{
    if [ $verbose ]
    then
        echo "$*"
    fi
}

# Use for mandatory directory checks
# $1 is the directory path
# $2 is the (optional) error message
directory_exists()
{
    if [ ! -d "$1" ]
    then
        error "No such directory '${1}'
$2" $EX_NO_SUCH_DIR
    fi
}

# Make sure an executable is available
# $1 is the path to the executable
# $2 is the (optional) error message
executable_exists()
{
    if [ ! -x "$1" ]
    then
        error "No such executable '${1}'
$2" $EX_NO_SUCH_EXEC
    fi
}

PATH="/usr/bin:/bin"
cmdname=`basename $0`
directory=$PWD

# Exit codes from /usr/include/sysexits.h, as recommended by
# http://www.faqs.org/docs/abs/HTML/exitcodes.html
EX_OK=0           # successful termination
EX_USAGE=64       # command line usage error
EX_DATAERR=65     # data format error
EX_NOINPUT=66     # cannot open input
EX_NOUSER=67      # addressee unknown
EX_NOHOST=68      # host name unknown
EX_UNAVAILABLE=69 # service unavailable
EX_SOFTWARE=70    # internal software error
EX_OSERR=71       # system error (e.g., can't fork)
EX_OSFILE=72      # critical OS file missing
EX_CANTCREAT=73   # can't create (user) output file
EX_IOERR=74       # input/output error
EX_TEMPFAIL=75    # temp failure; user is invited to retry
EX_PROTOCOL=76    # remote error in protocol
EX_NOPERM=77      # permission denied
EX_CONFIG=78      # configuration error

# Custom errors
EX_UNKNOWN=1
EX_NO_SUCH_DIR=91
EX_NO_SUCH_EXEC=92

# Process parameters
until [ $# -eq 0 ]
do
    case $1 in
        -v|--verbose)
            verbose=1
            shift
            ;;
        *)
            if [ -z "$repository" ]
            then
                repository="${1%\/}"
                shift
            else
                # Unknown parameter
                usage
            fi
            ;;
    esac
done

if [ -z "$repository" ]
then
    # No repository path provided
    usage
fi

verbose_echo "Running $cmdname at `date`."

directory_exists "$repository"

grep '^/' "${repository}/.git/info/exclude" | while IFS= read -r line
do
    ignore_path="${repository}${line}"
    verbose_echo "Starting with $ignore_path"
    ignore_name="$ignore_path"

    # Strip globs in path
    ignore_path=`dirname "$ignore_path"`
    while [ ! -e "$ignore_path" ]
    do
        ignore_path=`dirname "$ignore_path"`
    done

    # Remove the directory and its trailing slash from the file name
    ignore_name=${ignore_name#"${ignore_path}/"}

    # Complete .gitignore path
    ignore_path="${ignore_path}/.gitignore"

    verbose_echo "$ignore_name >> $ignore_path"
    echo "$ignore_name" >> "$ignore_path"
done

verbose_echo "Cleaning up."
cd "$directory"

verbose_echo "${cmdname} completed at `date`."
exit $EX_OK

Git via proxy

One of the machines involved is behind a gateway machine at work, so I had to add the following to ~/.ssh/config:

Host work
     ProxyCommand ssh -q gateway.example.org nc %h %p
     HostName work-pc.example.org

With this, it’s possible to refer to just “work”, and SSH commands (even via Git) will take care of connecting via the proxy.

Setting up pull everywhere

The main idea here is to set up Git “remotes” pointing to all the other machines.

To be able to get the updates from the repository in ~/settings on home-pc.example.net, simply run the following on all machines (except, of course, the home machine):
git remote add home ssh://home-pc.example.net/~/settings

To be able to get the updates from the “work” host specified with a proxy above, just use “work” for the host name:
git remote add work ssh://work/~/settings

To be able to pull from a machine which changes IP address, you could set up a DynDNS account and use one of their recommended update scripts to be able to refer to your machine using a single DNS name.
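Since the same remotes have to be added on every machine, it may be convenient to make the step safe to rerun. A minimal sketch, assuming it is run inside the settings repository; add_remote_if_missing is a hypothetical helper name, and it only uses git config and git remote add.

```sh
#!/bin/sh
# add_remote_if_missing (hypothetical name): add a remote called $1 with URL
# $2 to the repository in the current directory, unless a remote with that
# name is already configured. Rerunning it is therefore harmless.
add_remote_if_missing()
{
    # "git config --get" exits non-zero when the key does not exist
    if git config --get "remote.$1.url" > /dev/null
    then
        echo "Remote '$1' already exists, skipping" >&2
    else
        git remote add "$1" "$2"
    fi
}
```

On the laptop, for example, running add_remote_if_missing home ssh://home-pc.example.net/~/settings and add_remote_if_missing work ssh://work/~/settings from ~/settings sets up both remotes, and running the same lines again changes nothing.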

Once every host has a clone and its remotes are set up, you should be able to get all the changes from the other repositories with
git remote update && git pull
If this doesn’t work, you might have more luck fetching each repository individually and then rebasing onto it:
git fetch home && git rebase home/master
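When one of the hosts is unreachable (a laptop that is switched off, say), it can be handy to fetch each remote separately and see which ones failed before picking one to rebase onto. A minimal sketch, assuming it is run inside the settings repository; fetch_each_remote is a hypothetical name.

```sh
#!/bin/sh
# fetch_each_remote (hypothetical name): fetch every configured remote one at
# a time, so a single unreachable host does not hide the others. Prints the
# names of remotes that could not be fetched and returns non-zero if any fail.
fetch_each_remote()
{
    failed=0
    for remote in $(git remote)
    do
        if ! git fetch --quiet "$remote" 2> /dev/null
        then
            echo "$remote"
            failed=1
        fi
    done
    return $failed
}
```

After it returns, git rebase home/master (or any other remote that was fetched) works as above.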

To keep a backup on a separate machine, make a bare clone there (by default, Git refuses to push to the checked-out branch of a non-bare repository):
git clone --bare ssh://example.org/~/settings settings
and set up pushing defaults on the other machines using
git config push.default matching
git remote add backup ssh://backup.example.org/~/settings

Then you can just run git push backup master to back up the local master branch.

References