Basti's Scratchpad on the Internet

Notes on Synchronization and Backup of $HOME using git, git-annex and mr

This is a dump of all git, git-annex and mr related configuration and scripts so I can edit them in one place. When tangled,

Repository structure,

~/annex/

Instead of having a single large annex folder synced across machines, I have split all files into 6 annexes,

  • documents
  • music
  • notes
  • old-code
  • photos

This scheme makes git-annex much faster. At first I was reluctant to go with it: instead of pushing and pulling a single annex, I now have to deal with 6. Then I found out about mr, which lets you run commands on a collection of repositories, so even though there are 6 repos, a single mr command pushes or pulls all of them.
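For readers unfamiliar with mr, the core idea can be sketched in a few lines of bash. This is not how mr is implemented. `run_in_repos` is a hypothetical helper that runs one command in each listed directory, isolates each run in a subshell, and reports failures at the end, which is roughly what a single mr invocation saves over six manual ones.

```shell
# run_in_repos CMD DIR...  (hypothetical helper, not part of mr)
# Runs the shell string CMD inside each DIR, one after another.
# Returns 1 if CMD failed (or the cd failed) in any directory.
run_in_repos() {
    local cmd=$1 failed=0 dir
    shift
    for dir in "$@"; do
        echo "== $dir"
        # Subshell: a cd or a failure in one repo cannot leak into the next.
        if ! (cd "$dir" && eval "$cmd"); then
            echo "failed: $dir" >&2
            failed=1
        fi
    done
    return "$failed"
}

# Example: run_in_repos "git annex sync" ~/annex/documents ~/annex/music
```

mr adds what this sketch lacks: per-repo command overrides, parallel jobs and checkout rules, all driven by the config files below.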

These annexes are shared between 3 computers (one with two full copies of all repos and two with partial copies). All of them are behind NAT, so every client dumps its data to rsync.net (GPG-encrypted, with separate keys) and syncs changes with a repo hosted on rsync.net.

Whenever I make changes to any of the repos, I just run mr push. It iterates over the repositories with changes (new files, deletes, renames), uploads the changes and syncs with rsync.net. When I switch machines, I just run mr pull, which downloads all changes.

~/.mrconfig

include = cat ~/annex/.mrconfig-annex ~/source/.mrconfig-source
jobs = 3

~/annex/.mrconfig

[DEFAULT]
git_sync = git annex sync "$@"
git_push = git fast-push
git_pull = git annex sync
git_status = git na-status
git_drop = git git-drop-unused here 0
git_dead = git annex dead "$@"
git_upload = git annex sync; git annex copy --to depot --not --in depot ; git annex sync
git_download = git annex sync; git annex get --quiet --fast; git annex sync
git_missing = git annex find . --not --in depot
git_ffsck = git annex fsck --fast --from depot

lib = 
    enableCloud() {
       git config remote.origin.annex-ignore true
       git annex init "`hostname`"
       git annex enableremote depot
    }

[annex/documents]
checkout = git clone 'ssh://rsync/~/git-bare/documents.git' 'documents'
           cd documents/
           enableCloud     
skip = lazy
drop = git-drop-unused.hy here 0
       git-drop-unused.hy depot 30

[annex/photos]
checkout = git clone 'ssh://rsync/~/git-bare/photos.git' 'photos'
           cd photos/
           enableCloud
skip = lazy
drop = git-drop-unused.hy here 7
       git-drop-unused.hy depot 120

[annex/old-code]
checkout = git clone 'ssh://rsync/~/git-bare/old-code.git' 'old-code'
           cd old-code/
           enableCloud
skip = lazy
drop = git-drop-unused.hy here 0
       git-drop-unused.hy depot 30

[annex/music]
checkout = git clone 'ssh://rsync/~/git-bare/music.git' 'music'
           cd music/
           enableCloud
skip = lazy
drop = git-drop-unused.hy here 0
       git-drop-unused.hy depot 15

[annex/notes]
checkout = git clone 'ssh://rsync/~/git-bare/notes.git' 'notes'
           cd notes/
           enableCloud
sync = git annex sync
       git annex get .
       git annex copy --to depot --not --in depot
       git annex sync
pull = mr sync
drop = git-drop-unused.hy here 7
       git-drop-unused.hy depot 180

[annex/kiler]
pull = mr sync
push = mr sync
copy = git annex add .
       git commit -m 'Auto Add' || true
       git annex copy . --to yesim --auto
       git annex copy . --to buse --auto
       git annex sync
drop = git-drop-unused.hy here 0

~/.bin/git-drop-unused.hy

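Drop unused files by age. The script takes an optional remote (default here) and a minimum age in days (default 180); a file's age is the number of days since the last commit that mentions its key (found with git log -S),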

#!/usr/bin/env hy

(import  [sh [grep git perl awk ErrorReturnCode]]
	 [re [split]]
	 [datetime [datetime date]]
	 [sys])

(def remote (if (>= (len sys.argv) 2)
	      (second sys.argv)
	      "here"))

(def drop-age (if (= (len sys.argv) 3)
		(int (nth sys.argv 2))
		180))

(defn unused-files []
  (let [[files (try 
		(-> (.annex git "unused" "--from" remote)
		    (perl "-ne" "print if /^    [0-9]+      .*/")
		    str)
		(catch [e ErrorReturnCode] ""))]]
    (->> files 
	 (split "\n")
	 (map (fn [x] 
		(->> (.strip x)
		     (split "      ")
		     (take 2)
		     (map (fn [x] (.strip x))))))
	 (filter (fn [x] 
		   (= (len x) 2)))
	 list)))

(defn last-seen [file]
  (let [[key (second file)]]
    (->> (git "--no-pager" "log" "-1" "-S" key "--pretty=format:%at")
	 str
	 (split "\n")
	 (map (fn [x] (.fromtimestamp datetime (float x))))
	 first)))

(defn age [file]
  (let [[delta (- (.today datetime) (last-seen file))]]
    delta.days))

(print "Dropping " remote)

(for [file (unused-files)]
  (let [[id (first file)]
	[file-age (age file)]]

    (if (>= file-age drop-age)
      (do 
       (print "Id " id " age " file-age " days...")
       (if (= remote "here")
	 (.annex git "dropunused" "--force" (str id))
	 (.annex git "dropunused" "--force" "--from" remote (str id)))))))
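The same selection logic can be sketched in bash, assuming the git annex unused output layout the Hy script's regex expects (an index, whitespace, then the key). The helper names are hypothetical; `last_seen_epoch` issues the same `git log -1 -S KEY --pretty=format:%at` query as the script.

```shell
# Sketch of the age-based selection in git-drop-unused.hy.
# All helper names are hypothetical, not part of git-annex.

# parse_unused: read `git annex unused` output on stdin and print
# "ID KEY" for each numbered entry; header and prose lines are skipped.
parse_unused() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 2 { print $1, $2 }'
}

# last_seen_epoch KEY: author timestamp of the newest commit whose
# diff changes the number of occurrences of KEY (git's pickaxe search).
last_seen_epoch() {
    git --no-pager log -1 -S "$1" --pretty=format:%at
}

# age_in_days LAST_SEEN_EPOCH [NOW_EPOCH]: whole days elapsed.
age_in_days() {
    local last=$1 now=${2:-$(date +%s)}
    echo $(( (now - last) / 86400 ))
}

# should_drop AGE_DAYS MIN_DAYS: succeed when content is old enough.
should_drop() {
    [ "$1" -ge "$2" ]
}

# Example (drop local content unused for 30+ days):
# git annex unused | parse_unused | while read -r id key; do
#     age=$(age_in_days "$(last_seen_epoch "$key")")
#     should_drop "$age" 30 && git annex dropunused --force "$id"
# done
```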

~/.bin/git-na-status

When running mr status, avoid running git status on direct mode annex repos; otherwise you get a bunch of spurious typechange entries.

#!/bin/bash

if [ -d ".git/annex/" ]; then
    # direct mode: git status would report every annexed file as a typechange
    if [ "$(git config --get annex.direct)" = "true" ]; then
        git annex status
    else
        git status --short
    fi
else
    git status --short
fi

~/.bin/git-pull-changes

Avoid running git annex get . (which takes a while on large repos) when nothing has changed: compare HEAD before and after the sync, and only try to get files if it moved,

#!/bin/bash

if [ -d '.git/annex/' ]; then
    oldHead=`git rev-parse HEAD`
    git annex sync;
    newHead=`git rev-parse HEAD`
    if [ "$oldHead" != "$newHead" ]; then
        git annex get . --fast  --quiet
        git annex sync
    else
        echo "No Change to Get..."
    fi
else
    git pull origin master
fi
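The snapshot-and-compare trick generalizes. Below is a hypothetical helper, `changed_after`, that records the output of a probe command, runs an action, and succeeds only if the probe's output changed; with `git rev-parse HEAD` as the probe and `git annex sync` as the action, it reproduces the check in the script above.

```shell
# changed_after PROBE ACTION...  (hypothetical helper)
# PROBE is a shell string whose output snapshots some state.
# Returns 0 if the state changed, 1 if not, 2 if PROBE or ACTION failed.
changed_after() {
    local probe=$1 before after
    shift
    before=$(eval "$probe") || return 2
    "$@" || return 2
    after=$(eval "$probe") || return 2
    [ "$before" != "$after" ]
}

# Example:
# if changed_after "git rev-parse HEAD" git annex sync; then
#     git annex get . --fast --quiet
#     git annex sync
# fi
```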

~/.bin/git-fast-push

Custom push command. For repositories with no changes it simply returns true; for repositories with uncommitted changes, new files or unpushed commits,

  • If acting on a regular git repo, it pushes changes to origin.
  • If acting on a git annex repo, it uploads changes and syncs with rsync.net.

#!/bin/bash

updateAnnexHost() {
    echo 'Updating Remote...'
    ORIGIN=`git config --get remote.origin.url`
    HOST=`echo "$ORIGIN" | grep -oiP '//.*?\/' | cut -d/ -f3`
    DIR="/${ORIGIN#*//*/}"
    echo "$HOST $DIR"
    ssh $HOST "cd $DIR;git annex sync"
}

hasNoChanges(){
    git diff-index --quiet HEAD --
}

hasNewFiles(){
    # any untracked, non-ignored files?
    [ -n "$(git ls-files --exclude-standard --others)" ]
}

isRepoAhead(){
    # any local commits not yet on origin/<current branch>?
    [ -n "$(git log "origin/$(git rev-parse --abbrev-ref HEAD)..HEAD")" ]
}

#handle direct annex repo
if [ "$(git config --get annex.direct)" = "true" ]; then
    oldHead=`git rev-parse HEAD`
    git annex add .
    git annex sync
    newHead=`git rev-parse HEAD`
    if [ "$oldHead" != "$newHead" ]; then
        if git config remote.depot.annex-uuid; then
            git annex copy --to depot --not --in depot
            git annex sync
        else
            git annex copy --to origin --not --in origin
            updateAnnexHost
        fi
    fi
    exit
fi

if ! hasNoChanges || hasNewFiles || isRepoAhead; then 
#handle indirect annex repo
    if [ -d '.git/annex/' ]; then    
        git annex add .
        git annex sync
        if git config remote.depot.annex-uuid; then
            git annex copy --to depot --not --in depot
            git annex sync
        else
            git annex copy --to origin --not --in origin
            updateAnnexHost
        fi
        exit
#handle plain git repo        
    else
        git push origin master
    fi
else
    true
fi
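updateAnnexHost relies on grep -P, a GNU extension that not every grep supports. A portable sketch of the same host/path split using only POSIX parameter expansion, assuming origin URLs of the form ssh://[user@]host/absolute/path; `split_ssh_url` is a hypothetical name, and like the original it keeps any user@ prefix in the host part.

```shell
# split_ssh_url URL  (hypothetical helper)
# Prints "HOST PATH" for URLs like ssh://[user@]host/absolute/path.
split_ssh_url() {
    local url=$1 rest host
    rest=${url#*://}        # drop the scheme, e.g. "ssh://"
    host=${rest%%/*}        # everything up to the first slash
    printf '%s %s\n' "$host" "/${rest#*/}"
}

# Example:
# set -- $(split_ssh_url "$(git config --get remote.origin.url)")
# ssh "$1" "cd '$2' && git annex sync"
```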

Webapp

Create the autostart file. Relative paths don't work, so tangle one file for each OS (Linux, OS X), then move the right one into place,

/home/nakkaya/annex/notes
/home/nakkaya/annex/music
/home/nakkaya/annex/wallet
/home/nakkaya/annex/photos
/home/nakkaya/annex/old-code
/home/nakkaya/annex/documents
/Users/nakkaya/annex/notes
/Users/nakkaya/annex/music
/Users/nakkaya/annex/wallet
/Users/nakkaya/annex/photos
/Users/nakkaya/annex/documents
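Since $HOME already expands to /home/nakkaya or /Users/nakkaya, one alternative to tangling two files is to generate the list on each machine. A minimal sketch, assuming the autostart file lives at ~/.config/git-annex/autostart (check where your git-annex version reads it from); `autostart_entries` is a hypothetical helper.

```shell
# autostart_entries HOME NAME...  (hypothetical helper)
# Prints one absolute annex path per line, e.g. /home/u/annex/notes.
autostart_entries() {
    local home=$1 name
    shift
    for name in "$@"; do
        printf '%s/annex/%s\n' "$home" "$name"
    done
}

# Example (list only the annexes present on this machine):
# autostart_entries "$HOME" notes music wallet photos old-code documents \
#     > ~/.config/git-annex/autostart
```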

Start the assistant and the webapp,

git annex assistant --autostart && nohup git annex webapp

Misc

Set up an encrypted annex directory remote,

git annex initremote mobile type=directory directory=/path/to/annex/repo/ encryption=hybrid keyid=ID embedcreds=yes

Set up an encrypted annex S3 remote,

export AWS_ACCESS_KEY_ID="KID"
export AWS_SECRET_ACCESS_KEY="SKEY"
git annex initremote cloud type=S3 encryption=hybrid keyid=ID embedcreds=yes
git setup-bitbucket
git config remote.origin.annex-ignore true

Set up an encrypted annex rsync remote,

git annex initremote depot type=rsync encryption=hybrid rsyncurl=rsync:annex/repo/ keyid=ID

/external

.mrconfig

I have one repository called kiler (Turkish for "basement") which holds around 6 TB of data (OS disks, VM images, tech talks, movies, TV shows, etc.) spread over eight 2 TB USB drives.

[DEFAULT]
git_sync = git annex-add-sync "$@"
git_drop = git git-drop-unused here 0

[/media/nakkaya/damla/kiler]

[/media/nakkaya/esra/kiler]

[/media/nakkaya/merve/kiler]

[/media/nakkaya/ozge/kiler]

[/media/nakkaya/sedef/kiler]

[/media/nakkaya/ebru/kiler]

[/media/nakkaya/yesim/kiler]

[/media/nakkaya/buse/kiler]

~/.bin/git-annex-add-sync

I just dump files into the repo on one of the disks and run mr sync, which adds the new files and syncs with the other drives,

#!/bin/bash

if [ -d '.git/annex/' ]; then
    oldHead=`git rev-parse HEAD`
    git annex add .
    git annex sync
    newHead=`git rev-parse HEAD`
    if [ "$oldHead" != "$newHead" ]; then
        # the remotes are local paths (the other disks), so sync each in place
        for remote in `git config --get-regexp '^remote\..*\.url$' | awk '{print $2}'`; do
            (cd "$remote" && git annex sync)
        done
    fi
fi
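Some of the eight disks are usually not mounted, so the cd above simply fails for them. A sketch that filters the remote list up front, keeping only URLs that are existing local directories; `local_remote_paths` is a hypothetical helper that reads the "remote.NAME.url VALUE" lines printed by git config --get-regexp.

```shell
# local_remote_paths  (hypothetical helper)
# Reads "remote.NAME.url VALUE" lines on stdin and prints each VALUE
# that is an existing directory (e.g. a currently mounted disk).
local_remote_paths() {
    local key url
    while read -r key url; do
        [ -d "$url" ] && printf '%s\n' "$url"
    done
    return 0
}

# Example:
# git config --get-regexp '^remote\..*\.url$' | local_remote_paths |
#     while read -r dir; do (cd "$dir" && git annex sync); done
```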

Misc

For my copy/paste pleasure, steps for adding a new disk.

# run from the root of the new disk, e.g. /media/nakkaya/new-disk-name/
git clone /media/nakkaya/esra/kiler/
cd kiler
git remote remove origin

DISKS="ebru damla esra merve ozge sedef yesim buse"

for i in $DISKS; do 
    git remote add $i /media/nakkaya/$i/kiler/
done

git annex init "new-disk-name"
git annex sync

# register the new disk as a remote on every existing disk
for i in $DISKS; do 
    cd /media/nakkaya/$i/kiler/
    git remote add "new-disk-name" /media/nakkaya/new-disk-name/kiler/
done
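The last loop changes into every disk in turn and leaves you in the last one. A dry-run sketch that instead prints the registration commands using git -C, so they can be reviewed before being piped to sh; `plan_new_disk` is a hypothetical helper and assumes disk names without quotes or spaces.

```shell
# plan_new_disk NEW DISK...  (hypothetical helper)
# Prints one "git -C ... remote add ..." line per existing DISK,
# registering /media/nakkaya/NEW/kiler as a remote named NEW.
plan_new_disk() {
    local new=$1 disk
    shift
    for disk in "$@"; do
        printf 'git -C "%s" remote add "%s" "%s"\n' \
            "/media/nakkaya/$disk/kiler" "$new" "/media/nakkaya/$new/kiler"
    done
}

# Example: plan_new_disk new-disk-name $DISKS        # review
#          plan_new_disk new-disk-name $DISKS | sh   # apply
```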

~/source/

~/source/.mrconfig

Git Repos,

[DEFAULT]
git_pull = git pull origin master
git_push = git fast-push
sync = true

[source/latte]
checkout = git clone 'ssh://11837@rsync/~/latte.git' 'latte'
skip=lazy

[source/alter-ego]
checkout = git clone 'git@github.com:nakkaya/alter-ego.git' 'alter-ego'
skip=lazy

[source/ardrone]
checkout = git clone 'git@github.com:nakkaya/ardrone.git' 'ardrone'
skip=lazy

[source/clodiuno]
checkout = git clone 'git@github.com:nakkaya/clodiuno.git' 'clodiuno'
skip=lazy

[source/easy-dns]
checkout = git clone 'git@github.com:nakkaya/easy-dns.git' 'easy-dns'
skip=lazy

[source/emacs]
checkout = git clone 'git@github.com:nakkaya/emacs.git' 'emacs'
           cd emacs
           git submodule init
           git submodule update

[source/inbox-feed]
checkout = git clone 'git@github.com:nakkaya/inbox-feed.git' 'inbox-feed'
skip=lazy

[source/nakkaya.com]
checkout = git clone 'git@github.com:nakkaya/nakkaya.com.git' 'nakkaya.com'
skip=lazy

[source/net-eval]
checkout = git clone 'git@github.com:nakkaya/net-eval.git' 'net-eval'
skip=lazy

[source/neu-islanders]
checkout = git clone 'ssh://11837@rsync/~/neu-islanders.git' 'neu-islanders'
skip=lazy

[source/pid]
checkout = git clone 'git@github.com:nakkaya/pid.git' 'pid'
skip=lazy

[source/static]
checkout = git clone 'git@github.com:nakkaya/static.git' 'static'
skip=lazy

[source/vector-2d]
checkout = git clone 'git@github.com:nakkaya/vector-2d.git' 'vector-2d'
skip=lazy

[source/vision]
checkout = git clone 'git@github.com:nakkaya/vision.git' 'vision'
skip=lazy

[source/doganilic.com]
checkout = git clone 'ssh://rsync/~/git-bare/doganilic.com.git' 'doganilic.com'
skip=lazy

[source/coin-trader]
checkout = git clone 'ssh://rsync/~/git-bare/coin-trader.git' 'coin-trader'
skip=lazy

[source/vehicle-tracking]
checkout = git clone 'git@gitlab.neu.edu.tr:nakkaya/vehicle-tracking.git' 'vehicle-tracking'
skip=lazy