Notes on Synchronization and Backup of $HOME using git, git-annex and mr
This is a dump of all git, git-annex and mr related configuration and scripts so I can edit them in one place. When tangled,
- All mr configs will go into their respective folders.
- All Bash scripts will go into ~/.bin/ folder.
Repository structure,
- ~/annex/ - git-annex repositories that are synced between computers.
- ~/source/ - Git repos.
- /external - Single git-annex repo spread over multiple USB drives, working as a poor man's RAID.
~/annex/
Instead of having a single large annex folder synced across machines, I have split all files into 6 annexes,
- documents
- music
- notes
- old-code
- photos
This scheme makes git-annex much faster. At first I was reluctant to go with it: instead of pushing and pulling a single annex, I now have to deal with 6. Then I found out about mr, which lets you run commands on a collection of repositories, so even with 6 repos a single command pushes or pulls all of them.
These annexes are shared between 3 computers (one with a full copy of all repos and two with partial copies), all behind NAT, so all clients dump data to rsync.net (GPG encrypted with separate keys) and sync changes with a repo inside rsync.net.
Whenever I make changes to any of the repos, I just run mr push. It iterates over the repositories with changes (new files/deletes/renames), uploads the changes and syncs with rsync.net. Then when I switch machines I just run mr pull, which downloads all changes.
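The core idea behind mr, running one command in every repository of a collection, can be sketched in plain Bash. This is only an illustration, not how mr is implemented; for_each_repo is a hypothetical helper that reads repository paths from stdin.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of mr: run a command inside each directory
# whose path arrives on stdin (one per line).
for_each_repo() {
  local repo status=0
  while IFS= read -r repo; do
    # Skip paths that are not checked out on this machine.
    [ -d "$repo" ] || { echo "skip: $repo" >&2; continue; }
    echo "== $repo"
    # Run in a subshell so the cd does not leak into the caller; redirect
    # stdin so the command cannot swallow the remaining paths.
    ( cd "$repo" && "$@" </dev/null ) || status=1
  done
  return "$status"
}

# Example (assumes the repositories exist):
#   printf '%s\n' ~/annex/notes ~/annex/music | for_each_repo git annex sync
```

mr adds what this sketch leaves out: per-repository overrides from .mrconfig, parallel jobs, and lazy checkouts.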
~/.mrconfig
include = cat ~/annex/.mrconfig-annex ~/source/.mrconfig-source
jobs = 3
~/annex/.mrconfig
[DEFAULT]
git_sync = git annex sync "$@"
git_push = git fast-push
git_pull = git annex sync
git_status = git na-status
git_drop = git git-drop-unused here 0
git_dead = git annex dead "$@"
git_upload = git annex sync; git annex copy --to depot --not --in depot ; git annex sync
git_download = git annex sync; git annex get --quiet --fast; git annex sync
git_missing = git annex find . --not --in depot
git_ffsck = git annex fsck --fast --from depot
lib =
  enableCloud() {
    git config remote.origin.annex-ignore true
    git annex init "`hostname`"
    git annex enableremote depot
  }

[annex/documents]
checkout =
  git clone 'ssh://rsync/~/git-bare/documents.git' 'documents'
  cd documents/
  enableCloud
skip = lazy
drop =
  git-drop-unused.hy here 0
  git-drop-unused.hy depot 30

[annex/photos]
checkout =
  git clone 'ssh://rsync/~/git-bare/photos.git' 'photos'
  cd photos/
  enableCloud
skip = lazy
drop =
  git-drop-unused.hy here 7
  git-drop-unused.hy depot 120

[annex/old-code]
checkout =
  git clone 'ssh://rsync/~/git-bare/old-code.git' 'old-code'
  cd old-code/
  enableCloud
skip = lazy
drop =
  git-drop-unused.hy here 0
  git-drop-unused.hy depot 30

[annex/music]
checkout =
  git clone 'ssh://rsync/~/git-bare/music.git' 'music'
  cd music/
  enableCloud
skip = lazy
drop =
  git-drop-unused.hy here 0
  git-drop-unused.hy depot 15

[annex/notes]
checkout =
  git clone 'ssh://rsync/~/git-bare/notes.git' 'notes'
  cd notes/
  enableCloud
sync =
  git annex sync
  git annex get .
  git annex copy --to depot --not --in depot
  git annex sync
pull = mr sync
drop =
  git-drop-unused.hy here 7
  git-drop-unused.hy depot 180

[annex/kiler]
pull = mr sync
push = mr sync
copy =
  git annex add .
  git commit -m 'Auto Add' || true
  git annex copy . --to yesim --auto
  git annex copy . --to buse --auto
  git annex sync
drop = git-drop-unused.hy here 0
~/.bin/git-drop-unused.hy
Drop all unused files by date,
#!/usr/bin/env hy

(import [sh [grep git perl awk ErrorReturnCode]]
        [re [split]]
        [datetime [datetime date]]
        [sys])

(def remote (if (>= (len sys.argv) 2)
              (second sys.argv)
              "here"))

(def drop-age (if (= (len sys.argv) 3)
                (int (nth sys.argv 2))
                180))

(defn unused-files []
  (let [[files (try
                (-> (.annex git "unused" "--from" remote)
                    (perl "-ne" "print if /^ [0-9]+ .*/")
                    str)
                (catch [e ErrorReturnCode] ""))]]
    (->> files
         (split "\n")
         (map (fn [x]
                (->> (.strip x)
                     (split " ")
                     (take 2)
                     (map (fn [x] (.strip x))))))
         (filter (fn [x] (= (len x) 2)))
         list)))

(defn last-seen [file]
  (let [[key (second file)]]
    (->> (git "--no-pager" "log" "-1" "-S" key "--pretty=format:%at")
         str
         (split "\n")
         (map (fn [x] (.fromtimestamp datetime (float x))))
         first)))

(defn age [file]
  (let [[delta (- (.today datetime) (last-seen file))]]
    delta.days))

(print "Dropping " remote)

(for [file (unused-files)]
  (let [[id (first file)]
        [file-age (age file)]]
    (if (>= file-age drop-age)
      (do
        (print "Id " id " age " file-age " days...")
        (if (= remote "here")
          (.annex git "dropunused" "--force" (str id))
          (.annex git "dropunused" "--force" "--from" remote (str id)))))))
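The script above boils down to three steps: pick the "number key" rows out of the git annex unused listing, turn the last-seen commit timestamp into an age in days, and compare that age with the threshold. A minimal Bash sketch of those steps follows. The function names are hypothetical and this is not a port of the Hy script. The sample keys in the usage comment are made up.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating the drop-by-age logic.

# Keep only rows whose first column is a number (the unused-file id) and
# print "id key"; header and status lines are ignored.
parse_unused() {
  awk '$1 ~ /^[0-9]+$/ && NF >= 2 { print $1, $2 }'
}

# age_in_days LAST_SEEN_EPOCH [NOW_EPOCH]
# Whole days elapsed since the given Unix timestamp.
age_in_days() {
  local last_seen=$1 now=${2:-$(date +%s)}
  echo $(( (now - last_seen) / 86400 ))
}

# old_enough AGE_DAYS MIN_AGE_DAYS
# Succeeds when the file has been unused for at least MIN_AGE_DAYS.
old_enough() {
  [ "$1" -ge "$2" ]
}

# Example with made-up keys:
#   printf '    1  SHA256E-s10--abc\n    2  SHA256E-s20--def\n' | parse_unused
```

Passing the current time as an optional second argument keeps age_in_days easy to check by hand.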
~/.bin/git-na-status
When running mr status, avoid running git status on direct mode annex repos; otherwise you get a bunch of type change entries.
#!/bin/bash

if [ -d ".git/annex/" ]; then
    if `git config --get annex.direct`; then
        git annex status
    else
        git status --short
    fi
else
    git status --short
fi
~/.bin/git-pull-changes
Try to avoid running git annex get . (which takes a while on large repos) by comparing HEAD before and after the sync, and only try to get files when there are changes.
#!/bin/bash

if [ -d '.git/annex/' ]; then
    oldHead=`git rev-parse HEAD`
    git annex sync;
    newHead=`git rev-parse HEAD`
    if [ "$oldHead" != "$newHead" ]; then
        git annex get . --fast --quiet
        git annex sync
    else
        echo "No Change to Get..."
    fi
else
    git pull origin master
fi
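The before/after comparison used here is a general pattern: record some state, run the sync, and only do the expensive step if the state moved. A small sketch of that pattern, with the probe, sync and action passed in as command names; if_state_changed is a hypothetical helper, not something git or git-annex provides.

```shell
#!/usr/bin/env bash
# Hypothetical helper: if_state_changed PROBE SYNC ACTION
#   PROBE  prints the current state (for example a commit id)
#   SYNC   may or may not change that state
#   ACTION is the expensive step, run only when the state changed
if_state_changed() {
  local probe=$1 sync=$2 action=$3 before after
  before=$("$probe")
  "$sync"
  after=$("$probe")
  if [ "$before" != "$after" ]; then
    "$action"
  else
    echo "No change to get"
  fi
}

# Example wiring for an annex repo (assumes git-annex is installed):
#   head()  { git rev-parse HEAD; }
#   sync()  { git annex sync; }
#   fetch() { git annex get . --fast --quiet; }
#   if_state_changed head sync fetch
```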
~/.bin/git-fast-push
Custom push command. For repositories with no changes it simply returns true; for repositories with changes or new files,
- If acting on a regular git repo, pushes changes to origin.
- If acting on a git annex repo, uploads changes and syncs with rsync.net.
#!/bin/bash

updateAnnexHost() {
    echo 'Updating Remote...'
    ORIGIN=`git config --get remote.origin.url`
    HOST=`echo "$ORIGIN" | grep -oiP '//.*?\/' | cut -d/ -f3`
    DIR="/${ORIGIN#*//*/}"
    echo "$HOST $DIR"
    ssh $HOST "cd $DIR;git annex sync"
}

hasNoChanges(){
    git diff-index --quiet HEAD --
}

hasNewFiles(){
    if [ `git ls-files --exclude-standard --others| wc -l` != 0 ]; then
        true
    else
        false
    fi
}

isRepoAhead(){
    if [ `git log origin/$(git branch | grep '*' | cut -d' ' -f2)..HEAD | wc -l` != 0 ]; then
        true
    else
        false
    fi
}

#handle direct annex repo
if `git config --get annex.direct`; then
    oldHead=`git rev-parse HEAD`
    git annex add .
    git annex sync
    newHead=`git rev-parse HEAD`
    if [ "$oldHead" != "$newHead" ]; then
        if git config remote.depot.annex-uuid; then
            git annex copy --to depot --not --in depot
            git annex sync
        else
            git annex copy --to origin --not --in origin
            updateAnnexHost
        fi
    fi
    exit
fi

if ! hasNoChanges || hasNewFiles || isRepoAhead; then
    #handle indirect annex repo
    if [ -d '.git/annex/' ]; then
        git annex add .
        git annex sync
        if git config remote.depot.annex-uuid; then
            git annex copy --to depot --not --in depot
            git annex sync
        else
            git annex copy --to origin --not --in origin
            updateAnnexHost
        fi
        exit
    #handle plain git repo
    else
        git push origin master
    fi
else
    true
fi
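updateAnnexHost splits the origin URL into a host and a directory with grep -P, which is not available in every grep (the stock one on OS X, for instance). As an alternative sketch, the same split can be done with Bash parameter expansion alone. parse_ssh_url is a hypothetical name, and it assumes URLs of the scheme://host/path form used in the mr configs here. Unlike the script above, it does not prepend a slash to the path.

```shell
#!/usr/bin/env bash
# Hypothetical helper: parse_ssh_url URL
# Prints "HOST PATH" for a URL shaped like scheme://host/path.
parse_ssh_url() {
  local url=$1 rest host path
  rest=${url#*://}    # drop the scheme, e.g. "ssh://"
  host=${rest%%/*}    # everything up to the first slash
  path=${rest#*/}     # everything after the first slash
  printf '%s %s\n' "$host" "$path"
}

# Example:
#   parse_ssh_url 'ssh://rsync/~/git-bare/documents.git'
```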
Webapp
Create the autostart file. Relative paths don't work, so tangle one file for each OS (Linux, OS X), then mv the right one to the correct location,
/home/nakkaya/annex/notes
/home/nakkaya/annex/music
/home/nakkaya/annex/wallet
/home/nakkaya/annex/photos
/home/nakkaya/annex/old-code
/home/nakkaya/annex/documents

/Users/nakkaya/annex/notes
/Users/nakkaya/annex/music
/Users/nakkaya/annex/wallet
/Users/nakkaya/annex/photos
/Users/nakkaya/annex/documents
Start assistant and webapp,
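Since the two lists differ mainly in the home directory prefix, one way to avoid keeping two files would be to generate the absolute paths. A sketch under that assumption; autostart_entries is a hypothetical helper, and the repo names are passed in explicitly because the two machines do not carry the same set.

```shell
#!/usr/bin/env bash
# Hypothetical helper: autostart_entries HOME_DIR NAME...
# Prints one absolute annex path per repository name.
autostart_entries() {
  local home=$1 name
  shift
  for name in "$@"; do
    printf '%s/annex/%s\n' "$home" "$name"
  done
}

# Example:
#   autostart_entries "$HOME" notes music wallet photos documents
```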
git annex assistant --autostart && nohup git annex webapp
Misc
Set up encrypted annex directory remote,
git annex initremote mobile type=directory directory=/path/to/annex/repo/ encryption=hybrid keyid=ID embedcreds=yes
Set up encrypted annex S3 remote,
export AWS_ACCESS_KEY_ID="KID"
export AWS_SECRET_ACCESS_KEY="SKEY"
git annex initremote cloud type=S3 encryption=hybrid keyid=ID embedcreds=yes
git setup-bitbucket
git config remote.origin.annex-ignore true
Set up encrypted annex rsync remote,
git annex initremote depot type=rsync encryption=hybrid rsyncurl=rsync:annex/repo/ keyid=ID
/external
.mrconfig
I have one repository called kiler (means basement in Turkish) which holds around 6 TB of data (OS Disks, VM Images, Tech Talks, Movies, TV Shows etc.) spread over 8x2 TB USB drives.
[DEFAULT]
git_sync = git annex-add-sync "$@"
git_drop = git git-drop-unused here 0

[/media/nakkaya/damla/kiler]

[/media/nakkaya/esra/kiler]

[/media/nakkaya/merve/kiler]

[/media/nakkaya/ozge/kiler]

[/media/nakkaya/sedef/kiler]

[/media/nakkaya/ebru/kiler]

[/media/nakkaya/yesim/kiler]

[/media/nakkaya/buse/kiler]
~/.bin/git-annex-add-sync
I just dump files into the repo on one of the disks and run mr sync which will add the file and sync with other drives,
#!/bin/bash

if [ -d '.git/annex/' ]; then
    oldHead=`git rev-parse HEAD`
    git annex add .;
    git annex sync
    newHead=`git rev-parse HEAD`
    if [ "$oldHead" != "$newHead" ]; then
        for remote in ` git config --get-regexp remote.*.url | awk '{print $2}'`; do
            (cd $remote && git annex sync)
        done
    else
        true
    fi
else
    true
fi
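The loop above takes the second awk field of each "remote.NAME.url PATH" line, which works for the mount points used here but would cut off a path containing a space. A slightly more defensive sketch reads the lines with read instead; remote_paths is a hypothetical helper, and the "my disk" path in the usage comment is made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "KEY VALUE" lines, as printed by
# `git config --get-regexp`, from stdin and print only the value,
# keeping any spaces inside it.
remote_paths() {
  local key value
  while read -r key value; do
    if [ -n "$value" ]; then
      printf '%s\n' "$value"
    fi
  done
}

# Example with a made-up path:
#   printf 'remote.usb.url /media/nakkaya/my disk/kiler/\n' | remote_paths
#
# In the script, the loop could then be written as:
#   git config --get-regexp 'remote\..*\.url' | remote_paths |
#     while IFS= read -r remote; do (cd "$remote" && git annex sync </dev/null); done
```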
Misc
For my copy/paste pleasure, steps for adding a new disk.
git clone /media/nakkaya/esra/kiler/
# enter the fresh clone before touching its remotes
cd kiler/
git remote remove origin

DISKS="ebru damla esra merve ozge sedef yesim buse"

for i in $DISKS; do
    git remote add $i /media/nakkaya/$i/kiler/
done

git annex init "new-disk-name"
git annex sync

for i in $DISKS; do
    cd /media/nakkaya/$i/kiler/
    git remote add "new-disk-name" /media/nakkaya/new-disk-name/kiler/
done
~/source/
~/source/.mrconfig
Git Repos,
[DEFAULT]
git_pull = git pull origin master
git_push = git fast-push
sync = true

[source/latte]
checkout = git clone 'ssh://11837@rsync/~/latte.git' 'latte'
skip=lazy

[source/alter-ego]
checkout = git clone 'git@github.com:nakkaya/alter-ego.git' 'alter-ego'
skip=lazy

[source/ardrone]
checkout = git clone 'git@github.com:nakkaya/ardrone.git' 'ardrone'
skip=lazy

[source/clodiuno]
checkout = git clone 'git@github.com:nakkaya/clodiuno.git' 'clodiuno'
skip=lazy

[source/easy-dns]
checkout = git clone 'git@github.com:nakkaya/easy-dns.git' 'easy-dns'
skip=lazy

[source/emacs]
checkout =
  git clone 'git@github.com:nakkaya/emacs.git' 'emacs'
  cd emacs
  git submodule init
  git submodule update

[source/inbox-feed]
checkout = git clone 'git@github.com:nakkaya/inbox-feed.git' 'inbox-feed'
skip=lazy

[source/nakkaya.com]
checkout = git clone 'git@github.com:nakkaya/nakkaya.com.git' 'nakkaya.com'
skip=lazy

[source/net-eval]
checkout = git clone 'git@github.com:nakkaya/net-eval.git' 'net-eval'
skip=lazy

[source/neu-islanders]
checkout = git clone 'ssh://11837@rsync/~/neu-islanders.git' 'neu-islanders'
skip=lazy

[source/pid]
checkout = git clone 'git@github.com:nakkaya/pid.git' 'pid'
skip=lazy

[source/static]
checkout = git clone 'git@github.com:nakkaya/static.git' 'static'
skip=lazy

[source/vector-2d]
checkout = git clone 'git@github.com:nakkaya/vector-2d.git' 'vector-2d'
skip=lazy

[source/vision]
checkout = git clone 'git@github.com:nakkaya/vision.git' 'vision'
skip=lazy

[source/doganilic.com]
checkout = git clone 'ssh://rsync/~/git-bare/doganilic.com.git' 'doganilic.com'
skip=lazy

[source/coin-trader]
checkout = git clone 'ssh://rsync/~/git-bare/coin-trader.git' 'coin-trader'
skip=lazy

[source/vehicle-tracking]
checkout = git clone 'git@gitlab.neu.edu.tr:nakkaya/vehicle-tracking.git' 'vehicle-tracking'
skip=lazy