Installing and Using Tarsnap for Fun and Profit

Well, “profit” is a strong word. Maybe “not losing money” would be a better description. Perhaps even “not screwing over readers.”

I back up my personal stuff with a combination of snapshots, tarballs, rsync, and sneakernet. This is fine for my email and my personal web site. Chances are, if all four of my backup sites are simultaneously destroyed, I won’t care.

A couple years ago I opened my own ecommerce site, so I could sell my self-published books directly to readers. For the record, I didn’t expect Tilted Windmill Press direct sales to actually, y’know, go anywhere. I didn’t expect that people would buy books directly from me when they could just go to their favorite ebookstore, hit a button, and have the book miraculously appear on their ereader.

I was wrong. People buy books from me. Once every month or two, someone even throws a few bucks in the tip jar, or flat-out overpays for their books. I am pleasantly surprised.

So: I was wrong about self-publishing, and now I was wrong about author direct sales. Pessimism is grand, because you’re either correct or you get a pleasant surprise. Thank you all.

But now I find myself in a position where I actually have commercially valuable data, and I need to back it up. Like a real business. I need offsite backups. I need them automated. And I need to be able to recover them, so that people who have bought my books can continue to download them, in the off chance that the Detroit area is firebombed off the Earth while I’m at BSDCan.

So it’s time for Tarsnap.

Why Tarsnap?

  • It works very much like tar, so I don’t have to learn any new command-line arguments. (If you’re not familiar with tar, you need to be.)
  • The terms of service are readable by human beings and more reasonable than other backup services.
  • The code is open and auditable.
  • When Tarsnap’s author screws up, he admits it and handles it correctly.
  • It’s cheap. Any backup priced in picodollars gets my attention.

    I also see the author regularly at regular BSD conferences. I can slap him in person if he does anything truly daft.

    Tarsnap has a quick Getting Started page. We’ll do the easy things first. Sign up for a Tarsnap account. Once your account is active, put some money in it–$5 will suffice.

    Now let’s check your prerequisites. You need:

  • GnuPG

    BSD systems come with everything else you need.

    Linux users must install

  • a compiler, like gcc or clang
  • make
  • OpenSSL (including header files)
  • zlib (including header files)
  • System header files
  • OpenSSL header files
  • The ext2fs/ext2_fs.h (not the linux/ext2_fs.h header)

    The Tarsnap download page lists specific packages for Debian-based and Red Hat-based Linuxes.

    Go to the download page and get both the source code and the signed hash file. Tarsnap is only available as source code, so that you can verify the code integrity yourself. So let’s do that.

    Start by using GnuPG to verify the integrity of the Tarsnap code. If you’re not familiar with GnuPG and OpenPGP, some daftie wrote a whole book on PGP & GPG. Once you install GnuPG, run the gpg command to get the configuration files.

    # gpg
    gpg: directory `/home/mwlucas/.gnupg' created
    gpg: new configuration file `/home/mwlucas/.gnupg/gpg.conf' created
    gpg: WARNING: options in `/home/mwlucas/.gnupg/gpg.conf' are not yet active during this run
    gpg: keyring `/home/mwlucas/.gnupg/secring.gpg' created
    gpg: keyring `/home/mwlucas/.gnupg/pubring.gpg' created
    gpg: Go ahead and type your message ...
    ^C
    gpg: signal Interrupt caught ... exiting

    Hit ^C. I just wanted the configuration and key files.

    Now edit $HOME/.gnupg/gpg.conf. Set the following options.

    keyserver hkp://keys.gnupg.net
    keyserver-options auto-key-retrieve

    See if our GPG client can verify the signature file tarsnap-sigs-1.0.35.asc.

    # gpg --decrypt tarsnap-sigs-1.0.35.asc
    SHA256 (tarsnap-autoconf-1.0.35.tgz) = 6c9f6756bc43bc225b842f7e3a0ec7204e0cf606e10559d27704e1cc33098c9a
    gpg: Signature made Sun Feb 16 23:20:35 2014 EST using RSA key ID E5979DF7
    gpg: Good signature from "Tarsnap source code signing key (Colin Percival) " [unknown]
    gpg: WARNING: This key is not certified with a trusted signature!
    gpg: There is no indication that the signature belongs to the owner.
    Primary key fingerprint: 634B 377B 46EB 990B 58FF EB5A C8BF 43BA E597 9DF7

    Some interesting things here. The most important line here is the statement ‘Good signature from “Tarsnap source code signing key”.’ Your GPG program grabbed the source code signing key from a public key server and used it to verify that the signature file is not tampered with.

    As you’re new to OpenPGP, this is all you can do. You’re not attached to the Web of Trust, so you can’t verify the signature chain. (I do recommend that you get an OpenPGP key and collect a few signatures, so you can verify code signatures if nothing else.)

    Now that we know the signature file is good, we can use the cryptographic hash in the file to validate that the tarsnap code we downloaded is what the Tarsnap author intended. Near the top of the signature file you’ll see the line:

    SHA256 (tarsnap-autoconf-1.0.35.tgz) = 6c9f6756bc43bc225b842f7e3a0ec7204e0cf606e10559d27704e1cc33098c9a

    Use the sha256(1) program (or sha256sum, or shasum -a 256, or whatever your particular Unix calls the SHA-256 checksum generator) to verify the source code’s integrity.

    # sha256 tarsnap-autoconf-1.0.35.tgz
    SHA256 (tarsnap-autoconf-1.0.35.tgz) = 6c9f6756bc43bc225b842f7e3a0ec7204e0cf606e10559d27704e1cc33098c9a

    The checksum in the signature file and the checksum you compute match. You have valid source code, and can proceed.

    Extract the source code.

    # tar -xf tarsnap-autoconf-1.0.35.tgz
    # cd tarsnap-autoconf-1.0.35
    # ./configure
    ...
    configure: creating ./config.status
    config.status: creating Makefile
    config.status: creating config.h
    config.status: executing depfiles commands
    #

    If the configure script ends any way other than this, you’re on Linux and didn’t install the necessary development packages. The libraries alone won’t suffice, you must have the development versions.

    If configure completed, run

    # make all install clean

    Tarsnap is now ready to use.

    Start by creating a Tarsnap key for this machine and attaching it to your Tarsnap account. Here I create a key for my machine www.

    # tarsnap-keygen –keyfile /root/tarsnap.key –user mwlucas@michaelwlucas.com –machine pestilence
    Enter tarsnap account password:
    #

    I now have a tarsnap key file. /root/tarsnap.key looks like this:

    # START OF TARSNAP KEY FILE
    dGFyc25hcAAAAAAAAAAzY6MEAAAAAAEAALG8Ix2yYMu+TN6Pj7td2EhjYlGCGrRRknJQ8AeY
    uJsctXIEfurQCOQN5eZFLi8HSCCLGHCMRpM40E6Jc6rJExcPLYkVQAJmd6auGKMWTb5j9gOr
    SeCCEsUj3GzcTaDCLsg/O4dYjl6vb/he9bOkX6NbPomygOpBHqcMOUIBm2eyuOvJ1d9R+oVv
    ...

    This machine is now registered and ready to go.

    This key is important. If your machine is destroyed and you need access to your remote backup, you will need this key! Before you proceed, back it up somewhere other than the machine you’re backing up. There’s lots of advice out there on how to back up private keys. Follow it.

    Now let’s store some backups in the cloud. I’m going to play with my /etc/ directory, because it’s less than 3MB. Start by backing up a single directory.

    # tarsnap -c -f wwwetctest etc/
    Directory /usr/local/tarsnap-cache created for "--cachedir /usr/local/tarsnap-cache"
    Total size Compressed size
    All archives 1996713 382896
    (unique data) 1946025 366495
    This archive 1996713 382896
    New data 1946025 366495

    Nothing seems to happen on the local system. Let’s check and be sure that there’s a backup out in the cloud:

    # tarsnap --list-archives
    wwwetctest

    I then went into /etc and did some cleanup, removing files that shouldn’t have ever been there. This stuff grows in /etc on any long-lived system.

    # tarsnap -c -f wwwetctest-20140716-1508 etc/
    Total size Compressed size
    All archives 3986206 765446
    (unique data) 2120798 403833
    This archive 1989493 382550
    New data 174773 37338

    # tarsnap --list-archives
    wwwetctest
    wwwetctest-20140716-1508

    Note that the compressed size of this archive is much smaller than the first one. Tarsnap only stored the diffs between the two backups.

    If you want more detail about your listed backups, add -v to see the creation date. Add a second -v to see the command used to create the archive.

    # tarsnap --list-archives -vv
    wwwetctest 2014-07-16 15:02:41 tarsnap -c -f wwwetctest etc/
    wwwetctest-20140716-1508 2014-07-16 15:09:38 tarsnap -c -f wwwetctest-20140716-1508 etc/

    Let’s pretend that I need a copy of my backup. Here I extract the newest backup into /tmp/etc.

    # cd /tmp
    # tarsnap -x -f wwwetctest-20140716-1508

    Just for my own amusement, I’ll extract the older backup as well and compare the contents.

    # cd /tmp
    # tarsnap -x -f wwwetctest

    The files I removed during my cleanup are now present.

    What about rotating backups? I now have two backups. The second one is a differential backup against the first. If I blow away the first backup, what happens to the older backup?

    # tarsnap -d -f wwwetctest
    Total size Compressed size
    All archives 1989493 382550
    (unique data) 1938805 366149
    This archive 1996713 382896
    Deleted data 181993 37684

    It doesn’t look like it deleted very much data. And indeed, a check of archive shows that all my files are there.

    And now, the hard part: what do I need to back up? That’s a whole separate class of problem…

  • 4 comments to Installing and Using Tarsnap for Fun and Profit

    • Michael, on OpenBSD there is a port of tarsnap. It simplifies installation, if one trusts the ports tree (which uses a SHA256 checksum for the source and is available via public AnonCVS servers that provide fingerprints).

      http://ports.su/sysutils/tarsnap

    • sean

      Josh, Michael,

      Tarsnap is also in ports and pkgs for FreeBSD…but I agree it’s a good practice to verify the signature anyway.

    • sean

      Hi Michael,

      I noticed a small error on your site, maybe it’s intentional to save space.

      The keygen command actually requires double dashes on all of the arguments:

      usage: tarsnap-keygen –keyfile key-file –user user-name –machine machine-name

      So yours would be this:

      #tarsnap-keygen –-keyfile /root/tarsnap.key –-user mwlucas@michaelwlucas.com –-machine pestilence

      Best,
      sean

    • Sean — aargh! WordPress smoothed away my nice double dashes. Thanks for pointing that out, I’ll figure out how to make WP show what I typed.