Automatic Amazon s3 Backups on Ubuntu / Debian


VPS (Virtual Private Server) hosting is the next level up from shared hosting. You get a lot more server for your money, but the catch is that you lose the convenience of shared hosting.

One of the most important things you need to set up on your VPS is automatic backups. If your VPS crashes and you don’t have backups ready, your data, and with it your entire blogging history, is gone in an instant.

This article isn’t for everyone. It assumes two things:

  • You’ve already set up your VPS (If you’re on shared hosting, have a look at this automatic database backup post instead).
  • You’re comfortable with the command line (If you didn’t set up your VPS yourself, I highly recommend you don’t fiddle around with anything here unless you’re certain of what you’re doing!)

The last thing to note is that I’ve done all of this on Ubuntu, though it should work just as well on Debian. The software itself also runs on other Linux distros, but I haven’t tried them, so you may need to adapt certain steps.

If both of those are okay with you though, let’s carry on and set up our ideal backup system!

An Overview of Our Setup

Let’s start by taking a step back and planning out how our backup system will work.

  • Every day, at a time you set, the backup process begins.
  • First, a backup of your database will be taken and saved on the server.
  • Next, the backup program (Duplicity) will connect to your Amazon s3 account and make a full backup of your site, if one is due.
  • Otherwise, it will back up only what has changed since the last backup (i.e. an incremental backup).
  • Before anything is sent, your files will be encrypted so that no one but you can read them.


One thing to note is that we will work through this as though we are backing up just one site. You can of course apply this to as many sites, databases, and directories on your server as you like.

Step 1 – Set Up Encryption

To set this up, we’ll actually work backwards through the steps above (so you’ll be able to test each one before moving on to the next).

The encryption tool we’ll use is called GPG (GNU Privacy Guard). GPG works by creating two keys:

  • Public key – Used to encrypt your data. It doesn’t matter who sees this.
  • Private key – Used to decrypt your data. This file must be kept safe and only seen by you.

The two keys form a pair: files encrypted with the public key can only be decrypted with the matching private key. If you lose your private key, you will never get your files back.

So, let’s get to it!

  • In your command line (e.g. PuTTY on Windows, or Terminal on Linux/Mac), type the following:
gpg --gen-key

You’ll be walked through a few options for your key. Select the following:

  • Key type – DSA and Elgamal (the default in the GPG version I used; newer versions may offer a different default)
  • Key size – 2048 bits (again, the default in that version)
  • Expiration – Do not expire (Not necessary for what we’re doing as you won’t be sharing the public key with anyone).
  • Name, Comment and Email – You can enter whatever you like here, but do take a note of them somewhere. They’ll help you remember which key is which if you create multiple keys later.
  • Passphrase – Make sure you remember whatever you type; there’s no way to get it back if you forget!
  • When it says it needs to generate random bytes (“entropy”) to make the key, it means the server needs some activity to gather randomness from. Just refresh a webpage on the server a few times, or run some commands in another terminal window.

When your key is made, you’ll see a few lines about it. The important one looks like this:

pub   2048D/3514FEC1 2010-03-05

The 3514FEC1 is the part you need. That’s your key ID, and you’ll need it for later!

If you do end up forgetting your key ID though, it’s easy enough to get that back. Just type:

gpg --list-keys

That’s our encryption set up and ready to use! If you’d like to learn more about what else you can do with GPG, have a look at this GPG quick start guide.
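If you’d rather not copy the ID out by eye (newer GPG versions also lay out their key listings differently from the pub line above), GPG can print a machine-readable listing with --with-colons. Here’s a small sketch that pulls the short key IDs out of it. The function name is my own, and it assumes the standard colon format, where field 5 of each pub record is the 16-digit long key ID and the short ID is its last 8 digits:

```shell
#!/bin/sh
# Print the short (8 hex digit) ID of each public key in the keyring.
# Reads the output of `gpg --list-keys --with-colons` on stdin. In that
# format, field 1 is the record type and field 5 of a "pub" record is the
# 16-hex-digit long key ID; the short ID is its last 8 digits.
# Other record types (tru, uid, sub, ...) are ignored.
short_key_ids() {
    awk -F: '$1 == "pub" { print substr($5, length($5) - 7) }'
}

# Example on a real keyring:
#   gpg --list-keys --with-colons | short_key_ids
```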

Step 2 – Sign Up for Amazon s3

I should start by saying that while s3 is not a free service, it’s incredibly inexpensive! My bill for the last month was $2.60, and that was with backing up a lot more than just this site! It’s the cheapest peace of mind ever.

Start off by signing up at Amazon Web Services (not linked to your regular Amazon account). They have a few different services, but the only one we want at the minute is s3 (Simple Storage Service).

When you’ve registered, log in to your account and click the “Security Credentials” link.

On this page, you’ll need to create a new access key. When you’ve made it, note down your Access Key ID and your Secret Access Key (click the “Show” link to see the secret one).


If you’re a Firefox user, you should also install the s3Fox plugin. It gives you an extremely easy way of seeing what’s in your s3 account, and even uploading and downloading files. It’s not essential, but definitely a handy tool!

Step 3 – Install Duplicity

The backup system is fairly easy to put in place, all thanks to the program we’ll be using: Duplicity.

Let’s start by installing Duplicity.

sudo apt-get install duplicity

With it installed, we just have to create a script that tells it how to run. Duplicity accepts a wide range of commands and options, and you can read more about them all in its documentation.
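Before writing that script, it’s worth a quick check that both programs it relies on are actually installed. A minimal sketch (require_commands is just an illustrative name):

```shell
#!/bin/sh
# Check that every named program can be found on the PATH.
# Prints "missing: NAME" to stderr for each one that can't be found,
# and returns 0 only if all of them are present.
require_commands() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" > /dev/null 2>&1; then
            echo "missing: $cmd" >&2
            missing=1
        fi
    done
    return $missing
}

# Example:
#   require_commands gpg duplicity || exit 1
```

Running duplicity --version will also show which version your distribution installed; packaged versions can lag well behind the latest release.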

Step 4 – Our Duplicity Backup Script

Here is how we want to set it up:

  • Encrypt with our GPG key.
  • Back up to an Amazon s3 “bucket” (a bucket on s3 is like a folder).
  • Make an incremental backup every day.
  • Make a full backup if it’s been more than 2 weeks since our last full backup.
  • Remove backups older than one month.

You can change any of these parameters; you’ll see where in the script.

With your favorite text editor (I use Nano), create a new file and paste the following into it:

#!/bin/sh
export PASSPHRASE=YOUR_GPG_PASSWORD
export AWS_ACCESS_KEY_ID=YOUR_AMAZON_KEY
export AWS_SECRET_ACCESS_KEY=YOUR_AMAZON_SECRET_KEY

# Delete backups older than 1 month
# (--force is needed to actually delete them; without it Duplicity only lists them)
duplicity remove-older-than 1M --force --encrypt-key=YOUR_GPG_KEY --sign-key=YOUR_GPG_KEY s3+http://BUCKETNAME

# Make the regular backup
# This will be a full backup if the last full backup is older than 14 days
duplicity --full-if-older-than 14D --encrypt-key=YOUR_GPG_KEY --sign-key=YOUR_GPG_KEY /DIRECTORY/TO/BACKUP/ s3+http://BUCKETNAME

# Clear the secrets from the environment
unset PASSPHRASE AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY

You’ll need to replace some placeholders in that script with your own details. Replace everything after the = on lines 2-4 (your GPG passphrase, your Amazon Access Key ID and your Secret Access Key), and the 4 instances of YOUR_GPG_KEY further down (your key ID from Step 1).

Also, replace the 2 instances of BUCKETNAME with the name of your bucket on s3 (don’t worry if it doesn’t exist yet; Duplicity will create it for you! Bucket names are shared across all of s3, so pick a unique one), and last of all, replace /DIRECTORY/TO/BACKUP/ with the folder you want to back up.
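A leftover placeholder is the easiest mistake to make here. This small hypothetical helper lists any of the placeholder names from the backup script above that are still present in a file; it only knows about those exact names, so it won’t catch a typo in a value you did fill in:

```shell
#!/bin/sh
# List (with line numbers) any template placeholders still left in a file.
# Returns 0 if at least one placeholder remains, or 1 if none are found
# (grep's own exit status).
find_placeholders() {
    grep -n -E 'YOUR_GPG_PASSWORD|YOUR_GPG_KEY|YOUR_AMAZON_KEY|YOUR_AMAZON_SECRET_KEY|BUCKETNAME|/DIRECTORY/TO/BACKUP/' "$1"
}

# Example:
#   if find_placeholders backup.sitename.sh; then echo "Fill these in first"; fi
```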

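Now save the script (e.g. backup.sitename.sh), make it executable and readable only by you with chmod 700 backup.sitename.sh (it contains your passwords), and run it with ./backup.sitename.sh. If you then check your s3Fox plugin, you should see the files (well, the encrypted versions of them).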
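Since the script holds your passphrase and Amazon keys, one option is to keep those three export lines in a separate file that only you can read, and have each script load it. This is a sketch under my own assumptions: the load_credentials name and the file layout are hypothetical, and stat -c is the GNU form found on Ubuntu and Debian. It doesn’t fix the weakness described in the conclusion, since the secrets still live on the server, but it keeps them out of scripts you might copy around:

```shell
#!/bin/sh
# Load backup secrets from a separate file. The file is expected to hold
# lines such as:
#   export PASSPHRASE=...
#   export AWS_ACCESS_KEY_ID=...
#   export AWS_SECRET_ACCESS_KEY=...
# Pass a path containing a slash (e.g. an absolute path), because "."
# searches $PATH for bare file names.
# The file is refused unless only its owner can access it (mode 600 or 400).
load_credentials() {
    mode=$(stat -c %a "$1") || return 1   # GNU stat, as on Ubuntu/Debian
    case "$mode" in
        600|400) . "$1" ;;
        *) echo "refusing $1: mode $mode, expected 600 or 400" >&2; return 1 ;;
    esac
}

# Example, in place of the three export lines (path is a placeholder):
#   load_credentials /root/.backup-credentials || exit 1
```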


Step 5 – A Restore Script

It’s not much use backing up your files if you can’t get them back when you need them, so we still have to set up our restore script!

And a warning: do make sure you set this up and test it now. If it turns out that you can’t decrypt your backups, or you hit some other error, you don’t want to discover that when you actually need to restore!

#!/bin/sh
export PASSPHRASE=YOUR_GPG_PASSWORD
export AWS_ACCESS_KEY_ID=YOUR_AMAZON_KEY
export AWS_SECRET_ACCESS_KEY=YOUR_AMAZON_SECRET_KEY

## Two options for restoring: uncomment and edit the one you want to use!
## (To restore everything, just remove the --file-to-restore option and its filename.)
## Duplicity won't overwrite an existing file or directory unless you add --force.

# Restore a single file (latest version)
# NOTE - REMEMBER to name the file both after --file-to-restore and in the location you restore it to!
# The file name (path) is relative to the root of the backed-up directory (e.g. pliableweb.com/test is just test)
#duplicity --file-to-restore FILENAME s3+http://BUCKETNAME /FILE/TO/RESTORE/TO --encrypt-key=YOUR_GPG_KEY --sign-key=YOUR_GPG_KEY -vinfo

# Restore a file as it was at an earlier time (-t4D means 4 days ago)
# NOTE - Remember to name the file in both places again!
#duplicity -t4D --file-to-restore FILENAME s3+http://BUCKETNAME /FILE/TO/RESTORE/TO --encrypt-key=YOUR_GPG_KEY --sign-key=YOUR_GPG_KEY

# Clear the secrets from the environment
unset PASSPHRASE AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY

Once again, you’ll need to replace parts of that with your own details. The placeholders mean the same as in the backup script, so look back there for what each one should be. FILENAME is the file you want back, and /FILE/TO/RESTORE/TO is where to put it.

And last of all, you’ll see that I’ve commented out the commands. Delete the # in front of one to uncomment it when you want to use it. That’s just a precaution in case you run the script by accident!
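One way to test a restore end to end is to restore the whole backup into a new scratch directory (the restore command without --file-to-restore, e.g. duplicity s3+http://BUCKETNAME /tmp/restore-test, using a directory that doesn’t exist yet) and compare it with the live folder. A minimal sketch of the comparison step, with a hypothetical function name:

```shell
#!/bin/sh
# Compare a test restore with the original directory.
# diff -rq lists any files that differ or exist on only one side;
# returns 0 if the two trees match, 1 otherwise.
check_restore() {
    original=$1
    restored=$2
    if diff -rq "$original" "$restored"; then
        echo "restore matches $original"
    else
        echo "restore differs from $original (see above)" >&2
        return 1
    fi
}

# Example:
#   check_restore /DIRECTORY/TO/BACKUP/ /tmp/restore-test
```

Anything that changed since the last backup will show up as a difference, so this is best run straight after a backup. Duplicity also has a built-in verify action that compares a backup with a local directory.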

Step 6 – Back Up Your Databases

We’re getting there, promise!

There’s absolutely no point in backing up your files if you aren’t backing up your databases as well. Thankfully, it’s not difficult to do.

You have 2 options:

  • Use a WordPress plugin and backup to an email address. You can read the full automatic WordPress backup guide here.
  • Make a backup of your database and include it with the files being backed up to s3.

Naturally the one I’ll be talking about here is the s3 solution! To do it, all you need is another shell script.

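The best script I’ve found to do this is AutoMySQLBackup. Point its backup directory at a folder inside the directory your Duplicity script backs up, so the dumps go to s3 along with everything else. It will: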

  • Make daily, weekly, and monthly backups of your database, and delete old ones (You set how long to keep them for in the script).
  • Email you a warning if anything goes wrong with the backup (extremely useful). Set MAILCONTENT="quiet" in the settings at the top of the script and you’ll only hear from it when there’s a problem: peace of mind without the spam.
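If you’d rather not use a separate script, the core idea can also be sketched with mysqldump directly. This is a simpler stand-in, not AutoMySQLBackup: it makes one gzipped dump per day and deletes old ones, with no weekly/monthly rotation or email alerts. It assumes your MySQL login details are in ~/.my.cnf (which mysqldump reads on its own), and the function names and paths are hypothetical:

```shell
#!/bin/sh
# Minimal database dump sketch (NOT AutoMySQLBackup).

# Print the dump file path for a database: DIR/DB-YYYY-MM-DD.sql.gz
# (the date defaults to today).
dump_path() {
    printf '%s/%s-%s.sql.gz\n' "$1" "$2" "${3:-$(date +%F)}"
}

# Dump database $2 into directory $1. The dump goes to a plain file first,
# so a mysqldump failure isn't hidden behind gzip's exit status.
dump_db() {
    out=$(dump_path "$1" "$2")
    sql=${out%.gz}
    if mysqldump --single-transaction "$2" > "$sql"; then
        gzip -f "$sql"          # produces $out and removes $sql
    else
        rm -f "$sql"
        return 1
    fi
}

# Delete dumps in directory $1 last modified more than $2 days ago.
prune_dumps() {
    find "$1" -maxdepth 1 -type f -name '*.sql.gz' -mtime +"$2" -delete
}

# Example (placeholders):
#   dump_db /var/backups/mysql wordpress && prune_dumps /var/backups/mysql 30
```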

Step 7 – Automate All of This

The final step is to set this up to run automatically so that you can forget all about it! We’ve made this very easy by storing all of our commands in shell scripts. All we need to do is use cron to run them at a set time.

If you aren’t familiar with cron, the Ubuntu help pages have a great explanation of it.

To access your crontab, enter:

crontab -e

Now, here’s an example of 2 cron jobs you could add:

40 8 * * * /path/to/backup-db-problogdesign.sh
0 9 * * * /path/to/backup-problogdesign.sh >> /var/log/backup.problogdesign.log 2>&1

The first will back up the database. The second will then run the whole backup to s3 20 minutes later and append its output, including any error messages, to a log file. Replace /path/to/ with wherever your scripts live, and make sure the log file already exists and is writable by the user whose crontab this is.

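If you only wanted to run it every other day, you could put */2 in the day-of-month field instead (replace /path/to/ with wherever your scripts live):

40 8 */2 * * /path/to/backup-db-problogdesign.sh
0 9 */2 * * /path/to/backup-problogdesign.sh >> /var/log/backup.problogdesign.log 2>&1

That runs on the odd-numbered days of each month, so after the 31st it will run two days in a row.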
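Spacing the two jobs 20 minutes apart works as long as the database dump always finishes in time. An alternative, sketched below with hypothetical names, is a single cron job that runs the scripts in order, only starts the s3 backup if the dump succeeded, and writes timestamped start and exit lines to the log:

```shell
#!/bin/sh
# Hypothetical wrapper: run backup scripts in order from one cron entry,
# stopping at the first failure. Output goes to the log file named in $LOG.

# Append a timestamped line to $LOG.
log_line() {
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$LOG"
}

# Run one command, log its start and exit status, and return that status.
run_step() {
    log_line "start: $*"
    "$@" >> "$LOG" 2>&1
    rc=$?
    log_line "exit $rc: $*"
    return $rc
}

# Run each script with sh in turn; return 1 as soon as one fails.
run_chain() {
    for step in "$@"; do
        run_step sh "$step" || return 1
    done
}

# Example (paths are placeholders):
#   LOG=/var/log/backup.problogdesign.log
#   run_chain /path/to/backup-db-problogdesign.sh /path/to/backup-problogdesign.sh
```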

 

Troubleshooting

There are a few places you could go wrong in all this. If you do have trouble, here are a few things to try:

  • Test each step, one at a time. Is your encryption working? Are you able to connect to Amazon s3? Is Duplicity working? Last of all, does it all work from cron?
  • If the trouble is with your encryption, was the key created by the same user that runs the scripts (including from cron)? GPG keys live in each user’s own keyring.
  • Bucket names, GPG key IDs and passwords each appear in several places. Quadruple-check for typos!
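For the “does it all work from cron?” question, a common trick is to run the script by hand in an environment that resembles cron’s, which has a much shorter PATH than your login shell and none of your usual environment variables. A sketch, assuming the usual Debian/Ubuntu cron defaults (PATH=/usr/bin:/bin, SHELL=/bin/sh, and the home directory as the working directory); the function name is hypothetical:

```shell
#!/bin/sh
# Run a command with an environment close to cron's: everything cleared
# except HOME, LOGNAME and SHELL, PATH set to /usr/bin:/bin, and the
# home directory as the working directory. Runs in a subshell so the
# caller's directory and environment are left untouched.
run_like_cron() {
    (
        cd "${HOME:-/}" || exit 1
        env -i HOME="${HOME:-/}" LOGNAME="${LOGNAME:-$(id -un)}" \
            SHELL=/bin/sh PATH=/usr/bin:/bin "$@"
    )
}

# Example (path is a placeholder):
#   run_like_cron /path/to/backup-problogdesign.sh
```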

Conclusion

You’ve now got a fairly robust backup system in place. All of your files will be copied safely to a third-party server, every single day.

The major flaw here, which some of you may have spotted already, is that your GPG passphrase is stored in plain text on your server, and anyone with access to the Duplicity script can delete your backups. So if someone gets into your server account, this setup won’t help you; you’re only protected against hardware failures.

If anyone has any thoughts on getting around that issue, I’d love to hear them!

Update (14/03/2010): Check out Matt’s tip in the comments for backing up your keys to your local computer.


  1. I worked out a way to encrypt files without storing the private key or password on the server, using them only for the restore. Basically, it generates a random password, uses that password to encrypt the file with AES, secures the password with an RSA public key, and then tars the encrypted password and encrypted file together.

    I haven’t perfected the code for it, but here is the repo: http://github.com/tjsingleton/.....Encryption.

    • Michael Martin (1511 comments), 12 March 10

      Hi TJ,

      I’m definitely interested in trying that out. That would be a massive upgrade to the whole setup, it’s really the only thing that annoys me about my current server setup. Thanks very much for sharing!

    • Simon Slade (1 comment), 5 April 10

      Nice tip TJ! Thanks for sharing it :) You should add it to the main post, Michael ;)

  2. redwall_hp (144 comments), 12 March 10

    Doesn’t look too hard. I will probably give it a try sometime in the next few weeks. Thanks for the article.

    • redwall_hp (144 comments), 13 March 10

      Okay. So far I have my backup and restore scripts working. I still have to set up the DB dump script and then put the scripts in cron, but I’m taking a break for today.

      While setting things up, I thought of something that I would recommend doing: export your GPG keys and save them to at least one safe place. Otherwise, if your server goes up in smoke and you need to restore from your backup, you might not have the keys required to decrypt the files. :)

      Just run these two commands, replacing THE_KEY_ID with your GPG key’s ID, and changing the filenames if you wish.

      gpg -ao MyPublicKey.key --export THE_KEY_ID

      gpg -ao MyPrivateKey.key --export-secret-keys THE_KEY_ID

      You can then use scp to copy the files to your local computer. Now you can burn them to CDs to bury in the yard, put them on a USB drive to keep on your keyring, and print the contents out and mail them somewhere. (I’m not going to go to that extreme, but I’m keeping a copy on my laptop and in my Dropbox.)

    • Michael Martin (1511 comments), 14 March 10

      Good tip Matt! I’ve done that as well (just storing it on my computer, though I like the idea of burying it in the back yard! ;) )

      I’ll add that to the post now, thanks! :)

    • redwall_hp (144 comments), 15 March 10

      If you go with the backyard method, be sure to draw up a treasure map and put it in a safe place. :)

    • Wolfgeek (2 comments), 1 August 10

      When I do this, the result of the first command is: gpg: Invalid option “-export”

      I’m not finding much help on Google. Any ideas?

    • Wolfgeek (2 comments), 1 August 10

      Um….never mind. That’s what I get for trying to do this at 1:30am. I should have known it was a double-dash. Don’t I feel foolish.

  3. TB (3 comments), 13 March 10

    While the method did sound promising, I find the performance to be horrid.

    My current scenario consists of daily “offsite” backups of servers in a remote DC to a local NAS using rsync+ssh.

    One of my servers has ~1GB (943MB atm) to backup, which takes about 15-20min using rsync+ssh over a 20Mbps connection.

    Using duplicity with an s3 backend, it took 124 minutes to transfer all 5MB chunks of data, transferring a total of 1064MB (13% overhead for a full backup).

    *Conclusion*
    RSync was about 6x faster so, considering it had only 1/5 of the bandwidth that was available to duplicity (20Mbps vs 100Mbps), it is clear to me this method won’t scale at all.

    Have you had better results, or are you backing up less data?

    • redwall_hp (144 comments), 14 March 10

      I don’t have everything set up all the way yet, but I used duplicity to back up around 400MB to S3 in under five minutes. Maybe you were having some network congestion at the time, or a process competing with duplicity for resources?

    • Michael Martin (1511 comments), 14 March 10

      Something seems off there to be honest. I haven’t got any specific stats on this (Haven’t had a look at how long any of this took since I first set it up in January).

      One site I back up though is over 1GB. It was still a matter of minutes to back up though (From a Linode 540 : http://www.linode.com/ )

      Wish I could pinpoint the issue for you, but it could even just be related to having issues with encrypting and preparing the files on your server. Mightn’t be anything to do with network speed (Or like Matt said, maybe there were issues at the time?)

    • TB (3 comments), 14 March 10

      In order to get some additional metrics I retested with the original setup, which confirmed my earlier findings. Duplicity is not struggling for resources, but bandwidth is a problem with just 1-2Mbps throughput.

      I retested with a new setup as well:

      - Installed duplicity from source (v.0.6.08b) vs from the stable debian repos (v. 0.4.11). Lenny-backports has a more recent version as well (v. 0.6.06), which I would recommend over installing from source on production systems.
      - Switched to an EU bucket, since I’m operating from BE and NL (DC)

      Observations:
      - Objects are being pushed to S3 EU at 10-25Mbps
      - It now takes about 40 min. to backup 1.02GB

      I’ll be running some additional tests coming weeks but it’s looking much better already!

      So thanks for the tip and taking the time to reply.

  4. redwall_hp (144 comments), 15 March 10

    Out of curiosity, what are you backing up, Michael? Are you backing up your entire root partition, or just /var/www and wherever your MySQL dumps are stored?

    I know I primarily need to back up my web root, my MySQL dumps, and the NGINX/PHP config files. What’s the best way to set up the shell scripts for that? Should I set up more than one bucket and run more than one Duplicity command? Should I try to use the --include argument to pass more than one directory to Duplicity? Or should I just set it to back up everything?

  5. Mike (15 comments), 8 April 10

    Hey,

    a friend of mine did some nice work on crypto containers

    http://www.disenchant.ch/blog/.....ainers/288

    hf

    Mike

  6. James D. (1 comment), 17 April 10

    I’m using this plugin to do the same thing… I don’t know if it does the encryption part http://www.webdesigncompany.ne.....ss-backup/

