Automatic Amazon s3 Backups on Ubuntu / Debian
VPS (Virtual Private Server) hosting is the next level up from shared hosting. You get far more server resources for your money, but the catch is that you lose the ease of use that shared hosting gives you.
One of the most important things you need to set up with your VPS is automatic backups. If your VPS crashes and your data is lost, your entire blogging history will be wiped out in an instant if you don’t have backups at the ready.
This article isn’t going to be for everyone; it assumes two things:
- You’ve already set up your VPS (If you’re on shared hosting, have a look at this automatic database backup post instead).
- You’re comfortable with the command line (If you didn’t set up your VPS yourself, I highly recommend you don’t fiddle around with anything here unless you’re certain of what you’re doing!)
The last thing to note is that I’ve done all of this on Ubuntu, though it should work just as well on Debian. The software is compatible with other Linux distros too, but I haven’t tested them, so you may need to adapt certain steps.
If both of those assumptions are okay with you, let’s carry on and set up our ideal backup system!
An Overview of Our Setup
Let’s start by taking a step back and getting a plan of how our backup system will work.
- Every day, at a time you set, the backup process begins.
- First, a backup of your database will be taken and saved on the server.
- Next, the backup program will connect to your Amazon s3 account and make a full backup of your site if need be.
- Alternatively, it will only backup the changes from yesterday’s backup (i.e. an incremental backup).
- Before sending out the backups, all of your files will be encrypted so that no-one but you will be able to read them.
One thing to note is that we will work through this as though we are backing up just one site. You can of course apply this to as many sites, databases, and directories on your server as you like.
Step 1 – Set Up Encryption
To set this up, we’ll actually be working backwards through the steps above (So you’ll be able to test each one before moving to the next).
The encryption tool we’ll use is called GPG (Gnu Privacy Guard). GPG works by creating two key files:
- Public key – Used to encrypt your data. It doesn’t matter who sees this.
- Private key – Used to decrypt your data. This file must be kept safe and only seen by you.
The two files it creates are essentially a pair. Files encrypted by a public key can only be decrypted by the corresponding private key. If you lose your private key, you will not get your files back, ever.
So, let’s get to it!
- In your command line (e.g. Putty on Windows, or terminal on Linux/Mac), type the following:

gpg --gen-key
You’ll be walked through a few options for your key, select the following:
- Key type – DSA and Elgamal (Default)
- Key size – 2048 bits (Again, the default)
- Expiration – Do not expire (Not necessary for what we’re doing as you won’t be sharing the public key with anyone).
- Name, Comment and Email – You can enter whatever you like here, but do take a note of them somewhere. They’ll help you remember which key is which if you create multiple keys later.
- Password – Make sure you remember whatever you type, there’s no way to get it back if you forget!
- When it talks about “generating entropy” to make the key, it means that the server needs to be in use in order for it to get some random numbers. Just go refresh a webpage on the server a few times, or run some commands in another terminal window.
When your key is made, you’ll see a few lines about it. The important one looks like this:
pub 2048D/3514FEC1 2010-03-05
The 3514FEC1 is the part you need. That’s your key ID, and you’ll need it for later!
If you do end up forgetting your key ID though, it’s easy enough to get it back. Just type:

gpg --list-keys
That’s our encryption set up and ready to use! If you’d like to learn more about what you can do with your GPG key, have a look at this GPG quick start guide.
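Before moving on, it’s worth a quick test that the key actually encrypts and decrypts. A minimal sketch — the key ID 3514FEC1 is just the example from above and the file path is arbitrary, so substitute your own:

```shell
# Encrypt a throwaway file with your public key, then decrypt it back
echo "backup test" > /tmp/gpgtest.txt

# Encrypt for your own key ID (the one from gpg --list-keys)
gpg --encrypt --recipient 3514FEC1 /tmp/gpgtest.txt

# Decrypt it again (you'll be asked for your password)
gpg --decrypt /tmp/gpgtest.txt.gpg
```

If the decrypted output matches the original text, your key pair is working.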
Step 2 – Sign Up for Amazon s3
I should start by saying that while s3 is not a free service, it’s incredibly inexpensive! My bill for the last month was $2.60, and that was with backing up a lot more than just this site! It’s the cheapest peace-of-mind ever.
Start off by signing up at Amazon Web Services (Not linked to your regular Amazon account). They have a few different services, but the only one we want at the minute is s3 (Simple Storage Service).
When you’ve registered, log in to your account and click the “Security Credentials” link.
On this page, you’ll need to create a new access key (You can see the link in the screenshot below). When you’ve made it, take a note of your Access Key and Secret Access Key (click the “Show” link to see the secret one).
If you’re a Firefox user, you should also install the s3Fox plugin. It gives you an extremely easy way of seeing what’s in your s3 account, and even uploading/downloading files from it. It’s not essential, but definitely a handy tool!
Step 3 – Install Duplicity
The backup system is fairly easy to put in place, all thanks to the program we’ll be using: Duplicity.
Let’s start by installing Duplicity.
sudo apt-get install duplicity
Now with it installed, we just have to create a script that tells it how to run. Duplicity can take a wide range of commands, and you can read more about them all here.
Step 4 – Our Duplicity Backup Script
Here is how we want to set it up:
- Encrypt with our GPG key.
- Backup to an Amazon s3 “bucket” (a bucket on s3 is like a folder).
- Make an incremental backup every day.
- Make a full backup if it’s been more than 2 weeks since our last full backup.
- Remove backups older than one month.
You can change any of the parameters you like; you’ll see where to do so in the script.
With your favorite text editor (I use Nano), create a new file and paste the following into it:
#!/bin/sh
export PASSPHRASE=YOUR_GPG_PASSWORD
export AWS_ACCESS_KEY_ID=YOUR_AMAZON_KEY
export AWS_SECRET_ACCESS_KEY=YOUR_AMAZON_SECRET_KEY

# Delete any backups older than 1 month
# (--force is needed to actually delete them, rather than just list them)
duplicity remove-older-than 1M --force --encrypt-key=YOUR_GPG_KEY --sign-key=YOUR_GPG_KEY s3+http://BUCKETNAME

# Make the regular backup
# Will be a full backup if the last full one is past the --full-if-older-than parameter
duplicity --full-if-older-than 14D --encrypt-key=YOUR_GPG_KEY --sign-key=YOUR_GPG_KEY /DIRECTORY/TO/BACKUP/ s3+http://BUCKETNAME

export PASSPHRASE=
export AWS_ACCESS_KEY_ID=
export AWS_SECRET_ACCESS_KEY=
You’ll need to update some of the info in that script with your own details. They should be self-explanatory: replace the values after the = in lines 2-4, and the 4 instances of YOUR_GPG_KEY further down.
Also, replace the 2 instances of BUCKETNAME with the name of your bucket on s3 (Don’t worry if it doesn’t exist yet, Duplicity will create it for you!), and last of all, the /DIRECTORY/TO/BACKUP/ with the folder to backup.
Now save the script (e.g. backup.sitename.sh) and run it. If you then check your s3Fox plugin, you should see the files (well, the encrypted versions of them).
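If the script refuses to run, it probably just needs the execute bit set. A minimal sketch, assuming the example filename above; the collection-status line is an optional extra that asks Duplicity to list the backup chains it has stored:

```shell
# Make the script executable, then run it
chmod +x backup.sitename.sh
./backup.sitename.sh

# Optionally, ask Duplicity to summarise what's in the bucket
# (needs the same PASSPHRASE/AWS variables exported as in the script)
duplicity collection-status s3+http://BUCKETNAME
```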
Step 5 – A Restore Script
It’s not much use backing up your files if you can’t get them back when you need them, so we still have to set up our restore script!
And a warning: do make sure you set this up and test it now. If it turns out that you can’t decrypt your backups, or something similar goes wrong, the moment you actually need to restore is far too late to find out!
#!/bin/sh
export PASSPHRASE=YOUR_GPG_PASSWORD
export AWS_ACCESS_KEY_ID=YOUR_AMAZON_KEY
export AWS_SECRET_ACCESS_KEY=YOUR_AMAZON_SECRET_KEY

## Two options for restoring; uncomment and edit the one you want to use!
## (to restore everything, just take out the --file-to-restore option and filename)

# Restore a single file
# NOTE - REMEMBER to name the file both in --file-to-restore and in the location you will restore it to!
# Also, the file name (path) is relative to the root of the directory backed up (e.g. pliableweb.com/test is just test)
#duplicity --file-to-restore FILENAME --encrypt-key=YOUR_GPG_KEY --sign-key=YOUR_GPG_KEY -vinfo s3+http://BUCKETNAME /FILE/TO/RESTORE/TO

# Restore a file as it was on a specified day (here, 4 days ago)
# NOTE - Remember to name the file in both locations again!
#duplicity -t4D --file-to-restore FILENAME --encrypt-key=YOUR_GPG_KEY --sign-key=YOUR_GPG_KEY s3+http://BUCKETNAME /FILE/TO/RESTORE/TO

export PASSPHRASE=
export AWS_ACCESS_KEY_ID=
export AWS_SECRET_ACCESS_KEY=
Once again, you’ll need to replace parts of that with your own details. For explanations of what each thing should be replaced with, look back at the explanation for the backup script; they’re the same!
And last of all, you’ll see that I’ve commented out the commands. Delete the # in front of them to uncomment them when you want to use them. That’s just a precaution in case you run the script by accident!
Step 6 – Backup Your Databases
We’re getting there, promise!
There’s absolutely no point in backing up your files if you aren’t backing up your databases as well. Thankfully, it’s not difficult to do.
You have 2 options:
- Use a WordPress plugin and backup to an email address. You can read the full automatic WordPress backup guide here.
- Make a backup of your database and include it with the files being backed up to s3.
Naturally the one I’ll be talking about here is the s3 solution! To do it, all you need is another shell script.
The best script I’ve found to do this is AutoMySQLBackup. It will:
- Make daily, weekly, and monthly backups of your database, and delete old ones (You set how long to keep them for in the script).
- Email you a warning if anything goes wrong with the backup (Extremely useful). You get the peace of mind of being notified if there’s a problem, but no spam because if it all goes well, you won’t hear from it (in the settings at the top of the script, set MAILCONTENT="quiet").
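For context, the heart of any such script is a single mysqldump call. Here’s a bare-bones sketch of the idea — the database name, user, password, and backup directory are all placeholders, and AutoMySQLBackup does far more than this:

```shell
#!/bin/sh
# Minimal daily database dump - placeholder credentials and paths
BACKUP_DIR=/var/backups/mysql
mkdir -p "$BACKUP_DIR"

# Dump the database, compress it, and date-stamp the file name
mysqldump -u YOUR_DB_USER -pYOUR_DB_PASSWORD YOUR_DATABASE \
  | gzip > "$BACKUP_DIR/YOUR_DATABASE-$(date +%F).sql.gz"

# Delete dumps older than 30 days
find "$BACKUP_DIR" -name '*.sql.gz' -mtime +30 -delete
```

If that backup directory sits inside the directory Duplicity backs up, the dumps will be carried to s3 along with everything else.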
Step 7 – Automate all of This
The final step is to set this up to run automatically so that you can forget all about it! We’ve made this very easy to do by storing all of our commands in shell scripts. All we need to do is use cron to run them at set times.
If you aren’t familiar with cron, Ubuntu Help has a great explanation of it.
To access your crontab, enter:

crontab -e
Now, here’s an example of 2 cron jobs you could add:
40 8 * * * ./backup-db-problogdesign.sh
0 9 * * * ./backup-problogdesign.sh > /var/log/backup.problogdesign.log
The first will back up the database. The second will then run the whole backup to s3 20 minutes later, and store the output in a log file for you (Make sure you’ve created the log file already though).
If you only wanted to run it every other day, you could use:
40 8 */2 * * ./backup-db-problogdesign.sh
0 9 */2 * * ./backup-problogdesign.sh > /var/log/backup.problogdesign.log
There are a few places you could go wrong in all this. If you do have trouble, here are a few things to try:
- Test each step, one at a time. Is your encryption working? Are you able to connect to Amazon s3? Is Duplicity working? Last of all, does it all work from cron?
- If the trouble is with your encryption, are the keys owned by the same user as the one who runs the commands?
- s3 bucket names, GPG IDs, and passwords need to be written in a few places. Quadruple-check for typos!
You’ve now got a fairly robust backup system in place. All of your files will be copied safely to a third party server, every single day.
The major flaw here, which some of you may have spotted already, is that your GPG password is stored in plain view on your server, and that anyone with access to the Duplicity script can delete your backups. If someone gets into your server account, this isn’t going to help you; you’re only protected against hardware failures.
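One small hardening step (it narrows the problem rather than solving it) is to make the scripts readable only by their owner, so the credentials at least aren’t exposed to every account on the server. Assuming the example filenames from earlier:

```shell
# Only the owner may read, write, or execute the backup and restore scripts
chmod 700 backup.sitename.sh restore.sitename.sh
```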
If anyone has any thoughts on getting around that issue, I’d love to hear them!