Automatic Amazon s3 Backups on Ubuntu / Debian
VPS (Virtual Private Server) hosting is the next level up from shared hosting. You get a lot more server for your money, but the catch is that you lose the ease of use that shared hosting gives you.
One of the most important things you need to set up with your VPS is automatic backups. If your VPS crashes and your data is lost, your entire blogging history will be wiped out in an instant if you don’t have backups at the ready.
This article isn’t for everyone; it assumes two things:
- You’ve already set up your VPS (If you’re on shared hosting, have a look at this automatic database backup post instead).
- You’re comfortable with the command line (If you didn’t set up your VPS yourself, I highly recommend you don’t fiddle around with anything here unless you’re certain of what you’re doing!)
The last thing to note is that I’ve done all of this on Ubuntu, though it should work just as well on Debian. The software is compatible with other Linux distros too, but I haven’t used them, so you may need to adapt certain steps.
If both of those are okay with you though, let’s carry on and set up our ideal backup system!
An Overview of Our Setup
Let’s start by taking a step back and getting a plan of how our backup system will work.
- Every day, at a time you set, the backup process begins.
- First, a backup of your database will be taken and saved on the server.
- Next, the backup program will connect to your Amazon s3 account and make a full backup of your site if need be.
- Alternatively, it will only back up the changes since yesterday’s backup (i.e. an incremental backup).
- Before sending out the backups, all of your files will be encrypted so that no-one but you will be able to read them.
One thing to note is that we will work through this as though we are backing up just one site. You can of course apply this to as many sites, databases, and directories on your server as you like.
Step 1 – Set Up Encryption
To set this up, we’ll actually be working backwards through the steps above (So you’ll be able to test each one before moving to the next).
The encryption tool we’ll use is called GPG (GNU Privacy Guard). GPG works by creating two key files:
- Public key – Used to encrypt your data. It doesn’t matter who sees this.
- Private key – Used to decrypt your data. This file must be kept safe and only seen by you.
The two files it creates are essentially a pair. Files encrypted with the public key can only be decrypted by the corresponding private key. If you lose your private key, you will not get your files back, ever.
So, let’s get to it!
- In your command line (e.g. PuTTY on Windows, or a terminal on Linux/Mac), type the following:
gpg --gen-key
You’ll be walked through a few options for your key; select the following:
- Key type – DSA and Elgamal (Default)
- Key size – 2048 bits (Again, the default)
- Expiration – Do not expire (Not necessary for what we’re doing as you won’t be sharing the public key with anyone).
- Name, Comment and Email – You can enter whatever you like here, but do take a note of them somewhere. They’ll help you remember which key is which if you create multiple keys later.
- Password – Make sure you remember whatever you type, there’s no way to get it back if you forget!
- When it talks about “generating entropy” to make the key, it means that the server needs to be in use in order for it to get some random numbers. Just go refresh a webpage on the server a few times, or run some commands in another terminal window.
When your key is made, you’ll see a few lines about it. The important one looks like this:
pub 2048D/3514FEC1 2010-03-05
The 3514FEC1 is the part you need. That’s your key ID, and you’ll need it for later!
If you do end up forgetting your key ID though, it’s easy enough to get that back. Just type:
gpg --list-keys
That’s our encryption set up and ready to use! If you’d like to learn more about what you can do with GPG keys, have a look at this GPG quick start guide.
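Before moving on, it’s worth a quick check that encryption and decryption both work. Here’s a rough sketch, using the example key ID from above (substitute your own):

echo "test backup" > /tmp/gpg-test.txt
gpg --encrypt --recipient 3514FEC1 /tmp/gpg-test.txt     # creates /tmp/gpg-test.txt.gpg
gpg --decrypt /tmp/gpg-test.txt.gpg                      # asks for your passphrase and prints the file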
Step 2 – Sign Up for Amazon s3
I should start by saying that while s3 is not a free service, it’s incredibly inexpensive! My bill for the last month was $2.60, and that was with backing up a lot more than just this site! It’s the cheapest peace-of-mind ever.
Start off by signing up at Amazon Web Services (Not linked to your regular Amazon account). They have a few different services, but the only one we want at the minute is s3 (Simple Storage Service).
When you’ve registered, log in to your account and click the “Security Credentials” link.
On this page, you’ll need to create a new access key. Once you’ve made it, take a note of your Access Key and Secret Access Key (click the “Show” link to see the secret one).
If you’re a Firefox user, you should also install the S3Fox plugin. It gives you an extremely easy way of seeing what’s in your s3 account, and even uploading/downloading files from it. It’s not essential, but definitely a handy tool!
Step 3 – Install Duplicity
The backup system is fairly easy to put in place, all thanks to the program we’ll be using: Duplicity.
Let’s start by installing Duplicity.
sudo apt-get install duplicity
Now with it installed, we just have to create a script that tells it how to run. Duplicity can take a wide range of commands, and you can read more about them all here.
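At its simplest, a Duplicity run is just a source directory and a target URL; everything in our script below is options layered on top of that. As a rough local example (the paths here are only placeholders):

# Back up one directory to another on the same machine - no S3 or encryption yet
duplicity --no-encryption /home/example file:///var/backups/example-test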
Step 4 – Our Duplicity Backup Script
Here is how we want to set it up:
- Encrypt with our GPG key.
- Backup to an Amazon s3 “bucket” (a bucket on s3 is like a folder).
- Make an incremental backup every day.
- Make a full backup if it’s been more than 2 weeks since our last full backup.
- Remove backups older than one month.
You can change any of the parameters you like; you’ll see where to do it in the script.
With your favorite text editor (I use Nano), create a new file and paste the following into it:
#!/bin/sh
export PASSPHRASE=YOUR_GPG_PASSWORD
export AWS_ACCESS_KEY_ID=YOUR_AMAZON_KEY
export AWS_SECRET_ACCESS_KEY=YOUR_AMAZON_SECRET_KEY

# Delete any backups older than 1 month
# (--force is needed for remove-older-than to actually delete, rather than just list)
duplicity remove-older-than 1M --force --encrypt-key=YOUR_GPG_KEY --sign-key=YOUR_GPG_KEY s3+http://BUCKETNAME

# Make the regular backup
# Will be a full backup if past the older-than parameter
duplicity --full-if-older-than 14D --encrypt-key=YOUR_GPG_KEY --sign-key=YOUR_GPG_KEY /DIRECTORY/TO/BACKUP/ s3+http://BUCKETNAME

export PASSPHRASE=
export AWS_ACCESS_KEY_ID=
export AWS_SECRET_ACCESS_KEY=
You’ll need to update some of the info in that script with your own details. Replace the values after the = in the three export lines at the top, and the 4 instances of YOUR_GPG_KEY further down.
Also, replace the 2 instances of BUCKETNAME with the name of your bucket on s3 (Don’t worry if it doesn’t exist yet, Duplicity will create it for you!), and last of all, the /DIRECTORY/TO/BACKUP/ with the folder to back up.
Now save the script (e.g. backup.sitename.sh), make it executable with chmod +x backup.sitename.sh, and run it. If you then check the S3Fox plugin, you should see your files (well, the encrypted versions of them).
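If you’d rather check from the command line than through S3Fox, Duplicity can also report what it has stored. A quick sketch, assuming the same bucket and credentials as the backup script:

export AWS_ACCESS_KEY_ID=YOUR_AMAZON_KEY
export AWS_SECRET_ACCESS_KEY=YOUR_AMAZON_SECRET_KEY
export PASSPHRASE=YOUR_GPG_PASSWORD
duplicity collection-status s3+http://BUCKETNAME       # shows the chain of full/incremental backups
duplicity list-current-files s3+http://BUCKETNAME      # lists every file in the latest backup
export AWS_ACCESS_KEY_ID=
export AWS_SECRET_ACCESS_KEY=
export PASSPHRASE=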
Step 5 – A Restore Script
It’s not much use backing up your files if you can’t get them back when you need them, so we still have to set up our restore script!
And a warning: do make sure you set this up and test it now. If it turns out that you can’t decrypt your backups, or you hit some other error, you don’t want to discover that when the time comes to actually make a restore!
#!/bin/sh
export PASSPHRASE=YOUR_GPG_PASSWORD
export AWS_ACCESS_KEY_ID=YOUR_AMAZON_KEY
export AWS_SECRET_ACCESS_KEY=YOUR_AMAZON_SECRET_KEY

## Two options for restoring, uncomment and edit the one to use!
## (to restore everything, just take out the --file-to-restore command and filename)

# Restore a single file
# NOTE - REMEMBER to name the file in both the --file-to-restore and in the location you will restore it to!
# Also file name (path) is relative to the root of the directory backed up (e.g. pliableweb.com/test is just test)
#duplicity --file-to-restore FILENAME s3+http://BUCKETNAME /FILE/TO/RESTORE/TO --encrypt-key=YOUR_GPG_KEY --sign-key=YOUR_GPG_KEY -vinfo

# Restore a file from a specified day
# NOTE - Remember to name the file in both locations again!
#duplicity -t4D --file-to-restore FILENAME s3+http://BUCKETNAME /FILE/TO/RESTORE/TO --encrypt-key=YOUR_GPG_KEY --sign-key=YOUR_GPG_KEY

export PASSPHRASE=
export AWS_ACCESS_KEY_ID=
export AWS_SECRET_ACCESS_KEY=
Once again, you’ll need to replace parts of that with your own details. For explanations of what each placeholder means, look back at the backup script; they’re the same!
And last of all, you’ll see that I’ve commented out the commands. Delete the # in front of them when you want to use them. That’s just a precaution in case you run the script by accident!
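As a concrete example, with the exports at the top filled in, restoring a single file called wp-config.php into /tmp (the filename and destination here are just hypothetical) would mirror the first commented command like this:

# Pull one file out of the latest backup
duplicity --file-to-restore wp-config.php s3+http://BUCKETNAME /tmp/wp-config.php --encrypt-key=YOUR_GPG_KEY --sign-key=YOUR_GPG_KEY

# Or restore the whole backup into an empty directory by leaving out --file-to-restore
duplicity s3+http://BUCKETNAME /tmp/full-restore --encrypt-key=YOUR_GPG_KEY --sign-key=YOUR_GPG_KEY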
Step 6 – Backup Your Databases
We’re getting there, promise!
There’s absolutely no point in backing up your files if you aren’t backing up your databases as well. Thankfully, it’s not difficult to do.
You have 2 options:
- Use a WordPress plugin and backup to an email address. You can read the full automatic WordPress backup guide here.
- Make a backup of your database and include it with the files being backed up to s3.
Naturally the one I’ll be talking about here is the s3 solution! To do it, all you need is another shell script.
The best script I’ve found to do this is AutoMySQLBackup. It will:
- Make daily, weekly, and monthly backups of your database, and delete old ones (You set how long to keep them for in the script).
- Email you a warning if anything goes wrong with the backup (Extremely useful). You get the peace of mind of being notified if there’s a problem, but no spam: if it all goes well, you won’t hear from it (in the settings at the top of the script, set MAILCONTENT="quiet").
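If you’d rather not use AutoMySQLBackup, a minimal roll-your-own sketch looks something like this (the database name, user, password, and dump directory are placeholders). As long as the dump directory sits inside the folder Duplicity backs up, the dumps ride along to s3 automatically:

#!/bin/sh
# Nightly database dump - adjust the placeholders to your own setup
DB_NAME=YOUR_DATABASE
DB_USER=YOUR_DB_USER
DB_PASS=YOUR_DB_PASSWORD
DUMP_DIR=/DIRECTORY/TO/BACKUP/db-dumps

mkdir -p "$DUMP_DIR"
mysqldump -u "$DB_USER" -p"$DB_PASS" "$DB_NAME" | gzip > "$DUMP_DIR/$DB_NAME-$(date +%F).sql.gz"

# Keep a month of dumps locally; anything older gets cleaned up
find "$DUMP_DIR" -name "*.sql.gz" -mtime +31 -delete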
Step 7 – Automate All of This
The final step is to set this up to run automatically so that you can forget all about it! We’ve made this very easy to do by storing all of our commands in shell scripts. All we need to do is use cron to run them at set times.
If you aren’t familiar with cron, Ubuntu Help has a great explanation of it.
To access your crontab, enter:
crontab -e
Now, here’s an example of 2 cron jobs you could add:
40 8 * * * ./backup-db-problogdesign.sh
0 9 * * * ./backup-problogdesign.sh > /var/log/backup.problogdesign.log
The first will back up the database. The second will then run the whole backup to s3 20 minutes later, and store the output in a log file for you (Make sure you’ve created the log file already though).
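Creating the log file is a one-off job, something like this (with your own username in place of the placeholder):

sudo touch /var/log/backup.problogdesign.log
sudo chown YOUR_USERNAME /var/log/backup.problogdesign.log    # so the cron user can write to it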
If you only wanted to run it every other day, you could use */2 in the day-of-month field instead:
40 8 */2 * * ./backup-db-problogdesign.sh
0 9 */2 * * ./backup-problogdesign.sh > /var/log/backup.problogdesign.log
Troubleshooting
There are a few places you could go wrong in all this. If you do have trouble, here are a few things to try:
- Test each step, one at a time. Is your encryption working? Are you able to connect to Amazon s3? Is Duplicity working? Last of all, does it all work from cron? (A few example commands follow this list.)
- If the trouble is with your encryption, are the keys owned by the same user as the one who runs the commands?
- s3 bucket names, GPG IDs, and passwords need to be written down in a few places. Quadruple check for typos!
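A few quick commands that cover those first checks, run as the same user that owns the cron jobs (the script name is whatever you saved earlier):

gpg --list-keys                                # is the key visible to this user?
echo test | gpg -e -r YOUR_GPG_KEY | gpg -d    # round-trip encrypt/decrypt with that key
sh backup.sitename.sh                          # run the backup by hand and read the output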
Conclusion
You’ve now got a fairly robust backup system in place. All of your files will be copied safely to a third party server, every single day.
The major flaw here, which some of you may have spotted already, is that your GPG password is stored in plain view on your server, and anyone with access to the Duplicity script can delete your backups. If someone gets into your server account, this isn’t going to help you; you’re only protected against hardware failures.
If anyone has any thoughts on getting around that issue, I’d love to hear them!
Update (14/03/2010): Check out a tip from Matt in the comments on backing up your key to your local computer.
I worked out a way to encrypt files without storing the private key / password on the server, and only using it for the restore. Basically, it generates a random password, uses that password to encrypt the file with AES, secures the password with an RSA public key, and then tars the encrypted password and encrypted file together.
I haven’t perfected the code for it, but here is the repo: http://github.com/tjsingleton/Key-Secured-File-Encryption.
Hi TJ,
I’m definitely interested in trying that out. That would be a massive upgrade to the whole setup; it’s really the only thing that annoys me about my current server setup. Thanks very much for sharing!
Nice tip TJ! Thanks for sharing it :) Should add to the main post Michael ;)
Doesn’t look too hard. I will probably give it a try sometime in the next few weeks. Thanks for the article.
Okay. So far I have my backup and restore scripts working. I still have to set up the DB dump script and then put the scripts in cron, but I’m taking a break for today.
While setting things up, I thought of something that I would recommend doing: export your GPG keys and save them to at least one safe place. Otherwise, if your server goes up in smoke and you need to restore from your backup, you might not have the keys required to decrypt the files. :)
Just run these two commands, replacing THE_KEY_ID with your GPG key’s ID, and changing the filenames if you wish.
gpg -ao MyPublicKey.key --export THE_KEY_ID
gpg -ao MyPrivateKey.key --export-secret-keys THE_KEY_ID
You can then use scp to copy the files to your local computer. Now you can burn them to CDs to bury in the yard, put them on a USB drive to keep on your keyring, and print the contents out and mail them somewhere. (I’m not going to go to that extreme, but I’m keeping a copy on my laptop and in my Dropbox.)
Good tip Matt! I’ve done that as well (just storing it on my computer, though I like the idea of burying it in the back yard! ;) )
I’ll add that to the post now, thanks! :)
If you go with the backyard method, be sure to draw up a treasure map and put it in a safe place. :)
When I do this, the result of the first command is: gpg: Invalid option “-export”
I’m not finding much help on Google. Any ideas?
Um….never mind. That’s what I get for trying to do this at 1:30am. I should have known it was a double-dash. Don’t I feel foolish.
While the method did sound promising, I find the performance to be horrid.
My current scenario consists of daily “offsite” backups of servers in a remote DC to a local NAS using rsync+ssh.
One of my servers has ~1GB (943MB atm) to back up, which takes about 15-20 min using rsync+ssh over a 20Mbps connection.
Using duplicity with an s3 backend, it took 124 minutes to transfer everything in 5MB chunks, for a total of 1064MB (13% overhead for a full backup).
*Conclusion*
rsync was about 6x faster, so considering it had only 1/5 of the bandwidth available to duplicity (20Mbps vs 100Mbps), it is clear to me this method won’t scale at all.
Have you had better results, or are you backing up less data?
I don’t have everything set up all the way yet, but I used duplicity to back up around 400MB to S3 in under five minutes. Maybe you were having some network congestion at the time, or a process competing with duplicity for resources?
Something seems off there to be honest. I haven’t got any specific stats on this (Haven’t had a look at how long any of this took since I first set it up in January).
One site I back up though is over 1GB. It was still a matter of minutes to back up though (From a Linode 540 : http://www.linode.com/ )
Wish I could pinpoint the issue for you, but it could even just be related to having issues with encrypting and preparing the files on your server. Mightn’t be anything to do with network speed (Or like Matt said, maybe there were issues at the time?)
In order to get some additional metrics I retested with the original setup, which confirmed my earlier findings. Duplicity is not struggling for resources, but bandwidth is a problem with just 1-2Mbps throughput.
I retested with a new setup as well:
– Installed duplicity from source (v. 0.6.08b) vs from the stable Debian repos (v. 0.4.11). Lenny-backports has a more recent version as well (v. 0.6.06), which I would recommend over installing from source on production systems.
– Switched to a EU bucket, since I’m operating from BE and NL (DC)
Observations:
– Objects are being pushed to S3 EU at 10-25Mbps
– It now takes about 40 min. to back up 1.02GB
I’ll be running some additional tests in the coming weeks, but it’s looking much better already!
So thanks for the tip and taking the time to reply.
I don’t understand this part
“Just run these two commands, replacing THE_KEY_ID with your GPG key’s ID, and changing the filenames if you wish.
gpg -ao MyPublicKey.key --export THE_KEY_ID
gpg -ao MyPrivateKey.key --export-secret-keys THE_KEY_ID”
I save it to my local machine; how do I restore it to the new machine?
So if I set --full-if-older-than 14D to 30D, there will be 30 days of backups and each month it will create a full backup?
So I could roll back to day 17?
Out of curiosity, what are you backing up, Michael? Are you backing up your entire root partition, or just /var/www and wherever your MySQL dumps are stored?
I know I primarily need to back up my web root, my MySQL dumps, and the NGINX/PHP config files. What’s the best way to set up the shell scripts for that? Should I set up more than one bucket and run more than one Duplicity command? Should I try to use the --include argument to pass more than one directory to Duplicity? Or should I just set it to back up everything?
Hey,
a friend of mine did some nice work on crypto containers
http://www.disenchant.ch/blog/incremental-backups-of-crypto-containers/288
hf
Mike
More great news, thanks. Can’t wait to download!
I’m using this plugin to do the same thing… I don’t know if it does the encryption part http://www.webdesigncompany.net/automatic-wordpress-backup/
A lot of this information is new to me.
Thanks for so many useful posts. I will follow you to learn WP and try it for my new blog.
This is all new to me too :-)
I usually do go through the spam filter, so even the simple fact you have a Gravatar
I too love WordPress and am addicted to smileys; however, sadly I am not a web designer. I am so envious of this new breed of creative talent. I am a creative person by nature, but I just cannot get my head around the web protocols and coding.
Restoring is working OK for me, but only when I restore from the same system.
(yes, I transferred the public and private keys over)
When I try to restore, I get this error: “No files found in archive - nothing restored.”
I also notice that duplicity created another set of files inside the bucket.
It appears to have created a new, empty container for the new computer.
How do I force it to access the Ubuntu data?
Good thing the overview was well defined and stated. This can be very helpful for users because they will have a backup.
excellent guide Michael!
I read it yesterday when I was searching for backup ideas for my site that is hosted on an Amazon EC2 instance. I create EBS backups, but I don’t think I can do that daily and keep versions for 6 months; that would become too costly, as every time it creates a snapshot of 15GB. And there does not seem to be a way to selectively restore individual files (I may be ignorant here).
So I tried the script, tested it using smaller backups initially without encryption, and later with encryption. I must say that I did not know how to create backups until a day ago, and now I’m up and running with a solid backup for my server in just a day’s time. Thanks for your detailed instructions.
And by the way, I tested exporting and importing keys, and I’m able to restore my backups on different systems, too. Sometimes gpg requires signing of the key after import (gpg --sign-key user-id). If it still does not work after signing, increase the trust level to 5 (gpg --edit-key and then command> trust).
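The import step on the new machine would look something like this (a rough sketch, using the filenames from Matt’s export tip above):

gpg --import MyPublicKey.key
gpg --import MyPrivateKey.key
gpg --sign-key user-id        # then sign and/or raise the trust level as described above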
All the procedures or steps that were given are well explained and discussed. Users will be able to use this effectively.
Thanks for recommending the method !
The timing of the post has become quite ironic. 2 days after I published it, all hell breaks loose on my server and it’s been down most of the past 2 days. I’d have lost my sanity by now if I didn’t know I had those backups…
Good Day,
Thank you very much for taking the diligence in crafting a modern backup solution.
On my Mac it appears the GnuPG library is not installed by default, so users should download and install that first, or build the key on the server (as I did).
Also, though not noted in the instructions, to run the duplicity script you need python-boto installed. The two commands to install that are here: http://goo.gl/9NmDC
I have a few questions regarding your instructions.
1. When generating a key, the system creates a folder called “.gnupg” with several files. Which file is the private key and which file is the public key? To note, as of this writing DSA is no longer the default; “RSA and RSA” is now the default option, which I chose along with a 4096 key size.
2. In your example script you mention adding the user values for the first three lines. Do we redundantly add those same values next to the export = in the last three lines as well?
3. For “s3+http://BUCKETNAME”, what format do we use for custom domains, since Amazon now supports them? Would it be “s3+http://backup.example.com” or “s3+http://backup.example.com.s3.amazonaws.com”? Or does this script not currently support custom domains?
4. What would the structure be for the directory we want to back up? For example, if I’m logged in as user chris and want to back up the contents of a folder called “test” in my home directory, would I use “/home/chris/test/”?
Those are my current questions, as I currently get an error when running the script, but I believe it’s due to the home directory or, more likely, the bucket name format being incorrect. Thanks in advance!
Nice information, I’ll try it as soon as possible!
Thanks.
“If someone gets into your server account, this isn’t going to help you, you’re only protected against hardware failures.”
I too am also wondering about that. Looking at S3 stuff the upload/delete options are in the same security setting… >:
I wonder if it’s possible to SCP to another server, and have that user only have upload-only access.
Thank you very much for the tutorial; the truth is that automating this type of process is a big help.
I’m using s3cmd and not duplicity so I have no idea how that works, but this is how I am protecting my bucket files. Maybe it helps someone.
I made an IAM user with a policy that only allows it to put files, which means it cannot list or delete any files. It can however replace files with the same name, so I tar the files with a filename that is partly random. Not perfect, but it should be enough to protect them.
I’m only copying the web files, databases, and the scripts for setting up the server and configuring the services, crontabs, etc. Since it’s only one file, it’s only one put and about 25MB of storage every day for me. I don’t automatically delete any backups.
Thanks for putting together this valuable guide. Much appreciated.
By the way, I am using S3 as an automatic backup for my WordPress blog using the “Automatic Backup” plugin. I have uploaded all images to S3, and so my image URL looks like this:
http://bucketname.s3.amazonaws.com/wp-content/uploads/2010/12/Picture.png
How would I go about changing the URL of the image OR swapping the image if I have to? What changes do I make to 1) change the image and 2) change the URL of the image back to http://www.domain.com/wp-content/uploads/2010/12/Picture.png?
Thanks in advance.
Hi,
It looks like a very promising script; unfortunately, I get an error while running it:
“This backend requires boto library, version 0.9d or later”
It seems to be a known issue, but I haven’t found a way to solve it yet …
sudo apt-get install python-boto
Thanks !
Searching for my error on Google, I saw some bug reports mentioning the python-boto version … but I didn’t even check if it was installed on my server :P
Shouldn’t it be installed with duplicity as a dependency?
Looks like there is some debate on that: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=498278
Maintainer Alexander Zangerl says:
“in my opinion, not automatically enforcing S3 support is perfectly reasonable, given that the signficant amount of other, more mainstream and also non-commercial backup services that duplicity supports by default.”
Step 4:
Where do you save the file?