Monday, June 28, 2010

An eCryptfs Backup Strategy


Disclaimer: I am often asked about best practices regarding eCryptfs backups. I am not necessarily advocating this as the best approach; rather, this is simply my approach. Do with it what you will ;-)

I generally perform two types of backups...
  1. Backups to Trusted, typically Local Storage (~hourly)
  2. Backups to Untrusted, typically Remote Storage (~daily)
For me, trusted local storage generally means hardware that I physically control and on which I am the only person with immediate root access. This might be a system in my home or office, or even static media locked in a safe deposit box at the bank -- understanding of course that I must trust the physical controls in place. If I don't trust the physical controls, then it's not trusted local storage. My laptop, since I often travel with it, is not trusted local storage, because there's a fair possibility that it might be stolen.

And for me, untrusted remote storage generally means a reasonably secure system, but one that I do not have physical control over and on which I may not be the (only) root user. This includes co-lo's and various forms of web and cloud storage (such as Amazon S3).

I will keep backup copies of my cleartext data on trusted local storage. For me, this means an hourly cronjob that does something like this on the LAN:

rsync -aP /home/$USER/ \
trusted.local.storage:/var/backups/home/$USER/

For untrusted remote storage, I never send my cleartext data, but rather my encrypted private data for backup. And since it's usually over a WAN, I use a daily cronjob that does something like:

rsync -azP $HOME/.Private/ \
untrusted.remote.storage:/var/backups/home/$USER/.Private/
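
Here is one way the two cron jobs above might be wired together. This is only a sketch: the script name (ecryptfs-backup), the schedule times, the RSYNC override variable, and BACKUP_USER are assumptions made for illustration, and the host names are the same placeholders used above. It also swaps -P for --partial, on the assumption that progress output is just noise when cron is the one running the job.

```shell
#!/bin/sh
# Sketch of a cron wrapper for the two backups described above.
# Hypothetical crontab entries (added with `crontab -e`):
#   0 * * * *   "$HOME/bin/ecryptfs-backup" trusted     # ~hourly, cleartext, LAN
#   30 3 * * *  "$HOME/bin/ecryptfs-backup" untrusted   # ~daily, encrypted, WAN

RSYNC="${RSYNC:-rsync}"            # assumption: lets you substitute a stub when testing
BACKUP_USER="${USER:-$(id -un)}"   # cron does not always set $USER

backup_trusted() {
    # Cleartext home directory to trusted local storage.
    "$RSYNC" -a --partial "/home/$BACKUP_USER/" \
        "trusted.local.storage:/var/backups/home/$BACKUP_USER/"
}

backup_untrusted() {
    # Only the encrypted files under ~/.Private leave the machine.
    # -z enables compression in transit.
    "$RSYNC" -az --partial "$HOME/.Private/" \
        "untrusted.remote.storage:/var/backups/home/$BACKUP_USER/.Private/"
}

# Dispatch only when called with an argument, so the file can also be sourced.
if [ "$#" -gt 0 ]; then
    case "$1" in
        trusted)   backup_trusted ;;
        untrusted) backup_untrusted ;;
        *) echo "usage: $0 trusted|untrusted" >&2; exit 2 ;;
    esac
fi
```

Keeping both jobs in one script means the paths live in a single place, and the cleartext and encrypted destinations are harder to mix up.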

And in both cases, I will periodically (once a month?) run rsync with --delete and --dry-run by hand, review the list of files it would delete, and then re-run with --delete (without --dry-run) if I'm satisfied with the results. Do this with care ;-)
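
The manual "preview, then delete" routine could be sketched as a few shell functions. The function names and the RSYNC override variable are assumptions for illustration. --itemize-changes is a standard rsync option that prints one line per pending change, with deletions marked "*deleting".

```shell
#!/bin/sh
# Sketch of the "dry run first, then really delete" routine described above.

RSYNC="${RSYNC:-rsync}"   # assumption: lets you substitute a stub when testing

preview_delete() {
    # $1 = source directory (with trailing slash), $2 = destination.
    # --dry-run changes nothing; it only reports what would happen.
    "$RSYNC" -a --delete --dry-run --itemize-changes "$1" "$2"
}

apply_delete() {
    # The same transfer for real. Only run this after reading the preview.
    "$RSYNC" -a --delete "$1" "$2"
}

sync_with_delete() {
    # Show the preview, then ask before deleting anything.
    preview_delete "$1" "$2" || return 1
    printf 'Apply these changes, including deletions? [y/N] '
    read -r answer
    case "$answer" in
        y|Y) apply_delete "$1" "$2" ;;
        *)   echo "Aborted; nothing was deleted." ;;
    esac
}
```

For example, after sourcing the file: sync_with_delete "$HOME/.Private/" "untrusted.remote.storage:/var/backups/home/$USER/.Private/". Anything other than "y" leaves the destination untouched.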

This may or may not be ideal for you, and some of you probably have even better ideas! Please feel free to leave a comment if you'd like to share your best practices for backing up your eCryptfs data.

:-Dustin


photo © MIROSLAV VAJDIĆ from openphoto.net CC:Attribution-ShareAlike

7 comments:

  1. Out of curiosity -- for your (encrypted) offsite storage, do you have any way of producing easy-to-interpret results from rsync --delete --dry-run?

    It seems to me like that would just show a list of long, opaque encrypted filenames, which isn't particularly easy to screen for accidental changes.

  2. A question: I've switched to an eCryptfs $HOME. With a plain filesystem, an rsync took about 20 minutes; now, with the encrypted one, it takes about 4 hours!
    Is that normal? I know there is a performance penalty, but... is there something that can be done to speed it up?
    Running Lucid Lynx 10.04, both amd64 and i386, with "--verbose --links --perms --times --group --owner --devices --recursive --delete --hard-links --specials"

  3. I use duplicity for my backups. It produces encrypted tar volumes and has lots of backends, such as sftp, scp, and imap.

    I produce daily incremental and weekly full backups, and I keep two full backup sets. I use it for both trusted local and off-site backups. The advantage is that I can simply copy my local backup to a portable USB drive and take it with me, knowing that my sensitive stuff is encrypted and that if I lose the drive no one can read it.

    Duplicity is available in the Ubuntu repos or from http://www.nongnu.org/duplicity/

    Ian.

  4. How long does it take to finish the two types of backup? I don't think it would finish within two days. I backed up my data using the "trusted backup", as you call it, and it took a full day to complete.

  5. You should use the -c option of rsync for encrypted volumes; otherwise it will consider every file to have changed.

  6. Thanks om. Good suggestion.

    I use dirvish for differential local backups and was wondering why my encrypted home was always getting fully copied.

    dirvish is based on rsync using hard links to create full trees and only copying changed files. It's working very well except for using a lot more backup space with ecryptfs contents.

    dirvish has a config option (checksum: 1) which forces -c for rsync, and that seems to work well. It takes longer to back up, but the resulting differentials are much smaller, with about 7x less data sent across the network.

  7. Scrub my last comment. Using -c doesn't actually seem to work: after re-testing, I found that just as many files were copied. Perhaps it's just that many of my home files get changed every day? Or is there some reason why rsync sees eCryptfs files as always modified?


Please do not use blog comments for support requests! Blog comments do not scale well to this effect.

Instead, please use Launchpad for Bugs and StackExchange for Questions.
* bugs.launchpad.net
* stackexchange.com

Thanks,
:-Dustin