From the Canyon Edge -- :-Dustin

Wednesday, August 1, 2012

Introducing overlayroot -- overlayfs + dmcrypt!

A beautiful live oak tree with exposed, overlaying roots, near my house in Austin, Texas
I'm thrilled to introduce what I hope will be an unexpected, innovative, and useful surprise for Ubuntu 12.10!  Along with Scott Moser, I've been hard at work on a new binary package called overlayroot, which is part of the cloud-initramfs-tools source package and project.

Background

In a hallway conversation at UDS-Quantal in Oakland, CA in May 2012, I briefly described a concept to Kees Cook, Tyler Hicks, and Scott Moser...



I look at Ubuntu Cloud AMIs much like Ubuntu Desktop LiveISOs...  The root filesystem of every instance that launches is basically "the same".  What's unique about the system -- the system's actual DNA -- all of that is generated after that first boot.
Deoxyribonucleic acid.  Or so we're told.
Log files, host-specific configuration files, variable state information, user home directories, 3rd party software, temporary files, and so on...  Ideally, all of that information would be encrypted before landing on a random disk in a cloud data center!

About two weeks ago, I started working on implementing this, using two awesome Linux kernel technologies:


  1. overlayfs
    • noting that overlayfs is not yet upstream in the Linux kernel, though Ubuntu carries the patch set (thanks to Andy Whitcroft for his maintenance and upstream communications)
  2. dmcrypt
    • which has been upstream in the Linux kernel for quite some time


A medieval crypt we visited outside of the Glasgow Cathedral in Scotland
As a base, I started from some documentation and prototype code in the lengthy Ubuntu wiki article, aufsRootFileSystemOnUsbFlash.  There’s some interesting sample code embedded in that page, licensed under the GPLv3 and attributed to Axel Heider, Sebastian P., and Nicholas A. Schembri.  I’ve never met these individuals but I appreciate their efforts and I’m delighted to embrace, extend, package, and publish their work directly in the Ubuntu distribution!  I had started writing something similar from scratch before I found this.  The real magic for me was seeing the mount command called with the --move option.  Very cool!  That’s really what makes all of this work in the initramfs ;-)


Shameless advertising.  She's gotta feed the monkey man.
A special thanks to my employer, Gazzang, who have enabled me to work on this open source project, as it lines up well with our broad interests in securing information in the Cloud and Big Data deployments.  In particular, Gazzang’s zTrustee commercial key management service can be used to protect, secure, back up, revoke, and retrieve the keys that make this whole procedure work.  Contact us if you’re interested in that.

And an even bigger thanks to Scott Moser for working right there with me on this somewhat ad hoc project.  Scott and I had worked together for years, across IBM and Canonical.  It’s been a lot of fun reconnecting with him, hacking on such an interesting, innovative, and broadly useful feature!

Overlayfs

Overlayfs is a successor of a series of union filesystem implementations, such as unionfs, aufs, etc.  It creates a single unified view of two directories.  One is the "lower" filesystem -- in our case, this is a read-only mount of our original, pristine Ubuntu AMI.  The second is the "upper" filesystem -- in our case, this is a read-write mount of an encrypted block device.  We’re hopeful that it might one day make it upstream into Linux, though progress on that seems to have stalled.  Thankfully, Ubuntu’s kernels are carrying it for now.
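
Here's a minimal sketch of what that looks like as a mount command.  It only builds and prints the command, so it's safe to run anywhere.  The helper name overlay_mount_cmd and the /root target are placeholders I made up for illustration; the overlayfs type and the lowerdir/upperdir options are the ones you'll see in the mount table on an Ubuntu kernel carrying the patch set.

```shell
#!/bin/sh
# Build (but do not run) an overlayfs mount command.
# overlay_mount_cmd is a hypothetical helper, not part of overlayroot.
overlay_mount_cmd() {
    lower="$1"   # read-only, pristine filesystem
    upper="$2"   # read-write directory that receives all changes
    target="$3"  # where the unified view appears
    echo "mount -t overlayfs -o lowerdir=${lower},upperdir=${upper} overlayroot ${target}"
}

# Print the command; run the printed line as root to actually mount.
overlay_mount_cmd /media/root-ro /media/root-rw/overlay /root
```

On kernels that don't carry Ubuntu's patch set, the filesystem type name and the required options may well differ, so treat this as a sketch rather than a recipe.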

dmcrypt

dmcrypt is a block device encryption scheme (different from eCryptfs, which implements a per-file encryption scheme).  Many of my blog readers know that I'm one of the authors and maintainers of eCryptfs, as well as one of its biggest fans :-)  That said, I recognize that dmcrypt solves a different problem, and I chose dmcrypt for this solution.  In this case, we'd like to encrypt the entire block device, ensuring that all reads and writes are happening to and from encrypted storage.  dmcrypt uses cryptsetup and LUKS to configure, format, and open a block device for encryption.  A virtual block device is "mapped" and presented in /dev/mapper/, and this is what will be mounted with read-write capability onto the overlayfs upper filesystem.
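
As a rough sketch of that sequence (format, open, make a filesystem, mount), something like the following would do.  It is not the code in the overlayroot package.  By default it only prints the commands; the run and setup_crypt_upper helpers, the DRY_RUN variable, and the /tmp/keyfile path are all names I've invented for the example.

```shell
#!/bin/sh
# Sketch: prepare an encrypted upper filesystem with cryptsetup/LUKS.
# Prints the commands by default; set DRY_RUN=0 and run as root to execute.
# WARNING: luksFormat destroys any data already on the device.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

setup_crypt_upper() {
    dev="$1"      # backing block device, e.g. /dev/xvdb
    keyfile="$2"  # file holding the passphrase (hypothetical path)
    name="$3"     # mapper name; shows up as /dev/mapper/$name
    mnt="$4"      # mount point for the decrypted filesystem
    run cryptsetup --batch-mode luksFormat --key-file "$keyfile" "$dev"
    run cryptsetup luksOpen --key-file "$keyfile" "$dev" "$name"
    run mkfs.ext4 "/dev/mapper/$name"
    run mount "/dev/mapper/$name" "$mnt"
}

setup_crypt_upper /dev/xvdb /tmp/keyfile secure /media/root-rw
```

Everything written through /dev/mapper/secure lands encrypted on /dev/xvdb; the filesystem on top never sees the difference.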

You can read more at this excellent, detailed blog post about dmcrypt.

A couple of ways dmcrypt can be used
Installation

As of August 1, 2012, the overlayroot package is now in the default Ubuntu cloud daily images for quantal, which will debut in the Ubuntu 12.10 Beta1 milestone.

For older releases of Ubuntu, we’re also automatically publishing backports of the package to the PPA at ppa:cloud-initramfs-tools/ppa.  I’ve verified the backported functionality on at least Ubuntu 12.04 LTS (Precise).  You can add the PPA and install using:

sudo apt-add-repository ppa:cloud-initramfs-tools/ppa
sudo apt-get install -y overlayroot

Configuration

Once the package is installed, you need to edit the configuration file at /etc/overlayroot.conf.  The syntax, modes, and capabilities are thoroughly described in the inline documentation in the file itself, but I’ll briefly recap here...

The file should define a single variable, overlayroot, which, by default, is empty (meaning that the functionality is disabled).  Alternatively, you can pass the same variable and string definition on the kernel command line.  The functionality works on physical hardware just as well as on cloud instances.  There are basically 3 different modes that overlayroot supports:
  1. The backing device is tmpfs
    • eg, overlayroot="tmpfs"
    • changes on top of the root filesystem are stored in a temporary filesystem in memory
    • all changes are discarded on reboot
    • the amount of available disk space is limited to ½ of the total memory available on the system
    • the tmpfs is mounted onto /media/root-rw
  2. The backing device is block storage
    • eg, overlayroot="dev=/dev/xvdb"
    • changes on top of the root filesystem are stored on a separate block device, typically /dev/xvdb in AWS
    • these changes are not encrypted by default
    • changes are preserved across reboots
    • the system root filesystem will have as much storage as available on the block storage device
    • the block device is mounted onto /media/root-rw
  3. The backing device is encrypted
    • eg, overlayroot="crypt:dev=/dev/xvdb"
    • changes on top of the root filesystem are stored in an encrypted block device
    • typically, /dev/xvdb is the backing device, which is mapped and encrypted using cryptsetup and LUKS
    • /dev/mapper/secure is the encrypted block device
    • /dev/mapper/secure is the dmcrypt device mounted onto /media/root-rw
    • reboots are unfortunately not supported by default, unless you hardcode the dmcrypt mount passphrase in the initramfs
      • See below for more information about that
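
Put together, a minimal /etc/overlayroot.conf might look like the sketch below.  The commented-out lines are just the alternatives from the list above; only one assignment should be active at a time, and the inline documentation in the file itself remains the authority on syntax.

```shell
# /etc/overlayroot.conf -- sketch; choose exactly one setting

# overlayroot=""                      # disabled (the default)
overlayroot="tmpfs"                   # mode 1: changes kept in memory, lost on reboot
# overlayroot="dev=/dev/xvdb"         # mode 2: changes kept on a block device, unencrypted
# overlayroot="crypt:dev=/dev/xvdb"   # mode 3: changes kept on a dmcrypt-encrypted device
```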

All three of these have the tremendous advantage of keeping your root disk pristine and your changes on a separate partition, for auditability and rollback purposes.  Also, there’s a really nice side effect of all of this -- whereas typically you have a root volume with only a couple of GB of space, if your attached EBS storage backing your overlayroot is 100GB, your root partition now has 100GB of available space.  This is far more useful to me than having to symlink and dance around all of that juicy storage inconveniently located over in /mnt!

There are several other options documented inline in the configuration file.  I’ll leave those for future posts and as an exercise for the reader ;-)  The three simple configuration examples above (tmpfs, dev=, and crypt:dev=), however, should be enough to get you up and running in AWS and OpenStack clouds with Ubuntu’s 12.10 (Quantal) images.

Verification

You might want to verify that overlayroot is operating as expected.  You should see a mount table that looks like this (here, in tmpfs mode):

ubuntu@ip-10-140-27-223:~$ mount
overlayroot on / type overlayfs (rw,lowerdir=/media/root-ro/,upperdir=/media/root-rw/overlay)
proc on /proc type proc (rw,noexec,nosuid,nodev)
sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
udev on /dev type devtmpfs (rw,mode=0755)
devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=0620)
tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
/dev/xvda1 on /media/root-ro type ext4 (ro)
tmpfs-root on /media/root-rw type tmpfs (rw,relatime)
none on /sys/fs/fuse/connections type fusectl (rw)
none on /sys/kernel/debug type debugfs (rw)
none on /sys/kernel/security type securityfs (rw)
none on /run/lock type tmpfs (rw,noexec,nosuid,nodev,size=5242880)
none on /run/shm type tmpfs (rw,nosuid,nodev)
/dev/xvdb on /media/root-ro/mnt type ext3 (ro)
/media/root-ro/mnt on /mnt type overlayfs (rw,lowerdir=/media/root-ro/mnt,upperdir=/media/root-rw/overlay/mnt)


Note that your original root volume is mounted on /media/root-ro:

/dev/xvda1 on /media/root-ro type ext4 (ro)

And your root disk is, in fact, an overlayfs on top of /media/root-ro/ and /media/root-rw/overlay.

overlayroot on / type overlayfs (rw,lowerdir=/media/root-ro/,upperdir=/media/root-rw/overlay)



If you’re using tmpfs, you should see:

tmpfs-root on /media/root-rw type tmpfs (rw,relatime)



If you’re using unencrypted block storage, you should see:

/dev/xvdb on /media/root-rw type ext4 (rw,relatime,data=ordered)


And if you’re using encrypted block storage, you should see:

/dev/mapper/secure on /media/root-rw type ext4 (rw,relatime,data=ordered)
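
If you'd rather script that check than eyeball it, here's a small sketch that classifies the backing store from mount-style output on stdin.  The overlayroot_mode function is a made-up helper, not something shipped in the package, and it simply assumes that anything under /dev/mapper/ mounted on /media/root-rw is the encrypted case.

```shell
#!/bin/sh
# Classify the overlayroot backing store from `mount`-style output on stdin.
# overlayroot_mode is a hypothetical helper, not part of the package.
overlayroot_mode() {
    awk '
        $3 == "/media/root-rw" {
            if ($1 == "tmpfs-root")            mode = "tmpfs"
            else if ($1 ~ /^\/dev\/mapper\//)  mode = "crypt"   # assumption: mapper device means dmcrypt
            else                               mode = "device"
        }
        END { print (mode == "" ? "none" : mode) }
    '
}

# Example using the line from the encrypted case; prints "crypt".
echo "/dev/mapper/secure on /media/root-rw type ext4 (rw,relatime,data=ordered)" | overlayroot_mode
```

On a live system you would feed it the real table with `mount | overlayroot_mode`.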


To verify the dmcrypt setup of your block storage, you can run:

$ sudo cryptsetup luksDump /dev/xvdb
LUKS header information for /dev/xvdb

Version:        1
Cipher name:    aes
Cipher mode:    cbc-essiv:sha256
Hash spec:      sha1
Payload offset: 4096
MK bits:        256
MK digest:      bc be 66 cd 87 3d 33 6c 2c 99 72 00 b6 d2 be b0 69 6a 76 39
MK salt:        42 14 8d da 89 15 e4 66 a0 84 6e 6a 7f bc 3e 34
               da a7 44 3d 6b 80 bb 6f 5b 44 77 4a 9c b0 91 4c
MK iterations:  27625
UUID:           245e08b9-4458-4e37-a029-68ebd5f65eca

Key Slot 0: ENABLED
       Iterations:             110583
       Salt:                   a8 fe 2a ec cd 0c 23 7d 13 ef 91 aa 05 88 24 9b
                               2e a1 cd 54 54 96 0d 0b ce 88 aa cc 72 a8 d4 63
       Key material offset:    8
       AF stripes:             4000
Key Slot 1: DISABLED
Key Slot 2: DISABLED
Key Slot 3: DISABLED
Key Slot 4: DISABLED
Key Slot 5: DISABLED
Key Slot 6: DISABLED
Key Slot 7: DISABLED

And if you’re really paranoid, you can run `sudo strings /dev/xvdb` on the block device and look for anything recognizable as clear text data ;-)

$ sudo strings /dev/xvdb
LUKS cbc-essiv:sha256 sha1 \9P< kl2d44f1f1-ccff-4fc7-857c-35519faa9158 Ow\rn {(\T UK9Y> 8Ybk < 'Yb* $i"/5X BIS8
...
The Matrix

Encryption from Instance Genesis

So the above example requires a first boot, some configuration, and then a reboot, which means that some of the data generated on first boot (like the host ssh keys) actually landed on the read-only root, before overlayroot was able to do its work.

Michelangelo's Creation of Adam, a Genesis event 
Well, ideally we’d capture all of those changes in the overlay as well.  Doing that is a little harder than it sounds.  We need to be able to feed that configuration information into a brand new instance at boot time, during the initramfs.  We can’t use user-data, cloud-init, or the metadata service, because networking is not yet up and running in the instance.

Scott Moser came up with a rather inspired workaround :-)  He created a “minimal”, 1GB snapshot of a volume that has a filesystem on it, and a very basic overlayroot.conf configuration file.  If such a volume is attached to an instance, and has a volume label of OROOTCFG, then the overlayroot hooks in the initramfs will load and use the overlayroot.conf on that device.  You can simply create a new instance, attaching a volume based on this snapshot!  From the command line, you’d need something like this:

euca-run-instances --key=ec2-keypair --instance-type=m1.small  --block-device-mapping=/dev/sdc=snap-67d20b17:10:0 ami-e300a88a

The “:10” says that your volume should be 10GB.  And note that the “:0” denotes whether to delete the volume on termination or not (0=false/do-not-delete, 1=true/delete).
Note that that snapshot was a test we were using and has since been deleted. I'm hoping Gazzang will be willing to host a couple of these snapshots for general usage. I'll update this post when and if that happens!

And from the GUI, it looks like this:

AWS Console, launching an image with an EBS snapshot attached


Entropy and Keys

Using mode 3 as described above, you can easily ensure that any runtime data and changes to the stock system image are encrypted before being written to disk.  This raises the question, of course, of how the keys used for this encryption are generated and handled.  I’ll try to explain the process, complexity, and potential solutions now.

Encryption of any kind -- be it disk encryption, file encryption, network transport layer encryption, etc. -- always requires high quality keys.  Generating high quality keys requires high quality entropy.  And high quality encryption keys always require “key management”.  I have more information on that in another post, including talks I’m giving on that topic to two conferences in August 2012.


Keys in the Cloud -- not an easy problem to solve

A user of overlayroot with encryption can, at their option, specify the dmcrypt mount passphrase itself in the overlayroot.conf file, using the pass=$PASS option.  This has two distinct advantages, and one significant disadvantage.  The advantages are that the user can control the quality/length of the passphrase, and that this passphrase can be used to mount the encrypted volume repeatedly, at every boot, ensuring that the persistent encrypted data is available each time the system reboots.  The disadvantage is that the passphrase is plainly visible inside of the initramfs or in your /boot on the base partition, rendering your encryption defenseless if your attacker has access to both your read-only root volume and your encrypted root volume.  For some use cases, this might be a reasonable trade-off, though for others, it may not.

In the default case, however, where the pass=$PASS option is not specified, the encryption mount passphrase is automatically generated at boot time.  This presents its own set of challenges for the security-minded, as computers have painfully little entropy at initial boot time, as shown in this USENIX paper.  The problem is exacerbated in cloud computing environments, where these “computers” are actually virtual machines, which, by design, have been cloned to appear quite similar to one another.
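
If you're curious how much entropy the kernel thinks it has on a given instance, you can ask it.  This is just a quick sketch; entropy_avail is a helper name I made up, and the number is the kernel's own estimate, so take it as a hint rather than a measurement.

```shell
#!/bin/sh
# Report the kernel's entropy estimate in bits, or "unknown" if unavailable.
# entropy_avail is a hypothetical helper name.
entropy_avail() {
    f=/proc/sys/kernel/random/entropy_avail
    if [ -r "$f" ]; then cat "$f"; else echo unknown; fi
}

echo "entropy estimate: $(entropy_avail) bits"
```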

Knowing that this is an issue causing some consternation for security professionals, I’ll describe in detail here how our overlayroot encryption passphrase is generated in the initramfs, and invite your review, feedback, and suggestions.

Key Generation Design

An automatically generated overlayroot dmcrypt passphrase is a sha512sum string, which consists of 128 characters of hex [0-9a-f].  Since we’re using cryptsetup with LUKS, this forms a 512-bit wrapping passphrase.  LUKS and cryptsetup will then generate another random 256-bit volume master key.  The wrapping key can be changed easily without re-encrypting all of the data on the device, whereas the volume key cannot.  A compromise of either of these keys will compromise your encrypted data.

Random bloke cutting keys
To generate this key, we need to provide the sha512sum utility (which we’ve added to the initramfs) at least 512 bits of random data.  Our current implementation does the following:


  1. We perform a best-effort seeding of /dev/urandom with some pseudo-random data, from two places, if available.  First, we use /random-seed, if found in the initramfs itself.  And second, we use /var/lib/urandom/random-seed, if found in the root image.  The initramfs /random-seed file is created by our overlayroot’s initramfs hooks script, which tries to read up to 4096 bytes of data from /dev/random (true random data).  The root filesystem’s /var/lib/urandom/random-seed file is populated on each system reboot, to carry some randomness across boots.  Neither of these is definitive, but each adds some spice to the pool of pseudo-random data multiplexed at /dev/urandom.
  2. Then we seed our input data with the output of `stat -L /dev/* /proc/* /sys/*` in the initramfs.  This adds entropy to our input due to the mostly unpredictable combinations of timestamps associated with the Access/Modify/Change times of each file, directory, and block device in the top level of these kernel filesystems.
  3. Next, we concatenate data from each of 3 places:
    • /proc/sys/kernel/random/boot_id
      • a 16-byte uuid, pseudo-random, “unique” per boot
    • /proc/sys/kernel/random/uuid
      • a 16-byte uuid, pseudo-random, “unique” every time it’s read
    • /dev/urandom
      • 4096 bytes of pseudo-randomness, which is jumbled by our seeding as described above
  4. Finally, we read and concatenate the first 4096 bytes from the specified backing block device
    • /dev/xvdb (presumably, in AWS)
      • Here, a sophisticated or paranoid user could write up to 4KB of their own quality random data to the block device from the same or even a different instance
      • But by default, a fresh block device from Amazon contains a filesystem, which actually has its own somewhat unique metadata written to the head of the block device
  5. Once we’ve gathered all of that data, we calculate the sha512sum of the total, and write it to a root-owned file stored in a tmpfs.  Once booted, the root user can recover this value from /dev/.initramfs/overlayroot.XXXXXX.  This string is used by overlayroot and fed to cryptsetup’s --key-file parameter.
  6. Note that cryptsetup also generates the master volume key.  We are using the default cryptsetup option, which generates a pseudo-random key from /dev/urandom.  We have tried to help that along by seeding /dev/urandom with some additional input.
    • Note that we tried using the cryptsetup --use-random option, but our test instances hung indefinitely, blocking on /dev/random.  Sorry.
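
To make the steps above concrete, here's a sketch of the derivation in plain shell.  It is emphatically not the packaged initramfs script: the function names seed_urandom and gen_passphrase are hypothetical, and the exact order and framing of the concatenated inputs are my assumptions for illustration.

```shell
#!/bin/sh
# Sketch of the passphrase derivation described above; NOT the packaged script.
# seed_urandom and gen_passphrase are hypothetical names for illustration.

# Step 1: best-effort seeding; writing to /dev/urandom mixes data into the pool.
seed_urandom() {
    for f in "$@"; do
        if [ -r "$f" ]; then cat "$f" > /dev/urandom 2>/dev/null || true; fi
    done
}

gen_passphrase() {
    dev="$1"  # backing device (a regular file also works for experimenting)
    {
        # Step 2: timestamps of top-level entries in the kernel filesystems
        stat -L /dev/* /proc/* /sys/* 2>/dev/null
        # Step 3: per-boot uuid, per-read uuid, and 4096 bytes from urandom
        cat /proc/sys/kernel/random/boot_id 2>/dev/null
        cat /proc/sys/kernel/random/uuid 2>/dev/null
        head -c 4096 /dev/urandom
        # Step 4: first 4096 bytes of the backing block device
        head -c 4096 "$dev" 2>/dev/null
    } | sha512sum | cut -d' ' -f1   # Step 5: 128 hex characters
}

seed_urandom /random-seed /var/lib/urandom/random-seed
# gen_passphrase /dev/xvdb    # needs read access to the device
```

In the real thing, the resulting string is written to a root-owned file in a tmpfs and handed to cryptsetup via --key-file, as described in step 5.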

In Conclusion

If you're still with me here, you should really be proud of yourself :-) This is a bit of a heavy topic, but oh-so-important! We're entrusting more and more of our digital lives, corporate information, trade secrets, cyber property, and critical data to remote storage scattered about the globe.

How can you possibly know who has physical access to those bits? The sobering reality is that, frankly...you can't.

I continue to advocate in favor of intelligent cryptographic solutions to this problem, and with overlayroot, you now have a very powerful tool at your disposal, to easily encrypt your information in Ubuntu cloud instances. This covers automatically generated system exhaust spewed to /var, as well as your precious configuration files, user data, and service information.

:-Dustin

4 comments:

  1. You might like to look into truerand or twuewand for more entropy.

  2. Nice work!

    Is there a way to "commit" stuff in the upper fs to the lower fs?

  3. Hi,
    using overlayroot=tmpfs on USB stick (with ext2 fs) os is xubuntu 12.10. Generally works fine, however there is small issue with network mounts.

    Both nfs and samba have problems if mounts are defined in fstab - without overlayfs works fine. With overlayfs complains that network mounts are not available during boot.

    Any idea what could be problem?

  4. >>dmcrypt uses cryptsetup and LUKS to configure, format, and open a block device for encryption.

    Sounds a little strange.

    @IvanZ: I would blame overlayfs, aufs2 might be better for the job.


Please do not use blog comments for support requests! Blog comments do not scale well to this effect.

Instead, please use Launchpad for Bugs and StackExchange for Questions.
* bugs.launchpad.net
* stackexchange.com

Thanks,
:-Dustin