[Image: A beautiful live oak tree with exposed, overlaying roots, near my house in Austin, Texas]
Background
In a hallway conversation at UDS-Quantal in Oakland, CA in May 2012, I briefly described a concept to Kees Cook, Tyler Hicks, and Scott Moser...
I look at Ubuntu Cloud AMIs much like Ubuntu Desktop LiveISOs... The root filesystem of every instance that launches is basically "the same". What's unique about the system -- the system's actual DNA -- all of that is generated after that first boot.
[Image: Deoxyribonucleic acid. Or so we're told.]
About two weeks ago, I started working on implementing this, using two awesome Linux kernel technologies:
- overlayfs
  - noting that overlayfs is not yet upstream in the Linux kernel, though Ubuntu carries the patch set (thanks to Andy Whitcroft for his maintenance and upstream communications)
- dmcrypt
  - which has been upstream in the Linux kernel for quite some time
[Image: A medieval crypt we visited outside of the Glasgow Cathedral in Scotland]
[Image: Shameless advertising. She's gotta feed the monkey man.]
And an even bigger thanks to Scott Moser for working right there with me on this somewhat ad hoc project. Scott and I had worked together for years, across IBM and Canonical. It’s been a lot of fun reconnecting with him, hacking on such an interesting, innovative, and broadly useful feature!
Overlayfs
Overlayfs is a successor to a series of union filesystem implementations, such as unionfs, aufs, etc. It creates a single unified view of two directories. One is the "lower" filesystem -- in our case, this is a read-only mount of our original, pristine Ubuntu AMI. The second is the "upper" filesystem -- in our case, this is a read-write mount of an encrypted block device. We’re hopeful that it might one day make it upstream into Linux, though progress on that seems to have stalled. Thankfully, Ubuntu’s kernels are carrying it for now.
dmcrypt
dmcrypt is a block device encryption scheme (different from eCryptfs, which implements a per-file encryption scheme). Many of my blog readers know that I'm one of the authors and maintainers of eCryptfs, as well as one of its biggest fans :-) That said, I recognize that dmcrypt solves a different problem, and I chose dmcrypt for this solution. In this case, we'd like to encrypt the entire block device, ensuring that all reads and writes are happening to and from encrypted storage. dmcrypt uses cryptsetup and LUKS to configure, format, and open a block device for encryption. A virtual block device is "mapped" and presented in /dev/mapper/, and this is what will be mounted with read-write capability onto the overlayfs upper filesystem.
[Image: A couple of ways dmcrypt can be used]
As of August 1, 2012, the overlayroot package is now in the default Ubuntu cloud daily images for quantal, which will debut in the Ubuntu 12.10 Beta1 milestone.
For older releases of Ubuntu, we’re also automatically publishing backports of the package to the PPA at ppa:cloud-initramfs-tools/ppa. I’ve verified the backported functionality on at least Ubuntu 12.04 LTS (Precise). You can add the PPA and install using:
sudo apt-add-repository ppa:cloud-initramfs-tools/ppa
sudo apt-get install -y overlayroot
Configuration
Once the package is installed, you need to edit the configuration file at /etc/overlayroot.conf. The syntax, modes, and capabilities are thoroughly described in the inline documentation in the file itself, but I’ll briefly recap here...
The file should define a single variable, overlayroot, which is empty by default (meaning that the functionality is disabled). Alternatively, you can pass the same variable and string definition on the kernel command line. The functionality works on physical hardware just as well as on cloud instances. There are basically 3 different modes that overlayroot supports:
- The backing device is tmpfs
  - e.g., overlayroot="tmpfs"
  - changes on top of the root filesystem are stored in a temporary filesystem in memory
  - all changes are discarded on reboot
  - the amount of available disk space is limited to ½ of the total memory available on the system
  - the tmpfs is mounted onto /media/root-rw
- The backing device is block storage
  - e.g., overlayroot="dev=/dev/xvdb"
  - changes on top of the root filesystem are stored on a separate block device, typically /dev/xvdb in AWS
  - these changes are not encrypted by default
  - changes are preserved across reboots
  - the system root filesystem will have as much storage as is available on the block storage device
  - the block device is mounted onto /media/root-rw
- The backing device is encrypted
  - e.g., overlayroot="crypt:dev=/dev/xvdb"
  - changes on top of the root filesystem are stored in an encrypted block device
  - typically, /dev/xvdb is the backing device, which is mapped and encrypted using cryptsetup and LUKS
  - /dev/mapper/secure is the encrypted block device
  - the dmcrypt-mapped /dev/mapper/secure is mounted onto /media/root-rw
  - reboots are unfortunately not supported by default, unless you hardcode the dmcrypt mount passphrase in the initramfs
    - see below for more information about that
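As a quick reference, the three modes above might be written in /etc/overlayroot.conf roughly as follows. This is only a sketch: /dev/xvdb is the typical AWS device mentioned above and may well differ on your system, and only one assignment should be active at a time.

```shell
# /etc/overlayroot.conf -- enable exactly one of these lines

# Mode 1: changes live in memory and are discarded on reboot
overlayroot="tmpfs"

# Mode 2: changes persist, unencrypted, on a separate block device
#overlayroot="dev=/dev/xvdb"

# Mode 3: changes are written to a dmcrypt/LUKS-encrypted block device
#overlayroot="crypt:dev=/dev/xvdb"
```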
All three of these have the tremendous advantage of keeping your root disk pristine and your changes on a separate partition, for auditability and rollback purposes. Also, there’s a really nice side effect of all of this -- whereas typically you have a root volume with only a couple of GB of space, if your attached EBS storage backing your overlayroot is 100GB, your root partition now has 100GB of available space. This is far more useful to me than having to symlink and dance around all of that juicy storage inconveniently located over in /mnt!
There are several other options documented inline in the configuration file. I’ll leave those for future posts and as an exercise for the reader ;-) The three simple configuration options above, however, should get you up and running trivially in AWS and OpenStack clouds with Ubuntu’s 12.10 (Quantal) images.
Verification
You might want to verify that overlayroot, and any encryption, is operating as expected. You should see a mount table that looks like this (this one is from an instance using tmpfs):
ubuntu@ip-10-140-27-223:~$ mount
overlayroot on / type overlayfs (rw,lowerdir=/media/root-ro/,upperdir=/media/root-rw/overlay)
proc on /proc type proc (rw,noexec,nosuid,nodev)
sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
udev on /dev type devtmpfs (rw,mode=0755)
devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=0620)
tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
/dev/xvda1 on /media/root-ro type ext4 (ro)
tmpfs-root on /media/root-rw type tmpfs (rw,relatime)
none on /sys/fs/fuse/connections type fusectl (rw)
none on /sys/kernel/debug type debugfs (rw)
none on /sys/kernel/security type securityfs (rw)
none on /run/lock type tmpfs (rw,noexec,nosuid,nodev,size=5242880)
none on /run/shm type tmpfs (rw,nosuid,nodev)
/dev/xvdb on /media/root-ro/mnt type ext3 (ro)
/media/root-ro/mnt on /mnt type overlayfs (rw,lowerdir=/media/root-ro/mnt,upperdir=/media/root-rw/overlay/mnt)
Note that your original root volume is mounted on /media/root-ro:
/dev/xvda1 on /media/root-ro type ext4 (ro)
And your root disk is, in fact, an overlayfs on top of /media/root-ro/ and /media/root-rw/overlay.
overlayroot on / type overlayfs (rw,lowerdir=/media/root-ro/,upperdir=/media/root-rw/overlay)
If you’re using tmpfs, you should see:
tmpfs-root on /media/root-rw type tmpfs (rw,relatime)
If you’re using unencrypted block storage, you should see the backing block device itself (typically /dev/xvdb in AWS) mounted on /media/root-rw.
And if you’re using encrypted block storage, you should see:
/dev/mapper/secure on /media/root-rw type ext4 (rw,relatime,data=ordered)
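If you would rather not eyeball the mount table, the check can be scripted. The following is a minimal sketch, not part of the overlayroot package: overlayroot_backing is a hypothetical helper name, and it assumes the mount output format shown above, where the backing store is whatever is mounted on /media/root-rw.

```shell
# Classify the overlayroot backing store from `mount` output read on stdin.
# Prints one of: tmpfs, crypt, device, none.
overlayroot_backing() {
    awk '$2 == "on" && $3 == "/media/root-rw" {
        if ($5 == "tmpfs") mode = "tmpfs"                   # in-memory upper filesystem
        else if ($1 ~ /^\/dev\/mapper\//) mode = "crypt"    # dmcrypt-mapped device
        else mode = "device"                                # plain block device
    }
    END { print (mode == "" ? "none" : mode) }'
}

# Usage on a running instance:
#   mount | overlayroot_backing
```

A result of "none" would suggest that overlayroot is not active at all, for example because the overlayroot variable is still empty.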
To verify the dmcrypt setup of your block storage, you can run:
$ sudo cryptsetup luksDump /dev/xvdb
LUKS header information for /dev/xvdb
Version: 1
Cipher name: aes
Cipher mode: cbc-essiv:sha256
Hash spec: sha1
Payload offset: 4096
MK bits: 256
MK digest: bc be 66 cd 87 3d 33 6c 2c 99 72 00 b6 d2 be b0 69 6a 76 39
MK salt: 42 14 8d da 89 15 e4 66 a0 84 6e 6a 7f bc 3e 34
da a7 44 3d 6b 80 bb 6f 5b 44 77 4a 9c b0 91 4c
MK iterations: 27625
UUID: 245e08b9-4458-4e37-a029-68ebd5f65eca
Key Slot 0: ENABLED
Iterations: 110583
Salt: a8 fe 2a ec cd 0c 23 7d 13 ef 91 aa 05 88 24 9b
2e a1 cd 54 54 96 0d 0b ce 88 aa cc 72 a8 d4 63
Key material offset: 8
AF stripes: 4000
Key Slot 1: DISABLED
Key Slot 2: DISABLED
Key Slot 3: DISABLED
Key Slot 4: DISABLED
Key Slot 5: DISABLED
Key Slot 6: DISABLED
Key Slot 7: DISABLED
And if you’re really paranoid, you can run `sudo strings /dev/xvdb` on the block device and look for anything recognizable as clear text data ;-)
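A more targeted variant of that check is to plant a known marker string and then search the raw device for it. This is just a sketch under a few assumptions: it relies on GNU grep's -a option, and marker_visible and the canary file name are hypothetical names chosen for illustration.

```shell
# Return 0 if the marker string appears in clear text on the device, 1 if not.
# $1: block device (or any file standing in for one), $2: marker string
marker_visible() {
    # -a: treat binary data as text; -q: quiet; -F: match the marker literally
    grep -a -q -F -e "$2" "$1"
}

# Example (run as root on an instance using the encrypted mode):
#   marker="overlayroot-canary-$$"
#   echo "$marker" > /root/canary.txt && sync
#   if marker_visible /dev/xvdb "$marker"; then
#       echo "marker found in clear text"
#   else
#       echo "marker not found on the raw device"
#   fi
```

With the unencrypted block storage mode you would expect the marker to be found, while with the encrypted mode it should not be.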
[Image: The Matrix]
Encryption from Instance Genesis
So the above example requires a first boot, some configuration, and then a reboot, which means that some of the data generated on first boot (like the host ssh keys) actually landed on the read-only root, before overlayroot was able to do its work.
[Image: Michelangelo's Creation of Adam, a Genesis event]
Scott Moser came up with a rather inspired workaround :-) He created a “minimal”, 1GB snapshot of a volume that has a filesystem on it, and a very basic overlayroot.conf configuration file. If such a volume is attached to an instance, and has a volume label of OROOTCFG, the overlayroot hooks in the initramfs will load and use the overlayroot.conf on that device. You can simply create a new instance, attaching a volume based on this snapshot! From the command line, you’d need something like this:
euca-run-instances --key=ec2-keypair --instance-type=m1.small --block-device-mapping=/dev/sdc=snap-67d20b17:10:0 ami-e300a88a
The “:10” says that your volume should be 10GB. And note that the “:0” denotes whether to delete the volume on termination or not (0=false/do-not-delete, 1=true/delete).
Note that that snapshot was a test we were using and has since been deleted. I'm hoping Gazzang will be willing to host a couple of these snapshots for general usage. I'll update this post when and if that happens!
And from the GUI, it looks like this:
[Image: AWS Console, launching an image with an EBS snapshot attached]
Entropy and Keys
Using mode 3 as described above, you can easily ensure that any runtime data and changes to the stock system image are encrypted before being written to disk. This raises the question, of course, of the keys used to handle this encryption. I’ll try to explain the process, complexity, and potential solutions now.
Encryption of any kind -- be it disk encryption, file encryption, network transport layer encryption, etc. -- always requires high quality keys. Generating high quality keys requires high quality entropy. And high quality encryption keys always require “key management”. I have more information on that in another post, including talks I’m giving on that topic at two conferences in August 2012.
[Image: Keys in the Cloud -- not an easy problem to solve]
A user of overlayroot with encryption can, at their option, specify the dmcrypt mount passphrase itself in the overlayroot.conf file, using the pass=$PASS option. This has two distinct advantages, and one significant disadvantage. The advantages are that the user can control the quality/length of the passphrase, and that this passphrase can be used to mount the encrypted volume repeatedly, at every boot, ensuring that the persistent encrypted data is available each time the system reboots. The disadvantage is that the passphrase is plainly visible inside of the initramfs or in your /boot on the base partition, rendering your encryption defenseless if your attacker has access to both your read-only root volume and your encrypted root volume. For some use cases, this might be a reasonable trade-off, though for others, it may not.
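For illustration, here is one way such a configuration line might be produced with a passphrase of your own choosing. This is a hedged sketch: overlayroot_crypt_line is a hypothetical helper, and it assumes that options after "crypt:" are comma-separated, as I understand the inline documentation in /etc/overlayroot.conf to describe, so please check that file before relying on it.

```shell
# Print an overlayroot.conf line with a freshly generated hex passphrase.
# $1: backing device (defaults to /dev/xvdb, the typical AWS device)
overlayroot_crypt_line() {
    # 32 random bytes rendered as 64 hex characters
    pass=$(head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n')
    # ASSUMPTION: options are joined with commas after "crypt:"
    printf 'overlayroot="crypt:dev=%s,pass=%s"\n' "${1:-/dev/xvdb}" "$pass"
}

# Usage: review the line, then place it in /etc/overlayroot.conf
#   overlayroot_crypt_line /dev/xvdb
```

A hex passphrase is used here simply to avoid characters that might need quoting. Remember that anything written this way is readable by whoever can read the base image.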
In the default case, however, where the pass=$PASS option is not specified, the encryption mount passphrase is automatically generated at boot time. This presents its own set of challenges for the security-minded, because computers have painfully little entropy at initial boot time, as shown in this USENIX paper. The problem is exacerbated in cloud computing environments, where these “computers” are actually virtual machines, which, by design, have been cloned to appear quite similar to one another.
Knowing that this is an issue causing some consternation for security professionals, I’ll describe in detail here how our overlayroot encryption passphrase is generated in the initramfs, and invite your review, feedback, and suggestions.
Key Generation Design
An automatically generated overlayroot dmcrypt passphrase is a sha512sum string, which consists of 128 characters of hex [0-9a-f]. At 4 bits per hex character, that is 512 bits, so with cryptsetup and LUKS this forms a 512-bit wrapping passphrase. LUKS and cryptsetup will then generate another random 256-bit volume master key. The wrapping key can be changed easily without re-encrypting all of the data on the device, whereas the volume key cannot. A compromise of either of these keys will compromise your encrypted data.
[Image: Random bloke cutting keys]
- We perform a best-effort seeding of /dev/urandom with some pseudo-random data, from two places, if available. First, we use /random-seed, if found in the initramfs itself. And second, we use /var/lib/urandom/random-seed, if found in the root image. The initramfs /random-seed file is created by our overlayroot’s initramfs hooks script, which tries to read up to 4096 bytes of data from /dev/random (true random data). The root filesystem’s /var/lib/urandom/random-seed file is populated on each system reboot, to carry some randomness across boots. Neither of these is definitive, but they add some spice to the pool of pseudo-random data multiplexed at /dev/urandom.
- Then we seed our input data with the output of `stat -L /dev/* /proc/* /sys/*` in the initramfs. This adds entropy to our input due to the mostly unpredictable combinations of timestamps associated with the Access/Modify/Change times of each file, directory, and block device in the top level of these kernel filesystems.
- Next, we concatenate data from each of 3 places:
  - /proc/sys/kernel/random/boot_id
    - a 16-byte uuid, pseudo-random, “unique” per boot
  - /proc/sys/kernel/random/uuid
    - a 16-byte uuid, pseudo-random, “unique” every time it’s read
  - /dev/urandom
    - 4096 bytes of pseudo-randomness, which is jumbled by our seeding as described above
- Finally, we read and concatenate the first 4096 bytes from the specified backing block device
  - /dev/xvdb (presumably, in AWS)
  - Here, a sophisticated or paranoid user could write up to 4KB of their own quality random data to the block device from the same or even a different instance
  - But by default, a fresh block device from Amazon contains a filesystem, which actually has its own somewhat unique metadata written to the head of the block device
- Once we’ve gathered all of that data, we calculate the sha512sum of the total, and write it to a root-owned file stored in a tmpfs. Once booted, the root user can recover this value from /dev/.initramfs/overlayroot.XXXXXX. This string is used by overlayroot and fed to cryptsetup’s --key-file parameter.
- Note that cryptsetup also generates the master volume key. We are using the default cryptsetup option, which generates a pseudo-random key from /dev/urandom. We have tried to help that along by seeding /dev/urandom with some additional input.
- Note that we tried using the cryptsetup --use-random option, but our test instances hung indefinitely, blocking on /dev/random. Sorry.
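To make the steps above easier to review, here is a rough sketch of the derivation as a pair of shell functions. It is my own illustration of the description, not the actual initramfs script: seed_urandom and generate_passphrase are hypothetical names, and the exact ordering and error handling in the real hooks may differ.

```shell
# Best-effort seeding of /dev/urandom from saved seed files, if readable.
seed_urandom() {
    for f in /random-seed /var/lib/urandom/random-seed; do
        if [ -r "$f" ]; then
            { cat "$f" > /dev/urandom; } 2>/dev/null || true
        fi
    done
}

# Print a 128-character hex passphrase derived from the inputs listed above.
# $1: backing block device (any readable file works for experimentation)
generate_passphrase() {
    {
        # timestamps and metadata of the top-level kernel filesystem entries
        stat -L /dev/* /proc/* /sys/* 2>/dev/null || true
        # per-boot and per-read UUIDs from the kernel
        cat /proc/sys/kernel/random/boot_id /proc/sys/kernel/random/uuid 2>/dev/null || true
        # 4096 bytes from the (seeded) pseudo-random pool
        head -c 4096 /dev/urandom
        # first 4096 bytes of the backing device
        head -c 4096 "$1" 2>/dev/null || true
    } | sha512sum | cut -d ' ' -f 1
}

# Usage (as root, in an environment resembling the initramfs):
#   seed_urandom
#   generate_passphrase /dev/xvdb
```

In the real flow, the resulting string would be written to a root-owned file on a tmpfs and handed to cryptsetup through --key-file, as described above.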
Comments

You might like to look into truerand or twuewand for more entropy.

Nice work!

Is there a way to "commit" stuff in the upper fs to the lower fs?

Hi,
using overlayroot=tmpfs on USB stick (with ext2 fs) os is xubuntu 12.10. Generally works fine, however there is small issue with network mounts.
Both nfs and samba have problems if mounts are defined in fstab - without overlayfs works fine. With overlayfs complains that network mounts are not available during boot.
Any idea what could be problem?

>>dmcrypt uses cryptsetup and LUKS to configure, format, and open a block device for encryption.
Sounds a little strange.
@IvanZ: I would blame overlayfs, aufs2 might be better for the job.