Wednesday, August 29, 2012

The Linux Foundation's CloudOpen: Security and Privacy in the Cloud

Howdy all!

I just delivered my presentation at the Linux Foundation's CloudOpen 2012 event, and I'm happy to share my slides below.  You can also download the PDF.


I must say that this is simply one of the best conferences around.  This year, several events were co-located, including LinuxCon, Linux Plumbers, and the new CloudOpen conference.  I've spoken at each of these in the past, and have always found the quality of the presentations, evening events, and hallway conversations to be the absolute best in the industry.

:-Dustin

Monday, August 13, 2012

Data encryption -- Why? Some numbers...


At last weekend's Texas Linux Fest, at the end of my presentation, Data Security and Privacy in the Cloud, an attendee asked a great question.  I'll paraphrase...
So...  What's the actual threat model?  Why are you insisting that people encrypt their data in the cloud?  Where's the risk?  When might unencrypted data get compromised?  Who is accessing that data?
A couple of weeks ago, an article from ComputerWorld made the front page of Slashdot:

'Wall of Shame' exposes 21M medical record breaches
New rules under the Health Information Technology for Economic and Clinical Health Act
August 7, 2012 06:00 AM ET


Here are a few absolutely astounding numbers from that article, which its author pulled from the US Department of Health and Human Services Health Information Privacy website.


Since the data is publicly available, I was able to download and import all of it into a spreadsheet, run some numbers, and verify ComputerWorld's article.  I can confirm that Mr. Mearian's numbers are quite accurate, and just as scary.  Since September 2009:

  • 21+ million people have had their health care records exposed
  • 480 breaches have been reported

The top 6 breaches all affected more than 1 million individuals:

  • 4.9 million records: TRICARE Management Activity, the US Department of Defense's health care program, exposed 4.9 million health care records when backup tapes went missing
  • 1.9 million records: Health Net lost 1.9 million records when backup hard drives went missing
  • 1.7 million records: New York City Health & Hospitals Corporation's North Bronx Health Care Network reported the theft of 1.7 million records
  • 1.22 million records: AvMed Health Plans reported the loss of a laptop with 1.22 million patient records
  • 1.05 million records: Nemours Foundation (runs children's hospitals) lost 1.05 million records with missing backup tapes
  • 1.02 million records: Blue Cross Blue Shield of Tennessee exposed 1.02 million records with the loss of an external hard drive


Such breaches are very costly, too.

  • $4.3 million: Cignet Health of Prince George's County civil lawsuit penalty
  • $1.5 million: Blue Cross Blue Shield of Tennessee penalties
    • have since encrypted all of their hard drives, 885TB of data
  • $1.7 million: Alaska Department of Health penalty
    • due to theft of a thumb drive, stolen from an employee's car


Running a few more reports on the public CSV data,

  • Across 480 reported breaches, these were the top reasons given for the incident:
    • 55%: Theft of devices or physical media
    • 26%: Hacking/Unauthorized access
    • 12%: Lost devices, disks, tapes, drives, media
    • 5%: Improper disposal of devices
    • 3%: Other

    The most disappointing part, to me, is that 72% of those breaches stemmed from theft, lost devices, and improper disposal -- a total of 15.6 million individuals' health records. This means that the vast majority of these compromises are easily preventable, through the use of comprehensive data encryption. And I'd argue that many of the remaining 28% of the breaches attributed to hacking, unauthorized access, and other disclosures could also be thwarted, slowed, or deterred by coupling encryption with advanced key management, access controls, and regular auditing.
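
As a quick check on that 72% figure, the arithmetic can be reproduced from the rounded percentages listed above. This is only a sketch using the numbers quoted in this post, not a re-run of the HHS data, and the variable names are chosen here purely for illustration.

```shell
# Rounded category shares as quoted in the list above (percent of 480 breaches).
theft=55
hacking=26
lost=12
disposal=5
other=3

# Categories where encrypting the device or media would likely have helped.
physical=$((theft + lost + disposal))
echo "theft + lost + improper disposal: ${physical}%"

# Each share is rounded on its own, so the total need not be exactly 100.
echo "sum of rounded shares: $((theft + hacking + lost + disposal + other))%"
```

The shares are rounded individually, which is why they total 101 rather than 100, and why the remainder is quoted as 28% (100 minus 72) rather than 26 plus 3.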

    So here I am, writing the same thing I've been writing in this blog for 4 years now...
    1. Encrypt your data.
    2. Help your colleagues, friends, and families encrypt their data.
    3. Insist that your employers institute thorough security policies around encryption.
    4. Ask hard questions of your health care providers and financial services professionals about the privacy of the data they hold on you. Hold them accountable.
    There's a wide range of tools available, from free/open source, to paid commercial offerings. On the free/open source side, I'm a proponent, author, and maintainer of both eCryptfs and overlayroot (which uses dmcrypt). These can help protect your home directory and your private data in cloud instances.


    And from the commercial side, my employer, Gazzang, sells an enterprise-class encryption product called zNcrypt, and I've architected Gazzang's cloud-compatible key management system, zTrustee. I have no doubt that the combination of these two technologies -- comprehensive data encryption and a robust key management solution -- could have prevented the compromise of millions of these records.

    :-Dustin

    Monday, August 6, 2012

    ecryptfs-utils-100 released

    Most of the original IBM LTC Security Team that designed and implemented eCryptfs, 2005-2008, along with a couple of Gazzangers who have also contributed to eCryptfs.  Gazzang hosted a small reception on Thursday, August 2, 2012.
    I'm pleased to announce the 100th release of the ecryptfs-utils userspace package!

    Somewhat unusually, eCryptfs userspace packages simply increment a single integral revision number, rather than a major or minor revision.  That project maintenance decision predates my involvement as project maintainer.  But it seems to work, at least for me :-)  I started maintaining the eCryptfs project and package at release 50, so this marks roughly my 50th release too.

    Grepping through the changelog, I counted 157 bugs fixed over the last 6 years.  I'd really like to recognize the contributors who have helped bring a stable and reliable eCryptfs to you:

    Apologies if I missed you...let me know and I'll add you in there ;-)


    Changelog follows.   Here's to another 100!

    ecryptfs-utils (100) precise; urgency=low
    
      [ Tyler Hicks ]
      * src/pam_ecryptfs/pam_ecryptfs.c, src/libecryptfs/key_management.c:
          LP: #1024476
        - fix regression introduced in ecryptfs-utils-99 when Encrypted
          Home/Private is in use and the eCryptfs kernel code is compiled as a
          module
        - drop check for kernel filename encryption support in pam_ecryptfs, as
          appropriate privileges to load the eCryptfs kernel module may not be
          available and filename encryption has been supported since 2.6.29
        - always add filename encryption key to the kernel keyring from pam_mount
    
      [ Colin King ]
      * tests/kernel/inode-race-stat/test.c:
        - limit number of forks based on fd limits
      * tests/kernel/enospc.sh, tests/kernel/enospc/test.c,
        tests/kernel/Makefile.am, tests/kernel/tests.rc:
        - add test case for ENOSPC
    
      [ Tim Harder ]
      * m4/ac_python_devel.m4: LP: #1029217
        - proplery save and restore CPPFLAGS and LIBS when python support is
          enabled
    
     -- Dustin Kirkland Thu, 02 Aug 2012 16:33:22 -0500
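
For anyone curious how a bug count like the one mentioned above might be pulled out of a Debian-style changelog, here is a rough sketch. It only counts unique references written in the `LP: #NNNN` form seen in the entry above, so it is an approximation and not necessarily how the 157 figure was obtained. The function name and the `debian/changelog` path are assumptions.

```shell
# Count unique Launchpad bug references of the form "LP: #1234" in a changelog.
# References written in other styles (for example a bare "#1234") are not counted.
count_lp_bugs() {
    grep -o 'LP: #[0-9][0-9]*' "$1" | sort -u | wc -l | tr -d ' '
}

# Example (the path is an assumption):
#   count_lp_bugs debian/changelog
```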
    

    Cheers,
    Dustin

    Saturday, August 4, 2012

    Texas Linux Fest 2012 Presentation

    As promised, here are the slides from my Texas Linux Fest 2012 presentation:







    Enjoy!
    :-Dustin

    Wednesday, August 1, 2012

    Introducing overlayroot -- overlayfs + dmcrypt!

    A beautiful live oak tree with exposed, overlaying roots, near my house in Austin, Texas
    I'm thrilled to introduce what I hope will be an unexpected, innovative, and useful surprise for Ubuntu 12.10!  Along with Scott Moser, I've been hard at work on a new binary package called overlayroot, which is part of the cloud-initramfs-tools source package and project.

    Background

    In a hallway conversation at UDS-Quantal in Oakland, CA in May 2012, I briefly described a concept to Kees Cook, Tyler Hicks, and Scott Moser...



    I look at Ubuntu Cloud AMIs much like Ubuntu Desktop LiveISOs...  The root filesystem of every instance that launches is basically "the same".  What's unique about the system -- the system's actual DNA -- all of that is generated after that first boot.
    Deoxyribonucleic acid.  Or so we're told.
    Log files, host-specific configuration files, variable state information, user home directories, 3rd party software, temporary files, and so on...  Ideally, all of that information would be encrypted before landing on a random disk in a cloud data center!

    About two weeks ago, I started working on implementing this, using two awesome Linux kernel technologies:


    1. overlayfs
      • noting that overlayfs is not yet upstream in the Linux kernel, though Ubuntu carries the patch set (thanks to Andy Whitcroft for his maintenance and upstream communications)
    2. dmcrypt
      • which has been upstream in the Linux kernel for quite some time


    A medieval crypt we visited outside of the Glasgow Cathedral in Scotland
    As a base, I started from some documentation and prototype code in the lengthy Ubuntu wiki article, aufsRootFileSystemOnUsbFlash.  There’s some interesting sample code embedded in that page, licensed under the GPLv3 and attributed to Axel Heider, Sebastian P., and Nicholas A. Schembri.  I’ve never met these individuals but I appreciate their efforts and I’m delighted to embrace, extend, package, and publish their work directly in the Ubuntu distribution!  I had started writing something similar from scratch before I found this.  The real magic for me was seeing the mount command called with the --move option.  Very cool!  That’s really what makes all of this work in the initramfs ;-)


    Shameless advertising.  She's gotta feed the monkey man.
    A special thanks to my employer, Gazzang, who have enabled me to work on this open source project, as it lines up well with our broad interests in securing information in the Cloud and Big Data deployments.  In particular, Gazzang’s zTrustee commercial key management service can be used to protect, secure, backup, revoke, and retrieve the keys that make this whole procedure work.  Contact us if you’re interested in that.

    And an even bigger thanks to Scott Moser for working right there with me on this somewhat ad hoc project.  Scott and I had worked together for years, across IBM and Canonical.  It’s been a lot of fun reconnecting with him, hacking on such an interesting, innovative, and broadly useful feature!

    Overlayfs

    Overlayfs is a successor of a series of union filesystem implementations, such as unionfs, aufs, etc.  It creates a single unified view of two directories.  One is the "lower" filesystem -- in our case, this is a read-only mount of our original, pristine Ubuntu AMI.  The second is the "upper" filesystem -- in our case, this is a read-write mount of an encrypted block device.  We’re hopeful that it might one day make it upstream into Linux, though progress on that seems to have stalled.  Thankfully, Ubuntu’s kernels are carrying it for now.
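
To make the lower/upper idea concrete, here is a small dry-run sketch that prints the kind of mount command involved without executing it. The filesystem type and the `lowerdir`/`upperdir` options match the mount table shown later in this post; the helper name and the `/mnt/merged` target are hypothetical, and the real initramfs script may do this differently.

```shell
# Print (do not run) an overlayfs mount that stacks a writable "upper"
# directory on top of a read-only "lower" directory.
overlay_mount_cmd() {
    lower=$1
    upper=$2
    target=$3
    echo "mount -t overlayfs -o lowerdir=$lower,upperdir=$upper overlayroot $target"
}

# /mnt/merged is a hypothetical mount point used only for illustration.
overlay_mount_cmd /media/root-ro /media/root-rw/overlay /mnt/merged
```

Reads fall through to the lower directory unless the file has been changed, in which case the copy in the upper directory wins. Writes only ever land in the upper directory.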

    dmcrypt

    dmcrypt is a block device encryption scheme (different from eCryptfs, which implements a per-file encryption scheme).  Many of my blog readers know that I'm one of the authors and maintainers of eCryptfs, as well as one of its biggest fans :-)  That said, I recognize that dmcrypt solves a different problem, and I chose dmcrypt for this solution.  In this case, we'd like to encrypt the entire block device, ensuring that all reads and writes are happening to and from encrypted storage.  dmcrypt uses cryptsetup and LUKS to configure, format, and open a block device for encryption.  A virtual block device is "mapped" and presented in /dev/mapper/, and this is what will be mounted with read-write capability onto the overlayfs upper filesystem.
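
The configure, format, open, and mount sequence described above could look roughly like the following. This is a dry run that only prints the commands, since the real ones need root and will destroy data on the target device. The helper name and the key file path are hypothetical; the device, mapping name, and mount point follow the ones used in this post.

```shell
# Dry run: print the commands that would set up an encrypted upper filesystem.
encrypted_upper_plan() {
    dev=$1
    keyfile=$2
    name=${3:-secure}
    # Write a LUKS header to the device, keyed from the key file.
    echo "cryptsetup --batch-mode --key-file $keyfile luksFormat $dev"
    # Map the decrypted view of the device at /dev/mapper/<name>.
    echo "cryptsetup --key-file $keyfile luksOpen $dev $name"
    # Create a filesystem on the mapped device and mount it read-write.
    echo "mkfs.ext4 /dev/mapper/$name"
    echo "mount /dev/mapper/$name /media/root-rw"
}

# /path/to/keyfile is a placeholder, not a path used by overlayroot.
encrypted_upper_plan /dev/xvdb /path/to/keyfile
```

Everything written through `/dev/mapper/secure` is encrypted before it reaches `/dev/xvdb`, which is the property overlayroot relies on for the upper filesystem.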

    You can read more at this excellent, detailed blog post about dmcrypt.

    A couple of ways dmcrypt can be used

    Installation

    As of August 1, 2012, the overlayroot package is now in the default Ubuntu cloud daily images for quantal, which will debut in the Ubuntu 12.10 Beta1 milestone.

    For older releases of Ubuntu, we’re also automatically publishing backports of the package to the PPA at ppa:cloud-initramfs-tools/ppa.  I’ve verified the backported functionality on at least Ubuntu 12.04 LTS (Precise).  You can add the PPA and install using:

    sudo apt-add-repository ppa:cloud-initramfs-tools/ppa
    sudo apt-get update
    sudo apt-get install -y overlayroot

    Configuration

    Once the package is installed, you need to edit the configuration file at /etc/overlayroot.conf.  The syntax, modes, and capabilities are thoroughly described in the inline documentation in the file itself, but I’ll briefly recap here...

    The file should define a single variable, overlayroot, which is empty by default (meaning that the functionality is disabled).  Alternatively, you can pass the same variable and string definition on the kernel command line.  The functionality works on physical hardware just as well as on cloud instances.  There are basically 3 different modes that overlayroot supports:
    1. The backing device is tmpfs
      • eg, overlayroot="tmpfs"
      • changes on top of the root filesystem are stored in a temporary filesystem in memory
      • all changes are discarded on reboot
      • the amount of available disk space is limited to ½ of the total memory available on the system
      • the tmpfs is mounted onto /media/root-rw
    2. The backing device is block storage
      • eg, overlayroot="dev=/dev/xvdb"
      • changes on top of the root filesystem are stored on a separate block device, typically /dev/xvdb in AWS
      • these changes are not encrypted by default
      • changes are preserved across reboots
      • the system root filesystem will have as much storage as available on the block storage device
      • the block device is mounted onto /media/root-rw
    3. The backing device is encrypted
      • eg, overlayroot="crypt:dev=/dev/xvdb"
      • changes on top of the root filesystem are stored in an encrypted block device
      • typically, /dev/xvdb is the backing device, which is mapped and encrypted using cryptsetup and LUKS
      • /dev/mapper/secure is the encrypted block device
      • /dev/mapper/secure is mounted onto /media/root-rw
      • reboots are unfortunately not supported by default, unless you hardcode the dmcrypt mount passphrase in the initramfs
        • See below for more information about that

    All three of these have the tremendous advantage of keeping your root disk pristine and your changes on a separate partition, for auditability and rollback purposes.  Also, there’s a really nice side effect of all of this -- whereas typically you have a root volume with only a couple of GB of space, if your attached EBS storage backing your overlayroot is 100GB, your root partition now has 100GB of available space.  This is far more useful to me than having to symlink and dance around all of that juicy storage inconveniently located over in /mnt!

    There are several other options documented inline in the configuration file.  I’ll leave those for future posts and as an exercise for the reader ;-)  The three simple configuration options listed above, however, should be enough to get you up and running in AWS and OpenStack clouds with Ubuntu’s 12.10 (Quantal) images.

    Verification

    You might want to verify that the encryption is operating as expected.  You should see a mount table that looks like this:

    ubuntu@ip-10-140-27-223:~$ mount
    overlayroot on / type overlayfs (rw,lowerdir=/media/root-ro/,upperdir=/media/root-rw/overlay)
    proc on /proc type proc (rw,noexec,nosuid,nodev)
    sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
    udev on /dev type devtmpfs (rw,mode=0755)
    devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=0620)
    tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
    /dev/xvda1 on /media/root-ro type ext4 (ro)
    tmpfs-root on /media/root-rw type tmpfs (rw,relatime)
    none on /sys/fs/fuse/connections type fusectl (rw)
    none on /sys/kernel/debug type debugfs (rw)
    none on /sys/kernel/security type securityfs (rw)
    none on /run/lock type tmpfs (rw,noexec,nosuid,nodev,size=5242880)
    none on /run/shm type tmpfs (rw,nosuid,nodev)
    /dev/xvdb on /media/root-ro/mnt type ext3 (ro)
    /media/root-ro/mnt on /mnt type overlayfs (rw,lowerdir=/media/root-ro/mnt,upperdir=/media/root-rw/overlay/mnt)


    Note that your original root volume is mounted on /media/root-ro:

    /dev/xvda1 on /media/root-ro type ext4 (ro)

    And your root disk is, in fact, an overlayfs on top of /media/root-ro/ and /media/root-rw/overlay.

    overlayroot on / type overlayfs (rw,lowerdir=/media/root-ro/,upperdir=/media/root-rw/overlay)



    If you’re using tmpfs, you should see:

    tmpfs-root on /media/root-rw type tmpfs (rw,relatime)



    If you’re using unencrypted block storage, you should see:

    /dev/xvdb on /media/root-rw type ext4 (rw,relatime,data=ordered)


    And if you’re using encrypted block storage, you should see:

    /dev/mapper/secure on /media/root-rw type ext4 (rw,relatime,data=ordered)
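
If you would rather not eyeball the mount table, the three cases above can be told apart with a few lines of awk. This is a minimal sketch that assumes the mount output has the same "SOURCE on TARGET type FSTYPE (options)" shape shown in this post; the function name is hypothetical.

```shell
# Read `mount` output on stdin and report what is backing /media/root-rw:
# "tmpfs", "dev" (plain block device), "crypt" (device-mapper), or "none".
overlayroot_mode() {
    awk '
        $2 == "on" && $3 == "/media/root-rw" {
            if ($5 == "tmpfs")                 mode = "tmpfs"
            else if ($1 ~ /^\/dev\/mapper\//)  mode = "crypt"
            else                               mode = "dev"
        }
        END { print (mode == "" ? "none" : mode) }
    '
}

# Usage:
#   mount | overlayroot_mode
```

A result of "crypt" only tells you that a device-mapper volume is mounted there. The luksDump check that follows is still the way to confirm that the volume is actually LUKS-encrypted.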


    To verify the dmcrypt setup of your block storage, you can run:

    $ sudo cryptsetup luksDump /dev/xvdb
    LUKS header information for /dev/xvdb

    Version:        1
    Cipher name:    aes
    Cipher mode:    cbc-essiv:sha256
    Hash spec:      sha1
    Payload offset: 4096
    MK bits:        256
    MK digest:      bc be 66 cd 87 3d 33 6c 2c 99 72 00 b6 d2 be b0 69 6a 76 39
    MK salt:        42 14 8d da 89 15 e4 66 a0 84 6e 6a 7f bc 3e 34
                   da a7 44 3d 6b 80 bb 6f 5b 44 77 4a 9c b0 91 4c
    MK iterations:  27625
    UUID:           245e08b9-4458-4e37-a029-68ebd5f65eca

    Key Slot 0: ENABLED
           Iterations:             110583
           Salt:                   a8 fe 2a ec cd 0c 23 7d 13 ef 91 aa 05 88 24 9b
                                   2e a1 cd 54 54 96 0d 0b ce 88 aa cc 72 a8 d4 63
           Key material offset:    8
           AF stripes:             4000
    Key Slot 1: DISABLED
    Key Slot 2: DISABLED
    Key Slot 3: DISABLED
    Key Slot 4: DISABLED
    Key Slot 5: DISABLED
    Key Slot 6: DISABLED
    Key Slot 7: DISABLED
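
The interesting fields in that dump are the cipher, the master key size, and how many key slots are enabled. As a convenience, something like the following could summarize them. It is only a sketch that parses the field labels exactly as they appear in the output above, and the function name is hypothetical.

```shell
# Summarize `cryptsetup luksDump` output read from stdin.
luks_summary() {
    awk '
        /^[ \t]*Cipher name:/ { sub(/^[^:]*:[ \t]*/, ""); cipher = $0 }
        /^[ \t]*Cipher mode:/ { sub(/^[^:]*:[ \t]*/, ""); mode = $0 }
        /^[ \t]*MK bits:/     { sub(/^[^:]*:[ \t]*/, ""); bits = $0 }
        /^[ \t]*Key Slot [0-7]: ENABLED/ { enabled++ }
        END {
            printf "cipher=%s-%s key_bits=%s enabled_slots=%d\n", cipher, mode, bits, enabled
        }
    '
}

# Usage:
#   sudo cryptsetup luksDump /dev/xvdb | luks_summary
```

For the dump shown above this would report aes with cbc-essiv:sha256, a 256-bit master key, and a single enabled key slot.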

    And if you’re really paranoid, you can run `sudo strings /dev/xvdb` on the block device and look for anything recognizable as clear text data ;-)

    $ sudo strings /dev/xvdb
    LUKS cbc-essiv:sha256 sha1 \9P< kl2d44f1f1-ccff-4fc7-857c-35519faa9158 Ow\rn {(\T UK9Y> 8Ybk < 'Yb* $i"/5X BIS8
    ...
    The Matrix

    Encryption from Instance Genesis

    So the above example requires a first boot, some configuration, and then a reboot, which means that some of the data generated on first boot (like the host ssh keys) actually landed on the read-only root, before overlayroot was able to do its work.

    Michelangelo's Creation of Adam, a Genesis event 
    Well, ideally we’d capture all of those changes in the overlay as well.  Doing that is a little harder than it sounds.  We need to be able to feed that configuration information into a brand new instance at boot time, during the initramfs.  We can’t use user-data, cloud-init, or the metadata service, because networking is not yet up and running in the instance.

    Scott Moser came up with a rather inspired workaround :-)  He created a “minimal”, 1GB snapshot of a volume that has a filesystem on it, and a very basic overlayroot.conf configuration file.  If such a volume is attached to an instance, and has a volume label of OROOTCFG, then the overlayroot hooks in the initramfs will load and use the overlayroot.conf on that device.  You can simply create a new instance, attaching a volume based on this snapshot!  From the command line, you’d need something like this:

    euca-run-instances --key=ec2-keypair --instance-type=m1.small  --block-device-mapping=/dev/sdc=snap-67d20b17:10:0 ami-e300a88a

    The “:10” says that your volume should be 10GB.  And note that the “:0” denotes whether to delete the volume on termination or not (0=false/do-not-delete, 1=true/delete).
    Note that that snapshot was a test we were using and has since been deleted. I'm hoping Gazzang will be willing to host a couple of these snapshots for general usage. I'll update this post when and if that happens!
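
Since the block device mapping string packs four pieces of information into one argument, here is a small sketch that pulls it apart using plain shell parameter expansion. It assumes the "device=snapshot:size:delete" layout described above, with 0 and 1 as the delete flag; the function name is hypothetical.

```shell
# Split a mapping such as /dev/sdc=snap-67d20b17:10:0 into its parts.
parse_bdm() {
    mapping=$1
    device=${mapping%%=*}       # everything before the "="
    spec=${mapping#*=}          # snapshot:size:delete
    snapshot=${spec%%:*}
    rest=${spec#*:}
    size_gb=${rest%%:*}
    delete_flag=${rest#*:}
    case $delete_flag in
        0) on_terminate=keep ;;
        1) on_terminate=delete ;;
        *) on_terminate=unknown ;;
    esac
    echo "device=$device snapshot=$snapshot size_gb=$size_gb on_terminate=$on_terminate"
}

parse_bdm "/dev/sdc=snap-67d20b17:10:0"
# prints: device=/dev/sdc snapshot=snap-67d20b17 size_gb=10 on_terminate=keep
```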

    And from the GUI, it looks like this:

    AWS Console, launching an image with an EBS snapshot attached


    Entropy and Keys

    Using mode 3 as described above, you can easily ensure that any runtime data and changes to the stock system image are encrypted before being written to disk.  This raises the question, of course, of how the keys used for this encryption are handled.  I’ll try to explain the process, complexity, and potential solutions now.

    Encryption of any kind -- be it disk encryption, file encryption, network transport layer encryption, etc. -- always requires high quality keys.  Generating high quality keys requires high quality entropy.  And high quality encryption keys always require “key management”.  I have more information on that in another post, including talks I’m giving on that topic to two conferences in August 2012.


    Keys in the Cloud -- not an easy problem to solve

    A user of overlayroot with encryption can, at their option, specify the dmcrypt mount passphrase itself in the overlayroot.conf file, using the pass=$PASS option.  This has two distinct advantages, and one significant disadvantage.  The advantages are that the user can control the quality/length of the passphrase, and that this passphrase can be used to mount the encrypted volume repeatedly, at every boot, ensuring that the persistent encrypted data is available each time the system reboots.  The disadvantage is that the passphrase is plainly visible inside of the initramfs or in your /boot on the base partition, rendering your encryption defenseless if your attacker has access to both your read-only root volume and your encrypted root volume.  For some use cases, this might be a reasonable trade-off, though for others, it may not.
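
As an illustration of that option, a configuration with a fixed passphrase might look like the line below. This is a sketch: it assumes that extra options follow the device, separated by commas, so check the inline documentation in overlayroot.conf for the exact syntax. The passphrase shown is an obvious placeholder.

```shell
# /etc/overlayroot.conf
# Encrypted overlay that can be reopened on every boot because the passphrase
# is stored here (and therefore readable by anyone who can read this file
# or the initramfs built from it).
overlayroot="crypt:dev=/dev/xvdb,pass=REPLACE-WITH-A-LONG-RANDOM-PASSPHRASE"
```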

    In the default case, however, where the pass=$PASS option is not specified, the encryption mount passphrase is automatically generated at boot time.  This presents its own set of challenges for the security-minded, as computers have painfully little entropy at initial boot time, as shown in this USENIX paper.  The problem is exacerbated in cloud computing environments, where these “computers” are actually virtual machines, which, by design, have been cloned to appear quite similar to one another.

    Knowing that this is an issue causing some consternation for security professionals, I’ll describe in detail here how our overlayroot encryption passphrase is generated in the initramfs, and invite your review, feedback, and suggestions.

    Key Generation Design

    An automatically generated overlayroot dmcrypt passphrase is a sha512sum string, which consists of 128 characters of hex [0-9a-f].  Since we’re using cryptsetup with LUKS, this forms a 512-bit wrapping passphrase.  LUKS and cryptsetup will then generate another random 256-bit volume master key.  The wrapping key can be changed easily without re-encrypting all of the data on the device, whereas the volume key cannot.  A compromise of either of these keys will compromise your encrypted data.
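
The 128-character and 512-bit figures follow directly from the hash size, which is easy to confirm on any Linux system with coreutils installed. The input string below is arbitrary and only there to have something to hash.

```shell
# Any input works here; the digest length is fixed by the hash function.
digest=$(printf 'example input' | sha512sum | cut -d' ' -f1)
hex_chars=${#digest}

echo "hex characters: $hex_chars"
# Each hex character encodes 4 bits.
echo "bits encoded:   $((hex_chars * 4))"
```

This prints 128 hex characters and 512 bits regardless of the input.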

    Random bloke cutting keys
    To generate this key, we need to provide the sha512sum utility (which we’ve added to the initramfs) at least 512 bits of random data.  Our current implementation does the following:


    1. We perform a best-effort seeding of /dev/urandom with some pseudo-random data, from two places, if available.  First, we use /random-seed, if found in the initramfs itself.  And second, we use /var/lib/urandom/random-seed, if found in the root image.  The initramfs /random-seed file is created by our overlayroot’s initramfs hooks script, which tries to read up to 4096 bytes of data from /dev/random (true random data).  The root filesystem’s /var/lib/urandom/random-seed file is populated on each system reboot, to carry some randomness across boots.  Neither of these is definitive, but they add some spice to the pool of pseudo-random data multiplexed at /dev/urandom.
    2. Then we seed our input data with the output of `stat -L /dev/* /proc/* /sys/*` in the initramfs.  This adds entropy to our input due to the mostly unpredictable combinations of timestamps associated with the Access/Modify/Change times of each file, directory, and block device in the top level of these kernel filesystems.
    3. Next, we concatenate data from each of 3 places:
      • /proc/sys/kernel/random/boot_id
        • a 16-byte uuid, pseudo-random, “unique” per boot
      • /proc/sys/kernel/random/uuid
        • a 16-byte uuid, pseudo-random, “unique” every time it’s read
      • /dev/urandom
        • 4096 bytes of pseudo-randomness, which is jumbled by our seeding as described above
    4. Finally, we read and concatenate the first 4096 bytes from the specified backing block device
      • /dev/xvdb (presumably, in AWS)
        • Here, a sophisticated or paranoid user could write up to 4KB of their own quality random data to the block device from the same or even a different instance
        • But by default, a fresh block device from Amazon contains a filesystem, which actually has its own somewhat unique metadata written to the head of the block device
    5. Once we’ve gathered all of that data, we calculate the sha512sum of the total, and write it to a root-owned file stored in a tmpfs.  Once booted, the root user can recover this value from /dev/.initramfs/overlayroot.XXXXXX.  This string is used by overlayroot and fed to cryptsetup’s --key-file parameter.
    6. Note that cryptsetup also generates the master volume key.  We are using the default cryptsetup option, which generates a pseudo-random key from /dev/urandom.  We have tried to help that along by seeding /dev/urandom with some additional input.
      • Note that we tried using the cryptsetup --use-random option, but our test instances hung indefinitely, blocking on /dev/random.  Sorry.
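
To make steps 2 through 5 easier to review, here is a simplified sketch of the gathering and hashing. It is not the actual initramfs code: the function name is hypothetical, the seeding in step 1 is left out, and the result is printed rather than written to a root-owned file in a tmpfs.

```shell
# Gather the inputs described above and reduce them to a sha512 hex string.
# $1 is the backing block device (for example /dev/xvdb).
generate_overlay_key() {
    backing_dev=$1
    {
        # Step 2: metadata and timestamps of top-level kernel filesystem entries.
        stat -L /dev/* /proc/* /sys/* 2>/dev/null || true
        # Step 3: per-boot id, per-read uuid, and 4096 bytes from /dev/urandom.
        cat /proc/sys/kernel/random/boot_id 2>/dev/null || true
        cat /proc/sys/kernel/random/uuid 2>/dev/null || true
        head -c 4096 /dev/urandom
        # Step 4: the first 4096 bytes of the backing block device.
        head -c 4096 "$backing_dev" 2>/dev/null || true
    } | sha512sum | cut -d' ' -f1
}

# Usage (reading the block device typically requires root):
#   generate_overlay_key /dev/xvdb
```

In the real flow, the resulting string is stored in a file and handed to cryptsetup through its --key-file parameter, as described in step 5. Because /dev/urandom and the uuid file are read on every call, two runs should never produce the same value.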

    In Conclusion

    If you're still with me here, you should really be proud of yourself :-) This is a bit of a heavy topic, but oh-so-important! We're entrusting more and more of our digital lives, corporate information, trade secrets, cyber property, and critical data to remote storage scattered about the globe.

    How can you possibly know who has physical access to those bits? The sobering reality is that, frankly...you can't.

    I continue to advocate in favor of intelligent cryptographic solutions to this problem, and with overlayroot, you now have a very powerful tool at your disposal, to easily encrypt your information in Ubuntu cloud instances. This covers automatically generated system exhaust spewed to /var, as well as your precious configuration files, user data, and service information.

    :-Dustin