Wednesday, November 14, 2012

Johnny Football sure looks familiar...


Who exactly is Desmond blocking here?  Because he's holding the ball in the wrong hand to strike a Heisman pose...


Certainly not Alabama cornerback Deion Belue.


Johnny Manziel for the Heisman!

:-Dustin

Wednesday, November 7, 2012

Fascinating, Unique, Memorable Authentication Strings



I was asked a very interesting question by a reporter earlier this week.  To paraphrase, I was asked for "better ways" a website might secure information, rather than a password.

Here's an article I've written in the past on the topic, as to how I manage my own passwords.  I still use a long, randomly generated password for each and every account (200+ and counting), to this day, but honestly, great passwords are unfortunately impossible to remember.

It's absolutely ABOMINABLE and should be ILLEGAL when sites try to identify you or recover your password by using some marginally public information.

  • Which of the following phone numbers have you been associated with in the past?
  • Which of these addresses have you used in the past?
  • What's the name of the street you grew up on?
  • What's your mother's maiden name?
  • What's your high school mascot?
All of those are trivial to discover about a person.  Try it on someone you sort of know -- a friend or colleague.  I bet you could socially engineer your way through 4 or 5 of those in a matter of minutes.

Fortunately, there's a much better approach.  Unfortunately, very few sites actually use it.

The best such sites actually enable you to choose both your security question/hint/challenge, and the answer/response.

Now, selecting a great question/hint/challenge is a bit of an art, but here's an excellent strategy...

Given a short sentence fragment consisting of pronouns, each and every human mind can make some fascinating, unique, and most importantly, memorable, connections.  The more pronouns, the better.  Pronouns are basically variables, with distinct but difficult-to-guess values.  I'm sure you've played a Mad Libs game before as a kid, right?
Here's a simple example, to introduce the concept:
  • Challenge: He looked at her
  • Response: BogartBergman
The question is a reference to the line in Casablanca, "Here's looking at you, kid".  In that quote, Rick (Humphrey Bogart) toasts Ilsa (Ingrid Bergman).  That question will jog my memory and I'll remember the rest.  Others probably won't make that connection.  Pronouns are like programming variables.  I happen to have their values in memory, but others won't.  Out of context, it makes no sense whatsoever.  Just say it out loud, "He looked at her."

The more pronouns you use the better.  Here's another example:
  • Challenge: He traversed it for his mother
  • Response: CaesarRubiconAurelia
If classic movies and classic Rome aren't in your wheelhouse, use something more personal.  Maybe your Dad took your Mom on a nice vacation...
  • Challenge: He took her here for this
  • Response: JimDianeBaliAnniversary
Almost anything sufficiently ambiguous would work...
  • Challenge: Best that ever was
  • Response: BrettFavre4
Pose that same question to a few thousand people and you'll get anything from MuhammadAli to TyrannosaurusRex to SharkWeek1987 or billions of other responses.  But ask the same person that question, and they'll come up with a memorable response.  In this case, it's almost like a hash or HMAC.

The reason that this works is that these challenge/responses are subjective, rather than objective and discoverable facts, like your Mom's middle name.
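
For site operators, here's a rough sketch of how a user-chosen challenge/response might be stored: keep the challenge in the clear, and only a salted hash of the response.  The helper names (normalize_response, hash_response) are made up for this example, and a real deployment would want a deliberately slow password hash rather than a single round of SHA-256.

```shell
#!/bin/sh
# Hypothetical helpers; not taken from any real site's implementation.

# Lowercase the response and strip whitespace so "Bogart Bergman" and
# "bogartbergman" compare equal.
normalize_response() {
    printf '%s' "$1" | tr -d '[:space:]' | tr '[:upper:]' '[:lower:]'
}

# Hash salt + normalized response with SHA-256 (assumes coreutils sha256sum).
hash_response() {
    salt="$1"
    response="$(normalize_response "$2")"
    printf '%s:%s' "$salt" "$response" | sha256sum | cut -d' ' -f1
}

# Enrollment: store the challenge, the salt, and the hash -- never the response.
challenge="He looked at her"
salt="$(head -c 16 /dev/urandom | base64)"
stored="$(hash_response "$salt" "BogartBergman")"

# Verification: hash the candidate the same way and compare.
if [ "$(hash_response "$salt" "bogart bergman")" = "$stored" ]; then
    echo "response accepted for challenge: $challenge"
fi
```

Normalizing case and whitespace trades a little entropy for a lot of forgiveness, which seems like the right call for something a person has to recall months later.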

Hopefully you're starting to get the idea :-)

Use longer challenges, with more pronouns, for higher quality, more entropy in your responses!  Perhaps you can post your own suggestions in the comments below...

I'm actually working on an automatic challenge creator, that you'll soon be able to use to generate your own challenges, and derive your own response.

:-Dustin

Tuesday, October 30, 2012

Seed /dev/urandom through Metadata (EC2, OpenStack, Eucalyptus)


If you attended my talk about Entropy at the OpenStack Summit in San Diego earlier this month, or you read my post and slides here on my blog, you noticed that I had a few suggestions as to how we might improve entropy in Linux cloud instances and virtual machines.

There's one very easy approach that you can handle entirely on your end, when launching an instance, if you use Ubuntu's cloud-init utility, which consumes the user-data field from the metadata service.

You simply need to use ec2-run-instances or euca-run-instances with the --user-data-file option.

Cloud-init supports a directive called write_files.  Here, you can specify the path, ownership, permissions, encoding, and content of a given file, which cloud-init will write at boot time.  Leveraging this, you can simply "create" (actually, just append to) the pseudo random device that the Linux kernel provides at /dev/urandom, which is owned by root:root and permissioned rw-rw-rw-.  The stanza should look like this:

write_files:
-   encoding: b64
    content: $SEED
    owner: root:root
    path: /dev/urandom
    perms: '0666'


Now, you'll need to generate this using a script on your end, and populate the $SEED variable.  To do that, simply use this on your host system where you launch your cloud instance:

SEED="$(head -c 512 /dev/urandom | base64 -w 0)"

This command will read 512 bytes from your local system's /dev/urandom and base64 encode it without wrapping lines.  You could, alternatively, read from your local system's /dev/random if you have enough time and entropy.

Using the recipe above, you can ensure that your instance has at least some bit (actually, 4096 bits) of randomness that was collected outside of your cloud provider's environment.
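
Putting the two pieces together, a minimal sketch of a script that writes the complete user-data file might look like the following.  The function name make_seed_userdata and the file name seed-user-data.txt are arbitrary choices for this example; the "#cloud-config" first line is what tells cloud-init to treat the file as cloud-config.

```shell
#!/bin/sh
# Write a cloud-config user-data file whose write_files stanza appends
# a locally generated seed to /dev/urandom at boot.
make_seed_userdata() {
    out="$1"
    # 512 random bytes (4096 bits), base64-encoded on a single line.
    seed="$(head -c 512 /dev/urandom | base64 -w 0)"
    cat > "$out" <<EOF
#cloud-config
write_files:
-   encoding: b64
    content: $seed
    owner: root:root
    path: /dev/urandom
    perms: '0666'
EOF
}

# seed-user-data.txt is an arbitrary file name chosen for this example.
make_seed_userdata seed-user-data.txt
```

You would then hand that file to ec2-run-instances or euca-run-instances with --user-data-file seed-user-data.txt, alongside your usual image and instance options.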

I'm representing Gazzang this week at the Ubuntu Developer Summit in Copenhagen, Denmark, pushing for better security, entropy, and key management inside of Ubuntu's cloud images.

Cheers,
:-Dustin

Friday, October 19, 2012

Encrypt Everything Everywhere (in OpenStack)


Here are the slides from the first of my two presentations on behalf of Gazzang today at the OpenStack Summit in San Diego, CA.

In this presentation, we started out examining the landscape of security within cloud computing, what's changed, and why encryption is essential to the integrity of the industry.  We talked about the types of sensitive data that need protection, and looked at some astounding statistics about data breaches in the past 3 years that could have been easily thwarted with comprehensive encryption.

We then discussed the several layers where encryption is essential:

  1. Network layer
  2. Filesystem layer
  3. Block storage layer
Within the network layer, SSH seems to be well understood and practiced, though we did talk a little about some tools that can help with SSH public key management, namely ssh-copy-id, ssh-import-id, and DNS records for SSHFP.  We also talked about TLS and SSL, the fact that many applications and services support SSL, but that it's rarely configured or enabled (even in OpenStack!).  PKI and key management tend to be the hard part here...

At the block storage layer, we discussed dmcrypt, and how it can be used to efficiently protect entire block devices.  We discussed several places within OpenStack where dmcrypt could and should be used.  We also discussed some of the shortcomings or limitations of dmcrypt (single key for the entire volume, hard to incrementally back up, all-or-nothing encryption).

I then introduced overlayroot to the OpenStack crowd, as a convenient way of encrypting all local changes and data within an OpenStack guest.

At the filesystem layer, we discussed per-file encryption with eCryptfs, as well as Gazzang's commercially supported distribution of eCryptfs called zNcrypt.  We compared and contrasted eCryptfs with dmcrypt, and discussed the most appropriate places to use each within OpenStack.

Finally, we touched on the piece of technology required to bring all of this together -- key management. To actually secure any encrypted data, you need to safely and securely store the keys necessary to access the data somewhere other than on the same systems holding the encrypted data.  We talked a little about OpenStack Keystone, and how it does and doesn't solve this problem.  We also introduced Gazzang zTrustee, our commercially supported, turnkey key management solution in this space.

Enjoy!



:-Dustin

Thursday, October 18, 2012

Entropy, or Lack Thereof, in OpenStack Instances (and how to improve that)


I gave two presentations today at the OpenStack Design Summit in sunny San Diego, CA, as we prepare for the Grizzly development cycle.

In this presentation, I spent about 40 minutes discussing several research papers over the last 6 years showing the problems with entropy and randomness in cloud computing.  Namely:

  1. The Analysis of the Linux Random Number Generator (2006)
  2. The iSEC Partners Presentation at BlackHat (2009)
  3. Minding your P's and Q's (2012)

There are two pieces of the entropy problem in OpenStack and cloud computing that I'm interested in helping improve:

  1. Better initial seeds for the pseudo random number generator at instance initialization
  2. Better ongoing entropy gathering throughout the lifetime of the instance.
To the first point (better seeds), I suggested a series of technologies that could significantly improve the situation in OpenStack in the near term:
  1. The hypervisor could provide a random seed through a block device to the guest
  2. The hypervisor could expose a urandom device through the metadata service
    • Actually, I'm sitting next to Scott Moser right now, who attended my talk earlier today, and merely hours later he has already hacked this into the OpenStack metadata service :-)  His merge proposal is here.  This is why I love open source software...
  3. The user can pass their own locally generated seed to the instance through cloud-init and the userdata
  4. Additional seed data can be assembled through the aNerd protocol
    • There's lots more to say about this one...I'll have another post on this soon!
As for improving the ongoing entropy gathering...
  1. Eventually, a new wave of cloud servers with modern CPUs will have Intel's DRNG feature and leverage the new rdrand instruction
    • Unfortunately, we're probably a little ways off from that being widely available
    • Colin King has benchmarked it -- really impressive performance!
  2. KVM's new virtio-rng driver is pretty cool too, allowing a server to pass through access to a hardware random number generator
  3. HAVEGE simply rocks, and should be installed in every cloud instance
  4. Gazzang's zTrustee encryption key manager also supports a secure, authenticated entropy service (as a commercial offering from my employer)
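
If you're curious where a given instance stands, here's a quick sketch that reports the kernel's entropy estimate and whether the CPU advertises rdrand.  The function names are mine, and the fallbacks are there only so the script still runs on systems without these /proc files.

```shell
#!/bin/sh
# Report the kernel's entropy estimate and whether the CPU advertises rdrand.
# Both paths are standard on Linux; the fallbacks cover other systems.

entropy_avail() {
    f=/proc/sys/kernel/random/entropy_avail
    if [ -r "$f" ]; then cat "$f"; else echo unknown; fi
}

has_rdrand() {
    # The "flags" line of /proc/cpuinfo lists rdrand on CPUs that support it.
    grep -qw rdrand /proc/cpuinfo 2>/dev/null
}

echo "entropy_avail: $(entropy_avail)"
if has_rdrand; then
    echo "rdrand: advertised by CPU"
else
    echo "rdrand: not advertised"
fi
```

Running this right after first boot, and again after installing something like HAVEGE, is a simple way to see whether the ongoing gathering is actually helping.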

Enjoy!


:-Dustin

Thursday, September 20, 2012

Three of my favorite things...

...are space, Texas, and data security, and they're all right here!

The Space Shuttle Endeavour passed over the Gazzang world headquarters in Austin this morning.  The Statesman has the full article.  This image is pretty spectacular without my graffiti (but I couldn't resist)!


:-Dustin

Friday, September 14, 2012

Gazzang Secures Cloudera Distribution of Hadoop, MongoDB

Earlier this week, Gazzang and Cloudera publicly announced an official partnership, providing the Big Data industry with a first-of-its-kind offering ... a commercially supported encrypted Hadoop distribution!  Check out that spiffy Gazzang logo right there in the mix with our esteemed friends at Ubuntu and PuppetLabs!



We have certified Gazzang's eCryptfs-based transparent filesystem encryption solution, zNcrypt, running under the Cloudera Distribution of Hadoop (CDH), protecting all of the data at rest.

We at Gazzang are really quite proud of this milestone and our unique opportunity to really help secure enterprises leveraging Big Data.

It quite nicely aligns with our partnership with 10gen, where we have also certified zNcrypt protecting MongoDB NOSQL data.  Once again, we feel we're in quite good company with our colleagues at Canonical and Red Hat, working with 10gen.



Here's to building an ecosystem around security and privacy of this next generation of applications leveraging NOSQL and Big Data!

:-Dustin

Wednesday, September 5, 2012

A Food and Drink Guide to Nawlins

Knowing that I grew up in Louisiana, a colleague asked me recently for some restaurant recommendations, as she's heading to New Orleans soon.  I sent her this email privately, but realized in retrospect that I've been asked this question before, from time to time.

Also, how long overdue are we for an Ubuntu Developer Summit in New Orleans?  :-)

Breakfast

For that New Orleans ambiance, it's hard to beat Cafe du Monde, for chicory cafe au lait and beignets.  I'm definitely more of an espresso-based coffee drinker, but I make an occasional exception for a little chicory cafe.  And the beignets are to die for.  If you're more in the mood for a classic egg/toast/bacon/sausage breakfast, you can venture a bit further afield to the Camellia Grill (expect to wait a very long time, though, as the line is typically wrapped around the block).

Brunch

You'd be hard pressed to find a more ridiculously overwhelming all-you-can-eat pile of decadent deliciousness than the jazz brunch at the Court of Two Sisters.  It's quite possibly Kim's favorite thing to do in the entire world.  It's a little pricey, maybe $25 per person. But it's all you can eat of the best Cajun sweets and savories.  It's in the courtyard of a 200+ year old house, with a little jazz band that wanders around the courtyard serenading you at table side.  I'd call this a don't-miss dining experience in the world (much less New Orleans alone), if you're a true foodie at heart.

Lunch

There's a ton of good po-boy options all around the city -- they're pretty easy to sniff out.  Literally just follow your nose, smelling around for the best fried oysters or shrimp or catfish you can find, and you won't go wrong.  But I'd recommend seeking out a good muffaletta -- basically a giant Italian sandwich with olive tapenade and multiple cuts of meat.  My favorite is from Cafe Maspero.  That's a lot harder to find outside of New Orleans.  You can either order a half or get a whole one to split.  It's gigantic.

Siesta

I much prefer the mid-day crowd at Pat O'Brien's (compared to the craziness that ensues after dark).  Pat O'Brien was a bartender in New Orleans in the 1930s who supposedly invented the hurricane cocktail.  I'll actually have a refreshing mint julep, especially if it's even slightly warm outside in their lovely courtyard.

Dinner

So, so many dinner options...I could dedicate an entire blog to it.  You can't go wrong with any of the Brennan family restaurants.  All are very high-end, with world-renowned, expert chefs.  There are a few options in the French Quarter. My favorite is the Bourbon House, which in addition to having an amazing seafood and steak menu, also has a few hundred whiskey options and a fabulous cocktail bar.  There's also just the classic Brennan's, which is outstanding too, as well as the Commander's Palace.  If you do go to one of the nicer restaurants around town, maybe bring a dinner jacket and a nice dress -- a lot of these places are very French, very traditional, and very fancy, even if they're not terribly expensive.

Dessert

If you end up at either Brennan's or the Bourbon House, make sure you have a Bananas Foster.  It was actually invented at Brennan's in 1951.  I'm also a big fan of New Orleans style Bourbon bread pudding and pecan pie.  These are everywhere.

Night Cap

You can drink all the Huge Ass Beers and Handgrenades you want outside as you walk up and down Bourbon Street.  On the other hand, if you want to dip in for a couple of fancy cocktails, I'd recommend getting a Sazerac at the gorgeous Sazerac Bar (the world's first jazz nightclub) in the bottom of the world famous Roosevelt Hotel (now a Waldorf Astoria property). Another amazing, New Orleans original cocktail is the Ramos Gin Fizz at the Hotel Monteleone, which actually has a "carousel bar" -- if you're lucky enough to get a seat at the bar, you'll rotate 360 degrees around the bartenders while they serve your fellow patrons.

:-Dustin

Wednesday, August 29, 2012

The Linux Foundation's Cloud Open: Security and Privacy in the Cloud

Howdy all!

I just delivered my presentation at the Linux Foundation's CloudOpen 2012 event, and I'm happy to share my slides below.  You can also download the PDF.


I must say that this conference is simply one of the absolute best conferences around.  This year co-located several events, including LinuxCon, Linux Plumbers, as well as the new CloudOpen conference.  I've spoken at each of these in the past, and have always found the quality of the presentations, evening events, and hallway conversations to be the absolute best in the industry.

:-Dustin

Monday, August 13, 2012

Data encryption -- Why? Some numbers...


At last weekend's Texas Linux Fest, at the end of my presentation, Data Security and Privacy in the Cloud, an attendee asked a great question.  I'll paraphrase...
So...  What's the actual threat model?  Why are you insisting that people encrypt their data in the cloud?  Where's the risk?  When might unencrypted data get compromised?  Who is accessing that data?
A couple of weeks ago, an article from ComputerWorld made the front page of Slashdot:

'Wall of Shame' exposes 21M medical record breaches.  New rules under the Health Information Technology for Economic and Clinical Health Act.  August 7, 2012 06:00 AM ET


Here are a few absolutely astounding numbers from that article, which were pulled from the US Department of Health and Human Services Health Information Privacy website by the author of that article.


Since the data is publicly available, I was able to download and import all of these into a spreadsheet, run some numbers, and verify ComputerWorld's article.  I can confirm that Mr. Mearian's numbers are quite accurate, and just as scary.  Since September 2009:

  • 21+ million people have had their health care records exposed
  • 480 breaches have been reported

The top 6 breaches all affected more than 1 million individuals:

  • 4.9 million records: TRICARE Management Activity, the US Department of Defense's health care program, exposed 4.9 million health care records when backup tapes went missing
  • 1.9 million records: Health Net lost 1.9 million records when backup hard drives went missing
  • 1.7 million records: New York City Health & Hospitals Corporation's North Bronx Health Care Network reported the theft of 1.7 million records
  • 1.22 million records: AvMed Health Plans reported the loss of a laptop with 1.22 million patient records
  • 1.05 million records: Nemours Foundation (runs children's hospitals) lost 1.05 million records with missing backup tapes
  • 1.02 million records: Blue Cross Blue Shield of Tennessee exposed 1.02 million records with the loss of an external hard drive


Such breaches are very costly, too.

  • $4.3 million: Cignet Health of Prince George's County civil lawsuit penalty
  • $1.5 million: Blue Cross Blue Shield of Tennessee penalties
    • have since encrypted all of their hard drives, 885TB of data
  • $1.7 million: Alaska Department of Health penalty
    • due to theft of a thumb drive, stolen from an employee's car


Running a few more reports on the public CSV data,

  • Across 480 reported breaches, these were the top reasons given for the incident:
    • 55%: Theft of devices or physical media
    • 26%: Hacking/Unauthorized access
    • 12%: Lost devices, disks, tapes, drives, media
    • 5%: Improper disposal of devices
    • 3%: Other

    The most disappointing part, to me, is that 72% of those breaches stemmed from theft, lost devices, and improper disposal -- a total of 15.6 million individuals' health records. This means that the vast majority of these compromises are easily preventable, through the use of comprehensive data encryption. And I'd argue that many of the remaining 28% of the breaches attributed to hacking, unauthorized access, and other disclosures could also be thwarted, slowed, or deterred by coupling encryption with advanced key management, access controls, and regular auditing.
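
For anyone checking my math, the 72% figure is just the sum of the three physical-loss categories.  Here's the arithmetic as a tiny script; the variable names are mine, and the percentages are the rounded values from the list, which is why they total 101 rather than 100.

```shell
#!/bin/sh
# Percentages as listed above (rounded, so they sum to 101 rather than 100).
theft=55; hacking=26; lost=12; disposal=5; other=3

total=$((theft + hacking + lost + disposal + other))
preventable=$((theft + lost + disposal))
remaining=$((100 - preventable))

echo "sum of listed percentages: ${total}%"
echo "theft, loss, or improper disposal: ${preventable}%"
echo "everything else: ${remaining}%"
```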

    So here I am, writing the same thing I've been writing in this blog for 4 years now...
    1. Encrypt your data.
    2. Help your colleagues, friends, and families encrypt their data.
    3. Insist that your employers institute thorough security policies around encryption.
    4. Ask hard questions of your health care providers and financial services professionals, about the privacy of the data of yours they have. Hold them accountable.
    There's a wide range of tools available, from free/open source, to paid commercial offerings. On the free/open source side, I'm a proponent, author, and maintainer of both eCryptfs and overlayroot (which uses dmcrypt). These can help protect your home directory and your private data in cloud instances.


    And from the commercial side, my employer, Gazzang, sells an enterprise-class encryption product called zNcrypt, and I've architected Gazzang's cloud-compatible key management system, zTrustee. I have no doubt that the combination of these two technologies -- comprehensive data encryption and a robust key management solution -- could have prevented the compromise of millions of these records.

    :-Dustin

    Monday, August 6, 2012

    ecryptfs-utils-100 released

    Most of the original IBM LTC Security Team that designed and implemented eCryptfs, 2005-2008, along with a couple of Gazzangers who have also contributed to eCryptfs.  Gazzang hosted a small reception on Thursday, August 2, 2012.
    I'm pleased to announce the 100th release of the ecryptfs-utils userspace package!

    Somewhat unusually, eCryptfs userspace packages simply increment a single integral revision number, rather than a major or minor revision.  That project maintenance decision predates my involvement as project maintainer.  But it seems to work, at least for me :-)  I started maintaining the eCryptfs project and package at release 50, so this marks roughly my 50th release too.

    Grepping through the changelog, I counted 157 bugs fixed over the last 6 years.  I'd really like to recognize the contributors who have helped bring a stable and reliable eCryptfs to you:

    Apologies if I missed you...let me know and I'll add you in there ;-)


    Changelog follows.   Here's to another 100!

    ecryptfs-utils (100) precise; urgency=low
    
      [ Tyler Hicks ]
      * src/pam_ecryptfs/pam_ecryptfs.c, src/libecryptfs/key_management.c:
          LP: #1024476
        - fix regression introduced in ecryptfs-utils-99 when Encrypted
          Home/Private is in use and the eCryptfs kernel code is compiled as a
          module
        - drop check for kernel filename encryption support in pam_ecryptfs, as
          appropriate privileges to load the eCryptfs kernel module may not be
          available and filename encryption has been supported since 2.6.29
        - always add filename encryption key to the kernel keyring from pam_mount
    
      [ Colin King ]
      * tests/kernel/inode-race-stat/test.c:
        - limit number of forks based on fd limits
      * tests/kernel/enospc.sh, tests/kernel/enospc/test.c,
        tests/kernel/Makefile.am, tests/kernel/tests.rc:
        - add test case for ENOSPC
    
      [ Tim Harder ]
      * m4/ac_python_devel.m4: LP: #1029217
        - proplery save and restore CPPFLAGS and LIBS when python support is
          enabled
    
     -- Dustin Kirkland Thu, 02 Aug 2012 16:33:22 -0500
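
A bug count like the one mentioned before the changelog could be approximated by grepping for Launchpad bug references.  This is only a guess at a workable method, not necessarily how the 157 figure was produced; count_lp_bugs is a made-up name, and the sample text is copied from the release 100 entry above.

```shell
#!/bin/sh
# Count distinct "LP: #NNN" references in changelog text read from stdin.
count_lp_bugs() {
    grep -o 'LP: #[0-9][0-9]*' | sort -u | wc -l | tr -d ' '
}

sample='  * src/pam_ecryptfs/pam_ecryptfs.c, src/libecryptfs/key_management.c:
      LP: #1024476
  * m4/ac_python_devel.m4: LP: #1029217'

printf '%s\n' "$sample" | count_lp_bugs
```

Against a real source tree you would run count_lp_bugs < debian/changelog.  Note that entries listing several bugs after a single "LP:" prefix would be undercounted by this pattern.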
    

    Cheers,
    Dustin

    Saturday, August 4, 2012

    Texas Linux Fest 2012 Presentation

    As promised, here are the slides from my Texas Linux Fest 2012 presentation:







    Enjoy!
    :-Dustin

    Wednesday, August 1, 2012

    Introducing overlayroot -- overlayfs + dmcrypt!

    A beautiful live oak tree with exposed, overlaying roots, near my house in Austin, Texas
    I'm thrilled to introduce what I hope will be an unexpected, innovative, and useful surprise for Ubuntu 12.10!  Along with Scott Moser, I've been hard at work on a new binary package called overlayroot, which is part of the cloud-initramfs-tools source package and project.

    Background

    In a hallway conversation at UDS-Quantal in Oakland, CA in May 2012, I briefly described a concept to Kees Cook, Tyler Hicks, and Scott Moser...



    I look at Ubuntu Cloud AMI's much like Ubuntu Desktop LiveISO's...  The root filesystem of every instance that launches is basically "the same".  What's unique about the system -- the system's actual DNA -- all of that is generated after that first boot.
    Deoxyribonucleic acid.  Or so we're told.
    Log files, host-specific configuration files, variable state information, user home directories, 3rd party software, temporary files, and so on...  Ideally, all of that information would be encrypted before landing on a random disk in a cloud data center!

    About two weeks ago, I started working on implementing this, using two awesome Linux kernel technologies:


    1. overlayfs
      • noting that overlayfs is not yet upstream in the Linux kernel, though Ubuntu carries the patch set (thanks to Andy Whitcroft for his maintenance and upstream communications)
    2. dmcrypt
      • which has been upstream in the Linux kernel for quite some time


    A medieval crypt we visited outside of the Glasgow Cathedral in Scotland
    As a base, I started from some documentation and prototype code in the lengthy Ubuntu wiki article, aufsRootFileSystemOnUsbFlash.  There’s some interesting sample code embedded in that page, licensed under the GPLv3 and attributed to Axel Heider, Sebastian P., and Nicholas A. Schembri.  I’ve never met these individuals but I appreciate their efforts and I’m delighted to embrace, extend, package, and publish their work directly in the Ubuntu distribution!  I had started writing something similar from scratch before I found this.  The real magic for me was seeing the mount command called with the --move option.  Very cool!  That’s really what makes all of this work in the initramfs ;-)


    Shameless advertising.  She's gotta feed the monkey man.
    A special thanks to my employer, Gazzang, who have enabled me to work on this open source project, as it lines up well with our broad interests in securing information in the Cloud and Big Data deployments.  In particular, Gazzang’s zTrustee commercial key management service can be used to protect, secure, backup, revoke, and retrieve the keys that make this whole procedure work.  Contact us if you’re interested in that.

    And an even bigger thanks to Scott Moser for working right there with me on this somewhat ad hoc project.  Scott and I had worked together for years, across IBM and Canonical.  It’s been a lot of fun reconnecting with him, hacking on such an interesting, innovative, and broadly useful feature!

    Overlayfs

    Overlayfs is a successor of a series of union filesystem implementations, such as unionfs, aufs, etc.  It creates a single unified view of two directories.  One is the "lower" filesystem -- in our case, this is a read-only mount of our original, pristine Ubuntu AMI.  The second is the "upper" filesystem -- in our case, this is a read-write mount of an encrypted block device.  We’re hopeful that it might one day make it upstream into Linux, though progress on that seems to have stalled.  Thankfully, Ubuntu’s kernels are carrying it for now.
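
To make the "unified view" idea concrete, here's a toy simulation of the lookup order using two ordinary directories: a path is served from the upper directory if it exists there, and otherwise falls through to the lower one.  It needs no root access and no overlayfs support.  The union_cat function is purely illustrative; it mimics the precedence rule only, not the kernel implementation, and it ignores deletions entirely.

```shell
#!/bin/sh
# Simulate overlay lookup precedence with two ordinary directories:
# a path is served from "upper" if it exists there, otherwise from "lower".
# This imitates the merged view only; it is not how the kernel implements it.
union_cat() {
    upper="$1"; lower="$2"; path="$3"
    if [ -e "$upper/$path" ]; then
        cat "$upper/$path"
    elif [ -e "$lower/$path" ]; then
        cat "$lower/$path"
    else
        echo "union_cat: $path: not found" >&2
        return 1
    fi
}

demo="$(mktemp -d)"
mkdir -p "$demo/lower/etc" "$demo/upper/etc"
echo "pristine"  > "$demo/lower/etc/hostname"   # from the read-only image
echo "untouched" > "$demo/lower/etc/motd"
echo "modified"  > "$demo/upper/etc/hostname"   # a change made after boot

union_cat "$demo/upper" "$demo/lower" etc/hostname   # served from upper
union_cat "$demo/upper" "$demo/lower" etc/motd       # falls through to lower
# Remove the scratch directory with: rm -rf "$demo"
```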

    dmcrypt

    dmcrypt is a block device encryption scheme (different from eCryptfs, which implements a per-file encryption scheme).  Many of my blog readers know that I'm one of the authors and maintainers of eCryptfs, as well as one of its biggest fans :-)  That said, I recognize that dmcrypt solves a different problem, and I chose dmcrypt for this solution.  In this case, we'd like to encrypt the entire block device, ensuring that all reads and writes are happening to and from encrypted storage.  dmcrypt uses cryptsetup and LUKS to configure, format, and open a block device for encryption.  A virtual block device is "mapped" and presented in /dev/mapper/, and this is what will be mounted with read-write capability onto the overlayfs upper filesystem.
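
For a feel of what that involves when done by hand, here's a sketch of the cryptsetup and LUKS steps.  It is not overlayroot's actual code.  The run and setup_encrypted_upper helpers are made up for this example, and by default the script only prints the commands; setting DRY_RUN=0 would execute them, which requires root and destroys whatever is on the device.

```shell
#!/bin/sh
# Manual LUKS setup, printed rather than executed unless DRY_RUN=0.
# Running it for real requires root and DESTROYS existing data on $dev.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

setup_encrypted_upper() {
    dev="$1"      # backing block device, e.g. /dev/xvdb
    name="$2"     # mapper name; appears as /dev/mapper/$name
    mnt="$3"      # mount point for the decrypted filesystem
    run cryptsetup luksFormat "$dev"
    run cryptsetup luksOpen "$dev" "$name"
    run mkfs.ext4 "/dev/mapper/$name"
    run mount "/dev/mapper/$name" "$mnt"
}

setup_encrypted_upper /dev/xvdb secure /media/root-rw
```

The device, mapper name, and mount point are the ones used elsewhere in this post; substitute your own as needed.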

    You can read more at this excellent, detailed blog post about dmcrypt.

    A couple of ways dmcrypt can be used
    Installation

    As of August 1, 2012, the overlayroot package is now in the default Ubuntu cloud daily images for quantal, which will debut in the Ubuntu 12.10 Beta1 milestone.

    For older releases of Ubuntu, we’re also automatically publishing backports of the package to the PPA at ppa:cloud-initramfs-tools/ppa.  I’ve verified the backported functionality on at least Ubuntu 12.04 LTS (Precise).  You can add the PPA and install using:

    sudo apt-add-repository ppa:cloud-initramfs-tools/ppa
    sudo apt-get install -y overlayroot

    Configuration

    Once the package is installed, you need to edit the configuration file at /etc/overlayroot.conf.  The syntax, modes, and capabilities are thoroughly described in the inline documentation in the file itself, but I’ll briefly recap here...

    The file should define a single variable, overlayroot, which, by default is empty (meaning that the functionality is disabled).  Alternatively, you can also pass the same variable and string definition on the kernel command line.  The functionality actually works on physical hardware just as well as cloud instances.  There are basically 3 different modes that overlayroot supports:
    1. The backing device is tmpfs
      • eg, overlayroot="tmpfs"
      • changes on top of the root filesystem are stored in a temporary filesystem in memory
      • all changes are discarded on reboot
      • the amount of available disk space is limited to ½ of the total memory available on the system
      • the tmpfs is mounted onto /media/root-rw
    2. The backing device is block storage
      • eg, overlayroot="dev=/dev/xvdb"
      • changes on top of the root filesystem are stored on a separate block device, typically /dev/xvdb in AWS
      • these changes are not encrypted by default
      • changes are preserved across reboots
      • the system root filesystem will have as much storage as available on the block storage device
      • the block device is mounted onto /media/root-rw
    3. The backing device is encrypted
      • eg, overlayroot="crypt:dev=/dev/xvdb"
      • changes on top of the root filesystem are stored in an encrypted block device
      • typically, /dev/xvdb is the backing device, which is mapped and encrypted using cryptsetup and LUKS
      • /dev/mapper/secure is the encrypted block device
      • /dev/mapper/secure is mounted onto /media/root-rw
      • reboots are unfortunately not supported by default, unless you hardcode the dmcrypt mount passphrase in the initramfs
        • See below for more information about that
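
The three modes boil down to three one-line settings.  As a small sketch, here's a helper that prints the right line for each; the function name and the mode keywords (tmpfs, block, crypt, off) are invented for this example, and /dev/xvdb is just the typical AWS device.

```shell
#!/bin/sh
# Print the overlayroot= line for one of the three modes described above.
# The function name and mode keywords are made up for this example.
overlayroot_setting() {
    mode="$1"; dev="${2:-/dev/xvdb}"
    case "$mode" in
        tmpfs) echo 'overlayroot="tmpfs"' ;;
        block) echo "overlayroot=\"dev=$dev\"" ;;
        crypt) echo "overlayroot=\"crypt:dev=$dev\"" ;;
        off)   echo 'overlayroot=""' ;;
        *)     echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}

overlayroot_setting tmpfs
overlayroot_setting block /dev/xvdb
overlayroot_setting crypt /dev/xvdb
```

Whichever line you pick is what belongs in /etc/overlayroot.conf (or on the kernel command line).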

    All three of these have the tremendous advantage of keeping your root disk pristine and your changes on a separate partition, for auditability and rollback purposes.  Also, there’s a really nice side effect of all of this -- whereas typically you have a root volume with only a couple of GB of space, if your attached EBS storage backing your overlayroot is 100GB, your root partition now has 100GB of available space.  This is far more useful to me than having to symlink and dance around all of that juicy storage inconveniently located over in /mnt!

    There are several other options documented inline in the configuration file.  I’ll leave those for future posts and as an exercise for the reader ;-)  The three simple configuration examples above, however, should get you up and running trivially in AWS and OpenStack clouds with Ubuntu 12.10 (Quantal) images.

    Verification

    You might want to verify that the encryption is operating as expected.  You should see a mount table that looks like this:

    ubuntu@ip-10-140-27-223:~$ mount
    overlayroot on / type overlayfs (rw,lowerdir=/media/root-ro/,upperdir=/media/root-rw/overlay)
    proc on /proc type proc (rw,noexec,nosuid,nodev)
    sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
    udev on /dev type devtmpfs (rw,mode=0755)
    devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=0620)
    tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
    /dev/xvda1 on /media/root-ro type ext4 (ro)
    tmpfs-root on /media/root-rw type tmpfs (rw,relatime)
    none on /sys/fs/fuse/connections type fusectl (rw)
    none on /sys/kernel/debug type debugfs (rw)
    none on /sys/kernel/security type securityfs (rw)
    none on /run/lock type tmpfs (rw,noexec,nosuid,nodev,size=5242880)
    none on /run/shm type tmpfs (rw,nosuid,nodev)
    /dev/xvdb on /media/root-ro/mnt type ext3 (ro)
    /media/root-ro/mnt on /mnt type overlayfs (rw,lowerdir=/media/root-ro/mnt,upperdir=/media/root-rw/overlay/mnt)


    Note that your original root volume is mounted on /media/root-ro:

    /dev/xvda1 on /media/root-ro type ext4 (ro)

    And your root disk is, in fact, an overlayfs on top of /media/root-ro/ and /media/root-rw/overlay.

    overlayroot on / type overlayfs (rw,lowerdir=/media/root-ro/,upperdir=/media/root-rw/overlay)



    If you’re using tmpfs, you should see:

    tmpfs-root on /media/root-rw type tmpfs (rw,relatime)



    If you’re using unencrypted block storage, you should see:

    /dev/xvdb on /media/root-rw type ext4 (rw,relatime,data=ordered)


    And if you’re using encrypted block storage, you should see:

    /dev/mapper/secure on /media/root-rw type ext4 (rw,relatime,data=ordered)
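
    If you'd rather not eyeball the mount table, the three checks above can be scripted.  Here is a rough sketch, assuming mount output in the format shown above; backing_type is a hypothetical function name of my own, and it only looks at whatever is mounted on /media/root-rw.

```shell
# Hypothetical helper: read mount-style output on stdin and report what
# backs /media/root-rw (tmpfs, encrypted, block, or none).
backing_type() {
    awk '$2 == "on" && $3 == "/media/root-rw" {
        if ($1 == "tmpfs-root") print "tmpfs";
        else if ($1 ~ /^\/dev\/mapper\//) print "encrypted";
        else print "block";
        found = 1
    }
    END { if (!found) print "none" }'
}

# Example using a line from the post; prints: encrypted
printf '%s\n' '/dev/mapper/secure on /media/root-rw type ext4 (rw,relatime,data=ordered)' | backing_type
```

    On a live instance, you would pipe the real mount table in, as in `mount | backing_type`.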


    To verify the dmcrypt setup of your block storage, you can run:

    $ sudo cryptsetup luksDump /dev/xvdb
    LUKS header information for /dev/xvdb

    Version:        1
    Cipher name:    aes
    Cipher mode:    cbc-essiv:sha256
    Hash spec:      sha1
    Payload offset: 4096
    MK bits:        256
    MK digest:      bc be 66 cd 87 3d 33 6c 2c 99 72 00 b6 d2 be b0 69 6a 76 39
    MK salt:        42 14 8d da 89 15 e4 66 a0 84 6e 6a 7f bc 3e 34
                   da a7 44 3d 6b 80 bb 6f 5b 44 77 4a 9c b0 91 4c
    MK iterations:  27625
    UUID:           245e08b9-4458-4e37-a029-68ebd5f65eca

    Key Slot 0: ENABLED
           Iterations:             110583
           Salt:                   a8 fe 2a ec cd 0c 23 7d 13 ef 91 aa 05 88 24 9b
                                   2e a1 cd 54 54 96 0d 0b ce 88 aa cc 72 a8 d4 63
           Key material offset:    8
           AF stripes:             4000
    Key Slot 1: DISABLED
    Key Slot 2: DISABLED
    Key Slot 3: DISABLED
    Key Slot 4: DISABLED
    Key Slot 5: DISABLED
    Key Slot 6: DISABLED
    Key Slot 7: DISABLED
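
    In the dump above, exactly one key slot is enabled, which is what you'd expect when a single wrapping passphrase protects the volume.  A small sketch for counting enabled slots from luksDump output in this format follows; enabled_slots is a hypothetical name, and the pattern assumes the "Key Slot N: ENABLED" layout shown above.

```shell
# Hypothetical helper: count enabled key slots in luksDump output on stdin.
# Assumes the "Key Slot N: ENABLED" line format shown above.
enabled_slots() {
    grep -c '^[[:space:]]*Key Slot [0-7]: ENABLED' || true
}

# Example with two sample lines; prints: 1
printf '%s\n' 'Key Slot 0: ENABLED' 'Key Slot 1: DISABLED' | enabled_slots

# On a real instance (requires root and the device):
#   sudo cryptsetup luksDump /dev/xvdb | enabled_slots
```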

    And if you’re really paranoid, you can run `sudo strings /dev/xvdb` on the block device and look for anything recognizable as clear text data ;-)

    $ sudo strings /dev/xvdb
    LUKS cbc-essiv:sha256 sha1 \9P< kl2d44f1f1-ccff-4fc7-857c-35519faa9158 Ow\rn {(\T UK9Y> 8Ybk < 'Yb* $i"/5X BIS8
    ...
    The Matrix

    Encryption from Instance Genesis

    So the above example requires a first boot, some configuration, and then a reboot, which means that some of the data generated on first boot (like the host ssh keys) actually landed on the read-only root, before overlayroot was able to do its work.

    Michelangelo's Creation of Adam, a Genesis event 
    Well, ideally we’d capture all of those changes in the overlay as well.  Doing that is a little harder than it sounds.  We need to be able to feed that configuration information into a brand new instance at boot time, during the initramfs.  We can’t use user-data, cloud-init, or the metadata service, because networking is not yet up and running in the instance.

    Scott Moser came up with a rather inspired workaround :-)  He created a “minimal”, 1GB snapshot of a volume that has a filesystem on it, and a very basic overlayroot.conf configuration file.  If such a volume is attached to an instance, and has a volume label of OROOTCFG, the overlayroot hooks in the initramfs will load and use the overlayroot.conf on that device.  You can simply create a new instance, attaching a volume based on this snapshot!  From the command line, you’d need something like this:

    euca-run-instances --key=ec2-keypair --instance-type=m1.small  --block-device-mapping=/dev/sdc=snap-67d20b17:10:0 ami-e300a88a

    The “:10” says that your volume should be 10GB.  And note that the “:0” denotes whether to delete the volume on termination or not (0=false/do-not-delete, 1=true/delete).
    Note that that snapshot was a test we were using and has since been deleted. I'm hoping Gazzang will be willing to host a couple of these snapshots for general usage. I'll update this post when and if that happens!
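
    To make the shape of that --block-device-mapping argument explicit, here's a tiny sketch that assembles it from its four parts.  The block_device_mapping function is purely illustrative (my own name, not a euca2ools command), and the values are the ones from the example above.

```shell
# Illustrative helper: build the value for --block-device-mapping.
#   $1 = device name in the instance
#   $2 = snapshot id
#   $3 = volume size in GB
#   $4 = delete on termination (0 = do not delete, 1 = delete)
block_device_mapping() {
    printf '%s=%s:%s:%s\n' "$1" "$2" "$3" "$4"
}

block_device_mapping /dev/sdc snap-67d20b17 10 0   # prints: /dev/sdc=snap-67d20b17:10:0
```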

    And from the GUI, it looks like this:

    AWS Console, launching an image with an EBS snapshot attached


    Entropy and Keys

    Using mode 3 as described above, you can easily ensure that any runtime data and changes to the stock system image are encrypted before being written to disk.  This raises the question, of course, of the keys used to handle this encryption.  I’ll try to explain the process, complexity, and potential solutions now.

    Encryption of any kind -- be it disk encryption, file encryption, network transport layer encryption, etc. -- always requires high quality keys.  Generating high quality keys requires high quality entropy.  And high quality encryption keys always require “key management”.  I have more information on that in another post, including talks I’m giving on that topic at two conferences in August 2012.


    Keys in the Cloud -- not an easy problem to solve

    A user of overlayroot with encryption can, at their option, specify the dmcrypt mount passphrase itself in the overlayroot.conf file, using the pass=$PASS option.  This has two distinct advantages, and one significant disadvantage.  The advantages are that the user can control the quality/length of the passphrase, and that this passphrase can be used to mount the encrypted volume repeatedly, at every boot, ensuring that the persistent encrypted data is available each time the system reboots.  The disadvantage is that the passphrase is plainly visible inside of the initramfs or in your /boot on the base partition, rendering your encryption defenseless if your attacker has access to both your read-only root volume and your encrypted root volume.  For some use cases, this might be a reasonable trade-off, though for others, it may not.
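
    If you do go the pass=$PASS route, you'll want a passphrase that isn't guessable.  Here's a rough sketch that generates one and prints a candidate configuration line.  One assumption to flag loudly: I'm showing pass= appended to the crypt: value after a comma, so please check the inline documentation in your overlayroot.conf for the exact syntax before relying on it.

```shell
# Generate a 64-character hex passphrase from 32 bytes of /dev/urandom.
PASS=$(head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n')

# ASSUMPTION: options after "crypt:" are comma-separated key=value pairs.
# Verify against the comments in your own overlayroot.conf.
CONF_LINE="overlayroot=\"crypt:dev=/dev/xvdb,pass=${PASS}\""
echo "$CONF_LINE"
```

    Remember the trade-off described above: anyone who can read this line from the initramfs or /boot can unlock the volume.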

    In the default case, however, where the pass=$PASS option is not specified, the encryption mount passphrase is automatically generated at boot time.  This presents its own set of challenges for the security-minded, as computers have painfully little entropy at initial boot time, as shown in this USENIX paper.  The problem is exacerbated in cloud computing environments, where these “computers” are actually virtual machines, which, by design, have been cloned to appear quite similar to one another.
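
    If you're curious what your own instance thinks about its entropy, the kernel exposes its current estimate under /proc.  A minimal sketch (entropy_avail as a function name is my own, and how the kernel computes this estimate varies between kernel versions):

```shell
# Print the kernel's current entropy estimate, in bits, or "unknown" if the
# file is not available on this system.
entropy_avail() {
    if [ -r /proc/sys/kernel/random/entropy_avail ]; then
        cat /proc/sys/kernel/random/entropy_avail
    else
        echo "unknown"
    fi
}

entropy_avail
```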

    Knowing that this is an issue causing some consternation for security professionals, I’ll describe in detail here how our overlayroot encryption passphrase is generated in the initramfs, and invite your review, feedback, and suggestions.

    Key Generation Design

    An automatically generated overlayroot dmcrypt passphrase is a sha512sum string, which consists of 128 characters of hex [0-9a-f].  Since we’re using cryptsetup with LUKS, this forms a 512-bit wrapping passphrase.  LUKS and cryptsetup will then generate another random 256-bit volume master key.  The wrapping key can be changed easily without re-encrypting all of the data on the device, whereas the volume key cannot.  A compromise of either of these keys will compromise your encrypted data.
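
    The arithmetic there is easy to check for yourself: each hex character carries 4 bits, so 128 of them make 512 bits.  A quick sketch, using a fixed example string as input (which is obviously not real entropy):

```shell
# Hash a fixed example string; the input here is NOT real entropy.
passphrase=$(printf 'example input, not real entropy' | sha512sum | cut -d' ' -f1)

hex_chars=${#passphrase}      # length of the hex digest
bits=$((hex_chars * 4))       # 4 bits per hex character

echo "$hex_chars hex characters = $bits bits"   # prints: 128 hex characters = 512 bits
```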

    Random bloke cutting keys
    To generate this key, we need to provide the sha512sum utility (which we’ve added to the initramfs) at least 512 bits of random data.  Our current implementation does the following:


    1. We perform a best-effort seeding of /dev/urandom with some pseudo-random data, from two places, if available.  First, we use /random-seed, if found in the initramfs itself.  And second, we use /var/lib/urandom/random-seed, if found in the root image.  The initramfs /random-seed file is created by our overlayroot’s initramfs hooks script, which tries to read up to 4096 bytes of data from /dev/random (true random data).  The root filesystem’s /var/lib/urandom/random-seed file is populated on each system reboot, to carry some randomness across boots.  Neither of these is definitive, but each adds some spice to the pool of pseudo-random data multiplexed at /dev/urandom.
    2. Then we seed our input data with the output of `stat -L /dev/* /proc/* /sys/*` in the initramfs.  This adds entropy to our input due to the mostly unpredictable combinations of timestamps associated with the Access/Modify/Change times of each file, directory, and block device in the top level of these kernel filesystems.
    3. Next, we concatenate data from each of 3 places:
      • /proc/sys/kernel/random/boot_id
        • a 16-byte uuid, pseudo-random, “unique” per boot
      • /proc/sys/kernel/random/uuid
        • a 16-byte uuid, pseudo-random, “unique” every time it’s read
      • /dev/urandom
        • 4096 bytes of pseudo-randomness, which is jumbled by our seeding as described above
    4. Finally, we read and concatenate the first 4096 bytes from the specified backing block device
      • /dev/xvdb (presumably, in AWS)
        • Here, a sophisticated or paranoid user could write up to 4KB of their own quality random data to the block device from the same or even a different instance
        • But by default, a fresh block device from Amazon contains a filesystem, which actually has its own somewhat unique metadata written to the head of the block device
    5. Once we’ve gathered all of that data, we calculate the sha512sum of the total, and write it to a root-owned file stored in a tmpfs.  Once booted, the root user can recover this value from /dev/.initramfs/overlayroot.XXXXXX.  This string is used by overlayroot and fed to cryptsetup’s --key-file parameter.
    6. Note that cryptsetup also generates the master volume key.  We are using the default cryptsetup option, which generates a pseudo-random key from /dev/urandom.  We have tried to help that along by seeding /dev/urandom with some additional input.
      • Note that we tried using the cryptsetup --use-random option, but our test instances hung indefinitely, blocking on /dev/random.  Sorry.
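
    Steps 1 through 5 above can be approximated in a few lines of shell.  To be clear, this is my own sketch of the process as described, not the actual overlayroot initramfs script: the function names are hypothetical, the order of concatenation is a guess, and the backing device is passed as an argument so that you can experiment with a plain file instead of a real block device.

```shell
# Step 1 (sketch): best-effort seeding of /dev/urandom from seed files, if present.
seed_urandom() {
    for seed in /random-seed /var/lib/urandom/random-seed; do
        if [ -r "$seed" ]; then
            { cat "$seed" > /dev/urandom; } 2>/dev/null
        fi
    done
    return 0
}

# Steps 2-5 (sketch): gather the inputs and hash them with sha512sum.
# $1 is the backing device; a regular file works for experimentation.
generate_passphrase() {
    {
        stat -L /dev/* /proc/* /sys/* 2>/dev/null          # step 2: timestamps
        cat /proc/sys/kernel/random/boot_id 2>/dev/null    # step 3: per-boot uuid
        cat /proc/sys/kernel/random/uuid 2>/dev/null       # step 3: per-read uuid
        head -c 4096 /dev/urandom                          # step 3: 4096 bytes
        head -c 4096 "$1" 2>/dev/null                      # step 4: head of backing device
    } | sha512sum | cut -d' ' -f1                          # step 5: 128 hex characters
}

# Demo with a temporary file standing in for /dev/xvdb.
backing=$(mktemp)
head -c 4096 /dev/urandom > "$backing"
seed_urandom
generate_passphrase "$backing"
rm -f "$backing"
```

    In the real implementation, the resulting string is written to a root-owned file in a tmpfs and handed to cryptsetup via --key-file, as described in step 5, rather than printed to the terminal.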

    In Conclusion

    If you're still with me here, you should really be proud of yourself :-) This is a bit of a heavy topic, but oh-so-important! We're entrusting more and more of our digital lives, corporate information, trade secrets, cyber property, and critical data to remote storage scattered about the globe.

    How can you possibly know who has physical access to those bits? The sobering reality is that, frankly...you can't.

    I continue to advocate in favor of intelligent cryptographic solutions to this problem, and with overlayroot, you now have a very powerful tool at your disposal, to easily encrypt your information in Ubuntu cloud instances. This covers automatically generated system exhaust spewed to /var, as well as your precious configuration files, user data, and service information.

    :-Dustin