Friday, January 23, 2009

Daemon Challenge 3: We have a winner!

I'm pleased to announce that Robert Escriva is efficient, and the winner of Daemon Challenge 3. I shipped him my final copy of Daemon by Daniel Suarez.

There were 4 other successful submissions. In order of submission:
  1. Robert Escriva
  2. Dave Walker
  3. Neil Shepperd
  4. David Langton
  5. Chris Oattes
My sincere congratulations to all of these gentlemen!

I'll describe the solution below. Both Robert and Dave will be posting their own description of the solution to their blogs. Make sure you check out their explanations as well!

The Solution

For this challenge, I gave you 3 files, an encrypted challenge_3.txt, an encrypted wrapped_passphrase, and a snippet of an md5 shadow entry. I also told you that the wrapping passphrase was 4 alphanumeric characters long. Your goal was to decrypt the contents of challenge_3.txt and solve the riddle inside.

The wrapping passphrase in Challenge 3 is one character longer than the wrapping passphrase in Challenge 2. In the solution of Challenge 2, we roughly estimated that it would take ~40 days to crack a 4-character passphrase.

However, that was using ecryptfs-unwrap-passphrase to do the testing.

In Challenge 3, you have one more extremely valuable piece of information...you have the md5sum of the wrapping passphrase.

Now this is where I made a big goof... In the original post, I mistakenly published the md5sum of the passphrase PLUS the trailing "\n" character. I had to issue an update to the challenge (see the big red text in the Challenge 3 text).

This is an important lesson, though. Be very careful with trailing slashes and hashing algorithms. That one, mostly-invisible character yields very different results. This has bitten many people many times.

There is a bit of synchronous irony in that I made this mistake within about 10 minutes of the Obama/Roberts presidential oathe gaffe :-) We all make mistakes!

So once you had the correct shadow file, you could attack this problem in one of 3 ways:
  1. Using ecryptfs-unwrap-passphrase, trying all 62^4 (14776336 unique passphrases), we estimated last time that this would take ~40 days. If you could split this among enough CPU's, you may be able to reduce this down to a day or two
  2. Using john-the-ripper
  3. Building a table of the md5sum's of all 14776336 passphrases
You can read the documentation on john, if you like, as it is an interesting tool. But for purposes of this solution, I'm going to demonstrate approach (3).

So let's generate that map with a script like this:

#!/bin/sh
# generate_passphrases.sh
CHARS="a b c d e f g h i \
j k l m n o p q r s t u v \
w x y z A B C D E F G H I \
J K L M N O P Q R S T U V \
W X Y Z 0 1 2 3 4 5 6 7 8 9"
for i in $CHARS; do
for j in $CHARS; do
for k in $CHARS; do
for l in $CHARS; do
md5=`echo -n "$i$j$k$l" | md5sum`
echo "$i$j$k$l $md5"
done
done
done
done
My laptop can crank out about 7000 of these per second. So this should only take about 35 minutes to generate the complete table. You could also add an if-statement in there to spit out the matching passphrase, testing against the md5sum I gave you in the shadow file. And, again note that you could easily split this search across multiple CPUs for parallel scaling.

Using this script, you should easily be able to determine that the wrapping passphrase is "GaM3". Looking back, you'll notice that the username I gave in the shadow file is "sobol". In the book, Matthew Sobol is the dead, game developer who designed the Daemon, which is wreaking havoc on the world in the story.

So using "GaM3", you can unwrap the wrapping-passphrase and perform the eCryptfs mount, as described in the solution to Challenge 2.

Doing so, you will be able to view the riddle:
Start with the sha512sum of the following seed:
$ echo "Daemon" | sha512sum
Write the computed sum to the first line of a new file.
sha512sum that new file.
Write the resulting sum to the second line of the new file.
sha512sum that new file.
Write the resulting sum to the third line of the new file.
...
Continue doing this until you have exactly 1,000,000 lines in the file.
Hint: the first two lines of this file should be:
b48400ad0b48c66d6f7d538cc6aff0dab594dc78721059cae20d68322363924d6c5c6894d3eaee5c90b975188cbe0e2c2ea09d209a3dd263c425e84d0fbfd2be
c5f5f35f7661849a6a9ddaa6f8951c1bad7ea05777b15967a7f64248635a1e48153715182a95c125caf5d762417fcb3df05e6e6f279a65a860877b7def856207
...
Take the last 6 digits of the 1,000,000th line, and the first six digits of
line 961,325.
Take the difference of these two numbers.
Convert that difference from hex to decimal.
Who's number is this?
Alright! So now you just need to write a simple shell script, right? Perhaps something like:

#!/bin/sh

sha=`echo "Daemon" | sha512sum - | awk '{print $1}'`
echo $sha > /tmp/sums
i=0
while [ $i -lt 1000000 ]; do
sha=`sha512sum /tmp/sums | awk '{print $1}'`
echo $sha >> /tmp/sums
i=`expr $i + 1`
done
Nice, now watch your sums grow. It will crank through $i pretty quickly...at first. But note that with every run /tmp/sums is getting bigger. Every run is harder than the previous. Eeeek!

Let's see how long the 1,000,000th run will take... A single sha512sum is 128 characters, plus a "\n", so we need a file that is 129,000,000 bytes long.

$ dd if=/dev/zero of=/tmp/foo bs=1000 count=129000
s=1000 count=129000
129000+0 records in
129000+0 records out
129000000 bytes (129 MB) copied, 0.327751 s, 394 MB/s

And let's time how long it takes to sha512sum that entire file:

$ time sha512sum /tmp/foo
d8b81aa1485ad47faa38a9f6595556be9edcade0c3a077226773200ae91e9c71e897ef42b9ddb68e3d401ea079b5c5a6961bb50bf611f27832be8398aae3f898 /tmp/foo
real 0m0.817s

Nearly a second! That's going to add up. The early sums are going to take less time, and the later ones will take more. We can roughly estimate that average run will take about half of the longest, and we can multiply that average by the 1,000,000 times we need to do it. That's somewhere around 4 days on my laptop.

The way this problem is designed, this would be very difficult to parallelize. Every subsequent calculation builds on the previous, so we cannot split it up very easily.

However, every subsequent calculation builds on the previous, and this is something we can very much use to our advantage!

In the shell script above, each call to sha512sum starts over from scratch. This is completely wasteful!

And now we're back to the title of the challenge, How efficient are you?

Every time we perform the calculation, we just have a small new delta, a little bit of data added to the end. I would have been far more cruel, had I asked you to prepend the data, but in appending it, I gave you a tremendous chance for optimization!

So with each run, you can build on the work of the previous run, rather than starting over from scratch. You can take advantage improve both your CPU and memory utilization.

Here's a small Python script from Kees Cook, who, by the way, solved each of these challenges ahead of their release, as my 1-man QA team :-)

#!/usr/bin/python
import hashlib, os, sys

m = hashlib.sha512()
m.update("Daemon\n")
digest = m.hexdigest() + "\n"

m = hashlib.sha512()

out = file('growing.sums','w')
top = 1000000
for i in range(0,top):
out.write(digest)
m.update(digest)
digest = m.hexdigest() + "\n"
if (i % 100) == 0:
print "%d %0.1f\r" % (i, i * 100 / top),
sys.stdout.flush()
out.close()
In less than 30 seconds, this script will generate precisely the same output that the shell script above will produce 4 days later! Amazing, huh!

I know that both Robert and Dave found their solutions in a similar manner. Dave wrote his script in Perl. I'm looking forward to reading their solutions on their blogs!
With your output of all 1,000,000 sums, the two lines you're looking for are:

561093903e1a03a427bf6f9a1b6f34201d5984af7840287ebace7c2f1c0db55af9c201d9d4a6b18ab4268f28c80d2c0d4f346cf33216cc81f7c39b88d4d7dbef
and
9361cc38a6ea9f24ce5d83767e37275a8c27013be736d56f8e28f7792ed922b75f34920f02ab47da9c0b1ee79f7cf8d24418ef6b294e307227dad97e7dda7080

Strip out the specified portion and calculate:
561093-da7080 = 845FED
Convert that to decimal, and you get: 8675309.

This number happens to be a rather large prime number, and a twin-prime at that! Prime numbers are the basis for public-key cryptography.

As to "who's number is this?", well, perhaps that takes a bit of knowledge about pop culture. If you enunciate each digit, you should hear it pretty clearly...

Eight - Six - Seven - Five - Three - Oh - Niiiiieeeeeiiiiiine

And if you still don't get it, Google and Wikipedia will promptly point you to Tommy Tutone's hit song of 1982, "Jenny".

Thanks so much to everyone who participated in these challenges! It was a lot of work for me to put together, but a lot of fun, too. I will try to do something like this again, perhaps after the Jaunty release. I'm working on filename encryption in eCryptfs now, that might give me a whole new approach to challenges ;-)

Finally, let me once again thank Kees Cook for all of the time and CPU cycles he spent helping me refine these problems. Kees is one in a million!

And also, thanks to Daniel Suarez and his publishers for donating the hardback copies of Daemon. These guys have been very appreciative of grass roots promotion in the blogosphere.

Daniel emailed me back in September (after my initial blog post reviewing the book). He said:
...
I sincerely appreciate the kind words. Daemon (and the upcoming sequel) are labors of love to me, and knowing that they strike a chord with folks in the open-source community makes my day.
...
Daniel Suarez (aka Leinad Zeraus)

Cheers to that Open Source Community!
:-Dustin

2 comments:

  1. Thanks for the challenge!

    I'm surprised it took you 40 minutes to get the password - my RPython program only took 4 minutes! It probably also helped that my password generator eliminated those without an uppercase, lowercase and numeric letter (a bit of chancy second-guessing on my part), cutting the search space down by 2/3.

    But that didn't make much difference anyway, because I spent ages trying to ecrypt-unwrap-passphrase it first - which actually overheated my computer and crashed it ;)

    Looking forward to a new challenge, if one ever happens!

    ReplyDelete
  2. Using google to solve that MD5 thing is so much more effective.

    ReplyDelete

Please do not use blog comments for support requests! Blog comments do not scale well to this effect.

Instead, please use Launchpad for Bugs and StackExchange for Questions.
* bugs.launchpad.net
* stackexchange.com

Thanks,
:-Dustin