From the Canyon Edge -- :-Dustin
Showing posts with label run-one.

Tuesday, September 10, 2013

Introducing run-one-constantly, run-one-until-failure and run-one-until-success


Necessity is truly the mother of invention.  I was working from the Isle of Man recently, and really, really enjoyed my stay!  There's no better description for the Isle of Man than "quaint":
quaint /kwānt/
adjective: 1. attractively unusual or old-fashioned.
"quaint country cottages"
synonyms: picturesque, charming, sweet, attractive, old-fashioned, old-world
Though that description applies to the Internet connectivity, as well :-)  Truth be told, most hotel WiFi is pretty bad.  But nestle a lovely little old hotel on a forgotten little Viking/Celtic island and you will really see the problem exacerbated.

I worked around most of my downstream issues with a couple of new extensions to the run-one project, and I'm delighted as always to share these with you in Ubuntu's package!

As a reminder, the run-one package already provides:
  • run-one COMMAND [ARGS]
    • This is a wrapper script that runs no more than one unique instance of some command with a unique set of arguments.
    • This is often useful with cronjobs, when you want no more than one copy running at a time.
  • run-this-one COMMAND [ARGS]
    • This is exactly like run-one, except that it will use pgrep and kill to find and kill any running processes owned by the user that match the target command and arguments.
    • Note that run-this-one will block while trying to kill matching processes, until all matching processes are dead.
    • This is often useful when you want to kill any previous copies of the process you want to run (like VPN, SSL, and SSH tunnels).
  • keep-one-running COMMAND [ARGS]
    • This command operates exactly like run-one except that it respawns the command with its arguments if it exits for any reason (zero or non-zero).
    • This is useful when you want to ensure that you always have a copy of a command or process running, in case it dies or exits for any reason.
Newly added, you can now:
  • run-one-constantly COMMAND [ARGS]
    • This is simply an alias for keep-one-running.
    • I've never liked the fact that this command started with "keep-" instead of "run-one-", from a namespace and discoverability perspective.
  • run-one-until-success COMMAND [ARGS]
    • This command operates exactly like run-one-constantly except that it respawns "COMMAND [ARGS]" until COMMAND exits successfully (i.e., exits zero).
    • This is useful when downloading something, perhaps using wget --continue or rsync, over a crappy quaint hotel WiFi connection (see the example just after this list).
  • run-one-until-failure COMMAND [ARGS]
    • This command operates exactly like run-one-constantly except that it respawns "COMMAND [ARGS]" until COMMAND exits with failure (i.e., exits non-zero).
    • This is useful when you want to run something until something goes wrong.
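
For example, here's how I nursed a large download over that quaint hotel WiFi; run-one-until-success relaunches wget until it finally exits zero (the URL is hypothetical, just for illustration):

run-one-until-success wget --continue http://example.com/saucy.iso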
I am occasionally asked about the difference between these tools and the nohup command...
  1. First, the "one" part of run-one-constantly is important, in that it uses run-one to protect you from running more than one instance of the specified command. This is handy for something like an ssh tunnel, of which you only really want/need one.
  2. Second, nohup doesn't rerun the specified command if it exits cleanly or is forcibly killed; nohup only ignores the hangup signal.
So you might say that the run-one tools are a bit more resilient than nohup.
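
To make that concrete, compare these two invocations (example.com is a placeholder):

nohup ssh -N -L 7778:localhost:7778 example.com &
run-one-constantly ssh -N -L 7778:localhost:7778 example.com &

The nohup version survives a terminal hangup, but dies for good if the connection drops, and nothing stops you from accidentally starting five of them.  The run-one-constantly version respawns the tunnel whenever it exits, and a second identical invocation quietly exits rather than creating a duplicate.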

You can use all of these as of Ubuntu 13.10 (Saucy), by simply:

sudo apt-get install run-one

Or, for older Ubuntu releases:

sudo apt-add-repository ppa:run-one/ppa
sudo apt-get update
sudo apt-get install run-one

I was also asked about the difference between these tools and upstart...

Upstart is Ubuntu's event driven replacement for sysvinit.  It's typically used to start daemons and other scripts, utilities, and "jobs" at boot time.  It has a really cool feature/command/option called respawn, which can be used to provide a very similar effect as run-one-constantly.  In fact, I've used respawn in several of the upstart jobs I've written for the Ubuntu server, so I'm happy to credit upstart's respawn for the idea.
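
For reference, a minimal upstart job using respawn might look something like this (a hypothetical /etc/init/mytunnel.conf, not one of the jobs shipped in Ubuntu):

start on runlevel [2345]
stop on runlevel [!2345]
respawn
exec ssh -N -L 7778:localhost:7778 example.com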

That said, I think upstart and run-one are certainly different enough to merit both tools, at least on my servers.

  1. An upstart job is defined by its own script-like syntax.  You can see many examples in Ubuntu's /etc/init/*.conf.  On my system the average upstart job is 25 lines long.  The run-one commands are simply prepended onto the beginning of any command line program and arguments you want to run.  You can certainly use run-one and friends inside of a script, but they're typically used in an interactive shell command line.
  2. An upstart job typically runs at boot time, or when "started" using the start command, and these jobs live in the root-writable /etc/init/.  Can a non-root user write their own upstart job, and start and stop it?  Not that I could tell (and I'm happy to be corrected here)...  Turns out I was wrong about that: per a set of recently added features to Upstart (thanks, James and Stuart, for pointing out!), non-root users can now write and run their own upstart jobs.  Still, any user on the system can launch run-one jobs, and their own command+arguments namespace is unique to them.
  3. run-one is easily usable on systems that do not have upstart available; the only hard dependency is on the flock(1) utility.
Hope that helps!


Happy running,
:-Dustin

Monday, July 22, 2013

John -- A Scalable Juju Charm Tutorial

UPDATE: I wrote this charm and blog post before I saw the unfortunate news that the UbuntuForums.org and Apple Developer websites had both recently been compromised and their user databases stolen.  Given that such events have sadly become commonplace, the instructions below can actually be used by proactive administrators to identify and disable weak user passwords, expecting that the bad guys are already doing the same.

It's been about 2 years since I've written a Juju charm.  And even those that I wrote were not scale-out applications.

I've been back at Canonical for two weeks now, and I've been spending some time bringing myself up to speed on the cloud projects that form the basis for the Cloud Solution products, for which I'm responsible. First, I deployed MAAS, and then brought up a small Ubuntu OpenStack cluster.  Finally, I decided to tackle Juju and rather than deploying one of the existing charms, I wanted to write my own.

Installing Juju

Juju was originally written in Python, but has since been ported to Golang over the last 2+ years.  My previous experience was exclusively with the Python version of Juju, but all new development is now focused on the Golang version of Juju, also known as juju-core.  So at this point, I decided to install juju-core from the 13.04 (raring) archive.

sudo apt-get install juju-core

I immediately hit a couple of bugs in the version of juju-core in 13.04 (1.10.0.1-0ubuntu1~ubuntu13.04.1), particularly Bug #1172973.  Life is more fun on the edge anyway, so I upgraded to a daily snapshot from the PPA.

sudo apt-add-repository ppa:juju/devel
sudo apt-get update
sudo apt-get install juju-core

Now I'm running juju-core 1.11.2-3~1414~raring1, and it's currently working.

Configuring Juju

Juju can be configured to use a number of different cloud backends as "providers", notably, Amazon EC2, OpenStack, MAAS, and HP Cloud.

For my development, I'm using Canonical's internal deployment of OpenStack, and so I configured my environment accordingly in ~/.juju/environments.yaml:

default: openstack
environments:
  openstack:
    type: openstack
    admin-secret: any-secret-you-choose-randomly
    control-bucket: any-bucket-name-you-choose-randomly
    default-series: precise
    auth-mode: userpass

Using OpenStack (or even AWS for that matter) also requires defining a number of environment variables in an rc-file.  Basically, you need to be able to launch instances using euca2ools or ec2-api-tools.  That's outside of the scope of this post, and expected as a prerequisite.
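
For reference, such an rc-file typically just exports the standard OpenStack client variables, something like this (the values here are hypothetical; your cloud's endpoint and credentials will differ):

export OS_AUTH_URL=http://keystone.example.com:5000/v2.0/
export OS_TENANT_NAME=your-tenant
export OS_USERNAME=your-username
export OS_PASSWORD=your-password
export OS_REGION_NAME=your-region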

The official documentation for configuring your Juju environment can be found here.

Choosing a Charm-able Application

I have previously charmed two small (but useful!) webapps that I've written and continue to maintain -- Pictor and Musica.  These are both standalone web applications that allow you to organize, serve, share, and stream your picture archive and music collection.  But neither of these "scale out", really.  They certainly could, perhaps, use a caching proxy on the front end, and shared storage on the back end.  But, as I originally wrote them, they do not.  Maybe I'll update that, but I don't know of anyone using either of those charms.

In any case, for this experiment, I wanted to write a charm that would "scale out", with Juju's add-unit command.  I wanted to ensure that adding more units to a deployment would result in a bigger and better application.

For these reasons, I chose the program known as John-the-Ripper, or just john.  You can trivially install it on any Ubuntu system, with:

sudo apt-get install john

John has been used by Linux system administrators for over a decade to test the quality of their users' passwords.  A root user can view the hashes that protect user passwords in files like /etc/shadow, or even application-level password hashes in a database.  Effectively, it can be used to "crack" weak passwords.  There are almost certainly evil people using programs like john to do malicious things.  But as long as the good guys have access to a program like john too, they can ensure that their own passwords are impossible to crack.

John can work in a number of different "modes".  It can use a dictionary of words, and simply hash each of those words looking for a match.  The john-data package ships a word list in /usr/share/john/password.lst that contains 3,000+ words.  You can find much bigger wordlists online as well, such as this one, which contains over 2 million words.

John can also generate "twists" on these words according to some rules (like changing E's to 3's, and so on).  And it can also work in a complete brute force mode, generating every possible password from various character sets.  This, of course, will take exponentially longer run times, depending on the length of the password.

Fortunately, John can run in parallel, with as many workers as you have at your disposal.  You can run multiple processes on the same system, or you can scale it out across many systems.  There are many different approaches to parallelizing John, using OpenMP, MPI, and others.

I took a very simple approach, explained in the manpage and configuration file, called "External".  Basically, in the /etc/john/john.conf configuration file, you tell each node how many total nodes exist, and which particular node it is.  Each node uses the same wordlist or sequential generation algorithm, and indexes the candidate words.  A node takes each word's index modulo the total number of nodes, and only tries the candidates that match its own id.  Dead simple :-)  I like it.

# Trivial parallel processing example
[List.External:Parallel]
/*
 * This word filter makes John process some of the words only, for running
 * multiple instances on different CPUs.  It can be used with any cracking
 * mode except for "single crack".  Note: this is not a good solution, but
 * is just an example of what can be done with word filters.
 */
int node, total;                        // This node's number, and node count
int number;                             // Current word number
void init()
{
        node = 1; total = 2;            // Node 1 of 2, change as appropriate
        number = node - 1;              // Speedup the filter a bit
}
void filter()
{
        if (number++ % total)           // Word for a different node?
                word = 0;               // Yes, skip it
}

This does, however, require some way of sharing the inputs, logs, and results across all nodes.  Basically, I need a shared filesystem.  The Juju charm collection has a number of shared filesystem charms already implemented.  I chose to use NFS in my deployment, though I could have just as easily used Ceph, Hadoop, or others.

Writing a Charm

The official documentation on writing charms can be found here.  That's certainly a good starting point, and I read all of that before I set out.  I also spent considerable time in the #juju IRC channel on irc.freenode.net, talking to Jorge and Marco.  Thanks, guys!

The base template of the charm is pretty simple.  The convention is to create a charm directory like this, and put it under revision control.

mkdir -p precise/john
bzr init .

I first needed to create the metadata that will describe my charm to Juju.  My charm is named john, which is an application known as "John the Ripper", which can test the quality of your passwords.  I list myself as the maintainer.  This charm requires a shared filesystem that implements the mount interface, as my charm will call some hooks that make use of that mount interface.  Most importantly, this charm may have other peers, which I arbitrarily called workers.  They have a dummy interface (not used) called john.  Here's the metadata.yaml:

name: john
summary: "john the ripper"
description: |
 John the Ripper tests the quality of system passwords
maintainer: "Dustin Kirkland"
requires:
  shared-fs:
    interface: mount
peers:
  workers:
    interface: john

I also have one optional configuration parameter, called target_hashes.  This configuration string holds the input data that john will try to break, and can contain one or many password hashes to crack.  If it isn't specified, this charm actually generates some random ones, and then tries to break those.  I thought that would be nice, so that it's immediately useful out of the box.  Here's config.yaml:

options:
  target_hashes:
    type: string
    description: input password hashes

There are a couple of other simple files to create, such as copyright:

Format: http://dep.debian.net/deps/dep5/

Files: *
Copyright: Copyright 2013, Dustin Kirkland, All Rights Reserved.
License: GPL-3
 This program is free software: you can redistribute it and/or modify
 it under the terms of the GNU General Public License as published by
 the Free Software Foundation, either version 3 of the License, or
 (at your option) any later version.
 .
 This program is distributed in the hope that it will be useful,
 but WITHOUT ANY WARRANTY; without even the implied warranty of
 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 GNU General Public License for more details.
 .
 You should have received a copy of the GNU General Public License
 along with this program.  If not, see <http://www.gnu.org/licenses/>.

README and revision are also required.

And the Magic -- Hooks!

The real magic happens in a set of very specifically named hooks.  These are specially named executables, which can be written in any language.  For my purposes, shell scripts are more than sufficient.

The Install Hook

The install hook is what is run at installation time on each worker node.  I need to install the john and john-data packages, as well as the nfs-common client binaries.  I also make use of the mkpasswd utility provided by the whois package.  And I will also use the keep-one-running tool provided by the run-one package.  Finally, I need to tweak the configuration file, /etc/john/john.conf, on each node, to use all of the CPU, save results every 10 seconds (instead of every 10 minutes), and to use the much bigger wordlist that we're going to fetch.  Here's hooks/install:

#!/bin/bash
set -eu
juju-log "Installing all components"
apt-get update
apt-get install -qqy nfs-common john john-data whois run-one
DIR=/var/lib/john
mkdir -p $DIR
ln -sf $DIR /root/.john
sed -i -e "s/^Idle = .*/Idle = N/" /etc/john/john.conf
sed -i -e "s/^Save = .*/Save = 10/" /etc/john/john.conf
sed -i -e "s:^Wordlist = .*:Wordlist = $DIR\/passwords.txt:" /etc/john/john.conf
juju-log "Installed packages"

The Start Hook

The start hook defines how to start this application.  Ideally, the john package would provide an init script or upstart job that cleanly daemonizes its workers, but it currently doesn't.  For a poor man's daemonizer, though, I love the keep-one-running utility (written by yours truly).  I'm going to start two copies of the john utility: one that runs in wordlist mode, trying every one of the 2 million words in my wordlist, and a second that tries every combination of characters in an incremental, brute-force mode.  These binaries will operate entirely in the shared /var/lib/john NFS mount point.  Each copy on each worker node needs its own session file.  Here's hooks/start:

#!/bin/bash
set -eu
juju-log "Starting john"
DIR=/var/lib/john
keep-one-running john -incremental -session:$DIR/session-incremental-$(hostname) -external:Parallel $DIR/target_hashes &
keep-one-running john -wordlist:$DIR/passwords.txt -session:$DIR/session-wordlist-$(hostname) -external:Parallel $DIR/target_hashes &

The Stop Hook

The stop hook defines how to stop the application.  Here, I'll need to kill the keep-one-running processes which wrap john, since we don't have an upstart job or init script.  This is perhaps a little sloppy, but perfectly functional.  Here's hooks/stop:

#!/bin/bash
set -eu
juju-log "Stopping john"
killall keep-one-running || true

The Workers Relation Changed Hook

This hook defines the actions that need to be taken each time another john worker unit is added to the service.  Basically, each worker needs to recount how many total workers there are (using the relation-list command), determine their own id (from $JUJU_UNIT_NAME), update their /etc/john/john.conf (using sed), and then restart their john worker processes.  The last part is easy since we're using keep-one-running; we simply need to killall john processes, and keep-one-running will automatically respawn new processes that will read the updated configuration file.  This is hooks/workers-relation-changed:

#!/bin/bash
set -eu
DIR="/var/lib/john"
update_unit_count() {
        node=$(echo $JUJU_UNIT_NAME | awk -F/ '{print $2}')
        node=$((node+1))
        total=$(relation-list | wc -l)
        total=$((total+1))
        sed -i -e "s/^\s\+node = .*; total = .*;.*$/        node = $node; total = $total;/" /etc/john/john.conf
}
restart_john() {
        killall john || true
        # It'll restart itself via keep-one-running, if we kill it
}
update_unit_count
restart_john

The Configuration Changed Hook

All john worker nodes will operate on a file in the shared filesystem called /var/lib/john/target_hashes.  I'd like the administrator who deployed this service to be able to dynamically update that file and signal all of her worker nodes to restart their john processes.  Here, I used the config-get juju command, and again restart by simply killing the john processes and letting keep-one-running sort out the restart.  This is handled here in hooks/config-changed:

#!/bin/bash
set -e
DIR=/var/lib/john
target_hashes=$(config-get target_hashes)
if [ -n "$target_hashes" ]; then
        # Install the user's supplied hashes
        echo "$target_hashes" > $DIR/target_hashes
        # Restart john
        killall john || true
fi

The Shared Filesystem Relation Changed Hook

By far, the most complicated logic is in hooks/shared-fs-relation-changed.  There's quite a bit of work we need to do here, as soon as we can be assured that this node has successfully mounted its shared filesystem.  There's a bit of boilerplate mount work that I borrowed from the owncloud charm.  Beyond that, there's a bit of john-specific work.  I download the aforementioned larger wordlist.  I install the target hash, if specified in the configuration; otherwise, we just generate 10 random target passwords to try and crack.  We also symlink a bunch of john's runtime shared data into the NFS directory, since (for no good reason) john expects all of this to live in the same directory.  Of course, this code could really use some cleanup.  Here it is, imperfect but functional, hooks/shared-fs-relation-changed:
#!/bin/bash
set -eu

remote_host=`relation-get private-address`
export_path=`relation-get mountpoint`
mount_options=`relation-get options`
fstype=`relation-get fstype`
DIR="/var/lib/john"

if [ -z "${export_path}" ]; then
    juju-log "remote host not ready"
    exit 0
fi

local_mountpoint="$DIR"

create_local_mountpoint() {
  juju-log "creating local mountpoint"
  umask 022
  mkdir -p $local_mountpoint
  chown -R ubuntu:ubuntu $local_mountpoint
}
[ -d "${local_mountpoint}" ] || create_local_mountpoint

share_already_mounted() {
  mount | grep -q $local_mountpoint
}

mount_share() {
  for try in {1..3}; do
    juju-log "mounting share"
    [ ! -z "${mount_options}" ] && options="-o ${mount_options}" || options=""
    mount  -t $fstype $options $remote_host:$export_path $local_mountpoint \
      && break

    juju-log "mount failed: ${local_mountpoint}"
    sleep 10

  done
}

download_passwords() {
  if [ ! -s $DIR/passwords.txt ]; then
    # Grab a giant dictionary of passwords, 20MB, 2M passwords
    juju-log "Downloading password dictionary"
    cd $DIR
    # http://www.breakthesecurity.com/2011/12/large-password-list-free-download.html
    wget http://dazzlepod.com/site_media/txt/passwords.txt
    juju-log "Done downloading password dictionary"
  fi
}

install_target_hashes() {
  if [ ! -s $DIR/target_hashes ]; then
    target_hashes=$(config-get target_hashes)
    if [ -n "$target_hashes" ]; then
      # Install the user's supplied hashes
      echo "$target_hashes" > $DIR/target_hashes
    else
      # Otherwise, grab some random ones
      i=0
      for p in $(shuf -n 10 $DIR/passwords.txt); do
        # http://openwall.info/wiki/john/Generating-test-hashes
        printf "user${i}:%s\n" $(mkpasswd -m md5 $p) >> $DIR/target_hashes
        i=$((i+1))
      done
    fi
  fi
  for i in /usr/share/john/*; do
    ln -sf $i /var/lib/john
  done
}

apt-get -qqy install rpcbind nfs-common
share_already_mounted || mount_share
download_passwords
install_target_hashes

Deploying the Service

If you're still with me, we're ready to deploy this service and try cracking some passwords!  We need to bootstrap our environment, and deploy the stock nfs charm.  Next, branch my charm's source code, and deploy it.  I deployed it here across a whopping 18 units!  I currently have a quota of 20 small instances I can run on our private OpenStack.  Two of those instances are used by the Juju bootstrap node and by the NFS server.  So the other 18 will be NFS clients running john processes.

juju bootstrap
juju deploy nfs
bzr branch lp:~kirkland/+junk/john precise
juju deploy -n 18 --repository=precise local:precise/john
juju add-relation john nfs
juju status

Once everything is up and ready, running and functional, my status looks like this:

machines:
  "0":
    agent-state: started
    agent-version: 1.11.0
    dns-name: 10.99.60.230
    instance-id: 98090098-2e08-4326-bc73-22c7c6879b95
    series: precise
  "1":
    agent-state: started
    agent-version: 1.11.0
    dns-name: 10.99.60.7
    instance-id: 449c6c8c-b503-487b-b370-bb9ac7800225
    series: precise
  "2":
    agent-state: started
    agent-version: 1.11.0
    dns-name: 10.99.60.193
    instance-id: 576ffd6f-ddfa-4507-960f-3ac2e11ea669
    series: precise
  "3":
    agent-state: started
    agent-version: 1.11.0
    dns-name: 10.99.60.215
    instance-id: 70bfe985-9e3f-4159-8923-60ab6d9f7d43
    series: precise
  "4":
    agent-state: started
    agent-version: 1.11.0
    dns-name: 10.99.60.221
    instance-id: f48364a9-03c0-496f-9287-0fb294bfaf24
    series: precise
  "5":
    agent-state: started
    agent-version: 1.11.0
    dns-name: 10.99.60.223
    instance-id: 62cc52c4-df7e-448a-81b1-5a3a06af6324
    series: precise
  "6":
    agent-state: started
    agent-version: 1.11.0
    dns-name: 10.99.60.231
    instance-id: f20dee5d-762f-4462-a9ef-96f3c7ab864f
    series: precise
  "7":
    agent-state: started
    agent-version: 1.11.0
    dns-name: 10.99.60.239
    instance-id: 27c6c45d-18cb-4b64-8c6d-b046e6e01f61
    series: precise
  "8":
    agent-state: started
    agent-version: 1.11.0
    dns-name: 10.99.60.240
    instance-id: 63cb9c91-a394-4c23-81bd-c400c8ec4f93
    series: precise
  "9":
    agent-state: started
    agent-version: 1.11.0
    dns-name: 10.99.60.242
    instance-id: b2239923-b642-442d-9008-7d7e725a4c32
    series: precise
  "10":
    agent-state: started
    agent-version: 1.11.0
    dns-name: 10.99.60.249
    instance-id: 90ab019c-a22c-41d3-acd2-d5d7c507c445
    series: precise
  "11":
    agent-state: started
    agent-version: 1.11.0
    dns-name: 10.99.60.252
    instance-id: e7abe8e1-1cdf-4e08-8771-4b816f680048
    series: precise
  "12":
    agent-state: started
    agent-version: 1.11.0
    dns-name: 10.99.60.254
    instance-id: ff2b6ba5-3405-4c80-ae9b-b087bedef882
    series: precise
  "13":
    agent-state: started
    agent-version: 1.11.0
    dns-name: 10.99.60.255
    instance-id: 2b019616-75bc-4227-8b8b-78fd23d6b8fd
    series: precise
  "14":
    agent-state: started
    agent-version: 1.11.0
    dns-name: 10.99.61.1
    instance-id: ecac6e11-c89e-4371-a4c0-5afee41da353
    series: precise
  "15":
    agent-state: started
    agent-version: 1.11.0
    dns-name: 10.99.61.3
    instance-id: 969f3d1c-abfb-4142-8cc6-fc5c45d6cb2c
    series: precise
  "16":
    agent-state: started
    agent-version: 1.11.0
    dns-name: 10.99.61.4
    instance-id: 6bb24a01-d346-4de5-ab0b-03f51271e8bb
    series: precise
  "17":
    agent-state: started
    agent-version: 1.11.0
    dns-name: 10.99.61.5
    instance-id: 924804d6-0893-4e56-aef2-64e089cda1be
    series: precise
  "18":
    agent-state: started
    agent-version: 1.11.0
    dns-name: 10.99.61.11
    instance-id: 5c96faca-c6c0-4be4-903e-a6233325caec
    series: precise
  "19":
    agent-state: started
    agent-version: 1.11.0
    dns-name: 10.99.61.15
    instance-id: 62b48da2-60ea-4c75-b5ed-ffbb2f8982b5
    series: precise
services:
  john:
    charm: local:precise/john-3
    exposed: false
    relations:
      shared-fs:
      - nfs
      workers:
      - john
    units:
      john/0:
        agent-state: started
        agent-version: 1.11.0
        machine: "2"
        public-address: 10.99.60.193
      john/1:
        agent-state: started
        agent-version: 1.11.0
        machine: "3"
        public-address: 10.99.60.215
      john/2:
        agent-state: started
        agent-version: 1.11.0
        machine: "4"
        public-address: 10.99.60.221
      john/3:
        agent-state: started
        agent-version: 1.11.0
        machine: "5"
        public-address: 10.99.60.223
      john/4:
        agent-state: started
        agent-version: 1.11.0
        machine: "6"
        public-address: 10.99.60.231
      john/5:
        agent-state: started
        agent-version: 1.11.0
        machine: "7"
        public-address: 10.99.60.239
      john/6:
        agent-state: started
        agent-version: 1.11.0
        machine: "8"
        public-address: 10.99.60.240
      john/7:
        agent-state: started
        agent-version: 1.11.0
        machine: "9"
        public-address: 10.99.60.242
      john/8:
        agent-state: started
        agent-version: 1.11.0
        machine: "10"
        public-address: 10.99.60.249
      john/9:
        agent-state: started
        agent-version: 1.11.0
        machine: "11"
        public-address: 10.99.60.252
      john/10:
        agent-state: started
        agent-version: 1.11.0
        machine: "12"
        public-address: 10.99.60.254
      john/11:
        agent-state: started
        agent-version: 1.11.0
        machine: "13"
        public-address: 10.99.60.255
      john/12:
        agent-state: started
        agent-version: 1.11.0
        machine: "14"
        public-address: 10.99.61.1
      john/13:
        agent-state: started
        agent-version: 1.11.0
        machine: "15"
        public-address: 10.99.61.3
      john/14:
        agent-state: started
        agent-version: 1.11.0
        machine: "16"
        public-address: 10.99.61.4
      john/15:
        agent-state: started
        agent-version: 1.11.0
        machine: "17"
        public-address: 10.99.61.5
      john/16:
        agent-state: started
        agent-version: 1.11.0
        machine: "18"
        public-address: 10.99.61.11
      john/17:
        agent-state: started
        agent-version: 1.11.0
        machine: "19"
        public-address: 10.99.61.15
  nfs:
    charm: cs:precise/nfs-3
    exposed: false
    relations:
      nfs:
      - john
    units:
      nfs/0:
        agent-state: started
        agent-version: 1.11.0
        machine: "1"
        public-address: 10.99.60.7

Obtaining the Results

And now, let's monitor the results.  To do this, I'll ssh to any of the john worker nodes, move over to the shared NFS directory, and use the john -show command in a watch loop.

juju ssh john/0
sudo su -
cd /var/lib/john
watch john -show target_hashes

And the results...
Every 2.0s: john -show target_hashes

user0:260775
user1:73832100
user2:829171kzh
user3:pf1vd4nb
user4:7788521312229
user5:saksak
user6:rongjun2010
user7:2312010
user8:davied
user9:elektrohobbi

10 password hashes cracked, 0 left

Within a few seconds, this 18-node cluster has cracked all 10 of the randomly chosen passwords from the dictionary.  That's only mildly interesting, as my laptop can do the same in a few minutes, if the passwords are already in the wordlist.  What's far more interesting is randomly generating a password, passing it as a new configuration to our running cluster, and letting the cluster crack that instead.

Modifying the Configuration Target Hash

Let's generate a random password using apg.  We'll then need to hash this and create a string in the form of username:pwhash that john can understand.  Finally, we'll pass this to our cluster using Juju's set action.

passwd=$(apg -a 0 -n 1 -m 6 -x 6)
target=$(printf "user0:%s\n" $(mkpasswd -m md5 $passwd))
juju set john target_hashes="$target"

This was a 6-character password drawn from a 52-character set (a-z, A-Z), almost certainly not in our dictionary.  That's 52^6 = 19,770,609,664, or about 19 billion letter combinations we need to test.  According to the john -test command, a single one of my instances can test about 12,500 MD5 hashes per second.  So with a single instance, this would take a maximum of 52^6 / 12,500 / 60 / 60 = 439 hours, or 18 days :-)  Well, I happen to have exactly 18 instances, so we should be able to test the entire wordspace in about 24 hours.
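
If you'd like to double-check that arithmetic, bash agrees (integer division, so the hours are truncated):

$ echo $((52**6)) $((52**6 / 12500 / 3600))
19770609664 439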

So I threw all 18 instances at this very problem and let it run over a weekend. And voila, we got a little lucky, and cracked the password, Uvneow, in 16 hours!

In Conclusion

I don't know if this charm will ever land in the official charm store.  That really wasn't the goal of this exercise for me.  I simply wanted to bring myself back up to speed on Juju, play with the port to Golang, experiment with OpenStack as a provider for Juju, and most importantly, write a scalable Juju charm.

This particular application, john, is actually just one of a huge class of MPI-compatible, parallelizable applications that could be charmed for Juju.  The general design, I think, should be very reusable by you, if you're interested.  Between the shared file system and the keep-one-running approach, I bet you could charm any one of a number of scalable applications.  While I'm not eligible, perhaps you might consider competing for cash prizes in the Juju Charm Championship.

Happy charming,
:-Dustin

Monday, December 12, 2011

I've Joined the Gazzang Team!


A few weeks ago, I joined a fun, new start-up company here in Austin called Gazzang.  I was a little surprised that this was published in the form of a rather flattering press release :-)  Let's just say that my Mom was very proud!

I know that some of you in the Ubuntu community are wondering how that career change will affect my responsibilities and contributions to Ubuntu.  I'm delighted to say that I'll most certainly continue to contribute to Ubuntu and many of my upstream projects.  Gazzang is quite supportive of my work in both Ubuntu and open source.

Most directly, you should see me being far more active in my regular maintenance, development, bug triage, and support of eCryptfs.  Gazzang's core business is in building information privacy and data security solutions for the Cloud.  eCryptfs is at the heart of their current products, and in my new role as Gazzang's Chief Architect, we're working on some interesting innovations in and around eCryptfs.  A healthy, high-quality, feature-filled, high-performance eCryptfs is essential to Gazzang's objectives, and I'm looking forward to working on one of my real passions in eCryptfs!

More specifically, looking at the projects I maintain, I expect to continue to be very active in:
  • eCryptfs (essential to my new job)
  • byobu (mostly around tmux, and because hacking on byobu is fun and awesome :-)
  • manpages.ubuntu.com and manpg.es (because that's how I read manpages)
  • musica (because that's how I've streamed music since 1998)
  • pictor (because that's how I've managed and shared pictures since 1998)
You'll probably see opportunistic development (nothing active, but when an opportunity or bugs spring up), including the usual bzr/launchpad dance, developing, testing, upstream releasing, packaging, and uploading to Ubuntu, of:
And finally, as prescribed by the Ubuntu Code of Conduct, I'm gracefully stepping away from a few other projects I've founded or maintained in the past.  I'll help out if and when I can, but for now I've transferred all of the necessary rights, responsibilities and ownership of:


Finally, I must say that the last 4 years have been the most amazing 4 years of my entire 12-year professional career.  It's been quite rewarding to watch the fledgling Ubuntu Server of February 2008 (when I joined Canonical), and its tiny team of 5, grow and evolve into the 20+ amazing people now working directly on the Ubuntu Server.  And that list doesn't even remotely cover the dozens (if not hundreds!) of others around Canonical and the Ubuntu community who contribute to and depend on the amazing Server and Cloud distribution that is Ubuntu.

I'm really looking forward to my new opportunities around Gazzang and eCryptfs, but you'll still most certainly see me around Ubuntu too :-)  As crooned by The Beatles...
You say "Yes", I say "No".
You say "Stop" and I say "Go, go, go".
Oh no.
You say "Goodbye" and I say "Hello, hello, hello".
I don't know why you say "Goodbye", I say "Hello, hello, hello".
I don't know why you say goodbye, I say hello!

Cheers,
:-Dustin
http://www.gazzang.com

Tuesday, August 2, 2011

Keep One SSH Tunnel to a Bip Proxy Server Running

I've been using IRC proxies on-and-off since 1999, and consistently since 2003.

I used dircproxy until February 2008, when I joined Canonical, at which point I switched to bip, as I needed support for SSL encrypted connections.

I've also helped at least a dozen friends and colleagues construct similar setups, so this blog post is long, long overdue -- triggered, finally, by another colleague asking me tonight to explain my setup again :-)

As you'll see below, it's not too complex, but it's really quite robust.  With this setup, all messages are logged, whether I'm attached or not.  When I'm not attached, I'm automatically marked 'away'.  All traffic between me and my server is encrypted.  Most importantly, my client marks any flagged messages/highlights that I missed each time I reconnect.

There are 4 key pieces to this setup:
  1. bip
  2. ssh
  3. keep-one-running
  4. xchat (or insert your favorite IRC client here)

The Server

I have a production, monitored Ubuntu server hosting www.divitup.com -- a side project that I authored back in college in 2000 to help split bills between roommates.  Long before the dawn of Facebook when Zuckerberg and the Winklevosses were still in high school :-)

It's an Ubuntu Server inside a VPS hosted by A2Hosting.com (with whom I've always been quite pleased!).  There's rarely downtime, but when there is, I hear about it from DivItUp users in a hurry.  It's the closest thing I have to an always-up server ;-)

Besides the DivItUp.com web service, it also runs SSH (of course), and I've installed the bip proxy service (though the port is not open externally).  Bip installs quite trivially on Ubuntu with:

sudo apt-get install bip

Though you may need to enable it in /etc/default/bip.

Bip can be configured globally for the server in /etc/bip.conf.  See the manpage and the inline comments in your default /etc/bip.conf, but this should give a decent idea of roughly how mine is configured:

ip = "127.0.0.1";
port = 7778;
client_side_ssl = true;
log_level = 3;
pid_file="/var/run/bip/bip.pid";
log_root = "/var/log/bip/";
log_format = "%n/%Y-%m/%c.%d.log";
log_sync_interval = 5;
backlog = true;
backlog_lines = 0;              # number of lines in backlog, 0 means no limit
backlog_always = false;         # backlog even lines already backlogged
backlog_msg_only = true;
blreset_on_talk = true;
backlog_reset_on_talk = true;

# Networks
network {
        name = "canonical";
        ssl = true;
        server { host = "irc.canonical.com"; port = 6697; };
};
network {
        name = "freenode";
        server { host = "irc.freenode.net"; port = 6667; };
};

network {
        name = "oftc";
        server { host = "irc.oftc.net"; port = 6667; };
};

# Users/channels
user { 
        name = "kirkland";      # bip user name (not IRC username)
        password = "88548dff20a3b2b72852b4256a7a3544";  # bip user password, generated by bipmkpw
        ssl_check_mode = "none";
        default_nick = "kirkland";              # IRC nick
        default_user = "kirkland";              # IRC user
        default_realname = "Dustin Kirkland";   # IRC real name

        # A user can have multiple connections to irc networks.
        connection {
                name = "canonical";             # used by bip only
                network = "canonical";  # which ircnet to connect to
                user = "kirkland";
                realname = "Dustin Kirkland";
                password = "SomePassword";
                ignore_first_nick = true;
                no_client_away_msg = "currently disconnected";
                # Autojoined channels
                channel { name = "#a-channel,#another-channel,#maybe-a-third"; };
        };

        # another connection (optional)
        connection {
                name = "freenode";
                network = "freenode";
                ignore_first_nick = true;
                no_client_away_msg = "currently disconnected";
                on_connect_send = "PRIVMSG NickServ :IDENTIFY yourIRCpasswordHere";
                # Autojoined channels:
                channel { name = "#byobu"; };
                channel { name = "#ubuntu-devel"; };
                channel { name = "#ubuntu-meeting"; };
                channel { name = "#ubuntu-server"; };
                channel { name = "#ubuntu-cloud"; };
                # Password protected channel
                channel { name = "##the-good-stuff"; key = "zuperSekrit"; };
        };
};

Once you've installed and configured bip, start the service!

sudo service bip start

Now, let's take a look at the client...

The Client

Here, you really just need two things ... an always-running SSH tunnel to your server, and your IRC client.  I'll discuss Ubuntu/xchat here, but you can do the same with Android/AndChat.

There are several ways to configure an SSH tunnel (like stunnel), but here I'm going to show you the one that I'm partial toward :-)  I wrap an ssh port forwarding session with keep-one-running, and configure Unity to launch that automatically at boot.

My ssh command looks like this:

ssh -N -L 7778:localhost:7778 divitup.com


Now I want to make sure that there's always one, and only one, of these running on my laptop client at all times.  I want it to automatically reconnect if I lose wireless connectivity, switch access points or networks, suspend-and-resume, etc.  So I wrap that command with the keep-one-running utility.

keep-one-running ssh -N -L 7778:localhost:7778 divitup.com

And I set Unity (or Gnome/KDE/XFCE) to run this command at desktop login.  Alt-F2, "Startup Applications".


At login, I can run "ps -ef | grep keep-one-running" and see the command in my list.

Finally, I need to configure my IRC client, xChat, to talk to localhost:7778, rather than irc.freenode.net.

Here, you'll add a custom "network" for each of the server connections you defined in your /etc/bip.conf on the Server.   You'll use localhost/7778 for the hostname and port, since that's where you're SSH-port-forwarding to.  You'll enter your NickServ password (if you authenticate to IRC).  And you'll use the Server Password you created with bipmkpw.



Now, if you have an Android device, you can connect to the same proxy, by following my colleague, Juan Negron's supplementary post here!

Do you think you could improve your connectivity with such a setup?  Do you have a better way of solving this problem?

:-Dustin

Monday, July 18, 2011

Introducing keep-one-running!

I just added another utility to the run-one package -- keep-one-running.  It's already in Ubuntu Oneiric (11.10), or you can add it to any other supported Ubuntu release from the PPA, with:

sudo apt-add-repository ppa:run-one/ppa
sudo apt-get update
sudo apt-get install -y run-one

run-one is a very useful tool that you can use to ensure that you never have more than one invocation of a process running on a system at a time.  I now use it in every single cron job I have, to keep long running jobs from ever stepping on a subsequent one.

I use a bip proxy to keep me connected to IRC and log messages even while I'm away.  Before opening xchat, I need to establish an ssh tunnel to my bip proxy.  More importantly, I need to keep that connection up (particularly when I'm on an unreliable network). 

To solve that problem generally, I added the keep-one-running mode to run-one.  And now, I added this command to my Unity startup applications:

keep-one-running ssh -N -C -L 7778:localhost:7778 divitup.com

If I were a root user, I could perhaps use upstart and the respawn directive.  I guess you could look at keep-one-running as a poor man's respawn.  Give it a shot and let me know if it's useful to you!

Enjoy ;-)

:-Dustin

Tuesday, April 5, 2011

A Lesson Learned the Hard Way about SSDs

Everyone told me, when I started looking at SSD hard drives, "Buy Intel."

But I didn't listen.  And boy, did I pay for it.  Not once, but twice :-(



As of yesterday, my 1+ year saga with Patriot SSDs is finally over.  Stay tuned for the next post, where I'll talk about a few really important lessons learned, in terms of data backup, and some tools I now use to avoid this situation ever again.  Until then, here's a timeline, meticulously reconstructed from my email and system logs.
  • 17 December 2009
    • Paid $406.97 at Amazon.com for a Patriot SSD -- expensive, but Merry Christmas to me!
    • Patriot Torqx 2.5-Inch 128 GB SATAII Solid State Drive with 220MB/s Read - PFZ128GS25SSDR
    • Received and installed Ubuntu Lucid a few days later
    • Read/write benchmarks were very close to advertised rates, and I bragged to my Intel-SSD-wielding colleagues
  • 3 March 2010
    • Hard drive simply "disappeared", doh!
    • Neither the BIOS nor kernel could see the hard drive
    • Patriot acknowledged the issue as a firmware bug, and provided a Windows executable to flash the controller on the hard drive
    • Flashing the controller would discard all data on the hard drive, no way to recover
    • There was no Linux alternative for the magic Windows executable
    • I had reasonable backups (within the last week or so), so I started the RMA process
  • 4 March 2010
    • Returned to Patriot via Fedex (at their expense)
  • 24 March 2010
    • Received replacement drive, 3+ weeks later
    • Re-installed Ubuntu Lucid
  • 19 November 2010
    • Another crash; again hard drive just "disappeared"
    • I was traveling at the time, and did not have a current backup :-(
    • I wrote the run-one utility days later (more on that in the next post), and redesigned where and how I store and backup data
  • 21 November 2010
    • Reinstalled Ubuntu Maverick onto an old, spare 5400rpm drive
    • Wow, I had not realized until now how much local hard drive performance directly affects my development productivity!
  • 22 November 2010
    • 2nd RMA filed with Patriot
  • 23 November 2010
    • Since I was traveling when the error occurred, my backups were way out of date, and I stood to lose quite a bit of valuable, irreplaceable data
    • So I shipped the dead drive (and a working 5400rpm drive for the recovered data) to a data recovery facility specializing in SSD/Flash -- A+ Perfect Computers
  • 24 November 2010
    • I paid $245.98 for a 120GB Intel SSD on Amazon.com, which is exactly what I should have done a year earlier :-(
  • 29 November 2010
    • I paid $475 for the recovery, which was explicitly not reimbursed by Patriot
    • If A+ Perfect Computers could recover my data, I fail to see how/why Patriot could not do the same, at their expense -- very disappointing
    • I received a phone call from a friendly, knowledgeable, Linux-savvy A+ technologist, who emailed me a few of my eCryptfs encrypted files, for my verification
    • This technologist explained how their recovery worked, at a high level, bypassing Patriot's faulty on-board controller/firmware with a working one, for the duration of the recovery
    • Note that I very much appreciated having my private data encrypted, in this case, as I'm quite literally sharing my hard drive with an untrusted 3rd party
      • Ubuntu Encrypted Home for the win!!!
  • 3 December 2010
    • I received the original, broken Patriot hard drive back from A+ Perfect Computers, as well as my 5400rpm drive with a complete copy of the recovered data
    • The recovery appeared to be perfect, up until minutes before the drive disappeared
  • 5 December 2010
    • I received my 120GB Intel SSD and installed Ubuntu Natty
  • 6 December 2010
    • I shipped the broken Patriot hard drive back to the manufacturer for replacement
  • 22 November 2010 - 3 March 2011
    • 24 emails sent or received between myself and Patriot, during which I learned:
      • 128GB Torqx was no longer manufactured
      • 120GB Inferno was the only option for a replacement
      • The Inferno was in short supply, and shipments were delayed by months
  • 10 March 2011
    • 3+ months later, finally received a replacement drive
  • 4 April 2011
    • I sold my factory sealed, brand new Inferno replacement on eBay
This whole saga has cost me several hundred dollars, between the original price I paid for the Torqx, the data recovery fee, and with the huge loss at which I sold the replacement Inferno.

However, I believe my backup scheme today is absolutely better than ever!  And perhaps more importantly, the entire Ubuntu world now has the run-one and run-this-one utilities at its disposal ;-)

:-Dustin

Monday, February 7, 2011

Update on errno, ssh-import-id, and bikeshed

If you read my post from earlier today about run-one, you might notice that I used a new source and binary package to publish the run-one utility.  This is a new practice that I'm going to use for stand-alone tools like this.

errno

It's worth mentioning that the errno utility has also moved out of ubuntu-dev-tools, at the strong request of the maintainer of ubuntu-dev-tools.  I tried (in vain) to get errno into various other packages and upstream projects, and failed in all cases.  As of Natty, you can:

 apt-get install errno

For older releases:

 sudo apt-add-repository ppa:errno/ppa
 sudo apt-get update
 sudo apt-get install errno

As a reminder, you can use errno in these ways:

 $ errno font
 EBFONT          59      /* Bad font file format */
 $ errno 36
 ENAMETOOLONG    36      /* File name too long */
 $ errno EPERM
 EPERM            1      /* Operation not permitted */


You can find the sources with:

 bzr branch lp:errno

And the launchpad project is at http://launchpad.net/errno.

ssh-import-id

Similarly, the maintainer of the openssh package in Ubuntu urged the removal of the ssh-import-id utility.  Once again, I offered the tool to the upstream openssh project, to no avail.  So ssh-import-id now lives in its own source and binary packages.  As of Natty, you can:

 apt-get install ssh-import-id


For older releases:

 sudo apt-add-repository ppa:ssh-import-id/ppa
 sudo apt-get update
 sudo apt-get install ssh-import-id


As a reminder, you can use ssh-import-id in this way:

  $ ssh-import-id kirkland smoser
 INFO: Successfully authorized [kirkland]
 INFO: Successfully authorized [smoser]

You can find sources with:

 bzr branch lp:ssh-import-id

And the launchpad project is at http://launchpad.net/ssh-import-id.

bikeshed

"So why didn't you just use bikeshed?"  Great question!  When I showed run-one to one of my colleagues, he said, "Neat, I'd use that, where can I get it?"  And I pointed him to install bikeshed, to which he responded, "Oh, well, I just want run-one, but not all the other cruft you put into bikeshed."  :-)

I tried not to be offended, but in the end, he was right.  I thought about splitting bikeshed into a series of bikeshed-$FOO binary packages.  This wasn't ideal, though, in my opinion, from the perspective of developing code or handling bugs/questions.

Thus, I've decided to create a new Launchpad project and team, and Ubuntu package for each of these stand-alone utilities.

I'll continue to use bikeshed to incubate new tools, and as soon as they're ready to stand alone, then I'll split them out to their own branch/project/team/package.

Cheers,
:-Dustin

Sunday, February 6, 2011

Introducing: run-one and run-this-one

I love cronjobs!  They wake me up in the morning, fetch my mail, backup my data, sync my mirrors, update my systems, check the health of my hardware and RAIDs, transcode my MythTV recordings, and so many other things...

The robotic precision of cron ensures that each subsequent job runs, on time, every time.

But cron doesn't check that the previous execution of that same job completed first -- and that can cause big trouble.

This often happens to me when I'm traveling and my backup cronjob fires while I'm on a slow up-link.  It's bad news when an hourly rsync takes longer than an hour to run, and my system heads down a nasty spiral, soon seeing 2 or 3 or 10 rsync's all running simultaneously.  Dang.

For this reason, I found myself putting almost all of my cronjobs in a wrapper script, managing and respecting a pid file lock according to the typical UNIX sysvinit daemon method.  Unfortunately, this led to extensively duplicated lock handling code spread across my multiple workstations and servers.

I'm proud to say, however, that I have now solved this problem on all of my servers, at least for myself, and perhaps for you too!

In Ubuntu 11.04 (Natty), you can now find a pair of utilities in the run-one package: run-one and run-this-one.

run-one

You can simply prepend the run-one utility on the beginning of any command (just like time or sudo).  The tool will calculate the md5sum $HASH of the rest of $0 and $@ (the command and its arguments), and then try to obtain a lock on a file in $HOME/.cache/$HASH using flock.  If it can obtain the lock, then your command is simply executed, releasing the lock when done.  And if not, then another copy of your command is already running, and it quietly exits non-zero.
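
Out of curiosity, here is a minimal sketch of that technique (assuming the flock(1) utility from util-linux; the real run-one script handles more corner cases):

#!/bin/sh
# Hash the command and its arguments into a per-invocation lock file name
HASH=$(echo "$@" | md5sum | cut -c1-32)
mkdir -p "$HOME/.cache"
# -x takes an exclusive lock; -n exits non-zero immediately if it's already held
exec flock -xn "$HOME/.cache/$HASH" "$@"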

I can now be safely assured that there will only ever be one copy of this cronjob running on my local system as $USER at a time:

  */60 * * * *   run-one rsync -azP $HOME example.com:/srv/backup

If a copy is already running, subsequent calls of the same invocation will quietly exit non-zero.

run-this-one

run-this-one is a slightly more forceful take on the same idea.  Using pgrep, it finds any matching invocations owned by the user in the process table and kills those first, then continues, behaving just as run-one (establishing the lock and executing your command).

I rely on a handful of ssh tunnels and proxies, but I often suspend and resume my laptop many times a day, which can cause those ssh connections to go stale and hang around for a while before the connection times out.  For these, I want to kill any old instances of the invocation, and then start a fresh one.

  I now use this code snippet in a wrapper script to establish my ssh socks proxy, and a pair of local port forwarding tunnels (for squid and bip proxies):

  run-this-one ssh -N -C -D 1080 -L 3128:localhost:3128 \
    -L 7778:localhost:7778 example.com

Have you struggled with this before?  Do you have a more elegant solution?  Would you use run-one and/or run-this-one to solve a similar problem?

You can find the code in Launchpad/bzr here, and packages for Lucid, Maverick, and Natty in a PPA here.

 bzr branch lp:run-one
 sudo apt-add-repository ppa:run-one/ppa
 sudo apt-get update
 sudo apt-get install run-one


Cheers,
:-Dustin
