From the Canyon Edge -- :-Dustin

Sunday, February 6, 2011

Introducing: run-one and run-this-one

I love cronjobs!  They wake me up in the morning, fetch my mail, back up my data, sync my mirrors, update my systems, check the health of my hardware and RAIDs, transcode my MythTV recordings, and so many other things...

The robotic precision of cron ensures that each subsequent job runs, on time, every time.

But cron doesn't check that the previous execution of that same job completed first -- and that can cause big trouble.

This often happens to me when I'm traveling and my backup cronjob fires while I'm on a slow uplink.  It's bad news when an hourly rsync takes longer than an hour to run, and my system heads down a nasty spiral, soon seeing 2 or 3 or 10 rsyncs all running simultaneously.  Dang.

For this reason, I found myself putting almost all of my cronjobs in a wrapper script, managing and respecting a pid file lock according to the typical UNIX sysvinit daemon method.  Unfortunately, this led to extensive duplication of lock-handling code across my workstations and servers.
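
For illustration, each of those wrappers duplicated boilerplate along these lines (a hypothetical sketch with a made-up pid file name, not any one of my actual scripts):

  #!/bin/sh
  # Typical pid-file locking, duplicated in every wrapper script
  PIDFILE="$HOME/.backup-job.pid"   # hypothetical job name
  if [ -e "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
      exit 1   # the previous run is still alive, so bail out
  fi
  echo $$ > "$PIDFILE"
  rsync -azP $HOME example.com:/srv/backup
  rm -f "$PIDFILE"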

I'm proud to say, however, that I have now solved this problem on all of my servers, at least for myself, and perhaps for you too!

In Ubuntu 11.04 (Natty), you can now find a pair of utilities in the run-one package: run-one and run-this-one.

run-one

You can simply prepend the run-one utility to any command (just like time or sudo).  The tool calculates the md5sum $HASH of $0 and $@ (the command and its arguments), and then tries to obtain a lock on the file $HOME/.cache/$HASH using flock.  If it can obtain the lock, your command is executed, and the lock is released when it's done.  If not, another copy of your command is already running, and run-one quietly exits non-zero.
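
In other words, something like this minimal shell sketch of the flock technique (an illustration of the approach, assuming util-linux flock; not the packaged script itself):

  #!/bin/sh
  # Sketch: hash the full invocation, then take an exclusive,
  # non-blocking lock on a file named after that hash.
  HASH=$(echo "$@" | md5sum | cut -d' ' -f1)
  mkdir -p "$HOME/.cache"
  # -x: exclusive lock; -n: exit non-zero at once instead of blocking
  exec flock -xn "$HOME/.cache/$HASH" "$@"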

I can now be safely assured that there will only ever be one copy of this cronjob running on my local system as $USER at a time:

  0 * * * *   run-one rsync -azP $HOME example.com:/srv/backup

If a copy is already running, subsequent calls of the same invocation will quietly exit non-zero.

run-this-one

run-this-one is a slightly more forceful take on the same idea.  Using pgrep, it first finds any matching invocations owned by the user in the process table and kills those; it then continues just as run-one does, establishing the lock and executing your command.
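
In shell terms, the behavior is roughly the following (a hypothetical sketch that reuses run-one for the locking step; not the shipped code):

  #!/bin/sh
  # Sketch: kill this user's processes whose full command line
  # exactly matches the invocation, then run it under the lock.
  for pid in $(pgrep -u "$USER" -x -f "$*"); do
      kill "$pid"
  done
  exec run-one "$@"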

I rely on a handful of ssh tunnels and proxies, and I often suspend and resume my laptop many times a day, which can cause those ssh connections to go stale and hang around for a while before they time out.  For these, I want to kill any old instances of the invocation, and then start a fresh one.

I now use this snippet in a wrapper script to establish my ssh SOCKS proxy and a pair of local port-forwarding tunnels (for my squid and bip proxies):

  run-this-one ssh -N -C -D 1080 -L 3128:localhost:3128 \
    -L 7778:localhost:7778 example.com

Have you struggled with this before?  Do you have a more elegant solution?  Would you use run-one and/or run-this-one to solve a similar problem?

You can find the code in Launchpad (bzr), and packages for Lucid, Maverick, and Natty in a PPA:

  bzr branch lp:run-one
  sudo apt-add-repository ppa:run-one/ppa
  sudo apt-get update
  sudo apt-get install run-one


Cheers,
:-Dustin

11 comments:

  1. I've had some similar problems with backups and rsync. run-one, in particular, looks like it should make my backups go much more smoothly. I look forward to taking a look at the code. Thanks!

  2. Hi Dustin,

    Just what I was looking for.  I did notice it handles the locking on files in $HOME.  Would it be difficult to make it user-independent by, e.g., optionally moving the lock files to /tmp or /var/lock or something?

    Fred

  3. I have been saying I should write something similar myself for quite some time now :) Will give it a try!

  4. Very cool! My (sometimes broken) offlineimap cron job thanks you.

  5. Fred,

    This is to prevent one user from DoS'ing another.  I.e., one user could prevent another from running their backup job.

    I am thinking about adding some special handling for the root user in the run-this-one tool, though.

  6. Thanks!

    I've had trouble with offlineimap running forever if it's mid-run when I suspend my machine.  run-this-one seems like it would provide a better solution for doing mail sync for me (every 5 minutes, run a new offlineimap, killing any previous one).

    Previously, I was using a utility I made to put a hard time limit on a command, using 'at' to schedule a kill -9 a couple of minutes down the road.

    http://gitorious.org/chrisirwin-projects/chrisirwin-meta/blobs/master/standard-files/usr/bin/time-limit

  7. I've always used "lckdo" for this kind of behaviour before.  Looking forward to seeing how run-one compares.

  8. Thanks for the pointer, Adam, I was not aware of that tool.

    From the manpage, I see:
    "Now that util-linux contains a similar command named flock, lckdo is deprecated, and will be removed from some future version of moreutils."

  9. Eeek, good to know it's on the way out I guess. Will have to add that to the mental "stuff that needs changing when we dist-upgrade" list!

  10. Hey Dustin,

    I used to solve this with Tim Kay's solo: http://timkay.com/solo/

    which locks by opening a local port instead of writing a lockfile. I'd never thought of that :)

  11. Perhaps something like a RUN_ONE_LOCK_DIR environment variable could be used to tell run-one where the lockfiles are kept.


Please do not use blog comments for support requests!  Blog comments do not scale well for that purpose.

Instead, please use Launchpad for Bugs and StackExchange for Questions.
* bugs.launchpad.net
* stackexchange.com

Thanks,
:-Dustin
