The robotic precision of cron ensures that each subsequent job runs, on time, every time.
But cron doesn't check that the previous execution of that same job completed first -- and that can cause big trouble.
This often happens to me when I'm traveling and my backup cronjob fires while I'm on a slow uplink. It's bad news when an hourly rsync takes longer than an hour to run, and my system heads down a nasty spiral, soon seeing 2 or 3 or 10 rsyncs all running simultaneously. Dang.
For this reason, I found myself wrapping almost all of my cronjobs in a script that managed and respected a pid-file lock, in the style of a typical UNIX sysvinit daemon. Unfortunately, this left near-identical lock-handling code duplicated across my workstations and servers.
I'm proud to say, however, that I have now solved this problem on all of my servers, at least for myself, and perhaps for you too!
In Ubuntu 11.04 (Natty), you can now find a pair of utilities in the run-one package: run-one and run-this-one.
You can simply prepend run-one to any command (just like time or sudo). The tool computes the md5sum $HASH of the rest of the command line -- $0 and $@, the command and its arguments -- and then tries to obtain a lock on the file $HOME/.cache/$HASH using flock. If it obtains the lock, your command is executed, and the lock is released when it finishes. If not, another copy of your command is already running, and run-one quietly exits non-zero.
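That locking scheme can be sketched in a few lines of shell. This is an illustration of the mechanism described above, not the actual run-one source; the function name is my own.

```shell
#!/bin/sh
# run_one_sketch: approximate run-one's behavior with md5sum and flock(1).
run_one_sketch() {
    # Hash the full command line to get a stable, unique lock name
    hash=$(printf '%s' "$*" | md5sum | cut -d' ' -f1)
    mkdir -p "${HOME}/.cache"
    # flock -n exits non-zero immediately if the lock is already held,
    # mirroring run-one's "quietly exit non-zero" behavior; otherwise it
    # runs the command and releases the lock when the command exits
    flock -n "${HOME}/.cache/${hash}" "$@"
}
```

A second invocation of the same command line, while the first is still running, fails fast instead of piling up.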
I can now be safely assured that there will only ever be one copy of this cronjob running on my local system as $USER at a time.
If a copy is already running, subsequent calls of the same invocation will quietly exit non-zero.
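An hourly backup entry in my crontab now looks something like this (the source and destination paths here are illustrative, not my actual setup):

```
# m  h  dom mon dow  command
  0  *   *   *   *   run-one rsync -azP $HOME remote.example.com:backup/
```

If last hour's rsync is still grinding over a slow uplink, this hour's simply bows out.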
run-this-one is a slightly more forceful take on the same idea. Using pgrep, it finds any matching invocations owned by the user in the process table and kills those first, then continues just as run-one does (establishing the lock and executing your command).
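The kill-first variant can be sketched the same way. Again, this is assumed logic for illustration, not the actual run-this-one source; pkill is used here as a convenient pgrep-and-kill in one step.

```shell
#!/bin/sh
# run_this_one_sketch: kill any of this user's processes whose command
# line exactly matches, then lock and run as before.
run_this_one_sketch() {
    # -u restricts to our own processes; -f matches the full command line.
    # "|| true" because pkill returns non-zero when nothing matched.
    pkill -u "$(id -u)" -f "^$*$" 2>/dev/null || true
    hash=$(printf '%s' "$*" | md5sum | cut -d' ' -f1)
    mkdir -p "${HOME}/.cache"
    flock -n "${HOME}/.cache/${hash}" "$@"
}
```

The difference in spirit: run-one yields to the existing copy, while run-this-one evicts it.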
I rely on a handful of ssh tunnels and proxies, but I often suspend and resume my laptop many times a day, which can cause those ssh connections to go stale and hang around for a while before the connection times out. For these, I want to kill any old instances of the invocation, and then start a fresh one.
I now use run-this-one in a wrapper script to establish my ssh socks proxy and a pair of local port-forwarding tunnels for my squid and bip proxies.
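That wrapper boils down to a single command, along these lines (the hostname and port numbers are illustrative, not the actual values from my configuration):

```
# Kill any stale instance of this exact ssh invocation, then start a
# fresh one: a SOCKS proxy on 1080, plus local forwards for squid
# (3128) and bip (7778)
run-this-one ssh -N remote.example.com \
    -D 1080 \
    -L 3128:localhost:3128 \
    -L 7778:localhost:7778
```

After a suspend/resume cycle leaves a connection hung, re-running the wrapper tears down the stale ssh and brings the tunnels right back.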
Have you struggled with this before? Do you have a more elegant solution? Would you use run-one and/or run-this-one to solve a similar problem?
You can find the code in Launchpad/bzr here, and packages for Lucid, Maverick, and Natty in a PPA here.