From the Canyon Edge -- :-Dustin

Tuesday, November 16, 2010

Yet another Ubuntu Archive Proxy Solution (approx)

Many developers of Ubuntu find it useful to cache all (or at least some) of the Ubuntu Archive locally.

I certainly do.

I have maintained a full copy of the Ubuntu archive for the last ~3 years. Originally, I just used rsync and slapped logic around it to make sure it did the right thing. It did, most of the time.

Eventually, Jonathan Davies' ubumirror project/package simplified my mirror situation, and really made it easy to filter out some of the architectures I didn't need.

Still, this required about 400GB of disk space, and quite a bit of overnight bandwidth to keep it perfectly in sync.

Earlier this year, I learned about the approx package, and it has become my new favorite proxy solution. I did look at apt-cacher-ng, but the configuration was more complicated than I could figure out in 5 minutes, so if you can show me how to do exactly what I've done with approx, I'm all ears ;-) I also looked at squid-deb-proxy, but I didn't want to have to install additional packages on my clients, and I really wanted this to work well for network installations of Ubuntu servers.

Here's my solution...

To install, simply:
sudo apt-get install approx
Then set the URLs you want to proxy, in /etc/approx/approx.conf:
ubuntu http://archive.ubuntu.com/ubuntu
ubuntu-security http://security.ubuntu.com/ubuntu
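
As a rough illustration of what those two lines mean: the first word is the top-level path on the proxy, and the URL is the upstream repository that path stands for. The sketch below is not approx's own code. approx_upstream is a hypothetical helper that only mimics the name-to-URL lookup, and the maverick path is just an example request.

```shell
#!/bin/sh
# Hypothetical helper (not part of approx): given an approx.conf-style file
# and a request path as seen by the proxy, print the upstream URL that the
# matching "name URL" line would stand for.
approx_upstream() {
    conf=$1
    path=$2
    awk -v path="$path" '
        # Skip comments, $-prefixed parameter lines, and anything that is
        # not a "name URL" pair.
        /^[ \t]*#/ || /^[ \t]*\$/ || NF < 2 { next }
        {
            prefix = "/" $1 "/"
            if (index(path, prefix) == 1) {
                print $2 "/" substr(path, length(prefix) + 1)
                found = 1
                exit
            }
        }
        END { exit !found }
    ' "$conf"
}

# Example (assumes the two lines above are in /etc/approx/approx.conf):
#   approx_upstream /etc/approx/approx.conf /ubuntu/dists/maverick/Release
```

The function returns non-zero when no repository name matches the first path component, which is a quick way to spot a typo in the config.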
I configured my proxy machine to listen on port 80 (approx defaults to 9999):
sudo dpkg-reconfigure approx
Next, I took a little shortcut on my dd-wrt router's DNSMasq options, so that I don't have to configure each and every one of my guests to point to my local mirror. I want that to happen automatically and transparently to my guests. So I set my router to authoritatively serve my local proxy's IP address as the resolution for archive.ubuntu.com and security.ubuntu.com. The additional DNSMasq options for me are:
address=/archive.ubuntu.com/security.ubuntu.com/10.1.1.11
where "10.1.1.11" is my proxy's static IP address.
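
If more hostnames ever need the same treatment, the option can be generated rather than typed by hand. This is a small sketch, not something from the setup described here: dnsmasq_address_line is a hypothetical helper name, and it only prints the option line without touching the router.

```shell
#!/bin/sh
# Hypothetical helper: build a single dnsmasq "address=" option that maps
# every hostname given after the first argument to the same IP address.
dnsmasq_address_line() {
    ip=$1
    shift
    line="address="
    for host in "$@"; do
        line="$line/$host"
    done
    printf '%s/%s\n' "$line" "$ip"
}

# Reproduces the option shown above:
dnsmasq_address_line 10.1.1.11 archive.ubuntu.com security.ubuntu.com
```

The printed line can then be pasted into the router's additional DNSMasq options field.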

This ensures that all of my guests transparently use my local proxy, without having to perform custom configuration on each.

Now on the proxy itself, I don't want archive.ubuntu.com to point to the localhost, as that won't work very well at all! So for that one machine, I changed its DNS to point to Google's Public DNS at 8.8.8.8.
echo "nameserver 8.8.8.8" | sudo tee /etc/resolv.conf
Alternatively, I could manually set the IP address of archive.ubuntu.com and security.ubuntu.com in that machine's /etc/hosts.
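
A minimal sketch of that /etc/hosts alternative, under a few assumptions: dig (from the dnsutils package) is installed on the proxy, the upstream is queried through 8.8.8.8, and hosts_line is a hypothetical helper name. It reads resolver output on standard input, keeps the first IPv4 address, and prints one hosts-file entry.

```shell
#!/bin/sh
# Hypothetical helper: read resolver output (one answer per line) on stdin,
# keep the first line that looks like an IPv4 address, and print a
# hosts-file entry for the given name. Fails if no address was seen.
hosts_line() {
    name=$1
    ip=$(grep -E '^[0-9]+(\.[0-9]+){3}$' | head -n 1)
    [ -n "$ip" ] || return 1
    printf '%s\t%s\n' "$ip" "$name"
}

# Intended use on the proxy machine (needs network access and sudo):
#   for h in archive.ubuntu.com security.ubuntu.com; do
#       dig +short @8.8.8.8 "$h" A | hosts_line "$h"
#   done | sudo tee -a /etc/hosts
```

Addresses pinned this way go stale if the upstream servers ever move, so the entries would need refreshing from time to time; pointing the proxy's resolver at 8.8.8.8 avoids that.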

Moreover, if I ever need to disable the use of the caching proxy on a single guest, I can simply and temporarily change that machine's DNS to 8.8.8.8 as above.

I'm really finding this to be a handy way of speeding up my network installs and package upgrades on my set of Ubuntu machines at home. I'm not wasting nearly as much disk space or network bandwidth, and I don't have to configure anything on each and every client or installation.

And now that I no longer need a 500GB local disk, I will probably move my proxy into a virtual machine very soon.

I also added a custom byobu status script to track the size of the approx cache, as well as the number of files in the cache, ~/.byobu/bin/61_approx:
#!/bin/sh
# byobu custom status: size of the approx cache and number of cached .deb files
dir=/var/cache/approx
du=$(du -sh "$dir" | awk '{print $1}')
count=$(find "$dir" -type f -name "*.deb" | wc -l)
printf "Prox:%s,%s" "$du" "$count"

Cheers,
:-Dustin

5 comments:

  1. Boooooooooooo....

    I have been trying to convince mvo to have apt just automagically find zeroconfed squid-deb-proxy'es and I even got him talking to kees and it all seems to be totally doable. He's got it all finished, the question is if we want this by default.

    Any help in this area would be totally welcome. If apt just does this ootb then the network install works itself out.

  2. Hey Jorge,

    Well, my only problem with squid-deb-proxy is that it doesn't scale to many clients, if you have to configure every client to use it.

    If you can convince the powers that be to make this JFW, I'd happily use it.

    Until that time, I'm a pretty happy camper with the solution outlined above...

    :-Dustin

  3. The squid-deb-proxy "client" is nothing more than 5 lines of bash and one line of APT config. It's not *that* unscalable.

  4. Sorry, Ryan, I don't mean the scalability of the code.

    I mean the scalability of "me", having to go to each and every machine on my network (or when a friend comes over and uses my network) and ask them to configure something special so that they use my proxy. I want all of that to happen transparently.

    Furthermore, it must happen just as transparently when I do a network install of Ubuntu on new hardware or a virtual machine as well.

    :-Dustin

  5. I like the DNS trick; it's pretty clever.
    I maintain an rsynced mirror of Ubuntu (and much more) for my company & I'd really like to trim our mirror size by eliminating versions of packages only used by old releases. Do I need the version of glibc last used by Warty? No. I've yet to find a way to mirror only recent/supported releases. Sigh.

Please do not use blog comments for support requests! Blog comments do not scale well to this effect.

Instead, please use Launchpad for Bugs and StackExchange for Questions.
* bugs.launchpad.net
* stackexchange.com

Thanks,
:-Dustin
