From the Canyon Edge -- :-Dustin

Tuesday, January 19, 2016

Data Driven Analysis: /tmp on tmpfs

tl;dr

  • Put /tmp on tmpfs and you'll improve your Linux system's I/O, reduce your carbon footprint and electricity usage, stretch the battery life of your laptop, extend the longevity of your SSDs, and provide stronger security.
  • In fact, we should do that by default on Ubuntu servers and cloud images.
  • Of 502 physical and virtual production servers tested at Canonical, 96.6% could immediately fit all of /tmp in half of the free memory available, and 99.2% could fit all of /tmp in free memory plus free swap.

Try /tmp on tmpfs Yourself

$ echo "tmpfs /tmp tmpfs rw,nosuid,nodev" | sudo tee -a /etc/fstab
$ sudo reboot
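
After the reboot, `df -h /tmp` or `findmnt /tmp` will show whether the new entry took effect. If you'd rather script the check, here's a minimal sketch that reads /proc/mounts directly. The helper name `mount_fstype` is hypothetical, and the optional second argument exists only so it can be pointed at a test file instead of the live mount table.

```shell
# Print the filesystem type mounted at MOUNTPOINT ($1), reading /proc/mounts
# or, if given, another file in the same "device mountpoint fstype ..." format ($2).
# If something is mounted over the same path more than once, the last
# (topmost) entry wins. Returns non-zero if nothing is mounted exactly there,
# e.g. when /tmp is just a directory on the root filesystem.
mount_fstype() {
    awk -v mp="$1" '
        $2 == mp { fstype = $3 }
        END { if (fstype == "") exit 1; print fstype }
    ' "${2:-/proc/mounts}"
}

# After rebooting with the fstab line above, this should print "tmpfs":
#   mount_fstype /tmp
```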

Background

In April 2009, I proposed putting /tmp on tmpfs (an in-memory filesystem) on Ubuntu servers by default -- under certain conditions, like, well, having enough memory. The proposal was "approved", but got hung up for various reasons. Now, in 2016, I've proposed the same improvement to Ubuntu again in a bug report, and there's a lively discussion on the ubuntu-cloud and ubuntu-devel mailing lists.

The benefits of /tmp on tmpfs are:
  • Performance: reads, writes, and seeks are insanely fast in a tmpfs; as fast as accessing RAM
  • Security: data leaks to disk are prevented (especially when swap is disabled), and since /tmp is its own mount point, we should add the nosuid and nodev options (and motivated sysadmins could add noexec, if they desire).
  • Energy efficiency: disk wake-ups are avoided
  • Reliability: fewer NAND writes to SSD disks
In the interest of transparency, I'll summarize the downsides:
  • There's sometimes less space available in memory than in your root filesystem, where /tmp may traditionally reside
  • Writing to tmpfs could evict other information from memory to make space
You can learn more about Linux tmpfs here.

Not Exactly Uncharted Territory...

Fedora proposed and implemented this in Fedora 18 a few years ago, citing that Solaris has been doing this since 1994. I just installed Fedora 23 into a VM and confirmed that /tmp is a tmpfs in the default installation, and ArchLinux does the same. Debian debated doing so in this thread, which starts with all the reasons not to put /tmp on a tmpfs; do make sure you read the whole thread, though, and digest both the pros and the cons, as both are represented throughout.

Full Data Treatment

In the current thread on ubuntu-cloud and ubuntu-devel, I was asked for some "real data"...

In fact, across the many debates for and against this feature in Ubuntu, Debian, Fedora, ArchLinux, and others, there is plenty of supposition, conjecture, guesswork, and presumption.  But seeing as we're talking about data, let's look at some real data!

Here's an analysis of a (non-exhaustive) set of 502 of Canonical's production servers that run Ubuntu.com, Launchpad.net, and hundreds of related services, including OpenStack, dozens of websites, code hosting, databases, and more. The sample is slightly biased toward physical machines over virtual machines, but both are present in the survey, and a wide range of uptimes is represented, from less than a day to 1306 days (with live-patched kernels, of course).

I humbly invite further study and analysis of the raw, tab-separated data, which you can find at:
The column headers are:
  • Column 1: The host names have been anonymized to sequential index numbers
  • Column 2: `du -s /tmp` disk usage of /tmp as of 2016-01-17 (i.e., this is one snapshot in time)
  • Columns 3-8: The output of the `free` command, memory in KB for each server
  • Columns 9-11: The output of the `free` command, swap in KB for each server
  • Column 12: The number of inodes in /tmp
I have imported it into a Google Spreadsheet to do some data treatment. You're welcome to do the same, or use the spreadsheet of your choice.

For the numbers below, 1 MB = 1000 KB, and 1 GB = 1000 MB, per Wikipedia. (Let's argue MB and MiB elsewhere, shall we?)  The mean is the arithmetic average.  The median is the middle value in a sorted list of numbers.  The mode is the number that occurs most often.  If you're confused, this article might help.  All calculations are accurate to at least 2 significant digits.
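
If you'd rather skip the spreadsheet, the three statistics above are easy to compute in the shell. Here's a minimal sketch, assuming one number per line on stdin (for example, column 2 of the raw data extracted with `cut -f2`). The helper name `summarize` and the file name in the usage comment are hypothetical. For an even number of values it reports the average of the two middle values as the median, and ties for the mode go to the smallest value.

```shell
# Read one number per line on stdin and print its mean, median and mode.
summarize() {
    sort -n | awk '
        { v[NR] = $1; sum += $1; count[$1]++ }
        END {
            if (NR == 0) exit 1
            mean = sum / NR
            # Input is sorted, so the median is the middle element
            # (or the average of the two middle elements).
            if (NR % 2) median = v[(NR + 1) / 2]
            else median = (v[NR / 2] + v[NR / 2 + 1]) / 2
            # Mode: the most frequent value. Scanning in sorted order and
            # replacing only on a strictly higher count keeps the smallest
            # value when there is a tie.
            best = 0
            for (i = 1; i <= NR; i++)
                if (count[v[i]] > best) { best = count[v[i]]; mode = v[i] }
            printf "mean=%.1f median=%.1f mode=%s\n", mean, median, mode
        }'
}

# Hypothetical usage on the /tmp column of the raw data (file name assumed;
# drop a header row first with `tail -n +2` if the file has one):
#   cut -f2 tmpfs-data.tsv | summarize
```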

Statistical summary of /tmp usage:

  • Max: 101 GB
  • Min: 4.0 KB
  • Mean: 453 MB
  • Median: 16 KB
  • Mode: 4.0 KB
Looking at all 502 servers, there are two extreme outliers in terms of /tmp usage. One server has 101 GB of data in /tmp, and the other has 42 GB. The latter is a very noisy django.log. There are 4 more servers using between 10 GB and 12 GB of /tmp. The remaining 496 servers surveyed (98.8%) are using less than 4.8 GB of /tmp. In fact, 483 of the servers surveyed (96.2%) use less than 1 GB of /tmp. 454 of the servers surveyed (90.4%) use less than 100 MB of /tmp. 414 of the servers surveyed (82.5%) use less than 10 MB of /tmp. And actually, 370 of the servers surveyed (73.7%) -- the overwhelming majority -- use less than 1 MB of /tmp.
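
Each of the threshold counts above is the same calculation: how many values fall below a cutoff. Here's a minimal sketch, assuming values in KB on stdin and this post's 1 MB = 1000 KB convention, so "less than 1 MB" becomes `share_below 1000`. The helper name is hypothetical. The same function works unchanged on the memory and swap columns; an "at least X" count is simply the remainder.

```shell
# Count how many numbers on stdin are strictly below LIMIT ($1) and print
# the count, the total, and the percentage.
share_below() {
    awk -v limit="$1" '
        { n++; if ($1 + 0 < limit + 0) below++ }
        END {
            if (n == 0) exit 1
            printf "%d of %d (%.1f%%)\n", below, n, 100 * below / n
        }'
}

# Hypothetical usage (file name assumed; strip any header row first, since a
# non-numeric line would count as 0): servers using less than 1 GB of /tmp.
#   cut -f2 tmpfs-data.tsv | share_below 1000000
```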

Statistical summary of total memory available:

  • Max: 255 GB
  • Min: 1.0 GB
  • Mean: 24 GB
  • Median: 10.2 GB
  • Mode: 4.1 GB
All of the machines surveyed (100%) have at least 1 GB of RAM. 495 of the machines surveyed (98.6%) have at least 2 GB of RAM. 437 of the machines surveyed (87%) have at least 4 GB of RAM. 255 of the machines surveyed (50.8%) have at least 10 GB of RAM. 157 of the machines surveyed (31.3%) have more than 24 GB of RAM. 74 of the machines surveyed (14.7%) have at least 64 GB of RAM.

Statistical summary of total swap available:

  • Max: 201 GB
  • Min: 0.0 KB
  • Mean: 13 GB
  • Median: 6.3 GB
  • Mode: 2.96 GB
485 of the machines surveyed (96.6%) have at least some swap enabled, while 17 of the machines surveyed (3.4%) have zero swap configured. One of these swap-less machines is using 415 MB of /tmp; that machine happens to have 32 GB of RAM. All of the rest of the swap-less machines are using between 4 KB and 52 KB of /tmp (inconsequential), and have between 2 GB and 28 GB of RAM. 5 machines (1.0%) have over 100 GB of swap space.
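
The swap-less machines can be pulled straight out of the raw data. This sketch assumes the column layout listed above, with column 3 being total RAM and column 9 total swap, as the first values of `free`'s memory and swap lines; that ordering is my assumption, and the helper name `swapless` is hypothetical. Rows whose column 9 isn't a plain number, such as a header line, are skipped.

```shell
# Print index, /tmp usage (KB) and total RAM (KB) for every row of the
# tab-separated survey data whose total swap (column 9) is zero.
swapless() {
    awk -F'\t' '$9 ~ /^[0-9]+$/ && $9 + 0 == 0 { print $1 "\t" $2 "\t" $3 }' "$@"
}

# Hypothetical usage (file name assumed):
#   swapless tmpfs-data.tsv
```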

Statistical summary of swap usage:

  • Max: 19 GB
  • Min: 0.0 KB
  • Mean: 657 MB
  • Median: 18 MB
  • Mode: 0.0 KB
476 of the machines surveyed (94.8%) are using less than 4 GB of swap. 463 of the machines surveyed (92.2%) are using less than 1 GB of swap. And 366 of the machines surveyed (72.9%) are using less than 100 MB of swap. There are 18 "swappy" machines (3.6%), using 10 GB or more of swap.

Modeling /tmp on tmpfs usage

Next, I took the total memory (RAM) in each machine and divided it in half, which is the default size limit of /tmp on tmpfs, then subtracted each system's total /tmp usage, to determine whether all of that system's /tmp could actually fit into its tmpfs using free memory alone (i.e., without swap and without evicting anything from memory).

485 of the machines surveyed (96.6%) could store all of their /tmp in a tmpfs, in free memory alone -- i.e. without evicting anything from cache.

Now, if we sum each system's "Free memory" and "Free swap" and compare the result with its /tmp usage, we see that 498 of the systems surveyed (99.2%) could store the entire contents of /tmp in free memory + free swap. The remaining 4 are the extreme outliers identified earlier, with /tmp usage of [101 GB, 42 GB, 13 GB, 10 GB].
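
Both checks in this model come down to a couple of comparisons per row. Here's a minimal sketch over the raw TSV, assuming columns 3-8 follow the order of `free`'s memory line (total, used, free, shared, buffers, cached) and columns 9-11 its swap line (total, used, free). That ordering is my assumption from the column list above, and the helper name `fit_report` is hypothetical. The first check follows the half-of-total-RAM model; the second compares /tmp usage with free memory plus free swap.

```shell
# For each data row, check whether its /tmp usage (column 2, KB) fits in
# (a) half of total RAM, the default tmpfs size, and
# (b) free memory plus free swap; then print both tallies.
# Rows whose column 2 is not a plain number (e.g. a header) are skipped.
fit_report() {
    awk -F'\t' '
        $2 ~ /^[0-9]+$/ && NF >= 11 {
            n++
            tmp = $2 + 0; mem_total = $3 + 0; mem_free = $5 + 0; swap_free = $11 + 0
            if (tmp <= mem_total / 2) half_ram++
            if (tmp <= mem_free + swap_free) mem_swap++
        }
        END {
            if (n == 0) exit 1
            printf "half of RAM: %d/%d (%.1f%%)\n", half_ram, n, 100 * half_ram / n
            printf "free mem + free swap: %d/%d (%.1f%%)\n", mem_swap, n, 100 * mem_swap / n
        }' "$@"
}

# Hypothetical usage (file name assumed):
#   fit_report tmpfs-data.tsv
```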

Performance of tmpfs versus ext4-on-SSD

Finally, let's look at some raw (albeit rough) read and write performance numbers, using simple dd tests.

My /tmp is on a tmpfs:
kirkland@x250:/tmp⟫ df -h .
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           7.7G  2.6M  7.7G   1% /tmp

Let's write 2 GB of data:
kirkland@x250:/tmp⟫ dd if=/dev/zero of=/tmp/zero bs=2G count=1
0+1 records in
0+1 records out
2147479552 bytes (2.1 GB) copied, 1.56469 s, 1.4 GB/s

And let's write it completely synchronously:
kirkland@x250:/tmp⟫ dd if=/dev/zero of=./zero bs=2G count=1 oflag=dsync
0+1 records in
0+1 records out
2147479552 bytes (2.1 GB) copied, 2.47235 s, 869 MB/s

Let's try the same thing to my Intel SSD:
kirkland@x250:/local⟫ df -h .
Filesystem      Size  Used Avail Use% Mounted on
/dev/dm-0       217G  106G  100G  52% /

And write 2 GB of data:
kirkland@x250:/local⟫ dd if=/dev/zero of=./zero bs=2G count=1
0+1 records in
0+1 records out
2147479552 bytes (2.1 GB) copied, 7.52918 s, 285 MB/s

And let's redo it completely synchronously:
kirkland@x250:/local⟫ dd if=/dev/zero of=./zero bs=2G count=1 oflag=dsync
0+1 records in
0+1 records out
2147479552 bytes (2.1 GB) copied, 11.9599 s, 180 MB/s

Let's go back and read the tmpfs data:
kirkland@x250:~⟫ dd if=/tmp/zero of=/dev/null bs=2G count=1
0+1 records in
0+1 records out
2147479552 bytes (2.1 GB) copied, 1.94799 s, 1.1 GB/s

And let's read the SSD data:
kirkland@x250:~⟫ dd if=/local/zero of=/dev/null bs=2G count=1
0+1 records in
0+1 records out
2147479552 bytes (2.1 GB) copied, 2.55302 s, 841 MB/s

Now, let's create 10,000 small files (1 KB) in tmpfs:
kirkland@x250:/tmp/foo⟫ time for i in $(seq 1 10000); do dd if=/dev/zero of=$i bs=1K count=1 oflag=dsync ; done
real    0m15.518s
user    0m1.592s
sys     0m7.596s

And let's do the same on the SSD:
kirkland@x250:/local/foo⟫ time for i in $(seq 1 10000); do dd if=/dev/zero of=$i bs=1K count=1 oflag=dsync ; done
real    0m26.713s
user    0m2.928s
sys     0m7.540s
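
Dividing the wall-clock times by the 10,000 files gives a per-file cost: roughly 1.55 ms per file on tmpfs versus 2.67 ms on the SSD, so tmpfs comes out about 1.7 times faster here. Each iteration also starts a new dd process, so part of each figure is likely process start-up cost rather than filesystem time. Here's a tiny helper for that arithmetic; the name `per_file` is hypothetical.

```shell
# Given a file count ($1) and elapsed wall-clock seconds ($2), print the
# average time per file and the resulting files-per-second rate.
per_file() {
    awk -v files="$1" -v secs="$2" 'BEGIN {
        printf "%.2f ms/file, %.0f files/s\n", 1000 * secs / files, files / secs
    }'
}

# Using the "real" times reported above:
#   per_file 10000 15.518   # tmpfs -> 1.55 ms/file, 644 files/s
#   per_file 10000 26.713   # SSD   -> 2.67 ms/file, 374 files/s
```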

For better or worse, I don't have any spinning disks, so I couldn't repeat the tests there.

So in these rudimentary read/write tests via dd, I got 869 MB/s to 1.4 GB/s writing to tmpfs and 1.1 GB/s reading from tmpfs, versus 180 MB/s to 285 MB/s writing to the SSD and 841 MB/s reading from the SSD.
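
Those figures are just bytes divided by seconds, in decimal units: dd reports 1 MB/s as 10^6 bytes per second, matching the 1 MB = 1000 KB convention above. The odd byte count comes from the kernel. On systems with 4 KiB pages, Linux caps a single read() or write() at 2,147,479,552 bytes (2 GiB minus 4 KiB), so `bs=2G count=1` transfers one partial block, which is why dd prints "0+1 records". Here's a minimal sketch that reproduces the rounded rates from the transcripts. The helper name `rate` is hypothetical, and the switch between GB/s and MB/s is my approximation of dd's output formatting, not its exact algorithm.

```shell
# Convert bytes copied ($1) and elapsed seconds ($2) into a decimal-unit
# rate: GB/s with one decimal at or above 1 GB/s, otherwise whole MB/s.
rate() {
    awk -v bytes="$1" -v secs="$2" 'BEGIN {
        r = bytes / secs
        if (r >= 1e9) printf "%.1f GB/s\n", r / 1e9
        else printf "%.0f MB/s\n", r / 1e6
    }'
}

# Figures from the transcripts above:
#   rate 2147479552 1.56469   # tmpfs write        -> 1.4 GB/s
#   rate 2147479552 2.47235   # tmpfs dsync write  -> 869 MB/s
#   rate 2147479552 11.9599   # SSD dsync write    -> 180 MB/s
```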

Surely there are more scientific ways of measuring I/O to tmpfs and physical storage, but I'm confident that, by any measure, you'll find tmpfs extremely fast when tested against even the fastest disks and filesystems.

Summary

  • /tmp usage
    • 98.8% of the servers surveyed use less than 4.8 GB of /tmp
    • 96.2% use less than 1.0 GB of /tmp
    • 73.7% use less than 1.0 MB of /tmp
    • The mean/median/mode are [453 MB / 16 KB / 4 KB]
  • Total memory available
    • 98.6% of the servers surveyed have at least 2.0 GB of RAM
    • 87.1% have at least 4.0 GB of RAM
    • 57.4% have at least 8.0 GB of RAM
    • The mean/median/mode are [24 GB / 10 GB / 4 GB]
  • Swap available
    • 96.6% of the servers surveyed have some swap space available
    • The mean/median/mode are [13 GB / 6.3 GB / 3 GB]
  • Swap used
    • 94.8% of the servers surveyed are using less than 4 GB of swap
    • 92.2% are using less than 1 GB of swap
    • 72.9% are using less than 100 MB of swap
    • The mean/median/mode are [657 MB / 18 MB / 0 KB]
  • Modeling /tmp on tmpfs
    • 96.6% of the machines surveyed could store all of the data they currently have stored in /tmp, in free memory alone, without evicting anything from cache
    • 99.2% of the machines surveyed could store all of the data they currently have stored in /tmp in free memory + free swap
    • 4 of the 502 machines surveyed (0.8%) would need special handling, reconfiguration, or more swap

Conclusion


  • Can /tmp be mounted as a tmpfs always, everywhere?
    • No. We did identify a few systems (4 of the 502 surveyed, 0.8% of the total) consuming inordinately large amounts of data in /tmp (101 GB, 42 GB) with insufficient available memory and/or swap.
    • But those were very much the exception, not the rule.  In fact, 96.6% of the systems surveyed could fit all of /tmp in half of the freely available memory in the system.
  • Is this the first time anyone has suggested or tried this as a Linux/UNIX system default?
    • Not even remotely.  Solaris has used tmpfs for /tmp for 22 years, and Fedora and ArchLinux for at least the last 4 years.
  • Is tmpfs really that much faster, more efficient, more secure?
    • Damn skippy.  Try it yourself!
:-Dustin
