From the Canyon Edge: A Statistical Analysis of Potential PowerNap Energy Savings

Tuesday, August 18, 2009

A Statistical Analysis of Potential PowerNap Energy Savings

I was asked recently how much energy savings administrators might actually expect from an Ubuntu Enterprise Cloud, powered by Eucalyptus and using PowerNap.

I have blogged before about how much I enjoy mathematics and statistical analysis, and this is in fact another exciting, hard question!

The question was presented as the following hypothetical...

Given 10 quad-core servers in a cloud running 40 virtual machines. 20 virtual machines are simultaneously discarded. What power savings would you expect from PowerNap?

This is actually a rather complex combinatorics problem. I'll restate it as such:

Given 10 buckets, each of which can hold 0 - 4 items, how many ways can 20 items be distributed among those 10 buckets? Furthermore, what is the statistical distribution of empty buckets?

It's been 10 years since my university statistics classes, and my brain melted trying to derive the permutation formula. But a big thanks to my good friend Matt Dobson for solving that! If you're interested, you can view his combinatorics formula here.

It took me about an hour to hack a Python script that could empirically solve this problem by brute force, generating a comprehensive list of all of the permutations. You can now download and run /usr/bin/powernap_calculator provided by the powernap package in Karmic.

Here are the results running on the parameters above:

$ time powernap_calculator --hosts 10 --guests-per-host 4 --guests 20
Calculating...99%

In a cloud with [10] hosts, which can handle [4] guests-per-host, currently running [20] guests,
you may expect the following:

[ 5.2%] likely that [0/10] of your hosts would powernap, for a [0%] power savings
[ 27.5%] likely that [1/10] of your hosts would powernap, for a [10%] power savings
[ 42.5%] likely that [2/10] of your hosts would powernap, for a [20%] power savings
[ 21.8%] likely that [3/10] of your hosts would powernap, for a [30%] power savings
[ 2.9%] likely that [4/10] of your hosts would powernap, for a [40%] power savings
[ <1%] likely that [5/10] of your hosts would powernap, for a [50%] power savings

The overall expected value is [19.0%] power savings.

real 0m46.919s
user 0m46.227s
sys 0m0.276s

So at this snapshot in time, when your cloud suddenly dropped to 50% utilization, the expected value is a 19% power savings. Note that expected value is a very specific statistics term. Wikipedia says:

In probability theory and statistics, the expected value (or expectation value, or mathematical expectation, or mean, or first moment) of a random variable is the integral of the random variable with respect to its probability measure. For discrete random variables this is equivalent to the probability-weighted sum of the possible values, and for continuous random variables with a density function it is the probability density-weighted integral of the possible values.

Right, so we might expect about a 19% power savings for this random moment in time, when we simultaneously reduce our utilization by 50%.

However, if we observe the cloud over time, and with perhaps more realistic usage patterns, the distribution should be much better than random. VMs will come and go at more staggered intervals than simultaneous destruction of half the instances.

And you will have another tremendous factor working in your favor -- Eucalyptus' greedy scheduling algorithm. This ensures that each time a new VM is launched, it will land on a system that's already running. This is known as an annealing system -- one that's constantly correcting itself -- since under-utilized systems will automatically powernap, and new VMs will fill in the gaps on running hardware. Pretty cool, I think.

So I'm curious...

What does (or would) your cloud look like?
How many -h|--hosts, -p|--guests-per-host, and -g|--guests do you usually have?
What does powernap_calculator say about your parameters?
Post your results!

Note that the powernap_calculator algorithm is exponential, O((p+1)^h), so large values of (p, h) will take a very long time to compute! I'm totally open to code review of powernap_calculator and any performance enhancements ;-)

In my cloud, I have 8 dual-core hosts. I typically limit my usage to 2 guests-per-host. And I rarely run more than 4 VMs at a time.

$ powernap_calculator -h 8 -p 2 -g 4
In a cloud with [8] hosts, which can handle [2] guests-per-host, currently running [4] guests,
you may expect the following power savings:
[ 26.3%] likely that [4/8] of your hosts would powernap, for a [50%] power savings
[ 63.2%] likely that [5/8] of your hosts would powernap, for a [62%] power savings
[ 10.5%] likely that [6/8] of your hosts would powernap, for a [75%] power savings
The overall expected value is [60.5%] power savings.

My servers run at about 80 Watts fully loaded. My electricity is about $0.10 per kilowatt-hour. So a year's worth of electricity with all 8 servers running all of the time is 8 * .08 KW * 24 hr/day * 365 days/year * $0.10/KW-hr = $560/year.

I like the prospects of saving approximately 60.5% of that, or $339/year!

So how does the calculator work?

Let's use a slightly smaller example: hosts,h=4, guests-per-host,p=2, guests,g=3

Since each system can hold 0, 1, or 2 virtual machines, we're going to use a base-3 number (which is p+1). And we're going to generate all possible 4-digit base-3 numbers (which is (p+1)^h). In our case, that's 3⁴ or 81 scenarios to consider. For each of those scenarios, we convert the decimal integer to a list of each of the digits 0-2, and sum the list, throwing out any "invalid scenarios", where the sum does not add up to the target number of guests, 3, g. Our valid scenarios is actually much smaller, the following 16:

[2, 1, 0, 0]
[1, 2, 0, 0]
[2, 0, 1, 0]
[1, 1, 1, 0]
[0, 2, 1, 0]
[1, 0, 2, 0]
[0, 1, 2, 0]
[2, 0, 0, 1]
[1, 1, 0, 1]
[0, 2, 0, 1]
[1, 0, 1, 1]
[0, 1, 1, 1]
[0, 0, 2, 1]
[1, 0, 0, 2]
[0, 1, 0, 2]
[0, 0, 1, 2]

Now, of these, we are interested in how many 0's are in each row, as this indicates a host that has no guests, and can therefore be powernapped.

Our calculations yield: [0, 4, 12, 0, 0], or:

0 possible scenarios have no unused hosts
4 possible scenarios have 1 unused hosts
12 possible scenarios have 2 unused hosts
0 possible scenarios have 3 unused hosts
0 possible scenarios have 4 unused hosts

From this we can generate the following probabilities:

[ 25.0%] likely that [1/4] of your hosts would powernap, for a [25%] power savings
[ 75.0%] likely that [2/4] of your hosts would powernap, for a [50%] power savings

And weighting these probabilities, we can generate the expected value:

25%*.25 + 50%*.75 = 43.75%
The overall expected value is [43.8%] power savings.

Comments appreciated!

:-Dustin

3 comments:

Jef SpaletaAugust 20, 2009 at 10:58 AM
Quick once over and it looks pretty reasonable.

If you wanted to geek out, you could probably model the expected savings in a dynamic system using an exponential probability distribution function for the times between virtual machine creation and again for virtual machine termination. You could even probably run a few simulations of a dynamic cloud using those probability functions in pretty short order. This would give you an idea of how the greedy algorithm interacts with the powernap over some time window.
ReplyDelete
Replies
Dustin KirklandAugust 20, 2009 at 11:01 AM
Jef-

I've been thinking about this. What do you propose are the parameters of such a model?

Hosts, Guests-per-Host, Average-Guest-Life, "Churn", etc.?

I'm having trouble quantifying the configurables.

:-Dustin
ReplyDelete
Replies
Jef SpaletaAugust 20, 2009 at 3:45 PM
Parameters:
1) Number of hosts
2) Number of clients per host
3) Time period of observation: ex. one week,one month.
4) Probability distribution for wait time since last
client created -- first moment is average period between client births, so equivalent to an average birth rate.
5) Probability distribution for lifetime of active host. First moment is is average lifetime, sort of equivalent to death rate..but its complicated because its implicitly depends on parameter 3).

So pick sane distributions for 4 and 5. Use some intuition concerning expected work load...just make sure the limits are sane. If you have any real world logs you can use those logs to model these distributions. Don't worry too much about it for the first sims. Once these are in place, you use them to randomly pick client lifetimes and connection waits to let the number of clients dynamically grow instead of enforcing a number of clients.

Start with 10 hosts, with 4 clients per host...run multiple sims for the specified time window and build up stats on the behavior.

Increment the number of hosts and repeat. Keep doing that. Hopefully you'll see the power savings reach an obvious limiting trend after a few hosts.

Go back increment the time window and do it again. Hopefully you'll see the power savings reach an obvious limiting trend as a function of time window for long times.

Go back and choose 3 and 5 clients per host. See if the limits have changed much. Hopefully they haven't.

Go back and do and make a small tweak to the distributions and see if it perturbs the long time, many hosts limiting trends. Hopefully small changes in workload are not magnified in the power savings limits...that would indicate highly nonlinear behavior..and generalized observations would seldom be a good measure for a particular workload. I'm not expecting to see this as there's nothing obvious in setup that would drive a feedback loop.

The key is running "enough" sims at the same host number settings using those probability distributions. You get some mean and variance stats on your power savings.

-jef
ReplyDelete
Replies

Add comment

Please do not use blog comments for support requests! Blog comments do not scale well to this effect.

Instead, please use Launchpad for Bugs and StackExchange for Questions.
* bugs.launchpad.net
* stackexchange.com

Thanks,
:-Dustin

About the Author

Dustin Kirkland (Twitter, LinkedIn) is an engineer at heart, with a penchant for reducing complexity and solving problems at the cross-sections of technology, business, and people.

With a degree in computer engineering from Texas A&M University (2001), his full-time career began as a software engineer at IBM in the Linux Technology Center working on the Linux kernel and security certifications, including a one-year stint as an dedicated engineer-in-residence at Red Hat in Boston (2005). Dustin was awarded the title Master Inventor at IBM, in recognition of his prolific patent work as an inventor and reviewer with IBM's intellectual property attorneys.

Dustin then first joined Canonical (2008) as an engineer (eventually, engineering manager), helping create the Ubuntu Server distribution and establishing Ubuntu as the overwhelming favorite Linux distribution in Amazon, Google, and Microsoft's cloud platforms, as well as authoring and maintaining dozens of new open source packages.

Dustin joined Gazzang (2011), a venture-backed start-up built around an open source project that he co-authored (eCryptFS), as Chief Technology Officer, and helped dozens of enterprise customers encrypt their data at rest and securely manage their keys. Gazzang was acquired by Cloudera (2014).

Having effectively monetized eCryptFS as an open source project at Gazzang, Dustin returned to Canonical (2013) as the VP of Product for Ubuntu and spent the next several years launching a portfolio of products and services (Ubuntu Advantage, Extended Security Maintenance, Canonical Livepatch, MAAS, OpenStack, Kubernetes) that continues to deliver considerable annual recurring revenue. With Canonical based in London, an 800+ work-from-home employee roster and customers spread across 40+ countries, Dustin traveled the world over, connecting with clients and colleagues steeped in rich cultural experiences.

Google Cloud (2018) recruited Dustin from Canonical to product manage Google's entrance into on-premises data centers with its GKE On-Prem (now, Anthos) offering, with a specific focus on the underlying operating system, hypervisor, and container security. This work afforded Dustin a view deep into the back end data center of many financial services companies, where he still sees tremendous opportunities for improvements in security, efficiencies, cost-reduction, and disruptive new technology adoption.

Seeking a growth-mode opportunity in the fintech sector, Dustin joined Apex Clearing (now, Apex Fintech Solutions) as the Chief Product Officer (2019), where he led several organizations including product management, field engineering, data science, and business partnerships. He drastically revamped Apex's product portfolio and product management processes, retooling away from a legacy "clearing house and custodian", and into a "software-as-a-service fintech" offering instant brokerage account opening, real-time fractional stock trading, a secure closed-network crypto solution, and led the acquisition and integration of Silver's tax and cost basis solution.

Drawn back into a large cap, Dustin joined Goldman Sachs (2021) as a Managing Director and Head of Platform Product Management, within the Consumer banking division, which included Marcus, and the Apple and GM credit cards. He built a cross-functional product management community and established numerous documented product management best practices, processes, and anti-patterns.

Dustin lives in Austin, Texas, with his wife Kim and their wonderful two daughters.

Tuesday, August 18, 2009

A Statistical Analysis of Potential PowerNap Energy Savings

3 comments:

Printfriendly

About the Author

Blog Archive

Github Activity

Google Plus

Twitter

StackExchange

Solar Output

Labels