I gave a small talk to a couple of smart C-level people last week about my concept of Fluid Infrastructure.

Below are my slides with commentary:


Before we dive into the fun stuff, let's first deal with this big scary concept called scale. I fear it's terribly misunderstood, and that most people imagine only the hugest of tech companies would ever really need it. I think that's wrong.

For me, scale is a mindset around simple economics: matching supply to demand.

I believe that traditional infrastructure teams tend to over-provision capacity by at least an order of magnitude (yes we do, and I will come back to this). And we tend to do this for good reasons, such as redundancy and planning for future growth. However, with the availability of high-quality cloud providers (such as AWS, Azure, Google, etc.) the game is changing. If we return to our simple goal of matching supply directly to demand, we can swing our cost models back in our favour.

Let's dive into a bit more detail and see if we can unpack this a little.

Let's start with something that should look very familiar to everyone - a typical load average graph. This could be anything: CPU usage, visitors to our website, transactions per second processed through a back office - whatever makes sense.

Although they are all very different, they share a couple of similarities: lots of peaks, and lots of troughs (white space) during idle periods.

This graph presents a challenge during infrastructure planning. We need to ensure we have enough capacity to process the load, but we also want to optimize our costs during idle periods. Of course, sizing for average capacity would be bad - we would suffer poor performance during peak loads, which might affect our customers. On the flip side, sizing for our peak would give us a great customer experience at the cost of very high levels of waste.

It's clear that we need to re-think this model. And here we can quite clearly see our order of magnitude under-utilization (over-supply).
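To make that over-supply concrete, here's a toy model (the numbers are illustrative, not from the talk): provision for the peak hour of a spiky load profile, then double it for redundancy, and see how little of that capacity is ever consumed.

```python
def utilization(hourly_load, redundancy=1):
    """Fraction of provisioned capacity actually consumed, when we
    provision for the peak hour times a redundancy factor."""
    provisioned = max(hourly_load) * redundancy * len(hourly_load)
    return sum(hourly_load) / provisioned

# A spiky back-office style profile, in arbitrary capacity units:
load = [3, 3, 2, 2, 3, 4, 5, 6, 8, 10, 80, 75,
        12, 8, 6, 5, 5, 4, 4, 3, 3, 3, 3, 2]

print(f"peak-sized:         {utilization(load):.0%}")     # ~13% used
print(f"with 2x redundancy: {utilization(load, 2):.0%}")  # ~7% used
```

Under 10% utilization for a doubled-up, peak-sized deployment - there's the order of magnitude.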

The problem gets even worse when we start to model for redundancy. Even more under-utilization is locked in as we double and triple our infrastructure.

Is it fair that we should double or triple our costs in order to achieve a simple level of redundancy? That's kind of like buying 2 extra cars, just in case your first car breaks down. Makes no sense.

A typical scaling-up experience: we have an app with 2 resources allocated (green - think hosts). This seems to deal with the demanded load OK initially.

Over time the usage of our app increases, and during a heavy business period one day we find ourselves in a position where we can't handle the load (oops #1). We quickly scramble to resolve the issue by doubling our capacity, and using our human resources we are up and running within a few days/weeks.

And for a growing business we simply repeat the process - double our capacity as our demand grows.

This scenario highlights a couple of inefficiencies:

  1. A slow response to demand
  2. Doubling capacity after each incident is too coarse, effectively doubling our costs, when really all we needed was a small burst of capacity over a short period
  3. We tend to never scale back down after these events. The damage to business, plus human pain and effort involved tends to justify the new higher costs. I've never met an infrastructure team that decided to reduce capacity back down to nominal levels once they were over the high load period

Two dimensions are evident when looking at how to optimize this situation:

  1. Deployment time (x-axis) - the time taken to respond to an increase or decrease in demand, or how quickly we can react
  2. Size of divisible unit (y-axis) - the size of each incremental adjustment to capacity (typically the infrastructure resource, such as host, network, memory, etc)

Clearly, the smaller we can make these two dimensions, the quicker we can react to demand, and at a more efficient cost.
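A rough way to see why both dimensions matter is to model supply as something we can only adjust every `response_period` steps, in increments of `unit_size` (a toy model with made-up numbers; to keep it simple, each window is sized for its own peak):

```python
import math

def total_waste(demand, unit_size, response_period):
    """Capacity provisioned but never consumed, when supply can only be
    adjusted every `response_period` steps, in multiples of `unit_size`."""
    wasted = 0
    for start in range(0, len(demand), response_period):
        window = demand[start:start + response_period]
        # Round the window's peak up to the nearest whole unit of supply.
        supply = math.ceil(max(window) / unit_size) * unit_size
        wasted += sum(supply - d for d in window)
    return wasted

demand = [10, 30, 60, 90, 60, 30, 10, 10, 40, 80, 40, 10]

print(total_waste(demand, unit_size=90, response_period=12))  # 610: big units, slow response
print(total_waste(demand, unit_size=30, response_period=4))   # 490: smaller and quicker
print(total_waste(demand, unit_size=10, response_period=1))   # 0: tiny units, instant response
```

Shrinking either dimension helps; shrinking both together is what wins the game.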

Let's demonstrate this with our example:

Large machines, monthly response time.

Smaller machines (a third the size of the previous ones), with a weekly response time.

Already in this basic example we have saved approximately 28% (represented by the non-green whitespace).

Starting to look like a game of Tetris? :)

Even smaller, tiny machines (a sixth the size of the original), with a shorter (daily?) response time.

We've done even better, with 33% saved overall.

Very nice. A full third chopped off the top of our infrastructure bill.

This stuff is also very easy: with the right kind of support from the application itself (a topic for another blog post), this pattern can be achieved fairly simply using AWS Auto Scaling groups in EC2, with the right rules in place.
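As a sketch of what "the right rules" might look like, here's the shape of a target-tracking policy for an EC2 Auto Scaling group (the group name and target value are placeholders, not a recommendation):

```python
# Hypothetical group name and threshold - adjust for your own workload.
scaling_policy = {
    "AutoScalingGroupName": "my-app-asg",
    "PolicyName": "keep-cpu-near-60",
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingConfiguration": {
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        # AWS adds hosts when average CPU drifts above this value,
        # and removes them when it drifts below.
        "TargetValue": 60.0,
    },
}

# Applied with boto3, this would be roughly:
#   boto3.client("autoscaling").put_scaling_policy(**scaling_policy)
```

The point isn't the exact numbers - it's that the scale-up *and* scale-down decisions are now automatic, addressing inefficiencies 1 and 3 above.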

Although we have done well, we still haven’t achieved that order of magnitude saving that we know is possible, and this is very evident when we look at some other application usage examples.

It turns out that even when we reduce our divisible unit down to a tiny resource, most applications still spend most of the time idling.

Unsurprisingly, the applications that idle the most are the low-volume, back-office, batch-type applications that make up the bulk of most organisations' portfolios. They are the most wasteful simply because they don't have enough work to do, and therefore can't make use of their provisioned capacity.

The real challenge is to collapse this infrastructure down into a single fabric, allocating one pool of resources to many applications.

Of course, if we simply went ahead and tried to implement this, we would soon realize that application dependencies get in the way (i.e. each individual application relies on the base system to have dependencies installed, such as Java, .NET, etc, and these dependencies don't always play nicely in the same sandpit).

The good news is that encapsulating application dependencies is no longer a problem - it's easily solved using containers!

Quick elevator pitch on containers:

In the old world, managing an application meant managing the virtual machine's operating system. Virtual machines are big, heavy monsters that take a long time to provision (by humans), are hard to move around, and are generally plugged into the underlying infrastructure (well-known IP addresses, file systems, etc.).

Containers, on the other hand, stack up on top of container hosts fairly generically. They have almost no dependencies on the underlying operating system, and because they are extremely lightweight we can spin them up, shut them down and move them around our infrastructure very easily.

However - the most important aspect is that containers include applications and their dependencies, but no operating system kernel. When we move them around, they carry absolutely everything they need to launch.

A simple container architecture, lots of homogeneous (identical) hosts with various heterogeneous (varied) containers deployed above.

Take a look at the combined usage (supply/demand) of deploying all 3 applications to the new single container fabric. We have managed to reduce our total waste down to 15%, simply by allowing our low-usage apps to consume from a shared resource environment. Pretty impressive.
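The arithmetic behind this kind of saving looks something like this (made-up loads, chosen so the peaks land at different times of day):

```python
def waste_fraction(provisioned, consumed):
    """Share of provisioned capacity that sits idle."""
    return 1 - consumed / provisioned

# Three mostly-idle apps whose peaks land at different hours (illustrative):
app_a = [50, 10, 5, 5, 10, 5]
app_b = [5, 50, 10, 5, 40, 10]
app_c = [5, 5, 40, 50, 5, 45]

# Siloed: each app keeps capacity for its own peak, all day.
siloed = sum(max(app) * len(app) for app in (app_a, app_b, app_c))

# Shared fabric: one pool sized for the peak of the *combined* load.
combined = [a + b + c for a, b, c in zip(app_a, app_b, app_c)]
pooled = max(combined) * len(combined)

consumed = sum(combined)
print(f"siloed waste: {waste_fraction(siloed, consumed):.0%}")  # 61%
print(f"pooled waste: {waste_fraction(pooled, consumed):.0%}")  # 9%
```

The pool wins because the apps' peaks don't coincide - one app's spike lands in another app's trough, which is exactly the Tetris game from earlier.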

From a scalability perspective, we have an architecture that is both vertically and horizontally flexible (up & down).

Launching additional containers within the same resource group gives us our vertical scalability - it enables us to make better use of existing resources, and provides redundancy.

Launching additional hosts within the resource group gives us our horizontal scalability - and enables us to grow or shrink the TOTAL capacity of the group.
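The two axes can be sketched in code like this (a minimal toy model, with each container taking one capacity unit - not any particular scheduler's API):

```python
class ResourceGroup:
    """Toy pool: identical hosts running single-unit containers."""

    def __init__(self, hosts, host_capacity):
        self.hosts = hosts
        self.host_capacity = host_capacity
        self.containers = 0

    @property
    def total_capacity(self):
        return self.hosts * self.host_capacity

    def scale_vertically(self, n):
        """Launch containers into spare capacity within the group."""
        launched = min(n, self.total_capacity - self.containers)
        self.containers += launched
        return launched

    def scale_horizontally(self, n):
        """Add (or, with negative n, remove) hosts - changing the
        TOTAL capacity of the group."""
        self.hosts += n

group = ResourceGroup(hosts=3, host_capacity=4)   # 12 units of capacity
group.scale_vertically(10)                        # app demand rises
print(group.containers, group.total_capacity)     # 10 12
group.scale_horizontally(2)                       # pool nearly full: grow it
print(group.containers, group.total_capacity)     # 10 20
```

Containers flex the apps within the pool; hosts flex the pool itself - and both can shrink as easily as they grow.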

A couple of summary points above that I believe can help to optimize your costs - the first two fairly easy to implement, and the last two moving you closer to a fully fluid infrastructure using containers.

Would love to hear what you think - drop me a comment below.