How to use AWS Spot Fleet to lower costs and improve reliability

Introduction

A lot of companies don’t fully utilize Spot Instances because they are worried about making their service unreliable. AWS Spot Fleet is a service that lets you create a fleet of EC2 instances that can make your application both more reliable and significantly cheaper to run. A fleet is composed of Spot Instances and, optionally, On-Demand Instances. In this article, we will learn more about Spot Fleet, how it differs from requesting Spot Instances directly, and how it can be used to make your infrastructure cheaper and more reliable.

Spot Instances

Before we go deeper into Spot Fleet, it is useful to understand how Spot Instances work. AWS has three pricing tiers for EC2 instances. They are:

On-Demand Instances: This is generally the default option when you create an EC2 instance. Most of the time, when you see any EC2 pricing online, it refers to the On-Demand pricing. On-Demand instances are the most expensive because you have no commitment to AWS. You can terminate the instance at any time and only pay for how long you use the instance.
Reserved Instances (RIs): RIs are a billing construct rather than an actual EC2 instance tier. RIs can be significantly (40-60%) cheaper than On-Demand instances. In return for the discount, you have to make a long-term commitment (1 or 3 years).
Spot Instances: AWS offers its spare compute capacity (unused EC2 instances) as Spot Instances. Spot Instances are generally the cheapest of the three options (typically 50-80% lower than On-Demand, and up to about 90% in some cases). However, since Spot availability depends on demand for that capacity, Amazon provides no guarantee that once you get a Spot Instance you can run it indefinitely. This is the biggest difference between On-Demand and Spot Instances.
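To make the difference between the three tiers concrete, here is a rough back-of-the-envelope sketch. The $0.10/hour On-Demand price, the fleet size of 10, and the exact discount percentages are made-up illustration values picked from the ranges above, not real AWS prices.

```python
HOURS_PER_MONTH = 730  # average number of hours in a month


def monthly_cost(on_demand_hourly: float, discount: float, instances: int = 1) -> float:
    """Monthly cost for `instances` machines at a fractional discount off On-Demand."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return round(on_demand_hourly * (1.0 - discount) * HOURS_PER_MONTH * instances, 2)


# Hypothetical On-Demand price; look up the real price for your instance type.
PRICE = 0.10

costs = {
    "on_demand": monthly_cost(PRICE, 0.0, instances=10),
    "reserved_50pct": monthly_cost(PRICE, 0.5, instances=10),  # assumed 50% RI discount
    "spot_70pct": monthly_cost(PRICE, 0.7, instances=10),      # assumed 70% Spot discount
}

for option, cost in costs.items():
    print(f"{option}: ${cost:.2f}/month")
```

Under these assumptions, the same ten instances cost roughly $730 a month On-Demand, $365 with RIs, and $219 on Spot.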

Spot Instances: Pros and Cons

Pro: Spot Instances are cheap: Spot Instances can be up to 90% cheaper than On-Demand prices and 30-60% cheaper than Reserved Instances. This can lead to significant savings on your EC2 infrastructure expenses.

Con: Spot Instances can get terminated: AWS can terminate your Spot Instances with a 2-minute termination warning. This means that applications using Spot Instances need to be fault-tolerant.
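One way an application can react to the 2-minute warning is to poll the instance metadata service, which exposes a `spot/instance-action` document once an interruption is scheduled and returns HTTP 404 otherwise. The sketch below is a minimal illustration, not a production handler: the function names are hypothetical, and it assumes plain (IMDSv1-style) metadata requests are allowed on the instance. If your instances enforce IMDSv2, you would need to fetch a session token first.

```python
import json
import urllib.error
import urllib.request
from datetime import datetime, timezone
from typing import Optional

# Metadata path that EC2 populates once an interruption has been scheduled.
INSTANCE_ACTION_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"


def seconds_until_interruption(body: str, now: datetime) -> float:
    """Parse an instance-action document and return the seconds left before the action."""
    notice = json.loads(body)
    when = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return (when - now).total_seconds()


def fetch_interruption_notice(timeout: float = 1.0) -> Optional[str]:
    """Return the notice body, or None when no interruption is scheduled (HTTP 404)."""
    try:
        with urllib.request.urlopen(INSTANCE_ACTION_URL, timeout=timeout) as resp:
            return resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise


# Offline example using a document in the shape the metadata service returns.
sample = '{"action": "terminate", "time": "2017-09-18T08:22:00Z"}'
now = datetime(2017, 9, 18, 8, 20, 0, tzinfo=timezone.utc)
remaining = seconds_until_interruption(sample, now)
print(remaining)  # 120.0
```

On a real instance you would call `fetch_interruption_notice()` every few seconds and, when it returns a body, use the remaining time to drain connections or checkpoint work.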

A lot of teams hesitate to use Spot Instances because doing so can end up making the application less reliable.

More information about Spot Instances is available here and here.

Spot Fleet: Cheap + Reliable

One of the main differences between Spot Instances and Spot Fleet is that for a Spot Instance you request a specific instance type in a specific AZ (optionally capping the price you are willing to pay). This request gets fulfilled only if there is enough spare capacity for that exact combination. For a Spot Fleet, however, you can request a combination of different instance types spread across multiple AZs. This makes it much more likely that your request will be fulfilled.

Let’s look at how Spot Fleet leverages Spot Instances to make your service both cheaper and more reliable.

You can create a new Spot Fleet by clicking on Request Spot Instances under EC2 > Spot Requests.

Now, let’s take a look at the various options available when creating a new Spot Fleet.

First, you can choose what kind of application you are running and what the requirements are for that application. You can choose multiple instance types for the fleet and have it run in any Availability Zone. These options help make your fleet more robust and fault-tolerant. If there is a sudden surge in demand for a particular instance type and your application is agnostic to the type of instance it runs on, the fleet can fall back to a different instance type and stay highly available.

The general recommendation here is to create a Spot Fleet which supports multiple instance types in any Availability Zone.
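If you create the fleet through the API instead of the console, "multiple instance types in any Availability Zone" translates into one launch specification per (instance type, AZ) pair. The sketch below builds that list as plain dictionaries in the shape boto3's `request_spot_fleet` accepts under `LaunchSpecifications`. The helper name, AMI ID, instance types, and AZs are placeholders chosen for illustration.

```python
from itertools import product
from typing import Dict, List


def build_launch_specifications(
    ami_id: str, instance_types: List[str], availability_zones: List[str]
) -> List[Dict]:
    """Return one launch specification per (instance type, AZ) combination."""
    return [
        {
            "ImageId": ami_id,
            "InstanceType": instance_type,
            "Placement": {"AvailabilityZone": az},
        }
        for instance_type, az in product(instance_types, availability_zones)
    ]


# Placeholder values; substitute your own AMI, instance types, and AZs.
specs = build_launch_specifications(
    "ami-0123456789abcdef0",
    ["m5.large", "m5a.large", "c5.large"],
    ["us-east-1a", "us-east-1b"],
)
print(len(specs))  # 6 pools: 3 instance types x 2 AZs
```

Each combination is a separate Spot capacity pool, so three instance types in two AZs give the fleet six pools to draw from.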

In the next section of the form, you can choose your AMI, your Availability Zones, and any key pair that you might have.

Next, we need to select how much capacity we want for our Spot Fleet. This is one of the most critical configurations for the Spot Fleet so it’s important to understand what these settings mean.

Total target capacity: The total capacity refers to either the total number of instances or the total number of vCPUs you would like to provision for your fleet. You can choose instances from different instance families and sizes, and you can also specify a custom weighting (next section) that tells the fleet how much capacity each instance type counts for relative to the others.

Optional On-Demand portion: You can choose to run a subset of your Spot Fleet as On-Demand instances. This can be useful if you have already purchased RIs for some instance types and would like to use them as part of the fleet. This can also help make sure you always have a certain number of instances running at all times even if you lose all the Spot instances.

Maintain target capacity: If selected, Amazon will replace any terminated Spot Instances with other Spot Instances (possibly in a different AZ or of a different instance type).

Maintain target cost for Spot: This can be used to cap the amount spent on the fleet per hour. If selected, your fleet could be underprovisioned, because the fleet stops adding instances once the cap is reached even if the target capacity has not been met.
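For readers who prefer the API to the console, the capacity settings above map roughly onto fields of the Spot Fleet request configuration. The following is a sketch under assumptions: `build_fleet_config` and `spot_portion` are hypothetical helpers, the role ARN and launch specification are placeholders, and the cost cap is left out. The resulting dictionary is in the shape you would pass to boto3 as `ec2.request_spot_fleet(SpotFleetRequestConfig=config)`, but no AWS call is made here.

```python
from typing import Dict, List


def build_fleet_config(
    total_capacity: int,
    on_demand_capacity: int,
    maintain: bool,
    launch_specifications: List[Dict],
    fleet_role_arn: str,
) -> Dict:
    """Translate the console's capacity settings into a request configuration."""
    if not 0 <= on_demand_capacity <= total_capacity:
        raise ValueError("On-Demand portion must be between 0 and the total capacity")
    return {
        "IamFleetRole": fleet_role_arn,
        "TargetCapacity": total_capacity,              # total target capacity
        "OnDemandTargetCapacity": on_demand_capacity,  # optional On-Demand portion
        "Type": "maintain" if maintain else "request", # replace interrupted instances or not
        "LaunchSpecifications": launch_specifications,
    }


def spot_portion(config: Dict) -> int:
    """Capacity left to be filled by Spot once the On-Demand portion is set aside."""
    return config["TargetCapacity"] - config["OnDemandTargetCapacity"]


# Placeholder role ARN and launch specification, for illustration only.
config = build_fleet_config(
    total_capacity=10,
    on_demand_capacity=2,
    maintain=True,
    launch_specifications=[{"ImageId": "ami-0123456789abcdef0", "InstanceType": "m5.large"}],
    fleet_role_arn="arn:aws:iam::123456789012:role/example-spot-fleet-role",
)
print(config["Type"], spot_portion(config))  # maintain 8
```

With a total capacity of 10 and an On-Demand portion of 2, the fleet always keeps 2 units on On-Demand and tries to fill the remaining 8 with Spot.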


Next, let’s look at the Fleet settings and how to choose the optimal Fleet allocation strategy based on your requirements.

Which Instance types to use?

It is best to have a broad selection of instance types, since this makes your fleet more resilient and increases the chances that your Spot requests can be fulfilled. Since different instance types have different configurations (such as vCPUs and memory), it is important to make sure your application can run on all of the instance types you select.
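When instance types differ in size, the custom weighting mentioned earlier lets you express how much capacity each type contributes. As a rough sketch: the number of instances needed from a single type is the target capacity divided by that type's weight, rounded up so the fleet does not fall below its target. The weights below are an assumption for illustration (one unit of capacity per 2 vCPUs), and `instances_needed` is a hypothetical helper.

```python
import math


def instances_needed(target_capacity: float, weight: float) -> int:
    """Instances of one type needed to reach the target, rounding up."""
    if weight <= 0:
        raise ValueError("weight must be positive")
    return math.ceil(target_capacity / weight)


# Assumed weighting: 1 unit of capacity per 2 vCPUs.
weights = {"m5.large": 1, "m5.xlarge": 2, "m5.2xlarge": 4}

plan = {itype: instances_needed(10, weight) for itype, weight in weights.items()}
print(plan)  # {'m5.large': 10, 'm5.xlarge': 5, 'm5.2xlarge': 3}
```

Note that the largest type overshoots slightly: three `m5.2xlarge` instances provide 12 units for a target of 10.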

Fleet Allocation Strategy

The following choices are available for the Fleet Allocation Strategy:

Cost-optimization: In this allocation strategy, AWS always picks the least expensive combination of instance types and Availability Zones from the list of instance types you chose above. This strategy is a good fit if your fleet is small, needs to run only for a short time, or your application can tolerate some unavailability. You end up paying the least when using this strategy.

Diversified across instance pools: In this strategy, AWS optimizes for cost while also keeping the fleet as diversified as possible. You can choose how many different instance types you want to diversify your fleet across, and AWS will pick the cheapest combination. This strategy is a good fit when your fleet is larger or you need higher availability while still saving cost. For example, if AWS reclaims the instances in one of your pools, your fleet will keep running (at slightly lower capacity) because the rest of it sits in other pools.

Capacity optimization: In this strategy, AWS uses real-time data to launch instances from the pools that currently have the most spare capacity, which makes interruptions less likely. This strategy works well for workloads with a higher cost of interruption, such as machine learning training workflows or data processing.
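The trade-off between the three strategies can be illustrated with a deliberately simplified toy model. This is not AWS's actual placement algorithm: the pools, prices, and "spare capacity" scores below are invented, and each function only captures the rough intent of the corresponding strategy.

```python
from typing import Dict, List, NamedTuple


class Pool(NamedTuple):
    name: str     # instance type + Availability Zone
    price: float  # hypothetical Spot price per hour
    spare: int    # hypothetical score for spare capacity


def lowest_price(pools: List[Pool], target: int) -> Dict[str, int]:
    """Put the whole fleet in the cheapest pool."""
    cheapest = min(pools, key=lambda p: p.price)
    return {cheapest.name: target}


def diversified(pools: List[Pool], target: int) -> Dict[str, int]:
    """Spread the fleet evenly; any remainder goes to the cheapest pools first."""
    base, extra = divmod(target, len(pools))
    ordered = sorted(pools, key=lambda p: p.price)
    return {p.name: base + (1 if i < extra else 0) for i, p in enumerate(ordered)}


def capacity_optimized(pools: List[Pool], target: int) -> Dict[str, int]:
    """Put the whole fleet in the pool with the most spare capacity."""
    deepest = max(pools, key=lambda p: p.spare)
    return {deepest.name: target}


def surviving(allocation: Dict[str, int], lost_pool: str) -> int:
    """Instances left running if every instance in `lost_pool` is reclaimed."""
    return sum(count for name, count in allocation.items() if name != lost_pool)


pools = [
    Pool("c5.large/us-east-1a", 0.030, 20),
    Pool("m5.large/us-east-1a", 0.035, 80),
    Pool("m5.large/us-east-1b", 0.040, 50),
]

for strategy in (lowest_price, diversified, capacity_optimized):
    allocation = strategy(pools, 9)
    left = surviving(allocation, "c5.large/us-east-1a")
    print(f"{strategy.__name__}: {allocation}, {left} left if the cheapest pool is reclaimed")
```

In this toy setup, losing the cheapest pool takes out the entire lowest-price fleet, a third of the diversified fleet, and none of the capacity-optimized fleet, which is the intuition behind the recommendations above.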


If you want to learn more about Spot Fleet and the different ways to optimize it for your application, this guide is a good resource.

Conclusion

Spot Fleet makes it much easier to run an application in a highly available manner while also running it much more cheaply. It provides a better experience than using Spot Instances directly, since AWS does a lot of the heavy lifting of making sure your fleet always has capacity. Also, AWS doesn’t charge anything for using Spot Fleet. You only pay for the underlying resources.