As you build a highly scaled application, one important step in deploying it is allocating the resources necessary to operate it. Resources can be anything from compute instances to data storage. How you allocate those resources, and how you determine what the allocation should be, matters to your application. If you allocate too few resources, you can starve the application and create an availability problem. If you allocate too many, you waste money on resources lying around idle and unused.
This is the struggle with all highly scaled applications, and it is especially a problem for applications with highly spiky usage. If your application has relatively short periods of extremely high usage and significantly lower usage at other times, deciding how to allocate resources efficiently is difficult.
This is one of the key advantages of the cloud. With the cloud, you can dynamically allocate resources on an as-needed basis in order to handle these spiky needs efficiently, without leaving a significant amount of unused resources lying around during nonbusy times.
But managing cloud resources is not a simple task and takes care and consideration. Successfully managing your cloud resource allocation needs without creating waste or starvation requires knowledge of how resource allocation works in the cloud. You must understand how cloud resources are allocated, consumed, and, most importantly, charged.
Cloud resources can be divided reasonably into two categories:
· Usage-based resources
· Allocated-capacity resources
All cloud resources fall into one of these two general categories, and the process you use to manage those resources varies considerably depending on which category a resource falls into.
Let’s talk about each of these two types of resource usage categories.
Usage-Based Resource Allocation
Usage-based resources are cloud resources that are not allocated ahead of time but are consumed at whatever rate your application requires. You are charged only for the amount of the resource you consume; no allocation step is required.
You can recognize usage-based cloud resources by the following characteristics:
· There is no allocation step involved, and hence no capacity planning is required.
· If your application needs fewer resources, you use fewer resources and your cost is lower.
· If your application needs more resources, you use more resources and your cost is higher.
· Within reason, you can scale from a very tiny amount consumed to a huge amount consumed without taking any steps to scale your application or the cloud resource it is consuming.
· The phrase “within reason” is defined entirely by the cloud provider and its abilities.
· You typically have no visibility into how the resources are allocated or scaled. It is all invisible to you.
A classic example of usage-based cloud resources is Amazon S3. With S3, you are charged for the amount of data you are storing and the amount of data you transfer. You do not need to determine ahead of time how much data storage you require or how much transfer capacity you require. Whatever amount you require (within system limits) is available to you whenever you require it, and you pay only for the amount you use.
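To make this concrete, a usage-based bill is simply consumption multiplied by a unit price. The sketch below illustrates the idea; the per-GB prices are placeholder assumptions, not actual S3 rates:

```python
# Sketch of usage-based billing: you pay only for what you consume.
# Prices are illustrative placeholders, not actual S3 rates.
PRICE_PER_GB_STORED = 0.023   # $/GB-month (assumed)
PRICE_PER_GB_TRANSFER = 0.09  # $/GB transferred out (assumed)

def monthly_bill(gb_stored, gb_transferred):
    """No pre-allocation step: the bill scales directly with usage."""
    return gb_stored * PRICE_PER_GB_STORED + gb_transferred * PRICE_PER_GB_TRANSFER

quiet_month = monthly_bill(gb_stored=100, gb_transferred=50)    # light usage, small bill
busy_month = monthly_bill(gb_stored=100, gb_transferred=5000)   # heavy usage, larger bill
```

Note there is no notion of "capacity" anywhere in this model; the only inputs are what was actually consumed.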
Here are additional examples of usage-based resources:
· Azure Cloud Storage
· AWS Lambda
· Azure Functions
· Amazon Simple Email Service
These services are easy to manage and scale because no capacity planning is required. This seemingly “magic” materialization of the resources necessary for your application using a usage-based resource is one of the true benefits of the cloud. It is made possible by the multi-tenant nature of these cloud services.
Behind a service like Amazon S3 is a huge amount of disk storage and a huge number of servers. These resources are allocated as needed to individual requests from individual users. If your application has a spike in the number of requests it requires, the necessary resources are automatically allocated from a shared availability pool.
This availability pool is shared by all customers, and so it is a potentially huge pool of resources. As your application’s resource spike ebbs, another user’s application might begin to spike, and those resources are then allocated to that user’s application. This is done completely transparently.
As long as the pool of available capacity is large enough to handle all the requests and all the resource usage spikes occurring across all users, there is no starvation by any consumer. The larger the scale of the service (the more users that are using the service), the greater the ability of the cloud provider to average out the usage spikes and plan enough capacity for all the users’ needs.
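The averaging effect described above can be sketched with a toy simulation. All numbers here are hypothetical; the point is only that when spikes are independent, the shared pool needs far less capacity than the sum of every tenant's individual peak:

```python
import random

# Toy model with hypothetical numbers: many tenants with independent
# usage spikes share one pool. The capacity needed for the *sum* of all
# tenants is far less than the sum of each tenant's individual peak.
random.seed(42)
TENANTS = 1000
BASE, PEAK = 1, 100          # usage units: baseline vs. during a spike
SPIKE_PROBABILITY = 0.01     # each tenant spikes about 1% of the time

usage = [PEAK if random.random() < SPIKE_PROBABILITY else BASE
         for _ in range(TENANTS)]

sum_of_peaks = TENANTS * PEAK   # capacity if every tenant provisioned its own peak
pooled_demand = sum(usage)      # actual simultaneous demand on the shared pool
```

With only a small fraction of tenants spiking at once, `pooled_demand` comes out at a small fraction of `sum_of_peaks`, which is what lets the provider serve everyone without dedicating peak capacity to each customer.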
LARGE CONSUMERS
This model works as long as no single user consumes a significant portion of the total resources available from the cloud provider. If a single customer is large enough to represent a significant portion of the resources made available for the service by the cloud provider, that customer can experience resource starvation during peak usage and potentially affect the capacity available to other customers as well.
For services like Amazon S3, the scale of the service is so massive that no single customer represents a significant portion of usage, and the resource allocation of S3 remains magical.1
However, even Amazon S3 has its limits. If you run an application that uses significant quantities of data transferred or stored, you can run into some of the limits S3 imposes in order to keep other users from experiencing resource starvation. As such, a large consumer of S3 resources can reach these artificial limits and experience resource starvation itself. This typically happens only if you are talking about data storage and transfer in the petabyte range.
Even if you do consume S3 resources at these huge levels, there are ways you can move your usage around to reduce the impact of the limits. Additionally, you can contact Amazon and request that these limits be increased. They will increase those limits in specific areas as you require, and these limit increases are then fed into Amazon’s capacity planning process so they can ensure that there are sufficient resources available to meet your needs and everyone else’s.
Allocated-Capacity Resource Allocation
Allocated-capacity resources are cloud resources that are allocated in discrete units. You specify how much of a specific type of resource you need, and that amount is allocated for your use, independent of what your actual needs are at any given moment.
Allocated-capacity cloud resources can be recognized by the following characteristics:
· They are allocated in discrete units.
· You specify how many units you want, and they are allocated for your use.
· If your application uses less of the resource, the allocated resources remain idle and unused.
· If your application needs more of the resource, the application becomes resource starved.
· Proper capacity planning is important to avoid both over- and underallocation.
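The two failure modes in the list above reduce to a simple comparison between what you allocated and what you actually need. A minimal sketch, with hypothetical numbers:

```python
# Sketch of the two allocated-capacity failure modes: idle (wasted)
# capacity when overprovisioned, starvation when underprovisioned.
# All unit counts are hypothetical.
def allocation_outcome(allocated_units, needed_units):
    """Return (idle_units, starved_units) for a given allocation."""
    idle = max(0, allocated_units - needed_units)
    starved = max(0, needed_units - allocated_units)
    return idle, starved

overprovisioned = allocation_outcome(allocated_units=10, needed_units=4)    # paying for idle capacity
underprovisioned = allocation_outcome(allocated_units=10, needed_units=15)  # application is starved
```

Capacity planning is the exercise of choosing `allocated_units` so that both of these numbers stay close to zero.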
The classic example of allocated-capacity cloud resources is servers, such as Amazon EC2 instances. You specify how many instances you want as well as the size of the servers, and the cloud allocates them for your use. Additionally, managed infrastructure components such as cloud databases often use an allocated capacity model. In each of these cases, you specify the number of units and their size, and the cloud provider allocates the units for your use.
Here are additional examples of allocated-capacity resources:
· Amazon RDS
· Amazon Aurora
· Azure SQL
· Amazon ElastiCache
· Amazon Elasticsearch Service
· Azure Cache
But there are other examples of allocated-capacity cloud resources that operate a bit differently—for example, Amazon DynamoDB. Using this service, you specify how much capacity you want available for your DynamoDB tables.2 Capacity is measured not in servers but in throughput capacity units. You specify how much capacity you want to provide to your tables, and that much capacity is available to each table. If you don’t use that much capacity, the capacity goes unused. If your application uses more than the capacity you have allocated, your application will be resource starved until you allocate more capacity. As such, these capacity units are allocated and consumed in a manner very similar to servers, even though on the surface they look very different. Table 17-1 shows several major AWS allocated-capacity services and the units of allocation used by each.
Table 17-1. Allocated-capacity resource services’ units of allocation

| AWS service | Capacity allocation unit | Allocation attributes |
| --- | --- | --- |
| Amazon EC2 | Instance-Hours | Instance size |
| Amazon RDS | Instance-Hours | Database size |
| Amazon Aurora | Instance-Hours | Database size |
| Amazon ElastiCache | Instance-Hours | Cache size |
| Amazon DynamoDB | Throughput Capacity Units | Allocated writes |
| Amazon DynamoDB | Request Units | Utilized writes |
| Amazon DynamoDB | GB Stored^a | On-demand storage consumed |

^a Data storage for DynamoDB is a usage-based resource. It’s included in this table to illustrate that a service may use multiple types of allocation mechanisms simultaneously.
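DynamoDB’s provisioned throughput behaves roughly like a per-second budget: requests beyond the provisioned capacity are throttled. The following is a deliberately simplified model of that behavior (real DynamoDB also offers burst capacity and an on-demand mode, which this sketch omits):

```python
# Simplified model of provisioned throughput: each second, up to
# `provisioned` capacity units are available; demand beyond that is
# throttled. (Real DynamoDB also offers burst capacity and on-demand
# mode, which this sketch omits.)
def throttled_requests(provisioned, demand_per_second):
    return sum(max(0, demand - provisioned) for demand in demand_per_second)

# 100 provisioned write units, with a one-second spike to 400 requests:
throttled = throttled_requests(provisioned=100, demand_per_second=[80, 90, 400, 70])
```

In the quiet seconds the unused capacity (20 and 10 units) is simply wasted, while in the spike second 300 requests are throttled: both allocated-capacity failure modes in one trace.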
Changing Allocations
Typically, capacity is allocated in discrete steps (a server costs a certain amount per hour; DynamoDB capacity units cost a certain amount per hour). You can change the number of servers allocated to your application or the number of capacity units allocated to your DynamoDB table, but only in discrete steps (the size of your server or the size of a capacity unit). Although there can be steps of various sizes available (such as different server sizes), you must allocate a whole number of units at a time.
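Because units are discrete, you always round up: any fractional need costs you a whole additional unit. A one-line sketch, with hypothetical per-server capacity:

```python
import math

# Discrete-step allocation: you cannot allocate a fraction of a unit,
# so you round up to a whole number of units and pay for the remainder.
# The per-server capacity figure is hypothetical.
def units_needed(required_capacity, capacity_per_unit):
    return math.ceil(required_capacity / capacity_per_unit)

# Need 2,300 requests/sec; each server handles 500 requests/sec:
servers = units_needed(required_capacity=2300, capacity_per_unit=500)
```

Here 2,300 requests/sec requires five servers, not 4.6, so 200 requests/sec of the fifth server’s capacity is paid for but unused.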
It is your responsibility to ensure that you have enough capacity at hand. This might involve performing capacity planning exercises similar to those that you perform for traditional data center–based servers. You may very well allocate capacity based on expected demand and leave the number alone until you perform a review and determine that your capacity requirements have changed. This is typical of non-cloud-based server allocation but can also be used in cloud-based server allocation. However, there are other, more automated methods for changing allocation capacity.
Automated Allocation of Resource Capacity
Cloud allocation changes are easier to perform than traditional capacity changes in a data center. As such, algorithms can be used to perform your allocation automatically. For example:
On demand
You can use a static allocation and then wait until you have consumed most of your allocated capacity. At that point, you can increase your capacity allocation as needed.
Fixed schedule
You can automatically change your allocation based on a fixed schedule that matches your usage patterns. For instance, you could increase the number of servers available during heavily used daylight hours and decrease the number of servers during lesser-used nighttime hours.
Automatic (autoscaled)
You can monitor specific metrics of your resources and determine when they are heavily utilized and when they are lightly utilized. Then, based on this data, you can dynamically and automatically allocate additional resources or remove excessive resources as needed. You could build this auto scale into your application or make use of cloud-provided auto scale mechanisms, such as Amazon EC2 Auto Scaling, which automatically allocates and frees EC2 instances based on configured metrics and criteria.
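The three strategies differ only in what triggers an allocation change. A minimal sketch of the third, metric-driven strategy might look like the following; the thresholds and instance limits are hypothetical, not values from any cloud provider:

```python
# Minimal sketch of metric-driven autoscaling (thresholds and limits
# are hypothetical). Scale out when average utilization is high,
# scale in when it is low, and clamp to configured bounds.
def autoscale(current_instances, avg_utilization,
              scale_out_above=0.75, scale_in_below=0.25,
              min_instances=2, max_instances=500):
    if avg_utilization > scale_out_above:
        return min(max_instances, current_instances + 1)
    if avg_utilization < scale_in_below:
        return max(min_instances, current_instances - 1)
    return current_instances

after_spike = autoscale(current_instances=10, avg_utilization=0.90)  # scales out
after_lull = autoscale(current_instances=10, avg_utilization=0.10)   # scales in
```

Services such as Amazon EC2 Auto Scaling implement this loop for you; you supply the metrics, thresholds, and bounds.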
Whichever mechanism you choose to determine and change capacity, it is important to note that whatever capacity you currently have allocated is all that is available to you, and you could still end up with capacity allocated (and charged) to you that is not being used. Even worse, you could find yourself resource starved because you do not have enough capacity.
Issues with Automatic Allocation
Even if you use an automated allocation scheme such as Amazon EC2 Auto Scaling to give your application additional capacity when it is needed, the autoscaling algorithm may not notice the need fast enough to keep your application from becoming resource starved. This is especially problematic when your resource needs are extremely spiky. This phenomenon is called capacity allocation skew, and it can lead to resource starvation or idle, wasted resources even when using an automatically scaled (auto scaled) allocation method.
As an example, consider Amazon’s Elastic Load Balancer (ELB). This is a service that provides a load balancer to your application that automatically scales in size to handle whatever quantity of traffic has been sent to it. If you are receiving very little traffic, ELB will change the servers it is using for your load balancer to be smaller and fewer in number. If you are receiving a lot of traffic, ELB will automatically change the servers used for your load balancer to larger servers and put more of them into service. All of this is automatic and transparent to you as the application owner. This is how ELB is able to provide a load balancer at a very low entry price point, yet let the same load balancer scale to handle huge quantities of traffic (with a corresponding price increase), and all automatically. This saves you money when your traffic is light yet scales to your higher traffic needs when necessary.
However, there are situations in which this automated allocation mechanism becomes visible in a negative way. If you receive a sudden spike in traffic, say, because your site suddenly goes viral due to a social media campaign, your load balancer might not be able to resize itself fast enough. The result? For a period of time after the traffic increase starts, your load balancer might be resource starved, causing page requests to be slow or to fail, creating a poor user experience. This situation will automatically correct as ELB determines your increased capacity needs and scales your load balancer up to larger servers and more of them. This scaling, though, can take a few minutes to complete. In the meantime, your users are having a poor experience, and availability suffers.
To combat this effect, Amazon lets you contact representatives and warn them of a coming change in traffic use patterns, allowing them to prewarm your load balancer.3 This process of prewarming effectively scales your load balancer to use larger servers (and more of them) early, before the traffic spike occurs. This prewarming process, however, works only if you know you will experience a sudden rise in traffic. It doesn’t help at all if the traffic spike is sudden or unexpected.
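Capacity allocation skew can be illustrated with a toy reactive scaler that adjusts capacity based on the previous interval’s demand. All numbers are hypothetical; the point is that a sudden spike is starved until the stepwise ramp-up catches up:

```python
# Toy illustration of capacity allocation skew: a reactive autoscaler
# adjusts capacity based on the *previous* interval's observed demand,
# so a sudden spike is starved until capacity catches up.
# All numbers are hypothetical.
def simulate(demand_per_interval, step=100, initial_capacity=100):
    capacity, starved = initial_capacity, []
    for demand in demand_per_interval:
        starved.append(max(0, demand - capacity))
        # React after the fact: step capacity toward observed demand.
        if demand > capacity:
            capacity += step
        elif demand < capacity - step:
            capacity -= step
    return starved

# A viral spike from 100 to 400 requests per interval: starvation
# persists for several intervals while capacity ramps in steps of 100.
starved = simulate([100, 400, 400, 400, 400])
```

Prewarming amounts to raising `initial_capacity` before the spike arrives, which is exactly why it only helps when the spike is known in advance.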
Dynamic Allocation, Dynamic Cost
You typically can change your allocated capacity as often as you want,4 increasing and decreasing it as your needs require.
This is one of the advantages of the cloud. If you need five hundred servers one hour and only two hundred the next hour, you are charged for five hundred servers for one hour and only for two hundred servers for the next hour. It’s clean and simple.
However, because of this essentially infinite flexibility in the amount of capacity you can allocate, you typically pay a premium price for these resources. Flexibility costs money.
But what if your needs are more stable? What if you will always need at least two hundred servers allocated? Why pay for the ability to be flexible in the number of servers you need on an hour-by-hour basis when your needs are much more stable and fixed?
Reserved capacity
This is where reserved capacity comes into play. Reserved capacity is a commitment you make to your cloud provider up front that you will consume a certain quantity of resources for a period of time (such as one to three years). In exchange, you receive a favorable rate for those resources.
Reserved capacity does not limit your flexibility in allocating resources; it only guarantees to your cloud provider that you will consume a certain quantity of resources.
Suppose, for example, that you have an application that requires two hundred servers continuously, but sometimes your traffic spikes so that you need to have up to five hundred servers allocated at times. You can use auto scaling to automatically adjust the number of servers dynamically. Your usage in servers, therefore, varies from a minimum of two hundred servers to a maximum of five hundred servers.
Because you will always be using at least two hundred servers, you can purchase two hundred servers’ worth of reserved capacity. Let’s say you purchase two hundred servers for one full year. You will pay a lower rate for those two hundred servers, but you will be paying for those servers all the time. That’s fine, because you are using them all the time.
For the additional three hundred servers, you can pay the normal (higher) hourly rate, and you pay only for the time you are using those servers.
Reserved capacity provides a way for you to receive capacity at a lower cost in exchange for committed allocation of those resources.5
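The economics of the 200-to-500-server example above can be worked through with a quick calculation. The hourly rates below are illustrative placeholders, not real AWS pricing:

```python
# Cost sketch for the 200-to-500-server example. Hourly rates are
# illustrative placeholders, not real AWS pricing.
ON_DEMAND_RATE = 0.10   # $/server-hour (assumed)
RESERVED_RATE = 0.06    # $/server-hour with a 1-year commitment (assumed)
HOURS_PER_YEAR = 8760

def yearly_cost(reserved_servers, on_demand_server_hours):
    reserved = reserved_servers * RESERVED_RATE * HOURS_PER_YEAR  # paid whether used or not
    on_demand = on_demand_server_hours * ON_DEMAND_RATE           # paid only when used
    return reserved + on_demand

# 200 servers reserved all year; spikes add 300 servers for ~500 hours.
with_reservation = yearly_cost(200, 300 * 500)
all_on_demand = 200 * ON_DEMAND_RATE * HOURS_PER_YEAR + 300 * 500 * ON_DEMAND_RATE
```

Because the 200 baseline servers run continuously anyway, paying the committed rate for them is pure savings, while the spiky 300 servers stay on the flexible (higher) rate.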
Pros and Cons of Usage-Based Versus Allocated-Capacity
As outlined in Table 17-2, usage-based resource allocation methods and allocated-capacity resource allocation methods have some advantages and disadvantages.
Table 17-2. Cloud resource allocation comparison

| | Allocated-capacity | Usage-based |
| --- | --- | --- |
| Service examples | EC2, ELB, RDS, DynamoDB, Azure SQL, Azure Servers | S3, Lambda, SES, SQS, SNS, Azure Functions |
| Requires capacity planning | Yes | No |
| Charges based on | Capacity allocated | Capacity consumed |
| What happens when underutilized | Capacity is idle (wasted) | N/A |
| What happens when overutilized | Application is starved (not enough capacity, potential availability outage) | N/A |
| Can capacity be reserved to save money? | Yes | No |
| How can capacity be scaled? | Manual or automated allocation change controlled by you | N/A |
| How are usage spikes handled? | Potential starvation during spike or capacity ramp-up | Automatic and transparent |
| What happens with excess capacity? | Excess capacity goes unused | Used by other customers |
The allocated-capacity method requires forward-looking capacity planning, while the usage-based method does not. With allocated-capacity resource allocation, you are charged based on how much capacity you have requested rather than on how much you actually consume. This means you may end up with wasted capacity, or you may resource-starve your application.
1 According to the most recent published Amazon data I could find, in 2013 S3 stored two trillion objects. That’s five objects for every star in the Milky Way. (See “Amazon S3–Two Trillion Objects, 1.1 Million Requests/Second”.)
2 DynamoDB also supports an on-demand pricing model, which behaves more like a usage-based resource.
3 For more information, see “Best Practices in Evaluating Elastic Load Balancing” in the AWS ELB documentation.
4 There are sometimes restrictions, such as on DynamoDB, for which there are limitations to how often you can change capacity.
5 Using reserved capacity also guarantees that the specific type of instance will be available in your specific desired availability zone, when you want it. Without having reserved capacity, it is possible that you could request a specific type of instance in a specific availability zone, and AWS would not be able to honor the request.