In this article by Yohan Wadia, the author of the book AWS Administration – The Definitive Guide, we are going to continue where we last left off and introduce a powerful concept called Auto Scaling! AWS was one of the first public cloud providers to offer this feature, and it is something that you should really try out in your own environments. This chapter will teach you the basics of Auto Scaling, its concepts and terminology, and how to create an auto scaled environment using AWS. It will also cover Amazon Elastic Load Balancers and how you can use them in conjunction with Auto Scaling to manage your applications more effectively. So without wasting any more time, let’s get started by understanding what Auto Scaling is and how it actually works!
An overview of Auto Scaling
We have been talking about AWS and the concept of dynamic scalability, a.k.a. elasticity, throughout this book; now is the best time to look at it in depth with the help of Auto Scaling!
Auto Scaling basically enables you to scale your compute capacity (EC2 instances) either up or down, depending on the conditions you specify. These conditions could be as simple as a number that maintains the count of your EC2 instances at any given time, or even complex conditions that measure the load and performance of your instances, such as CPU utilization, memory utilization, and so on. But a simple question that may arise here is: why do I even need Auto Scaling? Is it really that important? To get a better understanding of things, let’s look at a dummy application’s load and performance graph, as shown in the following screenshot:
The graph to the left depicts the traditional approach that is usually taken to map an application’s performance requirements to a fixed infrastructure capacity. To meet this application’s unpredictable performance requirements, you would have to plan and procure additional hardware upfront, as depicted by the red line. And since there is no guaranteed way to plan for unpredictable workloads, you generally end up procuring more than you need. This is a standard approach employed by many businesses, and it comes with its own set of problems. For example, the region highlighted in red is when most of the procured hardware capacity is idle and wasted, as the application simply does not have that high a requirement. There can also be cases where the procured hardware simply does not match the application’s high performance requirements, as shown by the green region. All these issues, in turn, have an impact on your business, which frankly can prove to be quite expensive.
That’s where the elasticity of a cloud comes into play. Rather than procuring at the eleventh hour and ending up with wasted resources, you grow and shrink your resources dynamically as per your application’s requirements, as depicted in the graph on the right. This not only helps you save on overall costs but also makes managing your application a lot easier and more efficient. And don’t worry if your application does not have an unpredictable load pattern! Auto Scaling is designed to work with both predictable and unpredictable workloads, so no matter what application you have, you can rest assured that the required compute capacity will be made available when it is needed. Keeping that in mind, let us summarize some of the benefits that AWS Auto Scaling provides:
- Cost Savings: By far the biggest advantage provided by Auto Scaling is cost savings. You gain a lot of control over the deployment of your instances, as well as over costs, by launching instances only when they are needed and terminating them when they aren’t required.
- Ease of Use: AWS provides a variety of tools with which you can create and manage your Auto Scaling setup, such as the AWS CLI and the EC2 Management Dashboard. Auto Scaling can also be created and managed programmatically via a simple and easy to use web service API.
- Scheduled Scaling Actions: Apart from scaling instances as per a given policy, you can also schedule scaling actions to be executed in the future. This type of scaling comes in handy when your application’s workload patterns are predictable and well known in advance.
- Geographic Redundancy and Scalability: AWS Auto Scaling enables you to scale, distribute, as well as load balance your application automatically across multiple Availability Zones within a given region.
- Easier Maintenance and Fault Tolerance: AWS Auto Scaling replaces unhealthy instances automatically based on predefined alarms and thresholds.
With these basics in mind, let us understand how Auto Scaling actually works in AWS.
Auto scaling components
To get started with Auto Scaling on AWS, you will be required to work with three primary components, each described briefly as follows.
Auto scaling group
An Auto Scaling Group is a core component of the Auto Scaling service. It is basically a logical grouping of instances that share some common scaling characteristics between them. For example, a web application can contain a set of web server instances that can form one Auto Scaling Group and another set of application server instances that become a part of another Auto Scaling Group and so on. Each group has its own set of criteria specified that includes the minimum and maximum number of instances that the Group should have along with the desired number of instances that the group must have at all times.
Note: The desired number of instances is an optional field in an Auto Scaling Group. If the desired capacity value is not specified, then the Auto Scaling Group will use the minimum number of instances as the desired value instead.
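To make the relationship between the minimum, maximum, and desired values a little more concrete, here is a minimal Python sketch. It is only an illustration and does not call any AWS API; the GroupSize class and its field names are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GroupSize:
    """Hypothetical container for an Auto Scaling Group's size limits."""
    min_size: int
    max_size: int
    desired_capacity: Optional[int] = None  # optional, as the note above says

    def effective_desired(self) -> int:
        """Return the instance count the group would try to maintain."""
        if self.min_size < 0 or self.min_size > self.max_size:
            raise ValueError("need 0 <= min_size <= max_size")
        # Fall back to the minimum when no desired capacity is given.
        if self.desired_capacity is None:
            desired = self.min_size
        else:
            desired = self.desired_capacity
        if not self.min_size <= desired <= self.max_size:
            raise ValueError("desired capacity must lie between min and max")
        return desired


web_tier = GroupSize(min_size=2, max_size=6)                      # no desired value
app_tier = GroupSize(min_size=1, max_size=4, desired_capacity=3)  # explicit value
print(web_tier.effective_desired(), app_tier.effective_desired())  # prints: 2 3
```

Here the web tier falls back to its minimum of 2 instances, while the application tier keeps its explicitly requested 3 instances.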
Auto Scaling Groups are also responsible for performing periodic health checks on the instances contained within them. An instance with degraded health is then immediately swapped out and replaced by a new one by the Auto Scaling Group, thus ensuring that each of the instances within the Group works at optimum levels.
Launch configuration
A Launch Configuration is a set of blueprint statements that the Auto Scaling Group uses to launch instances. You can create a single Launch Configuration and use it with multiple Auto Scaling Groups; however, an Auto Scaling Group can only be associated with one Launch Configuration at a time. What does a Launch Configuration contain? Well, to start off with, it contains the AMI ID that Auto Scaling uses to launch the instances in the Auto Scaling Group. It also contains additional information about your instances, such as the instance type, the security group it has to be associated with, block device mappings, key pairs, and so on. An important thing to note here is that once you create a Launch Configuration, you cannot edit it. The only way to make changes is to create a new Launch Configuration and associate that with the Auto Scaling Group.
Scaling plans
With your Launch Configuration created, the final step is to create one or more Scaling Plans. Scaling Plans describe how the Auto Scaling Group should actually scale. There are three scaling mechanisms you can use with your Auto Scaling Groups, each described as follows:
- Manual Scaling: Manual Scaling by far is the simplest way of scaling your resources. All you need to do here is specify a new desired number of instances value or change the minimum or maximum number of instances in an Auto Scaling Group and the rest is taken care of by the Auto Scaling service itself.
- Scheduled Scaling: Scheduled Scaling is really helpful when it comes to scaling resources based on a particular time and date. This method of scaling is useful when the application’s load patterns are highly predictable, and thus you know exactly when to scale up or down. For example, an application that processes a company’s payroll cycle is usually load intensive towards the end of each month, so you can schedule the scaling requirements accordingly.
- Dynamic Scaling: Dynamic Scaling, or scaling on demand, is used when your application’s load patterns cannot be predicted in advance. With Dynamic Scaling, you generally provide a set of scaling policies using some criteria, for example, scale the instances in my Auto Scaling Group by 10 when the average CPU utilization exceeds 75 percent for a period of 5 minutes. Sounds familiar, right? Well, that’s because these dynamic scaling policies rely on Amazon CloudWatch to trigger scaling events. CloudWatch monitors the policy conditions and triggers the auto scaling events when certain thresholds are breached. In any case, you will require a minimum of two such scaling policies: one for scaling in (terminating instances) and one for scaling out (launching instances).
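The dynamic scaling example above can be sketched in a few lines of Python. This is only a simplified model of the decision logic and not how CloudWatch or Auto Scaling is actually implemented. The 75 percent scale-out threshold and the step of 10 come from the example above, whereas the 25 percent scale-in threshold, the function names, and the sample readings are assumptions made purely for illustration.

```python
from statistics import mean


def scaling_adjustment(cpu_samples, scale_out_above=75.0, scale_in_below=25.0, step=10):
    """Return the change in instance count for one evaluation period.

    cpu_samples holds CPU utilization readings in percent, for example one
    reading per minute over a 5-minute period. The scale-in threshold is an
    assumed value; only the scale-out example appears in the text.
    """
    if not cpu_samples:
        return 0  # no data, so take no action
    average = mean(cpu_samples)
    if average > scale_out_above:
        return step    # scale out: launch instances
    if average < scale_in_below:
        return -step   # scale in: terminate instances
    return 0


def apply_adjustment(current, adjustment, min_size, max_size):
    """Clamp the new capacity to the group's minimum and maximum size."""
    return max(min_size, min(max_size, current + adjustment))


busy_period = [80, 82, 78, 90, 85]  # hypothetical readings, average 83 percent
change = scaling_adjustment(busy_period)
print(change)                                               # prints: 10
print(apply_adjustment(4, change, min_size=2, max_size=12))  # prints: 12
```

Notice that the group’s maximum size still caps the result: the policy asks for 10 more instances, but the group only grows from 4 to 12.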
Before we go ahead and create our first Auto Scaling activity, we need to understand one additional AWS service that will help us balance and distribute the incoming traffic across our auto scaled EC2 instances. Enter the Elastic Load Balancer!
Introducing the Elastic Load Balancer
Elastic Load Balancer or ELB is a web service that allows you to automatically distribute incoming traffic across a fleet of EC2 instances. In simpler terms, an ELB acts as a single point of contact between your clients and the EC2 instances that are servicing them. The clients query your application via the ELB; thus, you can easily add and remove the underlying EC2 instances without having to worry about any of the traffic routing or load distributions. It is all taken care of by the ELB itself!
Coupled with Auto Scaling, ELB provides you with a highly resilient and fault tolerant environment to host your applications. While the Auto Scaling service automatically removes any unhealthy EC2 instances from its Group, the ELB automatically reroutes the traffic to some other healthy instance. Once a new healthy instance is launched by the Auto Scaling service, ELB will once again re-route the traffic through it and balance out the application load as well. But the work of the ELB doesn’t stop there! An ELB can also be used to safeguard and secure your instances by enforcing encryption and by utilizing only HTTPS and SSL connections. Keeping these points in mind, let us look at how an ELB actually works.
Well, to begin with, when you create an ELB in a particular AZ, you are actually spinning up one or more ELB nodes. Don’t worry, you cannot physically see these nodes or perform any actions on them. They are completely managed and looked after by AWS itself. These nodes are responsible for forwarding the incoming traffic to the healthy instances present in that particular AZ. Now here’s the fun part! If you configure the ELB to work across multiple AZs and one entire AZ goes down, or the instances in that particular AZ become unhealthy for some reason, then the ELB will automatically route traffic to the healthy instances present in the second AZ.
How does it do the routing? The ELB by default is provided with a Public DNS name, something similar to MyELB-123456789.region.elb.amazonaws.com. The clients send all their requests to this particular Public DNS name. The AWS DNS Servers then resolve this public DNS name to the public IP addresses of the ELB nodes. Each of the nodes has one or more Listeners configured on it, which constantly check for any incoming connections. A Listener is nothing but a process that is configured with a combination of a protocol, for example, HTTP, and a port, for example, 80. The ELB node that receives the particular request from the client then routes the traffic to a healthy instance using a particular routing algorithm. If the Listener was configured with an HTTP or HTTPS protocol, then the preferred choice of routing algorithm is the least outstanding requests routing algorithm.
Note: If you had configured your ELB with a TCP listener, then the preferred routing algorithm is Round Robin.
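If the two routing algorithms sound abstract, the following Python sketch shows the general idea behind each of them. It is a simplified illustration and not the ELB’s actual internal implementation; the instance IDs and request counts are hypothetical.

```python
from itertools import cycle, islice


def least_outstanding(outstanding):
    """Pick the instance with the fewest in-flight requests.

    outstanding maps an instance ID to its number of pending requests.
    """
    return min(outstanding, key=outstanding.get)


def round_robin(instances):
    """Return an endless iterator that rotates through the instances in order."""
    return cycle(instances)


pending = {"i-aaa": 3, "i-bbb": 1, "i-ccc": 2}  # hypothetical pending counts
print(least_outstanding(pending))                # prints: i-bbb

rotation = round_robin(["i-aaa", "i-bbb", "i-ccc"])
print(list(islice(rotation, 4)))  # prints: ['i-aaa', 'i-bbb', 'i-ccc', 'i-aaa']
```

Least outstanding requests looks at how busy each instance currently is, whereas Round Robin simply takes turns regardless of load.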
Confused? Well, don’t be, as most of these things are handled internally by the ELB itself. You don’t have to configure either the ELB nodes or the routing tables. All you need to do is set up the Listeners in your ELB and point all client requests to the ELB’s Public DNS name. That’s it! Keeping these basics in mind, let us go ahead and create our very first ELB!
Creating your first Elastic Load Balancer
Creating and setting up an ELB is a fairly easy and straightforward process provided you have planned and defined your Elastic Load Balancer’s role from the start. The current version of ELB supports HTTP, HTTPS, TCP, as well as SSL connection protocols; however, for the sake of simplicity, we will be creating a simple ELB for balancing HTTP traffic only. I’ll be using the same VPC environment that we have been developing since Chapter 5, Building your Own Private Clouds using Amazon VPC; however, you can easily substitute your own infrastructure in this place as well.
To access the ELB Dashboard, you will first have to access the EC2 Management Console. Next, from the navigation pane, select the Load Balancers option, as shown in the following screenshot. This will bring up the ELB Dashboard, which you can use to create and associate your ELBs. An important point to note here is that although ELBs are created using this particular portal, you can use them for both your EC2 and VPC environments. There is no separate portal for creating ELBs in a VPC environment.
Since this is our first ELB, let us go ahead and select the Create Load Balancer option. This will bring up a seven-step wizard that you can use to create and customize your ELBs.
Step 1 – Defining Load Balancer
To begin with, provide a suitable name for your ELB in the Load Balancer name field. In this case, I have opted to stick to my naming convention and named the ELB US-WEST-PROD-LB-01. Next up, select the VPC in which you wish to deploy your ELB. Again, I have gone ahead and selected the US-WEST-PROD-1 (192.168.0.0/16) VPC that we created in Chapter 5, Building your Own Private Clouds using Amazon VPC. You can alternatively select your own VPC environment or even select a standalone EC2 environment if it is available. Do not check the Create an internal load balancer option, as in this scenario we are creating an Internet-facing ELB for our Web Server instances.
There are two types of ELBs that you can create and use based on your requirements. The first is an Internet-facing Load Balancer, which is used to balance out client requests that are inbound from the Internet. Ideally, such Internet-facing load balancers connect to the Public Subnets of a VPC. Similarly, you also have Internal Load Balancers that connect and route traffic to your Private Subnets. You can use a combination of these depending on your application’s requirements and architecture. For example, you can have one Internet-facing ELB as your application’s main entry point and an internal ELB to route traffic between your Public and Private Subnets; however, for simplicity, let us create an Internet-facing ELB for now.
With these basic settings done, we now provide our ELB’s Listeners. A Listener is made up of two parts: a protocol and port number for your frontend connection (between your Client and the ELB), and a protocol and a port number for a backend connection (between the ELB and the EC2 instances).
In the Listener Configuration section, select HTTP from the Load Balancer Protocol dropdown list and provide the port number 80 in the Load Balancer Port field, as shown in the following screenshot. Provide the same protocol and port number for the Instance Protocol and Instance Port fields as well.
What does this mean? Well, this listener is now configured to listen on the ELB’s external port (Load Balancer Port) 80 for any client requests. Once it receives a request, it will then forward it to the underlying EC2 instances using the Instance Port, which in this case is port 80 as well. There is no rule of thumb that both the port values must match; in fact, it is actually a good practice to keep them different. Although your ELB can listen on port 80 for any client requests, it can use any port within the range of 1-65,535 for forwarding the request to the instances. You can optionally add additional listeners to your ELB, such as a listener for the HTTPS protocol running on port 443; however, that is something that I will leave for you to do later.
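To picture the four values that make up a listener, here is a small Python sketch that models one as a plain data structure. The Listener class and its field names are hypothetical and are not part of any AWS SDK. As a simplification, the sketch applies the 1-65,535 range check to both ports.

```python
from dataclasses import dataclass

# Protocols named in this chapter as supported by the ELB.
SUPPORTED_PROTOCOLS = {"HTTP", "HTTPS", "TCP", "SSL"}


@dataclass(frozen=True)
class Listener:
    """Hypothetical model of a listener: frontend pair plus backend pair."""
    lb_protocol: str        # frontend: client to ELB
    lb_port: int
    instance_protocol: str  # backend: ELB to EC2 instance
    instance_port: int

    def __post_init__(self):
        for protocol in (self.lb_protocol, self.instance_protocol):
            if protocol not in SUPPORTED_PROTOCOLS:
                raise ValueError(f"unsupported protocol: {protocol}")
        for port in (self.lb_port, self.instance_port):
            if not 1 <= port <= 65535:
                raise ValueError(f"port out of range: {port}")


same_ports = Listener("HTTP", 80, "HTTP", 80)         # what we set up in the wizard
different_ports = Listener("HTTP", 80, "HTTP", 8080)  # ports do not have to match
print(different_ports.lb_port, different_ports.instance_port)  # prints: 80 8080
```

The second listener accepts client requests on port 80 but forwards them to the instances on port 8080, which is an arbitrary port picked for this example.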
The final configuration item left in step 1 is where you select the Subnets to be associated with your new Load Balancer. In my case, I have gone ahead and created a set of subnets in each of two different AZs so as to mimic a high availability scenario.
Select the required Subnets and add them to your ELB by selecting the adjoining + sign. In my case, I have selected two Subnets, both hosting the web server instances but each present in a different AZ.
Note: You can select a single Subnet as well; however, it is highly recommended that you go for a highly available architecture, as described earlier.
Once your subnets are added, click on Next: Assign Security Groups to continue over to step 2.
Step 2 – Assign Security Groups
Step 2 is where we get to assign our ELB a Security Group. Now here’s a catch: you will not be prompted for a Security Group if you are using an EC2-Classic environment for your ELB. This Security Group is only necessary for VPC environments and will basically allow the port you designated for inbound traffic to pass through.
In this case, I have created a new dedicated Security Group for the ELB. Provide a suitable Security group name as well as Description, as shown in the preceding screenshot. The new security group already contains a rule that allows traffic to the port that you configured your Load Balancer to use; in my case, it’s port 80. Leave the rule at its default value and click on Next: Configure Security Settings to continue.
Step 3 – Configure Security Settings
This is an optional page that basically allows you to secure your ELB by using either the HTTPS or the SSL protocol for your frontend connection. But since we have opted for a simple HTTP-based ELB, we can ignore this page for now. Click on Next: Configure Health Check to proceed on to the next step.
Step 4 – Configure Health Check
Health Checks are a very important part of an ELB’s configuration, and hence you have to be extra cautious when setting them up. What are Health Checks? To put it in simple terms, these are basic tests that the ELB conducts to ensure that your underlying EC2 instances are healthy and running optimally. These tests include simple pings, attempted connections, or even sending some requests. If the ELB senses that any of the EC2 instances is in an unhealthy state, it immediately changes that instance’s Health Check Status to OutOfService. Once the instance is marked as OutOfService, the ELB no longer routes any traffic to it. The ELB will start sending traffic back to the instance only if its Health Check Status changes to InService again.
To configure the Health Checks for your ELB, fill in the following information as described here:
- Ping Protocol: This field indicates which protocol the ELB should use to connect to your EC2 instances. You can use the TCP, HTTP, HTTPS, or the SSL options; however, for simplicity, I have selected the HTTP protocol here.
- Ping Port: This field is used to indicate the port which the ELB should use to connect to the instance. You can supply any port value from the range 1 to 65,535; however, since we are using the HTTP protocol, I have opted to stick with the default value of port 80. This port value is really essential as the ELB will periodically ping the EC2 instances on this port number. If any instance does not reply back in a timely fashion, then that instance will be deemed unhealthy by the ELB.
- Ping Path: This value is usually used for the HTTP and HTTPS protocols. The ELB sends a simple GET request to the EC2 instances based on the Ping Port and Ping Path. If the ELB receives a response other than an “OK,” then that particular instance is deemed to be unhealthy by the ELB and it will no longer route any traffic to it. Ping Paths are generally set to a forward slash “/”, which indicates the default home page of a web server. However, you can also use a /index.html or a /default.html value as you see fit. In my case, I have provided the /index.php value as my dummy web application is actually a PHP app.
Besides the Ping checks, there are also a few advanced configuration details that you can configure based on your application’s health check needs:
- Response Time: The Response Time is the time the ELB has to wait in order to receive a response. The default value is 5 seconds with a max value up to 60 seconds. Let’s take a look at the following screenshot:
- Health Check Interval: This field indicates the amount of time (in seconds) the ELB waits between health checks of an individual EC2 instance. The default value is 30 seconds; however, you can specify a max value of 300 seconds as well.
- Unhealthy Threshold: This field indicates the number of consecutive failed health checks an ELB must wait before declaring an instance unhealthy. The default value is 2 with a max threshold value of 10.
- Healthy Threshold: This field indicates the number of consecutive successful health checks an ELB must wait before declaring an instance healthy. The default value is 2 with a max threshold value of 10.
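To see how the Unhealthy Threshold and Healthy Threshold values interact, consider the following Python sketch. It only models the counting logic described above under the default thresholds of 2 and does not perform any real health checks; the HealthTracker class is a hypothetical name used for this illustration.

```python
class HealthTracker:
    """Track one instance's state from consecutive health check results."""

    def __init__(self, unhealthy_threshold=2, healthy_threshold=2, state="InService"):
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy_threshold = healthy_threshold
        self.state = state
        self._streak = 0  # consecutive results that contradict the current state

    def record(self, check_passed):
        """Feed in one health check result and return the resulting state."""
        if self.state == "InService":
            # Any successful check resets the count of consecutive failures.
            self._streak = 0 if check_passed else self._streak + 1
            if self._streak >= self.unhealthy_threshold:
                self.state, self._streak = "OutOfService", 0
        else:
            # Any failed check resets the count of consecutive successes.
            self._streak = self._streak + 1 if check_passed else 0
            if self._streak >= self.healthy_threshold:
                self.state, self._streak = "InService", 0
        return self.state


tracker = HealthTracker()
results = [True, False, False, True, True]  # hypothetical check outcomes
print([tracker.record(passed) for passed in results])
# prints: ['InService', 'InService', 'OutOfService', 'OutOfService', 'InService']
```

A single failed check is not enough to take the instance out of service; it takes two failures in a row, and likewise two successes in a row to bring it back.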
Once you have provided your values, go ahead and select the Next: Add EC2 Instances option.
Step 5 – Add EC2 Instances
In this section of the Wizard, you can select any running instance from your Subnets to be added and registered with the ELB. But since we are setting this particular ELB for use with Auto Scaling, we will leave this section for now. Click on Next: Add Tags to proceed with the wizard.
Step 6 – Add Tags
We already know the importance of tagging our AWS resources, so go ahead and provide a suitable tag for categorizing and identifying your ELB. Note that you can always add/edit and remove tags at a later time as well using the ELB Dashboard. With the Tags all set up, click on Review and Create.
Step 7 – Review and Create
The final step of our ELB creation wizard is where we simply review our ELB’s settings, including the Health Checks, EC2 instances, Tags, and so on. Once reviewed, click on Create to begin your ELB’s creation and configuration.
The ELB takes a few seconds to get created, but once it’s ready, you can view and manage it just like any other AWS resource using the ELB Dashboard, as shown in the following screenshot:
Select the newly created ELB and view its details in the Description tab. Make a note of the ELB’s public DNS Name as well. You can optionally view the Status as well as the ELB Scheme (whether Internet-facing or internal) using the Description tab. You can also view the ELB’s Health Checks as well as the Listeners configured with your ELB.
Before we proceed with the next section of this chapter, here are a few important pointers to keep in mind when working with ELB. Firstly, the configurations that we performed on our ELB are all very basic and will help you get through the basics; however, ELB also provides us with additional advanced configuration options such as Cross-Zone Load Balancing, Proxy Protocols, Sticky Sessions, and so on, which can all be configured using the ELB Dashboard. To know more about these advanced settings, refer to http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/elb-configure-load-balancer.html. The second important thing worth mentioning is the ELB’s cost. Although it is free (Terms and Conditions apply) to use under the Free Tier eligibility, ELBs are otherwise charged at approximately $0.025 per hour used. There is a nominal data processing charge as well, which is approximately $0.008 per GB of data processed.
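For a rough idea of what these charges add up to, here is a small back-of-the-envelope calculation in Python. It uses only the approximate rates quoted above and ignores the Free Tier; the 720 hours and 100 GB figures are hypothetical usage values, and actual pricing varies, so treat the result as an estimate only.

```python
HOURLY_RATE_USD = 0.025  # approximate per-hour rate quoted above
PER_GB_RATE_USD = 0.008  # approximate per-GB rate quoted above


def estimate_elb_cost(hours, gb_processed):
    """Estimate the ELB charge in USD for the given usage."""
    return hours * HOURLY_RATE_USD + gb_processed * PER_GB_RATE_USD


# Hypothetical usage: one ELB running for a 30-day month, processing 100 GB.
monthly = estimate_elb_cost(hours=720, gb_processed=100)
print(f"{monthly:.2f}")  # prints: 18.80
```

In other words, the hourly charge (about $18) dominates the bill for a lightly loaded ELB, while the data charge adds less than a dollar.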
I really hope that you have learned as much as possible about Amazon ELB. We talked about the importance of Auto Scaling and how it proves to be super beneficial when compared to the traditional mode of scaling infrastructure. We then learned a bit about AWS Auto Scaling and its core components. Next, we learned about a new service offering called Elastic Load Balancers and saw how easy it is to deploy one for our own use.