
In this article by Aurobindo Sarkar and Amit Shah, authors of the book Learning AWS, we will introduce some key design principles and approaches to achieving high availability in your applications deployed on the AWS cloud. As a good practice, you want to ensure that your mission-critical applications are always available to serve your customers. The approaches in this article address availability across the layers of your application architecture, including the availability aspects of key infrastructure components, ensuring there are no single points of failure. To address availability requirements, we will use the AWS infrastructure (Availability Zones and Regions), AWS Foundation Services (EC2 instances, storage, security and access control, networking), and AWS PaaS services (DynamoDB, RDS, CloudFormation, and so on).


Defining availability objectives

Achieving high availability can be costly. Therefore, it is important to ensure that you align your application availability requirements with your business objectives. There are several options to achieve the level of availability that is right for your application. Hence, it is essential to start with a clearly defined set of availability objectives and then make the most prudent design choices to achieve those objectives at a reasonable cost. Typically, not all systems and services need to achieve the highest possible levels of availability; at the same time, ensure you do not introduce a single point of failure into your architecture through dependencies between your components. For example, a mobile taxi ordering service needs its ordering-related services to be highly available; however, a specific customer's travel history need not be available to the same degree.

The best way to approach high availability design is to assume that anything can fail, at any time, and then consciously design against it.

“Everything fails, all the time.”

– Werner Vogels, CTO, Amazon.com

In other words, think in terms of availability for each and every component in your application and its environment, because any given component can turn into a single point of failure for your entire application. Availability is something you should consider early in your application design process, as it can be hard to retrofit later. Key among these considerations are your database and application architecture (for example, a RESTful architecture). In addition, it is important to understand that availability objectives influence how you design, develop, test, and run your system on the cloud.

Finally, ensure you proactively test all your design assumptions and reduce uncertainty by injecting or forcing failures instead of waiting for random failures to occur.

The nature of failures

There are many types of failures that can happen at any time. These could be a result of disk failures, power outages, natural disasters, software errors, and human errors. In addition, there are several points of failure in any given cloud application. These would include DNS or domain services, load balancers, web and application servers, database servers, application services-related failures, and data center-related failures. You will need to ensure you have a mitigation strategy for each of these types and points of failure. It is highly recommended that you automate and implement detailed audit trails for your recovery strategy, and thoroughly test as many of these processes as possible.

In the next few sections, we will discuss various strategies to achieve high availability for your application. Specifically, we will discuss the use of AWS features and services such as:

  • VPC
  • Amazon Route 53
  • Elastic Load Balancing, auto-scaling
  • Redundancy
  • Multi-AZ and multi-region deployments

Setting up VPC for high availability

Before setting up your VPC, you will need to carefully select your primary site and a disaster recovery (DR) site. Leverage AWS's global presence to select the best regions and availability zones to match your business objectives. The choice of a primary site is usually the region closest to the majority of your customers, and the DR site could be in the next closest region or in a different country, depending on your specific requirements. Next, we need to set up the network topology, which essentially includes setting up the VPC and the appropriate subnets. The public-facing servers are configured in a public subnet, whereas the database servers and other application servers hosting services such as directory services will usually reside in private subnets.

Ensure you choose different sets of IP addresses across the different regions for a multi-region deployment, for example, 10.0.0.0/16 for the primary region and 192.168.0.0/16 for the secondary region, to avoid any IP addressing conflicts when these regions are connected via a VPN tunnel. Appropriate routing tables and ACLs will also need to be defined to ensure traffic can traverse between them. Cross-VPC connectivity is required so that data transfer can happen between the VPCs (say, from the private subnets in one region over to the other region). The secure VPN tunnels are basically IPSec tunnels powered by VPN appliances; a primary and a secondary tunnel should be defined (in case the primary IPSec tunnel fails). It is imperative you consult with your network specialists through all of these tasks.
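
As an illustration, the following boto3 sketch creates non-overlapping VPCs in two regions. The CIDR blocks match the example above, while the region names, subnet ranges, and availability zone names are hypothetical placeholders.

import boto3

# Primary region VPC (10.0.0.0/16) with a public and a private subnet.
ec2_primary = boto3.client('ec2', region_name='us-east-1')
vpc1 = ec2_primary.create_vpc(CidrBlock='10.0.0.0/16')
ec2_primary.create_subnet(VpcId=vpc1['Vpc']['VpcId'], CidrBlock='10.0.0.0/24',
                          AvailabilityZone='us-east-1a')   # public subnet
ec2_primary.create_subnet(VpcId=vpc1['Vpc']['VpcId'], CidrBlock='10.0.1.0/24',
                          AvailabilityZone='us-east-1b')   # private subnet

# Secondary (DR) region VPC (192.168.0.0/16), a non-overlapping range so the
# two VPCs can later be connected over a VPN tunnel without addressing conflicts.
ec2_secondary = boto3.client('ec2', region_name='us-west-2')
vpc2 = ec2_secondary.create_vpc(CidrBlock='192.168.0.0/16')
ec2_secondary.create_subnet(VpcId=vpc2['Vpc']['VpcId'], CidrBlock='192.168.0.0/24',
                            AvailabilityZone='us-west-2a')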

An ELB is configured in the primary region to route traffic across multiple availability zones; however, you need not necessarily commission the ELB for your secondary site at this time. This will help you avoid costs for the ELB in your DR or secondary site. However, always weigh these costs against the total cost/time required for recovery. It might be worthwhile to just commission the extra ELB and keep it running.

Gateway servers and NAT will need to be configured as they act as gatekeepers for all inbound and outbound Internet access. Gateway servers are defined in the public subnet with appropriate licenses and keys to access your servers in the private subnet for server administration purposes. NAT is required for servers located in the private subnet to access the Internet and is typically used for automatic patch updates. Again, consult your network specialists for these tasks.
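
One way to provide the NAT piece is a managed NAT gateway (rather than a NAT instance). The following is a minimal boto3 sketch; the subnet and route table IDs are hypothetical placeholders.

import boto3

ec2 = boto3.client('ec2', region_name='us-east-1')

# Allocate an Elastic IP and create the NAT gateway in a public subnet.
eip = ec2.allocate_address(Domain='vpc')
nat = ec2.create_nat_gateway(SubnetId='subnet-11111111', AllocationId=eip['AllocationId'])
ec2.get_waiter('nat_gateway_available').wait(
    NatGatewayIds=[nat['NatGateway']['NatGatewayId']])

# Route outbound Internet traffic from the private subnet through the NAT gateway,
# for example for automatic patch updates.
ec2.create_route(RouteTableId='rtb-22222222',
                 DestinationCidrBlock='0.0.0.0/0',
                 NatGatewayId=nat['NatGateway']['NatGatewayId'])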

Elastic load balancing and Amazon Route 53 are critical infrastructure components for scalable and highly available applications; we discuss these services in the next section.

Using ELB and Route 53 for high availability

In this section, we describe different levels of availability and the role ELBs and Route 53 play from an availability perspective.

Instance availability

The simplest guideline here is to never run a single instance in a production environment. The easiest way to improve on a single-server setup is to spin up multiple EC2 instances and place an ELB in front of them. The incoming request load is then shared by all the instances behind the load balancer.
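
As a minimal sketch of this setup using boto3 and a Classic ELB (the subnet, security group, and instance IDs are hypothetical):

import boto3

elb = boto3.client('elb', region_name='us-east-1')

# Create a load balancer spanning subnets in two availability zones.
elb.create_load_balancer(
    LoadBalancerName='web-elb',
    Listeners=[{'Protocol': 'HTTP', 'LoadBalancerPort': 80,
                'InstanceProtocol': 'HTTP', 'InstancePort': 80}],
    Subnets=['subnet-11111111', 'subnet-22222222'],
    SecurityGroups=['sg-12345678'],
)

# Register the EC2 instances that will share the incoming request load.
elb.register_instances_with_load_balancer(
    LoadBalancerName='web-elb',
    Instances=[{'InstanceId': 'i-0aaa1111'}, {'InstanceId': 'i-0bbb2222'}],
)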

ELB uses the least outstanding requests routing algorithm to spread HTTP/HTTPS requests across healthy instances. This algorithm favors instances with the fewest outstanding requests.

Even though it is not recommended to have different instance sizes between or within AZs, the ELB will adjust the number of requests it sends to smaller or larger instances based on their response times. In addition, ELBs use cross-zone load balancing to distribute traffic across all healthy instances regardless of AZ. Hence, ELBs help balance the request load even if there are unequal numbers of instances in different AZs at any given time (perhaps due to a failed instance in one of the AZs).

There is no bandwidth charge for cross-zone traffic (if you are using an ELB).
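
Cross-zone load balancing can be enabled explicitly on a Classic ELB; a minimal sketch, continuing the hypothetical load balancer name used earlier:

import boto3

elb = boto3.client('elb', region_name='us-east-1')

# Distribute requests across all healthy instances regardless of AZ.
elb.modify_load_balancer_attributes(
    LoadBalancerName='web-elb',
    LoadBalancerAttributes={'CrossZoneLoadBalancing': {'Enabled': True}},
)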

Instances that fail can be seamlessly replaced using auto scaling while other instances continue to operate.

Though auto-replacement of instances works really well, storing application state or caching data locally on your instances can make such failures harder to detect and recover from.

When an instance failure is detected, traffic is shifted to the healthy instances, which then carry the additional load. Health checks are used to determine the health of the instances and the application. TCP and/or HTTP-based heartbeats can be created for this purpose. It is worthwhile implementing health checks iteratively to arrive at the right set that meets your goals. In addition, you can customize the frequency and failure thresholds as well. Finally, if all your instances are down, then the ELB returns a 503 error.
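
A Classic ELB health check might be configured as follows (a sketch; the /health path and the thresholds are illustrative and should be tuned iteratively as described above):

import boto3

elb = boto3.client('elb', region_name='us-east-1')
elb.configure_health_check(
    LoadBalancerName='web-elb',
    HealthCheck={
        'Target': 'HTTP:80/health',   # endpoint the ELB probes on each instance
        'Interval': 30,               # seconds between checks
        'Timeout': 5,                 # seconds to wait for a response
        'UnhealthyThreshold': 2,      # consecutive failures before marking unhealthy
        'HealthyThreshold': 3,        # consecutive successes before marking healthy
    },
)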

Zonal availability or availability zone redundancy

Availability zones are distinct geographical locations engineered to be insulated from failures in other zones. It is critically important to run your application stack in more than one zone to achieve high availability. However, be mindful of component-level dependencies across zones and of cross-zone service calls, which can introduce substantial latencies into your application or cause application failures when an availability zone fails. For sites with very high request loads, a 3-zone configuration might be preferred to handle zone-level failures. In this situation, if one zone goes down, the other two AZs can ensure continuing high availability and a better customer experience.

In the event of a zone failure, there are several challenges in a Multi-AZ configuration, resulting from the rapidly shifting traffic to the other AZs. In such situations, the load balancers need to expire connections quickly and lingering connections to caches must be addressed. In addition, careful configuration is required for smooth failover by ensuring all clusters are appropriately auto scaled, avoiding cross-zone calls in your services, and avoiding mismatched timeouts across your architecture.

ELBs can be used to balance traffic across multiple availability zones. Each load balancer is assigned one or more DNS records; a DNS record contains multiple IP addresses, and DNS round-robin can be used to balance traffic between the availability zones. You can expect the DNS records to change over time. Using multiple AZs can result in traffic imbalances between AZs due to clients caching DNS records; however, ELBs can help reduce the impact of this caching.

Regional availability or regional redundancy

ELB and Amazon Route 53 have been integrated to support a single application across multiple regions. Route 53 is AWS's highly available and scalable DNS and health-checking service. Route 53 supports high availability architectures by health checking load balancer nodes and rerouting traffic away from failed nodes, and by enabling the implementation of multi-region architectures. In addition, Route 53 uses Latency Based Routing (LBR) to route your customers to the endpoint with the lowest latency. If multiple primary sites are implemented, with appropriate health checks configured, then in case of a failure, traffic shifts away from that site to the next closest region.

Region failures can present several challenges as a result of rapidly shifting traffic (similar to the case of zone failures). These can include auto scaling, the time required for instance startup, and the cache fill time (as we might need to fall back to our data sources initially). Another difficulty usually arises from a lack of clarity on what constitutes the minimal or critical stack required to keep the site functioning as normally as possible; without this clarity, any or all services tend to be treated as critical in these circumstances.

The health checks are essentially automated requests sent over the Internet to your application to verify that your application is reachable, available, and functional. This can include both your EC2 instances and your application. As answers are returned only for the resources that are healthy and reachable from the outside world, the end users can be routed away from a failed application. Amazon Route 53 health checks are conducted from within each AWS region to check whether your application is reachable from that location.

The DNS failover is designed to be entirely automatic. After you have set up your DNS records and health checks, no manual intervention is required for failover. Ensure you create appropriate alerts to be notified when this happens. Typically, it takes about 2 to 3 minutes from the time of the failure to the point where traffic is routed to an alternate location. Compare this to the traditional process where an operator receives an alarm, manually configures the DNS update, and waits for the DNS changes to propagate.

The failover happens entirely within the Amazon Route 53 data plane.
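
A Route 53 failover configuration along these lines can be sketched with boto3; the hosted zone ID, domain names, and ELB DNS names below are hypothetical.

import uuid
import boto3

r53 = boto3.client('route53')

# Health check against the primary site's endpoint.
hc = r53.create_health_check(
    CallerReference=str(uuid.uuid4()),
    HealthCheckConfig={
        'Type': 'HTTP',
        'FullyQualifiedDomainName': 'primary.example.com',
        'Port': 80,
        'ResourcePath': '/health',
        'RequestInterval': 30,
        'FailureThreshold': 3,
    },
)

# Primary and secondary failover records; Route 53 answers with the secondary
# record only while the primary health check is failing.
r53.change_resource_record_sets(
    HostedZoneId='Z1234567890ABC',
    ChangeBatch={'Changes': [
        {'Action': 'UPSERT', 'ResourceRecordSet': {
            'Name': 'www.example.com', 'Type': 'CNAME', 'TTL': 60,
            'SetIdentifier': 'primary', 'Failover': 'PRIMARY',
            'HealthCheckId': hc['HealthCheck']['Id'],
            'ResourceRecords': [{'Value': 'primary-elb.us-east-1.elb.amazonaws.com'}]}},
        {'Action': 'UPSERT', 'ResourceRecordSet': {
            'Name': 'www.example.com', 'Type': 'CNAME', 'TTL': 60,
            'SetIdentifier': 'secondary', 'Failover': 'SECONDARY',
            'ResourceRecords': [{'Value': 'secondary-elb.us-west-2.elb.amazonaws.com'}]}},
    ]},
)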

Depending on your availability objectives, there is an additional strategy (using Route 53) that you might want to consider for your application. For example, you can create a backup static site to maintain a presence for your end customers while your primary dynamic site is down. In the normal course, Route 53 will point to your dynamic site and maintain health checks for it. Furthermore, you will need to configure Route 53 to point to the S3 storage, where your static site resides. If your primary site goes down, then traffic can be diverted to the static site (while you work to restore your primary site). You can also combine this static backup site strategy with a multiple region deployment.

Setting up high availability for application and data layers

In this section, we will discuss approaches for implementing high availability in the application and data layers of your application architecture.

The auto healing feature of AWS OpsWorks provides a good recovery mechanism from instance failures. All OpsWorks instances have an agent installed. If an agent does not communicate with the service for a short duration, then OpsWorks considers the instance to have failed. If auto healing is enabled at the layer and an instance becomes unhealthy, then OpsWorks first terminates the instance and starts a new one as per the layer configuration.
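
Auto healing is a per-layer setting; as a minimal boto3 sketch (the layer ID is a placeholder), it can be enabled as follows:

import boto3

opsworks = boto3.client('opsworks', region_name='us-east-1')
opsworks.update_layer(LayerId='11111111-2222-3333-4444-555555555555',
                      EnableAutoHealing=True)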

In the application layer, we can also do cold starts from preconfigured images or warm starts from scaled-down instances for your web servers and application servers in a secondary region. By leveraging auto scaling, we can quickly ramp up these servers to handle full production loads. In this configuration, you would deploy the web servers and application servers across multiple AZs in your primary region, while the standby servers need not be launched in your secondary region until you actually need them. However, keep the preconfigured AMIs for these servers ready to launch in your secondary region.
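
A warm-standby ramp-up of this kind typically rests on an Auto Scaling group built from a preconfigured AMI; the following boto3 sketch uses hypothetical AMI, subnet, and load balancer names.

import boto3

autoscaling = boto3.client('autoscaling', region_name='us-east-1')

# Launch configuration based on the preconfigured web/application server AMI.
autoscaling.create_launch_configuration(
    LaunchConfigurationName='web-lc',
    ImageId='ami-12345678',
    InstanceType='m4.large',
)

# Auto Scaling group spanning two AZs and attached to the load balancer;
# raising DesiredCapacity ramps the fleet up to full production load.
autoscaling.create_auto_scaling_group(
    AutoScalingGroupName='web-asg',
    LaunchConfigurationName='web-lc',
    MinSize=2,
    MaxSize=8,
    DesiredCapacity=2,
    VPCZoneIdentifier='subnet-11111111,subnet-22222222',
    LoadBalancerNames=['web-elb'],
)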

The data layer can comprise SQL databases, NoSQL databases, caches, and so on. These can be AWS managed services such as RDS, DynamoDB, and S3, or your own SQL and NoSQL databases such as Oracle, SQL Server, or MongoDB running on EC2 instances. AWS services come with HA built in, while using database products running on EC2 instances offers a do-it-yourself option. It can be advantageous to use AWS services if you want to avoid taking on database administration responsibilities. For example, with the increasing size of your databases, you might choose to shard them, which is easy to do. However, resharding your databases while taking live traffic can be a very complex undertaking and can present availability risks. Choosing the AWS DynamoDB service in such a situation offloads this work to AWS, resulting in higher availability out of the box.

AWS provides many different data replication options and we will discuss a few of those in the following several paragraphs.

DynamoDB automatically replicates your data across several AZs to provide higher levels of data durability and availability. In addition, you can use AWS Data Pipeline to copy your data from one region to another. The DynamoDB Streams functionality can also be leveraged to replicate data to another DynamoDB table in a different region. For very high volumes and low latencies, the Kinesis service can also be used for this replication across multiple regions distributed all over the world.
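
Enabling a stream on an existing table is the first step in such a replication setup; a minimal boto3 sketch (the table name is hypothetical, and the stream records would still need to be consumed and written to the destination region):

import boto3

dynamodb = boto3.client('dynamodb', region_name='us-east-1')
dynamodb.update_table(
    TableName='orders',
    StreamSpecification={'StreamEnabled': True,
                         'StreamViewType': 'NEW_AND_OLD_IMAGES'},
)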

You can also enable the Multi-AZ setting for the AWS RDS service to ensure AWS replicates your data to a different AZ within the same region. In the case of Amazon S3, the S3 bucket contents can be copied to a different bucket, and the failover can be managed on the client side. Depending on the volume of data, think in terms of multiple machines, multiple threads, and multipart uploads to significantly reduce the time it takes to upload data to S3 buckets.
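
Both settings can be expressed in a few lines of boto3. The following sketch uses hypothetical identifiers, and the multipart threshold and concurrency values are illustrative.

import boto3
from boto3.s3.transfer import TransferConfig

# RDS instance with Multi-AZ enabled, so AWS maintains a synchronous
# standby in a different AZ within the same region.
rds = boto3.client('rds', region_name='us-east-1')
rds.create_db_instance(
    DBInstanceIdentifier='orders-db',
    Engine='mysql',
    DBInstanceClass='db.m4.large',
    AllocatedStorage=100,
    MasterUsername='admin',
    MasterUserPassword='use-a-secret-here',
    MultiAZ=True,
)

# Multipart, multithreaded upload to S3 to reduce transfer time for large objects.
s3 = boto3.client('s3', region_name='us-east-1')
config = TransferConfig(multipart_threshold=64 * 1024 * 1024, max_concurrency=10)
s3.upload_file('backup.tar.gz', 'my-dr-bucket', 'backups/backup.tar.gz', Config=config)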

While using your own database (running on EC2 instances), use your database-specific high availability features for in-region and cross-region database deployments. For example, if you are using SQL Server, you can leverage the SQL Server AlwaysOn feature for synchronous and asynchronous replication across the nodes. If the volume of data is high, then you can also use SQL Server log shipping to first upload your data to Amazon S3 and then restore it into your SQL Server instance on AWS. A similar approach for Oracle databases uses the OSB Cloud Module and RMAN. You can also replicate your non-RDS databases (on-premises or on AWS) to AWS RDS databases. You will typically define two nodes in the primary region with synchronous replication and a third node in the secondary region with asynchronous replication. NoSQL databases such as MongoDB and Cassandra have their own asynchronous replication features that can be leveraged for replication to a different region.

In addition, you can create Read Replicas for your databases in other AZs and regions. In this case, if your master database fails followed by a failure of your secondary database, then one of the read replicas can be promoted to being the master. In hybrid architectures, where you need to replicate between on-premise and AWS data sources, you can do so through a VPN connection between your data center and AWS. In case of any connectivity issues, you can also temporarily store pending data updates in SQS, and process them when the connectivity is restored.
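
A cross-region read replica and its later promotion might look as follows in boto3 (the identifiers and the source ARN are hypothetical):

import boto3

rds_dr = boto3.client('rds', region_name='us-west-2')

# Create a read replica in the secondary region from the primary instance's ARN.
rds_dr.create_db_instance_read_replica(
    DBInstanceIdentifier='orders-db-replica',
    SourceDBInstanceIdentifier='arn:aws:rds:us-east-1:123456789012:db:orders-db',
)

# During a failover, promote the replica to a standalone (writable) instance.
rds_dr.promote_read_replica(DBInstanceIdentifier='orders-db-replica')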

Usually, data is actively replicated to the secondary region while all other servers like the web servers and application servers are maintained in a cold state to control costs. However, in cases of high availability for web scale or mission critical applications, you can also choose to deploy your servers in active configuration across multiple regions.

Implementing high availability in the application

In this section, we will discuss a few design principles to use in your application from a high availability perspective. We will briefly discuss using highly available AWS services to implement common features in mobile and Internet of Things (IoT) applications. Finally, we also cover running packaged applications on the AWS cloud.

Designing your application services to be stateless and following a microservices-oriented architecture approach can help the overall availability of your application. In such architectures, if a service fails, then that failure is contained or isolated to that particular service, while the rest of your application services continue to serve your customers. This approach can lead to an acceptably degraded experience rather than outright failures, or worse. You should also store user or session information in a central location such as Amazon ElastiCache, and then spread that information across multiple AZs for high availability. Another design principle is to rigorously implement exception handling in your application code, and in each of your services, to ensure a graceful exit in case of failures.
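
As an illustration of externalizing session state, the following sketch stores sessions in a Redis-backed ElastiCache cluster using the redis-py library; the endpoint name and TTL are hypothetical.

import redis

# Hypothetical ElastiCache (Redis) endpoint; any web/app instance can read or
# write sessions here, so losing an instance does not lose user state.
sessions = redis.StrictRedis(host='sessions.abc123.0001.use1.cache.amazonaws.com',
                             port=6379)

def save_session(session_id, payload, ttl_seconds=1800):
    # Store the session with a time-to-live so stale sessions expire automatically.
    sessions.setex(session_id, ttl_seconds, payload)

def load_session(session_id):
    return sessions.get(session_id)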

Most mobile applications share common features, including user authentication and authorization; data synchronization across devices; user behavior analytics; retention tracking; storing, sharing, and delivering media globally; sending push notifications; storing shared data; streaming real-time data; and so on. There is a host of highly available AWS services that can be used to implement such mobile application functionality. For example, you can use Amazon Cognito to authenticate users, Amazon Mobile Analytics for analyzing user behavior and tracking retention, Amazon SNS for push notifications, and Amazon Kinesis for streaming real-time data. In addition, other AWS services such as S3, DynamoDB, IAM, and so on can also be effectively used to complete most mobile application scenarios.

For mobile applications, you need to be especially sensitive about latency issues; hence, it is important to leverage AWS regions to get as close to your customers as possible.

Similar to mobile applications, for IoT applications you can use the same highly available AWS services to implement common functionality such as device analytics and device messaging/notifications. You can also leverage Amazon Kinesis to ingest data from hundreds of thousands of sensors that are continuously generating massive quantities of data.
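
Ingesting a sensor reading into a Kinesis stream is a one-call operation; a minimal boto3 sketch (the stream name and payload are illustrative):

import json
import boto3

kinesis = boto3.client('kinesis', region_name='us-east-1')
kinesis.put_record(
    StreamName='sensor-ingest',
    Data=json.dumps({'device_id': 'sensor-42', 'temperature': 21.7}).encode('utf-8'),
    PartitionKey='sensor-42',   # readings from the same device go to the same shard
)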

Aside from your own custom applications, you can also run packaged applications, such as SAP, on AWS. These deployments would typically include replicated standby systems, Multi-AZ and multi-region deployments, hybrid architectures spanning your own data center and the AWS cloud (connected via a VPN or the AWS Direct Connect service), and so on. For more details, refer to the specific package's guide for achieving high availability on the AWS cloud.

Summary

In this article, we reviewed some of the strategies you can follow for achieving high availability in your cloud application. We emphasized the importance of both designing your application architecture for availability and using the AWS infrastructural services to get the best results.
