This is the second part of a series about the AWS Route 53 service. In the first part, we saw how you can register a domain with Route 53 and how you can create a resource record.

In this second part, we will discuss Route 53 health checks and DNS failover.

Amazon Route 53 supports DNS failover in active-active, active-passive, and mixed configurations, which you can use to make your applications highly available. For instance, if several resources perform the same function, Route 53 can check the state (health) of those resources and answer DNS queries using only the resources that are healthy.


These are the three types of DNS failover configurations:

  • Active-active failover: in this mode, Route 53 uses all resources. If one of them becomes unhealthy, Route 53 stops including it in the answers to DNS queries.
  • Active-passive failover: in this mode, some resources are always used (called primary) and some are kept on standby (secondary). Route 53 monitors the primary resources, and if they become unavailable, Route 53 starts answering DNS queries using the secondary resources.
  • Active-active-passive and other mixed configurations: these combine alias and non-alias resource records to build more complex Route 53 behaviors, such as a group of active primary servers backed by a passive secondary.
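To make the difference between the first two modes concrete, here is a minimal Python sketch of how the answer to a DNS query could be chosen, assuming each record already carries a healthy/unhealthy flag. The `Record` class and both functions are hypothetical illustrations, not Route 53 code, and they ignore edge cases such as every record being unhealthy at once.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Record:
    """A simplified, hypothetical DNS record with a health flag."""
    ip: str
    healthy: bool
    role: str = "active"  # "active", "primary", or "secondary"


def active_active_answer(records: List[Record]) -> List[str]:
    """Active-active: every healthy record is included in the answer."""
    return [r.ip for r in records if r.healthy]


def active_passive_answer(records: List[Record]) -> List[str]:
    """Active-passive: use healthy primaries; fall back to healthy secondaries."""
    primaries = [r.ip for r in records if r.role == "primary" and r.healthy]
    if primaries:
        return primaries
    return [r.ip for r in records if r.role == "secondary" and r.healthy]
```

In the active-passive function, the secondary records are consulted only when no primary is healthy, which is exactly the behavior we will configure for www.vtep.net later in this article.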

Route 53 supports the following types of health checks:

  • HTTP and HTTPS health checks: Route 53 must establish a TCP connection with the endpoint in less than four seconds, and the endpoint must then respond within two seconds with an HTTP status code of at least 200 and lower than 400 (that is, 2xx or 3xx).
  • TCP health checks: Route 53 must establish a TCP connection with the endpoint in less than four seconds.
  • HTTP and HTTPS health checks with string matching: these work like the HTTP and HTTPS health checks, but in addition, Route 53 must find, in the response body, the string that was specified when the health check was configured.
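The pass/fail rules above can be summarized in a short Python sketch. It assumes a hypothetical `Probe` object describing the result of one check against an endpoint; the type strings mirror the values used by the Route 53 API (`HTTP`, `HTTPS`, `HTTP_STR_MATCH`, `HTTPS_STR_MATCH`, `TCP`), but the function only illustrates the rules and is not how Route 53 implements them.

```python
from dataclasses import dataclass
from typing import Optional

CONNECT_LIMIT_S = 4.0   # TCP connection must be established in under 4 seconds
RESPONSE_LIMIT_S = 2.0  # HTTP(S) response must arrive within 2 seconds


@dataclass
class Probe:
    """Hypothetical result of a single probe against an endpoint."""
    connect_time_s: Optional[float]          # None means the connection failed
    response_time_s: Optional[float] = None  # time to the HTTP response
    status_code: Optional[int] = None
    body: str = ""


def passes(check_type: str, probe: Probe, search_string: str = "") -> bool:
    """Apply the simplified pass/fail rules for one health check type."""
    if probe.connect_time_s is None or probe.connect_time_s >= CONNECT_LIMIT_S:
        return False
    if check_type == "TCP":
        return True  # a TCP check only needs the connection
    # HTTP/HTTPS checks (with or without string matching)
    if probe.response_time_s is None or probe.response_time_s > RESPONSE_LIMIT_S:
        return False
    if probe.status_code is None or not 200 <= probe.status_code < 400:
        return False
    if check_type.endswith("_STR_MATCH"):
        return search_string in probe.body
    return True
```

For example, `passes("HTTP", Probe(0.5, 0.3, 404))` returns `False`, because a 404 falls outside the accepted 2xx/3xx range even though the connection succeeded.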

So, let’s configure a health check that can trigger a DNS failover.

The failover can also occur between servers in different regions, and that is the setup we will use here.

I have one EC2 instance running in the US EAST region (N. Virginia) and one EC2 instance running in the US WEST region (Oregon). Each EC2 instance runs a web server, and each web server displays different content, so we can tell which web server we were directed to when we access the link.

Each EC2 instance has an elastic IP, so that it keeps the same public IP address after the instance is stopped and started.

This is the EC2 instance from US EAST:

And this is the EC2 instance from US WEST:

If you browse to the public IP address of each EC2 instance, you will see that each one returns different output. Of course, in real life, both servers would need identical content, so that users do not notice that they were redirected to another server.

This is the output when we access the HTTP server on the US EAST EC2 instance:

And this is the output from the US WEST EC2 instance:

As we saw in Part I, we have the domain vtep.net that we can use for testing purposes.

Here is what we want to do. We will assume that the website on the US EAST EC2 instance is our main website. If that website goes down for any reason (the HTTP server has stopped, the EC2 instance is down, or there are connectivity issues to the primary server), the website on the US WEST EC2 instance will take over and serve the users.

We will create two record sets, both named www.vtep.net, each pointing to one of the two IP addresses.

To track the availability of the primary server, we need to configure a health check.

So let’s open Route 53 in the AWS Management Console. You will see that we have one domain and one hosted zone. From the left menu, choose “Health Checks.”

Click on “Create Health Check” to start the process:

In the next window, you will need to fill in some details:

Name: A name that is descriptive to you. You will reference it later, when you create the record sets.

Protocol: The protocol used for testing. Because we want to check whether the HTTP server is up, we use HTTP.

IP Address: The IP address of the HTTP server (here, the elastic IP of the US EAST instance).

Port: The port used for the availability check.

Request Interval: How often Route 53 checks the health of the endpoint, either every 10 seconds or every 30 seconds.

Failure Threshold: The number of consecutive failed checks after which Route 53 considers the endpoint unhealthy (from 1 to 10).

As you can see, we used pretty aggressive intervals/thresholds because we need to detect a possible failure as soon as possible.
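Why do the interval and threshold matter? A single failed check does not immediately mark the endpoint unhealthy; the status only changes after the configured number of consecutive failures, and in the same way it only returns to healthy after that many consecutive successes. The sketch below models that behavior for a single checker. Route 53 actually runs checks from several locations and aggregates the results, which this simplified, hypothetical model ignores. As a rough estimate, with a 10-second interval and a failure threshold of 3, it takes about 30 seconds of failed checks before the status flips, and resolvers may keep returning the cached primary address until the record's TTL expires.

```python
from typing import Iterable, List


def health_states(results: Iterable[bool], threshold: int,
                  initially_healthy: bool = True) -> List[bool]:
    """Return the reported status after each individual check result.

    Simplified single-checker model: the status only flips after
    `threshold` consecutive results that disagree with the current status.
    """
    healthy = initially_healthy
    streak = 0  # consecutive results that disagree with the current status
    states: List[bool] = []
    for ok in results:
        if ok == healthy:
            streak = 0
        else:
            streak += 1
            if streak >= threshold:
                healthy = ok
                streak = 0
        states.append(healthy)
    return states
```

With `threshold=3`, the sequence pass, fail, fail, fail, pass, pass, pass is reported as healthy for the first three checks, unhealthy for the next three, and healthy again only at the seventh check.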

Click on “Create” to create the health check.
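If you prefer to script this step, the same health check can be described as the request parameters for `create_health_check` in boto3, the AWS SDK for Python. The sketch below only builds the parameter dictionary, so it runs without AWS credentials; the function name, IP address, path, and interval/threshold defaults are placeholder assumptions, not the values used in this article.

```python
import uuid
from typing import Any, Dict


def health_check_request(ip: str, port: int = 80, path: str = "/",
                         interval: int = 10, threshold: int = 1) -> Dict[str, Any]:
    """Build keyword arguments for boto3's route53 create_health_check()."""
    if interval not in (10, 30):
        raise ValueError("RequestInterval must be 10 or 30 seconds")
    if not 1 <= threshold <= 10:
        raise ValueError("FailureThreshold must be between 1 and 10")
    return {
        # Must be unique per request; it lets AWS detect accidental retries.
        "CallerReference": str(uuid.uuid4()),
        "HealthCheckConfig": {
            "IPAddress": ip,
            "Port": port,
            "Type": "HTTP",
            "ResourcePath": path,
            "RequestInterval": interval,
            "FailureThreshold": threshold,
        },
    }
```

To actually create the check, you would pass the dictionary to a client, for example `boto3.client("route53").create_health_check(**health_check_request("192.0.2.10"))`, and read the new ID from `response["HealthCheck"]["Id"]`. The console's Name field is stored as a tag with the key `Name`, which you can add afterwards with `change_tags_for_resource`.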

As you can see, the health check was created:

After a while, the CloudWatch metrics show that the health check is passing, which means that the HTTP server on the US EAST EC2 instance is working fine:

Our goal is to demonstrate that www.vtep.net stays available if the EC2 instance in the US EAST region goes down, and that the recovery is automatic.

Select “Hosted Zones” and then “Go to Record Sets”:

Click on “Create Record Set” to create the first record set. The record set will be for www.vtep.net:

Everything is the same as explained in Part I of the series. In the “Value” field, enter the IP address of the EC2 instance in the US EAST region. The interesting part starts after this.

In “Routing Policy,” choose “Failover,” and in “Failover Record Type,” choose “Primary,” because this will be the primary server. In “Set ID,” enter something that is meaningful to you. Next, we need to tell Route 53 how to track the availability of this server: choose “Yes” for “Associate with Health Check” and then select the health check that we configured earlier from the drop-down menu. Then click “Create” to create the record set.

Then we need to create another record set with the same name, www.vtep.net. Enter the IP address of the EC2 instance in the US WEST region. In “Routing Policy,” choose “Failover,” and in “Failover Record Type,” choose “Secondary,” because this will be the secondary server. We don’t need to associate a health check here, because there is no further server to fail over to if this one goes down. If there were, we would configure a health check for this server and reference it in the same way. Click “Create” to continue.
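The two record sets can also be expressed as a single change batch for boto3's `change_resource_record_sets` call. In this sketch, the helper names, the set identifiers, the 60-second TTL, and the IP addresses are hypothetical placeholders; as in the console steps above, only the primary record references a health check.

```python
from typing import Any, Dict, Optional


def failover_record(name: str, ip: str, role: str, set_id: str,
                    health_check_id: Optional[str] = None,
                    ttl: int = 60) -> Dict[str, Any]:
    """Build one UPSERT change for an A record with failover routing."""
    record: Dict[str, Any] = {
        "Name": name,
        "Type": "A",
        "SetIdentifier": set_id,  # the console's "Set ID"
        "Failover": role,         # "PRIMARY" or "SECONDARY"
        "TTL": ttl,
        "ResourceRecords": [{"Value": ip}],
    }
    if health_check_id is not None:
        record["HealthCheckId"] = health_check_id
    return {"Action": "UPSERT", "ResourceRecordSet": record}


def failover_change_batch(name: str, primary_ip: str, secondary_ip: str,
                          health_check_id: str) -> Dict[str, Any]:
    """Primary record tied to the health check, secondary without one."""
    return {
        "Changes": [
            failover_record(name, primary_ip, "PRIMARY", "us-east-primary",
                            health_check_id),
            failover_record(name, secondary_ip, "SECONDARY", "us-west-secondary"),
        ]
    }
```

You would then apply it with `boto3.client("route53").change_resource_record_sets(HostedZoneId=zone_id, ChangeBatch=batch)`, where `zone_id` is the ID of the vtep.net hosted zone. Using `UPSERT` rather than `CREATE` means the same script can be rerun to update the records if they already exist.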

Let’s open www.vtep.net in a browser and see what is returned. We should see the output saying that we landed on the US EAST region:

And it works.
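Besides the browser, you can check which server is answering by resolving the name directly. The following sketch resolves a hostname with Python's `socket.gethostbyname` and maps the returned address to a region label; the function name, the address-to-region mapping, and the IP addresses in it are hypothetical placeholders for your own elastic IPs. Keep in mind that your local resolver may cache the answer until the record's TTL expires.

```python
import socket
from typing import Callable, Dict


def serving_region(hostname: str, regions_by_ip: Dict[str, str],
                   resolve: Callable[[str], str] = socket.gethostbyname) -> str:
    """Resolve hostname and map the returned IPv4 address to a region label.

    `resolve` defaults to a real DNS lookup but can be swapped for testing.
    """
    ip = resolve(hostname)
    return regions_by_ip.get(ip, f"unknown ({ip})")
```

For example, `serving_region("www.vtep.net", {"192.0.2.10": "US EAST", "198.51.100.20": "US WEST"})` should report US EAST while the primary is healthy, and US WEST once the failover has happened and any cached answer has expired.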

Let’s test the failure of the primary server. We will stop the EC2 instance from US EAST:

As you can see, the CloudWatch metrics show that the health check has failed:

Checking the browser, we can see that we have landed on the US WEST region:

After we start the US EAST EC2 instance again, we can see that the health check passes:

And now the browser brings us back to the EC2 instance in the US EAST region:

This brings us to the end of the second part of the series.

You should now know what Route 53 DNS failover is and how you can configure it.

The process is simple: You need to create a health check and then define which server is the primary one and which one is the secondary.

After that, you can rely on the fact that Route 53 will trigger the failover if something should happen to the primary server.
