This article will discuss Amazon CloudWatch. Amazon CloudWatch monitors AWS resources and the applications running on AWS. It tracks the metrics that you choose to monitor, and it can send alarms or make changes to the monitored resources based on rules that you define.
In this article we will:
Discuss CloudWatch concepts
Select and view metrics
View statistics for an EC2 instance
Create alarms and monitor them
Basically, CloudWatch is a repository of metrics. Each AWS service publishes its metrics to this repository, and user-defined (custom) metrics are made available through the same mechanism.
Let’s discuss some key notions in CloudWatch.
The first one is the metric, which is a variable whose values are monitored over time. For instance, the number of write operations on a disk is a metric. Each metric is uniquely defined by a name, a namespace and zero or more dimensions. Each value of the metric has a timestamp and, optionally, a unit of measure. When you request statistics, the returned data is identified by the metric name, namespace, dimensions and, if it exists, the unit. Metric data is available for two weeks.
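To make the parts of a metric concrete, here is a minimal Python sketch that builds one data point in the request shape used by the PutMetricData API (as exposed by boto3's put_metric_data call). The namespace “MyApp/Storage”, the metric name “DiskWriteOps” and the helper build_metric_datum are hypothetical, chosen only for illustration; the sketch only builds the request and does not contact AWS.

```python
from datetime import datetime, timezone


def build_metric_datum(name, dimensions, value, unit=None, timestamp=None):
    """Build one entry of the MetricData list of a PutMetricData request.

    `dimensions` is a plain dict such as {"InstanceId": "i-3baa6bd1"}.
    build_metric_datum is a hypothetical helper, not part of boto3.
    """
    datum = {
        "MetricName": name,
        # CloudWatch expects dimensions as a list of Name/Value pairs.
        "Dimensions": [{"Name": k, "Value": v} for k, v in dimensions.items()],
        # Every value carries a timestamp; default to "now" in UTC.
        "Timestamp": timestamp or datetime.now(timezone.utc),
        "Value": value,
    }
    if unit is not None:  # the unit of measure is optional
        datum["Unit"] = unit
    return datum


# Hypothetical custom namespace and metric name, for illustration only.
request = {
    "Namespace": "MyApp/Storage",
    "MetricData": [
        build_metric_datum("DiskWriteOps", {"InstanceId": "i-3baa6bd1"}, 42.0, unit="Count"),
    ],
}

# With boto3 installed and AWS credentials configured, this could be sent with:
#   boto3.client("cloudwatch").put_metric_data(**request)
```

The request carries exactly the identifying pieces described above: namespace, metric name, dimensions, timestamp, value and the optional unit.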
Another concept is the namespace. A namespace is a container for metrics: metrics in different namespaces are isolated from each other, so they cannot be aggregated into the same statistics. Each AWS service that publishes data to CloudWatch uses a namespace starting with “AWS/”. For instance, the Auto Scaling service uses the namespace AWS/AutoScaling.
A dimension is a name/value pair that is part of a metric’s identity. You can assign up to ten dimensions to a metric. It is very important to know that CloudWatch treats each unique combination of dimensions as a separate metric.
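The rule that each unique combination of dimensions is a separate metric can be illustrated with a small in-memory model. This is a toy sketch, not how CloudWatch is implemented: metric_key, record and store are hypothetical names, and treating the dimensions as an unordered set is an assumption of the model.

```python
def metric_key(namespace, name, dimensions):
    """Identity of a metric in this toy model: namespace + name + all dimensions.

    The dimensions are stored as an unordered frozenset (a modeling assumption).
    """
    return (namespace, name, frozenset(dimensions.items()))


# Hypothetical in-memory store: metric identity -> list of recorded values.
store = {}


def record(namespace, name, dimensions, value):
    store.setdefault(metric_key(namespace, name, dimensions), []).append(value)


record("AWS/EC2", "CPUUtilization", {"InstanceId": "i-3baa6bd1"}, 12.5)
record("AWS/EC2", "CPUUtilization", {"InstanceId": "i-3baa6bd1"}, 30.0)
# Same namespace and name, but an extra (hypothetical) dimension "Env":
# in this model it becomes a different metric with its own values.
record("AWS/EC2", "CPUUtilization", {"InstanceId": "i-3baa6bd1", "Env": "test"}, 99.0)
```

After these three calls the store holds two metrics, not one: the first two values share an identity, while the value recorded with the extra dimension is kept apart.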
The timestamp is another concept in CloudWatch: every value of a metric must have a timestamp.
A unit represents the unit of measure for a statistic. For instance, a metric that tracks the bytes an instance sends out has a different unit (Bytes) than a metric that tracks average CPU usage (Percent).
Statistics are data aggregations over specified time intervals. The available statistics are “Minimum”, “Maximum”, “Sum”, “Average” and “SampleCount”. All of them are self-explanatory, except perhaps the last one: “SampleCount” is the number of data points used to calculate the statistic.
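As a rough illustration of what the five statistics mean, the following Python sketch computes them over the samples of a single period. The function name summarize is hypothetical; CloudWatch computes these values on its side, and this only mirrors their definitions.

```python
def summarize(values):
    """Compute the five standard CloudWatch statistics for one period's samples."""
    if not values:
        raise ValueError("no samples in this period")
    total = sum(values)
    return {
        "SampleCount": len(values),        # how many data points were used
        "Sum": total,
        "Minimum": min(values),
        "Maximum": max(values),
        "Average": total / len(values),    # equals Sum / SampleCount
    }


stats = summarize([10.0, 20.0, 60.0])
```

Note that Average is always Sum divided by SampleCount, which is why SampleCount is useful when combining statistics from several periods.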
The period is the length of time over which a statistic is calculated. The minimum granularity for a period is one minute. CloudWatch alarms evaluate metrics period by period.
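The effect of the period can be sketched by grouping timestamped samples into fixed-length buckets and applying a statistic to each bucket. In this minimal Python model (aggregate and t are hypothetical helpers), buckets are aligned to whole multiples of the period since the Unix epoch; that alignment is an assumption of the sketch rather than a documented CloudWatch rule.

```python
from collections import defaultdict
from datetime import datetime, timezone


def aggregate(samples, period_seconds, statistic):
    """Group (timestamp, value) samples into periods and apply `statistic` to each.

    Periods are aligned to multiples of `period_seconds` since the epoch
    (an assumption of this sketch).
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        epoch = int(ts.timestamp())
        start = epoch - epoch % period_seconds  # start of the enclosing period
        buckets[start].append(value)
    return {
        datetime.fromtimestamp(start, tz=timezone.utc): statistic(vals)
        for start, vals in sorted(buckets.items())
    }


def t(minute, second):
    """Hypothetical helper: a UTC timestamp at 12:mm:ss on an arbitrary day."""
    return datetime(2015, 1, 1, 12, minute, second, tzinfo=timezone.utc)


samples = [(t(0, 5), 20.0), (t(0, 40), 40.0), (t(1, 10), 90.0), (t(4, 59), 10.0)]
one_minute_max = aggregate(samples, 60, max)    # three one-minute periods
five_minute_max = aggregate(samples, 300, max)  # a single five-minute period
```

The same four samples yield three data points with a one-minute period but only one with a five-minute period, so a longer period smooths out short spikes.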
An alarm watches a single metric over a specified time period and performs one or more actions based on the value of the metric relative to a threshold. The action is a notification sent to an Amazon Simple Notification Service (Amazon SNS) topic or an Auto Scaling policy.
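The core idea of an alarm can be modeled in a few lines of Python: compare the last N period values with a threshold and derive a state. This is a simplified sketch with hypothetical names and illustrative values (threshold 40, three periods); real CloudWatch alarms have more options. It does use the three real alarm states, OK, ALARM and INSUFFICIENT_DATA, here returning the last one when fewer than N periods are available.

```python
def evaluate_alarm(period_values, threshold, evaluation_periods):
    """Return the alarm state for a series of per-period statistic values.

    Simplified model: ALARM only if each of the last `evaluation_periods`
    values is strictly greater than `threshold`.
    """
    if len(period_values) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    recent = period_values[-evaluation_periods:]
    return "ALARM" if all(v > threshold for v in recent) else "OK"


# Illustrative per-minute averages of CPU utilization (made-up numbers).
history = [35.0, 55.0, 60.0, 70.0, 20.0]
states = [evaluate_alarm(history[:i], 40.0, 3) for i in range(1, len(history) + 1)]

# Actions are tied to state changes; in this model an SNS notification
# would be sent on the OK -> ALARM transition.
transitions = [(a, b) for a, b in zip(states, states[1:]) if a != b]
```

A single value above the threshold is not enough: the state only becomes ALARM once three consecutive periods exceed it, and it returns to OK as soon as one of the last three periods drops back below.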
CloudWatch can monitor many AWS services; the complete list is in the AWS documentation under “Supported AWS Services”.
So, let’s get to the practical part of this article.
You can access the CloudWatch service from the AWS Management Console by clicking “CloudWatch”:
As you can see, I have only one alarm configured and it is in the “OK” state, which means that the threshold I configured hasn’t been breached.
Let’s discuss CloudWatch metrics. By default, a few free metrics are provided for EC2 instances, EBS volumes, RDS DB instances and ELB load balancers.
To view all the metrics choose “Metrics” from the left menu:
For instance, I have an EC2 instance for which I would like to check a specific metric:
Also, for this EC2 instance, I enabled detailed monitoring (one-minute data instead of the default five-minute data) by clicking “Enable Detailed Monitoring”:
I would like to select the “CPUUtilization” metric for this instance ID (i-3baa6bd1). From the left menu, under “Metrics”, choose “EC2”, scroll down to the instance ID and select the metric you are looking for:
Now, let’s see the statistics for the maximum CPU usage, using a period of five minutes:
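The same view can be requested programmatically. The sketch below builds the keyword arguments for boto3's CloudWatch get_metric_statistics call (Maximum of CPUUtilization for the instance, five-minute period). The helper name cpu_max_query and the three-hour time window are assumptions made for illustration; the actual AWS call is left in a comment so the sketch runs without credentials.

```python
from datetime import datetime, timedelta, timezone


def cpu_max_query(instance_id, hours=3, period_seconds=300):
    """Build kwargs for a GetMetricStatistics request (boto3 parameter names).

    cpu_max_query and the default 3-hour window are hypothetical choices.
    """
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period_seconds,      # 300 seconds = five minutes
        "Statistics": ["Maximum"],
    }


params = cpu_max_query("i-3baa6bd1")

# With boto3 installed and AWS credentials configured:
#   resp = boto3.client("cloudwatch").get_metric_statistics(**params)
#   for dp in sorted(resp["Datapoints"], key=lambda d: d["Timestamp"]):
#       print(dp["Timestamp"], dp["Maximum"])
```

Datapoints are not guaranteed to come back in time order, which is why the commented usage sorts them by timestamp before printing.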
Now, let’s create a CloudWatch alarm. You can access the alarms section by clicking the “Alarms” menu. As you can see, I have only one alarm, which notifies me if my AWS costs go above 100 USD. So far, this threshold hasn’t been crossed, hence the “OK” state:
Click on “Create Alarm”:
Then click “Per-Instance Metrics” and either scroll down to the metric for which you want to create an alarm or type it in the search field. Then click “Next”:
In the next step, choose a name for the alarm. The alarm should be triggered when CPU utilization is above 40 percent for three consecutive periods. The statistic is Average and each period is one minute. The action is to send a notification to an email address that I configured earlier through Amazon Simple Notification Service (SNS).
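For reference, the alarm configured above maps onto the parameters of boto3's CloudWatch put_metric_alarm call roughly as sketched below. The alarm name and the SNS topic ARN (region, account ID and topic name) are placeholders, not values from this walkthrough, and cpu_alarm_params is a hypothetical helper; the call to AWS itself is left in a comment.

```python
def cpu_alarm_params(alarm_name, instance_id, topic_arn):
    """Build kwargs for PutMetricAlarm matching the console settings above."""
    return {
        "AlarmName": alarm_name,
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 60,                   # one-minute periods (detailed monitoring)
        "EvaluationPeriods": 3,         # three consecutive periods
        "Threshold": 40.0,              # percent
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],    # SNS topic with the email subscription
    }


# Placeholder name and ARN, for illustration only.
params = cpu_alarm_params(
    "HighCPU-i-3baa6bd1",
    "i-3baa6bd1",
    "arn:aws:sns:us-east-1:123456789012:cpu-alerts",
)

# With boto3 installed and AWS credentials configured:
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```

“More than 40 percent” corresponds to GreaterThanThreshold; GreaterThanOrEqualToThreshold would also fire at exactly 40 percent.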
Click “Create Alarm” to create the alarm. Because the threshold has not been crossed for three consecutive periods, the alarm is in the “OK” state:
Now, let’s keep the CPU of this instance busy enough to cross the threshold. After a few minutes, the alarm moves to the “ALARM” state:
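The script used to load the CPU is not shown here; the following Python function is one possible stand-in, a plain busy loop that runs for a given number of seconds. The name burn_cpu is hypothetical, and the invocation is left as a comment so that nothing spins by accident.

```python
import time


def burn_cpu(seconds):
    """Spin in a tight loop for `seconds` of wall-clock time.

    Returns the number of loop iterations, which is only useful as a sanity check.
    """
    deadline = time.monotonic() + seconds
    iterations = 0
    while time.monotonic() < deadline:
        iterations += 1
    return iterations


# On the instance, keep one core busy for five minutes, longer than
# three one-minute evaluation periods:
#   burn_cpu(300)
```

A single Python process saturates only one vCPU. On an instance with several vCPUs, the reported CPUUtilization would likely stay well below 100 percent, so you may need one such process per vCPU (for example with the multiprocessing module) to push the average above 40 percent.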
You can see some details about the alarm, including the last two values of the metric:
Also, here is a small part of the notification email I received:
After stopping the script that kept the CPU busy, the alarm should be back in “OK” state soon.
Basically, there are a few things you can do with CloudWatch: view metrics and create alarms. However, the power of CloudWatch lies in the number of metrics available and in how granular the monitoring of your resources can be. There are plenty of namespaces, dimensions and metrics you can use with CloudWatch; the full list is in the AWS documentation “Amazon CloudWatch Namespaces, Dimensions, and Metrics Reference”. What we did in this article was just a glimpse of what CloudWatch can do.
But this article should give you the basics, and from now on you can use it as a reference for monitoring metrics and creating the alarms you need.