In the fifth part of the series covering the AWS Storage Gateway, we will discuss how to monitor the gateway. We will see how to:
Measure the performance between the application and the gateway
Measure the performance between the gateway and AWS
Using the CloudWatch metrics, one can track the health of the gateway and set up alarms that fire when the metrics cross defined thresholds.
The monitoring can be done with one of two tools: the AWS Management Console or the Amazon CloudWatch API. The AWS Management Console presents the data as graphs built from Amazon CloudWatch data. Alternatively, the data can be gathered directly from the API using the AWS SDKs or the CloudWatch command line tools.
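For readers who prefer the API route, here is a minimal sketch of pulling a Storage Gateway metric directly from the CloudWatch API via the boto3 SDK. The gateway name "my-gateway" is a placeholder, and the actual API call is commented out because it needs AWS credentials:

```python
import datetime

# Query window: the last hour, in 5-minute (300-second) periods.
end = datetime.datetime.now(datetime.timezone.utc)
start = end - datetime.timedelta(hours=1)

# "my-gateway" is a placeholder; substitute your own GatewayName
# (or use a VolumeId dimension for volume metrics).
params = dict(
    Namespace="AWS/StorageGateway",
    MetricName="ReadBytes",
    Dimensions=[{"Name": "GatewayName", "Value": "my-gateway"}],
    StartTime=start,
    EndTime=end,
    Period=300,
    Statistics=["Sum"],
)

# With AWS credentials configured:
# import boto3
# datapoints = boto3.client("cloudwatch").get_metric_statistics(**params)["Datapoints"]
```

The same request shape works for every metric discussed below; only MetricName, Dimensions and Statistics change.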
As with other AWS services, there is information specific to AWS Storage Gateway that helps identify the correct metrics. A metric dimension is a name-value pair that identifies a metric; for AWS Storage Gateway, the dimensions are GatewayId, GatewayName and VolumeId. To select gateway-specific or volume-specific dimensions, the CloudWatch console provides two views: Gateway Metrics and Volume Metrics. The other identifying characteristic of a metric is the metric name.
GatewayId and GatewayName identify a gateway. When the throughput and latency of a gateway are measured, all of its volumes are considered. Data is collected in 5-minute periods.
VolumeId identifies a single storage volume; each volume has its own VolumeId. Again, the data is collected in 5-minute periods.
To better understand how a gateway is performing, there are a few useful measures: data throughput, data latency and operations per second. Using the metrics provided by AWS Storage Gateway, these values can be measured accurately if the correct aggregation statistic is used. A statistic is a metric that has been aggregated over a certain period of time.
To see these values, use the Sum statistic for data throughput, the Average statistic for data latency, and the Data Samples statistic (SampleCount in the CloudWatch API) for operations per second.
To measure throughput, use the ReadBytes and WriteBytes metrics. Because each metric returns a value aggregated over a 5-minute interval, the value must be divided by 300 seconds to obtain the throughput.
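The division can be sketched as a small helper. The 8,088,317,952-byte figure used here is just a sample 5-minute WriteBytes Sum for illustration:

```python
def throughput_bytes_per_sec(read_bytes_sum, write_bytes_sum, period_seconds=300):
    """Convert the 5-minute Sum of ReadBytes/WriteBytes into bytes per second."""
    return (read_bytes_sum + write_bytes_sum) / period_seconds

# A 5-minute WriteBytes sum of 8,088,317,952 bytes works out to:
bps = throughput_bytes_per_sec(0, 8_088_317_952)
print(f"{bps:,.0f} B/s = {bps / 1024 / 1024:.1f} MB/s")  # → 26,961,060 B/s = 25.7 MB/s
```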
To measure latency, use the ReadTime and WriteTime metrics with the Average statistic.
To measure operations per second, use the same ReadBytes and WriteBytes metrics as for throughput, but keep in mind that a different aggregation statistic is used: Data Samples (SampleCount in the CloudWatch API).
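As a sketch, the only difference from the throughput calculation is that the input is a sample count rather than a byte count; the 31,373 figure is a sample Data Samples value for illustration:

```python
def ops_per_second(sample_count, period_seconds=300):
    """Convert the 5-minute SampleCount of ReadBytes/WriteBytes into operations/s."""
    return sample_count / period_seconds

print(round(ops_per_second(31373)))  # → 105
```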
So let’s see in action some metrics that measure the performance between the application and the gateway.
First, to see the metrics, you will need to connect to the CloudWatch console. To see which metrics are available, select “Storage Gateway” from the left-side menu:
As you can see, there are volume metrics and gateway metrics. Below are a few of the volume metrics:
Let’s see the data latency, the throughput and the operations per second values.
This is for throughput:
So we have 8,088,317,952 bytes over 300 seconds. That is 26,961,059 bytes per second, or roughly 25.7MB/s.
Let’s see the latency when the data was written:
As you can see, the latency was 26.5 ms.
Let’s see how many operations per second were done during the backing up.
So there were 31,373 samples (operations). Divided by 300, that is around 105 operations per second.
Let’s move on and measure the performance between the gateway and AWS. We will see lower values here, as the traffic has to traverse the Internet connection and, depending on the available bandwidth, the data is written slower or faster.
Besides the measures mentioned previously, the performance between the gateway and AWS also considers the throughput to AWS and the latency to AWS.
The throughput to AWS uses the CloudBytesDownloaded and CloudBytesUploaded metrics with Sum as the aggregation statistic. Again, the reported value must be divided by 300.
The latency to AWS uses the CloudDownloadLatency metric with the Average statistic.
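The same style of helper works for the throughput to AWS; the 285,111,778-byte figure used here is a sample 5-minute CloudBytesDownloaded Sum for illustration:

```python
def cloud_throughput_mb_per_sec(uploaded_sum, downloaded_sum, period_seconds=300):
    """Convert 5-minute Sums of CloudBytesUploaded/CloudBytesDownloaded into MB/s."""
    return (uploaded_sum + downloaded_sum) / period_seconds / (1024 * 1024)

print(f"{cloud_throughput_mb_per_sec(0, 285_111_778):.1f} MB/s")  # → 0.9 MB/s
```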
Let’s check the throughput to AWS:
So we have 285,111,778 bytes over 300 seconds, which is 950,373 bytes per second, or about 0.9MB/s.
Let’s check the latency:
So we have 1,742 ms of latency, which is almost 2 seconds.
Similarly we can monitor the upload buffer and the cache storage.
To get relevant data about the performance of the upload buffer, the upload buffer usage is measured using the UploadBufferPercentUsed, UploadBufferUsed and UploadBufferFree metrics with the Average statistic.
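As a quick sanity check, UploadBufferPercentUsed should agree with a value derived from the other two metrics; the sketch below assumes it is simply used / (used + free), with the 30 GiB / 120 GiB inputs being made-up sample values:

```python
def upload_buffer_percent_used(used_bytes, free_bytes):
    """Derive the percentage the UploadBufferPercentUsed metric reports,
    assuming it equals used / (used + free), from the Average values of
    UploadBufferUsed and UploadBufferFree (both in bytes)."""
    return 100.0 * used_bytes / (used_bytes + free_bytes)

# Sample values: 30 GiB used, 120 GiB free.
print(f"{upload_buffer_percent_used(30 * 2**30, 120 * 2**30):.0f}%")  # → 20%
```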
To get relevant data about the performance of the cache storage, a few things are measured:
Total cache usage: the CachePercentUsed and TotalCacheSize metrics with the Average statistic are used
The percentage of read requests served from the cache: the CacheHitPercent metric with the Average statistic is used. Ideally, CacheHitPercent should stay high
The percentage of the cache that is dirty (holds content that wasn’t yet uploaded): the CachePercentDirty metric with the Average statistic is used. Ideally, CachePercentDirty should stay low
As mentioned, alarms can be configured to send notifications in case some of the metrics go above or below the specified thresholds.
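As an illustration of such an alarm, here is a sketch that creates a CloudWatch alarm on the upload buffer usage via boto3. The alarm name, gateway name, 80% threshold and SNS topic ARN are all placeholders to be adapted to your deployment:

```python
# Sketch of a CloudWatch alarm on the upload buffer usage.
# All names, the 80% threshold and the SNS topic ARN are placeholders.
alarm = dict(
    AlarmName="storage-gateway-upload-buffer-high",
    Namespace="AWS/StorageGateway",
    MetricName="UploadBufferPercentUsed",
    Dimensions=[{"Name": "GatewayName", "Value": "my-gateway"}],
    Statistic="Average",
    Period=300,           # one 5-minute datapoint
    EvaluationPeriods=2,  # two consecutive datapoints above the threshold
    Threshold=80.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:my-topic"],
)

# With AWS credentials configured:
# import boto3
# boto3.client("cloudwatch").put_metric_alarm(**alarm)
```

A similar alarm with LessThanThreshold on CacheHitPercent would catch a cache that stops serving reads effectively.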
And we have reached the end of this article, together with the end of the series.
It was a pretty long series with a lot of notions discussed and a lot of information shared. Let’s recap what we did throughout the series.
First, we discussed basic notions about the types of architectures and their specifics. Then, in the second part, we saw how to deploy a gateway on-premises using a VM. In the third part, we finalized the gateway activation and used an iSCSI initiator to mount the storage volume on a Windows 7 machine. In the fourth part, we discussed snapshots and how they can be restored. In this last part of the series, we saw how the gateway’s performance can be monitored.
I hope you enjoyed the series, and I advise you to go through the links in the references section of each part for more details that you can use for your specific deployment.