This is the first part of a series on the AWS Storage Gateway service.

The series starts with introductory concepts and then focuses on showing how to use the service.

In this article we will discuss:

  • What AWS Storage Gateway is
  • Volume gateway architecture
  • Gateway-VTL architecture
  • Volume gateways

AWS Storage Gateway provides integration between the on-premises IT environment and the AWS storage infrastructure. Users can store data in the AWS cloud and benefit from the data security features that AWS provides. Two storage solutions are available: volume-based and tape-based.

Volume gateways provide cloud-backed storage volumes that on-premises application servers can mount as Internet Small Computer System Interface (iSCSI) devices. The volumes can be configured in two ways:

  • gateway-cached volumes – the data is stored in Amazon S3, and the most frequently accessed data is also kept locally for low-latency access. This type of configuration lets you grow storage without expanding the local infrastructure.
  • gateway-stored volumes – the complete data set is kept locally and only snapshots of the data are stored in Amazon S3. The snapshots stored in Amazon S3 can be used to recover the data either to the local data center or to Amazon EC2.

The other storage solution is the gateway-virtual tape library (gateway-VTL). The data is stored in Amazon Glacier. Gateway-VTL is the virtual equivalent of a physical tape infrastructure. Because it is a virtual solution, it scales easily and removes the provisioning, scaling and maintenance work that a physical tape infrastructure requires.

The AWS Storage Gateway can run either on-premises as a virtual machine (VM) or as an EC2 instance directly in AWS.

Let’s look at the volume gateway architecture in more detail, starting with gateway-cached volumes. A gateway-cached volume can be between 1GB and 32TB in size. There can be up to 20 volumes with a total storage of 150TB.

This is an overview of the gateway-cached volumes deployment:

Two disks must be allocated to the deployed gateway VM:

  • Cache storage disk – this is the disk where data is initially stored when it is written to the storage volumes in AWS. Data on the cache storage disk waits there until it is uploaded to Amazon S3 from the upload buffer. The cache storage disk also keeps the most recently accessed data for low-latency access: when the application needs data, the cache storage disk is checked first, before Amazon S3. There are two best-practice guidelines for sizing the cache storage disk: it should be at least 20 percent of the existing file store, and it should be larger than the upload buffer. The latter guideline ensures that the cache storage disk can hold all the data in the upload buffer that has not yet been uploaded to Amazon S3.
  • Upload buffer disk – this is the staging area where the data is stored before it is uploaded to Amazon S3. The storage gateway uploads the data from the upload buffer over an SSL connection to AWS. Then the data is stored encrypted in Amazon S3.
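
The two cache sizing guidelines above can be turned into a quick sanity check. The sketch below is only an illustration: the function name and the example sizes are hypothetical, and all sizes are assumed to be whole gigabytes.

```python
def cache_sizing_problems(cache_gb, file_store_gb, upload_buffer_gb):
    """Check a proposed cache disk size against the two guidelines.

    Returns a list of human-readable problems; an empty list means the
    proposed size satisfies both guidelines. All sizes are in GB.
    """
    problems = []
    # Guideline 1: at least 20 percent of the existing file store.
    # Integer arithmetic avoids floating-point rounding surprises.
    if cache_gb * 100 < 20 * file_store_gb:
        problems.append("cache is smaller than 20 percent of the file store")
    # Guideline 2: strictly larger than the upload buffer.
    if cache_gb <= upload_buffer_gb:
        problems.append("cache is not larger than the upload buffer")
    return problems


# Hypothetical example: 1000 GB file store, 150 GB upload buffer.
print(cache_sizing_problems(cache_gb=200, file_store_gb=1000, upload_buffer_gb=150))
print(cache_sizing_problems(cache_gb=100, file_store_gb=1000, upload_buffer_gb=150))
```

The first call reports no problems; the second violates both guidelines, because 100 GB is below 20 percent of 1000 GB and is also smaller than the upload buffer.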

It is possible to back up the storage volumes to Amazon S3. The backups are incremental and are called snapshots. The snapshots are stored in Amazon S3 as Amazon Elastic Block Store (EBS) snapshots. Incremental backup means that a new snapshot backs up only the data that has changed since the last snapshot. Snapshots can be taken either on a schedule or on demand.
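
To make the idea of incremental snapshots concrete, here is a small toy model. It is not how the gateway implements snapshots internally; it assumes a volume can be represented as a dictionary of block numbers to contents, and the function names are made up for this illustration. Deleted blocks are ignored to keep the sketch short.

```python
def take_snapshot(volume, previous_state):
    """Return only the blocks that changed since the previous snapshot."""
    return {
        block: data
        for block, data in volume.items()
        if previous_state.get(block) != data
    }


def restore(snapshots):
    """Rebuild the volume by applying the snapshots in order, oldest first."""
    volume = {}
    for snapshot in snapshots:
        volume.update(snapshot)
    return volume


# Hypothetical volume with three blocks.
volume = {0: b"boot", 1: b"logs-v1", 2: b"data"}

first = take_snapshot(volume, previous_state={})  # first snapshot: every block
state_at_first = dict(volume)

volume[1] = b"logs-v2"  # only block 1 changes
second = take_snapshot(volume, previous_state=state_at_first)

print(sorted(first))   # blocks stored by the first snapshot
print(sorted(second))  # blocks stored by the second snapshot
```

The second snapshot contains just the one changed block, yet applying both snapshots in order reproduces the full current volume.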

Next is the gateway-stored volume architecture. A gateway-stored volume can be between 1GB and 16TB in size. There can be up to 12 volumes per gateway, for a maximum total volume storage of 192TB.
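
The limits quoted in this article for the two volume types can be collected into a small planning helper. This is a sketch only: the names are hypothetical, the numbers are the ones given in this article, and it assumes 1TB = 1024GB.

```python
from dataclasses import dataclass

GB_PER_TB = 1024  # assumption: binary units


@dataclass(frozen=True)
class VolumeLimits:
    min_volume_gb: int
    max_volume_gb: int
    max_volumes: int
    max_total_gb: int


# Limits as quoted in this article.
LIMITS = {
    "cached": VolumeLimits(1, 32 * GB_PER_TB, 20, 150 * GB_PER_TB),
    "stored": VolumeLimits(1, 16 * GB_PER_TB, 12, 192 * GB_PER_TB),
}


def check_plan(kind, volume_sizes_gb):
    """Return a list of limit violations for the planned volumes (sizes in GB)."""
    limits = LIMITS[kind]
    problems = []
    if len(volume_sizes_gb) > limits.max_volumes:
        problems.append(f"too many volumes: {len(volume_sizes_gb)} > {limits.max_volumes}")
    for index, size in enumerate(volume_sizes_gb):
        if not limits.min_volume_gb <= size <= limits.max_volume_gb:
            problems.append(f"volume {index} has an invalid size: {size} GB")
    if sum(volume_sizes_gb) > limits.max_total_gb:
        problems.append(f"total size {sum(volume_sizes_gb)} GB exceeds {limits.max_total_gb} GB")
    return problems


# Twelve 16TB gateway-stored volumes exactly reach the 192TB total.
print(check_plan("stored", [16 * GB_PER_TB] * 12))
# Five 32TB gateway-cached volumes add up to 160TB, above the 150TB total.
print(check_plan("cached", [32 * GB_PER_TB] * 5))
```

Note that for gateway-cached volumes the total limit is reached well before 20 volumes of the maximum size, so the per-volume limit and the total limit have to be checked separately.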

This is the gateway-stored volume deployment:

Once the VM has been activated, the gateway volumes can be created and mapped to the on-premises direct-attached storage or storage area network disks.

The storage volumes are mounted as iSCSI devices on the on-premises application servers. When the applications read from or write to the gateway storage volumes, they actually read and write the data on the mapped on-premises disks.

Before the data is uploaded to Amazon S3, it is stored in a staging area called the upload buffer. As with gateway-cached volumes, the data is uploaded to Amazon S3 over an SSL connection and is stored in encrypted form in Amazon S3.

As with the gateway-cached volumes deployment, you can take snapshots on a schedule or on demand.

The last storage gateway architecture is the gateway-VTL architecture. It lets you extend an existing tape-based backup infrastructure by storing the data on virtual tapes that are created by the gateway-VTL. Each gateway-VTL comes preconfigured with a media changer and tape drives that are made available as iSCSI devices to the existing backup application. A media changer is the virtual equivalent of the robot that moves the tapes between a physical library’s storage slots and tape drives. A VTL drive is the virtual equivalent of a physical tape drive that can perform I/O operations.
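
The relationship between storage slots, tape drives and the media changer can be pictured with a toy model. The class below is purely illustrative: the names, the barcode and the slot and drive counts are all made up, and it says nothing about how gateway-VTL is implemented.

```python
class VirtualTapeLibrary:
    """Toy model of a tape library: slots, drives and media changer moves."""

    def __init__(self, slot_count, drive_count):
        self.slots = [None] * slot_count    # tapes at rest
        self.drives = [None] * drive_count  # tapes available for I/O

    def add_tape(self, barcode):
        """Place a new tape in the first free slot and return the slot index."""
        slot = self.slots.index(None)  # raises ValueError if every slot is full
        self.slots[slot] = barcode
        return slot

    def load(self, slot, drive):
        """Media changer move: storage slot -> tape drive."""
        if self.slots[slot] is None:
            raise ValueError("slot is empty")
        if self.drives[drive] is not None:
            raise ValueError("drive is busy")
        self.drives[drive] = self.slots[slot]
        self.slots[slot] = None

    def unload(self, drive, slot):
        """Media changer move: tape drive -> storage slot."""
        if self.drives[drive] is None:
            raise ValueError("drive is empty")
        if self.slots[slot] is not None:
            raise ValueError("slot is occupied")
        self.slots[slot] = self.drives[drive]
        self.drives[drive] = None


library = VirtualTapeLibrary(slot_count=3, drive_count=1)
slot = library.add_tape("TAPE01")  # hypothetical barcode
library.load(slot, drive=0)        # the backup application can now read or write
library.unload(drive=0, slot=slot)
```

The backup application only ever asks the media changer to move a tape and then performs I/O against a drive, which is why it can work with virtual tapes without being changed.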

Let’s talk about the volume gateways.

The gateway-cached deployment allows the user to create storage volumes and mount them as iSCSI devices from the application server. As mentioned, the gateway stores the volume data in Amazon S3 and keeps only the frequently accessed data on the on-premises storage hardware.
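
The read and write path of a gateway-cached volume can be sketched as a small simulation. This is a simplified mental model, not the gateway's real behavior: the class and method names are invented, Amazon S3 is replaced by a dictionary, and the least-recently-used eviction policy is an assumption made for the example.

```python
from collections import OrderedDict


class CachedVolumeModel:
    """Toy model of a gateway-cached volume."""

    def __init__(self, cache_blocks):
        self.cache_blocks = cache_blocks  # cache capacity, in blocks
        self.cache = OrderedDict()        # block -> data, least recently used first
        self.upload_buffer = {}           # written but not yet uploaded
        self.s3 = {}                      # stand-in for Amazon S3

    def write(self, block, data):
        # New data lands on the local disks and is staged for upload.
        self.upload_buffer[block] = data
        self._cache_put(block, data)

    def upload(self):
        # Move everything that is staged to "S3" and empty the buffer.
        self.s3.update(self.upload_buffer)
        self.upload_buffer.clear()
        self._evict()

    def read(self, block):
        # The cache is checked first; S3 is used only on a miss.
        if block in self.cache:
            self.cache.move_to_end(block)
            return self.cache[block], "cache"
        data = self.s3[block]
        self._cache_put(block, data)
        return data, "s3"

    def _cache_put(self, block, data):
        self.cache[block] = data
        self.cache.move_to_end(block)
        self._evict()

    def _evict(self):
        # Drop least recently used blocks, but never data still awaiting upload.
        for block in list(self.cache):
            if len(self.cache) <= self.cache_blocks:
                break
            if block not in self.upload_buffer:
                del self.cache[block]


volume = CachedVolumeModel(cache_blocks=2)
for block, data in [(1, "a"), (2, "b"), (3, "c")]:
    volume.write(block, data)
volume.upload()
print(volume.read(3))  # recently written, served locally
print(volume.read(1))  # evicted from the cache, fetched from "S3"
```

In this model the cache is allowed to grow past its capacity while data is still waiting in the upload buffer, which mirrors the guideline that the cache storage disk should be larger than the upload buffer.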

The gateway-stored deployment allows the user to keep all the data locally and to take incremental backups (snapshots) that are stored in Amazon S3.

Deploying the storage gateway comes with a long list of requirements. I won’t go over the details here; you can find them in the Reference section at the end of the article.

This concludes the introductory article of the series. You should now have a good understanding of the basic AWS Storage Gateway terms and of the three Storage Gateway architectures.

In the next two articles, we will see how to deploy a gateway-cached volume architecture using VMware ESXi as the host for the Storage Gateway VM, and how to make use of the storage volume created in Amazon S3.