AWS debuts cheap and slow Glacier Deep Archive

Amazon Web Services has announced its Glacier Deep Archive is available at about $1/TB/month, the lowest data storage cost in the cloud.

AWS claims this is significantly cheaper than storing and maintaining data in on-premises magnetic tape libraries or archiving data off-site. Of course data retrieval is faster from an on-site tape.

Glacier Deep Archive was previewed in November last year. The service has eleven ‘nines’ data durability – 99.999999999 per cent. Ordinary data can be retrieved within 12 hours, while bulk data at the PB level can take up to 48 hours. The data thaw time is s-l-o-w.

Retrieval from Glacier Deep Archive costs $0.02/GB at the standard rate or $0.0025/GB at the bulk rate.
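To put those per-GB rates in context, here is a minimal sketch of the arithmetic. It uses only the two GDA rates quoted above; the function name, the decimal TB-to-GB conversion and the 10 TB example are illustrative assumptions, and per-request and data transfer charges are left out.

```python
GB_PER_TB = 1000  # assumption: decimal units

# Per-GB retrieval rates for Glacier Deep Archive as quoted in this article.
GDA_RETRIEVAL_USD_PER_GB = {"standard": 0.02, "bulk": 0.0025}


def gda_retrieval_cost(terabytes: float, tier: str = "standard") -> float:
    """Return the per-GB retrieval charge in USD (request and transfer fees excluded)."""
    return terabytes * GB_PER_TB * GDA_RETRIEVAL_USD_PER_GB[tier]


# Hypothetical example: pulling back 10 TB at each rate.
for tier in GDA_RETRIEVAL_USD_PER_GB:
    print(f"{tier}: ${gda_retrieval_cost(10, tier):,.2f} to retrieve 10 TB")
```

At these rates the bulk option costs one eighth of the standard one, which is the trade-off for the longer wait.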

Bulk is a retrieval speed option. The AWS S3 FAQ says: “There are three ways to restore data from Amazon S3 Glacier – Expedited, Standard, and Bulk Retrievals – and each has a different per-GB retrieval fee and per-archive request fee (i.e. requesting one archive counts as one request).”

For S3 Glacier in AWS’s US East (Ohio) region, standard retrievals cost $0.01/GB, expedited retrievals $0.03/GB and bulk retrievals $0.0025/GB. Expedited retrievals typically complete in five minutes, standard in five hours and bulk in 12 hours.
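Choosing between the three options is a price-versus-deadline decision. The sketch below picks the cheapest option whose typical completion time fits a deadline, using only the figures quoted above. The function name and table layout are illustrative, and "typical" times are not guarantees.

```python
from typing import Optional

# (price in USD per GB, typical completion time in hours), as quoted in this article.
GLACIER_TIERS = {
    "expedited": (0.03, 5 / 60),
    "standard": (0.01, 5.0),
    "bulk": (0.0025, 12.0),
}


def cheapest_tier(deadline_hours: float) -> Optional[str]:
    """Return the cheapest tier that typically completes within the deadline, or None."""
    candidates = [
        (price, name)
        for name, (price, hours) in GLACIER_TIERS.items()
        if hours <= deadline_hours
    ]
    return min(candidates)[1] if candidates else None


# Hypothetical deadlines of one day, six hours and one hour.
for deadline in (24, 6, 1):
    print(deadline, "hours ->", cheapest_tier(deadline))
```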

Check out AWS GDA pricing details here.

Data retrieval

Amazon says GDA is suitable for data that is accessed once or twice a year, with either 12- or 48-hour latencies to the first byte.

The Restore (Retrieval) request, made through an API call or the S3 management console, creates a temporary copy of the data and leaves the GDA-held data intact. The thawed-out GDA data is not streamed directly to you, whether on-premises or inside AWS, for use as it comes in. You have to wait for the copy to be made.

The copy is accessed through an S3 GET request and you can set a limit for the retention of this temporary copy.
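As a rough illustration of the restore step, the sketch below builds the parameters that boto3's S3 client method `restore_object` accepts. The bucket name, object key and helper function are hypothetical, and the code only constructs and prints the request rather than contacting AWS. Deep Archive restores offer the Standard and Bulk tiers, which correspond to the 12- and 48-hour latencies mentioned above.

```python
# Restore tiers available for Glacier Deep Archive; Expedited is not offered for this class.
GDA_TIERS = {"Standard", "Bulk"}


def build_restore_request(bucket: str, key: str, days: int, tier: str = "Standard") -> dict:
    """Build keyword arguments for boto3's S3 restore_object call (hypothetical helper)."""
    if tier not in GDA_TIERS:
        raise ValueError(f"unsupported tier for Deep Archive: {tier}")
    if days < 1:
        raise ValueError("days must be at least 1")
    return {
        "Bucket": bucket,
        "Key": key,
        "RestoreRequest": {
            "Days": days,  # retention limit for the temporary copy
            "GlacierJobParameters": {"Tier": tier},
        },
    }


# Hypothetical bucket and key names, for illustration only.
request = build_restore_request("example-archive-bucket", "backups/2019-03.tar", days=7, tier="Bulk")
print(request)

# With boto3 installed and credentials configured, the request would be sent as:
#   import boto3
#   boto3.client("s3").restore_object(**request)
# Once the restore completes, the temporary copy is read with an ordinary GET (get_object).
```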

Data is uploaded to GDA through an S3 PUT request, or via the AWS management console, Direct Connect, Storage Gateway, the Command Line Interface or an SDK.
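For the SDK route, a PUT lands in GDA when the storage class is set on the request. The sketch below builds the arguments for boto3's `put_object` with `StorageClass` set to `DEEP_ARCHIVE`. The helper, bucket name and key are hypothetical, and nothing is sent to AWS.

```python
def build_archive_upload(bucket: str, key: str, body: bytes) -> dict:
    """Build keyword arguments for boto3's S3 put_object call (hypothetical helper)."""
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        "StorageClass": "DEEP_ARCHIVE",  # write the object straight into Glacier Deep Archive
    }


# Hypothetical names and payload, for illustration only.
upload = build_archive_upload("example-archive-bucket", "backups/2019-03.tar", b"example payload")
print({k: v for k, v in upload.items() if k != "Body"})

# With boto3 installed and credentials configured:
#   import boto3
#   boto3.client("s3").put_object(**upload)
```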

Tape Gateway

The Tape Gateway, part of AWS’s Storage Gateway, presents a virtual tape library (VTL) backed by the AWS cloud, with virtual tapes despatched to GDA from your on-premises system. The Tape Gateway code is deployed as a virtual or hardware appliance. No changes are needed to existing backup workflows. The backup application is connected to the VTL on the Tape Gateway and streams backup data to this target device.

Commvault is one backup application that definitely supports the Glacier Deep Archive. Veritas is another.

The Tape Gateway compresses and encrypts the data and sends it off to a VTL in Glacier or the Glacier Deep Archive.

A cache on the Tape Gateway ensures recent backups remain local, reducing restore times.

S3 storage classes

As a reminder, AWS offers:

  • Simple Storage Service (S3) Standard
  • S3 Intelligent-Tiering
  • S3 Standard Infrequent Access (IA)
  • S3 One Zone IA
  • S3 Glacier for archive
  • S3 Glacier Deep Archive

You can use S3 Lifecycle policies to transfer data between any of the S3 Storage Classes for active data (S3 Standard, S3 Intelligent-Tiering, S3 Standard-IA, and S3 One Zone-IA) and S3 Glacier.
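A lifecycle policy is a set of rules attached to a bucket. The sketch below builds one such rule in the shape boto3's `put_bucket_lifecycle_configuration` expects, moving objects under a prefix to S3 Glacier after a set number of days. The rule ID, prefix, 90-day threshold and bucket name are hypothetical; `DEEP_ARCHIVE` is the storage class value to pass for GDA instead.

```python
def archive_rule(rule_id: str, prefix: str, after_days: int, storage_class: str = "GLACIER") -> dict:
    """Build one lifecycle rule that transitions matching objects to an archive class."""
    return {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [{"Days": after_days, "StorageClass": storage_class}],
    }


# Hypothetical rule: move objects under backups/ to S3 Glacier 90 days after creation.
config = {"Rules": [archive_rule("archive-old-backups", "backups/", 90)]}
print(config)

# With boto3 installed and credentials configured:
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-archive-bucket", LifecycleConfiguration=config
#   )
```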

Net:net

Should you give up your on-site tape library? Blocks & Files thinks this needs careful analysis of the costs of tape library storage and retrieval, with retrieval speed taken into account.

[Image: SpectraLogic TFinity ExaScale library]

There is a crossover point that depends on how much data you store, how often you retrieve it and how much you retrieve each time. Calculating that crossover point for your tape installation is key. Although your on-site tape library costs may be predictable into the future, AWS’s GDA costs are driven by marketing needs. They may dip a little over the next year or three and then rise again if AWS grabs enough tape library market share.
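A back-of-the-envelope version of that crossover calculation might look like the sketch below. The GDA side uses the prices quoted in this article ($1/TB/month storage, $0.02/GB standard and $0.0025/GB bulk retrieval). The tape side is a deliberately crude model, a fixed annual cost plus a per-TB annual cost, and every tape figure in the example is a made-up placeholder to be replaced with your own numbers. Request fees, data transfer charges and retrieval speed are ignored.

```python
from typing import Optional

GB_PER_TB = 1000  # assumption: decimal units
GDA_STORAGE_USD_PER_TB_MONTH = 1.0  # as quoted in this article
GDA_RETRIEVAL_USD_PER_GB = {"standard": 0.02, "bulk": 0.0025}


def gda_annual_per_tb(retrievals_per_year: float, fraction_retrieved: float, tier: str) -> float:
    """Annual GDA cost in USD per stored TB: storage plus per-GB retrieval charges."""
    storage = 12 * GDA_STORAGE_USD_PER_TB_MONTH
    retrieval = retrievals_per_year * fraction_retrieved * GB_PER_TB * GDA_RETRIEVAL_USD_PER_GB[tier]
    return storage + retrieval


def crossover_tb(tape_fixed_usd: float, tape_per_tb_usd: float, gda_per_tb_usd: float) -> Optional[float]:
    """Capacity above which the modelled tape library is cheaper than GDA, or None if never."""
    if gda_per_tb_usd <= tape_per_tb_usd:
        return None  # GDA stays cheaper at every capacity under this model
    return tape_fixed_usd / (gda_per_tb_usd - tape_per_tb_usd)


# Hypothetical inputs: two standard retrievals a year of 10% of the archive each,
# and a tape library costing $50,000/year fixed plus $4/TB/year.
per_tb = gda_annual_per_tb(retrievals_per_year=2, fraction_retrieved=0.1, tier="standard")
point = crossover_tb(tape_fixed_usd=50_000, tape_per_tb_usd=4.0, gda_per_tb_usd=per_tb)
print(f"GDA: ${per_tb:.2f}/TB/year; crossover at about {point:,.0f} TB")
```

Below the crossover capacity GDA is the cheaper option under this model, and above it the tape library is. Heavier or more frequent retrievals raise the GDA per-TB figure and pull the crossover down.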

Monopolists see little need to lower prices. Of course, GDA is a good idea if you don’t have an on-site tape library and need to store data long-term with one or two accesses a year.