Amazon has announced ways of accessing stored data faster with new options for its Redshift data warehouse such as caching, storage tiering and federated queries.
The move comes, AWS says, as customers store more and more data and want to process it faster and at lower cost.
The Redshift upgrade is one of multiple storage announcements delivered this week at re:Invent in Las Vegas. Here is our round-up.
Redshift data warehouse
Redshift Managed Storage uses large, high-performance SSDs in each Redshift RA3 instance for fast local storage and Amazon S3 for longer-term durable storage. If the data in an instance grows beyond the size of the SSD storage, Redshift Managed Storage automatically offloads that data to S3. Customers pay the same rate for Redshift Managed Storage whether the data is in SSD or S3. They also only pay for the amount of SSD storage they use.
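The offload behaviour described above can be pictured with a toy model. This is a minimal sketch, not AWS's implementation: the `TieredStore` class, its block-count capacity and the least-recently-used eviction rule are all assumptions made for illustration, since AWS has not published how Redshift Managed Storage chooses what to move.

```python
from collections import OrderedDict


class TieredStore:
    """Toy model: a small fast tier ("ssd") that spills to a large one ("s3").

    Hypothetical illustration only; the eviction policy is an assumption.
    """

    def __init__(self, ssd_capacity_blocks: int) -> None:
        self.ssd_capacity_blocks = ssd_capacity_blocks
        self.ssd = OrderedDict()  # block_id -> data, least recently used first
        self.s3 = {}              # block_id -> data

    def write(self, block_id: str, data: bytes) -> None:
        self.s3.pop(block_id, None)      # drop any stale copy in the cold tier
        self.ssd[block_id] = data
        self.ssd.move_to_end(block_id)   # mark as most recently used
        self._offload_if_full()

    def read(self, block_id: str) -> bytes:
        if block_id in self.ssd:
            self.ssd.move_to_end(block_id)
            return self.ssd[block_id]
        data = self.s3.pop(block_id)     # KeyError if the block is unknown
        self.write(block_id, data)       # promote back to the fast tier
        return data

    def _offload_if_full(self) -> None:
        # Move the least recently used blocks out once the fast tier overflows.
        while len(self.ssd) > self.ssd_capacity_blocks:
            old_id, old_data = self.ssd.popitem(last=False)
            self.s3[old_id] = old_data

    def billable_blocks(self) -> int:
        # The article says the rate is the same in either tier,
        # so only the total amount stored matters here.
        return len(self.ssd) + len(self.s3)


store = TieredStore(ssd_capacity_blocks=2)
for name in ("a", "b", "c"):
    store.write(name, name.encode())
```

After the three writes the oldest block, `a`, sits in the `s3` dictionary while `b` and `c` remain in the fast tier; the billable total is three blocks wherever they live.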
Existing Amazon Redshift customers using Dense Storage (DS2) instances will now get up to 2x better performance and 2x more storage capacity at the same cost. RA3 16xlarge instances are generally available today to support workloads with petabytes of data (up to 8 PB compressed), with RA3 4xlarge instances coming early next year.
RA3 instance customers can scale compute and storage separately, and get a claimed 3x better performance than other cloud data warehouse providers. We think this refers to Snowflake and Yellowbrick Data. It’s available now.
Redshift gets hardware-accelerated and distributed caching with AQUA (Advanced Query Accelerator), which AWS claims delivers up to 10x better query performance than other cloud data warehouse providers. It is layered on top of S3 and can scale out and process data in parallel across many nodes. A node’s hardware module has AWS-designed analytics processors to accelerate compression, encryption, and processing of filtering and aggregation functions. It will be available in mid-2020.
Redshift Data Lake Export allows customers to export data directly (an operation known as an unload) from the Redshift data warehouse to S3 (data lake) in Apache Parquet, an open data format optimised for analytics. AWS claims no other cloud data warehouse makes it as easy to both query data and write data back to a data lake in open formats. It’s available now.
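In Redshift SQL, an export of this kind is issued with the `UNLOAD` command and a `FORMAT AS PARQUET` clause. The helper below is a minimal sketch that only assembles such a statement as a string; the function name, table, bucket and IAM role ARN are hypothetical placeholders, and the quote-doubling rule should be checked against the Redshift documentation before use.

```python
def build_parquet_unload(select_sql: str, s3_prefix: str, iam_role_arn: str) -> str:
    """Return an UNLOAD statement that writes query results to S3 as Parquet.

    Hypothetical helper: it builds text only and does not connect to Redshift.
    """
    # UNLOAD takes the query as a quoted string, so single quotes inside the
    # query are doubled here (assumed escaping convention).
    escaped = select_sql.replace("'", "''")
    return (
        f"UNLOAD ('{escaped}')\n"
        f"TO '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS PARQUET;"
    )


statement = build_parquet_unload(
    "SELECT * FROM sales WHERE region = 'EU'",           # hypothetical table
    "s3://example-data-lake/sales/",                     # hypothetical bucket
    "arn:aws:iam::123456789012:role/ExampleUnloadRole",  # hypothetical role
)
print(statement)
```

The resulting text would be run through whatever SQL client the customer already uses against the cluster; the Parquet files then land under the given S3 prefix.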
Redshift Federated Query lets customers analyse stored data across Redshift data warehouse, S3 (Simple Storage Service), RDS and Aurora (PostgreSQL) databases. It is available in preview mode.
UltraWarm
UltraWarm offers a new warm storage tier for Elasticsearch Service at up to one-tenth the current cost. This makes it easier for customers to retain any amount of current and historical log data. UltraWarm offers a distributed cache for more frequently accessed data, while using placement techniques to determine which blocks of data are less frequently accessed and should be moved outside of the cache to Amazon S3.
It uses high-performance EC2 instances to interact with data stored in S3, providing a claimed 50 per cent faster query execution versus competing products. Customers can manage up to 3PB of log data within a single Elasticsearch Service cluster and can query across multiple clusters.
AWS said UltraWarm allows developers, devops engineers, and infosec experts to use Elasticsearch Service to analyse operational data spanning years, without needing to spend days restoring data from S3 or Glacier archives to an active searchable state in an Elasticsearch cluster.
Storing log data will now become less expensive: AWS says UltraWarm cuts the cost of storing the same amount of data in Elasticsearch Service by up to 90 per cent compared with today. It also claims the cost is 80 per cent lower than that of warm-tier storage from other managed Elasticsearch offerings. The service is currently available in preview.
Other AWS news
Amazon Managed (Apache) Cassandra Service is a scalable, highly available, and fully-managed database service that supports Cassandra workloads. Developers can use the same Cassandra application code, Apache 2.0 licensed drivers, and tools as they do today.
AWS has announced the general availability of AWS Outposts. These are fully managed and configurable compute and storage racks built with AWS-designed hardware that allow customers to run compute and storage on-premises, while seamlessly connecting to public cloud AWS.
An Outpost rack is conceptually equivalent to a converged infrastructure rack, such as Dell EMC’s PowerOne and VxBlock systems.
AWS is offering new services inside its existing infrastructure and also extending that infrastructure outwards with edge data centres, called Local Zones. These AWS data centres are closer to cities than the typical AWS regional centre.
They offer single-digit millisecond access latencies and have a high-bandwidth connection to the regional AWS data centre. Think of them as AWS regional edge data centres and as competition for third-party city colocation operations.
AWS Wavelength enables developers to build applications that deliver single-digit millisecond latencies to mobile devices and users by deploying AWS compute and storage at the edge of the 5G network. This is pretty similar to AWS Local Zones but specific to 5G. AWS is working with Verizon to make AWS Wavelength available across the United States.