
ScaleFlux computational storage drives beat vanilla SSDs at PostgreSQL

ScaleFlux’s computational storage drives have slightly better and much less variable multi-threaded performance on a PostgreSQL database application than ordinary SSDs. They also use up substantially less drive capacity than a vanilla SSD because of their integrated compression.

These are the findings of Percona, a database testing firm, which has published its evaluations in a white paper.

ScaleFlux CSD 2000 drives are SSDs with an integrated FPGA processing and compression engine that operates directly on the data stored in the drive. ScaleFlux contracted Percona to test a 4TB CSD 2000 against a 3.2TB Intel DC P4610 64-layer TLC SSD with PostgreSQL.

Robert Bernier, Percona’s PostgreSQL consultant, wrote: “At peak loading, the ScaleFlux CSD 2000 Drive demonstrated less performance variance than that of the Intel DC P4610” over time.

The significance here is “server predictability. This becomes important when, for example, finely tuned application processes depend upon consistent performance with the RDBMS,” Bernier wrote. “Many a time I’ve seen applications get upset when response times between inserting, updating, or deleting data and getting the resultant queries would suddenly change.”

He also noted “remarkable” space savings when the Postgres fillfactor was reduced.

The fillfactor for a table is a percentage that determines how full PostgreSQL packs each page, with the remainder left free for subsequent updates. Percona tested the CSD 2000 drive with fillfactors of 70 and 100 per cent and a range of thread counts: 1, 8, 16, 32, 64, 96, 128 and 256. The company measured transactions per second (TPS) and queries per second (QPS) for read-only, write-only and mixed read/write operations in each test run.

According to Bernier, fillfactor can “become a critical runtime parameter in regards to performance when high-frequency UPDATE and DELETE operations take place on the same tuple over and over again.” A tuple is, in effect, a row in a table.
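For reference, lowering the fillfactor on an existing PostgreSQL table is a one-line change. The sketch below shows the 70 per cent setting used in the tests; the connection string and table name are placeholders, not Percona’s actual test harness.

```python
# Illustrative only: apply a 70 per cent fillfactor to an existing table.
import psycopg2

conn = psycopg2.connect("dbname=benchdb user=postgres")  # placeholder DSN
conn.autocommit = True        # VACUUM FULL cannot run inside a transaction block
cur = conn.cursor()

# Pack heap pages only 70 per cent full, leaving room for future UPDATEs.
cur.execute("ALTER TABLE accounts SET (fillfactor = 70)")
# Rewrite the table so existing pages honour the new setting.
cur.execute("VACUUM FULL accounts")

cur.close()
conn.close()
```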

Two read/write TPS charts illustrate the ScaleFlux drive’s greater performance consistency at high thread counts, compared to the vanilla SSD: 

Note the narrower performance spread bands with the ScaleFlux drive. The fillfactor was 100.

A table shows the ScaleFlux drive’s lower capacity take-up:

The rightmost column shows the ScaleFlux drive’s compression ratios; 1.23:1 with the 100 fillfactor and 2.43:1 at the 70 fillfactor level. On the Intel drive, using the default setting of fillfactor 100, the database consumed 918GB per 1 million records. The CSD 2000 consumed 718GB per 1M records at that level and dropped to 515GB per 1 million records at fillfactor 70.
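A quick back-of-the-envelope reading of those per-million-record figures shows how the savings stack up against the Intel baseline:

```python
# Back-of-the-envelope check of the capacity table (GB per 1 million records).
intel_ff100 = 918   # Intel DC P4610, fillfactor 100
csd_ff100   = 718   # ScaleFlux CSD 2000, fillfactor 100
csd_ff70    = 515   # ScaleFlux CSD 2000, fillfactor 70

print(f"Space saved at fillfactor 100: {1 - csd_ff100 / intel_ff100:.0%}")  # ~22%
print(f"Space saved at fillfactor 70:  {1 - csd_ff70 / intel_ff100:.0%}")   # ~44%
```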

NetApp makes its big Kubernetes play with Astra


NetApp today officially released NetApp Astra, the company’s data management suite for Kubernetes workloads.

Astra is a SaaSy NetApp-managed service that protects, recovers and moves containerised applications. There is no software to download, install, manage, or upgrade.

Eric Han, NetApp’s product management VP for public cloud services, said in the launch statement: “Backup, cloning, disaster recovery, data lifecycle operations, data optimisation, compliance, and security are all critical to any organisation. Taken together, these challenges increase complexity. That’s directly at odds with Kubernetes’ goal of simpler, faster and more flexible application development and deployment – a vision that NetApp Astra will help to realise.”

Rushi NS, chief architect of SAP, is a fan: “Astra will make it dramatically easier to manage, protect, and move data-rich Kubernetes workloads across public clouds and on-premises. I look forward to collaborating with NetApp as they continue to develop Astra.” 

Astra details

Astra supports Kubernetes applications hosted on Google Kubernetes Engine clusters in Google Cloud, with NetApp’s fully-managed Cloud Volumes Service (CVS) for Google Cloud acting as the persistent storage provider. 

Astra screenshot.

Customers register their Kubernetes clusters with Astra. The service automatically discovers all applications running in these clusters and provisions storage and storage classes using NetApp’s Trident storage orchestrator. It then makes data management functions available for use. These include the following (a rough Trident provisioning sketch follows the list):

  • Protecting data with snapshots – to recover data inside a single Kubernetes cluster
  • Disaster recovery, using application-aware backup – backups can be sent to a remote Kubernetes cluster in the same region or a different one
  • Application migration with active clones – move entire applications and data to a different Kubernetes cluster, regardless of distance.
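Astra drives that provisioning itself, but for a sense of what a Trident-backed storage class looks like, here is a minimal sketch using the Kubernetes Python client. The class name and the backendType parameter are assumptions for illustration, not objects Astra actually creates.

```python
# Hypothetical sketch: a storage class backed by NetApp's Trident CSI provisioner.
from kubernetes import client, config

config.load_kube_config()    # use the local kubeconfig for the registered cluster

sc = client.V1StorageClass(
    metadata=client.V1ObjectMeta(name="cvs-standard"),     # placeholder name
    provisioner="csi.trident.netapp.io",                   # Trident CSI driver
    parameters={"backendType": "gcp-cvs"},                 # assumed CVS-for-GCP backend
    allow_volume_expansion=True,
)
client.StorageV1Api().create_storage_class(sc)
```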

Today’s launch marks the completion of the initial phase of NetApp’s Project Astra. The company said it will soon add support for Azure, AWS and on-premises environments. Astra will also support numerous persistent storage providers, application-awareness for popular cloud-native applications, and enhanced data management functionality.

Sayan Saha, NetApp’s senior director of product management, has laid out the direction of Astra development in a blog post: “We view Kubernetes as the next-generation cluster operating system that will run all workloads, both modern and traditional, over time.” 

NetApp Astra.

Comment

Kubernetes is heading to the heart of enterprise IT. As NetApp’s Sayan Saha writes: “Kubernetes is now the strategic platform of choice for running next-generation workloads that include CI/CD pipelines, scale-out database clusters, machine learning, financial risk modelling, genome sequencing, oil and gas exploration, and media processing.”

Data wrangling for Kubernetes-orchestrated containers is the new storage frontier. Ionir, Pure Storage’s Portworx, Veeam’s Kasten, Commvault’s Hedvig, Robin.io, StorageOS, SUSE’s Rancher, MayaData’s Mayastor, Rook and many others are building products and services to mine this very rich, new seam.

MemVerge delivers big genomic sequencing speed boost, with a little help from Optane

MemVerge has announced that its Big Memory Optane-boosted DRAM system delivered a 25X increase in genomic sequencing speed at Analytical BioSciences.

Analytical BioSciences (ABio) is a single cell genomics company focusing on accelerating the development of therapeutics with faster sequencing. Genomic sequencing is used to identify virus strains such as SARS-CoV-2 variants, and computing a single-cell RNA sequence can take many hours.

The process is compute-intensive and entails many stages. For instance, large matrices need to fit in the server memory; intermediate stage results are saved and then reloaded for other stages. This introduces storage and recovery bottlenecks that are exacerbated by stage repetition for parameter tuning. However, overall execution time drops by more than half when Optane Persistent Memory is used to increase overall memory capacity.
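As a toy illustration of that pattern (not ABio’s actual pipeline), each stage below produces an intermediate matrix that is either spilled to storage and reloaded, or kept resident in a large memory pool:

```python
# Illustrative only: intermediate results round-tripped through storage vs kept in memory.
import os, tempfile, time
import numpy as np

def stage(m):
    return m @ m.T            # stand-in for one compute stage

m = np.random.rand(2000, 2000)

# Pattern 1: spill the intermediate to storage, then reload it for the next stage.
t0 = time.time()
path = os.path.join(tempfile.gettempdir(), "intermediate.npy")
np.save(path, stage(m))
_ = stage(np.load(path))
print(f"via storage: {time.time() - t0:.2f}s")

# Pattern 2: keep the intermediate resident in memory (the Big Memory approach).
t0 = time.time()
_ = stage(stage(m))
print(f"in memory:   {time.time() - t0:.2f}s")
```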

Chris Kang, Head of Bioinformatics Operations at ABio, issued a statement: “The Big Memory platform that MemVerge and Intel developed accelerates our workflows and helps us generate results much faster, which will lead to more efficient ways to gain greater insights and knowledge in disease mechanisms and improve healthcare.”

Benchmarks

MemVerge and ABio compared single-cell sequencing runs on a server incorporating two 18-core Xeon Gold CPUs and 192GB of DRAM, and the same server equipped with MemVerge’s Memory Machine software plus 1,536GB of Optane DIMMs.

Memory Machine combines DRAM and Optane into a single memory pool. The matrix used for the test run was 31,787 x 813,348 cells in size, and we have charted and tabulated the results using data from the MemVerge case study.

The intermediate stage result data is stored and accessed in the Optane memory pool, instead of slow access storage drives, which means that overall and stage execution times are reduced. Step 5 (above) shows the greatest improvement, with the Optane-boosted system completing more than 25 times faster.

The overall execution time using only DRAM was 23,107 seconds (6.4 hours), while the Optane-equipped server completed the task in 9,015 seconds (2.5 hours), a 61 per cent reduction. The case study does not say what storage was used by the server, i.e. whether it was SSD (fast) or hard disk drives (slow).

Slow disk drive storage increases the DRAM-only execution times due to the relatively long time it takes to write intermediate results to the disks and then read them back.

MemVerge CEO Charles Fan said: “Until now, memory infrastructure did not offer a viable alternative to storage for genomic sequencing. Big Memory offers the same high-performance as DRAM at a dramatically lower cost, and with the persistence and agility needed for complex data pipelines.” 

By extension, other bioinformatics analyses that use large matrices derived from next-generation sequencing techniques can also be accelerated.

Commvault plays nicely with Qumulo at the backup party

Qumulo has integrated its scale-out file system with Commvault’s Backup & Recovery to store, manage and protect billions of files.

Commvault’s product backs up data from physical, virtual and containerised servers and stores backups on target appliances such as Commvault’s HyperScale X – and now Qumulo – or sends them to the public cloud.

Qumulo’s scale-out and parallel access Core system can store billions of files on-premises with its own flash and hybrid flash/disk filer hardware or in the public cloud. Qumulo’s Protect software provides snapshot and replication-based backup and disaster recovery for files stored on the Core system.

How they work together

  • Ability for Commvault B&R users to create, compare and delete snapshots stored on the Qumulo Core system
  • Management of file data changes in real-time via REST APIs to know which files within data sets have changed
  • Enabling backup jobs to begin in minutes instead of hours – up to 28 times faster than previously possible, the companies say
  • Management of backup and restore jobs in Commvault’s Command Centre facility, with job status available in real-time
  • Elimination of slow ‘tree-walks’ (file/folder scans) to identify file changes

Backup jobs can be scheduled to run at regular intervals, with file backup actions set up automatically by detecting changes in the data set. Ordinarily this detection requires a full folder scan of the data set, called a tree walk. In large data sets a tree walk can take so long that it exceeds the interval between scheduled backups, meaning changed files may not be backed up and putting data at risk. Qumulo’s software speeds up file system scans and avoids that problem.
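The contrast, in rough terms: a tree walk stats every file, while change tracking asks the file system directly. The REST endpoint below is a placeholder for illustration, not Qumulo’s actual API.

```python
# Illustrative contrast between a full tree walk and an API-driven change query.
import os
import requests

def tree_walk_changed(root, since):
    """Brute force: stat every file under root (slow with billions of files)."""
    changed = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > since:
                changed.append(path)
    return changed

def api_changed(base_url, token, since_snapshot):
    """Ask the file system which paths changed since a snapshot (hypothetical endpoint)."""
    r = requests.get(
        f"{base_url}/v1/changes",                       # placeholder path, not Qumulo's API
        params={"since": since_snapshot},
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    r.raise_for_status()
    return r.json()["paths"]
```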

The specific software versions for the integration are Commvault Backup & Recovery R11.22 and Qumulo Core v3.0.0.

What the companies say

Wenceslao Lada, Commvault’s VP for Technical Alliances, said the partnership “will offer our customers the simplicity of Qumulo’s data management portfolio while leveraging the reliability and robustness of our premium quality/market leading data protection, backup, and recovery solutions across hybrid cloud environments.”

Ben Gitenstein, VP for Product Management at Qumulo, had this to say: “Whether customers are storing and backing up data in the cloud or on-premises, with Qumulo and Commvault, they can easily manage and protect massive unstructured data sets with ease, furthering their ability to accelerate innovation and unleash the power of their data, wherever it resides.”

ATTO re-invents the RAM Disk with block storage device

ATTO is prepping new generation RAM disk technology in the shape of its SiliconDisk storage appliance. “This is not addressable memory. This is storage. It’s a block storage device,” Peter Donnelly, ATTO Products Group Director, told us.

NY-based ATTO develops high-performance storage connectivity products for OEMs and end users. The company’s new SiliconDisk connects to host servers using RDMA across 100Gbit Ethernet, providing sub-microsecond latency. The appliance can be directly attached to servers or shared by several servers across the network, either via direct connections or through a switch. It saves data to internal backing NAND if power is lost.

Here are some top-line performance numbers, supplied by ATTO.

  • 6,400,000 4k IOPS
  • 25GB/sec sustained throughput
  • <600ns latency (<0.6μs)

B&F asked Donnelly if ATTO had considered building an Optane-based product. “It was actually one of the options that we considered. And what we found is that the performance wasn’t what we thought it could be. So we actually went down a different path,” he replied. “We actually compare ourselves to the Optane Persistent Memory storage. And we bench better against it.”

We also asked whether the SiliconDisk could be used as Nvidia GPUDirect storage to feed data faster to DGX GPU servers. Donnelly said this could be a potential use case “because we are moving data in a RDMA type of environment. So, yes, this could all be pieced together.”

The product is not yet ready and is expected to launch later this year. Pricing was not disclosed.

RAM disk explanation

RAM disks or drives constitute legacy technology. They are DRAM-based and access to data takes place at DRAM speed, many times faster than either SSD or disk drive access speed.

A RAM disk can be a portion of a Windows server’s memory that is set aside to store files that a) need as fast access as possible, and b) where normal caching doesn’t cut the mustard. (SoftPerfect RAM disk is an example.) However, RAM is volatile and the file content is lost when power is switched off or the system crashes.

Another problem is that internal RAM disk capacity is constrained by the amount of the host system’s DRAM and the applications needing to use it. It relies on the availability of spare DRAM. An external RAM disk with a NAND backing store overcomes these problems but introduces another issue – the latency and bandwidth of the network link.

Initially, RAM disks used slow disk drive-based interfaces:

  • 2005 – Gigabyte Technology i-RAM 4GB – SATA I interface at 1.5Gbit/sec
  • 2006 – Gigabyte Technology i-RAM 8GB – SATA II interface at 3Gbit/sec
  • 2009 – DDRdrive – DDRDrive X1 – 4GB DRAM & 4GB SLC NAND – PCIe Gen 2.0 with x1, x4, x8 or x16 lanes and 250MB/sec per lane

The DDRDrive provided 50,000+ 4k random read IOPS (300,000+ 512B random reads) and had a latency around 51μs. Its bandwidth was 215+ MB/sec when writing and 155+ MB/sec reading.

This was not that fast, and take-up was limited. Much faster interfaces and network speeds are available today, giving ATTO the opportunity to reinvent and improve the technology.

Mayastor is fastest open source storage, Intel says. So where are the numbers?

Intel declares “OpenEBS Mayastor is the fastest open source storage for Kubernetes,” but the documentation lacks any details that could allow comparison with other Kubernetes storage products such as open source Ceph and Gluster or proprietary Portworx and StorageOS.

Mayastor, like Portworx, is an example of container-attached storage (CAS). The idea behind container-attached storage is that the storage controllers are orchestrated by Kubernetes – just as it orchestrates containers – rather than accessed via a link to an external array. The data itself is accessed via containers, rather than from an off-platform, shared storage system. MayaData is developing the open source OpenEBS CAS software, which runs within Kubernetes and orchestrates open source storage facilities for containers using any Kubernetes stack.

MayaData CEO Evan Powell writes: “With Container Attached Storage the default mode of deployment is hyperconverged, with data kept locally and then replicated to other hosts which has other benefits as well… The second reason that Container Attached Storage is being adopted is due to the strategic decision of many enterprises and organisations to avoid lock-in.”

Storage engines

MayaData has developed different storage engines for OpenEBS. It defines a storage engine as the data plane component of a persistent volume for a Kubernetes-orchestrated container. The cStor (container Storage) engine became available in 2018 with v0.7 of OpenEBS. It provides iSCSI block storage and supports snapshots, clones and thin provisioning. By 2019, OpenEBS recommended cStor if a cloud-native application was in production and required storage-level replication.

Mayastor was developed in 2019-2020 to take the workflow developed for cStor and make it faster, by using NVMe-oF concepts, for example. This was intended to increase IOPS capability and lower latency. Mayastor entered beta test in November and hasn’t yet been released in a production-ready state.

The CAS performance test scene is developing in fits and starts, and OpenEBS, itself in beta test, started from a trailing position in both StorageOS and Volterra testing.

Volterra testing

Jakub Pavlik of Volterra, a cloud-native platform services supplier, tested the cStor engine’s performance in Feb 2019 against Portworx, Gluster, Ceph and Azure container storage, with all of them running in the Azure public cloud.

February 2019 Volterra test using cStor.

OpenEBS cStor was the worst performer, as a chart of random read and write IO shows. It scored 23.1/2.6 MiB/sec (24.2/2.7 MB/sec) against Portworx’s 749/21.6 MiB/sec (785.4/22.6 MB/sec). It also had the lowest IOPS and longest latency.

Pavlik retested in September last year but used Mayastor instead of cStor. It provided 88.6/29.9 MiB/sec (92.9/31.4 MB/sec) random read/write throughput. He also tested Longhorn from Rancher Labs, as the chart shows:

September 2020 Volterra test using Mayastor.

These tests are not directly comparable to the StorageOS tests but indicate that OpenEBS Mayastor would perform at a roughly similar level to Longhorn and Ceph instead of trailing far behind, as it did with the cStor backend.

StorageOS

OpenEBS’s cStor performed least well in a StorageOS benchmark test comparing Ceph, Gluster, Portworx and StorageOS.

This test used cStor and not the later, and faster, Mayastor.

Intel and Optane

A joint MayaData-Intel document describes how Intel tested Mayastor against two Optane Data Centre (DC) SSDs: the P4800X and the newer P5800X. It found Mayastor contributed less than 10 per cent overhead to the P5800X’s raw device IO.

The testers measured performance in terms of thousands of queries a second (kqps) and IOPS, not bandwidth. They also changed system components during the tests, which makes it harder to compare performance with other suppliers. Intel’s main goal here was not to provide a general CAS performance test result but to show how much better gen 2 Optane SSDs were compared with gen 1 Optane.

The document provides incomplete performance results.

For example, there are no kqps or IOPS numbers detailed for the P4800X system. Instead the document claims the “Optane SSD P5800X delivers 2-3 times throughput improvement over Intel Optane SSD P4800X, depending on the configuration.”

The document includes this table to indicate how Mayastor performance compares using the two Optane drives:

The table uses different metrics, with undefined units and time periods, from the peak and sustained kqps and IOPS figures mentioned earlier.

Summary

There is no way to generalise from this result to the Volterra and StorageOS Mayastor benchmarks. They didn’t use Optane SSDs or PCIe Gen 4, and Intel doesn’t use a bandwidth measure, all of which makes the comparison impossible. Intel’s declaration that Mayastor is the fastest open source storage for Kubernetes is fine as far as it goes, but that’s not much use in choosing between open source OpenEBS, Ceph, and Gluster, or proprietary Portworx and StorageOS on performance grounds.

Our conclusion from the three tests – Volterra, StorageOS and MayaData-Intel – is that Mayastor is clearly faster than the earlier cStor, but we don’t know by how much.

Deploy AI workloads with confidence using OpenVINO


Sponsored Artificial Intelligence techniques have been finding their way into business applications for some time now, from chatbots forming the first line of engagement in customer services to image recognition systems that can identify defects in products before they reach the end of the production line in a factory.

But many organisations are still stuck on where to start in building machine-learning and deep-learning models and taking them all the way from development through to deployment. Another complication is how to deploy a model onto a different system than the one used to train it, especially in situations such as edge deployments, where less compute power is available than in a datacentre.

One solution to these problems is to employ OpenVINO™ (Open Visual Inference & Neural Network Optimization), a toolkit developed by Intel to speed the development of applications involving high-performance computer vision and deep-learning inferencing, among other use cases.

OpenVINO takes a trained model, and optimises it to operate on a variety of Intel hardware, including CPUs, GPUs, Intel® Movidius™ Vision Processing Unit (VPU), FPGAs, or the Intel® Gaussian & Neural Accelerator (Intel® GNA).

This means that it acts like an abstraction layer between the application code and the hardware. It can also fine tune the model for the platform the customer wants to use, claims Zoë Cayetano, Product Manager for Artificial Intelligence & Deep Learning at Intel.

“That’s really useful when you’re taking an AI application into production. There’s a variety of different niche challenges in inferencing that we’ve tackled with OpenVINO, that are different from when models and applications are in the training phase,” she says.

For example, Intel has found there is often a sharp decline in accuracy and performance when models that were trained in the cloud or in a datacentre are deployed into a production environment, especially in an edge scenario. This is because the trained models are shoe-horned into a deployment without considering the system they are running on.

In addition, the model may have been trained in ideal circumstances that differ from the actual deployment environment. As an example, Cayetano cites a defect detection scenario in which a camera view close to the production line may have been assumed during training; in the real deployment the camera may be positioned further away, so the images may have to be adjusted – which OpenVINO can do.

Interoperability is another issue, according to Cayetano. “We were seeing this trend of a lot of businesses adopting AI and wanting to take AI into production. But there was a gap in developer efficiency where developers had to use a variety of different tools and ensure they were interoperable and could operate with each other in the same way,” she says. “So they needed a toolkit that would have everything that you want from a computer vision library that allows you to resize that image, or that allows you to quantize your model into a different data type like INT8 for better performance.”

And although OpenVINO started off with a focus on computer vision, the feedback Intel got from developers was that they wanted a more general-purpose solution for AI, whether that was for applications involving image processing, audio processing or speech and even recommendation systems.

“So, the shift we did last year was to make OpenVINO more accessible and for developers to be able to focus solely on the problem that they’re trying to solve and not be encumbered by having to pick different toolkits for different workloads,” Cayetano says.

Plug-in architecture

The two main components of OpenVINO are its Model Optimizer and the Inference Engine. The Model Optimizer converts a trained neural network from a source framework to an open-source, nGraph-compatible intermediate representation ready for use in inference operations.

In addition to the toolkit, there is the Open Model Zoo repository on GitHub, which comprises a set of pre-trained deep learning models that organisations can use freely, plus a set of demos to help with development of new ones.

OpenVINO supports trained models from a wide range of common software frameworks like TensorFlow, MXNet and Caffe. The optimisation process removes layers in the model that were only required during training and can also recognise when a group of layers could be unified into a single layer, which decreases the inference time.

The Inference Engine actually runs the inference operations on input data and outputs the results, as well as managing the loading and compiling of the model, and is able to support execution across different hardware types through the use of a software plug-in architecture.

An FPGA plug-in allows execution targeting Intel® Arria® devices, for example, while another supports CPUs such as the Intel® Core™ or Xeon® families. An OpenCL plug-in supports Intel® Processor Graphics (GPU) execution. A Movidius™ API supports VPUs, and there is an API for the low-power GNA coprocessor built into some Intel processors that allows continuous inference workloads such as speech recognition to be offloaded from the CPU itself.
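To make that concrete, here is a minimal sketch using the Inference Engine’s Python API from the 2020/2021-era releases. The model files, input shape and device choice are placeholders; switching the device_name string is the only change needed to retarget the same code.

```python
# Minimal illustrative inference sketch; model.xml/model.bin would come from the Model Optimizer.
import numpy as np
from openvino.inference_engine import IECore

ie = IECore()
net = ie.read_network(model="model.xml", weights="model.bin")

# Retarget by changing device_name: "CPU", "GPU", "MYRIAD" (VPU) or "HETERO:FPGA,CPU".
exec_net = ie.load_network(network=net, device_name="CPU")

input_name = next(iter(net.input_info))                 # first (and only) input
frame = np.zeros((1, 3, 224, 224), dtype=np.float32)    # assumed NCHW input shape
result = exec_net.infer(inputs={input_name: frame})
print({name: blob.shape for name, blob in result.items()})
```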

“So, what OpenVINO does is it actually abstracts the developer away from having to code for each type of silicon processor. We have this write-once deploy-anywhere approach where you can take the same code, and then deploy that application into a Core or a Xeon processor, without having to rewrite your code. So, there’s a lot of efficiency to be had there,” Cayetano says.

This radically reduces time to market for projects and means that businesses are not encumbered by having to select the right hardware first or shoehorn their application into a system that does not fit their ultimate needs.

OpenVINO even allows simultaneous inference of the same network on several Intel hardware devices, or automatic splitting of the inference processing between multiple Intel devices (if, for example, one device does not support certain operations).

Real-life scalability

This flexibility means that OpenVINO has now been deployed by many Intel customers, including Philips and GE Healthcare, which use the toolkit for medical imaging purposes.

“Being able to analyse X-ray images or CT scans and be able to more accurately detect tumours, for example, is one use case. The neural network here is essentially similar to the example of the defect detection, it’s detecting if there’s a tumour or not in the X-ray image/datasets and is using OpenVINO to infer that,” Cayetano says.

GE Healthcare, for instance, used the capabilities of OpenVINO to scale its application for several different end customers, from private clinics to public hospitals.

“For one customer, they needed a Xeon processor, because they were optimising for accuracy and performance, whereas another customer was optimising for cost and size, because it was a smaller deployment site,” Cayetano explains. “So not only was GE Healthcare able to take advantage of the optimisation that OpenVINO can do, but also the scalability of being able to write once, deploy anywhere.”

This shows that OpenVINO provides organisations with the ability to take a trained neural network model and build a practical application around it, which might otherwise have taken a lot more effort to build and may not have delivered satisfactory results.

“Now you’re able to have faster, more accurate results in the real world when you actually go into deployments because we have optimisations built into OpenVINO that allow you to do really cool math stuff that only data scientists used to use, like layer fusion for your neural network, for example,” Cayetano says.

“We’ve put a lot of those advanced state of the art optimisations into OpenVINO for the masses to use, not just the top-notch data scientists. And then it’s all interoperable in the one toolkit. So, you can now use the same computer vision libraries with our model optimisation techniques, and quantisation, all these other tools that really streamline your development experience.”

This article is sponsored by Intel®.

What do you see when you put container storage firms into the sorting triangle?

In this week’s storage roundup we look at: container-attached storage; extracting, transforming and loading data into cloud data warehouses; and a fresh take on the classic 4-box diagram. Read on.

Behold a container storage trimetric

Architecting IT consultant Chris Evans has managed the unimaginable and devised a new multi-box diagram for an IT market – for container-attached storage, in this case. We are familiar with Gartner’s Magic Quadrant, the Forrester Wave, IDC’s MarketScape, GigaOm’s Radar Screen and Coldago’s 4-column Research Map. Now Evans has devised his own representation, a three-layer, triangle-based concept.

The scheme is similar to the GigaOm Radar Screen, with suppliers progressing inwards from challenger and incumbent status to leader. It uses three axes to position them: Execution and Delivery, Market Vision and Awareness, and Solutions Maturity. 

Evans calls it a Trimetric diagram, and here it is in its three-sided glory:

The diagram shows five suppliers: Rook/Ceph, SUSE Longhorn, OpenEBS, Portworx and StorageOS, with StorageOS and Portworx (now owned by Pure Storage) being the leaders.

We doff our tricorn hats in salute to this adventure in diagrammatic excellence from the excellent Chris Evans.

StorageOS container storage benchmark

StorageOS came out on top in a container-attached storage (CAS) benchmark test against Longhorn, Rook/Ceph and OpenEBS cStor, using the open source fio test tool.

With CAS, persistent storage is implemented as software-defined storage (SDS) running in containers on the same infrastructure – nodes in a Kubernetes cluster – as the applications. The test storage medium consisted of NVMe SSDs.
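As a rough illustration of the kind of fio job involved (the parameters below are assumptions, not the report’s actual settings), a 4K random read/write run against a provisioned volume might be driven like this:

```python
# Illustrative fio run against a mounted container-attached volume (placeholder path).
import json
import subprocess

cmd = [
    "fio", "--name=cas-randrw",
    "--filename=/mnt/pvc/testfile",          # path on the provisioned volume
    "--ioengine=libaio", "--direct=1",
    "--rw=randrw", "--rwmixread=75",
    "--bs=4k", "--iodepth=32", "--numjobs=4",
    "--size=10G", "--runtime=120", "--time_based",
    "--group_reporting", "--output-format=json",
]
out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
job = json.loads(out)["jobs"][0]
print("read IOPS:", job["read"]["iops"], "write IOPS:", job["write"]["iops"])
```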

Container-attached storage benchmark test chart.

StorageOS performed better in all tests, particularly in local read performance with and without a mirrored replica. It delivered greater throughput and bandwidth, with lower I/O latency, than the other suppliers.

Supplier speed test results in CAS tests. Check out the report here.

Note: The testing used OpenEBS’ cStor, a precursor of the current and much faster Mayastor. B&F will publish a Mayastor performance article shortly.

Step into the Datameer

Cloud data ingest company Datameer has launched Spectrum ETL++ to get data faster into cloud data warehouses such as Snowflake, Google BigQuery, AWS Redshift, Azure Synapse, Databricks and various cloud data lakes.

George Shahid, Datameer CEO, said: “Datameer Spectrum … enables analytics teams to not only extract data from any sources — on-premises or in the cloud — but to transform it to their needs in a few clicks and load it in any cloud data warehouse provider without having to write a single line of code—and at a fraction of the costs of alternative solutions.“

Spectrum offers visual, spreadsheet and SQL interfaces for multiple users. These can access more than 280 functions and more than 200 data source connectors.

Shorts

Object storage supplier Cloudian has announced its sixth consecutive fiscal year of record bookings, including its highest quarterly sales to date in the fourth quarter of 2020. It expanded its worldwide customer base by 36 per cent for the year, to over 550 customers, whose total Cloudian storage capacity grew 63 per cent.

CTERA has announced strong growth in 2020 with a 47 per cent year-over-year annual recurring revenue increase. Its products are deployed in over 50,000 sites in 110 countries and used by millions of corporate users daily.

Micron has upped revenue guidance for the current quarter from a $5.6bn – $6.0bn range to a $6.2bn – $6.25bn range. It thinks its gross margin will also improve.

DR and data protection supplier Zerto said it expanded its customer base by 100 per cent and doubled its global healthcare business in the second half of 2020. There was a 100 per cent increase in project implementation size in the global public sector and a 300 per cent spike in EMEA public sector project implementation. 

HYCU Protégé for O365 is now generally available to protect Office365 (now Microsoft365 or M365) data with its backup-as-a-service, as well as virtual machines and apps on Enterprise Clouds and Public Clouds.

In-memory database supplier Redis Labs has announced the general availability of Redis Enterprise-powered tiers on Azure Cache for Redis. The new Enterprise tiers, managed by Microsoft, enable companies to optimise for low-latency data access in applications that need it. 

Comparing Nvidia A100 GPU storage vendors is a pain

Analysis. An enterprise that is in the market to buy storage systems that attach to Nvidia’s DGX A100 GPU systems will most likely conduct a supplier comparison exercise. Multiple storage vendors are quoting bandwidth numbers for shipping data to Nvidia’s DGX A100 GPUs. So, that should make the exercise nice and easy.

Not so fast. Unless they use exactly the same configuration components and connectivity, quoted throughput numbers are effectively meaningless for comparison purposes. And yes, the vendors are using different set-ups to arrive at their numbers.

Let’s diagnose the problem.

  1. There is no one standard way of measuring data delivery rates to Nvidia’s GPUs, such as the bandwidth from one array across a certain class of link to a single A100 GPU box.
  2. Storage suppliers may use a single array or a scale-out server-based setup whose inherent parallelisation means higher throughput.
  3. The suppliers may use a host server CPU/DRAM bypass technology like Nvidia’s GPUDirect, or they may not, in which case they will be slower.
  4. The suppliers may have a direct array-to-GPU server link, with no intervening server, or they may be in an Nvidia DGX POD reference architecture (RA), with quoted speeds summed across several A100 systems.
  5. Quoted speeds in reference architecture documents may be exceeded many times by theoretical numbers from scaled-out storage nodes. For example, NetApp and Nvidia state in an RA: “A single NetApp AFF A800 system supports throughput of 25GB/sec for sequential reads.” They went on to say: “With the FlexGroup technology validated in this [NetApp] solution, a 24-node cluster can provide … up to 300GB/sec throughput in a single volume.” Yes, but… the RA configuration did not include a 24-node cluster.

Blocks & Files has tabulated the DGX A100 data delivery rate numbers quoted by six data storage vendors, and listed certain configuration details such as GPUDirect use.

We specifically note the number of storage arrays or nodes and the number of target A100 systems. Using these numbers and the quoted throughput, we have estimated the single storage node data delivery rate to a single A100 GPU system. The results are in the far right column.
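An estimate of this kind boils down to simple division – quoted throughput spread across the storage nodes and the target A100 systems. A sketch, with placeholder numbers rather than any vendor’s actual figures:

```python
# Placeholder arithmetic for the normalisation, not a standard benchmark method.
def per_node_per_dgx(quoted_gb_per_sec, storage_nodes, dgx_a100_systems):
    """Estimate one storage node's delivery rate to one DGX A100 system."""
    return quoted_gb_per_sec / (storage_nodes * dgx_a100_systems)

print(per_node_per_dgx(140.0, storage_nodes=4, dgx_a100_systems=2))  # 17.5 GB/sec
```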

Pears v. Bananas?

The variation is immense, from 3.1GB/sec to 182GB/sec – so immense that it casts doubt on the validity of our calculated numbers. Either the base numbers used in the calculations are wrong or the basis for making the comparisons is wrong.

Help us out here. Do you think it is valid to compare a NetApp A-800 high-availability pair (3.1GB/sec) with a Pavilion HyperParallel array (91GB/sec) and VAST Data (173.9GB/sec)?

One problem is answering the question: what is a storage node? A Pavilion HyperParallel array with 2.2PB raw capacity is two arrays. But a WekaIO system runs in a cluster of storage servers. How many cluster nodes equate to a single Pavilion array?

How does a VAST Data array of N petabyte capacity equate to an IBM Spectrum Scale cluster of ES3000 nodes and associated Spectrum Scale servers? Should these comparisons be made on a raw capacity basis or on a number-of-controllers basis?

The data and comparison methodology needed to do this are not available, certainly not to us. There is no standard benchmark for testing data delivery rates to Nvidia’s DGX A100 GPUs that could provide such comparative data either.

Excelero has said it is working with Nvidia to support the A100 and it may well produce a throughput number for its NVMesh technology. But that number will be effectively useless unless there is a way to compare it to the other suppliers’ numbers. Caveat emptor.

Loss-making Snowflake doubles down in dash for growth

It takes Snowflake Computing six months or more from signing up a big enterprise customer for the revenues to start rolling in. Onboarding migration delays are to blame. This was one of the key takeaways from this week’s Q4 earnings call, in which the cloud data warehouse firm revealed why it is losing so much money.

Revenue in the quarter was $190.5m, up 117 per cent Y/Y and beating forecasts. However, the net loss was $198.9m, 139 per cent worse Y/Y. Full year revenue was $592m, up 124 per cent, and the company posted a loss of $539.1m – 54.7 per cent worse Y/Y. (Read The Register‘s report for more details.)

Earnings call

So what gives? The company is spending big in a dash for growth and operating expenses are outstripping revenues. Snowflake is on a hiring spree and doesn’t intend to slow down. It hired 800 people in fiscal 2021 and aims to hire even more in fiscal 2022. 

Frank Slootman.

CEO Frank Slootman said this in the earnings call: “We’re going to add 1,200 people next year. Actually, Q1 is a very, very big onboarding quarter. It will be probably the largest quarter of the year because we’re onboarding a lot of people in the sales and marketing organisation in advance of our sales kickoff that we just had. We are investing as quickly while being efficient in our business as we can.”

The revenue problem here is that Snowflake has focused on winning enterprise customers and they take many months to migrate their existing on-premises data warehouses to Snowflake’s cloud.

Migration drag

CFO Mike Scarpelli said: “I want to stress, it takes customers, especially if you’re doing a legacy migration, it can take customers six months-plus before we start to recognise any consumption revenue from those customers because they’re doing the data migration. And what we find is – so they consume very little in the first six months and then in the remaining six months, they’ve consumed their entire contract they have.”

Scarpelli said: “We are landing more Fortune 500 customers. We talked about we landed 19 in the quarter, but those 19 we landed, just to reiterate, we’ve recognised virtually no revenue on those customers. That’s all in the RPO that will be in the next 12 months.”

RPO (remaining performance obligation) represents an amount of future revenue that has been contracted with customers, but is not yet recognised as revenue.

Engineering spend

Slootman revealed that Snowflake engineering is developing the product to work faster: “Historically, we have shared data through APIs and through file transfer processes, copying, and replicating. It’s been an enormous struggle. The opportunity with Snowflake is to make this zero latency, zero friction, completely seamless.”

The need is to reduce what Snowflake calls data latency. Slootman explained: “One of the areas that we are investing in where we have extraordinary talent that we have attracted [to] the company is where our event-driven architectures. Today, our event latency is sort of seconds and minutes, right? But you want to drive that down to sub-second and dramatically sub-second.”

He said: “That obviously, that requires tremendous optimisation on our part and we are working on that because we see that as a very, very critical part of the ongoing evolution of digital transformations. … we have to become much, much faster than what we’ve done so far. … this is going to expand the marketplace in places where these technologies historically have not been.”

By the numbers

A chart shows the dramatic increase in losses over the last two quarters:

Spot the two plunging red bars on the lower right.

Snowflake’s revenue growth is accelerating, as a second chart shows, with its steepening quarterly revenue lines.

Snowflake’s operating expenses have shot up faster still in the two most recent quarters:

Snowflake watchers have to wait and see if its plans deliver increased growth in the next few quarters. Slootman is betting on it.

Seagate competes with its OEMs through StorONE

Seagate is selling its Exos AP drive arrays through StorONE, a software-defined storage startup. This takes the company into direct competition with OEMs such as Dell EMC, Hitachi Vantara, HPE, IBM and NetApp, which use Seagate disk drives in their arrays and filers.

Gal Naor

“Seagate’s Exos AP 5U84 Platform when combined with StorONE is an ideal solution for data centers of all sizes trying to consolidate workloads and reduce storage footprint,” said Gal Naor, CEO and co-founder of StorONE. “The Application Platform solution allows us to showcase our investments in the efficient utilisation of storage hardware and our next-generation hybrid technology.”

The Exos AP 5U84 is a Seagate-built 5-rack-unit, dual-controller array with 84 x 3.5-inch drive bays. StorONE installs its S1: Enterprise Storage Platform software in this box to provide an active:active storage array delivering over 200,000 IOPS and more than 1PB of capacity for under $175,000. A 1.48PB configuration uses 70 x 18TB Exos disk drives for capacity and 14 x 15.36TB SSDs for random IO performance heft, while a 1.4PB config uses 216TB Exos disks.

StorONE will also sell you a smaller, 106TB hybrid system, based on Seagate’s 2RU, 12-slot AP 2U12 array. This is priced at $36,000.

Seagate Exos AP 5U84.

Seagate is making quite the push behind the AP 5U84, last week announcing Lyve Cloud, an object storage service based on the system that resides in Equinix co-location centres in the USA.

The company is also keen to counter some analyst views that SSDs are taking over from nearline disk drives. The Seagate-StorONE announcement declares: “Mainstream data centres require affordable high-performance and high-capacity. Many storage vendors have left these organisations behind, even though they represent most data centres in the world.”

That’s ‘left behind’ in the sense that these storage vendors are moving to all-flash, SSD arrays.

A disadvantage of disk drives is that high-capacity drives can take 24 hours or more to rebuild in a RAID scheme after a failure. StorONE’s vRAID technology rebuilds failed drives in under two hours, the companies say, without specifying a capacity.

In October 2020 StorONE said it could rebuild a 16TB drive in under five hours, and a failed SSD in three minutes.

HPE sharpens SimpliVity edge backup

HPE has updated its SimpliVity HCI operating system to protect data better and provision storage for containerised apps.

SimpliVity 4.1.0 can back up to the public cloud with Cloud Volumes Backup and integrates more deeply with the StoreOnce deduplicating target backup appliance.

The SimpliVity hyperconverged systems are positioned as one-box, do-it-all, remote office and branch office IT systems. They are pitched at distributed enterprises such as fixed site bricks and mortar retail businesses, and mobile site ones like oil and gas drillers and racing car teams. These traditional ROBO locations are now characterised as “Edge computing” but they are still classic ROBO IT operations.

The SimpliVity hardware – for example, a 1RU ProLiant DL325 server – is installed at the remote/branch sites. It is mounted in a rack, two cables for power and networking are connected, and the box is switched on.

HPE DL325 1 RU server

Software is installed using VMware’s vCenter in the main data centre. Apps are installed as virtual machines in the same way.

Virtual machine backups are run at the remote site, with policies set centrally, and the data is sent to an HPE StoreOnce appliance in the data centre or to HPE’s Cloud Volumes Backup in a public cloud.

A Kubernetes CSI plug-in to the SimpliVity OS means that containerised apps can be pushed to the remote sites and use persistent volume storage in the SimpliVity HCI box. Their data is protected in the same way as VM data.

Pivot3 and Scale Computing also position their HCI systems as Edge computing boxes. HPE’s own dHCI (disaggregated HCI) Nimble systems are marketed as data centre systems, not edge systems. Dell EMC supplies its VxRail systems for both edge and data centre use.

HPE SimpliVity 4.1.0 is available worldwide at no additional charge for customers with valid support contracts. All new features and capabilities announced are supported with VMware vSphere 7.0. HPE Cloud Volumes Backup support is available in the Americas, Europe and Asia.