
A potted history of all-flash arrays

Fourteen years ago Violin Memory began life as an all-flash array vendor, aiming to kick slower-performing disk drive arrays up their laggardly disk latency butt. Since then 18 startups have come, been acquired, survived or gone in a Game of Flash Thrones saga.

Only one startup – Pure Storage – has achieved IPO take-off speed and it is still in a debt-fuelled and loss-making growth phase. Two other original pioneers have survived but the rest are all history. A terrific blog by Flashdba suggested it was time to tell this story.

Blocks & Files has looked at this turbulent near decade and a half and sees five waves of all-flash array (AFA) innovation.

We’re looking strictly at SSD arrays, not SSDs or add-in cards, and that excludes Fusion IO, STEC, SanDisk – mostly – and all its acquisitions, and others like Virident.

A schematic chart sets the scene. It is generally time-based, with the past on the left and present time on the right hand side. Coloured lines show what’s happened to the suppliers, with joins representing acquisitions. Stars show firms crashing or product lines being cancelled. Study it for a moment and then we’ll dive into the first wave of AFA happenings.

First Wave

The first group of AFA startups comprised DSSD, Kaminario, Pure Storage, Skyera, SolidFire, Violin Memory, Whiptail, X-IO and XtremIO. Pure and XtremIO achieved major success and XtremIO, post-acquisition by EMC, became the biggest-selling AFA of its era, achieving $3bn in revenues after three years of availability.

XtremIO bricks

EMC was convinced of AFA goodness and spent $1bn buying DSSD, an early NVMe-oF array tech – but it bought a dud. After Dell bought EMC it canned the product in March 2017. This was possibly the biggest write-off in AFA history.

Pure Storage grew strongly, IPOed and has now joined the incumbents, boasting a $1.6bn annual revenue run rate.

Kaminario survives and is growing. Violin has survived a Chapter 11 bankruptcy and is recovering from walking wounded status.

Texas Memory Systems was bought by IBM and its tech survives as IBM’s FlashSystem arrays. Skyera stumbled and was scooped up by Western Digital in 2014.

SanDisk had a short life as an AFA vendor with its InfiniFlash big data array, before it was bought by Western Digital in 2015 for an eye-watering $19bn. That was the price WD was willing to pay to get into the enterprise and consumer flash drive business.

SolidFire was bought by NetApp for $870m in December 2015.

SolidFire array

Whiptail was bought by Cisco in September 2013 for $415m. It found it had bought an array tech that needed lots of development work. In the end it canned the Invicta product in June 2015.

Second wave – hybrid startups go all-flash

The next round of AFA development came from Nimble, Tegile and VM-focused Tintri. These three prominent hybrid array startups quickly went all-flash and formed a second AFA wave.

All have been acquired. HPE bought Nimble with its pioneering InfoSight cloud management facility for its customers’ arrays. Nearly every other array supplier has followed Nimble’s lead and HPE is extending the tech to 3PAR arrays and into the data centre generally.

Poor Tintri crashed, entered Chapter 11 and its assets were bought for $60m by HPC storage supplier DDN in September last year. Tintri gives DDN a route into the mainstream enterprise array business.

X-IO was another hybrid startup that went all-flash. It stumbled, went through multiple CEOs and then, under Bill Miller, sold off its ISE line to Violin. It continues as Axellio, a maker of all-flash IoT edge boxes.

Incumbents retrofit and acquire

The seven mainstream incumbent suppliers all bought startups and/or retrofitted their own arrays with AFA tech, and in two cases tried to develop their own AFA technology. One, NetApp’s FlashRay, was killed off on the verge of launch in favour of AFA-retrofitted ONTAP.

The other, HDS’s in-house tech, survives but is not a significant player. In other words, no incumbent developed an AFA tech from the start that became a great product.

Dell EMC retrofitted flash to VMAX and VNX arrays on the EMC side of the house, and SC arrays on the Dell side. IBM flashified its DS8000 and Storwize arrays. HPE put a flash transplant into its 3PAR product line.

And Cisco? Cisco gave up after killing Invicta.

Invicta appliance

Interfaces

Initially, SSDs were given SATA and SAS interfaces. Then much faster multi-queue NVMe interfaces were used with direct access to a server or drive array controller’s PCIe bus, instead of indirect access through a SATA or SAS adapter.

This process is ongoing and SATA is on the way out as an SSD interface. NAND tech avoided the planar (single-layer) development trap looming from ever-smaller cells becoming unstable, by reverting to larger process sizes and layering decks of flash one above the other in 3D NAND.

It started with 16 layers, then 32, 48 and 64, and is now moving to 96 layers with 128 coming. At roughly the same planar-to-3D NAND transition time, single-bit cells gave way to double-capacity MLC (2 bits/cell) flash, then TLC (3 bits/cell), and now we are seeing QLC (4 bits/cell) coming.

The net:net is that SSD capacities rose and rose, first equalling disk drive capacities – 10, 12 and 14TB – and then surpassing them with 16TB drives.
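
A very rough way to see the density gain is to multiply layer count by bits per cell, ignoring the larger process geometry that 3D NAND reverted to and any die-area differences. A quick illustrative sketch:

    # Rough relative-density sketch vs a single-layer, 1 bit/cell planar baseline.
    # Ignores the larger process geometry 3D NAND uses, so treat as illustrative only.
    def relative_density(layers, bits_per_cell):
        return layers * bits_per_cell

    print(relative_density(64, 3))   # 64-layer TLC: ~192x the bits per unit of cell footprint
    print(relative_density(96, 4))   # 96-layer QLC: ~384x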

This process accelerated the cannibalisation of disk drive arrays by flash arrays. All the incumbents are busy helping their customers replace old disk drive arrays with newer AFA products. It’s a gold mine for them.

Third wave of NVMe-oF-inspired startups

We have also seen the rise of remote NVMe access, extending the NVMe protocol across networking links such as Ethernet and InfiniBand initially, and TCP/IP and Fibre Channel latterly, to speed up array data access.

This technology prompted a third wave of AFA startups: Apeiron, E8, Excelero, Mangstor and Pavilion Data. Interestingly, DSSD was a pioneer of NVMe-oF access but, among other things, was too early with its technology.

Late arrival Vast Data has seasoned its NVMe-oF tech with QLC flash and Optane storage-class memory, giving it a one-array-fits-most-use-cases product to sell.

Mangstor crashed and fizzled out, becoming EXTEN Technologies, but the others are pushing ahead, trying to grow their businesses before the incumbents adopt the same technology and crowd them out.

However, the incumbents, having learnt the expensive lesson of buying in AFA tech, are adopting NVMe-oF en masse.

The upshot is that 15 companies are pushing NVMe-oF arrays at the market.

The storage-class memory era arrives

Storage-class memory (SCM), also called persistent memory, as exemplified by Intel’s Optane memory products using 3D XPoint non-volatile media, promises to greatly increase data access speed. Nearly all the vendors have adoption programs. For instance:

  • HPE has added Optane to 3PAR array controllers.
  • Dell EMC is adding Optane to its VMAX and mid-range array line.
  • NetApp is feeding Optane caches in servers from its arrays.
Optane SSD

The third wave of startups need to adopt SCM fast or face the prospect of getting frozen out of the NVMe-oF array market they were specifically set up to develop.

Fast-reacting incumbents are moving so quickly that large sections of the SCM-influenced array market, the incumbent customer bases, will be closed off to the third wave startups and that will result in supplier consolidation.

It has always been that way with tech innovation and business. Succeed and you win big. Fail and your fall can be long and miserable. But we salute the pioneers – the healthy like Pure and Kaminario, and the ones with arrows in their back – DSSD, Mangstor, Tintri, Violin, Whiptail, X-IO.

You folks helped blaze a trail that revolutionised storage arrays for the better, and there is still a way to go. How great is that?

Vexata dredges new channels as storm clouds gather

Vexata, the extreme high-performance storage array startup, has laid off several staff and imposed pay cuts, according to industry sources. The company declined to comment on pay or confirm the number of job losses.

Founded in 2013, Vexata has raised $54m in four funding rounds, of which the most recent was a $5m top-up in 2017.

The company’s scale-out VX-Cloud software and VX-100 software/hardware system comprises intelligent front-end servers talking to intelligent back-end NVMe storage nodes. Performance is claimed at 20 million IOPS.

In an interview, Vexata said a recent reorganisation saw the company move from direct sales to channel partnerships and had resulted in some departures. We can confirm four job losses at the time of writing, but our sources say several more staff are in the market looking for new roles.

Blocks & Files understands Farad Haghighi, VP worldwide support and services, and Stephen King, director of sales and business development, left in January. Jack Dyke, senior solutions engineer, was laid off recently and Mithun Jose, staff engineer in India, left this month.

Vexata’s view

Rick Walsworth, VP product and solution marketing, told us in an email interview that Vexata had “made some strategic shifts and subsequent adjustments in staffing and expenses to align to the new strategies. Specifically, we have adjusted focus on two areas; Go to Market motion and Product Development.”

He explained: “As of Q4 of last year we made a strategic shift towards partner-driven sales motion vs a direct sales motion.  While we retain a handful of strategic sales executives for large accounts, we are driving most of our sales growth through this new partner motion. 

“The Fujitsu announcement in November last year was the first of many anticipated announcements. Since that partnership announcement, Fujitsu has been able to nearly double our pipeline in the last quarter. We expect this to ramp even further as we engage with Fujitsu on a global scale.”

Walsworth said the company had “also ramped work on our cloud and cloud-scale focused offering, which you saw as part of the VX-Cloud announcement a few weeks back. This shift meant an alignment on engineering talent around this new market and appropriate adjustments to ensure we can deliver to the unique product and market needs.”

Pivotal changes

The all-flash array market is a tough business environment for startups. Judging from its actions Vexata is seeking to reduce cash burn as it fights to gain traction in a market dominated by established storage vendors.

This explains the pivot to partner sales to expand sales coverage and also to cloud software to reduce engineering costs and capitalise on customer movement to the cloud.

SoftIron builds out storage line for Ceph-alitics

SoftIron has announced three Ceph-based storage systems – an upgraded performance storage node, an enhanced management system and a front-end access or storage router box.

Ceph is open source storage software that supports block, file and object access, and SoftIron builds scale-out HyperDrive (HD) storage nodes for Ceph. These can outperform dual-Xeon commodity hardware (see below) and are 1U enclosures, with an Arm64 CPU controlling a set of disk drives or SSDs. There are Value, Density and Performance versions.

The Value system (HD11048) has 48TB of disk-based storage using 8 drives. The Density system (HD11120) has 120TB of disk capacity while the Performance variant has 56TB of SSD capacity (HD31056), using 14 x 4TB SSDs.

SoftIron’s existing Performance node is a single processor system. The new HD32112 Performance Multi-Processor node has two CPUs and 28 x 4TB SSDs, totalling 112TB. This is twice the capacity of the first Performance system.

SoftIron Performance Multi-Processor system board showing two CPUs

Tabular comparison of the two SoftIron Performance products

The new HD Storage Manager appliance simplifies the complex Ceph environment for sysadmins. The GUI enables:

  • Centralised management of HyperDrive and Ceph
  • Update, monitor, maintain and upgrade HyperDrive nodes
  • Manage SMB (CIFS) and NFS file shares, object stores (S3 and Swift), and block devices (iSCSI and RBD – RADOS block devices)
  • Single HD product management facility
  • Insight into Ceph storage health, utilisation and predicted storage growth

The HD Storage Router centralises front-end client access protocols to the storage nodes. It is a 1U system which provides Ceph’s native block (RBD), file (CephFS) and object (S3) access, adding iSCSI block access plus NFS and SMB/CIFS. The iSCSI block and file protocols can operate simultaneously.

The NFS facility enables support of Linux file shares and virtual machine storage. SMB and CIFS enable support of Windows file shares.
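
All of these access methods sit on Ceph’s native RADOS object layer. As a generic illustration (not SoftIron-specific), here is roughly how an application could talk to a Ceph cluster directly using the standard Python rados bindings; the pool name is a placeholder:

    import rados

    # Connect to the Ceph cluster using the usual config file and default keyring.
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()

    # Open an I/O context on a pool (the pool name here is invented for the sketch).
    ioctx = cluster.open_ioctx('hyperdrive-pool')
    ioctx.write_full('hello-object', b'stored natively as a RADOS object')
    print(ioctx.read('hello-object'))

    ioctx.close()
    cluster.shutdown()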

Performance

SoftIron’s products have skinny Arm processors compared to commodity storage boxes with beefier Xeon CPUs. However, SoftIron claims its integrated hardware and software systems outperform Ceph running on commodity dual-Xeon hardware.

With sequential reads and writes, a SoftIron entry-level system (8-core Arm CPU, 10 x 6TB 7,200rpm disk drives, 256GB SSD for write journalling) achieved 817MB/sec peak bandwidth.

The comparison cluster of three commodity systems achieved 650MB/sec, making SoftIron’s HyperDrive 26 per cent faster. Each commodity node used 2 x 8-core Xeon E5-2630 v3 processors, 12 x 1.8TB SAS disk drives and 4 x 372GB SSDs.

Sequential object write performance comparison

With cached random object writes the SoftIron system peaked at 606MB/sec, while the dual-Xeon box went faster at 740MB/sec.

But supremacy was regained with cached random object read performance. The Xeon system reached 2,294MB/sec while SoftIron’s box went 44 per cent faster at 3,300MB/sec.

Cached random object read performance comparison

The term ‘OSD’ on the charts stands for Object Storage Device, a storage node in Ceph terms.

Summary

The three new SoftIron products bring flash capacity almost up to disk-based capacity (112TB NAND vs 120TB disk), add improved and simplified management and extend Ceph usage by adding iSCSI and NFS gateway functionality. This means that an enterprise lacking Ceph-skilled admin staff can envisage using it for the first time with a variety of potential storage use cases.

However, the SoftIron system performs poorly on Ceph object writes compared to a dual-Xeon CPU system. On object reads it’s faster.

We have no information about the performance of SoftIron Ceph systems on block or file access by the way.

AWS debuts cheap and slow Glacier Deep Archive

Amazon Web Services has announced its Glacier Deep Archive is available at about $1/TB/month, the lowest data storage cost in the cloud.

AWS claims this is significantly cheaper than storing and maintaining data in on-premises magnetic tape libraries or archiving data off-site. Of course data retrieval is faster from an on-site tape.

Glacier Deep Archive was previewed in November last year. The service has eleven ’nines’ data durability – 99.999999999 per cent. Ordinary data can be retrieved within 12 hours or less while bulk data at the PB level can take up to 48 hours. The data thaw time is s-l-o-w.

Retrieval costs $0.02/GB for standard retrievals or $0.0025/GB for bulk data.

Bulk data is a retrieval speed option. The AWS S3 FAQ says: “There are three ways to restore data from Amazon S3 Glacier – Expedited, Standard, and Bulk Retrievals – and each has a different per-GB retrieval fee and per-archive request fee (i.e. requesting one archive counts as one request).“

In AWS’s US East (Ohio) region, standard retrievals cost $0.01/GB, expedited retrievals $0.03/GB and bulk retrievals $0.0025/GB. Expedited retrievals typically complete in five minutes, standard in five hours and bulk in 12 hours.
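
A quick worked example using the Deep Archive rates quoted above, for a 10TB restore with decimal gigabytes assumed, shows how much cheaper bulk thaws are than standard ones:

    # Back-of-envelope retrieval cost for a 10TB Deep Archive restore at the rates above.
    data_gb = 10 * 1000            # 10TB in decimal GB
    print(data_gb * 0.0025)        # bulk retrieval:     $25
    print(data_gb * 0.02)          # standard retrieval: $200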

Check out AWS GDA pricing details here.

Data retrieval

Amazon says GDA is suitable for data that is accessed once or twice a year with either 12 or 48 hour latencies to the first byte.

The Restore (Retrieval) request, effected by an API call or through the S3 management console, makes a temporary copy of the data, leaving the GDA-held data intact. The thawed-out GDA data is not streamed direct to you on-premises or inside AWS for use as it comes in. You have to wait for the copy to be made.

The copy is accessed through an S3 GET request and you can set a limit for the retention of this temporary copy.

GDA data upload is done through an S3 PUT request or via the AWS management console, or AWS Direct Connect, Storage Gateway, Command Line Interface or SDK.
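
For the programmatically inclined, here is a minimal sketch of that restore-then-GET flow using the boto3 SDK; the bucket and key names are placeholders:

    import boto3

    s3 = boto3.client("s3")

    # Ask S3 to thaw an object held in Glacier Deep Archive into a temporary copy.
    s3.restore_object(
        Bucket="my-archive-bucket",                       # placeholder bucket
        Key="backups/2019-03/full.tar",                   # placeholder key
        RestoreRequest={
            "Days": 7,                                    # how long the temporary copy stays readable
            "GlacierJobParameters": {"Tier": "Bulk"},     # or "Standard"
        },
    )

    # Poll the Restore header; once the thaw completes, a normal GET reads the temporary copy.
    head = s3.head_object(Bucket="my-archive-bucket", Key="backups/2019-03/full.tar")
    print(head.get("Restore"))   # e.g. 'ongoing-request="true"' while still in progress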

Tape Gateway

AWS’s Storage Gateway provides a virtual tape library (VTL) in the AWS cloud, with virtual tapes despatched to Glacier Deep Archive from your on-premises system. The Tape Gateway code is deployed as a virtual or hardware appliance. No changes are needed to existing backup workflows. The backup application is connected to the VTL on the Tape Gateway and streams backup data to this target device.

Commvault is one backup application that definitely supports the Glacier Deep Archive. Veritas is another.

The Tape Gateway compresses and encrypts the data and sends it off to a VTL in Glacier or the Glacier Deep Archive.

A cache on the Tape Gateway ensures recent backups remain local, reducing restore times.

S3 storage classes

As a reminder, AWS offers:

  • Simple Storage Service (S3) Standard
  • S3 Intelligent Tiering
  • S3 Standard Infrequent Access (IA)
  • S3 One Zone IA
  • S3 Glacier for archive
  • S3 Glacier Deep Archive

You can use S3 Lifecycle policies to transfer data between any of the S3 storage classes for active data (S3 Standard, S3 Intelligent-Tiering, S3 Standard-IA, and S3 One Zone-IA) and the archive classes, S3 Glacier and S3 Glacier Deep Archive.
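
As a sketch of what such a policy looks like in code (the bucket name and prefix are invented), a rule that tiers objects down to Deep Archive after 180 days could be set with boto3 like this:

    import boto3

    s3 = boto3.client("s3")

    # Lifecycle rule: move objects under 'archive/' to Glacier Deep Archive after 180 days.
    s3.put_bucket_lifecycle_configuration(
        Bucket="my-archive-bucket",      # placeholder bucket
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "tier-to-deep-archive",
                    "Filter": {"Prefix": "archive/"},
                    "Status": "Enabled",
                    "Transitions": [{"Days": 180, "StorageClass": "DEEP_ARCHIVE"}],
                }
            ]
        },
    )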

Net:net

Should you give up your on-site tape library? Blocks & Files thinks this needs careful analysis of the costs of tape library storage and retrieval, with retrieval speed taken into account.

SpectraLogic TFinity ExaScale library

There is a crossover at a certain level of data stored and data retrieval frequency and size. And calculating that crossover point for your tape installation is key. Although your on-site tape library costs may be predictable into the future, AWS’s GDA costs are driven by marketing needs. They may dip a little over the next year or three and then rise again if AWS grabs enough tape library market share.

Monopolists see little need to lower prices. Of course, GDA is a good idea if you don’t have an on-site tape library and need to store data long-term with one or two accesses a year.

Read this! The database market’s long-awaited shake-up

I am bewildered by all the changes in the database market. It used to be so simple with relational databases but then along came NoSQL and the confusion started.

Today there are multiple types of non-relational databases, public cloud databases, business analytics databases and unified relational-NoSQL databases and graph databases and ….

William Blair database report diagram

It has become too much for generalists to easily comprehend what’s going on.

And so my thanks go to Jason Ader, an analyst at William Blair, who has written a comprehensive report on the database market, in which he details technology development trends, classifies products, sizes the market and profiles the main players.

William Blair kindly says we can distribute it to our readers and here it is:

To download the report click on this: William Blair Database Market report

Blocks & Files hopes you find it a useful read.

Zoned SSDs live fast, die old, Western Digital claims

Western Digital is developing zoned SSDs that can be managed by host systems to speed data access and prolong the life of the drives.

This could make short-life QLC (4bits/cell) flash drives usable for fast-access archive. Applications that currently use shingled disk drives for this purpose could instead use zoned SSDs with few code changes to increase data access speed.

Shingled disk drives

Shingling exploits the difference in width between larger disk drive write tracks and read tracks. The tracks are shuffled closer, keeping read tracks separate while overlapping write tracks at their edges.

Disk shingling concept

This means the drive can hold roughly 20 per cent more data.

But data can no longer be directly rewritten as that would also alter an underlying write track.

Instead whole blocks of write tracks are rewritten when any data in the block needs rewriting. This necessitates reading the data in the block, adding the new data and then writing the data back to the block of tracks.

Three points to note: first, this is similar to the read-program-erase (PE) cycle used when blocks of written data are recovered for re-use in SSDs. Second, in Western Digital’s scheme the host manages the shingled drive re-write process and not the drive itself. Third, a block of tracks is also a zone of tracks.

Shingled drives are finding favour with hyperscale customers who need to store PBs of read-centric bulk data as cost effectively as possible, and relish turning a raw 14TB disk drive into, say, a 16TB shingled disk drive.

Zoning NAND

In some cases, these customers need faster access to read-intensive reference data than disk storage can provide. Step forward QLC (4 bits/cell) flash. However, it wears out faster than TLC (3 bits/cell) flash. QLC has a low PE cycle number – possibly 1,000 to 1,500.

Data is written anywhere there is free space in an SSD, without reference to usage patterns or access frequency. Each SSD looks after its own data deletion, in a process called garbage collection that recovers blocks for re-use.

This involves reading any valid data in the blocks and writing it elsewhere in the SSD so the contents of a whole block can be erased and recovered.

The additional writes are called write amplification, and keeping the write amplification factor as low as practicable prolongs the life of an SSD. Western Digital thinks zoning can achieve this and that applications that host-manage shingled drives can manage zoned SSDs equally well – the interface is basically the same.
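
The usual way to quantify this (a general SSD metric, not something specific to WD’s scheme) is the write amplification factor: total bytes written to flash divided by the bytes the host asked to write. A factor of 1.0 means no extra writes at all.

    # Write amplification factor = flash writes / host writes. Illustrative numbers only.
    def waf(nand_bytes_written, host_bytes_written):
        return nand_bytes_written / host_bytes_written

    # If garbage collection rewrites 40GB of still-valid data while servicing 100GB of host writes:
    print(waf(140e9, 100e9))   # 1.4 - the flash wears 40% faster than the host workload alone implies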

Zoned NameSpace SSDs

Matias Bjørling, Western Digital’s director for solid-state system software, gave a presentation on this topic at the Vault’19 Linux Storage and Filesystems conference in February 2019. He said an NVMe working group developing industry standards for Zoned Namespaces (ZNS) has a technical proposal for the technology.

NVMe-access SSDs have parallel IO queues and these can be used to direct different data types to different zones of an SSD in WD’s Open-Channel concepts scheme. The zones can span multiple dies and their size is based on the SSD’s block size.

WD Open-Channel concepts diagram

Bite-sized chunks

The drive’s address space is known as its Logical Block Address (LBA) range and this is divided conceptually into chunks, which are multiples of the SSD’s block size.

Chunks are grouped together into zones, and these zones are aligned to the NAND block size, with zone capacity aligned to the SSD’s capacity.

Different zones can be dedicated to different types of data – music, video, images, etc. – so that the usage pattern for data within any one zone is constant and predictable.

Each zone is written sequentially. Incoming data of any one type is divided into chunk-sized pieces and written to a specific zone in sequential format. Zones can be read randomly and are deleted as a whole, which reduces consequent write amplification to near zero and prolongs the SSD’s life.
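
A toy sketch of the zone model (our illustration, not WD’s implementation) makes the contract clear: writes only advance a per-zone write pointer, reads can land anywhere, and space is reclaimed by resetting the whole zone.

    # Toy model of a ZNS-style zone: sequential writes at a write pointer, random reads, whole-zone reset.
    class Zone:
        def __init__(self, capacity_blocks):
            self.blocks = [None] * capacity_blocks
            self.write_pointer = 0                    # next writable block in the zone

        def append(self, data):
            if self.write_pointer >= len(self.blocks):
                raise IOError("zone full - the host must reset the zone before writing again")
            self.blocks[self.write_pointer] = data    # writes are strictly sequential
            self.write_pointer += 1
            return self.write_pointer - 1             # block address within the zone

        def read(self, offset):
            return self.blocks[offset]                # reads can be random

        def reset(self):
            # Space is reclaimed by erasing the zone as a unit, so there is no
            # device-side garbage collection and near-zero write amplification.
            self.blocks = [None] * len(self.blocks)
            self.write_pointer = 0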

The SSD controller’s own workload is reduced and it needs less DRAM to do it, which lowers its cost. As well as a reduced write amplification factor, the SSD needs less over-provisioning of capacity to replace worn-out cells during its working life, and IO performance is more consistent, as there is little or no interruption from device-side garbage collection or wear-levelling.

SK Hynix and Microsoft

SK Hynix is developing a zoned SSD and has blogged about it. It suggests zoned SSDs could be 30 per cent faster than their traditional counterparts and last four times longer.

SK Hynix zoned M.2 format SSD

Microsoft is also keen on the idea, with principal hardware program manager Lee Prewitt speaking on the subject at the Open Compute Project summit this month. A video of his pitch can be seen here and his thoughts on ZNS begin at 8min 50sec.

Lee Prewitt’s Microsoft ZNS pitch at the OCP Summit

Blocks & Files thinks zoned SSDs will find a role as storage devices for bulk, fast-access reference data and help make both TLC and QLC flash usable as a near-archive data store. Get Matias Bjørling’s slide deck here.


Sitting on the dock of Bay Microsystems, Vcinity says data distance is no problem

Vcinity, a San Jose networking startup, has transferred 1PB of file data across a 4,350-mile, 100 Gbit/s WAN with 70 ms latency, in 23 hours and 16 minutes.

It says it can use up to 97 per cent of the available network bandwidth in the link.
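
A quick sanity check on those numbers (assuming a decimal petabyte) comes out close to the claimed utilisation:

    # 1PB over a 100Gbit/s link in 23 hours 16 minutes - what utilisation does that imply?
    bits_moved = 1e15 * 8                  # 1PB (decimal) in bits
    seconds = 23 * 3600 + 16 * 60          # 83,760 seconds
    gbps = bits_moved / seconds / 1e9
    print(round(gbps, 1))                  # ~95.5 Gbit/s, roughly 95-96% of the link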

The company’s  Ultimate X (ULT X) products use a fabric extension technology, and support InfiniBand and also RDMA over Converged Ethernet (RoCE).

Vcinity says it can make a remote filer seem as if it were local. Chief Technology Officer Russel Davis said: “Our technology has shattered the mould of data-access – we extend RDMA over global distances, so users can remotely access, manage and manipulate large datasets.”

Vcinity networking concept scheme

This PB-transferred-in-under-24-hours claim rang a distant memory bell. Surely … yes, Bay Microsystems did just that in May 2018, moving a petabyte of data across a 2,800 mile network link in a little over 23 hours. It said it took the distance out of data at the time.

How do these two companies and their technologies relate to one another?

Bay Microsystems

Bay Microsystems was founded in 1998 and went through six funding rounds, taking in $34.4m publicly and possibly more privately. Its CEO from November 2015 to October 2018 was Henry Carr and its COO and EVP for emerging solutions was Russ Davis from January 2014 to October 2018.

Vcinity came out of stealth mode in October 2018 and its CEO is Henry Carr. Russ Davis is also its CTO and COO. The SVP for Engineering is Bob Smedley and he was SVP Engineering at Bay Microsystems too. The CFO is Mike McDonald and his LinkedIn entry says he has been in that position since 2004, which is nonsense as the firm was only founded in 2018.

Sure enough, Crunchbase lists him as Bay Microsystems CFO.

Vcinity says it acquired Bay Microsystems assets in July 2018. There was no fanfare and no reason was given for Bay Microsystems going out of business.

Steve Wallo, VP for Sales Engineering at Vcinity, held the same role at Bay Microsystems. Mark Rodriguez, VP Product Management at Vcinity, was in the same position at Bay Microsystems.

There are no recorded funding rounds for Vcinity, but the board includes James Hunt, a professional investor, and Scott Macomber, a private investor.

Vcinity is Bay Microsystems reborn

It’s obvious that Vcinity is actually Bay Microsystems reborn, probably re-financed, and the technology is re-worked Bay Microsystems technology.

Vcinity sells its technology as RAD X-1010e and RAD X-1100 PCIe cards and ULT X software running in off-the-shelf Linux servers or in virtual machines.  This presents a NAS (NFS and SMB) interface to accessing application servers.

Ultimate X server

The products can be used to move data fast across long distances for migration purposes, or to carry out real-time actions on far-away files.

Get an Ultimate X product brief here. Vcinity only sells through channel partners and has them in the USA (12), Europe (3) and Asia (2). Find out more from them.

Fujitsu Eternus array shatters SPC-1 storage benchmark

Fujitsu has set the new top score for the Storage Performance Council SPC-1 benchmark, scoring 10,001,522 IOPS with its ETERNUS DX8900 S4 array.

In SPC-1 benchmark terms this is the fastest storage array in the industry.

The SPC-1 v3 benchmark tests a storage array with a single business-critical-type workload and supports deduplication and compression. Fujitsu’s ETERNUS DX8900 S4 is a high-end hybrid disk and flash array with six ‘nines’ availability and a compression engine.

It supports NVMe drives and 32Gbit/s Fibre Channel links. The test configuration was all-flash with 230,400GB of storage capacity using 576 x 400GB SAS SSDs, and linked to 44 servers.

SPC-1 results summary table for ETERNUS DX8900 S4

The next highest-scoring system is a Huawei OceanStor 18000 V3, which achieved 7,000,565 IOPS. Another Huawei system, an OceanStor 18800F V5, is third with 6,000,572 IOPS.

As well as being the fastest system tested, the Fujitsu array is also the most expensive, at $6,442,522.88. The second-placed Huawei system’s total price was $2,638,917.96.

By charting IOPS performance against cost per thousand IOPS we can get a better picture of how the DX8900 stacks up against other systems.
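
A quick calculation from the published totals shows where each system lands on that second axis:

    # Price-performance from the published totals: total price / thousands of SPC-1 IOPS.
    fujitsu = 6_442_522.88 / (10_001_522 / 1000)   # ~$644 per KIOPS
    huawei  = 2_638_917.96 / (7_000_565 / 1000)    # ~$377 per KIOPS
    print(round(fujitsu, 2), round(huawei, 2))

So the DX8900 S4 buys its record throughput at a noticeably higher cost per thousand IOPS than the second-placed Huawei system.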

Fujitsu’s array is way out to the right in a category all of its own, leaving Huawei’s arrays behind in the dust.

MemVerge harnesses Optane memory to make Apache Spark run faster

MemVerge, a stealth-mode startup based in Silicon Valley, is developing shared pools of external Optane DIMM memory to make Spark and similar applications run faster than they do on servers with local drives.

The company outlined the technology at the SNIA’s Persistent Memory Summit in January 2019 and you can download the slides and watch the video.

The presentation shows how storage and memory can be converged using Optane DIMMs to reduce data-intensive application run times. Of course it is a sneak preview, and MemVerge gives no clue when it could become a shippable product.

MemVerge presentation at SNIA Persistent Memory Summit

The idea is to cluster Spark Accelerators – servers containing a CPU, some DRAM and Optane DIMMs – to form a single virtual pool of memory which is accessed by application compute nodes (servers) in a Spark Cluster using an Ethernet switch.

Data is analysed by the Apache Spark engine using MemVerge’s PMEM Centric Elastic Spark data storage system (PMEM is Optane persistent memory).

Spark cluster nodes run the Tencent Spark application and MemVerge software. Spark accelerators store source data for the Spark Cluster nodes and also intermediate calculation results called Resilient Distributed Dataset (RDD) and Shuffle data.

Shuffle dance

Shuffling is a process of redistributing data needed by the Spark application cluster nodes. It can mean data has to move between different Spark application servers.

If the shuffle data is held in a single and external virtual pool it doesn’t have to actually move. It just gets remapped to a different server, which is much quicker than a data movement.
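
A deliberately over-simplified sketch of the idea: if shuffle partitions live in one shared pool, “moving” one is just a metadata update.

    # Toy illustration: shuffle blocks in a shared PMEM pool change owner without copying bytes.
    shuffle_pool = {"part-0": b"...", "part-1": b"...", "part-2": b"..."}    # lives in the shared pool
    owner = {"part-0": "spark-node-1", "part-1": "spark-node-2", "part-2": "spark-node-1"}

    def reassign(partition, new_node):
        owner[partition] = new_node       # remap only - no data crosses the network

    reassign("part-1", "spark-node-3")    # node 3 now reads part-1 straight from the shared pool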

A Spark Accelerator cluster looks like this architecturally:

Individual accelerator nodes – called compute nodes in the diagram – use Optane DC Persistent Memory and standard DRAM. They present a pool of persistent memory through Spark adapters to the Spark cluster nodes.

Each accelerator node has 12 x Optane DIMMs:

Accelerator node Optane DIMMs

That number fits the 2-socket Cascade Lake AP Optane DIMM server configuration we are already familiar with.

Times table

Yue Li

In response to an audience question at the Summit, the MemVerge presenter, co-founder Yue Li, declined to say if the Spark cluster nodes accessed the accelerator nodes using RDMA. Blocks & Files thinks the company uses an RDMA-class access method to enable the accelerator to out-speed local drives in the cluster nodes.

Yue Li gave three performance examples in his presentation. First, a HiBench wordcount time run, with a 610GB data set, took 10 minutes using a 5-node Spark cluster with local SSDs while the PMEM-based Spark accelerator node took 2.2 minutes.

Second, persisting RDD data to a local hard drive took roughly a third longer than putting it in the remote PMEM pool formed by the Spark accelerators.

Third, shuffling data was also faster with the remote PMEM than local disk drives.

Asked why not simply put Optane drives in the Spark cluster nodes, Yue Li said there could be 1,000 or more such nodes, which makes it expensive and challenging. It is simpler and less expensive to build a single shared external PMEM resource, he said.

MemVerge is developing its technology with Intel’s help and in partnership with internet giant Tencent, which has a Cloud Data Warehouse.

Q&A: What is CTERA up to with HPE?

This month CTERA announced the debut of the Edge X Series, its hyperconverged file-serving cloud storage gateway using HPE’s SimpliVity hardware.

This piqued our curiosity about the relationship between the two companies and so who better to ask than CTERA CEO Liran Eshel?

The Q and A

Blocks & Files: Are the local and cloud dedupe processes separate with, for example, HPE SimpliVity doing the local dedupe, and CTERA doing the cloud dedupe? If they are separate processes do the local dedupe files have to be rehydrated so they can then be deduped again by the cloud dedupe process?

Liran Eshel: True. The processes are separate. However, the local dedupe is hardware accelerated, so the rehydration and dehydration takes place in real-time without creating disk I/O or consuming any temporary disk space.

The files are never stored in a dehydrated form. The dedupe algorithms are also different, the local one is tuned for high IOPS, primary storage use case, and to do so, relies on low latency directly connected flash.

In contrast, the cloud deduplication is optimized for WAN at high-latency, as well as providing the ability to perform global deduplication between multiple sites.

Blocks & Files: Is the CTERA Edge X Series only available on an HPE SimpliVity hardware/software base?

Liran Eshel: The Edge X is only based on HPE SimpliVity. The Edge V is a virtual edition that can run on any other server or HCI platform, and is certificated for the likes of Dell, Cisco and Nutanix.

Blocks & Files: What is the relationship between CTERA and HPE? Is CTERA reselling HPE SimpliVity systems? Do customers only buy Edge X Series systems from CTERA?

Liran Eshel: CTERA is reselling SimpliVity in the case of the X Series. In addition to that, HPE is selling the CTERA software portfolio as part of HPE Complete program on top of HPE HW and in combination with 3Par and Scality.

Blocks & Files: Did CTERA consider partnering other HCI vendors, such as Dell EMC with VxRail, or Nutanix with either vSphere or Acropolis?

Liran Eshel: CTERA is already partnering with Cisco HyperFlex and Nutanix (both vSphere and Acropolis), in a meet in the channel mode.

Blocks & Files: Where does support come from; CTERA and/or HPE? Does the customer have 1 or 2 throats to choke?

Liran Eshel: 1 throat to choke. CTERA takes front line for the X Series, with back-line support from HPE for all HW and HCI related matters.

Blocks & Files: How is data protection (backup) provided?

Liran Eshel: CTERA has built-in backup capabilities with customer defined interval, retention and choice of cloud.

Comment

HPE is an avid storage partnering machine, with relationships with Datera, Cloudian, Hedvig, Qumulo, Scality, and WekaIO. A common factor is that they involve HPE servers – Proliants and Apollos.

The CTERA relationship is based on SimpliVity hyperconverged kit, which is ProLiant server-based. Nutanix doesn’t need to do this because it has its own Files product. Ditto NetApp, which has ONTAP filers available for its HCI products if needed.

HPE’s deal with CTERA could spark other HCI suppliers to add file services from external suppliers. For example, Cisco, Pivot3 and Scale Computing might look to do something here.

Intel gets ready to go live with servers with 12TB Optane

Expect Intel to announce on April 2 that server makers will ship 4-socket, 112-core servers with up to 12TB of Optane memory, from July onwards.

This means the servers will run applications faster than servers using DRAM and SSDs alone.

Blocks & Files has joined up the dots from several Intel pronouncements to draw this picture.

Dot 1: Intel has a Data-Centric Innovation Day event scheduled for April 2. It will stream this live.

Dot 2: Rob Crooke, Intel’s SVP for its Non-Volatile Memory Solutions Group,  blogged on March 19: “We’re also excited about soon-to-be released Intel Optane DC Persistent Memory that will be available on next-generation Intel Xeon processors for data centres. This is redefining the memory and storage hierarchy and bringing persistent, large-scale memory closer to the processor.”

Optane DC Persistent Memory is 3D XPoint media supplied as a non-volatile DIMM with memory-channel connectivity as opposed to the Optane SSDs with slower PCIe bus connectivity.

Dot 3: Late last year Intel announced the availability of Optane DIMMs in a beta testing program for OEMs and cloud service providers, which “paves the way for general availability in the first half of 2019.”

The Cascade Lake AP upgrade of Intel’s data centre Xeon server CPU line was announced in November last year and these CPUs support Optane DIMMs.  Cascade Lake AP parts are single or dual-socket processors with up to 48 cores and 12 DDR4 channels per package.

The Optane DIMMs come in 128GB, 256GB and 512GB capacities. A 2-socket Cascade Lake AP could have 12 X DDR4 memory channels, each supporting 2 DIMMs, either DRAM or Optane DIMMs. There could be a maximum of 6TB of Optane memory.

The first Cascade Lake AP iteration is a multi-chip package combining two 24-core processors connected by a UPI link, into a single 48-core CPU. 

That would be 12 x 512GB Optane DIMMs, leaving 12 DIMM sockets for DRAM – the servers use a mix of DRAM and Optane.

Things have moved on

Dot 4: On March 15 Jason Waxman, GM of Intel’s Cloud Platforms Group, said Intel sees a need for 4-socket servers with up 112 cores – 28 per socket (processor) – and 48 DIMMs – 12 per processor.

These servers would support up to 12TB of Optane DIMM capacity and be available from July onwards.

Waxman is pointing to a second iteration of Cascade Lake AP with 28 cores per socket – four more than before – and 12 DIMMs per CPU (socket). This adds up to 6 memory channels per CPU, as before, and 24 memory channels in total.

12TB of Optane DIMMs in turn implies 24 x 512GB Optane DIMMs – six per CPU (socket) using up 3 memory channels and leaving 3 for DRAM.
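
Joining those dots in numbers:

    # 4-socket Cascade Lake AP server: 12 DIMM slots per socket, 512GB Optane DIMMs.
    sockets, slots_per_socket, optane_gb = 4, 12, 512
    total_slots = sockets * slots_per_socket             # 48 DIMM slots
    optane_dimms = 12 * 1024 // optane_gb                # 24 DIMMs for 12TB of Optane
    print(total_slots, optane_dimms, total_slots - optane_dimms)   # 48 24 24 -> half the slots left for DRAM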

Our conclusion is that Intel will announce 4-socket, 112-core Cascade Lake AP packages on April 2 that support up to 12TB of Optane memory. Server systems using this will be coming available from Dell EMC, HPE, Lenovo, Inspur, Supermicro and Quanta, with first shipments in July.

Falling NAND prices to drive NVMe SSD uptake, say industry watchers

The great NAND flash price slump will accelerate the uptake of SSD storage, industry sources have predicted, with PCIe/NVMe SSDs possibly accounting for half of the market by the end of the year.

Demand for flash plummeted in late 2018 and the first quarter of 2019 but is expected to bounce back in an improving market for smartphones, laptops, servers and other products that use NAND.

In the SSD market, suppliers will increase the downward pressure on 512GB/1TB prices, according to DRAMeXchange analyst Ben Yeh. Along with an increasing proportion of value PCIe SSDs in product shipments, this will drive a greater fall in average selling prices, increasing the uptake of SSDs in laptops.

There is plenty of room for growth in the enterprise market, where suppliers have all set their sights on high-margin PCIe/NVMe products, Yeh said. He added that more opportunities for competition will arise as demand from servers and data centres heats up.

PCIe SSDs will account for up to 50 per cent of the market by the end of 2019, according to a separate report in Digitimes.

This is driven by the shrinking difference in price between the two types. The report claims that the unit price for 512GB PCIe SSDs fell 11 per cent to $55 during the first quarter of 2019, compared to a corresponding price drop of 9 per cent for SATA SSDs, with the price gap continuing to narrow from the 30 per cent seen in 2018.

Digitimes also quotes CK Chang, president of SSD maker Apacer Technology, who said consumer PCIe SSDs will gradually replace SATA SSDs entirely, and also see broader mass adoption in industrial control systems and data centres.