
Dell EMC’s Secure Remote Services recovers itself from own outage

Secure Remote Services, formerly ESRS, is a “secure, two-way connection between Dell EMC products and Dell EMC Customer Support that helps customers avoid and resolve issues.” It “supports Dell EMC storage (except PS and SC series), networking, and converged products.” And it operates 24×7. Only not on January 15 and 16.

Reddit user u/bpoag posted on Wednesday Jan 16: “We’ve got a bunch of EMC arrays throwing whiny but inconsequential errors about not being able to call home via ESRS… We’ve been told by a manager in EMC connectivity support that there’s been a global ESRS outage since about 2AM yesterday morning. Any clue what’s going on, and why in the hell they’ve been in the ditch this long?”

An industry insider said: “Global outage apparently. Can’t dial in to field systems. Not sure how it happened.”

A Dell EMC spokesperson said late on Jan 16: “Our customers were made aware via passive notification that ESRS was experiencing downtime, temporarily pausing some remote support services but not adversely impacting product operations. Service has been restored for all customers.”

Data storage estimates for intelligent vehicles vary widely

On-board data storage is becoming a vital aspect of cars, trucks and lorries as they get smarter.

Intelligent vehicles will use and generate a lot of data that needs storing inside the vehicle. Although it is early days, we can begin to understand how much storage could be needed, and its characteristics.

Let’s start by defining what we mean by an intelligent vehicle: an autonomous or near-autonomous road vehicle that uses IT to manage vehicle component functions and to assist or replace the driver.

Steps to autonomous vehicles

The Society of Automotive Engineers (SAE) defines five levels of autonomy.

  • L0 – no automation; the human driver controls everything, apart from low-level systems such as engine fuel injection
  • L1 – a single driver function is automated, such as adaptive cruise control or lane-keep assist
  • L2 – multiple driver functions are taken over by IT, such as combined cruise control, lane-centring, steering and braking
  • L3 – conditional automation, with the human driver on standby to take over
  • L4 – fully autonomous, with no human driver control.

The higher the autonomy level, the more IT is needed, and the more data that needs to be stored. The data is generated by a car’s sensors, which can be outward-looking, like cameras, radar and lidar instruments, and also inward-looking, such as logged engine output, exhaust emissions and suspension spring rates.

Data can also be preloaded in the car (firmware for embedded computer systems, for example) or sent to it over the air, such as map data for navigation systems.

Data generation

As yet, there is no consensus on just how much data intelligent vehicles will generate.

According to Mark Pastor, archive product marketing director at Quantum, autonomous test vehicles typically generate between 5TB and 20TB of data per day. These figures are higher for test vehicles than those anticipated for normal operating mode.

Stan Dmitriev, an author at Tuxera which develops automotive storage systems, says autonomous cars will generate more than 300TB of data per year – less than 1TB/day.

Dmitriev has not published the number of driving hours per year, but 300TB for 365×24 usage seems unrealistic. For the purposes of this article we will assume the figure refers to 12 hours of car usage per day.

He says cars at up to L2 autonomy can generate about 25GB/hour. That means 300GB in 12 hours – 0.3TB/day.

At CES this month Seagate highlighted that a single vehicle equipped with software from Renovo, an AV software platform company, can generate up to 32TB of data per day. This is the headline number for data that will be collected and stored at the edge before being migrated to the cloud.

Taken together, these estimates span a very wide range, from 0.3TB/day to 32TB/day.

John Hayes, CEO of self-driving car technology company Ghost, said about the 32TB/day data generation figure: “The headline data requirements are off by about a factor of 10,000.” He thinks Seagate and Renovo are talking tosh, in other words, and implies that around 3.2GB/day is more plausible.
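To keep these numbers straight, here is a small Python sketch that puts the estimates above on a common TB-per-day basis. The 12-hour usage assumption is ours, as noted earlier; the rates are the figures quoted by the vendors and commentators above.

```python
# Rough comparison of intelligent-vehicle data generation estimates.
# The 12-hour daily usage assumption is the article's, not the vendors'.

GB_PER_TB = 1000
USAGE_HOURS_PER_DAY = 12

estimates_tb_per_day = {
    "Tuxera (up to L2, 25GB/hour)": 25 * USAGE_HOURS_PER_DAY / GB_PER_TB,
    "Quantum test vehicles (low)": 5.0,
    "Quantum test vehicles (high)": 20.0,
    "Seagate/Renovo": 32.0,
    "Ghost (Hayes, ~1/10,000th of 32TB)": 32.0 / 10_000,
}

for source, tb_per_day in estimates_tb_per_day.items():
    print(f"{source:38s} {tb_per_day:8.3f} TB/day")
```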

Keeping it safe

How the data is stored will depend upon the IT design in the car. This can basically be centralised, distributed or some combination of the two. A distributed design will need storage for each distributed computing element. A centralised scheme will have a central data storage facility, and a hybrid scheme will have a smaller central facility, and various storage elements in sub-systems around the car.

We’ll assume a hybrid scheme for now, with a main controlling computer for navigation and driving the car, complemented with subsidiary component ones for engine management, suspension and braking control, etc.

It is generally assumed an intelligent vehicle will be connected over-the-air to a remote host such as the manufacturer or a robo-taxi fleet operator, sending generated data to the host and receiving reference information such as traffic alerts.

The smart vehicle must make instant braking and steering decisions to avoid hazards and it needs to store information to carry out such real-time functions. The communications network cannot be assumed to be 100 per cent reliable and the car has to store generated data between uplink periods.

There could be minutes or hours of interrupted communications, or even days if the vehicle leaves mobile coverage areas for extended periods.

At Seagate’s high-end 32TB/day estimate of generated data, a 30 day storage period to cover extended non-uplink time would require 960TB capacity.

Mission-critical storage

The car’s central data storage system will be mission critical for what is in effect a mobile edge computing data centre. If it fails, operations can be compromised, causing loss of functions and even failure of the vehicle or an accident.

It seems intuitively obvious that disk drive storage is too unreliable in a car’s vibrating environment and wide temperature ranges. Also it is possibly too slow to retrieve data for instantaneous decision making. These issues indicate flash storage will be needed.

That is more expensive than disk. The mixed read-write workload and a vehicle’s operating life will make read/write endurance important. We can also see that, at the extreme data generation and storage period above, a 960TB SSD would add hundreds of thousands of dollars to a car’s cost.

A 30TB Samsung PM1643 SSD costs $12,099.99 on CDW. At that price, 960TB means 32 drives and around $387,000; a totally unrealistic cost for a car. The moral here is that data generation and its storage are going to need careful consideration by manufacturers.
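The buffer-sizing and cost arithmetic can be reproduced with a few lines of Python. The inputs are the worst-case assumptions made in this article, not vendor specifications.

```python
# Sketch of the on-board buffer sizing and flash cost arithmetic above.
# Inputs are the article's worst-case assumptions, not vendor specs.

daily_generation_tb = 32        # Seagate/Renovo high-end estimate, TB/day
offline_days = 30               # assumed period without an uplink
buffer_tb = daily_generation_tb * offline_days

drive_capacity_tb = 30          # Samsung PM1643-class SSD
drive_price_usd = 12_099.99     # CDW list price quoted above
drives_needed = -(-buffer_tb // drive_capacity_tb)   # ceiling division
flash_cost = drives_needed * drive_price_usd

print(f"Buffer capacity needed: {buffer_tb} TB")
print(f"Drives needed: {drives_needed}, flash cost: ${flash_cost:,.0f}")
```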

Selecting components that can operate inside an automotive environment and store data for a vehicle’s lifetime will be crucial. (I’m driving a 13-year-old car and that would blow an SSD’s warranted working life out of the water.) Initial mistakes can come back to haunt a manufacturer, as has happened with Tesla.

Tesla’s data storage problems

Tesla has been caught out with limited flash endurance. InsideEVs reports that Tesla fitted a Media Control Unit (MCU) to Tesla Model S and X cars from 2010 to 2018. This controls the car’s main touchscreen and has a flash card soldered to a motherboard housing the NVIDIA Tegra Arm-based CPU. It is an 8GB SK Hynix eMMC flash card and stores the firmware for the system.

At launch the MCUv1 firmware was 300MB in size but grew to take up 1GB.

Car logging information is written to the flash card, and this gradually fills the card up. That means there are no spare cells (over-provisioning) left for the card’s wear-levelling function to use, and the MCU’s functionality becomes affected. The web browser may not work, for example, or system startup could take minutes or even fail. The car is still drivable, though.
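To see why a small, nearly full eMMC card wears out, here is a back-of-the-envelope endurance estimate. Every input below is an illustrative assumption; neither Tesla nor SK Hynix has published these figures.

```python
# Back-of-the-envelope eMMC wear estimate. All inputs are illustrative
# assumptions, not Tesla or SK Hynix figures.

card_capacity_gb = 8        # MCUv1 eMMC card
static_data_gb = 1          # firmware grew to about 1GB, leaving little spare space
pe_cycles = 3_000           # assumed NAND program/erase endurance
log_rate_gb_per_day = 3     # assumed vehicle logging write rate
waf = 8                     # assumed write amplification on a nearly full card

write_budget_gb = (card_capacity_gb - static_data_gb) * pe_cycles
days = write_budget_gb / (log_rate_gb_per_day * waf)
print(f"Estimated card life: {days:.0f} days (~{days / 365:.1f} years)")

# More free space (over-provisioning) lowers write amplification and
# stretches the same write budget over many more years.
```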

As the cars developed the fault, Tesla replaced the entire MCU board as a fix under warranty. Out-of-warranty cars need a fix costing $3,000 or more from a Tesla support shop. The firmware and card were upgraded in 2018 to handle the logging load, and this MCUv2 has a 32GB flash card.

Tesla owners have set up a petition to replace MCUv1 units with MCUv2 ones. A Tesla owners’ forum also discusses the issue.

Poster DallasModelS said in April last year: “Love the car but I’m afraid that service centers have no empathy and are just cold. Just can’t make a product, you have to stand behind it. How does a MCU which controls 99 per cent of the function of the car fail in under 3 years?”

Tesla has been contacted for a comment.


Your occasional enterprise storage digest, featuring Red Hat, Nexsan, and Scality

Nexsan has started down the RoCE road, giving faster access to its Assureon archive arrays. IBM-owned Red Hat has added multi-cloud support to its container storage, and Scality thinks flash object storage will proliferate at the edge. There are a bunch of other news items below these headline ones.

Nexsan archive kit gets RoCE and Blockchain

Nexsan has added Private Blockchain and end-to-end RDMA (Remote Direct Memory Access) over Converged Ethernet (RoCE) to its Assureon arrays with a v8.3 update.

Private Blockchain stores data in an immutable data structure, using cryptography to secure transactions and with an automated integrity audit at the redundant sites to ensure data integrity and transparency. It provides secure archiving of digital assets for long-term data protection, retention and compliance adherence.
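Nexsan has not published implementation details, but the basic idea of a private blockchain for archive integrity can be sketched as a hash chain over stored objects. This is a minimal illustration of the concept, not Assureon’s actual mechanism.

```python
import hashlib
import json
import time

# Minimal hash-chain sketch of blockchain-style archive integrity.
# Illustrates the general idea only; not Assureon's implementation.

def add_record(chain, object_id, object_bytes):
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    entry = {
        "object_id": object_id,
        "object_sha256": hashlib.sha256(object_bytes).hexdigest(),
        "timestamp": time.time(),
        "prev_hash": prev_hash,
    }
    entry["hash"] = hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()
    chain.append(entry)
    return entry

def audit(chain):
    """Recompute every link; any tampering breaks the chain."""
    prev = "0" * 64
    for entry in chain:
        body = {k: v for k, v in entry.items() if k != "hash"}
        recomputed = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if entry["prev_hash"] != prev or recomputed != entry["hash"]:
            return False
        prev = entry["hash"]
    return True

chain = []
add_record(chain, "asset-001", b"archived file contents")
print(audit(chain))  # True while the chain is intact
```

Running the same audit independently at each redundant site would give the kind of cross-site integrity check described above.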

Surya Varanasi, CTO of StorCentric, Nexsan’s parent company, said: “With the release of Assureon 8.3, we have implemented RoCE to provide over a 2x performance improvement… Users are now able to quickly and efficiently retrieve data with a 40Gbit/s RDMA Converged Ethernet connection between the Assureon server and Assureon Edge servers, and thereby accelerate access to archived data storage securely.”

IBM (Red Hat) OpenShift container storage

Red Hat has announced OpenShift Container Storage 4. This is based on Ceph storage which supports file, block and object storage. It has a multi-cloud object gateway from Red Hat’s December 2018 acquisition of NooBaa.

Red Hat says customers operate from a unified Kubernetes-based control plane for applications and storage and can now choose data services across multiple public clouds. No public cloud lock-in, in other words.

Customers get Rook storage orchestration facilities.

Red Hat has also announced Red Hat OpenShift Container Platform 4.3, with new security capabilities including FIPS (Federal Information Processing Standard) compliant encryption (FIPS 140-2 Level 1) and encryption of the etcd datastore.

Red Hat OpenShift Container Storage 4 is generally available today for Red Hat OpenShift Container Platform 4.
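For an idea of what consuming OpenShift Container Storage looks like from the application side, here is a sketch that requests a Ceph-backed block volume through the Kubernetes API. The storage class name is an assumption (a typical OCS RBD class); actual names vary by cluster.

```python
# Requesting a block volume from OpenShift Container Storage via the
# Kubernetes API. Sketch only: the storage class name is an assumed
# OCS/Ceph RBD class and may differ on a given cluster.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod

pvc = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="demo-claim"),
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=["ReadWriteOnce"],
        storage_class_name="ocs-storagecluster-ceph-rbd",  # assumed class name
        resources=client.V1ResourceRequirements(requests={"storage": "10Gi"}),
    ),
)

client.CoreV1Api().create_namespaced_persistent_volume_claim(
    namespace="default", body=pvc
)
```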

Flash insight for Scality at the edge

Scality CEO Jérôme Lecat thinks object storage used for edge computing will be all-flash.  

Scality CEO Jérôme Lecat

Specifically, object storage will move into the edge for applications that capture large data streams from a wide variety of mobile, IoT and other connected devices. This will include event streams and logs, sensor and device data, vehicle drive data, image and video media data and more.

There will be high data rates and high concurrency from thousands or more simultaneous data streams. The edge applications will be developed for cloud-native deployment, being containerised. They will naturally embrace RESTful object style storage protocols, making object storage on flash media an optimal choice on the edge to support this emerging class of data-centric applications.

RESTful APIs use standard HTTP methods such as GET, POST, PUT, PATCH and DELETE. GET and PUT are the object storage-style commands.
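As a concrete illustration, here is what a PUT and GET look like against an S3-compatible endpoint such as a Scality RING. The endpoint, credentials and bucket names are placeholders.

```python
# Object storage access boils down to PUTs and GETs over HTTP. Minimal
# sketch against an S3-compatible endpoint; names below are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://object-store.example.com",  # placeholder endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# PUT: write a sensor reading as an object
s3.put_object(Bucket="edge-telemetry", Key="vehicle-42/2020-01-17.json",
              Body=b'{"speed_kph": 57, "lat": 51.5, "lon": -0.1}')

# GET: read it back
obj = s3.get_object(Bucket="edge-telemetry", Key="vehicle-42/2020-01-17.json")
print(obj["Body"].read())
```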

Lecat also thinks that, as enterprises adopt cloud-native applications, Kubernetes will become the default platform for infrastructure deployment in the data centre. This chimes with Portworx’s thoughts on Kubernetes.

Shorts

Data backup and security provider Acronis has signed a multi-year deal to provide cyber-protection products to the San Diego Padres professional baseball team.

The National Academy of Television Arts and Sciences has awarded Dell Technologies a Technology & Engineering Emmy for the Dell EMC Isilon scale-out NAS storage, recognising its early development of hierarchical storage management systems. Dell EMC media and entertainment customers have used over 1.5 EB of Isilon storage in the past three years.

Hammerspace’s Hybrid Cloud File Service has officially been verified as Citrix Ready. Among many other things, it enables Citrix Virtual Apps and Desktops users to access data across multiple locations and/or within the public cloud. Citrix said Hammerspace helps reduce delays and improve response times when running virtual desktops from the hybrid cloud.

IT infrastructure management business Kaseya said annual bookings passed $300m in 2019. Its organic growth rate was about 30 per cent and it added more than 5,000 new customers. Kaseya claims a valuation of more than $2bn. The company owns the Unitrends and Spanning backup businesses and is backed by private equity firm Insight Partners.

Kingston Digital Europe has announced the availability of a new data centre SSD, the DC1000B M.2 NVMe boot drive. It has capacities of 240GB and 480GB and a PCIe gen 3 x4 interface. Performance is up to 3,200/565 MB/sec sequential read/write and 205,000/20,000 IOPS.

MariaDB has released MariaDB Platform X4, claiming it gives developers instant access to smart transactions previously only achievable with expensive proprietary systems. Gregory Dorman, VP distributed systems and analytics at MariaDB Corporation, said: “We implemented a dual storage layout for data: row based for transactions and columnar for true analytics.”

MariaDB Platform X4 uses block storage, such as AWS EBS, for fast transactions along with object storage, such as AWS S3, for fast, scalable analytics. The company claims it is the only product on the market to offer this level of cloud-native storage.

Object storage supplier Object Matrix has paired up with Ortana Media in a technical collaboration to provide integrated content management and archive workflows. Cubix from Ortana Media Group is an asset management and orchestration platform and MatrixStore is a media-focused and object-storage based system. Cubix can move stuff to and from MatrixStore.

People

Cloud-enabled data manager Aparavi has hired Gary Lyng to be its CMO. He comes from Violin Systems. Before then he was at SanDisk (acquired by Western Digital), EMC, NetApp, Hewlett Packard, and Veritas Software.

Erwan Menard is currently leading Google Cloud’s Infrastructure and Applications Modernization  portfolio of solutions. Previously, he was the CEO of Elastifile, a start-up providing scalable enterprise file storage solutions for the Cloud, acquired by Google in 2018.

Forsa taps into Optane direct mode to turbocharge apps. No code change required

Formulus Black’s Forsa software gives applications an Optane persistent memory boost with no app software changes needed.

Intel’s Optane 3D XPoint drives are faster than flash, but Optane is fastest when used as memory rather than storage. That requires application software changes: developers have to add code to select and move data between DRAM, which is volatile, and Optane memory, which is not. The combination of this difficulty and Optane’s expense has slowed Optane adoption by software vendors.
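To illustrate the kind of code change app direct mode normally demands, here is a generic sketch that maps a file on a DAX-mounted persistent memory filesystem and treats it as byte-addressable memory. This is neither Intel’s PMDK nor Formulus Black’s bit-marker method, and the mount point is an assumption.

```python
# What "using Optane as memory" typically asks of an application: map a
# file on a pmem-aware (fsdax) filesystem and use it as byte-addressable
# memory. Generic sketch only; real deployments would normally use Intel's
# PMDK libraries, and the mount point below is an assumption.
import mmap
import os

PMEM_FILE = "/mnt/pmem0/appdata"   # assumed DAX-mounted persistent memory
SIZE = 64 * 1024 * 1024            # 64MB region

fd = os.open(PMEM_FILE, os.O_CREAT | os.O_RDWR)
os.ftruncate(fd, SIZE)

buf = mmap.mmap(fd, SIZE)          # loads/stores now hit persistent media
buf[0:11] = b"hello pmem\n"        # ordinary memory writes, persistent data
buf.flush()                        # flush cached changes to the media
buf.close()
os.close(fd)
```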

Intel has devised adaptations for specific software such as SAP, but a general application software adaptation method would be helpful. And this is where Formulus Black, a venture-backed in-memory computing vendor, comes in.

The company’s Forsa software accelerates applications by disassembling them into Forsa bit markers and then running these in memory.

Jing Xie, Formulus Black COO, this week briefed Blocks & Files on the new technology. In July 2019, Forsa was upgraded to make use of Optane’s app direct mode. It uses a Logical Extended Memory (LEM) construct. At the time we wrote “Forsa amplifies the available memory up to fourfold so that 3TB of DRAM looks like 12TB.”

That means any application software inside the Forsa framework with LEMs can use Optane’s app direct mode and so run faster.

Forsa benchmarks

Formulus Black recently ran benchmarks with Intel using the MySQL database and a variety of Intel media:

iPhone slide photo

The transactions per second and latency results are shown in the slide above. Forsa PERF LEM refers to Forsa running in a single system, with Forsa HA LEM referring to a high availability implementation using two systems.

Starting with the P4510 as a baseline (3,824tps), Forsa did 2.7x more transactions per second using the Optane DIMM (10,299tps). The P4510’s latency (at the 99th percentile) was 85ms and the Optane DIMM-assisted Forsa achieved 18ms.

Running in high availability mode slowed the Forsa-Optane DIMM system to 7,827tps with 22ms latency.

The comparison with SATA SSDs shows the Forsa-Optane DIMM combo doing almost ten times more transactions per second.

According to Xie, servers are turbocharged when running Forsa with Optane DIMMs. This combo enables customers to get more use out of server cores, and to run more applications in a server, with those applications able to do more.

If your applications could benefit from being run in-memory – and you can justify purchasing Optane DIMMs – think about running a proof of concept with Formulus Black’s Forsa. 

VAST Data quick-fills Kubernetes containers with file data

VAST Data has announced Kubernetes Container Storage Interface (CSI) support for its storage array operating system. This enables automated provisioning of storage to containers and will also work with Red Hat OpenShift and other container orchestration platforms.

VAST, a heavily funded venture-backed storage startup, said it can stream up to 9GB/sec of data to servers running containerised workloads. VAST claims recent customer testing showed that VAST’s CSI driver and Universal Storage system delivered three times the performance of a Pure FlashArray all-flash SAN storage system using NFS over RDMA.

VAST’s array is a solid-state system that uses quad-level cell flash memory, Optane XPoint media and NVMe-oF access to store data in a single tier and provide low latency access to it. XPoint is used as a metadata store and write buffer. VAST Data nodes are clustered from 2 to 10,000 stateless servers that all expose an NFS, NFSoRDMA (NFS over Remote Direct Memory Access) and S3 global namespace.

VAST added NFSoRDMA support in its September 2019 v2.0 software release. When used with the CSI interface it can stream data into single containers at nearly 9GB/sec. A Mellanox and VAST Solution Brief claims “RDMA offloads boost NFSoRDMA performance to 18.7 GB/sec per client”. 

NFSoRDMA protocol

We suggested to VAST that the NFSoRDMA protocol was quite old, being 2014-era or earlier.

VAST marketing head Jeff Denworth replied: “2014 isn’t that old, I would argue. More importantly, it’s in the kernel. I can’t begin to tell you all the customers who complain to us about using parallel file systems and the nail-biting experience it is to upgrade an OS and fear breaking the file system or vice versa.

“So we made the conscious choice to avoid this complexity by digging down into what’s already in the Linux kernel and keeping things simple all while expressing the performance of Flash to all Linux applications.

“We are only a bit unique in our use of this as we’re the only all-flash company shipping a NFSoRDMA server, but Oracle/ZFS paved the way for us on this one and I’d argue there’s more ZFS in the world by a factor of at least 1,000 vs. all the world’s high speed file systems combined.”

We understand VAST Data will add SMB support in a matter of months.

The eye-opening multi-billion-dollar merry-go-round of Insight Partners, Veeam – and their one-time beau N2WS

Updated What an interesting world of revolving doors the enterprise storage sector can be sometimes.

It’s been brought to our attention that Insight Partners, the private equity outfit that just snaffled Veeam for $5bn, once bankrolled N2WS, Veeam’s one-time subsidiary.

Insight and Veeam invested in N2WS, a Florida-based AWS EC2 backup and recovery service, in May 2017. Insight previously said on its website that the upstart was “founded in 2012 to address the challenge of providing highly-efficient data protection for production environments deployed in the public cloud.”

The private equity biz – which has injected cash into tons of businesses, from Docker to Twitter to Tintri, over the years – also brought in Veeam co-founder Ratmir Timashev to join the N2WS board. Insight previously invested significant sums in Veeam.

Fast forward to January 2018, and Veeam acquired N2WS for $42.5m, though it soon ran into problems. N2WS’s US federal customers didn’t want to deal with its new parent Veeam, a part-Russian-owned business based in Switzerland.

By August last year, Veeam had offloaded N2WS back to its original founders.

Backup backers

Insight is a fan of backup software companies. It is the majority owner of Kaseya, an IT infrastructure management vendor, and it has a stake in OwnBackup, which specialises in cloud-based backup for Salesforce data. Prior investments listed on Insight’s website include Acronis, a prominent backup and security supplier; Imceda, the SQL database backup vendor bought by Quest in 2005; and Mimecast, which IPOed in 2015.

In May last year, Kaseya bought Unitrends (which had been acquired by Insight Partners in 2013) and in October took over Spanning (in-cloud backup), which was bought by Insight in 2017. The private equity house’s acquisitions also include RapidFire Tools and IT Glue.

Data protection borg

In effect, Insight Partners is acting as a consolidating force in the fragmented data protection industry.

The well-known vulture capital private equity playbook is to buy business assets, grow them while cutting costs, and then sell them. Insight has shuffled its existing backup investments already, putting Unitrends and Spanning into Kaseya. We speculate that OwnBackup could join Kaseya’s portfolio; it looks a good fit with Spanning as both are in-cloud backup businesses.

Editor’s note: This article was updated after publication to clarify that Insight Partners does not own N2WS: the business was sold back to its co-founders in mid-2019. We are happy to make this correction.

Hard disk drive shipments fell 50% between 2012 and 2019

Disk drive shipments fell by half in the seven years to the end of 2019 according to TrendForce, a market research firm.

Enterprise nearline drives are the only category showing good growth in the last year. As reported by Wells Fargo senior analyst Aaron Rakers, TrendForce’s estimated numbers for the fourth 2019 quarter show the other categories mostly declined.

Rakers notes that TrendForce estimates have a good historical match with actual shipments, using the chart we publish below to make his point.

This chart shows the approximate 50 per cent drop in shipped disk drives from 2012-2019.

The HDD market segment data shows nearline’s dominance as SSDs eat away at all other categories except desktop PCs, where capacity is prized over access speed.

  • Total HDDs – c77.2 million drives, down 12.3 per cent y-o-y
  • 2.5-inch mobile/CE – c30 million units, down 31 per cent y-o-y
  • 3.5-inch desktop/CE – c27.5 million units, up 5 per cent y-o-y
  • Nearline capacity – c15 million units, up 41 per cent y-o-y
  • Mission-critical – c4.6 million units, down 15 per cent y-o-y.

There are just three disk drive suppliers, and TrendForce awards Seagate the lead in shipped units in the fourth quarter:

  • Seagate – c31.5 million drives, down 14 per cent y-o-y and 40.8 per cent market share. It was 41.7 per cent a year ago.
  • Western Digital – c29.35 million drives, down three per cent y-o-y and around 38 per cent share; up from 34.3 per cent a year ago.
  • Toshiba – c16.35 million units, down 20 per cent y-o-y, with a 21.2 per cent share; down from 24 per cent last year.

Extrapolate the trends and we might see one manufacturer exiting the market in the next decade. Equally we might see the third-placed vendor clawing back lost ground. But don’t hold your breath; this is an ultra-marathon, as exciting as watching oil tankers race.

Portworx takes Kubernetes magic carpet ride into enterprise data centres

Portworx, a venture-backed startup that specialises in storage for container deployments, is banking on enterprises adopting Kubernetes to orchestrate all their apps – not just containerised versions.

The company was founded in 2015 by CEO Murli Thirumale and CTO Gou Rao and has bagged $55.5m in three funding rounds, including $27m last year. There are around 100 employees. Sales grew 500 per cent from an undisclosed figure between 2016 and 2017, and 100 per cent from 2018 to 2019. The company hopes to grow at the same rate from 2019 to 2020.

Portworx is not yet making an operating profit but claims to be the most widely used Kubernetes storage platform. It has around 140 customers which include Adobe, Carrefour, Comcast, the Department of Homeland Security, DreamWorks, Ford, IBM, GE, Lufthansa and Verizon. An average sales cycle lasts four months and the average deal size is $100K.

The company provides block storage and has added file capabilities to its roadmap. Minio software is used to provide object storage.

Portworx aims to win enterprise storage business on Kubernetes’ coat tails. Thirumale told a Silicon Valley press briefing last month: “Tomorrow Kubernetes will manage apps and infrastructure for all workloads. The lifecycle of containerised apps will be managed by Kubernetes with HA, DR, backup and compliance extensions.”

Murli Thirumale (left) and Gou Rao

“Kubernetes will become the new control plane for the data centre and the cloud.” He cited VMware’s Project Pacific as an example of how even VMware is being forced to recognise Kubernetes’ dominance.

Machine vs app-centric

Data centre workflows will become app-centric and move away from today’s machine-centric workflows, according to Thirumale.

Machine-defined vs app-defined characteristics; Portworx slide.

Today, admin staff manage and provision virtual machines and their storage resources (block volumes) but tomorrow, end user application owners will drive application IT resource provisioning and management through Kubernetes.

Kubernetes will orchestrate everything an app needs, enabling cloud-native apps to run anywhere across on-premises and multiple public cloud environments. This will give enterprises the freedom to run their apps in the best location and move them freely.

When or rather if this switch occurs, storage will have to fit in with Kubernetes and become part of the cloud native application stack. This will require containerised storage management resources, subject to Kubernetes’ control, and capable of using any underlying physical storage. Non-containerised storage will be unable to keep up.

Portworx CTO Gou Rao said Kubernetes increases the density and frequency of storage operations: “You can’t attach and detach an iSCSI volume in 15 secs. Containers can live and die in five seconds.”

Legacy storage such as VMware’s vSAN is built for machine-defined storage, for providing block storage volumes, said Rao. “Container-granular storage is different. You can’t provision hundreds of volumes to containers. You have to use different language and different control.”

In the Kubernetes world, storage services include high availability, disaster recovery, backup, migration, security and compliance. Kubernetes becomes the control plane for these services, talking to a storage function through its CSI (Container Storage Interface).

The Portworx Enterprise platform delivers this function and is a ready-made good storage citizen in the Kubernetes universe, Thirumale said. Portworx is a storage overlay for Kubernetes in the sense that it is Kubernetes’ interface to actual storage resources.

The Portworx storage overlay for Kubernetes.

Rao said: “Portworx is a storage overlay built for containers driven and managed by Kubernetes. There’s no storage admin. Everything is done through Kubernetes – it’s literally deploy and forget.”

Portworx extolled the simple enterprise storage credentials of its product in this Kubernetes world. For instance, say a user/app calls for a 1GB virtual volume via Kubernetes and the YAML interface. Portworx provisions that volume and handles its block IO. Data goes directly from the active container to the Portworx storage volume with no sideways hop through the Linux kernel, making storage access faster.
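That flow can be written down as the two manifests an application owner would submit: a Portworx-backed StorageClass and a 1GB claim against it. This is a sketch; the provisioner name and repl parameter follow Portworx’s documented conventions but should be checked against a given release.

```python
# The provisioning flow described above, expressed as the Kubernetes
# manifests an app owner would submit. Sketch only: the provisioner name
# and "repl" parameter are Portworx conventions and may vary by release.
import yaml

storage_class = {
    "apiVersion": "storage.k8s.io/v1",
    "kind": "StorageClass",
    "metadata": {"name": "px-replicated"},
    "provisioner": "kubernetes.io/portworx-volume",  # assumed in-tree provisioner
    "parameters": {"repl": "2"},                     # two replicas of each volume
}

pvc = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "app-data"},
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        "storageClassName": "px-replicated",
        "resources": {"requests": {"storage": "1Gi"}},  # the 1GB volume in the example
    },
}

print(yaml.safe_dump_all([storage_class, pvc]))  # kubectl apply -f this output
```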

A user (app) can purchase cloud storage capacity automatically if running in the public cloud. Portworx provides high-availability, backup, continuous replication and disaster recovery on top of basic provisioning. 

The company aims to provide a one-stop shop for containerised enterprise storage, removing the need for the separate purchase of backup and disaster recovery products, for example. “Our competition is behind us in providing HA and DR,” Rao claimed.

Net:net

Portworx is betting the farm on the enterprise adoption of Kubernetes. But that just gets it an entry ticket to the ball. The company must then persuade customers that it delivers a better set of storage services inside the Kubernetes house than legacy suppliers that bolt on a CSI gateway to their products. 

If it pulls that off then it gets to dance at the storage buyers’ ball, and has a promising future. If not…

WekaIO pulls alongside Dell EMC Isilon at Genomics England

WekaIO has been selected for a new project at a genome sequencing installation where an existing project’s temporary Isilon performance problem, caused by near-100 per cent capacity usage, has since been resolved.

In 2015 Genomics England deployed 7PB of generation 5 Isilon clustered NAS storage, using flash and disk drives, from EMC for its 100,000 Genomes project. That project is being followed by a separate five million genomes project, and WekaIO’s file system software has been selected for it.

David Ardley, director of technology at Genomics England, said: “Our legacy storage system had already reached its limit and performance had deteriorated. We needed a modern storage solution that could scale to hundreds of petabytes while maintaining performance scaling, and it had to be simple to manage at that scale.”

However, our understanding from talking to people close to Dell and the project is that the deployed Isilon system became virtually full, running at more than 95 per cent of its configured capacity. Following Dell EMC recommendations, its capacity was increased and performance returned to the desired level.

In its latest software iteration, the Isilon OneFS file system can support files four times larger than before, up to 16TB in size. Gen 6 Isilon systems also scale out further and perform faster than the earlier gen 5 systems.

100,000 Genome Project

Originally, Genomics England (GEL) wanted to sequence 100,000 genomes from 70,000 people, including NHS patients and their families. Its goal is to provide better disease response by optimising medication for genomes – a person’s DNA structure – and identifying patients at risk from diseases linked to their genome types.

Scientist loading  a DNA sample onto a sequencing machine in Cambridge, UK.

GEL works on large files and looks for common patterns. It requires parallelised access to a library of files, up to 240GB in size, held in network-attached storage (NAS). In 2015, Isilon provided the best kit for this task, with DRAM and flash for metadata, and bulk sequenced genome data stored on SATA disk drives.

Backup services were provided by Dell EMC’s Data Domain and Networker. In September 2016 GEL decided to additionally use an Isilon data lake to store all the data collected during the sequencing process for it to be analysed.

The data lake was then sized at 17PB. GEL also bought 24 all-flash XtremIO X-Bricks to provide faster block storage for its applications. At that point it had sequenced 13,040 genomes.

Genome sequencing and cancer.

GEL completed its 100,000th sequence in December 2018 and has amassed the world’s largest database of whole genome sequences with associated clinical data. The Isilon genome sequencing storage system and data lake is still operational.

Pilot 20,000 baby genomes project

There is a pilot GEL project in which 20,000 babies will be given whole-genome sequencing to detect their liability for epilepsy, cystic fibrosis and other conditions. NHS England operates the national NHS Genomic Medicine Service (GMS) and intends to integrate genomic medicine with routine NHS care by 2025.

The NHS GMS will be deployed across England from April 2020 and comprises seven networked genomic laboratory hubs in an NHS genomic medicines centre infrastructure. A national genomic test directory and whole-genome sequencing will be available nationwide with an integrated clinical service.

In comes WekaIO

NHS England has now decided to sequence five million whole genomes by 2024. That means a genome library in the 100s of petabytes. GEL has decided that WekaIO’s file system is the one to use.

A linear projection from 100,000 genome sequences at 25PB to five million sequences entails 1,250PB of data lake storage.

Ardley said he likes WekaIO’s combination of flash for performance and object store for scale, with data tiered from disk to flash.

Blocks & Files points out that Isilon systems can be similarly configured, and can tier data to the public cloud if desired.

WekaIO CEO Liran Zvibel said: “The Weka File System has delivered a 10x performance improvement over GEL’s legacy NFS-based NAS and is enabling more effective use of existing cloud infrastructure [This will] improve overall productivity and empower researchers to become more efficient at analysing results.”

Blocks & Files’ understanding is that a clustered Isilon system can scale up and out to that level, to exabyte scale. In fact it has customers storing well past 100PB on its systems already, with newer hardware outperforming the fifth generation.

Such exabyte-class storage systems are good news for genome sequencing.

Genomics England Chief Scientist Professor Mark Caulfield said: “As the UK database expands to five million sequences and beyond, new insights will help to save many lives, both in the NHS and around the world.”

Dremio: let’s kill all the data warehouses

Using a data warehouse in the cloud is a Band-Aid to make existing data preparation methods for analytics last longer. But this method is plain wrong – so says Dremio, a startup that argues analytics should work directly on source data in the cloud.

It thinks the extract, transform and load (ETL) mechanism for populating data warehouses is time-consuming and wasteful. With ETL, source data in a data lake is selected, copied, processed and then loaded into a second silo, the data warehouse, for analytics work, costing time and money.

More processing is required, as data scientists reformat it into data cubes, business intelligence (BI) extracts and aggregation tables for their own purposes.

Dremio slide showing existing multi-layered data access for data scientists.

It would be better to run the analytics directly on source data in the data lake, according to Dremio, which has built a Data Lake Engine running on AWS and Azure to do this. Users can execute queries faster by going straight to the S3/ADLS data lake data. The software uses open file formats and source code so there is no vendor lock-in, as there can be with existing data warehouses.

Dremio direct data access scheme.

Dremio says that its direct access software means there’s no need to create cubes, BI extracts and aggregation tables. Data scientists use a self-service semantic layer that’s more like Google Docs than an OLAP cube builder or ETL tool. 

Dremio features

To speed data access the software provides a columnar cloud cache which automatically stores commonly accessed data on NVMe drive storage close to the clustered compute engines, called Executor nodes. Predictive pipelining pre-loads the cache and helps eliminate waits on high-latency storage.

The software has an Apache Arrow-based engine for queries. This has been co-created by Dremio and provides the columnar, in-memory data representation and sharing. Dremio claims Arrow is now the de-facto standard for in-memory analytics, with more than one million downloads per month.

Apache Arrow Flight software extends the performance benefits of Arrow to distributed applications. It uses the Remote Procedure Call (RPC) layer to increase data interoperability by providing a massively parallel protocol for big data transfer across different applications and platforms.

Flight operates on record batches without having to access individual columns, records or cells. For comparison, an ODBC interface involves asking for each cell individually. Assuming 1.5 million records, each with 10 columns, that’s 15 million function calls to get this data for analytics processing.
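The record-batch point can be made concrete with pyarrow, Arrow’s Python library. Using the same 1.5 million rows and 10 columns, one batch is handed over in a single transfer where a cell-at-a-time interface implies 15 million calls. A sketch only:

```python
# Arrow record batches move whole column chunks at once, which is where
# the Flight performance argument above comes from. Minimal pyarrow
# sketch; the row/column counts mirror the worked example in the text.
import pyarrow as pa

rows, cols = 1_500_000, 10

batch = pa.record_batch(
    [pa.array(range(rows)) for _ in range(cols)],
    names=[f"col{i}" for i in range(cols)],
)

# One batch (or a handful of them) is handed over in a single transfer...
print(batch.num_rows, batch.num_columns)              # 1500000 10

# ...whereas a cell-at-a-time ODBC-style interface implies one call per cell:
print(f"cell-by-cell calls needed: {rows * cols:,}")  # 15,000,000
```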

An open source Gandiva vectorised processing compiler is used to accelerate the handling of SQL queries on Arrow data. It reduces the time to compile most queries to less than 10ms. Gandiva supports Xeon multi-core CPUs with GPUs and FPGAs on Dremio’s roadmap. Dremio claims Gandiva makes processing 5 to 80 times faster again, on top of Arrow’s acceleration.

The software also integrates with identity management systems like Azure Active Directory to ease its use by enterprises validating data access that way. It also supports AWS security utilities.

A Dremio Hub software entity has Snowflake, Salesforce, Vertica and SQLite connectors that join data with existing databases and data warehouses. The hub supports any data source with a JDBC driver, including relational databases, REST API endpoints and other data sources. More connectors are being added, with 50+ targeted for the end of the year.

The Dremio software can run on-premises as well as in the AWS and Azure public clouds.

Company background

Dremio was started in 2015 by former MapR employees Tomer Shiran (CEO) and Jacques Nadeau (CTO). Total funding is $45m. It is headquartered in Santa Clara, CA and the v1.0 product was delivered in 2017. Customers include Diageo, Microsoft, UBS, Nutanix, and Royal Caribbean Cruises.

Shiran predicts 75 per cent of the global 2000 will be in production or in pilot with a cloud data lake in 2020. That sounds good but the claim is hard to validate.

Dremio will need a major funding round this year or next to participate fully in this predicted upswing. This would help it compete with Snowflake and its massive $923m funding.

WekaIO trims sales force

WekaIO, the fast parallel file system storage startup, has laid off some of its sales team. We don’t have exact numbers but contacts in two tech storage vendors that declined to be named told us that recently-departed WekaIO execs had applied for sales positions at their companies.

WekaIO CEO Liran Zvibel said: “This is a B.S. spin from our competitors.”

WekaIO CEO Liran Zvibel

“We did have a re-alignment of our sales force as part of our 2020 fiscal year to be highly channel focused in three key vertical markets – machine learning, life sciences and finance.”

He explained: “We have right-sized a small number of teams that could not find a good market fit in their region, keeping teams throughout the USA and Europe. That said, we are proactively hiring new salespeople and we are expanding our teams in the regions that had great success.”

“In your backyard, we have recently hired a Support organisation to handle Europe, augmenting the competent London Sales office we already had.”

Background

WekaIO is a high-flying and fast-growing startup that has won numerous benchmarks for its super-fast storage software (here, here and here, for example).

WekaIO was founded in Israel in 2013 and has taken in $66.7m in funding, including a $31.7m C-round last year. Trade investors include HPE, Nvidia, Seagate, Western Digital Capital, Mellanox and Qualcomm.

The recent funding suggests that cash burn is unlikely to be an immediate problem. Zvibel confirmed this: “For the record, we are also very well funded into 2021.”

WekaIO sells 100 per cent through its channel and its own sales reps support the channel. OEMs include Dell EMC, HPE, Penguin Computing and Supermicro. There are more than 65 global resellers, mostly based in the USA. The company launched its WekaIO Innovation Network global partner program in November last year to strengthen channel sales.

Your occasional enterprise storage digest, featuring Pavilion Data, HPE, Supermicro and more

This week’s enterprise storage round-up starts with NVMe-over-Fabrics startup Pavilion Data, which is readying a big sales and marketing push, spending August 2019 VC money. And there are other items, such as a big damages win for HPE. Read on.

Pavilion Data revs go-to-market activities

Pavilion Data Systems, an NVMe-over-Fabrics array startup, has hired Mike Canavan as chief revenue officer and Amy Love as chief marketing officer.

Start-ups like Pavilion and Excelero have to work extra hard to make progress, as all the major tech storage suppliers have their own NVMe arrays. Pavilion took in $25m in C round funding in August 2019 to help it do this.

Canavan said he is “looking forward to bringing additional sales leaders on board as we leverage our collective expertise and relationships to grow our footprint and increase NVMe-oF adoption in enterprises around the world.”

We note Dan Heydenfeldt left his Pavilion VP Global Sales position in December 2018.  Asked what has happened with sales leadership since then, a Pavilion spokesperson said: “Gurpreet Singh, the CEO, took over sales responsibility during that time in order to drive alignment with customers and the Pavilion product offering. The strategy worked well. Pavilion now has solid customer traction with Pavilion products in production, which enabled Gurpreet to go after hiring Mike Canavan, rather than an interim step of a VP of Sales.”

Canavan was previously global VP of sales for FlashBlade at Pure Storage, and Amy Love was VP, Corporate Marketing at TriNet and, before that, CMO at Violin Memory.

Shorts

HPE has won $439m in damages from Taiwan-based Quanta as a US federal judge tripled damages awarded when a jury found Quanta had conspired with Hitachi-LG, Sony and Panasonic to keep optical disk prices high. Hitachi-LG, Sony and Panasonic settled with HPE years ago. Quanta elected to go to trial – and lost. Case details: Hewlett-Packard Co. v Quanta Storage, 4:18 762, U.S. District Court, Southern District of Texas (Houston).

Supermicro is relisting on Nasdaq. The company was delisted in August 2018 due to financial reporting irregularities. It has taken almost 18 months to arrive at the correct revenue numbers for fiscal 2017, 2018 and subsequent years.

CEO Charles Liang said: “This marks our successful comeback and is the culmination of our efforts to become current with our SEC filings. We are pleased to begin a new chapter for Supermicro that is based on improved internal controls and a dedication to profitable growth. Not only are we back, but we are stronger, better, and re-energised to capitalise on the opportunities ahead.”

Hybrid cloud data warehouser Yellowbrick Data has appointed Jeff Spicer as the company’s Chief Marketing Officer. His career includes stints at Green Dot Corporation, IBM Analytics, VMware and Oracle.

Yellowbrick wants to expand its markets, increase brand awareness and accelerate the adoption of its modern data warehouse offerings, against a background of strong competition from Snowflake and others.

Acronis has become Liverpool Football Club’s Official Global Cyber Backup and Storage Partner.

DataStax has introduced DataStax Luna to provide enterprise-class support for open source Cassandra projects. It says Cassandra is the most viable distributed NoSQL database for enterprise applications in a multi-cloud future. Enterprises like Netflix, Uber, Spotify and others use the open source version of Cassandra because it’s one of the fastest, most powerful NoSQL distributed databases there is. But support was minimal, and enterprise-class support was unavailable outside of forums and distributed groups.

As is its usual practice when a new year starts, deduplication backup-to-disk appliance supplier Exagrid has announced record results without talking revenue numbers. It previously reported 12.2 per cent growth in 2016, 14.5 per cent in 2017, and 20 per cent in 2018. Exagrid CEO Bill Andrews said: “We added over 100 new customers with a record number of new customer six-figure purchase orders. We achieved double-digit growth over the same quarter a year ago.”

VP Products Josh Goldenhar has left Excelero and joined Lightbits Labs as VP Product Marketing. There he will push Lightbits’ NVMe/TCP technology, which directly competes with Excelero’s tech. Eyal Traitel, VP Product Management, left in October last year to join Cybellum.

The ObjectiveFS 6.6 release includes memory and performance improvements, such as a new memory caching algorithm for faster cache operations, a new heuristic for the memory cache, and reduced memory usage for active directories, among other changes.

Smart Modular Technologies has introduced its ME1 and ME2 Series SATA SSDs with NVMSentry firmware. They come in M.2 2280 and 2.5-inch form factors with 240GB to 1920GB capacities. The drives are built from 3D NAND flash with enterprise-class SSD controller features such as end-to-end data path protection, internal SRAM and external DRAM cache ECC (error correcting code), and a firmware code recovery mechanism.

Block storage supplier StorPool has launched v19 of its eponymous product with multi-cluster support. One storage system can now provide shared storage to multiple IT stacks – Windows, VMware, Kubernetes (K8S), bare metal and more. The StorPool iSCSI target now supports Layer 3 routed datacenter networks, in addition to the existing Layer 2 support. There are performance improvements, with latency below 0.1 ms on committed, 3x synchronously replicated systems.