
At exabyte levels, data gravity exerts an enormous pull. This sucks

I spoke recently to Michael Tso, Cloudian CEO, about the consequences of exabyte-level object stores. Exabyte-level data egress from the public cloud could take years, he told me, and cost a million dollars-plus in egress fees. There’s lock-in and then there’s throw-away-the-key lock-in.

Supercomputers are already approaching exascale – exaflops of compute and, quite likely, exabytes of storage. How far behind are enterprise installations?

Michael Tso

Cloudian has “multiple customer sites designed for Exabyte scale,” Tso said. “Not all are at the exabyte level yet but we have one knocking at the door.”

Customers will not arrive at the exabyte level in one go, according to Tso. Consider the arithmetic: each exabyte of capacity entails buying, deploying and managing 100,000 x 10TB disk drives (HDDs), 50,000 x 20TB HDDs or roughly 71,500 x 14TB drives – and a store designed to grow to several exabytes soon means drives by the hundreds of thousands. There will be a continuous need to add new capacity, and to physically deliver, rack up, cable, power, cool and bring online new disk chassis.
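A quick back-of-the-envelope check of those drive counts (our arithmetic, ignoring data protection overhead and spares):

```python
# Back-of-the-envelope drive counts for an exabyte of raw capacity.
# Ignores data protection overhead, spares and formatting losses.

EB_TB = 1_000_000  # 1EB = 1,000PB = 1,000,000TB in decimal units

for drive_tb in (10, 14, 20):
    print(f"{drive_tb}TB drives: {EB_TB / drive_tb:,.0f} per EB")
# 10TB: 100,000 - 14TB: ~71,429 - 20TB: 50,000
```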

There will be drives of different capacities in the population, because drive capacities rise over time, and different drive types too, with SSDs in various capacities joining the HDDs as fast object stores become needed. The system will be multi-tiered.

It will therefore be crucial to ensure that the management of disk drive populations of 500,000 drives and up is practicable – no-one wants to spend their days removing failed drives and adding new ones. On-premises management difficulties will scale up enormously unless drive-level management approaches are superseded.

Disk failures and upgrades

In a population of 500,000, some drives will inevitably fail. For example, cloud backup storage provider Backblaze in 2019 reported a 1.89 per cent annualised HDD failure rate across a population of 122,507 drives. That translates into 2,316 drives failing over the course of the year – about six or seven each day. Scale this example to 750,000 drives and 14,175 will fail in a year, about 39 per day.

It gets worse. You will need to add new drives as well as replace failed ones. Tso said: “The data will outlive the hardware. The system will be continuously renewing. You will be continuously adding new hardware.”

Storage at that scale is in a state of constant upgrade. If your exabyte data store is growing at 1PB per week, you will be adding 50 x 20TB drives a week. If it grows at 1PB a day, you will add 50 HDDs and replace 39 failed drives every day.
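Those churn figures fall out of two simple calculations – expected failures from the annualised failure rate, plus additions from the growth rate:

```python
# Daily drive churn at scale: replacements implied by an annualised
# failure rate (AFR), plus additions implied by capacity growth.
# Inputs are the article's example numbers, not vendor figures.

population = 750_000        # drives in service
afr = 0.0189                # Backblaze's 2019 annualised failure rate
growth_tb_per_day = 1_000   # 1PB per day of new capacity
drive_tb = 20               # using 20TB drives

failures_per_day = population * afr / 365      # ~39
adds_per_day = growth_tb_per_day / drive_tb    # 50

print(f"replace ~{failures_per_day:.0f} failed drives per day")
print(f"add {adds_per_day:.0f} new drives per day")
```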

These illustrations show that it is not possible to manage exabyte data stores at the individual drive level. The store must be managed at the multi-drive, chassis level. Hold that thought while we look at data gravity.

Public cloud data gravity

Suppose you have an exabyte of S3 object storage in Amazon, or as blobs in Azure. You decide the cost is too great and you want to repatriate it. How long will it take and how much will it cost in egress fees?

Tso provided a table of data migration time versus network speed.

Data Migration time and network speed table

It will take two-and-a-half years using a 100Gbit/s link, and an astonishing 25 years with a 10Gbit/s link. And the egress fees will be astronomical.

With Azure, outbound (egress) charges can be reduced by using Azure’s Content Delivery Network (CDN), with an example cost of $0.075/GB in the 10-50TB/month rate band. Let’s apply that price, illustratively, to moving 1EB – one billion GB – out of Azure. It works out at $75m, a ransom on its own.
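Those figures can be sanity-checked with simple arithmetic, using raw line rates and the CDN band price quoted above:

```python
# Time and cost to move 1EB out of a public cloud, reproducing the
# figures above. Uses raw line rate (no protocol overhead) and the
# $0.075/GB CDN band price quoted above, applied illustratively.

EB_BYTES = 10**18
SECONDS_PER_YEAR = 365 * 24 * 3600

for gbit_s in (10, 100):
    bytes_per_s = gbit_s * 1e9 / 8
    years = EB_BYTES / bytes_per_s / SECONDS_PER_YEAR
    print(f"{gbit_s}Gbit/s link: {years:.1f} years")  # ~25.4 and ~2.5

print(f"egress at $0.075/GB: ${(EB_BYTES / 1e9) * 0.075 / 1e6:.0f}m")  # $75m
```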

“At that [exabyte] scale a customer’s data is not that portable,” says Tso. “When people have more than 10PB of data, data migration is not a topic; it’s not possible…. It’s data gravity…[after a certain size] gravity dictates everything.”

Customers looking to move data out of the cloud may well find it impossible, for the time and cost reasons above. They may cap their cloud data level and start storing data on-premises instead. One extra reason for doing this could be keeping data private from a cloud service provider that might become a competitor.

Whatever the reasoning, it means they will have two classes of store to manage, and will need software to do it.

Management

Tso argues that managing exabyte-scale object stores will require AI and machine learning technologies. Here’s why.

At the exabyte level, the storage software and management facility has to handle continuous upgrading and multi-tier storage. The software must cope with drive failures without internal infrastructure rebuild and error-correction events consuming ever more resources. It has to manage multiple tiers, such as flash, disk and tape; multiple locations, on-premises and in the public cloud; and protection schemes, such as replication and erasure coding. And it must do all this automatically.

Tso offers this thought: “For EB scale, how cluster expansions are managed is critical. Nobody starts off with an EB of data but rather grows into it, so starting smaller and scaling up is a key challenge that needs to be addressed.”

He says there are two approaches: either a single physical pool of nodes/disks, which he and Cloudian support, or islands of object storage.

The islands approach means “storage HW resources are deployed in domains determined by the erasure coding stripe size (e.g., EC 4+2 would need 6 nodes).

“At expansion time, you would need to add a new domain of nodes (6 nodes in this example)… Data written to a storage domain stays on those disks forever (until deleted of course). Write IOPs are limited to the amount of free HW resources which is probably the new nodes that have just been added (again, 6 nodes in this example). In short, write performance will never scale beyond the amount of nodes in a single domain.”

The single pool approach means: “All storage HW resources are grouped in a single logical cluster, and data requests are load balanced across all nodes/disks in the cluster. The object storage platform uses algorithms to balance data evenly across all nodes in the cluster to deliver resilience, scalability and performance.”
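Here is a toy model of the write-scaling difference Tso describes, assuming a notional per-node write bandwidth (an illustration, not a Cloudian measurement):

```python
# Toy model of write scaling: islands vs a single pool. Assumes a
# notional 500MB/s of write bandwidth per node - an illustration,
# not a Cloudian measurement.

EC_DATA, EC_PARITY = 4, 2
DOMAIN_NODES = EC_DATA + EC_PARITY   # EC 4+2 -> six-node domains
NODE_WRITE_MBPS = 500                # assumed per-node write bandwidth

def islands_write_bw(total_nodes: int) -> int:
    # Writes land only on the newest domain - the nodes with free space.
    return DOMAIN_NODES * NODE_WRITE_MBPS

def single_pool_write_bw(total_nodes: int) -> int:
    # Load balancing spreads writes across every node in the cluster.
    return total_nodes * NODE_WRITE_MBPS

for nodes in (6, 60, 600):
    print(f"{nodes} nodes: islands {islands_write_bw(nodes)}MB/s, "
          f"pool {single_pool_write_bw(nodes)}MB/s")
```

The islands model stays flat at six nodes’ worth of write bandwidth however large the estate grows; the pooled cluster scales with node count.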

From drives to chassis

Blocks & Files thinks that the management software will need to work at the drive chassis level and not at the drive level. Managing 750,000-plus drives individually will become impossible. This means that the single pool approach that Tso supports would need to be modified to embrace islands.

It would be simpler for the management software to look after 7,500 chassis, each containing 100 drives, than 750,000 individual drives. Each chassis would have a controller and software looking after the drives in its box. But even that may be too much. We could end up with groups of chassis being managed as a unit – say 10 at a time – which would mean the management software looks after 750 clusters of drive chassis.

Kioxia (KumoScale), Seagate (AP 5U84) and Western Digital (OpenFlex) are each building chassis with multiple drives, and it would be unsurprising if they are also developing in-chassis management systems.

A final thought: this article looks at 1EB object storage, but exabyte-level file storage is coming too, and possibly exabyte block storage as well. Data storage will soon move up to a whole new level of scale and complexity.

N.B. X-IO tried to develop sealed chassis of disk drives, ISE canisters, between 2010 and 2014 but the technology foundered. Perhaps it was a few years ahead of its time.

Exclusive: Clumio expels sales staff as it retreats to the cloud

Clumio is laying off up to half its sales team, according to sources, in a reorganisation that sees the backup software vendor pivot to “cloud-first”. The company did not directly answer our question of whether that meant it was pivoting to cloud-only – i.e. withdrawing from actively selling on-premises products.

A company spokesperson told us: “In 2020, Clumio doubled its total staff headcount to steadily define, develop and deliver on its ‘all cloud’ enterprise data protection vision. As a result, we’ve seen increased demand for our cloud-first offerings throughout 2020.

“As Clumio works to anticipate and meet this growing customer demand, we are amplifying our cloud efforts and restructuring a very small part of our sales team. We’ve actively communicated this with our internal team, and we look forward to delivering a customer-proven, cloud-agnostic SaaS solution for tomorrow’s cloud enterprise.”

Clumio has been named one of the “Built In 2021 Best Places to Work” companies and received three awards – Built In San Francisco’s 2021 Best Places to Work, 2021 Best Midsize Companies to Work For, and 2021 SF Best Paying Companies. LinkedIn shows 186 employees.

Clumio competes in a hotly-contested market against suppliers including Acronis, Cohesity, Commvault, Druva, Rubrik and Veeam.

The company was started in 2017 by the founders of PernixData, which was acquired by Nutanix in August 2016. Clumio has raised $185m in funding, including a $135m C-round in 2019.

Its SaaS software can have a single set of policies defined to protect AWS EBS volumes, Microsoft 365 email and VMware VMs, regardless of whether they run on-premises or in the cloud. Clumio in December 2020 announced RansomProtect, which it said was the industry’s first virtual air-gapped ransomware protection for private and public clouds and SaaS applications.

Phison PCIe gen 4 SSD is faster than Optane SSD


Phison is showcasing an unreleased PCIe 4.0 controller at CES 2021 that is faster than Intel’s P5800X Optane SSD.

Update: PCIe Gen 4 bus saturation point added. 12 Jan 2021.

The Taiwanese NAND controller vendor said it plans to launch PCIe gen 4 controllers this year. PCIe gen 4 is twice as fast as PCIe gen 3, at 2GB/sec per lane versus gen 3’s 1GB/sec.

The PS5021-E21T (E21T), built with 12nm technology, delivers up to 25 per cent improvement in performance and power consumption over the previous 28nm generation. According to Phison, this makes the PCIe 4.0 E21T a good fit for console, PC, laptop and mobile gaming – and opens new design-in opportunities.

The PS5018-E18 is Phison’s second, faster PCIe Gen 4×4 NVMe SSD controller, delivering up to 7.4/7.0GB/sec sequential reads and writes. This is faster than the 7.2GB/sec sequential read speed of Intel’s P5800X Optane SSD, which also uses PCIe Gen 4.

An industry source suggests that both the Optane and Phison drives are saturating the PCIe gen 4 bus. Also, some of the difference between the Phison and Optane drive numbers could be due to how speeds are reported, SI-based numbers versus non-SI numbers for example.
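To see how much unit conventions alone can matter, here is a quick sketch (our arithmetic): a speed quoted in decimal (SI) gigabytes per second shrinks by about seven per cent when restated in binary units, which is larger than the three per cent headline gap between these two drives.

```python
# Restating decimal (SI) transfer rates in binary units. The ~7 per cent
# conversion gap is bigger than the ~3 per cent headline difference
# between the two drives, so reporting conventions alone could blur it.

def si_gb_to_gib(gb_per_s: float) -> float:
    return gb_per_s * 1e9 / 2**30  # SI gigabytes -> binary gibibytes

for label, speed in (("Phison E18", 7.4), ("Optane P5800X", 7.2)):
    print(f"{label}: {speed}GB/s (SI) = {si_gb_to_gib(speed):.2f}GiB/s")
```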

The E18 beats the 6.9GB/sec of Kioxia’s CM6 2.5-inch SSD, the 6.3GB/sec of Kioxia’s XG7 M.2 format drive, and the 6.2GB/sec of Kioxia’s CD6 2.5-inch drive.

We don’t know what kind of flash the Phison controller is driving, although we think it is Micron TLC flash. All the Kioxia drives above use 96-layer 3D NAND formatted as TLC flash.

The E18 controller supports up to 8TB capacity and, we think, 1 million random read IOPS. Phison also said the E18 has superior power efficiency, but did not say superior to what.

N.B. PCIe 4.0 may be twice as fast as PCIe 3.0, but that does not mean SSDs using the technology are necessarily twice as fast. For example, a PCIe gen 4 SSD that uses QLC flash will probably outrun a PCIe gen 3 SSD that uses faster TLC NAND, but there will not be a doubling of performance.

Micron posts good Q1 fy21 results as DRAM industry cycle starts to rebound

Micron has posted revenues of $5.77bn for its first fiscal 2021 quarter, up 12 per cent Y/Y. The US semiconductor maker chalked up net income of $803m for the quarter ended December 3, up 63.5 per cent Y/Y, and the company thinks it has reached the trough of a DRAM price depression.

The company expects next quarter’s revenues to be $5.8bn at the mid-point, up 21 per cent Y/Y, but notes that a power outage in Taiwan on December 3 and an earthquake there on December 10 have affected DRAM production. The effects have been factored into the expectations for the next quarter.

President and CEO Sanjay Mehrotra noted in prepared remarks: “Memory and storage industry revenues have grown faster than the broader semiconductor industry, from approximately 10 per cent of semiconductor industry revenues in the early 2000s to now approaching 30 per cent.”

He said: “We are excited about the strengthening DRAM industry fundamentals. For the first time in our history, Micron is simultaneously leading on DRAM and NAND technologies, and we are in an excellent position to benefit from accelerating digital transformation of the global economy fueled by AI, 5G, cloud, and the intelligent edge.”

DRAM accounted for 70 per cent of revenues ($4.056bn) and increased 17 per cent Y/Y. NAND represented 27 per cent of revenues ($1.574bn) and rose 11 per cent Y/Y.

In business unit terms:

  • Compute and Networking – $2.5bn, up 29 per cent Y/Y
  • Mobile – $1.5bn, up 3 per cent Y/Y
  • Storage – $911m, down 6 per cent Y/Y
  • Embedded – $809m, up 10 per cent Y/Y.

3D XPoint

CFO Dave Zinsner noted: “3D XPoint revenues are now reported in the Compute and Networking Business Unit. Excluding 3D XPoint from the prior year’s quarter, SBU revenues would be up 14 per cent year-over-year.”

We think that means $169m of the SBU’s $968m revenues for Q1 fy20 were due to 3D XPoint – in other words, that’s what Intel paid Micron for XPoint chips for its Optane products in the quarter.
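Here is that back-calculation as a quick sketch (our arithmetic from the two disclosed figures):

```python
# Back-calculating Micron's Q1 fy20 3D XPoint revenue from the two
# disclosed figures: SBU revenue down 6% Y/Y as reported, but up 14%
# Y/Y once XPoint is excluded from the prior-year quarter.

sbu_q1fy21 = 911                            # $m, reported this quarter
prior_total = sbu_q1fy21 / (1 - 0.06)       # ~$969m: reported Q1 fy20 SBU
prior_ex_xpoint = sbu_q1fy21 / (1 + 0.14)   # ~$799m: Q1 fy20 SBU ex-XPoint

print(f"implied Q1 fy20 XPoint revenue: ~${prior_total - prior_ex_xpoint:.0f}m")
# ~$170m, in line with the ~$169m estimate above (rounding aside)
```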

Micron said NVMe represented over 90 per cent of client SSD bits, with QLC flash accounting for almost half of the total. Mehrotra claimed: “We are leading the industry with the broadest portfolio of QLC SSDs across client, consumer and data centre markets. QLC helps to make SSDs more cost-effective and accelerates the replacement of HDDs with SSDs. QLC SSD adoption continues to grow, and our bit mix of QLC SSDs increased further in FQ1.”

He also talked about memory: “Cloud and enterprise DRAM revenue declined sequentially … We began revenue shipments for our ultra-bandwidth HBM2E memory, which is used for data centre AI training and inference. We are making progress on the DDR5 transition, which will double bandwidth and reduce power consumption, and we plan to start that transition in the second half of fiscal 2021.”

New server CPUs, like Intel’s Ice Lake, should drive DRAM demand higher as they support more memory channels.

The embedded market looks good too, according to Mehrotra: “We had a record auto revenue quarter, resulting from the resumption of auto manufacturing around the globe and the continued growth of memory and storage content per vehicle.”

Red Hat buys StackRox to bolster OpenShift K8s security


Red Hat is buying StackRox, a Kubernetes-native security startup, for an undisclosed sum. The company intends to add the technology to Red Hat OpenShift but said StackRox will continue to support existing users of other Kubernetes flavours.

Red Hat says it wants to deliver a single, holistic platform so users can build, deploy and securely run nearly any application across the entirety of the hybrid cloud.

Red Hat will focus on transforming how cloud-native workloads are secured by expanding and refining Kubernetes’ native controls with StackRox, as well as shifting security left, into the container build and CI/CD phases. The aim is to provide enhanced security up and down the entire IT stack and throughout the application lifecycle.

California-based StackRox was founded in 2014 and is focused on Kubernetes security. Customers can control and enforce security policies using a Kubernetes-style declarative approach. This means their secured applications can scale more easily, it’s claimed, than ones using container-centric security approaches.

StackRox software provides visibility across all Kubernetes clusters by deploying components for enforcement and deep data collection directly into the Kubernetes cluster infrastructure. The company claims this reduces the time and effort needed to implement security, and streamlines security analysis, investigation and remediation.

Its policy engine includes hundreds of built-in controls that enforce security best practices and industry standards such as the CIS Benchmarks and NIST guidelines, manage the configuration of both containers and Kubernetes, and provide runtime security.
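To make “declarative” concrete, here is a minimal, hypothetical sketch of the general idea. The policy format and field names are invented for illustration; they are not StackRox’s actual schema or API.

```python
# Hypothetical sketch of a declarative policy check, in the spirit of a
# Kubernetes-native policy engine. Field names and rules are invented
# for illustration; they are not StackRox's real schema or API.

policy = {
    "name": "no-privileged-containers",
    "deny_if": {"securityContext.privileged": True},
}

pod_spec = {
    "image": "nginx:1.19",
    "securityContext.privileged": True,
}

def violates(policy: dict, spec: dict) -> bool:
    # Declarative: the policy states a condition, not a procedure.
    return all(spec.get(key) == value for key, value in policy["deny_if"].items())

if violates(policy, pod_spec):
    print(f"policy {policy['name']!r} violated - blocking deployment")
```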

Your occasional storage digest with Cisco Hyperflex, SoftIron, and more

2021 has started with a bang… but let’s talk about data storage. In this week’s news roundup we feature Cisco, BackupAssist and SoftIron. There is also a flurry of people moves involving Cohesity, Commvault, Pure Storage, Quantum and Rubrik.

Cisco HyperFlex software update looks imminent

Cisco’s hyperconverged, UCS server-based product is called HyperFlex, and HyperFlex software 4.5 is soon to be released. A Cisco document outlines its features:

  • iSCSI block support
  • N:1 replication for HX Edge
  • Boost Mode to enable virtual CPU configuration changes in the HX controller VM
  • ESXi and vCenter 7.0
  • Secure boot with built-in hardware root of trust
  • Hardened SDS Controller to shrink the attack surface and react to insecure admin credentials
  • New 2U, short-depth HX240 Edge server
  • ML-driven cluster health checks
  • Intersight Workload Optimiser
  • Intersight Kubernetes service
  • New validated workloads on HyperFlex, including:
  • – Cisco ISE
  • – Cisco SD-WAN
  • – Epic Systems electronic health records (EHR) on HX All NVMe.

The HX240 comes in all-flash and hybrid disk-flash configurations. It has one or two CPU sockets, up to 3TB of RAM, up to 175TB of storage capacity, and PCIe slots for GPUs.

Read a Cisco blog to find out more. We understand HyperFlex software will be available on its own, to run on commodity x86 servers.

BackupAssist adds Wasabi cloud back end

BackupAssist, a backup software supplier for SMEs, has extended support for backend Wasabi cloud storage to the Classic version. Previously, this feature was reserved for the company’s ER product.

Combined with Wasabi’s cloud, BackupAssist’s software provides backups for full system and granular recoveries anywhere, anytime, the company says. BackupAssist’s CryptoSafeGuard, combined with Wasabi’s object durability and data immutability, enables backup files to remain secure.

Linus Chang, BackupAssist CEO, said: “Clients have been asking for our recommended alternatives to the large first-generation public cloud providers, commonly inquiring about Wasabi specifically.”

Wasabi’s storage is one fifth the cost of Amazon S3 with no fees for egress or API requests, according to the company.

The Hut Group selects SoftIron

The Hut Group (THG), a big UK online etailer, uses SoftIron’s Ceph-optimised HyperDrive as the storage backbone for its global THG Ingenuity ecommerce platform.

According to THG, the platform delivers enterprise-class scalability and resiliency, is cloud-native, provides superior efficiency and exists within its carbon-neutral THG Eco initiative.

THG has 24 data centres across the UK, Europe, USA and Asia, and will add more. CTO Schalk Van der Merwe said: “We’re already operational with HyperDrive in the UK, Germany and the US and look forward to working together to deliver a new class of service and support for our customers across the globe.”

He added: “While open source solutions may seem like a ‘science project’ for non-tech companies – for us, being cloud-agnostic, free of lock-in, and achieving ‘Cloud Native Without Cloud,’ is all strategically important within our architecture. We love the ‘task-specific’ approach SoftIron is taking to make open-source Ceph deployable and manageable at scale for enterprises like ours, while at the same time creating efficiencies that reduce our environmental footprint.”

SoftIron CEO Phil Straw said: “We’ve engineered HyperDrive storage hardware from the source code and motherboard on up – a method which we call ‘task-specific.’ We’ve custom-built our hardware to support the source code first and have thus eliminated the inefficiencies that exist using the legacy ‘commodity’ approach.”

Shorts

Robin Matlock

Cohesity has appointed former VMware CMO Robin Matlock to the Board of Directors. She brings more than 30 years of marketing, sales, and business development experience to the board spanning a variety of markets including enterprise software, cloud services, and security.

Coldago’s Jan 2021 Storage Unicorn listing – companies valued at $1bn or more – includes Acronis, Barracuda Networks, Cohesity, DDN, Druva, Infinidat, Kaseya, Nasuni, Qumulo, Rubrik, VAST Data, Veeam Software and Veritas Technologies. Its June 2020 listing included these plus Actifio (bought by Google) and Datrium (bought by VMware).

John Tavares.

Commvault has hired John Tavares, a 25-year EMC and Dell EMC exec, as VP, Global Channel and Alliances. His predecessor Mercer Rowe becomes Area VP for Japan, and will focus on growing Metallic, Commvault’s SaaS division, in the region and strengthening the company’s presence in APAC.

Coincidentally, Callum Eade, Commvault’s VP and Managing Director – Asia Pacific and Japan (APJ), has resigned, apparently for personal reasons. Rachel Ler, Commvault’s current head of ASEAN, Hong Kong, Korea and Taiwan, becomes VP and GM for APJ, with responsibility for Australia and New Zealand, India, China and Japan. Mercer Rowe will therefore report to her.

Excelero’s NVMesh elastic NVMe storage software is being used in a GPU computing infrastructure at the University of Pisa’s Green Data Centre. This is the first customer deployment of USTI (Ultrafast Storage, Totally Integrated) from Excelero partner E4 Computer Engineering. USTI uses NVMesh’s zero-latency storage infrastructure to power ML, DL, genomics and data analytics, and can allocate dedicated GPU resources to scientific research applications.

HelpSystems has acquired FileCatalyst, which supplies secure enterprise large-file transfer acceleration across global networks. This can be useful for broadcast media and live sports. Kate Bolseth, CEO, HelpSystems, said: “FileCatalyst is an excellent addition to our managed file transfer and robotic process automation offerings, and we are pleased to bring the FileCatalyst team and their strong file acceleration knowledge into the global HelpSystems family.” Terms were not disclosed.

MariaDB Corporation has announced general availability of MariaDB Connector/R2DBC (Reactive Relational Database Connectivity). R2DBC is an emerging Java standard which enables applications to use a stream-oriented approach to interact with relational databases. Developers can use R2DBC and declarative programming techniques to create more powerful, efficient and scalable JVM-based applications. The MariaDB Connector/R2DBC exposes a non-blocking API that interacts with MariaDB databases, including MariaDB SkySQL, to create fully reactive solutions.

Data recovery specialist Ontrack reports 78 per cent of the data recovery requests it received in 2020 were for ageing or physically damaged hard disk drives (HDDs). This compares to only three per cent of requests being for solid-state drives (SSD), three per cent for RAID servers, three per cent for other servers, two per cent for flash recovery, and one per cent for USB sticks. The remaining 10 per cent were for mobile devices.

Ajay Singh.

Pure Storage has hired VMware’s Ajay Singh, GM of the Cloud Management Business Unit, as its new Chief Product Officer. Singh will have direct responsibility for all of Pure’s business units, and the Global Alliances team.

Rick Valentine.

Rick Valentine has joined Quantum as SVP and Chief Customer Officer. The veteran tape storage vendor has embarked on a journey to transform itself into an as-a-service provider for unstructured data management, analytics and storage. Prior to Quantum, Valentine was Chief Customer Officer at Silver Peak Systems, where his work was instrumental in HPE’s $925m acquisition of Silver Peak in 2020. Before that, he was Chief Customer Officer at Symantec and Veritas.

Rubrik CTO Chris Wahl has resigned after five and a half years with the startup.

Quest Software has acquired Erwin, a data governance company. The price was not disclosed. Erwin has Data Modeler, Data Intelligence and Evolve software products for big data application development, metadata management, and regulatory and industry compliance, respectively. Erwin appears in the leaders’ box in Gartner’s 2019 Magic Quadrant for Metadata Management and has deployments in over 60 countries, with over 3,500 customers and over 50,000 users.

SingleStore, the company formerly known as MemSQL, has announced native support for AWS Glue. AWS Glue is a serverless data preparation service that makes it easy for developers, data analysts, and data scientists to extract, clean, enrich, normalise, and load data. The new integration will enable developers, data engineers, and data scientists to build with SingleStore on AWS more easily. The fully managed, scaled-out Apache Spark environment for extract, transform, and load (ETL) jobs provided by AWS Glue is matched to SingleStore’s distributed SQL design. It enables faster processing through parallel ingestion in AWS, much in the same way that SingleStore’s Spark Connector delivers these benefits in other environments.

Western Digital settles gender bias class action for $7.75m

Final approval has been granted to a Western Digital settlement of a class action lawsuit, which accused the company of underpaying female staff and discriminating against them in pay, promotions, and placement.

Western Digital will pay $7.75m – around $5m to 1,863 female employees, and most of the rest to their law firm. In addition, the company will undertake “sweeping programmatic measures to help eliminate gender disparities and foster equal employment opportunity going forward”.

The lawsuit was brought by Yunghui Chen, who joined WD’s Audit Department in 2005 and resigned in September 2016, having been promoted to Internal Audit Manager in 2008.

She alleged that Western Digital “paid her $30,000 less than her male counterparts performing equal and substantially similar work and refused to promote her to Senior Manager, despite promoting similarly situated, less-qualified men.” 

Her class action lawsuit, filed in May 2019, stated: “Men dominate Defendants’ leadership and management. Upon information and belief, the overrepresentation of men in Defendants’ leadership is both the source and product of continuing systemic discrimination against female employees.”

Also, Western Digital’s “compensation, promotions, and placement policies and practices have resulted in and perpetuated longstanding, company-wide gender discrimination and sex-based disparities with respect to pay, promotions, and job placement.”

The company relies “on a tap-on-the shoulder promotion process that disparately impacts women and encourages the predominantly male management to engage in a pattern of disparate treatment. Rather than posting open positions, managers evaluate which, if any, of their reporting employees should be placed into them.”

Where the money goes

The $7.75m settlement will be disbursed to several groups:

  • $4,811,667 goes to California-based female employees of WD and related companies at or below the senior management level after November 1, 2012, and also to female employees of WD elsewhere in the USA holding similar posts since November 1, 2013
  • $2,583,333 at a maximum goes to class counsel in attorneys’ fees
  • $97,324 goes to class counsel for litigation costs
  • $180,000 at a maximum goes to Chen for her litigation costs
  • $18,000 goes to Chen as a service award
  • $50,000 goes to the Class Administrator’s fees and costs
  • $75,000 goes to PAGA costs.

Any remaining money goes to a couple of legal aid charities.

The 1,863 individual class members (plaintiffs) will get $3,615 each on average. The precise amount will depend upon their employment duration and pay rate.

Interested parties with Pacer access can read the final settlement document by looking up Chen v. Western Digital Corp., C.D. Cal., No. 8:19-cv-00909 (final approval of class settlement, 1/5/21).

Handbags at dawn for Pure and NetApp: Who you calling a ‘legacy vendor’?

Pure Storage is a struggling legacy vendor while NetApp is powering ahead. So says NetApp in a corporate blog entitled Will the real legacy storage vendor please stand up? published this week.

John ‘Ricky’ Martin, director of strategy and technology in NetApp’s APAC Office of the CTO, penned the article in response to a November 2020 blog by Pure asserting that NetApp is a legacy vendor trailing behind Pure.

Pure’s anti-NetApp ad

Proclaiming it’s “time to put complex legacy storage solutions in your rearview mirror”, the unnamed writer outlines Pure’s view of NetApp’s Insight 2020 virtual event.

The company criticises NetApp’s products for being complex, for example: “NetApp added S3 protocol support to its later version of ONTAP and claims that it’s been simplified (again). Adding more features to an operating system born in 1992 doesn’t simplify it in my book. Just ask NetApp how many parameters are available on the volume create operation. Try more than 30. It’s the opposite of simplicity.”

The blog writer says: “From our perspective, NetApp is:

  • Playing catch up on the hardware front
  • Adding even more complexity to existing software solutions
  • Announcing “projects” instead of products
  • Adding more management tools to an already long and complex list
  • Providing limited public information about a true utility consumption model
  • Offering no clear program for technology refreshes.”

Ouch.

NetApp’s reply

Ricky Martin

In his riposte, Martin writes: “The new NetApp is already where Pure needs to be. Cloud led, data-centric, software first.”

And he goes on the attack. “After ten years of profitless business, Pure is desperately gambling large chunks of its dwindling funds at the R&D and M&A casino, hoping for a payoff that may never come.”

His blog is filled with zingy one-liners.

  • In the early 2010s “Pure Storage was rocking up the charts building the kind of business growth that comes from selling twenty-dollar bills for fifteen bucks a piece and telling everyone it was because of its radical differentiation.”
  • “Purity is over ten years old, and STILL looks a lot like ancient versions of ONTAP 7.3.”
  • Pure is “a box pusher stuck in the past, looking in its rear mirror while blindly heading down a path dominated by datacentres and storage media, still obsessed with simplistic proprietary hardware offerings as a road to differentiation”
  • “‘All Flash’ isn’t disruptive anymore. It’s ubiquitous, leaving Pure looking like nothing more than a Dell mini-me.”
  • “Pure doesn’t get cloud, Pure can’t do real data management, and when Pure does get software, it seems the first thing it tries to do with it is to shove it inside of a box full of proprietary tin.”

Where’s the referee?

We think NetApp and Pure are excellent companies with great products and services. NetApp is a legacy vendor with a substantial incumbent base, making remarkable strides to stay relevant in today’s hybrid, multi-cloud and pandemic-stricken environment. Pure is a newer vendor whose disruptive technology and business models have spurred NetApp and others to upgrade their own technology and business models, to customers’ benefit.

But in our view, Martin was intemperate in his attack on Pure and we disagree with some of the points he makes. We might note that NetApp was late getting into all-flash arrays, and its efforts were spurred by Pure’s success. We might also consider that NetApp’s Keystone storage subscription service, introduced in October last year, was in part a response to Pure’s Evergreen and Pure-as-a-Service offerings.

Also we think Martin’s “proprietary tin” gibe rings hollow. Proprietary storage hardware in itself is of little or no concern when data access is through standard protocols; there is no lock-in here. All vendors have proprietary technology of some description, be it hardware or software. NetApp has its proprietary approach to hardware, using Pensando accelerators, for example.

Martin says: “We’ve worked hard to get where we are, we didn’t take shortcuts, we didn’t claim stuff we couldn’t do, we didn’t over-hype stuff, or spread baseless FUD.”

Pure also worked hard to get where it is. It didn’t take shortcuts either. Designing its own hardware to speed its data access services and make them more efficient over time is not taking a shortcut. And we’ll mention that Pure overtook NetApp in Gartner’s Magic Quadrant for Primary Arrays in December.

N.B. Revenue performance

The verbal energy that Martin puts into his blog post could be seen as a tribute to the competitive threat Pure poses. Of course, NetApp is a much bigger company than Pure Storage. But if we compare the two companies’ quarterly revenues we can see that Pure has narrowed the gap somewhat, albeit from a long way behind.

The chart compares the two companies’ quarterly revenues, normalised to NetApp’s fiscal year scheme. It also shows their net income, with Pure consistently making losses as it pursues growth over profitability.

We can focus more closely on the revenue difference between NetApp and Pure by charting the gap between their revenues. (Note, there is a pronounced seasonal effect as NetApp’s Q4 persistently shows the highest quarterly revenue in any fiscal year.)

Analytics startups Dremio, Starburst and Firebolt draw in dollops of dollars

Data analytics startup Dremio has raised $135m in a D series, taking total funding to $250m, and achieving unicorn status ($1bn+ valuation). Competitor unicorn startup Starburst has pulled in $100m and Firebolt, another competitor, has taken in $37m.

The funding context is that cloud data warehouser Snowflake had a hugely successful IPO in September 2020 and has a market cap of $80bn at the time of writing.

The shared dream here is to run fast data analytics on data from a variety of sources. Dremio and Starburst say they query those sources in place, without extracting the data and transforming it so it can be loaded into a single destination data warehouse. Instead, software combines the different data sources into a single virtual data warehouse, and the data is analysed there in real time.
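As a toy illustration of the difference between the two approaches – the data sources and join below are entirely invented:

```python
# Toy contrast between ETL and a virtual data warehouse. The "sources"
# and the join below are invented purely for illustration.

sources = {
    "orders_db": [{"customer": "acme", "total": 120}],
    "events_s3": [{"customer": "acme", "clicks": 42}],
}

# ETL approach: extract, transform and load everything into one
# physical warehouse before anyone can query it.

# Virtual warehouse approach: leave the data where it lives and join
# across the sources at query time.
def federated_join():
    for order in sources["orders_db"]:
        for event in sources["events_s3"]:
            if order["customer"] == event["customer"]:
                yield {**order, **event}

print(list(federated_join()))
# [{'customer': 'acme', 'total': 120, 'clicks': 42}]
```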

Firebolt says it performs extract, transform and load (ETL) much faster and more simply, and then runs faster analytics.

Dremio

California-based Dremio’s software technology enables data analytics to access source data lakes, thus avoiding existing extract, transform and load (ETL) procedures to build a data warehouse. Its cloud data lake engine software runs in the AWS and Azure public clouds, or on-premises via Kubernetes or Docker, and uses executor nodes with NVMe SSDs to cache data.

Dremio has a Hub software entity that provides connectors for Snowflake, Salesforce, Vertica, AWS Redshift, Oracle, various SQL databases, and others to integrate existing databases and data warehouses. 

Its performance claims seem almost outlandish: 3,000 times faster ad hoc queries, 1,700 times faster BI (Business Intelligence) queries, and 90 per cent less compute needed than other SQL engines.

Starburst

Total funding for Boston-based Starburst stands at $164m, including $100m raised recently at a $1.2bn valuation.

The technology is based on Facebook’s open source Presto distributed query project. It applies SQL queries across disparate data lakes and warehouses, such as Teradata, Oracle, Snowflake and others.

Starburst’s software runs in AWS, Azure, and Google Cloud or on-premises via Kubernetes.

Firebolt

Israel-based Firebolt, the newest of the three startups here, says it delivers the ultimate cloud data warehouse, running analytics with extreme speed and elasticity. The company was set up in 2019 and bagged $37m in A-round funding in December 2020.

The software has native support for semi-structured data and querying with SQL. Firebolt claims “semi-structured data can be analysed quickly and easily, without the need for complicated ETL processes that flatten and blow up data set size and costs.”

In other words, it runs ETL processes to get data into its more scalable data warehouse, and then queries the data faster.

Firebolt says its serverless architecture separates compute from its S3 data lake storage and provides an order-of-magnitude leap in performance. Customers can analyse much more data at higher granularity with lightning fast queries.

More reading

Download a Dremio Architecture Guide to find out more about its software. Download a Starburst Presto guide to read about its technology, and inspect a Firebolt document comparing its technology to Snowflake’s. 

Huawei is the big beast of 2020’s IO500

An Intel DAOS file system has leapfrogged WekaIO on the IO500 list, an annual league table of the fastest HPC file systems. This reverses Weka’s win over DAOS in 2019. However, a Huawei-based system is almost four times faster again.


You’re playing with the big boys now

The Pengcheng Cloudbrain-II, jointly developed by Huawei Technologies and Pengcheng Laboratory in China, radically out-performs every other system, with its 255 client nodes scoring 7,043.99 on the IO500 test.

The hardware is a Huawei Atlas 900 AI cluster that uses Huawei’s Kunpeng and Ascend processors. Kunpeng 920 CPUs are 64-core, 64-bit ARM processors, designed by Huawei and built on a 7nm process. According to the partners, this is the world’s largest artificial intelligence computing platform.

Updated IO500 list.

Pengcheng Lab aims to eventually reach exascale computing, with four Atlas 900 AI clusters deployed, delivering 1,000 petaflops.

This AI cluster is seriously big supercomputing iron, making Intel’s DAOS test rig of 30 servers and 52 clients look like a Raspberry Pi in comparison. Having it on the same list as the DAOS and Weka systems makes the IO500 look unbalanced, like comparing a racing yacht with kayaks.

Huawei Atlas 900 AI Cluster

Intel vs WekaIO

DAOS – Distributed Asynchronous Object Storage – is Intel’s open source, Optane-using parallel file system for high-performance file operations, running on all-flash servers. DAOS puts metadata into Optane Persistent Memory and also stages small IO operations there, before writing full blocks to the SSDs.

Accessing filesystem metadata in the Optane memory is faster than accessing it in NVMe SSDs.
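As a minimal sketch of that placement policy – with an assumed small-IO threshold, since DAOS’s actual cutoff is not given here:

```python
# Sketch of the placement policy described above: metadata and small
# writes go to persistent memory; full blocks go to NVMe SSD. The
# threshold is an assumption for illustration, not DAOS's real cutoff.

SMALL_IO_THRESHOLD = 4096  # bytes - an assumption for illustration

def place_write(kind: str, size: int) -> str:
    if kind == "metadata" or size <= SMALL_IO_THRESHOLD:
        return "Optane PMem"   # byte-addressable, lowest latency
    return "NVMe SSD"          # full blocks go to flash

for kind, size in (("metadata", 256), ("data", 512), ("data", 1_048_576)):
    print(f"{kind} write of {size}B -> {place_write(kind, size)}")
```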

The relevant Intel DAOS and Weka IO500 scores are:

  • Intel DAOS + Optane PMem 200 Series – 1,792.98
  • WekaIO – 938.95
  • Intel DAOS + Optane PMem 100 Series – 933.64

Intel DAOS IO500 scores.

The extra performance of the gen 2 4-deck 3D XPoint-based PMem 200 Series DIMMs over the first generation PMem 100 Series almost doubled Intel’s DAOS score. 

DAOS is obviously a fast file system, and also cheap – it’s open source. But users have to commit to using Optane PMem products to get the best use out of it. A trade-off calculation is required; is DAOS + Optane PMem as cost effective as WekaIO’s software?

Kelsey Prantis, a senior software engineer at Intel, discusses the DAOS system in a YouTube video.

The full IO500 list is here.

Infinidat picks Western Digital exec to be its new CEO

Infinidat, the high-end enterprise array supplier, has appointed a new CEO and CFO, replacing its co-CEO structure and completing the leadership transition from founder Moshe Yanai’s era.

The new CEO is Phil Bullinger, the ex-SVP and GM of Western Digital’s Data Centre Business Unit. This business unit was largely shelved by WD in September 2019, with the IntelliFlash product sold to DDN and the ActiveScale archival array to Quantum.

Phil Bullinger.

Scott Gilbertson, a member of the Infinidat Board of Directors, supplied a statement: ”Phil’s breadth of experience overseeing product development, strategy, and operations will enable him to lead Infinidat as it delivers the innovative and agile storage technologies its customers require for sustained competitive advantage.”

Bullinger joined WD in 2017 from EMC where he was the SVP and GM for its Isilon business unit.

Infinidat has hired Alon Rozenshein as its new CFO. Former co-CEO Nir Simon becomes sales head for the EMEA and Japan territories. The other co-CEO, Kariel Sandler, left the company last month. Rozenshein joins from Clarizen, a company that Infinidat executive chairman Boaz Chalamish also chairs. Chalamish’s role at Infinidat may change in the coming months, as he has finished “shepherding the company through its transition”.

Bullinger is based in Colorado and is not relocating to Infinidat’s Tel Aviv headquarters. Instead there will be a lot of teleconferencing until the pandemic restrictions ease, and then he will be well-positioned to meet face-to-face with Infinidat’s US customers, the company told us.

A Chalamish statement said: “As we focus on executing the company’s strategy, we can rest assured that the new leadership will continue to carry on our cohesive, globally-facing approach to the market – while working closely with the Israeli-based engineering and product operations teams”.

Bullinger confirmed that founder Moshe Yanai is an active Chief Technology Evangelist for Infinidat, is “a huge advocate for Infinidat” and remains a major investor. Bullinger also said that Rozenshein is a “very strategic CFO”.

He told Blocks & Files that Infinidat is profitable, cashflow positive, and growing. The company grew in each 2020 quarter, and posted significant Y/Y growth in the fourth quarter. There was 24 per cent growth in PB deployed Y/Y in 2020, and 10 new Fortune 500 customers came on board in the year, with an average deal size of more than $1.3m. The company also hired extra staff in the fourth quarter.

Infinidat executive management team.

Bullinger said the Infinidat management team is experienced and solid, and his focus is to ensure the company executes its strategy and continues its profitable growth. His priorities include growing go-to-market capability, and scaling sales and engineering. There will be product enhancements to increase Infinidat’s value to its customers, he added.

Infinidat has now completed its changeover from being led by its founding CEO and guiding engineering light, the Israel-based Moshe Yanai, to being a possibly more global business led by Bullinger.

Bullinger said the Covid-19 pandemic had levelled the playing field in one way. Because sales teams could not visit customers, or indeed dominate them with face-to-face attention, there was a reliance on Zoom-style meetings and documentation. This favoured Infinidat, and the company is getting involved in more requests to tender.

Why AWS Outposts is good for Cloudian

Last month I wrote that there is an arguable case that AWS Outposts represents an existential threat to the on-premises hybrid cloud storage and server market.

Michael Tso, CEO of Cloudian, the object storage vendor, thinks I am wrong. For example, Outposts is severely limited when it comes to object storage, he says.

Tso told us: “Outposts is an enabler for Cloudian, similar to VMware, AzureStack and Anthos. Cloudian is very good at storing massive amounts of data, but users must stand up their own on-premises compute environments to work with the data.” Outposts can provide that on-premises compute environment.

Cloudian’s support of, and partnerships with, VMware, Azure Stack, Google Anthos and Outposts have helped it gain over 500 enterprise and government customers.

Michael Tso

I put it to Tso that “Outposts with S3 competes with on-premises Cloudian.”

Think again, he replied: “Our business has surged since Outposts was announced two years ago (plus AzureStack and Anthos), which I think is due to two reasons.”

“The first is that Outposts (as well as AzureStack and Anthos) unequivocally validated why we started Cloudian eight years ago: data gravity leads to cloud [data] needing to be distributed rather than centralised. Cloud compute will need to move closer to the data because moving data is very expensive and time consuming. The Outposts website states why someone would want to use the service and includes low latency, local data processing and data residency – all of which have been our consistent message for the last eight years.” 

Hybrid cloud era

Tso acknowledges that on-premises storage vendors need to innovate to stay competitive with cloud storage service suppliers and their on-premises systems. Cloudian “has invested heavily in the right differentiation areas and propelled our growth in this new hybrid-cloud era.”

Hardware players need to have “cloud-like elasticity in capacity and pricing,” he says. “Software players need to provide more cloud-like features such as easy management, elastic capacities, cloud-native S3 APIs, and multi-tenancy, and also differentiate in areas like security/air gap, IT policies/storage efficiencies, multi-cloud, etc.”

But surely, Outposts represents competition for Cloudian?

Tso replied: “In the two years since Outposts have been around, we have never seen them pop up as competition, rather only as an opportunity to collaborate.”

How come?

“Outposts S3 is quite different from AWS S3. Think of it as more of a cache – its control plane is in AWS, so it must always be ‘tethered’, and it runs on HCI (hyperconverged infrastructure) hardware, so it’s currently limited to 96TB per Outposts deployment with associated costs. In contrast, Cloudian scales to EBs, can operate in ‘air gap/dark site’ mode, and is frequently lower cost.”

“The model is entirely different, and so is the technology… Ultimately what Cloudian specialises in (distributed storage) is very, very hard at scale – both in terms of data volumes and number of deployments/configurations. While compute is elastic and portable, the technology for data persistence, resilience, security and sovereignty at scale in diverse enterprise network environments is very different from public cloud storage.“

According to Tso, AWS and Cloudian share the goal of using the S3 protocol for data on-premises, instead of having it locked away in traditional silos. This S3 commonality makes that data friendly to cloud-native applications, such as AI, ML and analytics.

OK, so AWS Outposts is good for Cloudian today, and presumably it is good for some other storage hardware vendors too. But tomorrow? Let us remind ourselves that Amazon is playing the long game – a game in which it sets the rules.