B&F discussed predictions for 2023 with Ajay Singh, Pure Storage’s Chief Product Officer, and challenged some of them on points including tiering management, the economics of HDDs, and more.
Pure Prediction – 1: 2023 will sound the last death knell of spinning-disk storage with the era of tiering, complexity, and forced customer compromises finally coming to an end.
Blocks & Files: Why? What forced customer compromises? The hyperscalers use tiering so what’s wrong with it?
Ajay Singh: The benefits of flash for data storage are undeniable – until now it has largely been a question of cost, and of which customer workloads justify the additional cost of flash. To bridge that gap, customers have had to make tradeoffs, or manage tiering between flash and disk to save costs.
With the long-term decline of NAND costs, the additional benefits of QLC and advances like Pure Storage’s DirectFlash technology, we are within arm’s reach of crossing that cost-chasm and saving customers from having to make tradeoffs.
Blocks & Files: Also, Pure’s Evergreen//One has four block tiers and two unified file and object tiers in the catalog. The block tiers have different performance levels. For example:
- Capacity Tier – lower commitments with a minimum of 200 TiB, and for tier 2 workloads, decreasing the minimum entry point by one third. Other tiers retain their minimum commitment of 50 TiB.
- Faster Performance Tier to accelerate hybrid and multi-cloud environments.
- Even faster Premium Tier to support specialised tier 1 workloads such as containers and test and dev applications.
- The fastest Ultra Tier designed for in-memory databases.
So if Pure itself offers tiering, what’s wrong with it?
Ajay Singh: There’s always going to be a place for higher and lower performance workloads – but with our performance tiers, it’s an issue of optimizing and balancing compute performance for storage efficiency and capacity, not trading off disk vs. flash because of cost. With Evergreen//One’s performance tiers, we are giving customers a choice of how much compute processing to deploy to drive needed performance vs. how much to optimize for capacity, scale and power efficiency.
Pure Prediction – 2: The economics that long led companies to maintain lower-cost but slower spinning-disk storage no longer hold as NAND cost per bit continues to approach that of disk.
Blocks & Files: But flash is still more expensive than disk and there is no evidence that NAND cost/bit will equal disk cost/bit. Where are the numbers in cost/bit and TCO to justify your assertion?
Ajay Singh: Without getting into exact dollar costs (which we don’t disclose), the drivers that close the gap on finished system costs between Pure’s arrays and a disk-based array mainly come down to: more effective data reduction, much greater density (without sacrificing performance), lower power, cooling and space costs, and much longer service lifetimes.
The parallelism of flash, combined with Purity operating environment software which was designed to exploit it, allows us to drive significantly more data reduction (dedupe and compression) than disk-based systems.
Our DirectFlash technology allows us to ship much denser systems without sacrificing performance in the way that large HDDs or SSDs would, which means we can provide several times more raw storage behind the same amount of compute than other systems can.
And then with the operating savings we give customers in lower power, cooling, and space costs, as well as reliability and service lifetimes – we can make flash very compelling compared to HDD-based systems on a TCO-basis.
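The TCO argument above combines four levers: data reduction, purchase price, power and cooling, and service lifetime. A minimal sketch of how those levers could be combined into a cost per effective terabyte follows. Pure does not disclose its costs, so the model, the `ArrayProfile` type, the `tco_per_effective_tb` function and every number here are hypothetical placeholders, not figures from Pure or from any vendor. The result depends entirely on the inputs chosen.

```python
from dataclasses import dataclass


@dataclass
class ArrayProfile:
    """Hypothetical inputs for one storage system; none are vendor figures."""
    raw_cost_per_tb: float       # purchase price per raw TB
    data_reduction_ratio: float  # effective TB stored per raw TB
    watts_per_raw_tb: float      # power draw per raw TB, cooling folded in
    service_life_years: float    # years before the system is replaced


def tco_per_effective_tb(profile: ArrayProfile, years: float,
                         energy_cost_per_kwh: float) -> float:
    """Cost of keeping one effective TB stored for `years` (simplified model)."""
    # Raw capacity needed to hold one effective TB after dedupe/compression.
    raw_tb = 1.0 / profile.data_reduction_ratio
    # Number of purchases over the horizon, prorated by service lifetime.
    purchases = years / profile.service_life_years
    capex = raw_tb * profile.raw_cost_per_tb * purchases
    # Energy used over the horizon, in kWh.
    kwh = raw_tb * profile.watts_per_raw_tb * 24 * 365 * years / 1000
    opex = kwh * energy_cost_per_kwh
    return capex + opex


# Placeholder profiles, chosen only to exercise the function.
flash = ArrayProfile(raw_cost_per_tb=100.0, data_reduction_ratio=4.0,
                     watts_per_raw_tb=1.0, service_life_years=10.0)
disk = ArrayProfile(raw_cost_per_tb=25.0, data_reduction_ratio=1.5,
                    watts_per_raw_tb=3.0, service_life_years=5.0)

for name, profile in (("flash", flash), ("disk", disk)):
    cost = tco_per_effective_tb(profile, years=10, energy_cost_per_kwh=0.10)
    print(f"{name}: {cost:.2f} per effective TB over 10 years")
```

The sketch leaves out space, support contracts and migration labor, all of which Singh also cites. A reader testing the claim would substitute quoted prices and measured reduction ratios for the placeholders.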
Pure Prediction – 3: The workloads that dominate companies’ IT and strategic agendas are increasingly based on modern machine generated unstructured data which is incompatible with the spinning disk.
Blocks & Files: Why is machine-generated data incompatible with spinning disk? It needs to be stored and disk storage is cheaper than flash storage. What is the problem?
Ajay Singh: Unstructured data tends to be produced and accessed in much more unpredictable and highly concurrent ways – mostly as a result of the applications interacting with the data. Structured data tends to be beholden to a single application, which drives one type of access pattern and moderate parallelism.
Some forms of unstructured data, particularly those involved in analytics / data pipelines or technical computing, tend to be fanned in and out of larger scale-out applications / compute tiers, which results in highly parallel and less predictable data access patterns – something that is much more challenging for mechanical disk-based systems to manage.
Pure Prediction – 4: The pandemic has forced organizations to look at the human touch points associated with forklift upgrades, painful upgrades, and unplanned outages – and the need to eliminate them. Flash is better enabled by software and significantly more reliable.
Blocks & Files: Prove it. High-availability disk drive arrays can have non-painful, non-forklift upgrades and can be free from outages. Such things are not inherently media-centric. Why is flash better enabled by software than disk and what is it better enabled for? What numbers are you using to justify the statement that flash is significantly more reliable than disk?
Ajay Singh: Pure’s flash is significantly more reliable than disk. We look at industry data on annualized failure rates (AFR) of HDDs and SSDs and compare it with our own fleet reliability data (DirectFlash modules that Pure has shipped, is supporting, and is monitoring through phone-home telemetry). Our flash modules are significantly more reliable as measured by our own annualized return rates.
This advantage over HDDs comes largely from avoiding environmental or mechanical failures; over SSDs it comes largely from far simpler firmware in the drives (due to DirectFlash software) and much better media endurance due to Purity’s flash awareness and optimizing for P/E cycles.
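For readers unfamiliar with the metric, AFR is commonly computed as failures divided by device-years in service. The sketch below shows that common definition only. It is not Pure’s method, which Singh describes as an annualized return rate and which may count events differently. The function name and the 8,760-hour year are assumptions made for illustration.

```python
HOURS_PER_YEAR = 24 * 365  # assumption: non-leap year


def annualized_failure_rate(failures: int, device_hours: float) -> float:
    """Return AFR as a fraction: failures per device-year in service."""
    if device_hours <= 0:
        raise ValueError("device_hours must be positive")
    device_years = device_hours / HOURS_PER_YEAR
    return failures / device_years


# Hypothetical fleet: 1,000 devices powered on for a full year, 10 failures.
afr = annualized_failure_rate(failures=10, device_hours=1000 * HOURS_PER_YEAR)
print(f"AFR: {afr:.2%}")
```

Using device-hours rather than a device count matters for a growing fleet, because modules shipped midway through the year contribute only part of a device-year.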
Pure Prediction – 5: Truly elastic “as-a-service consumption” is delivering the agility that organizations need as they evolve to distributed cloud architectures. Flash is more agile and efficient.
Blocks & Files: Are you saying that disk storage cannot be used in an elastic “as-a-service consumption” way? Why not? The hyperscalers use disk storage in an as-a-service way so they find it agile and efficient. What numbers are you using to justify this claim? Which hyperscalers have abandoned disk storage?
Ajay Singh: This is more a comment on the as-a-Service experience built on enterprise arrays – hyperscalers have built their services on significantly different infrastructure (SW and HW) than the typical enterprise has or needs.
Within the context of enterprise-class systems, Pure’s flash-based systems are able to serve a wide range of workloads, over a wide range of performance and capacity points, with a single set of HW building blocks, a largely shared SW codebase, and two core architectures. This contrasts with many other systems that need to be more specifically configured or tuned for different workload types, which makes it much more difficult to deliver a truly seamless “as-a-Service” experience based on them.