
Pure CEO on single storage environments, hyperscalers buying flash, and PII protection

Pure Storage CEO Charles Giancarlo believes that customers need a single, multi-protocol storage environment to supply and store data for modern needs, rather than restrictive block- or file-specific platforms or many different, siloed products that are difficult to operate and manage.

He was speaking at a Pure Accelerate event in London, UK, and his views on this – and on the hyperscaler flash-buying opportunity – became apparent during an interview. We started by asking him what, if he were meeting a customer who had been talking to Vast Data, he would suggest they think about that company.

Charles Giancarlo.

Charles Giancarlo: “Vast is certainly an interesting company. They focused on a non-traditional use case when they first got started, which was large-scale data that required fast processing for a short period of time. And they’ve made a lot of themselves over the intervening year or two. They have a very different strategy than Pure. Their strategy is, basically, they created a system, and they’re attempting to make that one system … as the solution for everything.

“Our strategy, rather, is that we’ve created one operating environment for block, file and object, across two different hardware architectures – scale up and scale out – but the main focus being to really virtualize storage across the enterprise, such that the enterprise can create a cloud of data rather than individual arrays.”

He went on to say that: “Enterprise storage never made the transition that was made for personal storage – or for, let’s say, networks or compute – where you were able to virtualize the environment.”

Giancarlo suggested that enterprise on-premises developers moved to the cloud “because they could set up a customized infrastructure in the period of about an hour, through GUIs and through APIs. Your developers are not able to do that with enterprise. Why is that? Well, you have IP networks, you have Ethernet networks, just like the cloud does. You have your own virtualization, whether that was VMware or something else. So you have that. But it’s your storage that’s not been virtualized, and so your developers don’t have the ability to just set up new compute and open up new storage or access data that already exists with just a few clicks and through APIs.

“With Pure Storage now, a customer can manage their entire data environment as a single cloud of storage where they can set up the policies and the processes whereby their data is managed, and have that managed automatically in an orchestrated way – but furthermore, be able to share that storage among all of their different application environments.” 

Blocks & Files: And they can only do that in the cloud? Because Azure, AWS and Google have taken the time and trouble to put an abstraction software layer in place to enable them to do that. So in theory, you can take that same concept and bring it on premises?

Charles Giancarlo: “Exactly. You finish my story for me, because you already have it in place for compute and network, and now we complete the circle, if you will, with the storage side, with APIs. And what’s behind our capability is Kubernetes. We’ve already put you on your Kubernetes journey with this, and now you can create that virtual storage.

“[The software] is really now an orchestration layer. All of the APIs already exist. Your employees already understand how to use VMware [and are] starting to become familiar with Kubernetes and containers.

“What’s really interesting about this is that enterprises, of course, have requirements that they have to fulfil – whether they’re regulatory or compliance or even just to fulfil the needs of the enterprise itself – which may at times be different from those of the developer.

“What I mean by that is the developer may not be thinking about resiliency, or how fast to come back from a failure, or when things should be backed up, right? So the organization now can set up policies for their data. They can set up different storage classes and then make those storage classes available by API to their developers, so the developers can choose a storage class that’s already been predefined by the organization to fit the organization’s needs. So it’s really a beautiful construct to allow an enterprise to operate more like a cloud.”
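Kubernetes already expresses this construct directly: a platform team publishes named storage classes, and developers request them by name through the API. The sketch below is a generic illustration of that pattern – the class name, provisioner, and claim are hypothetical examples, not Pure product identifiers.

```python
import json

# Illustrative only: an organization-defined "storage class" exposed to developers
# through the Kubernetes API. Names below are hypothetical, not Pure-specific.

gold_class = {
    "apiVersion": "storage.k8s.io/v1",
    "kind": "StorageClass",
    "metadata": {"name": "gold-resilient"},        # defined once by the platform team
    "provisioner": "csi.example-vendor.com",       # hypothetical CSI driver
    "parameters": {"replication": "sync", "snapshotSchedule": "hourly"},
}

developer_claim = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "analytics-data"},
    "spec": {
        "storageClassName": "gold-resilient",      # the developer just picks a class
        "accessModes": ["ReadWriteOnce"],
        "resources": {"requests": {"storage": "500Gi"}},
    },
}

# Serialized to JSON/YAML, these manifests would be applied with `kubectl apply -f -`;
# the developer never touches an array, only a pre-approved class of storage.
print(json.dumps(developer_claim, indent=2))
```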

Blocks & Files: Pure has put Cloud Block Store in place, which, crudely speaking, could be looked at as FlashArray in the cloud. Is FlashBlade in the cloud coming?

Charles Giancarlo: “FlashBlade in the cloud is coming. Think of it this way. Both of them are [powered by the] Purity OS. Cloud Block Store looks like FlashArray in the cloud, but more importantly, it looks like block in the cloud. So what you’re asking really about is, what about file in the cloud? Because once up in the cloud, it is scale-out by definition, and FlashBlade is nothing but a scale-out version of FlashArray, right? So, the next step we want to take is file in the cloud, which we are working on. But we still want to see the block environment grow. Object in the cloud we’re uncertain about right now.”

Blocks & Files: There’s a very big S3 elephant in that forest.

Charles Giancarlo: “We’ve talked very, very long and hard about block in the cloud, to where we could provide a superior service at lower cost to the customer. With file in the cloud, we believe we’ll be able to do the same thing, albeit it’s taken us longer to understand what that would look like. Object, of course, was invented in the cloud to begin with, pretty much – not exactly, but for all intents and purposes, it was. And if we find that we can’t improve upon that, then it wouldn’t make sense. Certainly, what we want is to make it fully compatible with S3 and Azure.”

Blocks & Files: You don’t want simply to have, say, a cloud object store that is just an S3 gateway.

Charles Giancarlo: “Of course. If we’re not adding value, we’re not going to do it.”

Blocks & Files: Do you think that Pure will continue its current growth rate, or that it will settle down to some extent, or even possibly accelerate?

Charlie Giancarlo: “I think we have the opportunity to accelerate for several reasons. One is – I’m already on record, so I’m not going to change it now – that we should win our first hyperscaler this year.”

Blocks & Files: It will be a revolutionary event if you do that, because they’ll be buying from you. At the moment, they buy raw disk drives from Seagate, etc., but sure as heck don’t buy disk arrays from anybody.

Charlie Giancarlo: “That’s right. And not as much, but they also buy raw SSDs. Roughly 90 percent of the top five hyperscalers, and maybe 70 to 80 percent of the top 10 hyperscalers, are hard disk-based.

“And there’s no question that, at some point in time, in my opinion, it will be all flash. Whether it’s us or SSD. That being said, because of our direct flash technology, we’re at the forefront of this, and we think we bring other value. We’re fairly certain we bring other value to the hyperscaler beyond just the fact that it’s flash rather than disk.

“We’re trying to be very careful when we refer to a hyperscaler design win and a hyperscaler buying our infrastructure. We’re separating that, for example, from AI, even if it’s a hyperscaler buying for AI purposes. The reason is, when we’re talking about the hyperscaler, we’re talking about them buying it for their core customer-facing storage infrastructure.

“In most of the hyperscalers, they have three or four unique layers of storage. Just three or four, you know, some with the lowest possible cost. And I’m talking about online now, not tape, from the lowest price-performance right to their highest price-performance. And every one of their services – of which there could be hundreds or thousands – uses one of the three or four infrastructures that they have in place.

“What we’re talking about at first is just replacing, let’s call it, the nearline disk environment. But frankly, what we’ve found is, as we get further along in our conversations, they say, ‘Well, if we’re going to use your technology for that, we might as well use it for all of the layers’ – because we’re not performance-limited, because we’re flash. And so, if you make sense at the lowest price layer, you also make sense at higher performance layers.”

Blocks & Files: Seagate has been having enormous problems getting its HAMR disk drives qualified by hyperscalers.

Charlie Giancarlo: “There are two problems with disk that Seagate won’t admit. And really I’m not trying to be competitive with disk. But the first is, the I/O doesn’t get any better. You can double the density, and the I/O doesn’t get any better. So eventually it just gets so big that you can’t really use all the capacity that’s in the system.

“And the second is the power, space and cooling doesn’t get any better. And so, between those two things, we’ll get to the point with flash – we’re not there yet – where they can give away the disks, but the infrastructure costs will be more than the full system cost.

“The flash chip that’s not being accessed uses practically no power, almost no power. And of course, the chips themselves are getting denser. So it’s not as if you’re using twice as many chips. You’re using the same number and with twice the amount of capacity on each chip. So the thing that uses most of the power out of the DFM [Direct Flash Module] is our microcontroller that runs the firmware that we download into it. Think of it as a co-processor. We don’t need more than one per DFM. We don’t really have any RAM … an SSD has a lot of DRAM on it that uses a lot of power, so we’re lower power than SSDs as well.”

Blocks & Files: Could you envisage a day when a Pure array controller includes GPUs?

Charlie Giancarlo: “I don’t see the purpose of a GPU for the sole purpose of accessing storage and for delivering storage. It wouldn’t accelerate, for example, the recovery of storage. So there are several questions that would come from that. One is, could the GPU be used for other things – such as maybe some AI enhancement to the way that data is masked or interpreted before being written. The second thought that comes to my mind would be, would a customer want to place an AI workload of one type or another on the same controller?

“I think I would tend to think not on the second one, because it’s too constraining to the enterprise. Because there’s always going to be some constraint, whatever that is – scale, speed, whatever. I think most enterprises would want to separate out their choice of compute – a GPU compute platform – from the storage. Keep application compute and storage largely separate.

“We have been thinking – and I have nothing to announce right now – but we have been thinking that there might be reasons why customers may want to do some AI work on data being written. In order to do things such as auto-masking, to be able to separate out some types of data from other types of data into different buckets that then could be handled in the background, differently than other buckets, or to maybe vectorize the data – you could think of it as vectorization.

“Customers have told me this directly. They don’t know which of their files contain PII (Personally Identifiable Information). They don’t know this until after it’s been stolen and they’ve been ransomed.

“So imagine now that you had an engine that could somehow, auto-magically, start to do some amount of separation of the type of data that gets written, such that your PII data is held in a more secure environment than the non-PII data. If your non-PII data is stolen, OK, well, you don’t like it being stolen – but it’s not quite as damaging [as PII data being stolen]. So there are ideas such as that where the answer is maybe, possibly, more to see.”
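For illustration only, the idea boils down to classifying data on the write path and routing suspected PII to a more tightly controlled location. Here is a toy sketch of that routing step – the patterns and bucket names are hypothetical, and a real system would presumably use an AI classifier rather than regexes, which is exactly Giancarlo’s point.

```python
import re

# Toy illustration of write-path PII routing. Patterns and bucket names are
# hypothetical; this is not a Pure Storage feature or API.
PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),           # US SSN-like string
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),     # email address
    re.compile(r"\b(?:\d[ -]?){13,16}\b"),          # payment-card-like number
]

def route_record(record: str) -> str:
    """Return the destination bucket for a record being written."""
    if any(p.search(record) for p in PII_PATTERNS):
        return "secure-pii-bucket"    # tighter access controls, encryption, auditing
    return "standard-bucket"

print(route_record("order 1234 shipped"))             # standard-bucket
print(route_record("contact: jane.doe@example.com"))  # secure-pii-bucket
```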

MLPerf Storage V1.0 results show critical role of next gen storage in AI model training

SPONSORED FEATURE: MLCommons has just released the results of MLPerf Storage Benchmark V1.0, which suggest that Huawei’s OceanStor A800 all-flash array beats its competition by offering almost double the total throughput of its nearest rival.

The benchmark contains three workloads: 3D-Unet, ResNet50, and CosmoFlow. Compared with V0.5, V1.0 removed the BERT workload, added ResNet50 and CosmoFlow, and added the NVIDIA H100 and A100 to the accelerator types.

Huawei participated in the 3D-Unet workload test using an 8U dual-node OceanStor A800, which successfully supported the data throughput requirement of 255 simulated NVIDIA H100s for training by providing a stable bandwidth of 679 GB/s and maintaining over 90 percent accelerator utilization.

The objective of MLPerf Storage Benchmark is to test the maximum number of accelerators supported by the storage system and the maximum bandwidth that the storage system can provide while ensuring optimal accelerator utilization (AU).

Bandwidth requirement for each accelerator:

Workload | H100 | A100
3D-Unet | 2,727 MB/s | 1,385 MB/s
ResNet50 | 176 MB/s | 90 MB/s
CosmoFlow | 539 MB/s | 343 MB/s

Source: MLCommons

The data above indicates that to obtain high benchmark bandwidth, more accelerators need to be simulated. 3D-Unet H100 has the highest bandwidth requirement for storage among the workloads. This means that if the same number of accelerators are simulated, 3D-Unet H100 can exert the greatest access pressure on storage.
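As a rough cross-check, the per-accelerator figures in the table can be used to estimate the aggregate bandwidth a storage system must sustain for a given number of simulated accelerators. The short sketch below does that; treating the MB/s figures as binary MiB/s so the units reconcile is our assumption, and it reproduces the 679 GB/s Huawei quotes for 255 simulated H100s.

```python
# Back-of-the-envelope estimate of the aggregate bandwidth needed to keep
# N simulated accelerators above the ~90 percent utilization threshold.
# Per-accelerator figures are from the table above (treated as MiB/s).
PER_ACCELERATOR_MIB_S = {
    ("3d-unet", "h100"): 2727, ("3d-unet", "a100"): 1385,
    ("resnet50", "h100"): 176, ("resnet50", "a100"): 90,
    ("cosmoflow", "h100"): 539, ("cosmoflow", "a100"): 343,
}

def required_bandwidth_gib_s(workload: str, accelerator: str, count: int) -> float:
    """Aggregate bandwidth (GiB/s) needed to feed `count` simulated accelerators."""
    return PER_ACCELERATOR_MIB_S[(workload, accelerator)] * count / 1024

# 255 simulated H100s on 3D-Unet work out to roughly 679 GiB/s,
# matching the stable bandwidth reported for the OceanStor A800.
print(round(required_bandwidth_gib_s("3d-unet", "h100", 255)))  # 679
```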

Source: Huawei

It’s important to note that the accelerator numbers and the bandwidth of each computing node do not directly reflect storage performance. Rather, they indicate the server performance of the computing nodes. Only the total number of accelerators (simulated GPUs) and the overall bandwidth can accurately represent the storage system’s capabilities.

“The number of host nodes is not particularly useful for normalization,” said an MLCommons spokesperson. “The scale of a given submission is indicated by the number and type of emulated accelerators – ie ten emulated H100s is 10x the work of one emulated H100 from a storage standpoint”. 

You can read more about how the MLPerf Storage v1.0 Benchmark results are compiled and presented here.

Source: Huawei

This result indicates that the OceanStor A800 is ahead of the curve in one important aspect: its total throughput registered 1.92x that of the second-place player, while the throughput per node and per rack unit were 2.88x and 1.44x that of the runner-up respectively (the full MLPerf Storage Benchmark Suite Results are available here).

Additionally, unlike traditional storage performance test tools, the MLPerf Storage Benchmark also has strict requirements on latency. For a high-bandwidth storage system, as the number of accelerators is increased to put more access pressure on the storage system, stable low latency is a must to prevent AU reduction and to achieve the expected bandwidth. In the V1.0 test results, the OceanStor A800 also appears capable of providing stable, low latency for the training system even when bandwidth is high, which can help to maintain high accelerator utilization.

Source: Huawei

GenAI advancing with storage development 

In a global survey of AI usage conducted by independent analyst firm McKinsey, 65 percent of respondents revealed that they are now regularly using generative AI (GenAI), nearly double the number recorded by a previous McKinsey survey 10 months earlier. 

While regular AI is designed to work with existing datasets, GenAI algorithms focus on the creation of new content that closely resembles authentic information. This ability is creating a range of possibilities across numerous verticals. 

From software and finance to fashion and autonomous vehicles, most of the varied GenAI use cases depend on the use of large language models (LLMs) to create the right kind of applications and workloads. GenAI and LLMs working together also puts a strain on underlying storage architectures – a slow update of the data fed into large AI models could lead to poor results, including so-called AI hallucinations where a large AI model can start to fabricate inaccurate answers.

Most technology companies are busy striving to resolve the challenges with storage products and solutions. The V1.0 test result indicates that the OceanStor A800 can provide data services for AI training and the maximization of GPU/NPU computing utilization, whilst also supporting cluster networking and providing high-performance data services for large-scale training clusters.

Huawei launched the OceanStor A800 High-Performance AI Storage in 2023 specifically to boost the performance of large model training and help organizations accelerate the rollout of applications based on those large AI models. During the recent HUAWEI CONNECT 2024 event, Dr. Peter Zhou – Vice President of Huawei and President of Huawei Data Storage Product Line – said that this new long-term memory storage system can significantly boost large AI model training and inference capabilities, and help various industries step into what he called the “digital-intelligent era”.

Sponsored by Huawei.

Samsung pitches latest SSD at automotive AI … which is now properly a Thing

With AI now widely deployed across the automotive sector, Samsung Electronics says it has developed the first PCIe 4.0 automotive SSD based on eighth-generation (236-layer) vertical NAND (V-NAND).

The new auto SSD, AM9C1, delivers on-device AI capabilities in automotive applications.

With about 50 percent improved power efficiency compared to its predecessor, the AM991, the new 256GB auto SSD will deliver sequential read and write speeds of up to 4,400 megabytes per second (MBps) and 400 MBps, respectively, says the vendor.

Samsung AM9C1 gen 8 V-NAND flash chip

“We are collaborating with global autonomous vehicle makers and providing high-performance, high-capacity automotive products,” said Hyunduk Cho, vice president and head of the automotive group at Samsung Electronics’ memory business. “We will continue to lead the physical AI memory market that encompasses applications from autonomous driving to robotics technologies.”

Built on Samsung’s 5-nanometer (nm) controller and providing a single-level cell (SLC) namespace feature, the AM9C1 is currently being sampled by “key partners”, said Samsung, and is expected to begin mass production by the end of this year.

Samsung plans to offer multiple storage capacities for the SSD, ranging from 128 GB to 2 TB, to address the growing demand for higher-capacity automotive SSDs. The 2 TB model, which is set to offer the industry’s largest capacity in this product category, is scheduled to start mass production “early next year,” the supplier said.

The SSD is deemed to satisfy the automotive semiconductor quality standard AEC-Q100 Grade 2, ensuring stable performance over a wide temperature range of -40°C to 105°C.

Samsung received ASPICE CL3 certification for its 5G mobile and automotive UFS 3.1 flash storage product in March this year. CL3 is the highest standard for software and firmware in the automotive industry, with Western Digital also achieving it this year.

Earlier this month, Samsung said it was targeting the general AI market through the first mass production of its 1 Tb quad-level cell (QLC) 9th-generation V-NAND.

Nvidia and Hive team up to tackle problem of rogue AI content

Hive, a provider of proprietary AI solutions to understand, search, and generate content, is integrating its AI models with Nvidia NIM microservices in private clouds and on-premises datacenters.

Nvidia NIM, part of the Nvidia AI Enterprise software platform, provides models as optimized containers, and is designed to simplify and accelerate the deployment of custom and pre-trained AI models across clouds, datacenters and workstations.

“Our cloud-based APIs process billions of customer requests every month. However, the ability to deploy our models in private clouds or on premises has emerged as a top request from prospective customers in cases where data governance or other factors challenge the use of cloud-based APIs,” said Kevin Guo, co-founder and CEO of Hive. “Our integration with Nvidia NIM allows us to meaningfully expand the breadth of customers we can serve.”

Existing Hive customers include the likes of Reddit, Netflix, Walmart, Zynga, and Glassdoor.

The first Hive models to be made available with Nvidia NIM are AI-generated content detection models, which allow customers to identify AI-generated images, video, and audio. The continued emergence of generative AI tools comes with a risk of misrepresentation, misinformation, and fraud, presenting challenges to the likes of insurance companies, financial services firms, news organizations, and others, says Hive.

“AI-generated content detection is emerging as an important tool for helping insurance and financial services companies detect attempts at misrepresentation,” said Justin Boitano, vice president of enterprise AI software products at Nvidia. “With NIM microservices, enterprises can quickly deploy Hive’s detection models to help protect their businesses against fraudulent content, documents and claims.”

Hive is also offering internet social platforms a no-cost, 90-day trial for its technology.

“The newfound ease of creating content with generative AI tools can come with risks to a broad set of companies and organizations, and platforms featuring user-generated content face unique challenges in managing AI-generated content at scale,” said Guo. “We are offering a solution to help manage the risks.”

Hive plans to make additional models available through Nvidia NIM “in the coming months”, including content moderation, logo detection, optical character recognition, speech transcription, and custom models through Hive’s AutoML platform.

Micron revenues surge 93% driven by AI demand

AI-driven server memory, particularly GPU high-bandwidth memory (HBM), and SSD demand sent Micron revenues in its final FY 2024 quarter to $7.75 billion, 93 percent higher year-on-year.

It made a net profit of $887 million in the quarter ended August 29, contrasting with the $1.4 billion loss a year ago. Full FY 2024 revenue was $25.1 billion, 62 percent higher year-over-year, with a $778 million net profit, versus FY 2023’s $5.83 billion loss.

Sanjay Mehrotra, Micron
Sanjay Mehrotra

President and CEO Sanjay Mehrotra stated: “Micron delivered a strong finish to fiscal year 2024, with fiscal Q4 revenue at the high end of our guidance range and gross margins and earnings per share (EPS) above the high end of our guidance ranges. In fiscal Q4, we achieved record-high revenues in NAND and in our storage business unit. Micron’s fiscal 2024 revenue grew over 60 percent; we expanded company gross margins by over 30 percentage points and achieved revenue records in datacenter and in automotive.”

He added: “Our NAND revenue record was led by datacenter SSD sales, which exceeded $1 billion in quarterly revenue for the first time. We are entering fiscal 2025 with the best competitive positioning in Micron’s history. We forecast record revenue in fiscal Q1 and a substantial revenue growth with significantly improved profitability in fiscal 2025.”

Financial summary

  • Gross margin: 36.5 percent vs -9 percent a year ago
  • Free cash flow: $323 million vs -$758 million last year
  • Cash, marketable investments, and restricted cash: $9.2 billion vs $10.5 billion a year ago
  • Diluted EPS: $1.18 vs -$1.07 a year ago. 
Micron revenues
FY 2024 has seen four straight quarters of increasing growth rates and Micron is set for revenue records in FY 2025

Micron makes two product types, DRAM and NAND (the latter going into its SSDs), with DRAM revenues in the quarter rising 93 percent year-over-year to $5.33 billion and NAND up 96.3 percent to $2.4 billion. These products are sold into four markets, where the rosy revenue picture is:

  • Compute and networking: $3 billion, up 152 percent year-over-year
  • Mobile: $1.9 billion, up 55 percent
  • Storage: $1.7 billion, up 127 percent
  • Embedded: $1.2 billion, up 36 percent

The compute and networking business unit is growing fastest, followed by storage. The key demand driver is generative AI. Micron said multiple vectors will drive AI memory demand over the coming years: growing model sizes and input token requirements, multi-modality, multi-agent solutions, continuous training, and the proliferation of inference workloads from the cloud to the edge. It sees no sign of an AI bubble, with no customers turning away from the tech.

Micron revenues
Micron expects a record revenue in FY 2025 Q1 and a substantial revenue record with significantly improved profitability in FY 2025. We expect to see $9 billion-plus revenue quarters in the future

In end-market terms, the strongest DRAM sector is HBM, needed for GPUs, and Micron expects the total addressable HBM market “to grow from approximately $4 billion in calendar 2023 to over $25 billion in calendar 2025. As a percent of overall industry DRAM bits, we expect HBM to grow from 1.5 percent in calendar 2023 to around 6 percent in calendar 2025.” 

Mehrotra said: “We have a robust roadmap for HBM and are confident we will maintain our time-to-market, technology and power efficiency leadership with HBM4 and HBM4E.” In the earnings call, he commented: “We look forward to delivering multiple billions of dollars in revenue from HBM in fiscal year ’25.”

Micron is also seeing “a recovery in traditional compute and storage” and has “gained substantial share in datacenter SSDs” where it “achieved a quarterly revenue record with over a billion dollars in revenue in datacenter SSDs in fiscal Q4, and our fiscal 2024 datacenter SSD revenues more than tripled from a year ago.” 

The company expects that “PC unit volumes remain on track to grow in the low single-digit range for calendar 2024. We expect unit growth to continue in 2025 and accelerate into the second half of calendar 2025, as the PC replacement cycle gathers momentum with the rollout of next-gen AI PCs, end of support for Windows 10 and the launch of Windows 12.”

Smartphones are being affected by AI as well, with Micron saying: “Recently, leading Android smartphone OEMs have announced AI-enabled smartphones with 12 to 16 GB of DRAM, versus an average of 8 GB in flagship phones last year … Smartphone unit volumes in calendar 2024 are on track to grow in the low-to-mid single-digit percentage range, and we expect unit growth to continue in 2025.”

Micron achieved a fiscal year record for automotive revenue in 2024 where infotainment and ADAS are driving long-term memory and storage content growth. Its automotive demand is being constrained as the industry adjusts the mix of EV, hybrid, and traditional vehicles to meet changing customer demand. It expects “a resumption in our automotive growth in the second half of fiscal 2025.” 

Western Digital reckons that the 3D NAND layer count race is over as each layer count addition adds a diminishing return. This will lengthen the period between layer count transitions. Micron agrees with this view, saying: “NAND technology transitions generally provide more growth in annualized bits per wafer compared to the NAND bit demand CAGR expectation of high-teens … We anticipate longer periods between industry technology transitions and moderating capital investment over time to align industry supply with demand.”

Micron invested $8.1 billion in capex in FY 2024 and expects to increase this by something like 35 percent in fiscal 2025, driven mostly by greenfield fab construction and HBM manufacturing facilities. Its investments in facilities and construction in Idaho and New York will support its long-term demand outlook for DRAM and will not contribute to bit supply in fiscal 2025 and 2026. 

Outlook

Mehrotra said: “We are entering fiscal 2025 with the strongest competitive positioning in Micron’s history … We look forward to delivering a substantial revenue record with significantly improved profitability in fiscal 2025, beginning with our guidance for record quarterly revenue in fiscal Q1.”

HBM will contribute to this: “We expect to ramp our HBM3E 12-high output in early calendar 2025 and increase the 12-high mix in our shipments throughout 2025 … Our HBM is sold out for calendar 2024 and 2025, with pricing already determined for this time frame.” 

The revenue outlook for the next quarter (Q1 FY 2025) is $8.7 billion +/- $200 million, an 84 percent increase at the midpoint on the year-ago number. The full-year outlook was not given.

Generative AI training and inference is set to boost Micron’s revenues. Let Mehrotra have the final words: “With the advent of AI, we are in the most exciting period that I have seen for memory and storage in my career.” 

MLPerf AI benchmark tests how storage systems keep GPUs busy

The MLPerf Storage benchmark combines three workloads and two types of GPU to present a six-way view of storage systems’ ability to keep GPUs busy with machine learning work.

This benchmark is a production of MLCommons – a non-profit AI engineering consortium of more than 125 collaborating vendors and organizations. It produces seven MLPerf benchmarks, one of which is focused on storage and “measures how fast storage systems can supply training data when a model is being trained.”

MLPerf website

MLCommons states: “High-performance AI training now requires storage systems that are both large-scale and high-speed, lest access to stored data becomes the bottleneck in the entire system. With the v1.0 release of MLPerf Storage benchmark results, it is clear that storage system providers are innovating to meet that challenge.”

Oana Balmau, MLPerf Storage working group co-chair, stated: “The MLPerf Storage v1.0 results demonstrate a renewal in storage technology design. At the moment, there doesn’t appear to be a consensus ‘best of breed’ technical architecture for storage in ML systems: the submissions we received for the v1.0 benchmark took a wide range of unique and creative approaches to providing high-speed, high-scale storage.” 

The MLPerf Storage v1.0 Benchmark results provide, in theory, a way of comparing different vendors’ ability to feed machine learning data to GPUs and keep them over 90 percent busy.

However, the results are presented in a single spreadsheet file with two table sets. This makes comparisons between vendors – and also between different results within a vendor’s test group – quite difficult. To begin with, there are three separately tested workloads – 3D Unet, Cosmoflow, and ResNet50 – each with MiB/sec scores, meaning that effectively there are three benchmarks, not one.

The 3D UNet test looks at medical image segmentation using “synthetically generated populations of files where the distribution of the size of the files matches the distribution in the real dataset.” Cosmoflow is a scientific AI dataset using synthetic cosmology data, while ResNet50 is an image classification workload using synthetic data from ImageNet. All three workloads are intended to “maximize MBit/sec and number of accelerators with >90 percent accelerator utilization.”

MLPerf graphic
MLCommons diagram

These three workloads offer a variety of sample sizes, ranging from hundreds of megabytes to hundreds of kilobytes, as well as wide-ranging simulated “think times” – from a few milliseconds to a few hundred milliseconds. They can be run with emulated Nvidia A100 or H100 accelerators (GPUs), meaning there are actually six separate benchmarks.

We asked MLPerf about this and a spokesperson explained: “For a given workload, an emulated accelerator will place a specific demand on the storage that is a complex, non-linear function of the computational and memory characteristics of the accelerator. In the case here, an emulated H100 will place a greater demand on the storage than an emulated A100.”
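To make the mechanism concrete, here is a deliberately simplified sketch of what an emulated accelerator does – fetch a batch from the storage under test, then “think” for a fixed compute time – and how accelerator utilization (AU) falls out of that loop. It is illustrative only; it is not the actual MLPerf/DLIO harness, and the function names are ours.

```python
import time

def emulate_accelerator(read_batch, compute_time_s: float, num_batches: int) -> float:
    """Return accelerator utilization: fraction of wall-clock time spent 'computing'.

    `read_batch` is whatever callable fetches one batch from the storage under test;
    `compute_time_s` is the fixed think time standing in for the GPU's compute step.
    """
    busy = 0.0
    start = time.perf_counter()
    for _ in range(num_batches):
        read_batch()                # the storage must deliver the batch in time
        time.sleep(compute_time_s)  # emulated compute; no real GPU is involved
        busy += compute_time_s
    return busy / (time.perf_counter() - start)

if __name__ == "__main__":
    # Trivial demo with an instantaneous "storage" – AU is essentially 100 percent.
    # A faster accelerator model (H100 vs A100) means a shorter think time, so the
    # storage has less slack per batch and the benchmark places more demand on it.
    print(emulate_accelerator(lambda: None, compute_time_s=0.01, num_batches=10))
```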

There are two types of benchmark run division: Closed, which enables cross-vendor and cross-system comparisons; and Open, which allows for interesting results intended to foster innovation. Open allows more flexibility to tune and change both the benchmark and the storage system configuration to show off new approaches or new features that will benefit the AI/ML community. But Open explicitly forfeits comparability to allow showcasing innovation. Some people might think having two divisions is distracting rather than helpful.

Overall there are seven individual benchmarks within the MLPerf Storage benchmark category, all present in a complicated spreadsheet that is quite hard to interpret. There are 13 submitting organizations: DDN, Hammerspace, HPE, Huawei, IEIT SYSTEMS, Juicedata, Lightbits Labs, MangoBoost, Nutanix, Simplyblock, Volumez, WEKA, and YanRong Tech, with over 100 results across the three workloads.

David Kanter, head of MLPerf at MLCommons, said: “We’re excited to see so many storage providers, both large and small, participate in the first-of-its-kind v1.0 Storage benchmark. It shows both that the industry is recognizing the need to keep innovating in storage technologies to keep pace with the rest of the AI technology stack, and also that the ability to measure the performance of those technologies is critical to the successful deployment of ML training systems.”

We note that Dell, IBM, NetApp, Pure Storage and VAST Data – all of whom have variously been certified by Nvidia for BasePOD or SuperPOD use – are not included in this list. Both Dell and IBM are MLCommons members. Benchmark run submissions from all these companies would be most interesting to see.

Hammerspace noted: “It is notable that no scale-out NAS vendor submitted results as part of the MLPerf Storage Benchmark. Well-known NAS vendors like Dell, NetApp, Qumulo, and VAST Data are absent. Why wouldn’t these companies submit results? Most likely it is because there are too many performance bottlenecks in the I/O paths of scale-out NAS architectures to perform well in these benchmarks.” 

Comparing vendors

In order to compare storage vendors on the benchmarks, we need to separate out their individual MLPerf v1.0 benchmark workload type results using the same GPU on the closed run type – such as 3D Unet-H100-Closed. When we did this for each of the three workloads and two GPU types, we got wildly different results, even within a single vendor’s scores, making us concerned that we are not really comparing like with like.

For example, we separated out and charted a 3D Unet-H100-Closed result set to get this graph:

MLPerf results

Huawei scores 695,480 MiB/sec while Juicedata scores 5,536 MiB/sec, HPE 5,549 MiB/sec, and Hammerspace 5,789 MiB/sec. Clearly, we need to somehow separate the Huawei and similar results from the others, or else normalize them in some way.

Huawei’s system is feeding data to 255 H100 GPUs while the other three are working with just two H100s – obviously a completely different scenario. The Huawei system has 51 host compute nodes, while the host node count is unspecified for Juicedata and is one apiece for HPE and Hammerspace.

We asked MLPerf if we should normalize for host nodes in order to compare vendors such as Huawei, Juicedata, HPE, and Hammerspace. The spokesperson told us: “The number of host nodes is not particularly useful for normalization – our apologies for the confusion. The scale of a given submission is indicated by the number and type of emulated accelerators – ie ten emulated H100s is 10x the work of one emulated H100 from a storage standpoint. While MLCommons does not endorse a particular normalization scheme, normalizing by accelerators may be useful to the broader community.”

We did that, dividing the overall MiB/sec number by the number of GPU accelerators, and produced this chart:

MLPerf results

We immediately see that Hammerspace is most performant – 2,895 MiB/sec (six storage servers) and 2,883 MiB/sec (22 storage servers) – on this MiB/sec per GPU rating in the 3D Unet workload closed division with H100 GPUs. Lightbits Labs is next with 2,814 MiB/sec, with Nutanix next at 2,774 MiB/sec (four nodes) and 2,803 MiB/sec (seven nodes). Nutanix also scores the lowest result – 2,630 MiB/sec (32 nodes) – suggesting its effectiveness decreases as the node count increases.
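For reference, the normalization behind that chart is simply each submission’s total throughput divided by its emulated accelerator count. A minimal sketch using the published 3D Unet/H100 closed-division figures quoted above:

```python
# Normalize total throughput by emulated accelerator count (figures from the
# 3D Unet / H100 / Closed results discussed in this article).
results = {
    "Huawei (255 GPUs)":    (695_480, 255),
    "Hammerspace (2 GPUs)": (5_789, 2),
    "HPE (2 GPUs)":         (5_549, 2),
    "Juicedata (2 GPUs)":   (5_536, 2),
}

for name, (total_mib_s, gpus) in results.items():
    print(f"{name}: {total_mib_s / gpus:,.0f} MiB/sec per emulated H100")

# Huawei works out to roughly 2,727 MiB/sec per GPU – right at the 3D-Unet H100
# per-accelerator requirement – while Hammerspace's two-GPU run is about 2,895.
```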

Hammerspace claimed it was the only vendor to achieve HPC-level performance using standard enterprise storage networking and interfaces. [Download Hammerspace’s MLPerf benchmark test spec here.]

Huawei’s total capacity is given as 457,764TB (362,723TB usable) with Juicedata having unlimited capacity (unlimited usable!), HPE 171,549.62TB (112,596.9TB usable), and Hammerspace 38,654TB (37,339TB usable). There seems to be no valid relationship between total or usable capacity and the benchmark score.

We asked MLPerf about this and were told: “The relationship between total or usable capacity and the benchmark score is somewhat submission-specific. Some submitters may have ways to independently scale capacity and the storage throughput, while others may not.”

Volumez

The Volumez Open division test used the 3D Unet workload with 411 x H100 GPUs, scoring 1,079,091 MiB/sec – the highest score of all on this 3D Unet H100 benchmark, beating Huawei’s 695,480 MiB/sec.

John Blumenthal, Volumez chief product officer, told us: “Our Open submission is essentially identical to the Closed submission, with two key differences. First, instead of using compressed NPZ files, we used NPY files. This approach reduces the use of the host memory bus, allowing us to run more GPUs per host, which helps lower costs. Second, the data loaded bypasses the Linux page cache, as it wasn’t designed for high-bandwidth storage workloads.”

Volumez submitted a second result, scoring 1,140,744 MiB/sec, with Blumenthal explaining: “In the second submission, we modified the use of barriers in the benchmark. We wanted to show that performing a barrier at the end of each epoch during large-scale training can prevent accurate measurement of storage system performance in such environments.”
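For readers unfamiliar with the formats being discussed: an NPZ file is a zip archive of NPY arrays (optionally compressed), so loading it pulls data through the host CPU and memory bus for decompression, whereas a raw NPY file can be read or memory-mapped directly. A small illustration using standard NumPy calls (file names are arbitrary):

```python
import numpy as np

sample = np.random.rand(1024, 1024).astype(np.float32)

np.savez_compressed("sample.npz", data=sample)  # compressed zip archive of .npy arrays
np.save("sample.npy", sample)                   # raw array on disk

npz = np.load("sample.npz")["data"]             # decompressed through the host on read
npy = np.load("sample.npy", mmap_mode="r")      # memory-mapped, no decompression step

assert np.allclose(npz, npy)
```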

YanRong Tech

YanRong Tech is a new vendor to us. A spokesperson, Qianru Yang, told us: “YanRong Tech is a China-based company focused on high-performance distributed file storage. Currently, we serve many leading AI model customers in China. Looking globally, we hope to connect with international peers and promote the advancement of high-performance storage technologies.”

We understand that the firm’s YRCloudFile is a high-performance, datacenter-level, distributed shared file system product built for software-defined environments, providing customers with a fast, highly scalable and resilient file system for their AI and high-performance workloads.

NetApp unveils all-flash SAN arrays, AI tech at INSIGHT 2024

NetApp announced all-flash SAN arrays, a generative AI vision, and AI-influenced updates across its product line at its NetApp INSIGHT 2024 event in Las Vegas.

It has begun the Nvidia certification process for its ONTAP AFF A90 storage array with Nvidia’s DGX SuperPOD AI infrastructure. This certification will complement and build upon NetApp ONTAP’s existing certification with the DGX BasePOD. NetApp’s E-Series is already SuperPOD-certified.

ONTAP now has a directly integrated AI data pipeline, allowing ONTAP to make unstructured data ready for AI automatically and iteratively by capturing incremental changes to the customer data set, performing policy-driven data classification and anonymization. It then generates highly compressible vector embeddings and stores them in a vector database integrated with the ONTAP data model, ready for high-scale, low-latency semantic searches and retrieval-augmented generation (RAG) inferencing.
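The stages described – incremental change capture, policy-driven classification and anonymization, embedding, and loading into a vector store for RAG – can be pictured as a simple loop. The outline below is purely illustrative, with toy stand-in functions; none of it is a NetApp or ONTAP API.

```python
# Illustrative pipeline outline only; all functions are toy placeholders.
def incremental_changes(docs: dict, seen: set):
    """Yield only documents not processed on a previous pass (change capture)."""
    for doc_id, text in docs.items():
        if doc_id not in seen:
            yield doc_id, text

def classify_and_anonymize(text: str) -> str:
    """Stand-in for policy-driven classification; here it simply masks digits."""
    return "".join("#" if c.isdigit() else c for c in text)

def embed(text: str) -> list[float]:
    """Toy embedding; a real pipeline would call an embedding model."""
    return [sum(map(ord, text)) / max(len(text), 1)]

vector_db: dict[str, list[float]] = {}
seen: set = set()
corpus = {"doc1": "Customer 4521 reported an outage", "doc2": "Quarterly report"}

for doc_id, text in incremental_changes(corpus, seen):
    vector_db[doc_id] = embed(classify_and_anonymize(text))  # ready for RAG retrieval
    seen.add(doc_id)

print(vector_db)
```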

NetApp separately announced today a new integration with Nvidia AI software that can leverage the global metadata namespace with ONTAP to power enterprise RAG for agentic AI. The namespace can unify data stores for the tens of thousands of ONTAP systems. The overall architecture brings together NetApp’s AIPod, ONTAP, the BlueXP unified control plane, and Nvidia’s NeMo Retriever and NIM microservices.

Harvinder Bhela, NetApp
Harvinder Bhela

Harvinder Bhela, NetApp chief product officer, stated: “Combining the NetApp data management engine and Nvidia AI software empowers AI applications to securely access and leverage vast amounts of data, paving the way for intelligent, agentic AI that tackles complex business challenges and fuels innovation.” 

NetApp customers will be able to discover, search, and curate data on-premises and in the public cloud, based on a set of criteria, honoring existing policy-based governance. Once the data collection has been established through BlueXP, it can be dynamically connected to NeMo Retriever, where the dataset will be processed and vectorized to be accessible for enterprise GenAI deployments with appropriate access controls and privacy guardrails.

This, NetApp claims, “creates the foundation for a generative AI flywheel to power next-generation agentic AI applications that can autonomously and securely tap into data to complete a broad range of tasks to support customer service, business operations, financial services and more.”

Other AI news

NetApp is working to provide an integrated and centralized data platform to ingest, discover, and catalog data across all its native cloud services. It is also integrating its cloud services with data warehouses and developing data processing services to visualize, prepare, and transform data. The prepared datasets can then be securely shared and used with the cloud providers’ AI and machine learning services, including third-party offerings.

Krish Vitaldevara, SVP Platform at NetApp, said: “NetApp empowers organizations to harness the full potential of GenAI to drive innovation and create value across diverse industry applications. By providing secure, scalable, and high-performance intelligent data infrastructure that integrates with other industry-leading platforms, NetApp helps customers overcome barriers to implementing GenAI.”

NetApp’s AIPod with Lenovo ThinkSystem servers for Nvidia OVX converged infrastructure system, designed for enterprises aiming to harness GenAI and RAG capabilities, is now generally available.

FlexPod AI, the converged system built with Cisco UCS compute, Cisco networking, and NetApp storage, now has new AI features. When running RAG, it simplifies, automates, and secures AI applications.

Additionally, F5 and NetApp announced an expanded collaboration to accelerate and streamline enterprise AI capabilities using secure multi-cloud networking solutions from F5 and NetApp’s suite of data management solutions. This collaboration leverages F5 Distributed Cloud Services to streamline the use of large language models (LLMs) across hybrid cloud environments. F5 said that by integrating F5’s secure multi-cloud networking with NetApp’s data management, enterprises can implement RAG solutions efficiently and securely, enhancing the performance, security, and utility of their AI systems.

ONTAP ransomware protection

NetApp is announcing the general availability of its NetApp ONTAP Autonomous Ransomware Protection with AI (ARP/AI) solution, with 99 percent accuracy for detecting ransomware threats. Customers can use ARP/AI to monitor abnormal workload activity and automatically snapshot data at the point in time of attack, so they can respond and recover faster from attacks. ARP/AI uses machine learning to identify threats, and NetApp will consistently release new models. Customers can non-disruptively update those models, independent of ONTAP updates, to defend against the latest ransomware variants.

The BlueXP control plane now integrates with Splunk SIEM to simplify and accelerate threat response by informing stakeholders across an organization’s security operations. BlueXP ransomware protection uses AI-driven data classification capabilities to ensure the most sensitive data is protected at the highest levels. BlueXP also has new User and Entity Behavior Analytics (UEBA) integrations to identify malicious activity in user behavior in addition to the ARP/AI-provided file system signals.

Gagan Gulati, VP and GM for Data Services at NetApp, stated: “Data storage systems are the last line of defense against a cybersecurity incident and NetApp takes that as a responsibility to provide the most secure storage on the planet.”

ASA A-Series

There are three new models: ASA A70, A90 and A1K, the same names as the latest NetApp AFF products announced in May. At the time, we wrote that the A70 and A90 are like storage appliances, having integrated controller and drive shelves, whereas the A1K is modular, with separate 2RU controller and 2RU x 24-slot storage drive chassis. They have Sapphire Rapids gen 4 Xeon SP controller processors and are powered by the ONTAP OS providing unified file and block storage.

NetApp AFF A-Series hardware
AFF A-Series hardware

Sandeep Singh, SVP and GM of Enterprise Storage at NetApp, stated: “With the new NetApp ASA A-Series systems, our customers can modernize their operations to meet the demands of more powerful workloads on block storage without having to choose between operational simplicity and high-end capabilities.” 

NetApp’s ASA arrays are positioned as being block-only for SAN workloads and have a symmetric, active-active controller architecture, but still run ONTAP. The new ASA models use the same AFF A70, A90, and A1K hardware. We envisage that the existing ASA A400 is to be succeeded by the ASA A70, the ASA A800 by the ASA A90, and the ASA A900 by the latest ASA A1K.

NetApp ASA A-Series video showing latest systems
NetApp ASA A-Series video showing latest systems

NetApp’s John Shirley, VP of Product Management for Enterprise Storage, blogs that “the updated UI incorporates familiar concepts and terminology used by SAN administrators” and that “storage units – LUNs and NVMe namespaces – are consolidated on a single page for a cohesive view.” There is also “built-in full-stack AIOps for predictive and proactive insights, observability, and optimization.”

To support the new ASA A-Series, NetApp has enhanced its Data Infrastructure Insights service, formerly Cloud Insights, with updates so that customers can better manage visibility, optimization, and reliability for their data infrastructure to increase savings and performance.

NetApp has also added to its portfolio of hybrid flash storage arrays with new FAS 70 mid-range and FAS 90 high-end FAS systems, which offer “affordable, yet high-performing backup storage, enabling a secure cyber vault for recovery from ransomware attacks.” The company’s ONTAP Autonomous Ransomware Protection (ARP) and WORM are available for no additional cost with Cloud Volumes ONTAP (CVO).

There are new features generally available for Google Cloud NetApp Volumes and Azure NetApp Files. For Google Cloud NetApp Volumes, the Premium and Extreme service levels now can provision large volumes starting at 15 TiB that can be scaled up to 1 PiB dynamically in increments of 1 GiB. Google Cloud customers can achieve cost savings through auto tiering, which moves less frequently accessed data to lower-cost storage service levels. 

Azure NetApp Files customers can achieve cost savings through cool access auto tiering, which moves less frequently accessed data to lower-cost storage services. Additionally, users can improve data availability with cross-zone replication, enhancing data protection by replicating volumes across Azure availability zones.

BlueXP, NetApp’s unified management facility, has been updated so that it streamlines the ONTAP upgrade process with a service that identifies potential candidates, validates compatibility, reports on recommendations and benefits, and executes selected updates through intuitive wizards.

The new ASA A-Series systems are available for quoting and will begin shipping shortly. Get a datasheet here. It had not been updated with the three new ASA A-Series systems when we looked, though.

Data protection company pays $47M to stick Clumio in its Commvault

Data protection company Commvault is buying AWS cloud data protector Clumio for $47 million – significantly less than Clumio’s total funding.

Clumio was founded in 2017 and provides SaaS data protection services to Amazon’s S3, EC2, EBS, RDS, SQL on EC2, DynamoDB, VMware on AWS, and Microsoft 365, storing its backups in virtual air-gapped AWS repositories. Its CEO is Rick Underwood, who has been CEO for three months since his promotion from CRO in June. Co-founder and board chair Poojan Kumar was the previous CEO.

Sanjay Mirchandani, Commvault
Sanjay Mirchandani

Commvault CEO Sanjay Mirchandani said: “Combining Commvault’s industry-leading cyber resilience capabilities with Clumio’s exceptional talent, technology, and AWS expertise advances our recovery offerings, strengthens our platform, and reinforces our position as a leading SaaS provider for cyber resilience.”

Kumar added: “At Clumio, our vision was to build a platform that could scale quickly to protect the world’s largest and most complex data sets, including data lakes, warehouses, and other business-critical data. Joining hands with Commvault allows us to get our cloud-native offerings to AWS customers on a global scale.” 

Clumio has raised $262 million in funding, with the latest being a $75 million D-round in February this year, four months before Underwood’s promotion. A Commvault SEC filing says the acquisition price is approximately $47 million, a fraction of Clumio’s total funding, seemingly less than Clumio’s D-round in February. 

Clumio customers include Atlassian, Cox Automotive, Duolingo, and LexisNexis. Clumio says it has experienced a 4x year-over-year growth in annual recurring revenue (ARR).

Poojan Kumar, Clumio
Poojan Kumar

Commvault’s history shows a focus on on-premises data protection with a move to SaaS-based protection with its 2019-launched Metallic offering. It has become a revenue growth driver for Commvault, along with its cyber-resilience product features. The company reached a quarterly revenue high point of $224.7 million in the second quarter of 2024.

Clumio appointed Carol Hague as VP of Marketing in July as Rick Underwood set about growing the company.

A period of consolidation is now underway in the data protection world. It started with Cohesity buying Veritas in February, and now Commvault has bought Clumio.

The asset acquisition, as Commvault calls it, is expected to close in early October 2024, and be immediately accretive to ARR and revenue, and accretive to free cash flow within the next three quarters. The $47 million cost will be funded with cash on hand. This acquisition follows Commvault’s purchase of cloud resiliency supplier Appranix in April this year.

The “asset acquisition” term indicates that Commvault is not buying the entire Clumio company and its liabilities, only specific assets, which have not been identified.

Commvault has reiterated its fiscal second quarter 2025 earnings guidance previously announced on July 30, 2024. The company reported fiscal 2024 revenues of $839.2 million. In its latest quarter it reported cash and cash equivalents of $287.9 million. Its FY Q2 period ends in September and the earnings guidance is for revenues of $220 million +/-$2 million. Clumio won’t contribute much to this; the quarter is nearly over. 

Kumar founded Clumio with CTO Woon Ho Jung and engineering VP Kaustubh Patil. The three previously founded PernixData, which Nutanix bought in August 2016 for 528,517 shares and $1.2 million in cash.

Huawei unveils next-gen OceanStor Dorado all-flash array

Huawei has launched a seventh generation of all-flash OceanStor Dorado storage array products to support mission-critical AI workloads with more performance and resilience.

There are three Dorado product groups: the high-end 8000 and 18000, the mid-range 5000 and 6000, and the entry-level 3000. The v7 Dorado 18000 has three times more performance, Huawei says, than the previous generation, helped by CPU offload hardware including DPU-based SmartNICs separating data flows from control flows. An upgraded FLASHLINK intelligent collaboration algorithm between disk controllers and DPUs enables over 100 million IOPS and “extremely low, 0.03 ms latency” from an up to 32-controller system with 500 PB of capacity.

More offload engines are used for deduplication, compression, and ransomware detection.

The Dorado features 99.99999 percent single system reliability. Its SmartMatrix full-mesh architecture tolerates multi-layer faults, including those of controller enclosures, disk enclosures, and rack cabinets. It can survive the failure of up to seven out of eight controller enclosures without service interruption. Huawei says it supports ransomware protection for SAN and NAS across all zones, achieving a claimed ransomware detection rate of up to 99.99 percent. Intelligent snapshot correlation analysis and snapshot synthesis technologies ensure 100 percent data availability after recovery. Integrated IO-level continuous data protection means data can be recovered to any point in time.

Huawei OceanStor Dorado

Huawei says it has a unique active-active system for SAN, NAS, and S3 workloads providing load-balancing and failover.

A Data Management Engine (DME) allows for dialog-based O&M and can proactively detect exceptions through AI large language model technologies, improving O&M efficiency five-fold. It provides 30 percent higher IOPS per watt and TB per watt compared to the best unnamed competitor product, according to a Datacenter Dynamics report.

Huawei says the Dorado 18000 has a native parallel architecture for blocks, files, and objects. It can be placed in existing v6 clusters and will be compatible with future v8 clusters.

Yang Chaobin, Huawei’s president of ICT products and solutions, said at the Huawei Connect 2024 Shanghai event: “We have a lot of v6 customers that have been looking forward to seeing our next generation. Because of the American sanctions, we have a lot of limitations politically, so now we are gradually trying to recover from a lot of those difficulties, and our customers are looking for that.”

Huawei has more info on next-gen OceanStor Dorado All-Flash Storage here.

The role of data governance in AI

COMMISSIONED: Robust data governance is key to ethical, compliant, and efficient AI projects. Here’s why the balance between innovation and responsibility is delicate, but crucial.

In a bustling New York office, a data scientist named Emily is racing against the clock. Her team is developing an AI algorithm intended to revolutionize personalized customer experiences. The project is ambitious and promising, with the potential to drive unprecedented business growth. However, Emily has one lingering concern: data governance. Despite her excitement, she knows that without robust data governance, the project could face ethical dilemmas, compliance issues, and even data breaches. Emily’s story is not unique; it’s a reflection of the broader challenges faced by organizations today as they balance the pursuit of innovation with the responsibility of data stewardship.

Artificial Intelligence (AI) has become the cornerstone of modern innovation, driving advancements in various fields such as healthcare, finance, and entertainment. AI’s ability to process and analyze massive amounts of data allows businesses to uncover insights and make decisions that were previously unimaginable. Yet, with great power comes great responsibility. The same data that fuels AI’s capabilities also poses significant challenges in terms of governance, privacy, and ethical use.

Data governance is the framework that ensures data is managed properly throughout its lifecycle. It involves policies, procedures, and technologies that maintain data quality, security, and compliance. For AI to be truly transformative, organizations must prioritize data governance as much as they prioritize AI development.

The Importance of data governance in AI

As organizations increasingly adopt AI technologies, the need for strong data governance becomes essential. Robust data governance ensures that AI systems are not only efficient and accurate but also aligned with legal and ethical standards. Here are four crucial aspects through which data governance enhances AI projects:

– 1) Ensuring data quality: AI algorithms are only as good as the data on which they are trained. Poor-quality data leads to inaccurate models, which can result in flawed business decisions. Data governance ensures that data is accurate, complete, and reliable, providing a solid foundation for AI initiatives.

– 2) Compliance and privacy: With stringent regulations like GDPR and CCPA, compliance is a critical aspect of data governance. AI projects must adhere to these regulations to avoid hefty fines and legal repercussions. Data governance frameworks help organizations manage consent, anonymize data, and implement robust security measures to protect sensitive information.

– 3) Ethical AI: As AI systems become more integrated into decision-making processes, ensuring ethical use of data is paramount. Data governance provides guidelines to prevent biases, ensure fairness, and maintain transparency in AI algorithms. This not only builds trust with customers but also mitigates risks associated with unethical AI practices.

– 4) Operational efficiency: Effective data governance streamlines data management processes, reducing redundancy and improving efficiency. This enables data scientists and analysts to focus on extracting value from data rather than dealing with data quality issues or compliance roadblocks.

PowerScale is a storage solution designed to handle massive amounts of unstructured data, making it well suited to AI applications. It is also a prime example of how technology can drive and reinforce strong data governance practices, with features such as:

Scalability and performance

Achieving operational efficiency includes maximizing scalability and performance. PowerScale is designed to scale seamlessly to meet the expanding data demands of AI applications while maintaining top-tier performance. In internal testing that compared streaming writes on the PowerScale F910 running the OneFS 9.8 distributed file system with streaming writes on the PowerScale F900 running OneFS 9.5, the new F910 delivered faster time to AI insights, with up to 127 percent higher streaming performance (actual results may vary). It accelerates the model checkpointing and training phases of the AI pipeline, keeping GPUs fully utilized with up to 300 PB of storage per cluster. This ensures uninterrupted model training and prevents GPU idling, effectively accelerating the AI pipeline.

Additionally, PowerScale supports GPU Direct and RDMA (Remote Direct Memory Access) technologies, further optimizing data transfer between storage and GPUs. GPU Direct enables direct communication between GPUs and the storage system, bypassing the CPU, which reduces latency and improves throughput. RDMA enhances this by allowing data to be transferred directly between storage and GPU memory over the network, minimizing CPU involvement and further reducing bottlenecks. Together, these technologies ensure that large datasets are managed efficiently, and that data remains accessible and manageable, fostering high-quality AI development on our AI-ready data platform.
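
To make that data path concrete, the sketch below uses NVIDIA's kvikio library, the Python binding for GPUDirect Storage, to read a file straight into GPU memory. The mount path and file name are hypothetical, and this is an illustrative sketch rather than a documented PowerScale workflow; where GPUDirect is unavailable, kvikio falls back to copying through host memory.

import cupy
import kvikio

# Hypothetical file on an NFS- or RDMA-backed mount exported by the storage system
path = "/mnt/powerscale/checkpoints/shard0.bin"

gpu_buf = cupy.empty(256 * 1024 * 1024, dtype=cupy.uint8)  # 256 MB buffer in GPU memory
with kvikio.CuFile(path, "r") as f:
    nbytes = f.read(gpu_buf)  # DMA from storage into device memory, bypassing the host CPU where supported
print(f"read {nbytes} bytes directly into GPU memory")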

Data security and compliance

PowerScale integrates advanced security features, including encryption, access controls, and audit trails, to protect sensitive data and ensure regulatory compliance. With federal-grade embedded security and real-time API-integrated ransomware detection, it safeguards the entire AI process from attacks and protects your intellectual property from unauthorized access.

PowerScale also supports air-gapped environments, providing an extra layer of security by isolating critical systems from unsecured networks. This ensures that your most sensitive data is kept out of reach from external threats, significantly reducing the risk of cyberattacks. The air-gapped configuration is particularly crucial for industries with stringent compliance requirements, such as finance, healthcare, and government, where the integrity and confidentiality of data are paramount. By combining air-gapped protection with comprehensive security measures, PowerScale offers a robust solution that meets the highest standards of data security and regulatory compliance.

Data lifecycle management

PowerScale provides tools for managing data throughout its entire lifecycle, from creation to archiving, ensuring that data is treated according to governance policies at every stage. This includes not just storage, but also classification, retention, and deletion, which helps organizations maintain compliance with regulatory requirements. By automating these processes, PowerScale reduces the risk of human error, ensuring that data governance is applied consistently. Furthermore, it supports tiering strategies, allowing organizations to move less frequently used data to lower-cost storage while keeping critical data accessible, optimizing both cost and performance as AI workloads evolve.

Flexibility and integration

PowerScale offers the flexibility to build your infrastructure when, where, and how you need it. Its variety of node types and software services enable right-sizing and scaling of infrastructure to match diverse workload requirements. Additionally, PowerScale seamlessly integrates with existing data management tools and workflows, including Hadoop Distributed File System (HDFS), NFS, and SMB protocols. For AI-driven workflows, it supports popular data pipeline tools like Apache Spark and TensorFlow. This broad integration capability makes it easy to fit PowerScale into existing environments, allowing data teams to leverage their current tools while gaining the scalability and performance advantages PowerScale offers.
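
As a hypothetical illustration of that integration, the snippet below points Spark at a Parquet dataset on an NFS-mounted PowerScale export. The mount point, dataset, and column name are invented for the example; an hdfs:// URI against the OneFS HDFS service could be used instead.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("powerscale-demo").getOrCreate()

# Hypothetical NFS mount of a PowerScale export
events = spark.read.parquet("file:///mnt/powerscale/curated/events.parquet")
events.groupBy("region").count().show()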

The balance between innovation and responsibility is delicate but crucial. Organizations must foster a culture that values data governance as much as technological advancement. This involves:

– 1) Leadership commitment: Leaders must prioritize data governance and allocate resources to develop and maintain robust frameworks. This commitment sets the tone for the entire organization and emphasizes the importance of responsible data management.

– 2) Cross-functional collaboration: Data governance is not solely the responsibility of IT departments. It requires collaboration across all functions, including legal, compliance, and business units. This ensures that data governance policies are comprehensive and aligned with organizational goals.

– 3) Continuous improvement: Data governance is an ongoing process that must evolve with changing regulations, technologies, and business needs. Regular reviews and updates to governance policies ensure that they remain effective and relevant.

The journey of balancing innovation and responsibility is ongoing. As AI continues to evolve and integrate into various aspects of our lives, the role of data governance becomes increasingly critical. PowerScale exemplifies how technological solutions can support this balance, providing the tools necessary to manage data effectively and responsibly.

Ultimately, it’s not just about what AI can achieve, but how it’s implemented. Organizations prioritizing data governance will be better positioned to leverage AI’s full potential while maintaining the trust and confidence of their stakeholders. As Emily’s story illustrates, businesses must recognize that innovation and responsibility go hand in hand, ensuring a future where AI advancements are achieved with integrity and accountability.

Learn how Dell solutions can help you transform with AI.

Brought to you by Dell Technologies.

Pure Storage extends FlashBlade with file services, capacity boost

Pure Storage is providing a set of fleet and cluster-level FlashBlade file services that make it easier to operate and manage large-scale file environments in a cloud-like way, plus an entry-level FlashBlade//S100 system and a doubling of maximum Direct Flash Module (DFM) capacity to 150 TB.

FlashBlade is Pure’s all-flash, unified file and object storage system, operated by the Purity OS. Pure is adding two software capabilities for multiple storage classes – Fusion for Files and Zero Move Tiering – plus Real-time Enterprise File services with always-on multi-protocol access, auditing, QoS, SMT for file, and the AI Copilot. The company is also adding universal (cross-product) licensing credits and a VM Assessment service to help optimize Pure customers’ on-prem and public cloud storage for virtual machine (VM) workloads in light of Broadcom’s changes to VMware licensing conditions and costs. Overall, it claims that legacy file storage is restricting customers’ ability to meet modern file workload requirements.

Chief product officer Ajay Singh said in a statement: “For years, outdated and rigid legacy file storage has done customers a huge disservice by holding them back and forcing them towards frequent technology refresh cycles. Through the Pure Storage platform, Real-time Enterprise File along with the new VM assessment and Universal Credits empowers our customers to navigate today’s fast-moving, complex business environment with confidence.”


Pure says its Real-time Enterprise File offering enables file services to dynamically change, adapt, and reconfigure in real time. With Fusion, there are now fleet-level global storage pools – unlimited storage pools without any fixed allocation. Pure arrays can join a global storage pool while keeping data in place.

The Purity OS supports both SMB and NFS file access protocols by default, and logs write requests by default for subsequent auditing, with no additional log configuration required. It addresses noisy-neighbor problems with always-on quality of service, preventing any one workload from consuming excessive resources and bottlenecking performance.

Purity can now virtualize a FlashBlade system into so-called servers. These, Pure claims, “provide data access isolation for file workloads enabling customers to service multiple untrusted domains, present multiple share or export namespaces, or share data between different untrusted domains.”

The Zero Move Tiering idea enables different storage performance classes by keeping all data on a single QLC DFM storage base and applying faster or slower compute and network resources to accesses of hot or cold data respectively. It is quasi-tiering: rather than moving cold data onto slower, cheaper disk tiers to cut costs, as conventional tiering does, the data stays in place and the compute performance classes provide the tiered levels of storage access.


The FlashBlade//S100 fits under the existing capacity-centric FlashBlade//S200 and performance-centric FlashBlade//S500 systems as an entry-level product. Both the //S200 and //S500 have one to four DFMs per blade with QLC (4 bits/cell) flash media and up to 3,000 TB (3 PB) of raw capacity. There are up to four DFMs per blade carrier, and systems start with seven blades, scaling up to ten blades in a chassis. There can then be up to ten chassis in a cluster.

The //S100 supports up to four QLC-based DFMs (18, 37, or 75 TB) per blade and starts at 126 TB with seven blades, each carrying a single 18 TB DFM. It can scale up to 3 PB of raw capacity. At launch, Pure will support 37 TB DFMs, with 18 and 75 TB DFMs to follow. The //S100 can be non-disruptively upgraded to either the //S200 or //S500.

By introducing its 150 TB DFMs, Pure has doubled the maximum raw capacity of its FlashBlade systems. They can now support up to 4 PB in a 3RU chassis, 6 PB (40 x 150 TB DFMs) in a 5RU chassis, and up to 60 PB in a cluster. Pure customers can non-disruptively double the capacity of their current 75 TB DFM FlashBlade deployments. The 150 TB DFMs are more than twice the capacity of current 61.44 TB SSDs, such as those supplied by Solidigm, and will enable Pure’s arrays to occupy less datacenter space than competing all-flash arrays using off-the-shelf SSDs.
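
For a quick back-of-the-envelope check on those raw-capacity figures, here is a short Python sketch using the blade, DFM, and chassis counts quoted above (capacities are raw, before data reduction):

TB = 1
PB = 1000 * TB

entry_s100 = 7 * 18 * TB          # //S100 entry point: seven blades, one 18 TB DFM each = 126 TB
chassis_5ru = 40 * 150 * TB       # 40 x 150 TB DFMs in a 5RU chassis = 6,000 TB = 6 PB
cluster_max = 10 * chassis_5ru    # ten chassis per cluster = 60 PB raw

print(entry_s100, chassis_5ru / PB, cluster_max / PB)  # 126 6.0 60.0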

The Universal Credits scheme lets customers purchase a pool of credits and use them across various services without being locked into specific subscriptions, while providing predictable billing. Customers can gain volume discounts by purchasing Universal Credits and applying them across the Evergreen//One consumption-based storage service model, Pure Cloud Block Store, and Portworx services.

The VM Assessment service is included in a Pure1 cloud-based management and monitoring platform subscription. It provides VM performance monitoring, utilization, and rightsizing recommendations with potential subscription impacts for scenario planning.

Shawn Hansen, Pure Storage

The already-announced AI Copilot is presented here as a new way to manage file services using natural language. Users can get a quick, comprehensive view of file services, from performance to capacity, pinpoint specific user activity, and receive proactive recommendations to optimize their environment before issues arise.

We asked Shawn Hansen, general manager of Pure’s core platforms unit, if he sees a time when the differences between the FlashArray and FlashBlade on-premises product wither away as they effectively become a single storage resource. “We see that time emerging very quickly,” he said. “Basically, customers will zoom out and they’ll see an SLA, and they’ll see a class of business processes, and then they’ll just change the SLA of what they need, and we will deploy the node. The node will be FlashArray or FlashBlade, but the customer won’t care. That’s the vision that we have. Exactly as you said. In the end, you’re deploying a cloud service. You don’t care about what’s underneath the covers.”

A blog, Leave Legacy Behind with the Pure Storage Platform, provides background information.

Universal Credits are available now. FlashBlade//S100, FlashBlade Zero Move Tiering, and VM Assessment will be generally available in the fourth quarter of Pure’s fiscal 2025, meaning by the end of January. The 150 TB DFMs should arrive by the end of December this year.

Gartner moves Magic Quadrant goalposts for primary storage

Gartner has updated its primary storage platform Magic Quadrant ratings, resulting in three of last year’s Leaders being displaced into the Challengers box.

Update: Huawei comment added – 24 September 2024

Gartner analysts Jeff Vogel and Chandra Mukhyala have redefined a primary storage platform (PSP) as providing “standardized enterprise storage products, along with platform-native service capabilities to support structured data applications. PSP products like primary enterprise storage arrays provide mandatory and common enterprise-class primary storage features and capabilities needed to support the platform. Platform-native services like storage as a service (STaaS) and ransomware protection, with PSP product capabilities, are required to support platform-native services.”

They say the PSP market has evolved “in conjunction with the demand for hybrid, multi-domain platform-native storage services, extending on-premises services to public cloud, edge, and colocation environments.” In effect, it’s no longer enough to provide on-premises block storage array hardware and software. That software has to run in the main public clouds, provide a cloud consumption model and cyber-resiliency in the cloud, on-premises, and in hybrid environments.

Their strategic planning assumptions are:

  • By 2027, consumption-based platform SLA guarantees will replace over 50 percent of product feature requirements in storage selection decisions, up from less than 5 percent in 2024.
  • By 2028, consumption-based storage as a service (STaaS) will replace over 33 percent of enterprise storage capital expenditure (capex), up from less than 15 percent in 2024.
  • By 2028, more than two-thirds of critical application primary storage infrastructure will employ cyber liability detection and protection capabilities, up from less than 5 percent in 2024.

Here’s the 2024 PSP Magic Quadrant (MQ) diagram:

Because of this redefinition, the suppliers in the 2023 MQ edition’s Leaders box, shown below, have mostly received lower ratings on the horizontal Completeness of Vision axis and moved to the left, with three – Hitachi Vantara, Huawei, and Infinidat – crossing the Leaders box boundary to become Challengers.

Huawei’s Michael Fan, Marketing VP of Data Storage Product Line, tells us: “Gartner’s decision to place so much emphasis on North America does not give an accurate picture of the global market and risks misleading customers. Huawei remains a world-leader in data storage products and solutions, and is trusted by customers in over 150 countries and regions.”

IEIT Systems was a Challenger last year and has moved down the vertical Ability to Execute axis to become a Niche Player. DDN, a Niche Player last year, has exited the MQ “because it did not meet the minimum requirements and inclusion criteria for platform-native services,” while Zadara has entered for the first time, as a Niche Player.

This year, Pure Storage has the highest Completeness of Vision and Ability to Execute ratings in the Leaders box, followed by HPE and NetApp, then IBM and Dell.

The Gartner analysts promote the concept of an on-premises software-defined storage system that separates compute and storage resources, which helps compute and capacity scale independently and cost-effectively. They note that Pure Storage does not offer this capability, nor do Hitachi Vantara and NetApp.

The analysts also note that high-capacity (60 TB or more) QLC flash drives are being offered by some suppliers, but not all, as an alternative to hard disk drive storage. HPE has made the 2024 PSP MQ report available here.

Bootnote

The “Magic Quadrant” is a 2D space defined by axes labeled “Ability To Execute” and “Completeness of Vision,” and split into four squares tagged “Visionaries” and “Niche Players” at the bottom, and “Challengers” and “Leaders” at the top. The best-placed vendors are in the top right Leaders box, balancing ability to execute with completeness of vision. The nearer they are to the top right corner of that box, the better.