
Storage news ticker – October 1

Assured Data Protection, the largest global Managed Service Provider (MSP) for Rubrik, has launched in the Middle East offering fully managed backup and cyber recovery services to businesses of all sizes. Assured has been rapidly growing its footprint in 2024 following successful launches in Canada and Latin America. This Middle East expansion is supported by a strategic partnership with Mindware, a leading Value-Added Distributor (VAD) in the region. The collaboration will establish local datacenters to help clients manage data sovereignty issues and minimize latency in data transfer, enhancing operational efficiency and security.

Storage exec Mike Canavan

Object storage supplier Cloudian has appointed Mike Canavan as worldwide VP of sales to drive revenue growth, customer engagement, and lead all field operations across the worldwide sales team. He has a broad storage industry background, having most recently headed Americas sales at Model9, the mainframe VTL and data migration business bought by BMC. Prior to that he served as Global VP of Sales for the Emerging Solutions Business at Hitachi Vantara, and previously led global sales for Pure Storage’s FlashBlade business. There was a stint at EMC in his CV as well. Cloudian recently took in $23 million in funding; combined with this appointment, it indicates the company has business expansion in mind.

Dell and Nvidia have integrated Dell’s AI Factory with Nvidia’s Llama Stack with agentic (GenAI software that makes decisions on your behalf) workflows in mind. The reference architecture uses Dell’s PowerEdge XE9680 server fitted with Nvidia H100 GPUs. Llama 3.2 introduces a versatile suite of multilingual models, ranging from 1B to 90B parameters, capable of processing both text and images. These models include lightweight text-only options (1B and 3B) as well as vision LLMs (11B and 90B), supporting long context lengths and optimized for inference with advanced query attention mechanisms. Among the Meta Llama 3.2 updates, a particularly interesting one allows enterprises to use the new multimodal models securely for different applications, such as detecting manufacturing defects, enhancing healthcare diagnostic accuracy, and improving retail inventory management. Read about this in a Dell blog.

B&F diagram on agentic AI

Data mover Fivetran has surpassed $300 million in annual recurring revenue (ARR), up from $200 million in 2023. The company says it has consistently delivered strong ARR growth and has reaccelerated year-over-year growth in each of the past two quarters.

HighPoint Technologies announced PCIe Gen5 and Gen4 x16 NVMe Switch series AICs and Adapters using Broadcom’s PCIe switch ICs, supporting up to 32x 2.5-inch NVMe devices and speeds up to 60GB/sec and 7.5 million IOPS. They support features like the 2×2 drive mode for enhanced workload distribution, include Broadcom’s Hardware Secure Boot technology, and are natively supported by all modern Linux and Windows platforms. Rocket 1600 Series PCIe Gen5 x16 AICs and Adapters are equipped with Broadcom’s PEX89048 switch IC, and Rocket 15xx Series PCIe Gen4 x16 AICs and Adapters use Broadcom’s PEX88048 switch IC. Both series provide 48 lanes of internal bandwidth, enabling each AIC or Adapter to allocate 16 lanes of dedicated upstream bandwidth (to the host platform) and 4 lanes to each device port so that each hosted device performs optimally. The 2×2 mode splits the connection of a single NVMe SSD into two separate logical “paths,” each of which is assigned 2 PCIe lanes. The operating system will recognize each path as a distinct drive.
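
As a rough sanity check on those headline numbers, per-lane PCIe throughput can be used to estimate the upstream ceiling. The per-lane rates below are approximate usable figures we are assuming (about 3.9 GB/sec per Gen5 lane and 2 GB/sec per Gen4 lane), not HighPoint specifications:

```python
# Rough upstream bandwidth estimate for an x16 NVMe switch adapter.
# Per-lane throughput values are approximate usable rates, not vendor figures.
PER_LANE_GBPS = {"Gen5": 3.9, "Gen4": 2.0}

def upstream_bandwidth(gen: str, lanes: int = 16) -> float:
    """Approximate host-facing bandwidth in GB/s for a given lane count."""
    return PER_LANE_GBPS[gen] * lanes

print(f"Gen5 x16 host link: ~{upstream_bandwidth('Gen5'):.0f} GB/s")  # ~62 GB/s, close to the quoted 60GB/sec
print(f"Gen4 x16 host link: ~{upstream_bandwidth('Gen4'):.0f} GB/s")  # ~32 GB/s
print(f"Gen5 x4 device port: ~{PER_LANE_GBPS['Gen5'] * 4:.1f} GB/s")  # per-drive ceiling
print(f"Gen5 x2 2x2 path:    ~{PER_LANE_GBPS['Gen5'] * 2:.1f} GB/s")  # per logical path in 2x2 mode
```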

Hitachi Vantara announced significant growth in its data storage business in the first quarter of its 2024 fiscal year (ended June 30), with product revenue up 27 percent compared to Q1 FY23, exceeding the market CAGR of 11.31 percent. The growth was even more pronounced in the United States, where product revenue rose 54 percent over the same quarter the previous year.

Storage exec Brian Babineau

SaaS-based data protector HYCU announced its first chief customer officer, Brian Babineau. He joins from Barracuda, where he led the success and support organizations for the MSP business before becoming chief customer officer. As HYCU continues to grow beyond its 4,200+ customers in 78 countries, Babineau’s experience will be instrumental in supporting customers throughout their journey with HYCU. He was an integral part of Barracuda’s shift from an appliance to a SaaS business model across several security solutions, and also has extensive experience working with teams at scale.

Cloud provider Lyrid announced its open source database-as-a-service offering based on Percona Everest. Percona announced Everest at the Open Source Summit event last week, and Lyrid is using the Everest cloud data platform to power its service. It uses Percona Everest to provide a flexible and open DBaaS where customers can choose their database and approach without fear of vendor lock-in. This contrasts with the vast majority of DBaaS options, which are tied to specific cloud providers or database options, limiting customer choice. Lyrid offers the service through its datacenter partners, Biznet Gio and American Cloud, to provide customers with hassle-free database automation at lower cost. Customers can also choose to run it in their own datacenter environments, enjoying a fully configured and privately managed DBaaS without lock-in.

We’re told that what makes Everest unique is that it is fully open source, so any organization can run their choice of database (PostgreSQL, MySQL or MongoDB) on their preferred cloud service, including OpenStack, and on any flavor of Kubernetes as well. Using Kubernetes, Lyrid can deliver the same kind of automated database service that other DBaaS products offer, but at both lower cost and without lock-in.

Cloud file services supplier Nasuni is further integrating with Microsoft 365 Copilot. Through the Microsoft Graph Connector, Nasuni-managed data is fully accessible and operational with Microsoft Search and Microsoft 365 Copilot, expanding data access for Microsoft’s AI services. The Graph Connector lets organizations leverage Nasuni’s managed data repositories, enabling Nasuni-managed files to be indexed into Microsoft’s semantic index and so provide contextually relevant answers and insights across Microsoft 365 applications. There is single-pane-of-glass access to customers’ Microsoft 365 data (including SharePoint and OneDrive) and Nasuni. This unified view allows for efficient searching and interaction with documents across the entire unstructured file stack, inclusive of Nasuni-managed data.

NetApp has expanded its AWS relationship with a new Strategic Collaboration Agreement (SCA) to accelerate generative AI efforts, delivering data-rich experiences through workload migration and new application deployments on AWS. The two will enable increased AWS Marketplace purchases, especially for NetApp CloudOps solutions, to streamline processes for customers. Instaclustr by NetApp manages open source vector databases – a crucial component in the delivery of fast and accurate results in RAG architectures. The close collaboration between AWS and NetApp on advanced workloads makes it simpler and faster for customers to unlock value from their data using RAG. NetApp is the only enterprise storage vendor with a first-party data storage service natively built on AWS with Amazon FSx for NetApp ONTAP.

Database-as-a-service (DBaaS) supplier Tessell is partnering with Microsoft Azure and NetApp to deliver a ubiquitous Copilot for Cloud Databases. It integrates an enterprise-grade Database PaaS with one-click functionality for any database on Azure, leveraging Azure NetApp Files (ANF) as enterprise cloud storage and Tessell as the unified Database Service. For the first time, customers of Azure and NetApp will have access to an enterprise-grade Managed Instance for Oracle on Azure, fully integrated with ANF and supporting any virtual machine (VM) family across all Azure regions. Azure Savings Plans and NetApp effective capacity pricing are available, ensuring Co-Sell incentives and Microsoft Azure Consumption Commitment (MACC) enablement. The bundled offering includes 24x7x365 support with a 15-minute response time for issues related to Azure, ANF, and Tessell Oracle PaaS.

Tessell claims customers can experience up to a 45 percent reduction in total cost of ownership (TCO) across four vectors: infrastructure optimization, third-party software optimization, database license optimization, and operational cost optimization. More information here.

Nutanix Unified Storage (distributed file system with commodity server-based scale-out) ranked first in the “image classification” workload category and second for the “image segmentation” workload, offering top performance, faster throughput, and linear scaling in the latest round of MLPerf benchmarks, the AI storage benchmark from MLCommons. For the image classification workload, a single node was able to power 33 AI accelerators and deliver over 5,970 MB/sec of throughput. The same results were achieved at scale, showing linear performance, with a 32-node deployment driving 1,056 AI accelerators and over 191,043 MB/sec of throughput. Nutanix said AI is rapidly increasing the demand for sustained high-throughput and high-performance storage because of the massive datasets and complex computations required for training and inference.
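
Those image classification figures scale almost perfectly linearly; a quick back-of-envelope check using the numbers above (our arithmetic, not Nutanix's):

```python
# Linear-scaling check on the Nutanix MLPerf image classification results quoted above.
single_node_accelerators = 33
single_node_mb_per_sec = 5_970

nodes = 32
reported_accelerators = 1_056
reported_mb_per_sec = 191_043

print(nodes * single_node_accelerators)   # 1056 -> matches the reported accelerator count
print(nodes * single_node_mb_per_sec)     # 191040 MB/s -> essentially the reported 191,043 MB/s
print(round(reported_mb_per_sec / reported_accelerators))  # ~181 MB/s fed to each accelerator
```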

Oak Ridge National Laboratory (ORNL) is disposing of 32,494 disk drives as it is closing down its Summit supercomputer. The 20 tons of drives are being fed into a mobile ShredPro Secure shredder – a waist-high, 4 foot wide unit. The drives are fed by a technician into an opening at the top of the machine, where counter-rotating metal teeth tear the drives apart and reduce them to small, irregular strips a few inches in size. The mobile shredder can shred one hard drive every 10 seconds, with a theoretical capacity to process up to 3,500 hard drives a day. A conveyor belt gathers the material and deposits the waste into a bin, which is then transferred to larger containers and taken to be recycled through ORNL’s metal recycling program.

Collecting useless HDDs at ORNL for shredding

Pure Storage and Rubrik are partnering to add three-layer protection to FlashArray data with a reference architecture. Layer 1 is Pure’s immutable snapshot technology, with auto-on SafeMode governance providing instant recovery from a secure enclave accessible only to designated contacts authenticated through Pure Storage Support. Layer 2 is compliant, immutable, on-site backup via the Rubrik Secure Vault, providing anomaly detection, threat monitoring and hunting, sensitive data monitoring, user intelligence, and orchestrated recovery. Layer 3 is archival FlashBlade//S and //E storage, with massive scale-out and immutable, cost-effective storage designed for rapid recovery, even for data stored over extended periods. More info available here.

Rubrik has partnered with Okta to provide Okta Identity Threat Protection with user context to accelerate threat detection and response. Rubrik shares with Okta important user context such as email and the types of sensitive files they have accessed. By combining Rubrik’s Security Cloud user access risk signals with threat context from other security products used by an organization (such as Endpoint Detection and Response or EDR), Okta can determine overall risk levels more effectively and automate threat response actions to mitigate identity-based threats. A Rubrik blog provides more information.

Rubrik Okta partnership

According to Tom’s Hardware, Samsung has announced and is mass-producing PM9E1 Gen 5 M.2 SSDs with speeds up to 14.5GB/sec read and 13GB/sec write bandwidth. The PM9E1 has a 2,400 TBW (terabytes written) lifespan rating, double that of its PCIe 4 PM9A1 predecessor. It is also 50 percent more power-efficient. Device Authentication and Firmware Tampering Attestation security features are included through the Security Protocol and Data Model (SPDM) v1.2.

Samsung PM9E1


SK hynix has begun mass production of the world’s first 12-layer HBM3E product, offering the largest capacity of any HBM to date (36GB), using DRAM chips made 40 percent thinner to increase capacity by 50 percent at the same thickness as the previous 8-layer product. It has also increased the speed of memory operations to 9.6Gbit/sec, the highest memory speed available today. If Llama 3 70B, a Large Language Model (LLM), is driven by a single GPU equipped with four HBM3E products, it can read all 70 billion parameters 35 times within a second. Samsung Electronics and SK hynix are expected to post record sales in the third quarter of 2024, driven by AI chip demand. SK hynix could even overtake Intel in sales, becoming the third largest semiconductor supplier.
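
That "read 70 billion parameters 35 times a second" claim is bandwidth arithmetic; the sketch below assumes 16-bit (2-byte) parameters and the 1,024-bit HBM interface width, which are our assumptions rather than figures SK hynix states:

```python
# Back-of-envelope check on the "35 reads per second" Llama 3 70B claim.
# Assumed: 1,024-bit HBM interface per stack and FP16 (2-byte) parameters.
pin_speed_gbps = 9.6            # Gbit/s per pin
interface_width_bits = 1_024    # bits per HBM3E stack
stacks = 4

per_stack_gb_per_sec = pin_speed_gbps * interface_width_bits / 8   # ~1,229 GB/s per stack
total_gb_per_sec = per_stack_gb_per_sec * stacks                   # ~4,915 GB/s across four stacks

model_size_gb = 70e9 * 2 / 1e9                                     # 70B parameters x 2 bytes = 140 GB
print(round(total_gb_per_sec / model_size_gb))                     # ~35 full parameter reads per second
```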

Enterprise app data manager Syniti is working with tyre supplier Bridgestone’s EMEA unit to remove silos across the organization by consolidating business processes from its SAP ERP Central Component (SAP ECC) system into one SAP S/4HANA Cloud instance, allowing the company to improve operational efficiency and data accuracy and to set the stage for future growth. Read the full case study here.

Research house TrendForce claims concerns over a potential HBM oversupply in 2025 have been growing in the market. Its latest findings indicate that Samsung, SK hynix, and Micron all submitted their first HBM3e 12-Hi samples in the first half and third quarter of 2024, and these are currently undergoing validation. SK hynix and Micron are making faster progress and are expected to complete validation by the end of this year. The market is concerned that aggressive capacity expansion by some DRAM suppliers could lead to oversupply and price declines in 2025. If an oversupply does occur, it is more likely to affect older-generation products such as HBM2e and HBM3. TrendForce maintains its outlook for the DRAM industry, forecasting that HBM will account for 10 percent of total DRAM bit output in 2025, doubling its share in 2024. HBM’s contribution to total DRAM market revenue is expected to exceed 30 percent given its high ASP.

University of Southampton boffins developing 5-dimensional silica glass storage – which basically lasts forever, being stable at room temperature for 300 quintillion years – have stored the full human genome on a crystal of the glass. The crystal sits in the Memory of Mankind archive, located within a salt cave in Hallstatt, Austria. It’s the usual kind of eye-catching but useless demo of a technology that is years away from commercial use.

The boffins, led by Professor Peter Kazansky, used ultra-fast lasers to inscribe data into 20nm size nano-structured voids orientated within the silica. The crystal is inscribed with a visual key showing the universal elements (hydrogen, oxygen, carbon and nitrogen); the four bases of the DNA molecule (adenine, cytosine, guanine and thymine) with their molecular structure; their placement in the double helix structure of DNA; and how genes position into a chromosome, which can then be inserted into a cell.

The boffins’ release proclaims: “Although it is not currently possible to synthetically create humans using genetic information alone, the longevity of the 5D crystal means the information will be available if these advances are ever made in the future.” Gee whiz.

Storage exec Itay Nebenzahl

Open source-based hyperconverged cloud technology supplier Virtuozzo has appointed Itay Nebenzahl as its new CFO. He joins Virtuozzo from Logz.io, where he served as CFO and was instrumental in managing the company’s multi-currency cash exposure and global customer portfolio. Prior to Logz.io, Itay held the CFO position at Au10tix Limited where he led a multi-hundred-million dollar round with a leading VC, preparing the team for a multi-million special purpose acquisition company (SPAC) IPO.

Storage exec Lauren Vaccarello

WekaIO announced that its Data Platform has been certified as a high-performance data store for Nvidia Partner Network Cloud Partners. Nvidia Cloud Partners can leverage the WEKA Data Platform’s performance, scalability, operational efficiency, and ease of use through the jointly validated WEKA Reference Architecture for Nvidia Cloud Partners using Nvidia HGX H100 systems. Also, Lauren Vaccarello has been appointed as WEKA’s first CMO. Vaccarello is a veteran marketing executive and celebrated author, board member, and angel investor with a proven track record of accelerating revenue growth for enterprise software companies. She previously served as CMO of Salesloft and Talend and held executive leadership positions at Box, Salesforce, and AdRoll. Before joining WEKA, she was an entrepreneur-in-residence at Scale Venture Partners. She sits on the boards of Thryv and USA for UNFPA.

Xinnor’s xiRAID technology, combined with Lustre, has enhanced HPC capabilities at German research university Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU). Highlights:

  • Write throughput increased from 11.3GB/sec to 67GB/sec
  • Read throughput improved from 23.4GB/sec to 90.6GB/sec
  • Nearly doubled usable storage capacity

Huawei shares views on MLPerf Storage benchmark

The MLPerf Storage v1.0 benchmark has generated a lot of interest concerning how vendor scores could and should be compared. Huawei, for its part, argues the samples/sec rating should be used – normalized by storage nodes or their rack units – not the MiB/sec throughput rating.

The benchmark results measure storage system throughput in MiB/sec on three AI-relevant workloads, providing a way to compare the ability of different vendors’ systems to feed machine learning data to GPUs and keep them over 90 percent busy.

We thought the differences between vendors were so extreme that the results should be normalized in some way to make more valid comparisons between vendors. When we asked MLPerf if we should normalize for host nodes in order to compare vendors such as Huawei, Juicedata, HPE, Hammerspace, and others, a spokesperson told us: “The scale of a given submission is indicated by the number and type of emulated accelerators – i.e. ten emulated H100s is 10x the work of one emulated H100 from a storage standpoint. While MLCommons does not endorse a particular normalization scheme, normalizing by accelerators may be useful to the broader community.”
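
That per-accelerator normalization is just division; a minimal sketch of the approach is below, using illustrative placeholder numbers rather than actual submission values:

```python
# Per-accelerator normalization of MLPerf Storage throughput scores.
# The submissions below are illustrative placeholders, not real results.
submissions = [
    {"system": "Vendor A cluster", "mib_per_sec": 600_000, "emulated_accelerators": 240},
    {"system": "Vendor B node",    "mib_per_sec": 120_000, "emulated_accelerators": 44},
]

for s in submissions:
    per_accel = s["mib_per_sec"] / s["emulated_accelerators"]
    print(f"{s['system']}: {per_accel:,.0f} MiB/s per emulated accelerator")
```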

We did that, dividing the overall MiB/sec number by the number of GPU accelerators, and produced this chart:

Huawei thinks that this normalization approach is inappropriate. Jiani Liang, responsible for Huawei’s Branding and Marketing Execution, told us: “You divided the overall MiB/sec number by the number of GPU accelerators for comparing. I don’t think that works at the current benchmark rule definition.

“Per GPU bandwidth is a good metric for AI people to understand how fast the storage can support the GPU training, but only if the same GPU cluster scale is specified, since different GPU numbers means different I/O pressure on storage, thus will affect the bandwidth provided to each GPU. Small GPU clusters lead to small I/O pressure to storage, and leads to a slightly higher per-GPU bandwidth. This trend can also be observed in the graph of your article. 

“For example, with the same F8000X from YanRong, in a 12-GPU cluster, the average bandwidth per GPU is 2,783 MiB/sec, but in a 36-GPU cluster, the value is 2,711 MiB/sec. On the other hand, the greater the number of GPUs, the greater the overhead of synchronizing between GPUs.

“We also tested the sync time under different host numbers with the same GPU numbers per host using the benchmark. As you can see from the following chart, as the number of hosts increases and the number of GPUs increases,  the proportion of synchronization overhead in the overall time increases, resulting in lower bandwidth per GPU. These two factors will affect the per GPU bandwidth even using the same storage system, resulting in loss of comparability. 

Huawei chart

“Since currently the benchmark does not specify the total GPU numbers and per server GPU numbers, this metric was incorrectly normalized without the same GPU cluster scale.”

The MLPerf Storage v1.0 benchmark rules state: “the benchmark performance metric is samples per second, subject to a minimum accelerator utilization (AU) defined for that workload. Higher samples per second is better.”

Jiani Liang said: “So the challenge is what is the highest throughput one storage system can provide. I agree with you that we need to normalize the result in some way, since the scale of submitted storage systems are different. Normalizing by the number of storage nodes, or storage rack unit may be better for comparing.”

Comment

A sample, in MLPerf Storage terms, is the unit of data on which training is run – for example, an image or a sentence. A benchmark storage scaling unit is defined as the minimum unit by which the performance and scale of a storage system can be increased. Examples of storage scaling units are nodes, controllers, virtual machines, or shelves. Benchmark runs with different numbers of storage scaling units allow a reviewer to evaluate how well a given storage solution is able to scale as more scaling units are added.

We note that the MLPerf Storage benchmark results table presents vendor system scores in MiB/sec terms per workload type, and not samples/sec, and have asked the organization how samples/sec become MiB/sec. When we hear back, we’ll add in the information.
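
Pending that answer, a reasonable working assumption – ours, not MLCommons’ – is that the two metrics are related simply by the average size of a sample in the workload’s dataset, i.e. throughput in MiB/sec ≈ samples/sec × mean sample size in bytes ÷ 2^20, so the conversion factor differs per workload.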

Crusoe scales up its AI cloud with VAST Data systems

“Clean” computing infrastructure provider Crusoe Energy Systems is collaborating with VAST Data to offer Crusoe Cloud customers its new Shared Disks technology, which is a high-performance storage product for AI workloads.

This collaboration will deliver a petabyte scale file system capable of reads of up to 200 MBps per TiB per node.

According to Moody’s Ratings, datacenter electricity consumption is forecast to grow by 23 percent annually between 2023 and 2028, with AI-specific energy usage expected to grow by 43 percent annually over the same period.

In 2022, Crusoe, a solutions provider for the energy industry, launched Crusoe Cloud, a compute infrastructure platform for AI training, inference and HPC workloads. It is powered by 100 percent “clean, stranded or renewable energy”, achieved by co-locating datacenters with sources of clean energy, to lower the cost and mitigate the environmental impact of computing, says the firm. “Stranded” energy is methane being flared or excess production from clean and renewable sources.

Crusoe, which has dual headquarters in Denver and San Francisco and operates in seven countries, currently has around 200MW of total datacenter power capacity at its disposal, some owned by itself, and some at shared datacenter sites. It has plans to “rapidly expand” this capacity.

With the VAST Data Platform/VAST DataStore, Crusoe customers will have access to an NFS solution built for AI workloads at scale, said the partners. “For multi-GPU workloads like AI training, customers will be able to use Shared Disks to ensure they all have performant access to shared datasets,” they said.

Patrick McGregor

“Crusoe chose VAST because of its exceptional ability to deliver the reliable file storage that our customers need without any depletion of performance as AI models are scaled,” said Patrick McGregor, Crusoe chief product officer. “With client data housed full-time through VAST’s platform, customers will be empowered to build and innovate entirely through Crusoe Cloud, as we advance our mission to align the future of computing with the future of the climate.”

“Powered by the VAST Data Platform, Crusoe’s Shared Disks offering delivers the modern AI cloud infrastructure today’s enterprises need to address the challenges of scaling data-intensive AI workloads,” added Chris Morgan, vice president, solutions at VAST Data.

With Shared Disks, customers will have the ability to create, resize, mount, unmount and delete shared disks using the Crusoe Cloud API, CLI, UI, or by using a Terraform infrastructure-as-code software tool.

Shared Disks can deliver up to 200 MBps of read throughput and 40 MBps of write throughput per TiB of storage provisioned. The Shared Disks are made available to a single project in an organization, with encryption at rest to deliver secure services to all customers at scale.
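
Because throughput is provisioned per TiB, capacity and performance scale together. A quick illustration using the per-TiB figures above (our arithmetic; the 100 TiB disk size is a hypothetical example):

```python
# Illustrative Shared Disk throughput, based on the quoted 200 MBps read
# and 40 MBps write per provisioned TiB.
READ_MBPS_PER_TIB = 200
WRITE_MBPS_PER_TIB = 40

def shared_disk_throughput(capacity_tib: float) -> tuple[float, float]:
    """Return (read, write) throughput in GB/s for a provisioned capacity."""
    return (capacity_tib * READ_MBPS_PER_TIB / 1000,
            capacity_tib * WRITE_MBPS_PER_TIB / 1000)

read_gbs, write_gbs = shared_disk_throughput(100)   # hypothetical 100 TiB shared disk
print(f"100 TiB: ~{read_gbs:.0f} GB/s read, ~{write_gbs:.0f} GB/s write")
```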

Through granular quality-of-service policies that prevent multi-tenant I/O contention, the disks deliver the performance and data access customers need for AI workloads from a single cluster, without being impacted by other tenants, we are told.

Veeam integrates with Palo Alto Networks on attack response

Veeam Software is integrating its data protection reporting with cybersecurity software vendor Palo Alto Networks, to enable customers to respond quicker to attacks. New Veeam apps are being integrated with Palo Alto’s Cortex XSIAM and Cortex XSOAR systems.

Veeam is said to be the first Palo Alto partner to independently design and develop a data collector, dashboards, and reports for Cortex XSIAM.

Dave Russell

“This powerful integration enables our customers to better protect their backups and respond to cyberattacks faster, tightening their security posture and helping to ensure reliable, rapid and trusted recovery,” said Dave Russell, SVP of strategy at Veeam.

The partners said traditional tools struggle to scale for large enterprises, resulting in a high volume of alerts and overwhelming manual processes for security teams. The integrated technology centralizes, scales, and automates data monitoring and incident response. Palo Alto’s AI-driven security operations center (SOC) platform now works with Veeam’s recovery capabilities, so organizations can identify and respond to cyberattacks faster.

“We are collaborating with Veeam to respond and react more quickly to threats targeting organizations’ critical data,” said Pamela Cyr, VP of technical partnerships at Palo Alto Networks.

The Veeam apps leverage a bi-directional API connection to monitor, detect, and respond to security incidents impacting critical business data and data backups. The Veeam app integrated with Cortex XSIAM brings data from Veeam Backup & Replication and Veeam ONE environments into Cortex XSIAM, providing a centralized view of data and backup security-related activity.

The Veeam app working with Cortex XSOAR enables regular API queries against Veeam Backup & Replication and Veeam ONE, monitoring for significant security events or alerts. Both applications are included at no charge for Veeam Data Platform Advanced and Premium customers.

The two firms said the integration will help ensure “efficient and effective” incident management while meeting recovery time objectives (RTO), recovery point objectives (RPO), and supporting industry compliance regulations with automated ransomware recovery.

The Veeam app integrated with Cortex XSOAR is available now for download in the Cortex Marketplace. The app integrated with Cortex XSIAM will be “available soon”, said Veeam.

Red Canary, a managed detection and response (MDR) provider, has also just integrated its technology with Cortex XSIAM, to offer fully managed SOC services through Palo Alto.

Earlier this week, Veeam acquired SaaS backup firm Alcion.

Micron reveals MLPerf Storage benchmark results for SSDs

SSD supplier Micron announced MLPerf v1.0 Storage Benchmark results for its 7.68 TB 9550 NVMe SSD, saying it offers the performance required to support a large number of accelerators and AI workloads. 

The first set of MLPerf Storage v1.0 benchmark results, testing storage product and system throughput for accelerators (GPUs) in AI training runs, was published a few days ago. Micron said it couldn’t make its test results public then as it was in the quiet period leading up to its SEC quarterly results announcement.

Micron says the 9550 can sustain up to: 

  • 58x H100 accelerators training ResNet50 at 10.2 GBps 
  • 13x H100 accelerators training CosmoFlow at 7.2 GBps
  • 4x H100 accelerators training 3D-Unet at 9.9 GBps 

 or 

  • 115x A100 accelerators training ResNet50  at 10.5 GBps 
  • 20x A100 accelerators training CosmoFlow at 7.1 GBps 
  • 8x A100 accelerators training 3D-Unet at 9.3 GBps 

Micron also published MLPerf v1.0 Storage results for its 30.72 TB 6500 ION SSD in support, it said, of AI use cases requiring high storage capacity. It can sustain up to:

  • 72 A100 accelerators training ResNet50 at 3.6 GBps 
  • 15 A100 accelerators training CosmoFlow at 5.3 GBps 
  • 3 A100 accelerators training 3D-Unet at 4.47 GBps 

or:

  • 37 H100 accelerators training ResNet50 at 6.66 GBps
  • 9 H100 accelerators training CosmoFlow at 4.98 GBps
  • 1 H100 accelerator training 3D-Unet at 2.9 GBps

As a way of trying to compare Micron’s SSD results with other submitted results, we charted the overall 3D-Unet workload scores running with H100 GPUs with these Micron SSDs included: 

As you can see, the Micron SSDs barely register on the chart as their raw MiB/s number is so low. Normalizing these scores per GPU, dividing the MiB/s rating by the number of accelerators (GPUs), we get the following much more readable chart:

The 6500 has the highest result at 2,914 MiB/s, with the 9400 in fourth place with 2,856.5 MiB/s. The 9550 lags the rest with 2,486 MiB/s.
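
As a rough cross-check, dividing the rounded "up to" GBps figures Micron quotes by the accelerator counts lands near, though not exactly on, those charted per-GPU values – the gap is down to rounding and the GB-to-MiB conversion:

```python
# Approximate per-accelerator normalization of Micron's 3D-Unet/H100 results.
# The "up to" GBps figures are rounded, so these only roughly match the chart values.
MIB = 2**20

results = {
    "9550 (4x H100, 9.9 GBps)": (9.9e9, 4),
    "6500 (1x H100, 2.9 GBps)": (2.9e9, 1),
}

for name, (bytes_per_sec, accelerators) in results.items():
    print(f"{name}: ~{bytes_per_sec / accelerators / MIB:,.0f} MiB/s per GPU")
```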

Ideally, we would like to produce similar charts for the 3D-Unet workload on A100 GPUs, and the CosmoFlow and ResNet50 workloads running with A100 and, separately, H100 GPUs.

The MLPerf Storage benchmark permits both single drive and multiple drive system submissions, although each type of submission can hold vastly different dataset sizes. Our understanding is that single drives could be used in real-world inferencing workloads, but not training runs, as the datasets would generally be too small. We note that MLPerf Storage tests training performance, not inferencing workloads.

See Micron’s video blog about its MLPerf Storage SSD results here and check out a technical brief here.

Analysis and comment

The MLPerf Storage spreadsheet table is quite difficult to read. It appears to be ordered by the public ID column, which is counter-intuitive, as people will surely want to search by vendor. Secondly, the vendors are not organized alphabetically; Micron appears in two separate entry groups, for example, which makes it harder to find all of that company’s results.

Then there are three workloads and two accelerator types, which means you have to separate out the workload and the accelerator type to be able to compare vendors. Some systems are available while others are in preview. Some results are comparable with others – so-called Closed – while others are not – so-called Open. A person inspecting the MLPerf Storage results table has to understand which supplier is good at what workload with what system, which accelerator, and whether the system test is Closed or Open, and then whether it is available or in preview. This is possibly the most complicated and hard-to-understand benchmark this writer has encountered.

Workload vs overall performance

MLPerf Storage is a multi-workload benchmark with 3D-Unet, CosmoFlow, and ResNet50 training scenarios. The SPEC SFS 2020 benchmark is another multi-workload affair. It’s a file-serving benchmark with five different workloads: software builds, video streaming (VDA), electronic design automation, virtual desktop infrastructure, and database. Each category test results in a numerical score and an overall response time (ORT). There is no system price measure and so no price-performance rating. 

A third multi-workload benchmark is the Storage Performance Council’s SPC-2 benchmark, which measures overall storage array performance in throughput (MBPS) and (discounted) price-performance terms. These numbers are calculated from three component workloads – large file processing, large database query, and video-on-demand – and the test results present an overall throughput score, SPC-2 MBPS, and a price-performance value based on the test system price divided by the MBPS rating. Vendor and system comparisons are relatively simple to make.

Unlike SPC-2, MLPerf Storage has no overall measure of performance across its three workloads and no price-performance measure either, making supplier and system comparisons more difficult. 

Pure CEO on single storage environments, hyperscalers buying flash, and PII protection

Pure Storage CEO Charles Giancarlo believes that customers need a single, multi-protocol storage environment to supply and store data for modern needs, rather than restricted block- or file-specific platforms or many different and siloed products that are difficult to operate and manage.

He was present at a Pure Accelerate event in London, UK, and his views on this – and on the hyperscaler flash-buying opportunity – became apparent during an interview. We started by asking him what he would suggest a customer who had been talking to VAST Data should think about that company.

Charles Giancarlo.

Charles Giancarlo: “Vast is certainly an interesting company. They focused on a non-traditional use case when they first got started, which was large-scale data that required fast processing for a short period of time. And they’ve made a lot of themselves over the intervening year or two. They have a very different strategy than Pure. Their strategy is, basically, they created a system, and they’re attempting to make that one system … the solution for everything.

“Our strategy, rather, is that we’ve created one operating environment for block, file, and object, across two different hardware architectures – scale-up and scale-out – but with the main focus being to really virtualize storage across the enterprise, such that the enterprise can create a cloud of data rather than individual arrays.”

He went on to say that: “Enterprise storage never made the transition that was made for personal storage – or for, let’s say, networks or compute – where you were able to virtualize the environment.”

Giancarlo suggested that enterprise on-premises developers moved to the cloud “because they could set up a customized infrastructure in the period of about an hour, through GUIs and through APIs. Your developers are not able to do that with enterprise. Why is that? Well, you have IP networks, you have Ethernet networks, just like the cloud does. You have your own virtualization, whether that was VMware or something else. So you have that. But it’s your storage that’s not been virtualized, and so your developers don’t have the ability to just set up new compute and open up new storage or access data that already exists with just a few clicks and through APIs.

“With Pure Storage now, a customer can manage their entire data environment as a single cloud of storage where they can set up the policies and the processes whereby their data is managed, and have that managed automatically in an orchestrated way – but furthermore, be able to share that storage among all of their different application environments.” 

Blocks & Files: And they can only do that in the cloud? Because Azure, AWS and Google have taken the time and trouble to put an abstraction software layer in place to enable them to do that. So in theory, you can take that same concept and bring it on premises?

Charles Giancarlo: “Exactly. You finish my story for me, because you already have it in place for compute and network, and now we complete the circle, if you will, with the storage side, with APIs. And what’s behind our capability is Kubernetes. We’ve already put you on your Kubernetes journey with this, and now you can create that virtual storage.

“[The software] is really now an orchestration layer. All of the APIs already exist. Your employees already understand how to use either VMware [and are] starting to become familiar with Kubernetes and containers.

“What’s really interesting about this is that enterprises, of course, have requirements that they have to fulfil – whether they’re regulatory or compliance or even to just to fulfil the needs of the enterprise itself – which may at times be different from those of the developer. 

“What I mean by that is the developer may not be thinking about resiliency, or how fast to come back from a failure, or when things should be backed up, right? So the organization now can set up policies for their data. They can set up different storage classes and then make those storage classes available by API to their developers, so the developers can choose a storage class that’s already been predefined by the organization to fit the organization’s needs. So it’s really a beautiful construct to allow an enterprise to operate more like a cloud.”

Blocks & Files: Pure has put Cloud Block Store in place, which, crudely speaking, could be looked at as FlashArray in the cloud. Is FlashBlade in the cloud coming?

Charles Giancarlo: “FlashBlade in the cloud is coming. Think of it this way. Both of them are [powered by the] Purity OS. Cloud Block Store looks like FlashArray in the cloud, but more importantly, it looks like block in the cloud. So what you’re asking really about is, what about file in the cloud? Because once up in the cloud, it is scale-out by definition, and FlashBlade is nothing but a scale-out version of FlashArray, right? So, the next step we want to take is file in the cloud, which we are working on. But we still want to see the block environment grow. Object in the cloud we’re uncertain about right now.”

Blocks & Files: There’s a very big S3 elephant in that forest.

Charles Giancarlo: “We’ve talked very, very long and hard about block in the cloud, to where we could provide a superior service at lower cost to the customer. With file in the cloud, we believe we’ll be able to do the same thing, albeit it’s taken us longer to get to understand what that would look like. Object, of course, was invented in the cloud to begin with, pretty much – not exactly, but for all intents and purposes, it was. And if we find that we can’t improve upon that, then it wouldn’t make sense. Certainly, what we want is to make it fully compatible with S3 and Azure.”

Blocks & Files: You don’t want simply to have, say, a cloud object store that is simply an S3 gateway.

Charles Giancarlo: “Of course. If we’re not adding value, we’re not going to do it.”

Blocks & Files: Do you think that Pure will continue its current growth rate, or that it will settle down to some extent, or even possibly accelerate?

Charlie Giancarlo: “I think we have the opportunity to accelerate for several reasons. One is – I’m already on record, so I’m not going to change it now – that we should win our first hyperscaler this year.”

Blocks & Files: It will be a revolutionary event, if you do that, because they’ll be buying from you. At the moment, they buy raw disk drives from Seagate, etc., but sure as heck, don’t buy disk arrays from anybody.

Charlie Giancarlo: “That’s right. And not as much, but they also buy raw SSDs. Roughly 90 percent of the top five hyperscalers, and maybe 70 to 80 percent of the top 10 hyperscalers, are hard disk-based.

“And there’s no question that, at some point in time, in my opinion, it will be all flash. Whether it’s us or SSDs. That being said, because of our DirectFlash technology, we’re at the forefront of this, and we think we bring other value. We’re fairly certain we bring other value to the hyperscaler beyond just the fact that it’s flash rather than disk.

“We’re trying to be very careful about what we mean when we refer to a hyperscaler design win and a hyperscaler buying our infrastructure. We’re separating that, for example, from AI, even if it’s a hyperscaler buying for AI purposes. The reason is, when we’re talking about the hyperscaler, we’re talking about them buying it for their core customer-facing storage infrastructure.

“In most of the hyperscalers, they have three or four unique layers of storage. Just three or four, you know, some with the lowest possible cost. And I’m talking about online now, not tape, from the lowest price-performance right to their highest price performance. And every one of their services – of which there could be hundreds or thousands – use one of the three or four infrastructures that they have in place. 

“What we’re talking about at first, is just replacing, let’s call it the nearline disk environment. But frankly, what we found is, as we get further along in our conversations, they say, ‘Well, if we’re going to use your technology for that, we might as well use it for all of the layers’ – because we’re not performance-limited, because we’re flash. And so, if you make sense at the lowest price layer, well you also make sense at higher performance layers.”

Blocks & Files: Seagate has been having enormous problems getting its HAMR disk drives qualified by hyperscalers.

Charlie Giancarlo: “There are two problems with disk that Seagate won’t admit. And really, I’m not trying to be competitive with disk. But the first is, the I/O doesn’t get any better. You can double the density, and the I/O doesn’t get any better. So eventually it just gets so big that you can’t really use all the capacity that’s in the system.

“And the second is, the power, space, and cooling doesn’t get any better. And so between those two things we’ll be at that point in flash. We’re not there yet, but we’ll be at the point where they can give away the disks, but the infrastructure costs will be more than the full system cost.

“The flash chip that’s not being accessed uses practically no power, almost no power. And of course, the chips themselves are getting denser. So it’s not as if you’re using twice as many chips. You’re using the same number and with twice the amount of capacity on each chip. So the thing that uses most of the power out of the DFM [Direct Flash Module] is our microcontroller that runs the firmware that we download into it. Think of it as a co-processor. We don’t need more than one per DFM. We don’t really have any RAM … an SSD has a lot of DRAM on it that uses a lot of power, so we’re lower power than SSDs as well.”

Blocks & Files: Could you envisage a day when a Pure array controller includes GPUs?

Charlie Giancarlo: “I don’t see the purpose of a GPU for the sole purpose of accessing storage and for delivering storage. It wouldn’t accelerate, for example, the recovery of storage. So there are several questions that would come from that. One is, could the GPU be used for other things – such as maybe some AI enhancement to the way that data is masked or interpreted before being written. The second thought that comes to my mind would be, would a customer want to place an AI workload of one type or another on the same controller?

“I think I would tend to think not on the second one, because it’s too constraining to the enterprise. Because there’s always going to be some constraint, whatever that is – scale, speed, whatever. I think most enterprises would want to separate out their choice of compute – a GPU compute platform – from the storage. Keep application compute and storage largely separate.

“We have been thinking – and I have nothing to announce right now – but we have been thinking that there might be reasons why customers may want to do some AI work on data being written. In order to do things such as auto-masking, to be able to separate out some types of data from other types of data into different buckets that then could be handled in the background, differently than other buckets, to maybe vectorize the data – you could think of it as vectorization.

“Customers have told me this directly. They don’t know which of their files contain PII (Personally Identifiable Information). They don’t know this until after it’s been stolen and they’ve been ransomed.

“So imagine now that you had an engine that could somehow, auto-magically, start to do some amount of separation of the type of data that gets written, such that your PII data is held in a more secure environment than the non-PII data. If your non-PII data is stolen, OK, well, you don’t like it being stolen – but it’s not quite as damaging [as PII data being stolen]. So there are ideas such as that where the answer is maybe, possibly, more to see.”

MLPerf Storage V1.0 results show critical role of next gen storage in AI model training

SPONSORED FEATURE: MLCommons has just released the results of the MLPerf Storage Benchmark v1.0, which suggest that Huawei’s OceanStor A800 all-flash array beats its competition, offering almost double the total throughput of its nearest rival.

The benchmark contains three workloads: 3D-Unet, ResNet50, and CosmoFlow. Compared with v0.5, v1.0 removed the BERT workload, added ResNet50 and CosmoFlow, and added the Nvidia H100 and A100 to the accelerator types.

Huawei participated in the 3D-Unet workload test using an 8U dual-node OceanStor A800, which successfully supported the data throughput requirement of 255 simulated Nvidia H100s for training, providing a stable bandwidth of 679 GB/s and maintaining over 90 percent accelerator utilization.

The objective of MLPerf Storage Benchmark is to test the maximum number of accelerators supported by the storage system and the maximum bandwidth that the storage system can provide while ensuring optimal accelerator utilization (AU).

Bandwidth requirement for each accelerator:

Workload     H100          A100
3D-Unet      2,727 MB/s    1,385 MB/s
ResNet50     176 MB/s      90 MB/s
CosmoFlow    539 MB/s      343 MB/s

Source: MLCommons

The data above indicates that to obtain high benchmark bandwidth, more accelerators need to be simulated. 3D-Unet on the H100 has the highest per-accelerator bandwidth requirement for storage among the workloads. This means that, for the same number of simulated accelerators, 3D-Unet on H100 exerts the greatest access pressure on storage.
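
Putting the table and the headline result together: 255 emulated H100s at the 3D-Unet requirement of 2,727 MB/s each implies roughly 695 GB/s of demand at full utilization, so the 679 GB/s Huawei reports is around 98 percent of that ceiling, consistent with the benchmark's requirement to keep accelerator utilization above 90 percent. Our arithmetic, sketched below:

```python
# Demand implied by 255 emulated H100s on the 3D-Unet workload, using the
# per-accelerator bandwidth requirement from the table above.
accelerators = 255
per_accelerator_mb_per_sec = 2_727      # 3D-Unet requirement per H100
reported_gb_per_sec = 679               # bandwidth Huawei says the OceanStor A800 delivered

demand_gb_per_sec = accelerators * per_accelerator_mb_per_sec / 1000
print(f"Demand at full utilization: ~{demand_gb_per_sec:.0f} GB/s")                    # ~695 GB/s
print(f"Delivered fraction of that: ~{reported_gb_per_sec / demand_gb_per_sec:.0%}")   # ~98%
```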

Source: Huawei

It’s important to note that the accelerator numbers and the bandwidth of each computing node do not directly reflect storage performance. Rather, they indicate the server performance of the computing nodes. Only the total number of accelerators (simulated GPUs) and the overall bandwidth can accurately represent the storage system’s capabilities.

“The number of host nodes is not particularly useful for normalization,” said an MLCommons spokesperson. “The scale of a given submission is indicated by the number and type of emulated accelerators – ie ten emulated H100s is 10x the work of one emulated H100 from a storage standpoint”. 

You can read more about how the MLPerf Storage v1.0 Benchmark results are compiled and presented here.

Source: Huawei

This result indicates that the OceanStor A800 is ahead of the curve in one important aspect: its total throughput registered 1.92x that of the second-place player, while the throughput per node and per rack unit were 2.88x and 1.44x that of the runner-up respectively (the full MLPerf Storage Benchmark Suite Results are available here).

Additionally, unlike traditional storage performance test tools, the MLPerf Storage benchmark has strict latency requirements. For a high-bandwidth storage system, when the number of accelerators is increased to apply higher access pressure, stable low latency is a must to prevent AU reduction and to achieve the expected bandwidth. In the v1.0 test results, the OceanStor A800 also appears capable of providing stable, low latency for the training system even at high bandwidth, which helps to maintain high accelerator utilization.

Source: Huawei

GenAI advancing with storage development 

In a global survey of AI usage conducted by consultancy McKinsey, 65 percent of respondents revealed that they are now regularly using generative AI (GenAI), nearly double the number recorded by a previous McKinsey survey 10 months earlier.

While regular AI is designed to work with existing datasets, GenAI algorithms focus on the creation of new content that closely resembles authentic information. This ability is creating a range of possibilities across numerous verticals. 

From software and finance to fashion and autonomous vehicles, most GenAI use cases depend on large language models (LLMs) to create the right kind of applications and workloads. Using GenAI and LLMs together also puts a strain on underlying storage architectures – slow delivery of the data fed into large AI models can lead to poor results, including so-called AI hallucinations, where a large AI model starts to fabricate inaccurate answers.

Most technology companies are busy striving to resolve the challenges with storage products and solutions. The V1.0 test result indicates that the OceanStor A800 can provide data services for AI training and the maximization of GPU/NPU computing utilization, whilst also supporting cluster networking and providing high-performance data services for large-scale training clusters.

Huawei launched the OceanStor A800 High-Performance AI Storage in 2023 specifically to boost the performance of large model training and help organizations accelerate the rollout of applications based on those large AI models. During the recent HUAWEI CONNECT 2024 event, Dr. Peter Zhou – Vice President of Huawei and President of Huawei Data Storage Product Line – said that this new long-term memory storage system can significantly boost large AI model training and inference capabilities, and help various industries step into what he called the “digital-intelligent era”.

Sponsored by Huawei.

Samsung pitches latest SSD at automotive AI … which is now properly a Thing

With AI now widely deployed across the automotive sector, Samsung Electronics says it has developed the first PCIe 4.0 automotive SSD based on eighth-generation (236-layer) vertical NAND (V-NAND).

The new auto SSD, AM9C1, delivers on-device AI capabilities in automotive applications.

With about 50 percent improved power efficiency compared to its predecessor, the AM991, the new 256GB auto SSD will deliver sequential read and write speeds of up to 4,400 megabytes per second (MBps) and 400 MBps, respectively, says the vendor.

Samsung AM9C1 gen 8 V-NAND flash chip

“We are collaborating with global autonomous vehicle makers and providing high-performance, high-capacity automotive products,” said Hyunduk Cho, vice president and head of the automotive group at Samsung Electronics’ memory business. “We will continue to lead the physical AI memory market that encompasses applications from autonomous driving to robotics technologies.”

Built on Samsung’s 5-nanometer (nm) controller and providing a single-level cell (SLC) namespace feature, the AM9C1 is currently being sampled by “key partners”, said Samsung, and is expected to begin mass production by the end of this year.

Samsung plans to offer multiple storage capacities for the SSD, ranging from 128 GB to 2 TB, to address the growing demand for higher-capacity automotive SSDs. The 2 TB model, which is set to offer the industry’s largest capacity in this product category, is scheduled to start mass production “early next year,” the supplier said.

The SSD is deemed to satisfy the automotive semiconductor quality standard AEC-Q100 Grade 2, ensuring stable performance over a wide temperature range of -40°C to 105°C.

Samsung received ASPICE CL3 certification for its 5G mobile and automotive UFS 3.1 flash storage product in March this year. CL3 is the highest standard for software and firmware in the automotive industry, with Western Digital also achieving it this year.

Earlier this month, Samsung said it was targeting the general AI market through the first mass production of its 1Tb quad-level cell (QLC) 9th-generation V-NAND.

Nvidia and Hive team up to tackle problem of rogue AI content

Hive, a provider of proprietary AI solutions to understand, search, and generate content, is integrating its AI models with Nvidia NIM microservices in private clouds and on-premises datacenters.

Nvidia NIM, part of the Nvidia AI Enterprise software platform, provides models as optimized containers, and is designed to simplify and accelerate the deployment of custom and pre-trained AI models across clouds, datacenters and workstations.

“Our cloud-based APIs process billions of customer requests every month. However, the ability to deploy our models in private clouds or on premises has emerged as a top request from prospective customers in cases where data governance or other factors challenge the use of cloud-based APIs,” said Kevin Guo, co-founder and CEO of Hive. “Our integration with Nvidia NIM allows us to meaningfully expand the breadth of customers we can serve.”

Existing Hive customers include the likes of Reddit, Netflix, Walmart, Zynga, and Glassdoor.

The first Hive models to be made available with Nvidia NIM are AI-generated content detection models, which allow customers to identify AI-generated images, video, and audio. The continued emergence of generative AI tools comes with a risk of misrepresentation, misinformation, and fraud, presenting challenges to the likes of insurance companies, financial services firms, news organizations, and others, says Hive.

“AI-generated content detection is emerging as an important tool for helping insurance and financial services companies detect attempts at misrepresentation,” said Justin Boitano, vice president of enterprise AI software products at Nvidia. “With NIM microservices, enterprises can quickly deploy Hive’s detection models to help protect their businesses against fraudulent content, documents and claims.”

Hive is also offering internet social platforms a no-cost, 90-day trial for its technology.

“The newfound ease of creating content with generative AI tools can come with risks to a broad set of companies and organizations, and platforms featuring user-generated content face unique challenges in managing AI-generated content at scale,” said Guo. “We are offering a solution to help manage the risks.”

Hive plans to make additional models available through Nvidia NIM “in the coming months”, including content moderation, logo detection, optical character recognition, speech transcription, and custom models through Hive’s AutoML platform.

Micron revenues surge 93% driven by AI demand

AI-driven server memory, particularly GPU high-bandwidth memory (HBM), and SSD demand sent Micron revenues in its final FY 2024 quarter to $7.75 billion, 93 percent higher year-on-year.

It made a net profit of $887 million in the quarter ended August 29, contrasting with the $1.4 billion loss a year ago. Full FY 2024 revenue was $25.1 billion, 62 percent higher year-over-year, with a $778 million net profit, versus FY 2023’s $5.83 billion loss.

Sanjay Mehrotra, Micron

President and CEO Sanjay Mehrotra stated: “Micron delivered a strong finish to fiscal year 2024, with fiscal Q4 revenue at the high end of our guidance range and gross margins and earnings per share (EPS) above the high end of our guidance ranges. In fiscal Q4, we achieved record-high revenues in NAND and in our storage business unit. Micron’s fiscal 2024 revenue grew over 60 percent; we expanded company gross margins by over 30 percentage points and achieved revenue records in datacenter and in automotive.”

He added: “Our NAND revenue record was led by datacenter SSD sales, which exceeded $1 billion in quarterly revenue for the first time. We are entering fiscal 2025 with the best competitive positioning in Micron’s history. We forecast record revenue in fiscal Q1 and a substantial revenue growth with significantly improved profitability in fiscal 2025.”

Financial summary

  • Gross margin: 36.5 percent vs -9 percent a year ago
  • Free cash flow: $323 million vs -$758 million last year
  • Cash, marketable investments, and restricted cash: $9.2 billion vs $10.5 billion a year ago
  • Diluted EPS: $1.18 vs -$1.07 a year ago. 
FY 2024 has seen four straight quarters of increasing growth rates and Micron is set for revenue records in FY 2025

Micron makes two product types, DRAM and NAND, with DRAM revenues in the quarter rising 93 percent year-over-year to $5.33 billion and NAND up 96.3 percent to $2.4 billion. These products are sold into four markets, where the rosy revenue picture is:

  • Compute and networking: $3 billion, up 152 percent year-over-year
  • Mobile: $1.9 billion, up 55 percent
  • Storage: $1.7 billion, up 127 percent
  • Embedded: $1.2 billion, up 36 percent

The compute and networking business unit is growing fastest, followed by storage. The key demand driver is generative AI. Micron said multiple vectors will drive AI memory demand over the coming years: growing model sizes and input token requirements, multi-modality, multi-agent solutions, continuous training, and the proliferation of inference workloads from the cloud to the edge. It sees no sign of an AI bubble, nor of customers turning away from the technology.

Micron revenues
Micron expects record revenue in FY 2025 Q1 and a substantial revenue record with significantly improved profitability in FY 2025. We expect to see $9 billion-plus revenue quarters in the future

In end-market terms, the strongest DRAM sector is HBM, needed for GPUs, and Micron expects the total addressable HBM market “to grow from approximately $4 billion in calendar 2023 to over $25 billion in calendar 2025. As a percent of overall industry DRAM bits, we expect HBM to grow from 1.5 percent in calendar 2023 to around 6 percent in calendar 2025.” 

Mehrotra said: “We have a robust roadmap for HBM and are confident we will maintain our time-to-market, technology and power efficiency leadership with HBM4 and HBM4E.” In the earnings call, he commented: “We look forward to delivering multiple billions of dollars in revenue from HBM in fiscal year ’25.”

Micron is also seeing “a recovery in traditional compute and storage” and has “gained substantial share in datacenter SSDs” where it “achieved a quarterly revenue record with over a billion dollars in revenue in datacenter SSDs in fiscal Q4, and our fiscal 2024 datacenter SSD revenues more than tripled from a year ago.” 

The company expects that “PC unit volumes remain on track to grow in the low single-digit range for calendar 2024. We expect unit growth to continue in 2025 and accelerate into the second half of calendar 2025, as the PC replacement cycle gathers momentum with the rollout of next-gen AI PCs, end of support for Windows 10 and the launch of Windows 12.”

Smartphones are being affected by AI as well, with Micron saying: “Recently, leading Android smartphone OEMs have announced AI-enabled smartphones with 12 to 16 GB of DRAM, versus an average of 8 GB in flagship phones last year … Smartphone unit volumes in calendar 2024 are on track to grow in the low-to-mid single-digit percentage range, and we expect unit growth to continue in 2025.”

Micron achieved a fiscal year record for automotive revenue in FY 2024, with infotainment and ADAS driving long-term memory and storage content growth. Its automotive demand is currently constrained as the industry adjusts the mix of EV, hybrid, and traditional vehicles to meet changing customer demand. It expects “a resumption in our automotive growth in the second half of fiscal 2025.”

Western Digital reckons that the 3D NAND layer count race is over as each layer count addition adds a diminishing return. This will lengthen the period between layer count transitions. Micron agrees with this view, saying: “NAND technology transitions generally provide more growth in annualized bits per wafer compared to the NAND bit demand CAGR expectation of high-teens … We anticipate longer periods between industry technology transitions and moderating capital investment over time to align industry supply with demand.”

Micron invested $8.1 billion in capex in FY 2024 and expects to increase this by something like 35 percent in fiscal 2025, driven mostly by greenfield fab construction and HBM manufacturing facilities. Its investments in facilities and construction in Idaho and New York will support its long-term demand outlook for DRAM and will not contribute to bit supply in fiscal 2025 and 2026. 

Outlook

Mehrotra said: “We are entering fiscal 2025 with the strongest competitive positioning in Micron’s history … We look forward to delivering a substantial revenue record with significantly improved profitability in fiscal 2025, beginning with our guidance for record quarterly revenue in fiscal Q1.”

HBM will contribute to this: “We expect to ramp our HBM3E 12-high output in early calendar 2025 and increase the 12-high mix in our shipments throughout 2025 … Our HBM is sold out for calendar 2024 and 2025, with pricing already determined for this time frame.” 

The revenue outlook for the next quarter (Q1 FY 2025) is $8.7 billion +/- $200 million, an 84 percent increase at the midpoint on the year-ago number. The full-year outlook was not given.

Generative AI training and inference is set to boost Micron’s revenues. Let Mehrotra have the final words: “With the advent of AI, we are in the most exciting period that I have seen for memory and storage in my career.” 

MLPerf AI benchmark tests how storage systems keep GPUs busy

The MLPerf Storage benchmark combines three workloads and two types of GPU to present a six-way view of storage systems’ ability to keep GPUs busy with machine learning work.

This benchmark is a production of MLCommons – a non-profit AI engineering consortium of more than 125 collaborating vendors and organizations. It produces seven MLPerf benchmarks, one of which is focused on storage and “measures how fast storage systems can supply training data when a model is being trained.”

MLPerf website

MLCommons states: “High-performance AI training now requires storage systems that are both large-scale and high-speed, lest access to stored data becomes the bottleneck in the entire system. With the v1.0 release of MLPerf Storage benchmark results, it is clear that storage system providers are innovating to meet that challenge.”

Oana Balmau, MLPerf Storage working group co-chair, stated: “The MLPerf Storage v1.0 results demonstrate a renewal in storage technology design. At the moment, there doesn’t appear to be a consensus ‘best of breed’ technical architecture for storage in ML systems: the submissions we received for the v1.0 benchmark took a wide range of unique and creative approaches to providing high-speed, high-scale storage.” 

The MLPerf Storage v1.0 Benchmark results provide, in theory, a way of comparing different vendors’ ability to feed machine learning data to GPUs and keep them over 90 percent busy.

However, the results are presented in a single spreadsheet file with two table sets. This makes comparisons between vendors – and also between different results within a vendor’s test group – quite difficult. To begin with, there are three separately tested workloads – 3D Unet, Cosmoflow, and ResNet50 – each with MiB/sec scores, meaning that effectively there are three benchmarks, not one.

The 3D UNet test looks at medical image segmentation using “synthetically generated populations of files where the distribution of the size of the files matches the distribution in the real dataset.” Cosmoflow is a scientific AI dataset using synthetic cosmology data, while ResNet50 is an image classification workload using synthetic data from ImageNet. All three workloads are intended to “maximize MBit/sec and number of accelerators with >90 percent accelerator utilization.”

MLPerf graphic
MLCommons diagram

These three workloads offer a variety of sample sizes, ranging from hundreds of kilobytes to hundreds of megabytes, as well as wide-ranging simulated “think times” – from a few milliseconds to a few hundred milliseconds. They can be run with emulated Nvidia A100 or H100 accelerators (GPUs), meaning there are actually six separate benchmarks.

We asked MLPerf about this and a spokesperson explained: “For a given workload, an emulated accelerator will place a specific demand on the storage that is a complex, non-linear function of the computational and memory characteristics of the accelerator. In the case here, an emulated H100 will place a greater demand on the storage than an emulated A100.”

There are two benchmark run divisions: Closed, which enables cross-vendor and cross-system comparisons; and Open, which allows more flexibility to tune and change both the benchmark and the storage system configuration in order to show off new approaches or features that will benefit the AI/ML community. Open explicitly forfeits comparability in order to showcase innovation. Some people might think having two divisions is distracting rather than helpful.

Overall there are six individual benchmarks within the MLPerf Storage benchmark category, all present in a complicated spreadsheet that is quite hard to interpret. There are 13 submitting organizations: DDN, Hammerspace, HPE, Huawei, IEIT SYSTEMS, Juicedata, Lightbits Labs, MangoBoost, Nutanix, Simplyblock, Volumez, WEKA, and YanRong Tech, with over 100 results across the three workloads.

David Kanter, head of MLPerf at MLCommons, said: “We’re excited to see so many storage providers, both large and small, participate in the first-of-its-kind v1.0 Storage benchmark. It shows both that the industry is recognizing the need to keep innovating in storage technologies to keep pace with the rest of the AI technology stack, and also that the ability to measure the performance of those technologies is critical to the successful deployment of ML training systems.”

We note that Dell, IBM, NetApp, Pure Storage and VAST Data – all of whom have variously been certified by Nvidia for BasePOD or SuperPOD use – are not included in this list. Both Dell and IBM are MLCommons members. Benchmark run submissions from all these companies would be most interesting to see.

Hammerspace noted: “It is notable that no scale-out NAS vendor submitted results as part of the MLPerf Storage Benchmark. Well-known NAS vendors like Dell, NetApp, Qumulo, and VAST Data are absent. Why wouldn’t these companies submit results? Most likely it is because there are too many performance bottlenecks in the I/O paths of scale-out NAS architectures to perform well in these benchmarks.” 

Comparing vendors

In order to compare storage vendors on the benchmarks, we need to separate out their individual MLPerf v1.0 benchmark workload-type results using the same GPU in the Closed division – such as 3D Unet-H100-Closed. When we did this for each of the three workloads and two GPU types, we got wildly different results, even within a single vendor’s scores, making us concerned that we were not really comparing like with like.

For example, we separated out and charted a 3D Unet-H100-Closed result set to get this graph:

MLPerf results

Huawei scores 695,480 MiB/sec while Juicedata scores 5,536 MiB/sec, HPE 5,549 MiB/sec, and Hammerspace 5,789 MiB/sec. Clearly, we need to somehow separate the Huawei and similar results from the others, or else normalize them in some way.

Huawei’s system is feeding data to 255 H100 GPUs while the other three are working with just two H100s – obviously a completely different scenario. The Huawei system has 51 host compute nodes, Juicedata does not specify a host node count, and HPE and Hammerspace have one apiece.

We asked MLPerf if we should normalize for host nodes in order to compare vendors such as Huawei, Juicedata, HPE, and Hammerspace. The spokesperson told us: “The number of host nodes is not particularly useful for normalization – our apologies for the confusion. The scale of a given submission is indicated by the number and type of emulated accelerators – ie ten emulated H100s is 10x the work of one emulated H100 from a storage standpoint. While MLCommons does not endorse a particular normalization scheme, normalizing by accelerators may be useful to the broader community.”
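
To make that normalization concrete, here is a minimal worked example using the four 3D Unet/H100/Closed figures quoted above. The vendor names and throughput numbers come from the published results; the short Python script itself is ours, purely for illustration.

    # Normalize MLPerf Storage throughput by emulated accelerator count,
    # as the MLCommons spokesperson suggests. Figures are the 3D Unet /
    # H100 / Closed results quoted in this article.
    results = {
        "Huawei":      {"mib_per_sec": 695_480, "h100_gpus": 255},
        "Hammerspace": {"mib_per_sec": 5_789,   "h100_gpus": 2},
        "HPE":         {"mib_per_sec": 5_549,   "h100_gpus": 2},
        "Juicedata":   {"mib_per_sec": 5_536,   "h100_gpus": 2},
    }

    for vendor, r in sorted(results.items(),
                            key=lambda kv: kv[1]["mib_per_sec"] / kv[1]["h100_gpus"],
                            reverse=True):
        per_gpu = r["mib_per_sec"] / r["h100_gpus"]
        print(f"{vendor:12s} {per_gpu:8,.0f} MiB/sec per emulated H100")

On this basis, Huawei’s headline 695,480 MiB/sec works out at roughly 2,727 MiB/sec per emulated H100 – in the same ballpark as the two-GPU submissions rather than two orders of magnitude ahead of them.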

We did that, dividing the overall MiB/sec number by the number of GPU accelerators, and produced this chart:

MLPerf results

We immediately see that Hammerspace is most performant – 2,895 MiB/sec (six storage servers) and 2,883 MiB/sec (22 storage servers) – on this MiB/sec per GPU rating in the 3D Unet workload closed division with H100 GPUs. Lightbits Labs is next with 2,814 MiB/sec, with Nutanix next at 2,774 MiB/sec (four nodes) and 2,803 MiB/sec (seven nodes). Nutanix also scores the lowest result – 2,630 MiB/sec (32 nodes) – suggesting its effectiveness decreases as the node count increases.

Hammerspace claimed it was the only vendor to achieve HPC-level performance using standard enterprise storage networking and interfaces. [Download Hammerspace’s MLPerf benchmark test spec here.]

Huawei’s total capacity is given as 457,764TB (362,723TB usable) with Juicedata having unlimited capacity (unlimited usable!), HPE 171,549.62TB (112,596.9TB usable), and Hammerspace 38,654TB (37,339TB usable). There seems to be no valid relationship between total or usable capacity and the benchmark score.

We asked MLPerf about this and were told: “The relationship between total or usable capacity and the benchmark score is somewhat submission-specific. Some submitters may have ways to independently scale capacity and the storage throughput, while others may not.”

Volumez

The Volumez Open division test used the 3D Unet workload with 411 x H100 GPUs, scoring 1,079,091 MiB/sec – the highest score of all on this 3D Unet H100 benchmark, beating Huawei’s 695,480 MiB/sec, albeit in the Open division, where results are not directly comparable.

John Blumenthal, Volumez chief product officer, told us: “Our Open submission is essentially identical to the Closed submission, with two key differences. First, instead of using compressed NPZ files, we used NPY files. This approach reduces the use of the host memory bus, allowing us to run more GPUs per host, which helps lower costs. Second, the data loaded bypasses the Linux page cache, as it wasn’t designed for high-bandwidth storage workloads.”
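
For readers unfamiliar with the file-format distinction Blumenthal describes, the minimal sketch below (our illustration, not the Volumez or benchmark code) shows the practical difference: a compressed .npz archive has to be decompressed through host memory on every load, whereas an uncompressed .npy file can be read, or even memory-mapped, directly.

    # Illustrative only - not the Volumez submission code.
    import numpy as np

    sample = np.random.rand(1024, 1024).astype(np.float32)

    # Compressed archive: every load decompresses data through the host
    # CPU and memory bus before it reaches the training process.
    np.savez_compressed("sample.npz", data=sample)
    from_npz = np.load("sample.npz")["data"]

    # Uncompressed .npy: can be read straight from storage, or memory-mapped
    # so pages are fetched on demand with no decompression step.
    np.save("sample.npy", sample)
    from_npy = np.load("sample.npy", mmap_mode="r")

    # Bypassing the Linux page cache, as Volumez describes, would typically
    # mean opening the file with O_DIRECT and issuing aligned reads - omitted
    # here for brevity.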

Volumez submitted a second result, scoring 1,140,744 MiB/sec, with Blumenthal explaining: “In the second submission, we modified the use of barriers in the benchmark. We wanted to show that performing a barrier at the end of each epoch during large-scale training can prevent accurate measurement of storage system performance in such environments.”

YanRong Tech

YanRong Tech is a new vendor to us. A spokesperson, Qianru Yang, told us: “YanRong Tech is a China-based company focused on high-performance distributed file storage. Currently, we serve many leading AI model customers in China. Looking globally, we hope to connect with international peers and promote the advancement of high-performance storage technologies.”

We understand that the firm’s YRCloudFile is a high-performance, datacenter-level, distributed shared file system product built for software-defined environments, providing customers with a fast, highly scalable and resilient file system for their AI and high-performance workloads.

NetApp unveils all-flash SAN arrays, AI tech at INSIGHT 2024

NetApp announced all-flash SAN arrays, a generative AI vision, and AI-influenced updates across its product line at its NetApp INSIGHT 2024 event in Las Vegas.

It has begun the Nvidia certification process for its ONTAP AFF A90 storage array with Nvidia’s DGX SuperPOD AI infrastructure. This certification will complement and build upon NetApp ONTAP’s existing certification with the DGX BasePOD. NetApp’s E-Series is already SuperPOD-certified.

ONTAP now has a directly integrated AI data pipeline that makes unstructured data ready for AI automatically and iteratively, capturing incremental changes to the customer data set and performing policy-driven data classification and anonymization. It then generates highly compressible vector embeddings and stores them in a vector database integrated with the ONTAP data model, ready for high-scale, low-latency semantic searches and retrieval-augmented generation (RAG) inferencing.
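
Conceptually, the pipeline follows the familiar embed-then-retrieve pattern sketched below. This is a generic, self-contained illustration, not NetApp code: the toy embed() function and the in-memory dictionary standing in for a vector database are our own stand-ins.

    # Generic embed-then-retrieve sketch (illustrative stand-ins, not NetApp's
    # implementation or APIs).
    import math

    def embed(text: str, dims: int = 8) -> list:
        # Toy embedding: normalized character-frequency histogram.
        vec = [0.0] * dims
        for ch in text.lower():
            vec[ord(ch) % dims] += 1.0
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        return [v / norm for v in vec]

    def similarity(a, b):
        # Cosine similarity; vectors are already normalized.
        return sum(x * y for x, y in zip(a, b))

    # "Ingest": capture changed documents, embed them, store the vectors.
    vector_db = {doc_id: embed(text) for doc_id, text in {
        "doc1": "quarterly revenue report",
        "doc2": "ransomware recovery runbook",
    }.items()}

    # "Retrieve": embed the query and fetch the closest document for RAG.
    query = embed("how do we recover from ransomware?")
    best = max(vector_db, key=lambda d: similarity(query, vector_db[d]))
    print("Most relevant document:", best)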

NetApp separately announced today a new integration with Nvidia AI software that can leverage the global metadata namespace with ONTAP to power enterprise RAG for agentic AI. The namespace can unify data stores for the tens of thousands of ONTAP systems. The overall architecture brings together NetApp’s AIPod, ONTAP, the BlueXP unified control plane, and Nvidia’s NeMo Retriever and NIM microservices.

Harvinder Bhela, NetApp
Harvinder Bhela

Harvinder Bhela, NetApp chief product officer, stated: “Combining the NetApp data management engine and Nvidia AI software empowers AI applications to securely access and leverage vast amounts of data, paving the way for intelligent, agentic AI that tackles complex business challenges and fuels innovation.” 

NetApp customers will be able to discover, search, and curate data on-premises and in the public cloud, based on a set of criteria, while honoring existing policy-based governance controls. Once the data collection has been established through BlueXP, it can be dynamically connected to NeMo Retriever, where the dataset will be processed and vectorized to be accessible for enterprise GenAI deployments with appropriate access controls and privacy guardrails.

This, NetApp claims, “creates the foundation for a generative AI flywheel to power next-generation agentic AI applications that can autonomously and securely tap into data to complete a broad range of tasks to support customer service, business operations, financial services and more.”

Other AI news

NetApp is working to provide an integrated and centralized data platform to ingest, discover, and catalog data across all its native cloud services. It is also integrating its cloud services with data warehouses and developing data processing services to visualize, prepare, and transform data. The prepared datasets can then be securely shared and used with the cloud providers’ AI and machine learning services, including third-party offerings.

Krish Vitaldevara, SVP Platform at NetApp, said: “NetApp empowers organizations to harness the full potential of GenAI to drive innovation and create value across diverse industry applications. By providing secure, scalable, and high-performance intelligent data infrastructure that integrates with other industry-leading platforms, NetApp helps customers overcome barriers to implementing GenAI.”

NetApp’s AIPod with Lenovo ThinkSystem servers for Nvidia OVX converged infrastructure system, designed for enterprises aiming to harness GenAI and RAG capabilities, is now generally available.

FlexPod AI, the converged system built with Cisco UCS compute, Cisco networking, and NetApp storage, gains new AI features that simplify, automate, and secure AI applications when running RAG workloads.

Additionally, F5 and NetApp announced an expanded collaboration to accelerate and streamline enterprise AI capabilities using secure multi-cloud networking solutions from F5 and NetApp’s suite of data management solutions. This collaboration leverages F5 Distributed Cloud Services to streamline the use of large language models (LLMs) across hybrid cloud environments. F5 said that by integrating F5’s secure multi-cloud networking with NetApp’s data management, enterprises can implement RAG solutions efficiently and securely, enhancing the performance, security, and utility of their AI systems.

ONTAP ransomware protection

NetApp is announcing the general availability of its NetApp ONTAP Autonomous Ransomware Protection with AI (ARP/AI) solution, claiming 99 percent accuracy in detecting ransomware threats. Customers can use ARP/AI to monitor for abnormal workload activity and automatically snapshot data at the point in time of an attack, so they can respond to and recover from attacks faster. ARP/AI uses machine learning to identify threats, and NetApp will regularly release new models. Customers can update those models non-disruptively, independent of ONTAP updates, to defend against the latest ransomware variants.
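
ARP/AI’s detection model itself is not public, but the general detect-then-snapshot workflow it automates can be sketched as follows. This is a generic illustration with made-up thresholds, function names, and data – not NetApp code or its actual detection logic.

    # Generic detect-then-snapshot sketch (hypothetical; not NetApp's model).
    from statistics import mean, stdev

    def looks_like_ransomware(writes_per_min: float, history: list) -> bool:
        # Flag a write burst far above the historical baseline - a crude
        # stand-in for the behavioural features a real ML model would use.
        baseline, spread = mean(history), stdev(history)
        return writes_per_min > baseline + 4 * spread

    def take_snapshot(volume: str) -> None:
        print(f"Snapshot taken for {volume} at the point of the anomaly")

    history = [120, 130, 110, 125, 118, 122]   # normal writes per minute
    if looks_like_ransomware(9_500, history):
        take_snapshot("vol_finance")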

The BlueXP control plane now integrates with Splunk SIEM to simplify and accelerate threat response by informing stakeholders across an organization’s security operations. BlueXP ransomware protection uses AI-driven data classification capabilities to ensure the most sensitive data is protected at the highest levels. BlueXP also has new User and Entity Behavior Analytics (UEBA) integrations to identify malicious activity in user behavior in addition to the ARP/AI-provided file system signals.

Gagan Gulati, VP and GM for Data Services at NetApp, stated: “Data storage systems are the last line of defense against a cybersecurity incident and NetApp takes that as a responsibility to provide the most secure storage on the planet.”

ASA A-Series

There are three new models: ASA A70, A90 and A1K, the same names as the latest NetApp AFF products announced in May. At the time, we wrote that the A70 and A90 are like storage appliances, having integrated controller and drive shelves, whereas the A1K is modular, with separate 2RU controller and 2RU x 24-slot storage drive chassis. They have Sapphire Rapids gen 4 Xeon SP controller processors and are powered by the ONTAP OS providing unified file and block storage.

NetApp AFF A-Series hardware
AFF A-Series hardware

Sandeep Singh, SVP and GM of Enterprise Storage at NetApp, stated: “With the new NetApp ASA A-Series systems, our customers can modernize their operations to meet the demands of more powerful workloads on block storage without having to choose between operational simplicity and high-end capabilities.” 

NetApp’s ASA arrays are positioned as block-only systems for SAN workloads and have a symmetric, active-active controller architecture, but still run ONTAP. The new ASA models use the same hardware as the AFF A70, A90, and A1K. We envisage that the existing ASA A400 will be succeeded by the ASA A70, the ASA A800 by the ASA A90, and the ASA A900 by the latest ASA A1K.

NetApp ASA A-Series video showing the latest systems

NetApp’s John Shirley, VP of Product Management for Enterprise Storage, blogs that “the updated UI incorporates familiar concepts and terminology used by SAN administrators” and that “storage units – LUNs and NVMe namespaces – are consolidated on a single page for a cohesive view.” There is also “built-in full-stack AIOps for predictive and proactive insights, observability, and optimization.”

To support the new ASA A-Series, NetApp has enhanced its Data Infrastructure Insights service, formerly Cloud Insights, with updates so that customers can better manage visibility, optimization, and reliability for their data infrastructure to increase savings and performance.

NetApp has also added to its portfolio of hybrid flash storage arrays with the new mid-range FAS 70 and high-end FAS 90 systems, which offer “affordable, yet high-performing backup storage, enabling a secure cyber vault for recovery from ransomware attacks.” The company’s ONTAP Autonomous Ransomware Protection (ARP) and WORM are available at no additional cost with Cloud Volumes ONTAP (CVO).

There are new features generally available for Google Cloud NetApp Volumes and Azure NetApp Files. For Google Cloud NetApp Volumes, the Premium and Extreme service levels can now provision large volumes starting at 15 TiB that can be scaled up to 1 PiB dynamically in increments of 1 GiB. Google Cloud customers can achieve cost savings through auto tiering, which moves less frequently accessed data to lower-cost storage service levels.

Azure NetApp Files customers can achieve cost savings through cool access auto tiering, which moves less frequently accessed data to lower-cost storage services. Additionally, users can improve data availability with cross-zone replication, enhancing data protection by replicating volumes across Azure availability zones.

BlueXP, NetApp’s unified management facility, has been updated so that it streamlines the ONTAP upgrade process with a service that identifies potential candidates, validates compatibility, reports on recommendations and benefits, and executes selected updates through intuitive wizards.

The new ASA A-Series systems are available for quoting and will begin shipping shortly. Get a datasheet here. It had not been updated with the three new ASA A-Series systems when we looked, though.