SPONSORED FEATURE: MLCommons has just released the results of the MLPerf Storage Benchmark v1.0, which suggest that Huawei's OceanStor A800 all-flash array beats its competition by offering almost double the total throughput of its nearest rival.
The benchmark comprises three workloads: 3D-UNet, ResNet-50, and CosmoFlow. Compared with v0.5, v1.0 removed the BERT workload, added ResNet-50 and CosmoFlow, and expanded the simulated accelerator types to include the NVIDIA H100 and A100.
Huawei participated in the 3D-UNet workload test with an 8U dual-node OceanStor A800, which successfully met the data throughput requirements of 255 simulated NVIDIA H100s for training, delivering a stable bandwidth of 679 GB/s while maintaining over 90 percent accelerator utilization.
The objective of the MLPerf Storage Benchmark is to determine the maximum number of accelerators a storage system can support, and the maximum bandwidth it can provide, while ensuring optimal accelerator utilization (AU).
Workload | Bandwidth per H100 | Bandwidth per A100
3D-UNet | 2,727 MB/s | 1,385 MB/s
ResNet-50 | 176 MB/s | 90 MB/s
CosmoFlow | 539 MB/s | 343 MB/s
Source: MLCommons
The data above indicates that obtaining high benchmark bandwidth requires simulating more accelerators. Of the workloads, 3D-UNet on the H100 places the highest bandwidth demand on storage, which means that for the same number of simulated accelerators it exerts the greatest access pressure on the storage system.
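As a rough illustration (this is not part of the official benchmark tooling), the aggregate bandwidth a storage system must sustain can be estimated by multiplying the per-accelerator requirement from the MLCommons table above by the number of simulated accelerators:

```python
# Per-accelerator bandwidth requirements (MB/s), taken from the MLCommons table.
BANDWIDTH_MB_S = {
    ("3d-unet", "H100"): 2727,
    ("3d-unet", "A100"): 1385,
    ("resnet50", "H100"): 176,
    ("resnet50", "A100"): 90,
    ("cosmoflow", "H100"): 539,
    ("cosmoflow", "A100"): 343,
}

def aggregate_gb_s(workload: str, accelerator: str, count: int) -> float:
    """Estimated total bandwidth (GB/s) needed to keep `count` simulated accelerators fed."""
    return BANDWIDTH_MB_S[(workload, accelerator)] * count / 1000

# Huawei's submission: 255 simulated H100s on 3D-UNet.
print(round(aggregate_gb_s("3d-unet", "H100", 255), 1))  # ~695.4 GB/s
```

This back-of-the-envelope figure of roughly 695 GB/s is in the same range as the 679 GB/s Huawei reports; the exact relationship depends on how the per-accelerator requirement and the accelerator-utilization threshold are defined in the benchmark.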
It’s important to note that the accelerator numbers and the bandwidth of each computing node do not directly reflect storage performance. Rather, they indicate the server performance of the computing nodes. Only the total number of accelerators (simulated GPUs) and the overall bandwidth can accurately represent the storage system’s capabilities.
“The number of host nodes is not particularly useful for normalization,” said an MLCommons spokesperson. “The scale of a given submission is indicated by the number and type of emulated accelerators – i.e. ten emulated H100s is 10x the work of one emulated H100 from a storage standpoint.”
You can read more about how the MLPerf Storage v1.0 Benchmark results are compiled and presented here.
This result indicates that the OceanStor A800 is ahead of the curve in one important aspect: its total throughput registered 1.92x that of the second-place player, while the throughput per node and per rack unit were 2.88x and 1.44x that of the runner-up respectively (the full MLPerf Storage Benchmark Suite Results are available here).
Additionally, unlike traditional storage performance test tools, the MLPerf Storage Benchmark imposes strict latency requirements. As the number of accelerators is increased to put more access pressure on a high-bandwidth storage system, stable low latency is essential to prevent AU from dropping and to achieve the expected bandwidth. In the v1.0 results, the OceanStor A800 appears capable of providing stable, low latency for the training system even at high bandwidth, which helps to maintain high accelerator utilization.
GenAI advancing with storage development
In a global survey of AI usage conducted by independent analyst firm McKinsey, 65 percent of respondents revealed that they are now regularly using generative AI (GenAI), nearly double the number recorded by a previous McKinsey survey 10 months earlier.
While regular AI is designed to work with existing datasets, GenAI algorithms focus on the creation of new content that closely resembles authentic information. This ability is creating a range of possibilities across numerous verticals.
From software and finance to fashion and autonomous vehicles, most GenAI use cases depend on large language models (LLMs) to build the right kinds of applications and workloads. Running GenAI and LLMs together also puts a strain on underlying storage architectures – a slow update of the data fed into large AI models can lead to poor results, including so-called AI hallucinations, where a large AI model starts to fabricate inaccurate answers.
Many technology companies are striving to resolve these challenges with storage products and solutions. The v1.0 test result indicates that the OceanStor A800 can provide data services for AI training and maximize GPU/NPU computing utilization, while also supporting cluster networking and providing high-performance data services for large-scale training clusters.
Huawei launched the OceanStor A800 High-Performance AI Storage in 2023 specifically to boost the performance of large model training and help organizations accelerate the rollout of applications based on those large AI models. During the recent HUAWEI CONNECT 2024 event, Dr. Peter Zhou – Vice President of Huawei and President of Huawei Data Storage Product Line – said that this new long-term memory storage system can significantly boost large AI model training and inference capabilities, and help various industries step into what he called the “digital-intelligent era”.
Sponsored by Huawei.