Date: 2024-09-26

SPONSORED POST

MLCommons has just released the results of MLPerf Storage Benchmark V1.0, which covers three workloads: 3D-Unet, ResNet50, and CosmoFlow. Compared with V0.5, V1.0 drops the BERT workload and adds ResNet50 and CosmoFlow, along with NVIDIA H100 and A100 as new accelerator types.

3D-Unet workload requires the highest bandwidth

Huawei entered the 3D-Unet workload with an 8U, dual-node OceanStor A800, which met the data throughput requirement of 255 simulated NVIDIA H100s for training, delivering a stable 679 GB/s of bandwidth while keeping accelerator utilization above 90%.
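As a rough consistency check, the reported figures can be turned into an implied per-accelerator throughput and compared with the 2727 MB/s that this post lists for 3D-Unet on H100. The sketch below assumes decimal units (1 GB = 1000 MB) and treats 2727 MB/s as the rate at full utilization. The delivered-to-required ratio is only a crude stand-in for AU, not how the benchmark computes it, and the function names are illustrative.

```python
# Rough consistency check of the 3D-Unet / H100 result quoted in this post.
# Assumptions: decimal units (1 GB = 1000 MB), and the 2727 MB/s requirement
# is the per-accelerator rate at full utilization. The delivered/required ratio
# is only a crude proxy for accelerator utilization (AU).

REQUIRED_MB_S_PER_H100 = 2727  # 3D-Unet requirement per simulated H100, from the post


def implied_per_accelerator_mb_s(total_gb_s: float, accelerators: int) -> float:
    """Average throughput delivered to each simulated accelerator, in MB/s."""
    return total_gb_s * 1000 / accelerators


def delivered_fraction(total_gb_s: float, accelerators: int, required_mb_s: float) -> float:
    """Fraction of each accelerator's requirement that the storage actually delivered."""
    return implied_per_accelerator_mb_s(total_gb_s, accelerators) / required_mb_s


if __name__ == "__main__":
    per_acc = implied_per_accelerator_mb_s(679, 255)
    frac = delivered_fraction(679, 255, REQUIRED_MB_S_PER_H100)
    print(f"{per_acc:.0f} MB/s per accelerator, {frac:.1%} of the requirement")
```

Under these assumptions each simulated H100 received about 2663 MB/s, roughly 97.6% of its requirement, which is in line with the reported utilization above 90%.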

The MLPerf Storage Benchmark measures the maximum number of accelerators a storage system can support, and the maximum bandwidth it can deliver, while keeping accelerator utilization (AU) above the benchmark's required threshold.

Bandwidth requirement per accelerator:

Workload     H100         A100
3D-Unet      2727 MB/s    1385 MB/s
ResNet50     176 MB/s     90 MB/s
CosmoFlow    539 MB/s     343 MB/s

The data above shows that reaching a high benchmark bandwidth requires simulating more accelerators. Among all workload and accelerator combinations, 3D-Unet on H100 has the highest per-accelerator bandwidth requirement (2727 MB/s, about 15x that of ResNet50 on H100). For the same number of simulated accelerators, 3D-Unet H100 therefore places the greatest load on storage.
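The scaling described above can be sketched as a small table-driven calculation. It assumes storage demand grows linearly with the number of simulated accelerators, each reading at the per-accelerator rate from the table, and uses decimal units. The helper names are hypothetical. The sketch asks how many simulated accelerators of each kind would be needed to generate 679 GB/s of demand at full rate.

```python
import math

# Per-accelerator bandwidth requirements from the table in this post, in MB/s.
# Assumption: each simulated accelerator reads independently at this rate, so
# aggregate demand scales linearly with the accelerator count.
REQUIREMENTS_MB_S = {
    ("3D-Unet", "H100"): 2727,
    ("3D-Unet", "A100"): 1385,
    ("ResNet50", "H100"): 176,
    ("ResNet50", "A100"): 90,
    ("CosmoFlow", "H100"): 539,
    ("CosmoFlow", "A100"): 343,
}


def aggregate_demand_gb_s(workload: str, accelerator: str, count: int) -> float:
    """Total read bandwidth in GB/s (1 GB = 1000 MB) for `count` simulated accelerators."""
    return REQUIREMENTS_MB_S[(workload, accelerator)] * count / 1000


def accelerators_needed(workload: str, accelerator: str, target_gb_s: float) -> int:
    """Smallest number of simulated accelerators whose combined demand reaches target_gb_s."""
    per_accelerator_mb_s = REQUIREMENTS_MB_S[(workload, accelerator)]
    return math.ceil(target_gb_s * 1000 / per_accelerator_mb_s)


if __name__ == "__main__":
    target = 679  # GB/s, the 3D-Unet bandwidth reported for OceanStor A800
    for (workload, accelerator), rate in REQUIREMENTS_MB_S.items():
        n = accelerators_needed(workload, accelerator, target)
        print(f"{workload:<10} {accelerator}: {rate:>5} MB/s each -> {n:>5} accelerators for {target} GB/s")
```

At the listed rates, about 249 simulated H100s on 3D-Unet already demand 679 GB/s, whereas ResNet50 on H100 would need roughly 3,858. This is why 3D-Unet H100 stresses storage the most per simulated accelerator.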

Note that the number of accelerators and the bandwidth per compute (host) node do not directly reflect storage performance; they mainly reflect the capability of the host servers. Only the total number of simulated accelerators and the overall bandwidth accurately represent the storage system's capabilities. As an MLPerf spokesperson put it, "The number of host nodes is not particularly useful for normalization. The scale of a given submission is indicated by the number and type of emulated accelerators – ie ten emulated H100s is 10x the work of one emulated H100 from a storage standpoint."
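The normalization point can be illustrated with two hypothetical submissions (the names and host counts are invented for illustration) that emulate the same number of H100s on 3D-Unet but spread them across different numbers of host nodes. The model assumes storage load is simply the accelerator count multiplied by the per-accelerator requirement from the table.

```python
from dataclasses import dataclass

# 3D-Unet per-accelerator requirements from the table in this post, in MB/s.
REQUIRED_MB_S = {"H100": 2727, "A100": 1385}


@dataclass
class Submission:
    name: str          # hypothetical label, not a real MLPerf entry
    host_nodes: int    # client nodes that run the emulated accelerators
    accelerator: str   # emulated accelerator type, e.g. "H100"
    accelerators: int  # total number of emulated accelerators

    def storage_load_mb_s(self) -> int:
        """Load on storage: depends only on the number and type of emulated accelerators."""
        return self.accelerators * REQUIRED_MB_S[self.accelerator]

    def accelerators_per_host(self) -> float:
        """A host-side figure that changes with host count even when storage load does not."""
        return self.accelerators / self.host_nodes


if __name__ == "__main__":
    a = Submission("config-A", host_nodes=5, accelerator="H100", accelerators=100)
    b = Submission("config-B", host_nodes=20, accelerator="H100", accelerators=100)
    for s in (a, b):
        print(f"{s.name}: {s.storage_load_mb_s()} MB/s total, "
              f"{s.accelerators_per_host():.0f} accelerators per host")
```

Both configurations place the same 272,700 MB/s of demand on storage, even though their accelerators-per-host figures differ by 4x.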

This result puts OceanStor A800 ahead of the field: its total throughput is 1.92x that of the second-place submission, and its throughput per node and per rack unit are 2.88x and 1.44x those of the runner-up, respectively.
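Taken at face value, the three ratios also constrain one another. The sketch below assumes that "per node" counts storage nodes, that the A800 result is 679 GB/s from 2 nodes in 8U, and that the ratios are rounded. Under those assumptions it derives what the ratios imply about the runner-up's total throughput, node count, and rack footprint. These are implications of the ratios as quoted in this post, not figures taken from the published results.

```python
# What the quoted ratios imply about the runner-up, taken at face value.
# Assumptions: "per node" counts storage nodes; the A800 result is 679 GB/s
# from 2 nodes in 8U; the ratios are rounded, so results are approximate.

A800_TOTAL_GB_S = 679
A800_NODES = 2
A800_RACK_UNITS = 8

TOTAL_RATIO = 1.92     # A800 total / runner-up total
PER_NODE_RATIO = 2.88  # A800 per node / runner-up per node
PER_RU_RATIO = 1.44    # A800 per rack unit / runner-up per rack unit


def implied_runner_up() -> tuple[float, float, float]:
    """Return (total GB/s, node count, rack units) implied for the runner-up."""
    total_gb_s = A800_TOTAL_GB_S / TOTAL_RATIO
    # per_node_ratio = (T_a / n_a) / (T_b / n_b) = total_ratio * n_b / n_a
    nodes = A800_NODES * PER_NODE_RATIO / TOTAL_RATIO
    # Same reasoning with rack units in place of nodes.
    rack_units = A800_RACK_UNITS * PER_RU_RATIO / TOTAL_RATIO
    return total_gb_s, nodes, rack_units


if __name__ == "__main__":
    total, nodes, ru = implied_runner_up()
    print(f"~{total:.0f} GB/s from ~{nodes:.1f} nodes in ~{ru:.1f}U")
```

The ratios are mutually consistent with a runner-up of roughly 354 GB/s from three nodes in 6U. Because the ratios are rounded, treat these numbers as approximate.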

In addition, unlike traditional storage performance tools, MLPerf Storage Benchmark places strict demands on latency. For a high-bandwidth storage system, as the number of accelerators grows to put more pressure on storage, latency must stay low and stable; otherwise AU drops and the expected bandwidth cannot be reached. In V1.0, OceanStor A800 also demonstrated stable, low latency for the training system even at high bandwidth, which helps maintain high accelerator utilization.
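The reason latency stability matters for AU can be shown with a simplified model. It assumes each training step has a fixed emulated compute time and that the next batch is prefetched during that time, so only I/O time beyond the compute time stalls the accelerator. This illustrates the mechanism under those assumptions and is not the benchmark's actual AU accounting. The timings are hypothetical.

```python
# Simplified illustration of how I/O latency erodes accelerator utilization (AU).
# Assumptions (not the benchmark's exact accounting): every step has the same
# emulated compute time, and the next batch is prefetched during that compute,
# so only I/O time in excess of the compute time stalls the accelerator.


def accelerator_utilization(compute_s: float, io_latencies_s: list[float]) -> float:
    """AU = compute time / (compute time + stall time) over all steps."""
    total_compute = compute_s * len(io_latencies_s)
    total_stall = sum(max(0.0, io - compute_s) for io in io_latencies_s)
    return total_compute / (total_compute + total_stall)


if __name__ == "__main__":
    compute = 0.10                     # hypothetical seconds of compute per step
    steady = [0.10] * 100              # stable latency
    spiky = [0.08] * 90 + [0.28] * 10  # same 0.10 s mean, occasional spikes
    for label, trace in (("steady", steady), ("spiky", spiky)):
        mean = sum(trace) / len(trace)
        au = accelerator_utilization(compute, trace)
        print(f"{label}: mean I/O {mean:.2f} s, AU {au:.1%}")
```

Both latency traces have the same 0.10 s mean. The trace with occasional 0.28 s spikes drops AU to about 85%, below a 90% target, while the steady trace keeps AU at 100%.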

GenAI advancing with storage development

In a global survey on AI usage conducted by McKinsey, 65% of respondents said their organizations now regularly use GenAI, nearly double the share reported in McKinsey's previous survey ten months earlier.

While conventional AI is designed to work with existing datasets, GenAI algorithms create new content that closely resembles authentic information. This ability is opening up a range of possibilities across many industries.

From software and finance to fashion and autonomous vehicles, most GenAI use cases depend on large language models (LLMs) to build the right applications and workloads. As GenAI and LLMs are deployed together, they also put strain on the underlying storage architecture: if the data fed into large AI models is updated too slowly, results can suffer, including so-called AI hallucinations, where a large model starts to fabricate inaccurate answers.

Leading technology companies are working to address these challenges with new storage products and solutions. As the V1.0 results show, OceanStor A800 can provide top-tier data services for AI training and maximize GPU/NPU utilization, while supporting cluster networking to deliver high-performance data services to large-scale training clusters.

Huawei launched the OceanStor A800 High-Performance AI Storage in 2023 to support large-model training and accelerate the rollout of large AI model applications. At the recent HUAWEI CONNECT 2024, Dr. Peter Zhou, Vice President of Huawei and President of Huawei Data Storage Product Line, said that the new AI storage system significantly boosts large AI model training and inference capabilities based on a new paradigm of long-term memory storage, helping industries step effortlessly into the digital-intelligent era.

Through continued innovation and practice, Huawei has shown its commitment to redefining data storage, with a particular focus on the real-world challenges customers face in building AI-ready data infrastructure that delivers real value. As an industry-leading high-performance AI storage system, OceanStor A800 is designed to accelerate the training and inference of industry-specific models, paving the way to the AI era.