Huawei unveils full-stack AI data lake platform

Huawei has developed its own data lake software as part of a full-stack approach to providing AI data storage and pipelining for AI training and inference workloads.

This was presented at Huawei’s IDI (Innovative Data Infrastructure) Forum in Munich last month by Dr Peter Zhou, President of its Data Storage Product Line, and other Huawei speakers. The foundation is based on three Huawei storage systems: OceanStor A Series for fast-access data, OceanStor Pacific for nearline data, with dynamic tiering between the two, and OceanProtect for backing up data from the Pacific system.

The OceanProtect E8000 can store up to 16 PB of data with 255 TB/hour throughput. We’re told the backup function protects A-Series data by backing it up directly to OceanProtect systems. Huawei’s AI entropy analysis function is claimed to have a 99.99 percent detection rate for ransomware attacks on backups.

There are two software layers above this storage array foundation: a data management layer and an AI tool chain layer.

Huawei slide
Yellow items are Huawei’s own products

The data management layer is occupied by Data Management Engine (DME) functions. It provides a single, central management interface to Huawei storage, third-party storage, switches, and hosts using APIs. There are three Huawei DME software products here: DME Omni-Dataverse, DME IQ, and eDataInsight.

The system functions in this layer include Huawei’s data warehouse, a vector database, data catalog, data lineage, version management, and access control.

Omni-Dataverse is a global file system and data management framework designed to eliminate data silos across geographically dispersed datacenters by providing a single data namespace. This enables the presentation of a unified virtual data repository – a warehouse or data lake – spanning multiple separate silos, whether on-premises, in the public cloud, or in a hybrid on-prem/cloud setup.
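Huawei has not published the Omni-Dataverse internals, but the single-namespace idea can be sketched as a logical path tree whose prefixes map to physical silos. Everything below – class name, silo names, paths – is invented for illustration, not Huawei’s API:

```python
# Hypothetical sketch: one logical namespace resolving paths to backend
# silos by longest-prefix match. All names here are invented examples.

class GlobalNamespace:
    """Maps mount prefixes in a single logical tree to backend silos."""

    def __init__(self):
        self._mounts = {}  # prefix -> silo identifier

    def mount(self, prefix, silo):
        self._mounts[prefix] = silo

    def resolve(self, path):
        """Return (silo, relative_path) for the longest matching prefix."""
        best = max(
            (p for p in self._mounts if path.startswith(p)),
            key=len,
            default=None,
        )
        if best is None:
            raise FileNotFoundError(path)
        return self._mounts[best], path[len(best):].lstrip("/")

ns = GlobalNamespace()
ns.mount("/lake/hot", "oceanstor-a-munich")      # fast-access tier
ns.mount("/lake/cold", "oceanstor-pacific-berlin")  # nearline tier
ns.mount("/lake", "s3-public-cloud")             # hybrid-cloud fallback

print(ns.resolve("/lake/hot/train/batch-001.parquet"))
# ('oceanstor-a-munich', 'train/batch-001.parquet')
```

Applications would see only the `/lake` tree; which datacenter or cloud actually holds the bytes stays behind the resolver.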

It provides the means for data to be ingested, indexed, processed, curated, and made available for AI training and inference sessions and other data-using applications. Huawei says the system can rapidly index and/or retrieve exabyte-scale datasets, being capable of processing over 100 billion files in seconds using more than 15 search criteria.
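Retrieval over that many files with 15-plus criteria implies some form of metadata indexing. As a rough, hypothetical illustration of the technique (not Huawei’s implementation), a multi-criteria search can be an intersection of inverted-index posting sets over file attributes:

```python
# Hypothetical sketch of multi-criteria metadata retrieval: an inverted
# index over file attributes, with AND-queries done by intersecting the
# smallest posting sets first. Field names and values are invented.

from collections import defaultdict

class MetadataIndex:
    def __init__(self):
        self._postings = defaultdict(set)  # (field, value) -> file ids

    def add(self, file_id, **attrs):
        for field, value in attrs.items():
            self._postings[(field, value)].add(file_id)

    def search(self, **criteria):
        """Intersect posting sets, smallest first, to AND all criteria."""
        sets = sorted(
            (self._postings.get(c, set()) for c in criteria.items()),
            key=len,
        )
        if not sets:
            return set()
        result = set(sets[0])
        for s in sets[1:]:
            result &= s
        return result

idx = MetadataIndex()
idx.add("f1", fmt="parquet", site="munich", label="train")
idx.add("f2", fmt="parquet", site="berlin", label="train")
idx.add("f3", fmt="csv", site="munich", label="eval")

print(sorted(idx.search(fmt="parquet", label="train")))  # ['f1', 'f2']
```

Intersecting the smallest set first is the standard trick that keeps such queries fast even when individual criteria match huge numbers of files.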

Huawei slide

In general, Omni-Dataverse provides data retrieval, lineage, versioning, and access control capabilities. It includes dynamic tiering between the A-Series and Pacific arrays. The vector database aspect is in development and Huawei may partner with a third-party supplier for this functionality. 

As data ages and falls into disuse or expires, the software can verify its lineage and delete obsolete items.
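The lineage check matters because an expired dataset may still feed live derived datasets. A minimal sketch of that rule, assuming an invented lineage graph representation (this is not Huawei’s code):

```python
# Hypothetical sketch: lineage-aware cleanup. A dataset is deletable only
# if it has expired AND no live dataset still derives from it, directly
# or transitively. Dataset names are invented examples.

def expired_orphans(lineage, expired):
    """lineage maps dataset -> set of parent datasets it derives from.
    Return the expired datasets that no live dataset depends on."""
    live = set(lineage) - set(expired)
    needed = set()           # transitive parents of any live dataset
    stack = list(live)
    while stack:
        d = stack.pop()
        for parent in lineage.get(d, ()):
            if parent not in needed:
                needed.add(parent)
                stack.append(parent)
    return set(expired) - needed

lineage = {
    "raw-2023": set(),
    "raw-2024": set(),
    "scratch-tmp": set(),
    "features-v1": {"raw-2023"},
    "model-train-set": {"features-v1", "raw-2024"},
}
# raw-2023 has expired but still feeds a live chain, so it is kept;
# scratch-tmp has expired and nothing depends on it, so it can go.
print(sorted(expired_orphans(lineage, {"raw-2023", "scratch-tmp"})))
# ['scratch-tmp']
```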

DME IQ is a cloud operations and maintenance platform using big data analytics and AIOps to provide automated fault reporting and real-time problem tracking.

The top AI tool chain layer in Huawei’s AI stack makes datasets from the data lake available for processing by various hardware engines through pipelines and third-party toolsets such as LangChain. There are Huawei iData and ModelEngine components here, with iData handling data ingestion and enablement, alongside both model and application enablement.

Huawei says ModelEngine provides an end-to-end AI tool chain to deliver and schedule jobs across both dedicated and shared pools of CPUs, NPUs, and GPUs. Huawei supports the GPUDirect protocol for files and is working on support for the GPUDirect Object protocol.
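Huawei hasn’t detailed the scheduling policy, but the dedicated-versus-shared pool split typically means a tenant’s jobs land on its reserved accelerators first and spill into shared capacity otherwise. A simplified, hypothetical sketch of that placement logic (pool names and structure invented):

```python
# Hypothetical sketch of placing jobs on dedicated vs shared xPU pools:
# prefer a tenant's dedicated pool, fall back to shared capacity,
# otherwise signal that the job must queue. All names are invented.

def schedule(job, pools):
    """job: (tenant, device, count). pools: dicts with 'name', 'device',
    'free' units, and an optional 'tenant' key marking a dedicated pool."""
    tenant, device, count = job
    fits = [p for p in pools if p["device"] == device and p["free"] >= count]
    dedicated = [p for p in fits if p.get("tenant") == tenant]
    shared = [p for p in fits if "tenant" not in p]
    for pool in dedicated + shared:   # dedicated capacity wins ties
        pool["free"] -= count
        return pool["name"]
    return None  # no capacity: the job queues

pools = [
    {"name": "npu-dedicated-a", "device": "npu", "free": 8, "tenant": "team-a"},
    {"name": "gpu-shared", "device": "gpu", "free": 16},
    {"name": "npu-shared", "device": "npu", "free": 4},
]
print(schedule(("team-a", "npu", 6), pools))  # npu-dedicated-a
print(schedule(("team-b", "npu", 4), pools))  # npu-shared
```

Team A’s job takes its dedicated NPUs; team B has no reservation, so it lands on the shared NPU pool.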

Huawei slide

The DCS (Data Center Solution) is a datacenter virtualization concept integrating computing, storage, networking, and management. Its core virtualization platform is eSphere, which uses Omni-Dataverse to access the unified global namespace and so operate on the datasets within it.

The eContainer function is, roughly, the containerization equivalent, integrating with Huawei’s Kubernetes-based Cloud Container Engine (CCE).

The resource management side of Huawei’s data lake stack provides xPU scheduling, multi-tenancy, and an AI Copilot. There is an AI-powered DataMaster component here, integrating the AI Copilot, to enhance its O&M capabilities. The AI Copilot provides intelligent Q&A via natural language queries, automated guidance for troubleshooting and maintenance tasks, and proactive system health checks.

Comment

Huawei has devised the basics of a full AI stack to support AI training and inference workloads. The only other suppliers with a similar concept spanning storage hardware, a unified data lake, pipelining, and model use are Dell, with its AI Factory, and VAST Data. Other suppliers such as DDN, Hammerspace, Pure Storage, and WEKA are all building out their AI stacks, with HPE, IBM, and NetApp doing so as well.

We think that Dell, VAST, and the others will have an unimpeded run in the US market due to restrictions on Huawei, but elsewhere they might find Huawei a formidable competitor. It will be able to pitch upgrade-to-DME AI stack messages to all of its existing customers, prospect greenfield sites, and capitalize on any anti-Trump sentiment out there. There is also a cloud service provision aspect to this, but we’ll look at that another day.