
Samsung to offer SSDs on subscription

Samsung office

Samsung has published a blog discussing customers renting its petabyte-scale SSD (PBSSD) architecture products through a MinIO partnership.

It’s not that Samsung actually has a petabyte-capacity SSD – more that it wants to overcome customer resistance to capex purchases of large amounts of SSD capacity by changing to an opex subscription-style approach.

The blog notes that “applications today are demanding both high performance and capacity in excess of 10PB” due to analytics and AI/ML training workloads. Analytics workloads, it claims, can need from 1PB to 1EB of flash storage.

Samsung discusses a PBSSD server design – based on gen 4 AMD EPYC CPUs with 32-84 cores, a single socket, and 128 PCIe 5 lanes. It can support 16x E3.S 15.36 TB NVMe SSDs in a 1 RU chassis – 244TB in total. What’s more: “An upcoming system design will support 1PB in a 2 RU chassis.”

The system provides 232GB/sec sequential read bandwidth and 98GB/sec sequential write bandwidth, 9.5 million random read IOPS, and 5.1 million random write IOPS. Samsung proposes to offer this on a subscription basis: “Customers place an order for the desired storage capacity in increments of 244TB (one PBSSD unit) and subscription duration (1, 3, or 5 years) at a fixed monthly fee. No hidden or usage-based costs.”
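
As a back-of-the-envelope illustration of the ordering model described above – capacity in 244TB increments over a 1, 3, or 5-year term at a fixed monthly fee – here is a minimal sketch. The ordering function and the per-unit monthly fee are hypothetical placeholders, not Samsung's:

```python
import math

PBSSD_UNIT_TB = 244            # one PBSSD unit, per Samsung's blog
VALID_TERMS_YEARS = (1, 3, 5)  # subscription durations offered

def pbssd_order(target_capacity_tb: float, term_years: int,
                monthly_fee_per_unit: float = 1_000.0):  # hypothetical per-unit rate
    """Return (units required, total monthly fee) for a desired capacity and term."""
    if term_years not in VALID_TERMS_YEARS:
        raise ValueError(f"term must be one of {VALID_TERMS_YEARS}")
    units = math.ceil(target_capacity_tb / PBSSD_UNIT_TB)  # round up to whole 244TB units
    return units, units * monthly_fee_per_unit

# Example: roughly 1PB of capacity on a 3-year term needs five 244TB units
units, monthly_fee = pbssd_order(1_000, 3)
print(units, monthly_fee)
```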

Samsung notes that the MinIO partnership means MinIO object storage software holds the data on the SSDs. “MinIO uses familiar HTTP GET and PUT calls to initiate object transfer and management sequence. This cloud native approach is far more efficient than using half-century old file-oriented commands like copy, move, and delete. Moreover, MinIO bakes content, metadata, version, and security into the recipe for handling objects, simplifying the task of maintaining data integrity and recoverability across servers anywhere.”
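
To illustrate the HTTP GET/PUT style of access the quote describes, here is a minimal sketch using the MinIO Python SDK. The endpoint, credentials, bucket, and object names are placeholders, not details of Samsung's or MinIO's deployment:

```python
from minio import Minio

# Placeholder endpoint and credentials; substitute your own deployment's values.
client = Minio("minio.example.com:9000",
               access_key="ACCESS_KEY",
               secret_key="SECRET_KEY",
               secure=True)

bucket = "training-data"
if not client.bucket_exists(bucket):
    client.make_bucket(bucket)

# PUT: upload a local file as an object (an HTTP PUT under the hood)
client.fput_object(bucket, "dataset/shard-0001.parquet", "/tmp/shard-0001.parquet")

# GET: retrieve the object back to local storage (an HTTP GET under the hood)
client.fget_object(bucket, "dataset/shard-0001.parquet", "/tmp/restored.parquet")
```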

According to the blog, MinIO and Samsung are creating “a seamless storage fabric for the most demanding workloads. Sixteen directly connected PCIe 5 lanes, compute power up to 84 cores, and sophisticated power reduction techniques result in breathtaking performance with sustainability. MinIO operations, such as maintaining data integrity and rebuilding after a data loss event, take place with virtually no performance impact on active requests.”

However, Samsung is not locked into MinIO. “When ordering products, customers can take advantage of Samsung partnerships with software-defined storage providers to deliver pre-certified object and file solutions like MinIO, Weka, and vSAN. Customers are also free to engage separately with the SDS software provider of their choosing, or even to use open source software like Ceph.”

Samsung and its PBSSD concept can be viewed at the exhibition area of Nvidia’s GTC event at booth #528.

Storage news ticker – March 21

An AWS blog explains how you can use InfluxDB as a database engine in Amazon Timestream. This makes it easier to run near-real-time time-series applications using InfluxDB and open source APIs, including open source Telegraf agents that collect time-series observations.
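
For context, writing a time-series point to a Timestream-hosted InfluxDB instance looks the same as writing to any InfluxDB 2.x endpoint. A minimal sketch with the open source influxdb-client library follows; the URL, token, org, and bucket are placeholders:

```python
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

# Placeholder connection details for a Timestream for InfluxDB instance.
client = InfluxDBClient(url="https://your-influxdb-endpoint:8086",
                        token="API_TOKEN", org="my-org")
write_api = client.write_api(write_options=SYNCHRONOUS)

# One observation, of the kind a Telegraf agent would normally ship automatically.
point = (Point("cpu")
         .tag("host", "web-01")
         .field("usage_percent", 42.5))

write_api.write(bucket="metrics", record=point)
client.close()
```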

PCIe retimer and CXL device startup Astera Labs set the pricing of its IPO of 19.8 million shares at $36 apiece – aiming to raise around $712.8 million, with net proceeds to the company expected to be around $604.4 million. That would give it a valuation of about $5.5 billion. The shares are expected to begin trading on the Nasdaq Global Select Market under the ticker symbol “ALAB” on March 20.

CloudCasa by Catalogic announced the newest version of its CloudCasa software. It adds new migration and replication workflows to simplify Kubernetes use cases such as migrating on-premises clusters to cloud, migrating cloud to cloud, replicating production environments for test/dev and disaster recovery, and migrating locally between various Kubernetes configurations. It has new cloud integration and manageability features, extending and improving the backup, restore, and disaster recovery capabilities of CloudCasa, as well as its ability to centrally manage Velero installations in large and complex environments.

IBM’s latest Storage Scale System exceeds 340GB/sec sequential read bandwidth and 160GB/sec write bandwidth with current code and firmware from just 4U of rack space. There are 16x 200GbitE links in total. Client nodes use RoCE and each has a 200GbitE link. Check out a LinkedIn post here.

An update to IBM Power VS (Virtual Server) provides DR and Backup-as-a-Service as well as SAP installation, cloud security and compliance, and automated migration for IBM AIX and i. Details in a blog. (IBM Power Virtual Server is a family of configurable multi-tenant virtual IBM Power servers with access to IBM Cloud services.)

Cloud backup supplier Keepit announced the results of a study into the benefits available to organizations leveraging Keepit SaaS data protection. It found customers get up to 90 percent faster targeted restore times following a ransomware attack. The study also found Keepit limits the impact of a ransomware attack for the composite organization on which it based its calculations “by allowing it to recover and restore data quickly, preventing data loss and reducing downtime. This benefit is worth $819,100.”

MSP backup supplier N-able announced Cove’s “Master of Disaster Recovery” Class – a free online course aimed at supporting disaster preparedness for MSPs worldwide. The class leverages Cove’s Disaster Recovery as a Service (DRaaS) capabilities to impart essential knowledge and skills to the MSP community in the event of a data breach. The classes will be conducted via GoToWebinar every two weeks on an ongoing basis. Upcoming 2024 sessions include April 2. 

HCI and hybrid multi-cloud vendor Nutanix has released the findings of its sixth annual Enterprise Cloud Index (ECI) survey and research report, which measures global enterprise progress with cloud adoption. The use of hybrid multi-cloud models is forecast to double over the next one to three years. Get more details here.

Veeam-focused object storage backup target supplier Object First is having its Ootbi products sold by Pedab in 11 Northern European countries.

German open source vector database startup Qdrant says its Rust-based database is being used by Elon Musk’s open release of the Grok AI model. There have been more than five million downloads of the Qdrant database.

The Turing Trust, the technology recycling and education charity founded by the family of Alan Turing, has received a donation of 100 hard drives from Seagate Technology. The hard drives will be installed in computers destined for Malawi to increase access to digital skills by students in primary and secondary schools across the country.

Ceph-based array manufacturer SoftIron announced VM Squared – virtualization software positioned as an alternative to VMware’s vSphere product suite. It installs in 30 minutes or less, and a VM Squared migration tool moves an entire VMware vSphere estate quickly and easily. More details here.

Broadcom’s VMware unit has announced the GA of VMware Live Recovery, providing cyber and data resiliency for VMware Cloud Foundation environments. It combines enterprise-grade disaster recovery and purpose-built cyber recovery with a unified management experience across clouds. There are two underlying technologies: VMware Live Cyber Recovery (formerly VMware Cloud Disaster Recovery/Ransomware Recovery) and VMware Live Site Recovery (formerly VMware Site Recovery Manager). Check out a VMware web page to learn more.

Cloud storage provider Wasabi and Network-as-a-Service supplier Console Connect are collaborating to support seamless on-premises-to-cloud migration and cloud-to-cloud migration as well as instant multi-cloud provisioning.

CXL a no-go for AI training

Analysis. Compute Express Link (CXL) technology has been pushed into the backseat by the Nvidia GTC AI circus, yet Nvidia’s GPUs are costly and limited in supply. Increasing their memory capacity to enable them to do more work would seem a good idea, so why isn’t CXL – and its memory pooling – front and center in the Nvidia GPU scramble?

CXL connects pools of DRAM across the PCIe bus. There are three main variants:

  • CXL 1 provides memory expansion, letting x86 servers access memory on PCIe-linked accelerator devices such as smartNICs and DPUs;
  • CXL 2 provides memory pooling between several server hosts and a CXL-attached device with memory;
  • CXL 3 provides memory sharing between servers and CXL devices using CXL switches.

All three have a coherent caching mechanism, meaning that the local CPU caches (such as the level 1 data and instruction caches), which contain a subset of what is in memory, stay consistent with it. CXL 1 and 2 are based on the PCIe 5 bus, with CXL 3 using the PCIe 6 bus. Access to external memory via CXL adds latency.

All the memory that is accessed, shared or pooled in a CXL system needs a CXL access method, meaning PCIe 5 or 6 bus access and CXL protocol support. The DRAM in x86 servers and the GDDR memory in GPUs is suitable. However, high-bandwidth memory (HBM) integrated with GPUs via an interposer in Nvidia’s universe is not suitable, as it has no PCIe interface.

AMD’s Instinct MI300A accelerated processing unit (APU), with combined CPU and GPU cores and a shared memory space, has a CXL 2 interface. Nvidia’s Grace Hopper superchip, with an Armv9 Grace CPU and a Hopper GPU, has a split memory space.

SemiAnalysis analyst Dylan Patel writes about CXL and GPUs in his subscription newsletter. He observes that Nvidia’s H100 GPU chip supports the NVLink, C2C (to link to the Grace CPU), and PCIe interconnect formats. But the PCIe interconnect scope is limited. There are just 16 PCIe 5 lanes, which together run at 64GB/sec (roughly 4GB/sec per lane), whereas NVLink and C2C both run at 450GB/sec – seven times faster. Patel notes that the I/O part of Nvidia’s GPUs is space-limited and Nvidia prefers bandwidth over standard interconnects – such as PCIe.

Therefore the PCIe area on the chip will not grow in future, and may shrink.

There’s much more detail in Patel’s newsletter but it’s behind a subscription paywall.

The takeaway is that there will be no CXL access to an Nvidia GPU’s high-bandwidth memory. However, x86 CPUs don’t use NVLink and having extra memory in x86 servers means memory-bound jobs can run faster – even with added latency for external memory access.

It then follows that CXL will not feature in AI training workloads when they run on GPU systems fitted with HBM – but it may have a role in datacenter x86/GDDR-GPU servers running AI tuning and inference workloads. We also may not see CXL having a role in edge systems, as these will be simpler in design than datacenter systems and require less memory overall.

Storage news ticker – March 20

Data streamer Confluent announced that its Flink service is now available on AWS, Google Cloud, and Microsoft Azure. The cloud-native service enables reliable, serverless stream processing, which allows customers like Airbnb, Uber, and Netflix to gain timely insights from live data streams. This helps them to offer consumers real-time services – from personalized recommendations to dynamic pricing. It also announced the release of Tableflow, software that unites analytics and operations with data streaming in a single click to feed data warehouses, data lakes, and analytics engines.

NoSQL cloud database supplier Couchbase announced financial results for its fourth quarter and fiscal year ended January 31. Total revenue for the quarter was $50.1 million – an increase of 20 percent year-over-year, with a net loss of $21.4 million. Subscription revenue for the quarter was $48.1 million – an increase of 26 percent year-over-year. Total revenue for the year was $180 million – an increase of 16 percent year-over-year, with a loss of $80.2 million. Subscription revenue for the year was $171.6 million, an increase of 20 percent year-over-year. It expects next quarter’s revenue to be $48.1-$48.9 million.

Decentralized storage provider Cubbit announced general availability of its DS3 Composer, which is used to build virtual private S3-compatible public clouds. It uses DS3 technology – a multi-tenant, S3-compatible object store developed by Cubbit. DS3 Composer collects and aggregates new and recycled resources across the edge, on-prem, and public cloud – exposing them as an S3 object store repository via a SaaS control plane. Resources are organized in geo-distributed networks, and each network node can provide access and capacity via the S3 protocol. Cubbit has technology partnerships with HPE and with Equinix.

Mainframe supplier Fujitsu and Amazon Web Services (AWS) announced an expanded partnership to provide assessment, migration, and modernization of legacy mission critical applications running on on-premises mainframes and Unix servers onto the AWS Cloud.

NAKIVO’s agent-based backup support for Proxmox virtual machine data is now available.

Alessandra Yockelson

NetApp has hired a chief human resources officer – Alessandra Yockelson, with over 25 years of experience – to drive a new chapter of NetApp’s culture and growth globally. She comes from a similar role at Pure Storage, where she led organizational change efforts that resulted in global performance scaling. Before that she was chief talent officer at HPE. Normally we’d pass over such an appointment and its accompanying praise, but NetApp describes her as a technology industry titan. What does that mean? We think NetApp is in for enhanced diversity in its hiring practices as it seeks to broaden the talent pool for managers, directors, and execs.

Paul Hiemstra

Panasas has hired a new CFO, Paul Hiemstra, declaring his role will be instrumental in shaping strategic direction for the executive team and board, ultimately driving profitability, growth, and value creation. Ken Claffey became Panasas CEO in September last year and the biz has been searching for a CFO for three months. Elliot Carpenter was Panasas CFO from March 2016 to August 2020.

Hiemstra’s background includes 11 years at Cray, where he ascended from corporate treasurer to head of investor relations. In this latter position, he played a pivotal role in the 2019 integration of Cray and HPE, serving as CFO of HPE’s HPC and AI divisions.

Earlier this month, Pure Storage announced self-service capabilities across its Pure1 storage management platform and Evergreen portfolio. More than 30 percent of Pure’s customer base uses ActiveCluster. Upgrades to the Purity operating environment for ActiveCluster required time-consuming manual effort, as customers jumped from one upgrade to another in an effort to keep servers in sync. With Autonomous Upgrades, customers can simply invoke a Purity upgrade, leaving the heavy lifting to Pure Storage via its Pure1 platform, and freeing up the time previously spent managing the process themselves. 

Also, in the event of a ransomware anomaly, Pure1 now recommends snapshots from which customers can restore their affected data (both locally and remote), eliminating cumbersome, manual snapshot catalog reviews, which can take anywhere from hours to days. 

Computational storage supplier ScaleFlux says it will integrate the Arm Cortex-R82 processor into its forthcoming line of enterprise Solid State Drive (SSD) controllers, following the newly announced SFX 5016. It says this is a strategic move to leverage the processor’s high performance and energy efficiency. The Cortex-R82 is the highest-performance real-time processor from Arm and the first to implement the 64-bit Armv8-R AArch64 architecture, representing, ScaleFlux claims, a significant advancement in processing power and efficiency for enterprise storage systems. Perhaps ScaleFlux computational storage drives will do more than compression in the future.

SK hynix has begun volume production of HBM3E, the newest AI memory product with ultra-high performance, for supply to a customer (Nvidia) from late March.

Software RAID supplier Xinnor has a white paper titled “High Performance Storage Solution for PostgreSQL Database in Virtual Environment, Boosted by xiRAID Engine and KIOXIA PCIe5 Drives.” The paper presents detailed benchmarking results comparing the performance of different storage configurations, including a vHOST kernel target with mdadm and an SPDK vhost-blk target protected by Xinnor’s xiRAID Opus (Optimized Performance in User Space). It says xiRAID provides 30–40 percent more transactions per second than mdadm in select-only benchmarks, and outperforms mdadm by over 20 times in degraded mode, ensuring high performance even in the event of drive failures.

xiRAID also demonstrates superior performance in write operations, outpacing mdadm by six times in small block writes and five times in TPC-B-like script benchmarks. The scalability of xiRAID on virtual machines allows for the consolidation of servers, resulting in significant cost savings and simplified storage infrastructure. Download the white paper here.

Open source vector database supplier Zilliz says its Milvus 2.4 release has a groundbreaking GPU indexing feature powered by Nvidia’s CUDA-Accelerated Graph Index for Vector Retrieval (CAGRA). It claims GPU indexing represents a significant milestone in vector database technology, propelling Milvus 2.4 further ahead of traditional CPU-based indexes like HNSW. Leveraging the power of GPU acceleration, Milvus 2.4 delivers remarkable performance gains – particularly under large datasets, ensuring lightning-fast search responses and unparalleled efficiency for developers.

Milvus 2.4 also introduces support for GPU-based brute force search, further enhancing recall performance without sacrificing speed. Milvus 2.4 is now available for download. Explore the latest features and enhancements on the Milvus website.
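
As a rough illustration of what GPU indexing looks like from the client side, here is a minimal sketch using the pymilvus client to build a CAGRA-backed index. It assumes a running Milvus 2.4 deployment built with GPU support on a placeholder host; the parameter values are illustrative rather than Zilliz recommendations:

```python
from pymilvus import connections, Collection, CollectionSchema, FieldSchema, DataType

# Assumes a Milvus 2.4 instance with GPU support running locally (placeholder host/port).
connections.connect(host="localhost", port="19530")

schema = CollectionSchema([
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=768),
])
collection = Collection(name="docs", schema=schema)

# GPU_CAGRA is the CUDA-accelerated graph index type; parameter values are illustrative.
collection.create_index(
    field_name="embedding",
    index_params={
        "index_type": "GPU_CAGRA",
        "metric_type": "L2",
        "params": {"intermediate_graph_degree": 64, "graph_degree": 32},
    },
)
```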

HPE pushes package of next-gen AI products in partnership with Nvidia

HPE is the latest vendor to roll out a portfolio of GenAI training and inference products amid plans to use Nvidia GPUs and microservices software announced at Nvidia’s GTC 2024 event this week.

The edge-to-datacenter, hybrid on-premises, and public cloud approach is being brought to the GenAI table by HPE, along with its Cray-based supercomputing capabilities, enterprise ProLiant servers, Aruba networking,  Ezmeral data fabric, and GreenLake for file storage. Where competitor Dell is stronger in storage, HPE is stronger in supercomputing and edge networking. The two are roughly equal in server tech and HPE is arguably further advanced in cloud computing with its GreenLake scheme.

HPE CEO and president Antonio Neri said: “From training and tuning models on-premises, in a colocation facility or the public cloud, to inferencing at the edge, AI is a hybrid cloud workload. HPE and Nvidia have a long history of collaborative innovation, and we will continue to deliver co-designed AI software and hardware solutions that help our customers accelerate the development and deployment of GenAI from concept into production.”

HPE is announcing: 

  • Availability of GenAI supercomputing systems with Nvidia components
  • Availability of GenAI enterprise computing systems with Nvidia components
  • Enterprise retrieval-augmented generation (RAG) reference architecture using Nvidia’s NeMo microservices
  • Preview of Machine Learning Inference Software using Nvidia’s NIM microservices
  • Planned future products based on Nvidia’s Blackwell platform

The supercomputing system was announced at SC23 as a turnkey and pre-configured system featuring liquid-cooled Cray AMD EPYC-powered EX2500 supercomputers, with EX254n blades, each carrying eight Nvidia GH200 Grace Hopper chips. It includes Nvidia’s AI Enterprise software and the system can scale to thousands of GH200s. A solution brief doc has more information.

HPE Cray supercomputer

The turnkey version is a limited configuration supporting up to 168 GH200s and is meant for GenAI training. The obvious comparison is with Nvidia’s SuperPOD and the DGX GH200 version of that supports up to 256 GH200s. Dell has no equivalent to the Cray supercomputer in its compute arsenal and is a full-bodied SuperPOD supporter.

HPE’s enterprise GenAI system was previewed at HPE’s Discover Barcelona 2023 event in December and is focused on AI model tuning and inference. It’s rack-scale and pre-configured, being built around 16 x ProLiant DL380a x86 servers,  64 x Nvidia L40S GPUs, BlueField-3 DPUs, and Nvidia’s Spectrum-X Ethernet networking. The software includes HPE’s machine learning and analytics software, Nvidia AI Enterprise 5.0 software with new NIM microservices for optimized inference of GenAI models, NeMo Retriever microservices, and other data science and AI libraries.

It’s been sized to fine-tune a 70 billion-parameter Llama 2 model – a 16-node system will complete the fine-tuning in six minutes, we’re told.

The HPE Machine Learning Inference Software is in preview and enables customers to deploy machine learning models at scale. It will integrate with Nvidia’s NIM microservices to deliver foundation models using pre-built containers optimized for Nvidia’s environment.

The enterprise RAG reference architecture, geared to bringing a customer’s proprietary digital information into the GenAI fold, consists of Nvidia’s NeMo Retriever microservices, HPE’s Ezmeral data fabric software, and GreenLake for File Storage (Alletra MP storage hardware twinned with VAST Data software).

This ref architecture is available now and will, HPE says, offer businesses a blueprint to create customized chatbots, generators, or copilots.

Nvidia has announced its Blackwell architecture GPUs and HPE will support this with future products.

Comment

What’s absent from HPE’s GTC news is full-throated support for Nvidia’s BasePOD and SuperPOD GPU supercomputers. HPE’s storage does not support GPUDirect, apart from the OEM’d VAST Data software forming its GreenLake for File Storage service. Competitors Dell, DDN, Hitachi Vantara, NetApp, Pure Storage, VAST Data, and WEKA are all signed up members of the SuperPOD supporters’ club. Their collective Cray support is a lot weaker.

WEKA intros turnkey appliance to plug into SuperPOD

Parallel file system supplier WEKA has devised an on-premises WEKApod storage appliance to plug into Nvidia’s SuperPOD.

Update. SuperPOD details and image updated. 19 March 2024.

SuperPOD is Nvidia’s rack-scale architecture designed for deploying its GPUs. It houses 32 to 256 DGX H100 AI-focused GPU servers connected over an InfiniBand NDR network. Each DGX H100 is an 8RU chassis containing 8x H100 GPUs – meaning up to 2,048 GPUs in a SuperPOD – 640GB of GPU memory, dual Xeon 8480C CPUs, and BlueField IO-accelerating DPUs.

Nilesh Patel

WEKA chief product officer Nilesh Patel said: “WEKA is thrilled to achieve Nvidia DGX SuperPOD certification and deliver a powerful new data platform option for enterprise AI customers … Using the WEKApod Data Platform Appliance with DGX SuperPOD delivers the quantum leap in the speed, scale, simplicity, and sustainability needed for enterprises to support future-ready AI projects quickly, efficiently, and successfully.”

WEKApod is a turnkey hardware and software appliance, purpose-built as a high-performance data store for the DGX SuperPOD. Each appliance consists of pre-configured storage nodes and software for simplified and faster deployment. A 1 PB WEKApod configuration starts with eight storage nodes and scales up to hundreds. It uses Nvidia’s ConnectX-7 400Gbps InfiniBand network cards and integrates with Nvidia’s Base Command Manager for observability and monitoring.

Nvidia SuperPOD.

WEKA has supplied storage already for BasePOD use and its WEKApod is certified for the SuperPOD system, delivering up to 18.3 million IOPS, 720 GBps sequential read bandwidth, and 186 GBps write bandwidth from eight nodes. That’s 90 GBps/node when reading and 23.3 GBps/node when writing.

WEKApod

WEKA claims its Data Platform’s AI-native architecture delivers the world’s fastest AI storage, based on SPECstorage Solution 2020 benchmark scores. The WEKApod’s performance numbers for an 8-node 1PB cluster are 720 GBps read bandwidth, 186 GBps write bandwidth, and 18.3 million IOPS. Per 1RU node that means 90 GBps read bandwidth, 23.3 GBps write bandwidth, and 2.3 million IOPS. A WEKA slide details the hardware spec.

A WEKA blog declares: “Over the past three decades, computing power has experienced an astonishing surge, and more recent advancements in compute and networking have democratized access to AI technologies, empowering researchers and practitioners to tackle increasingly complex problems and drive innovation across diverse domains.”

But with storage, WEKA says: “Antiquated legacy storage systems, such as those relying on the 30-year-old NFS protocol, present significant challenges for modern AI development. These systems struggle to fully utilize the bandwidth of modern networks, limiting the speed at which data can be transferred and processed.” 

They also struggle with large numbers of small files. WEKA says it fixes these problems. In fact, it claims: “WEKA brings the storage leg of the triangle up to par with the others.”

Find out more about the SuperPOD-certified WEKApod here.

Comment

Dell, DDN, Hitachi Vantara, HPE, NetApp, Pure Storage, and VAST Data have all made announcements relating to their products’ support for, and integration with, Nvidia’s GPU servers at GTC. Nvidia GPU hardware and software support is now table stakes for storage suppliers wanting to play in the GenAI space, particularly for GenAI training workloads. Any supplier without such support faces being knocked out of a storage bid for AI workloads running on Nvidia gear.

Nvidia GTC storage news roundup

Cohesity announced a collaboration with NVIDIA for its Gaia Gen AI software to include Nvidia’s NIM microservices. Gaia will be integrated with NVIDIA’s AI Enterprise software. Nvidia has also invested in Cohesity. Gaia customers now get:

  • Domain-specific and performant generative AI models based on the customers’ Cohesity-managed data, using NVIDIA NIM. Customers can fine-tune their large language models with their data and adapt them to fit their organisation.
  • A tool for customers to query their data via a generative AI assistant to gain insights from their data, such as deployment and configuration information, security, and more.
  • Cohesity’s secondary data to build Gen AI apps that provide insights based on a customer’s own data.
  • Gen AI intelligence applied to data backups and archives with NVIDIA NIM.

Distributor TD SYNNEX’s Hyve hardware-building business unit has announced an array of products tailored for the AI lifecycle at NVIDIA GTC. They include:

  • NVIDIA MGX / OCP DC-MHS 2U Optimized Generational Platforms for Compute, Storage and Ruggedized Edge 
  • OAM-UBB 4U Scalable HPC-AI Reference Platform                                     
  • Next Generation AI-Optimized ORv3 Liquid-Cooled Enclosure                                                                                   
  • Various 1U Liquid-Cooled Inference Platforms

Real-time streaming analytics data warehouse supplier Kinetica, which has integrated ChatGPT into its offering, is adding a real-time RAG capability based on NVIDIA’s NeMo Retriever microservices and low-latency vector search using NVIDIA RAPIDS RAFT technology. Kinetica has built native database objects that allow users to define the semantic context for enterprise data. An LLM can use these objects to grasp the referential context it needs to interact with a database in a context-aware manner. All the features in Kinetica’s Gen AI offering are exposed to developers via a relational SQL API and LangChain plugins.

Kinetica claims its real-time generative AI offering removes the requirement for reindexing vectors before they are available for query. Additionally, we’re told it can ingest vector embeddings 5X faster than the previous unnamed market leader, based on the VectorDBBench benchmark.

Lenovo has announced that hybrid AI systems, built in collaboration with NVIDIA and already optimized to run NVIDIA AI Enterprise software for production AI, will now provide developers access to the just-announced NVIDIA microservices, including NIM and NeMo Retriever. Lenovo has expanded the ThinkSystem AI portfolio, featuring two new 8-way NVIDIA GPU systems that are purpose-built to deliver massive computational capabilities.

They are designed for Gen AI, natural language processing (NLP), and large language model (LLM) development, with support for the HGX AI supercomputing platform, including H100 and H200 Tensor Core GPUs and the Grace Blackwell GB200 Superchip, as well as Quantum-X800 InfiniBand and Spectrum-X800 Ethernet networking platforms. The new Lenovo ThinkSystem SR780a V3 is a 5U system that uses Lenovo Neptune liquid cooling.

Lenovo claims it’s the leading provider of workstation-to-cloud support for designing, engineering and powering OVX systems and the Omniverse development platform. It’s partnering with NVIDIA to build accelerated models faster using MGX modular reference designs. 

NVIDIA launched enterprise-grade generative AI microservices that businesses can use to create and deploy custom applications on their own platforms. The catalog of cloud-native microservices, built on top of NVIDIA’s CUDA software, includes NIM microservices for optimized inference on more than two dozen popular AI models from NVIDIA and its partner ecosystem.

NIM microservices provide pre-built containers powered by NVIDIA inference software — including Triton Inference Server and TensorRT-LLM — which, we’re told, enable developers to reduce deployment times. They provide industry-standard APIs for domains such as language, speech and drug discovery to enable developers to build AI applications using their proprietary data hosted in their own infrastructure. Customers will be able to access NIM microservices from Amazon SageMaker, Google Kubernetes Engine and Microsoft Azure AI, and integrate with popular AI frameworks like Deepset, LangChain and LlamaIndex.
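
Those industry-standard APIs include an OpenAI-compatible chat completions endpoint for the language NIMs. Here is a minimal sketch of calling a locally deployed NIM container; the port, model name, and API key are placeholders, so check the container's own documentation for the actual values:

```python
from openai import OpenAI

# Placeholder: a NIM container running locally and exposing its OpenAI-compatible API.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed-for-local-nim")

response = client.chat.completions.create(
    model="meta/llama2-70b",  # illustrative model name; list models via client.models.list()
    messages=[{"role": "user",
               "content": "Summarize retrieval-augmented generation in one sentence."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```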

Box, Cloudera, Cohesity, Datastax, Dropbox, NetApp and Snowflake are working with NVIDIA microservices to help customers optimize their RAG pipelines and integrate their proprietary data into generative AI applications. 

NVIDIA accelerated software development kits, libraries and tools can now be accessed as CUDA-X microservices for retrieval-augmented generation (RAG), guardrails, data processing, HPC and more. NVIDIA separately announced over two dozen healthcare NIM and CUDA-X microservices.

NVIDIA has introduced a storage partner validation program for its OVX computing systems. OVX servers have L40S GPUs and include AI Enterprise software with Quantum-2 InfiniBand or Spectrum-X Ethernet networking, as well as BlueField-3 DPUs. They’re optimized for generative AI workloads, including training for smaller LLMs (for example, Llama 2 7B or 70B), fine-tuning existing models, and inference with high throughput and low latency.

NVIDIA-Certified OVX servers are available and shipping from GIGABYTE, HPE and Lenovo.

The validation program provides a standardized process for partners to validate their storage appliances. They can use the same framework and testing that’s needed to validate storage for the DGX BasePOD reference architecture. Each test is run multiple times to verify the results and gather the required data, which is then audited by NVIDIA engineering teams to determine whether the storage system has passed. The first OVX validated storage partners are DDN, Dell (PowerScale), NetApp, Pure Storage and WEKA.

SSD controller array startup Pliops has a collaboration with composability supplier Liqid to create an accelerated vector database offering to improve performance, capacity and resource requirements. Pliops says its XDP-AccelKV addresses GPU performance and scale by both breaking the GPU memory wall and eliminating the CPU as a coordination bottleneck for storage IO. It extends HBM memory with fast storage to enable terabyte-scale AI applications to run on a single GPU. XDP-AccelKV is part of the XDP Data Services platform, which runs on the Pliops Extreme Data Processor (XDP).

Pliops has worked with Liqid to create an accelerated vector database product based on Dell servers with Liqid’s LQD4500 32TB PCIe 4.0 NVMe SSDs, known as the Honey Badger, and managed by Pliops XDP. Sumit Puri, president, chief strategy officer, and co-founder at Liqid, said: “We are excited to collaborate with Pliops and leverage their XDP-AccelKV acceleration solution to address critical challenges faced by users running RAG applications with vector DBs.”

Snowflake and NVIDIA announced an expanded agreement around adding NVIDIA AI products to Snowflake’s data warehouse. NVIDIA accelerated compute powers several of Snowflake’s AI products:

  • Snowpark Container Services
  • Snowflake Cortex LLM Functions (public preview)
  • Snowflake Copilot (private preview)
  • Document AI (private preview)

We have updated our table of suppliers’ GPUDirect file access performance in the light of the GTC announcements.

A chart shows the suppliers’ relative positions better.

There is a divide opening up between DDN, Huawei, IBM, VAST Data and WEKA on the one hand and slower performers such as Dell (PowerScale), NetApp and Pure Storage on the other. Recharting to show performance density per rack unit shows the split more clearly.

We note that no analysis or consultancy business such as ESG, Forrester, Futurum, Gartner, or IDC has published research looking at suppliers’ GPUDirect performance. That means there is no authoritative external validation of these results.

Pure Storage unveils anti-hallucination AI tech at Nvidia’s GTC

Pure Storage is announcing anti-hallucinatory Retrieval-Augmented Generation (RAG) reference architectures at Nvidia’s GTC event to add an organization’s own data to GenAI chatbots and make their answers more accurate.

GTC, Nvidia’s GPU Technology Conference, is being held this week in San Jose, CA, with an AI focus.

Rob Lee, Pure CTO, said: “Pure Storage recognized the rising demand for AI early on, delivering an efficient, reliable, and high-performance platform for the most advanced AI deployments. Embracing our longstanding collaboration with Nvidia, the latest validated AI reference architectures and generative AI proofs of concept emerge.”

There are four aspects to Pure’s news:

  • Retrieval-Augmented Generation (RAG) Pipeline for AI Inference: This uses Nvidia’s NeMo Retriever microservices and GPUs and Pure’s all-flash storage for enterprises using their own internal data for faster AI training. 
  • Vertical RAG Development: First, Pure Storage has created a financial services RAG system in conjunction with Nvidia to summarize and query massive datasets with higher accuracy than off-the-shelf LLMs. Financial services can use AI to create instant summaries and analysis from various financial documents and other sources. Additional RAGs for healthcare and public sector are to be released. 
  • Certified Nvidia OVX Server Storage Reference Architecture: Pure Storage has achieved OVX Server Storage validation, providing enterprise customers and channel partners with storage reference architectures, validated against benchmarks to provide a strong infrastructure foundation for cost and performance-optimized AI hardware and software solutions. This validation complements Pure Storage’s certification for Nvidia’s DGX BasePOD announced last year. 
  • Expanded Investment in AI Partner Ecosystem: Pure Storage has new partnerships with ISVs like Run.AI and Weights & Biases. Run.AI optimizes GPU utilization through advanced orchestration and scheduling, while the Weights & Biases AI Development platform enables ML teams to build, evaluate, and govern the model development life cycle.

In a supporting quote supplied by Pure, ESG principal analyst Mike Leone said: ”Rather than investing valuable time and resources in building an AI architecture from scratch, Pure’s proven frameworks not only mitigate the risk of expensive project delays but also guarantee a high return on investment for AI team expenditures like GPUs.”

Read more in a Pure blog – Optimize GenAI Applications with Retrieval-augmented Generation from Pure Storage and Nvidia – which should be live from 10pm GMT/3pm PT, March 18.

Bootnote

Nvidia’s OVX systems are designed to build virtual worlds using 3D software applications and to operate immersive digital twin simulations in Nvidia’s Omniverse Enterprise environment. They are separate from Nvidia’s AI-focused DGX GPU servers.

Hitachi Vantara launches industry-specific AI systems with Nvidia

Hitachi iQ is a new set of industry-specific AI systems using Nvidia DGX and HGX GPUs and Hitachi Vantara storage.

Hitachi Vantara is reshaping itself with a renewed focus on storage and data infrastructure under CEO Sheila Rohra and a refreshed exec team. This includes ex-NetApp SVP of Cloud Engineering Octavian Tanase, who is now Hitachi Vantara’s chief product officer. Hitachi iQ is being announced at Nvidia’s GTC event in San Jose, CA, this week.

Rohra said: “As part of the Hitachi family, we have strong expertise in multiple industrial market segments including energy, transportation, and manufacturing, which makes it possible to accelerate digital transformation powered by AI.”

Hitachi iQ will have various consumption schemes and its first AI framework offering will arrive between April and June. Features include:

  • Nvidia BasePOD certification
  • High-end Nvidia HGX GPU server offering
  • Mid-range PCIe architecture system with Nvidia H100 and L40S GPUs
  • Nvidia AI Enterprise software support
  • Gen 5 Hitachi Content Software for File (HCFS) release with accelerated storage nodes for AI workloads

HCFS is a rebranded version of WekaIO’s WekaFS file system software (the Weka Data Platform). It is a POSIX-compliant file system integrated with Hitachi Vantara’s object storage. Data in HCFS can be accessed through NFS, SMB, and POSIX, stored on PCIe gen 4 NVMe SSDs, or via S3-accessed objects in a secondary storage tier. HCFS supports Nvidia’s GPUDirect storage protocol, which bypasses the storage server’s CPU and memory and sends data from the NVMe drives direct to the GPU’s memory using RDMA. It can run on-premises or in the public cloud.

Charlie Boyle, DGX platform VP at Nvidia, said: “Enterprises across industries are building AI factories to turn their data into intelligence. With solutions built with Nvidia DGX infrastructure and software, Hitachi Vantara customers will be able to create AI Centers of Excellence to turbocharge their generative AI strategies.”

Dell is also using the AI factory term in connection with its Nvidia-based hardware releases.

Nvidia powers Dataloop and NetApp AI solutions with NIM and NeMo

NIM and NeMo Retriever microservices from Nvidia are being integrated by Dataloop and NetApp into their AI products.

These microservices enable retrieval-augmented generation (RAG), in which generally trained GenAI large language models gain access to proprietary and private user data such as spreadsheets, presentations, Word docs, emails, POs, whitepapers, and so on. This allows the models to augment their training data with real-world user information, which we’re told leads to more accurate and responsive conversational interactions.
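
To make the retrieve-then-generate flow concrete, here is a minimal, self-contained sketch of the RAG pattern. The embed() and generate() functions are stand-ins for a real embedding model (such as a NeMo Retriever embedding service) and a real LLM, and the similarity search is a brute-force cosine comparison rather than either vendor's implementation:

```python
import hashlib
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in for a real embedding model; returns a deterministic unit vector."""
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], "little")
    v = np.random.default_rng(seed).normal(size=384)
    return v / np.linalg.norm(v)

def generate(prompt: str) -> str:
    """Stand-in for a real LLM call; a production system would invoke the model here."""
    return f"[LLM answer grounded in:\n{prompt[:200]}...]"

# 1. Index proprietary documents (spreadsheets, POs, whitepapers, etc.) as vectors.
documents = ["Q3 purchase order totals by region ...",
             "Whitepaper: warranty policy for enterprise arrays ...",
             "Meeting notes: 2024 pricing changes ..."]
doc_vectors = np.stack([embed(d) for d in documents])

# 2. Retrieve: embed the question and pick the most similar documents.
question = "What changed in 2024 pricing?"
scores = doc_vectors @ embed(question)       # cosine similarity (vectors are unit length)
top_docs = [documents[i] for i in np.argsort(scores)[::-1][:2]]

# 3. Augment and generate: prepend the retrieved context to the prompt.
prompt = "Context:\n" + "\n".join(top_docs) + f"\n\nQuestion: {question}"
print(generate(prompt))
```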

Dataloop

Israeli startup Dataloop AI, which supplies end-to-end AI project life cycle software, will work with Nvidia to integrate the Nvidia AI Enterprise software platform into Dataloop software. AI Enterprise includes Nvidia’s NIM and NeMo Retriever microservices. Dataloop reckons generative AI could automate up to 70 percent of all business activities, but sustaining AI projects can be difficult, with many companies not making it out of the pilot stages. Providing Nvidia’s technology directly on the Dataloop platform means customers can use a unified, orchestrated platform.

The Dataloop Nvidia integration features:

  • Dataloop’s marketplace to provide access to the Nvidia API catalog, offering performance-optimized, foundation models and API endpoints that can be deployed with Nvidia NIM microservices
  • Ready-made AI systems powered by Nvidia AI, such as pipelines for LLM applications, including summarization and chatbots
  • Nvidia NeMo text embedding-based data curation and search capabilities enabled within the indexed datasets managed on Dataloop

NetApp

NetApp is supporting Nvidia’s NeMo Retriever GenAI microservices, which connect large language models (LLMs) to proprietary data for RAG to generate more accurate responses. 

It says customers will now be able to “talk to their data,” accessing proprietary business data on NetApp storage without having to compromise security or privacy. NetApp and Nvidia have co-developed a NeMo RAG capability that can access data stored on NetApp ONTAP systems, both on-premises and in the top three public clouds. 

The data can include spreadsheets, documents, presentations, technical drawings, images, meeting recordings, or even data from their ERP or CRM systems through simple prompts. NetApp says there are more than 500 joint NetApp-Nvidia customers. It has systems built on Nvidia’s DGX BasePOD, certified for DGX SuperPOD, and the new OVX systems storage validation program.

NetApp’s “Talk to your Data” capabilities will be demonstrated at GTC, booth #1616 in the AI Center of Excellence.

Dell building Nvidia-powered AI Factory for business analytics

Dell is building an Nvidia-based AI Factory so customers can use Dell servers, storage, and networking to serve proprietary data to RAG-integrated AI inferencing systems for better analyses.

Update. PowerScale performance numbers and product availabilities added. 19 March 2024.

It claims that by 2025, two-thirds of businesses will have a combination of GenAI and Retrieval-Augmented Generation (RAG) to help with domain-specific self-service knowledge discovery. This will, according to an IDC AI report, improve decision efficacy by 50 percent. Dell says it has an AI portfolio that will help customers adopt GenAI better than any other supplier.

Wedbush financial analyst Matt Bryson told subscribers: “The ripple impact that starts with the golden Nvidia chips is now a tidal wave of spending hitting the rest of the tech world for the coming years. We estimate a $1 trillion-plus of AI spending will take place over the next decade as the enterprise and consumer use cases proliferate globally in this Fourth Industrial Revolution.”

Dell wants its AI portfolio to ride this tidal wave, and it has six components:

  • AI Factory
  • GenAI with Nvidia and RAG
  • GenAI model training
  • PowerEdge servers with Nvidia GPUs
  • PowerScale storage for Nvidia’s SuperPOD systems
  • Dell data lakehouse

These are buttressed by Dell’s professional services organization and supplied through its APEX subscription-based business model.


The AI Factory is the end-to-end system that includes the other items on the list. It is slated to integrate Dell’s compute, storage, client device, and software capabilities with Nvidia’s AI infrastructure and software suite, underpinned by a high-speed networking fabric. 

The GenAI infrastructure is composed of PowerEdge servers with Nvidia GPUs, PowerScale scale-out filers, and PowerSwitch networking with added RAG, which uses Nvidia microservices such as NeMo. Dell says new data can be rapidly ingested to update the semantic search database. However, as we have noted, ingesting all of an organization’s proprietary data into a semantic search database is a non-trivial exercise.

Dell has reference architectures combining its server, storage, and networking gear with Nvidia GPU servers and software. Its x86 servers include the R760xa, XE8640, and XE9680. The Nvidia GPUs include the L40S and H100, but newer models will be supported too.

The XE9680 uses Nvidia’s H200 GPU plus the newly announced and more powerful air-cooled B100 and the liquid-cooled HGX B200. The HGX H200 has twice the performance and half the TCO of the HGX H100 GPU.

Dell will also support Nvidia’s GB200 superchip with its up to 20x better processing performance and 40x lower TCO for at-scale inferencing, compared to the eight-way HGX H100 GPU. The GB200 is positioned as a real-time inferencing chip, capable of running multi-trillion parameter models.

All this kit can be used, Dell says, both for AI model training and inference. It can use BlueField-3 Ethernet or InfiniBand networking. The PowerEdge 760xa uses Nvidia’s Omniverse OVX 3.0 platform while the XE9680 has Nvidia’s Spectrum-X Ethernet AI fabric.

Dell’s PowerScale storage now has Nvidia DGX SuperPOD validation, the first Ethernet-connected storage to get that qualification, with PowerScale claimed to exceed the SuperPOD benchmark requirements. Dell has not disclosed specific benchmark numbers.

A Dell PowerScale Solutions Overview doc says PowerScale now (the PCIe 5-based F710 with OneFS 9.7) has 2x faster streaming write and read performance than the previous generation (the PCIe 3-based F600 with OneFS 9.4), plus up to 90 percent higher performance/watt and up to a 2.5x improvement in high-concurrency workloads.

In GPUDirect terms, the F600 did 1.26 GBps/node sequential writes and 5.26 GBps/node sequential reads. A 2x improvement makes that 2.52 GBps/node write bandwidth and 10.52 GBps/node read bandwidth for the F710. Other systems go much faster. DDN’s AI400X2 Turbo does 37.5 GBps/node when writing and 60 GBps/node when reading.

The data lakehouse element of this soup-to-nuts AI portfolio uses Starburst Presto software, Kubernetes-organized lakehouse system software, and scale-out object storage based on Dell’s ECS, ObjectScale or PowerScale storage products.

Availability

  • Dell AI Factory with NVIDIA is available globally through traditional channels and Dell APEX now.
  • Dell PowerEdge XE9680 servers with NVIDIA B200 Tensor Core GPUs, NVIDIA B100 Tensor Core GPUs and NVIDIA H200 Tensor Core GPUs have expected availability later this year. 
  • Dell Generative AI Solutions with NVIDIA – RAG is available globally through traditional channels and Dell APEX now.
  • Dell Generative AI Solutions with NVIDIA – Model Training will be available globally through traditional channels and Dell APEX in April 2024.
  • The Dell Data Lakehouse is now available globally.
  • Dell PowerScale is validated with NVIDIA DGX SuperPOD with DGX H100 and NVIDIA OVX solutions now. 
  • The Dell Implementation Service for RAG is available in select locations starting May 31.
  • Dell infrastructure deployment services for model training is available in select locations starting March 29.

Bootnote

The “IDC Futurescape: Worldwide Artificial Intelligence and Automation 2024 predictions” report costs $7,500.

DDN turbocharges AI400X2 appliance to beef up AI performance

DDN has lifted the covers off the AI400X2 Turbo appliance, new AI hardware with more network ports and memory than its AI400X2 predecessor.

The AI400X2 Turbo is an update on the existing ExaScaler AI400X2 Lustre parallel filesystem storage array. It has twice the number of Nvidia ConnectX 400G network cards and the controller has 30 percent more memory. The updated system can, we’re told, now deliver 75 GBps write bandwidth, 120 GBps read bandwidth and 3 million IOPS, a 33 percent performance improvement over the AI400X2. This can be via GPUDirect or without it.

James Coomer, DDN senior veep for Products, told B&F: “NVIDIA GPUDirect Storage allows the IO to bypass the Compute Side CPU when moving data between storage and GPU memory. This results in higher bandwidths, lower latency and lower CPU consumption. It’s just one of a set of optimizations we use to create an optimal datapath between the filesystem and the AI application.”

DDN has a solid presence amongst Nvidia’s CSPs, who supply GPUs-as-a-Service, counting Bitdeer, Lambda, NAVER, OCI, Scaleway and Vultr as customers for its appliances, with OCI also using its ExaScaler software with NVMesh.

A spokesperson at DDN told us that Nvidia’s EOS SuperPOD, the largest DGX H100 SuperPOD, uses its AI400X2 storage appliance.

Coomer explained how DDN optimizes storage for AI workloads: “At DDN, we have 3 primary approaches to make AI applications go faster by reducing IO wait times: 1) optimize exactly for the specific, common AI framework behaviors (the POSIX call, the number of threads, the IO patterns, etc.); 2) cross-stack optimizations (integrating the storage SW up the stack – e.g. with NVIDIA DGX, with network, with containers); and 3) making the storage itself fast (optimizing backend and HW/SW integration).”

“Our experience with AI frameworks is that we need to (a) optimize for the map call since that’s often how AI frameworks request data, (b) optimize the data sent to 1 thread, (c) make writes go fast since checkpoints are a major component, (d) leverage client cacheing to reduce unnecessary data transfers, (e) ensure that threading scales well so that additional application IO threads get linearly scaling data volumes, and, (f) [use] Nvidia GPUDirect Storage.”
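
For readers unfamiliar with the “map call” access pattern Coomer mentions, the sketch below shows its general shape: a data loader memory-maps a large file and reads slices on demand rather than issuing explicit read() calls up front. This is a generic illustration, not DDN code, and the file path and shapes are placeholders:

```python
import numpy as np

# Create a sample on-disk array standing in for a large training shard (placeholder path).
shard_path = "/tmp/shard.npy"
np.save(shard_path, np.random.rand(100_000, 128).astype(np.float32))

# Memory-map the shard: the file is not read up front; pages are faulted in as
# slices are touched, which is the mmap-style IO pattern a parallel filesystem
# client has to serve efficiently.
shard = np.load(shard_path, mmap_mode="r")

batch_size = 1_024
for start in range(0, shard.shape[0], batch_size):
    batch = np.asarray(shard[start:start + batch_size])  # touching the slice triggers page-ins
    # ... hand `batch` to the training step ...
```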

The BlueField-3 booster

DDN wants to push the performance limits further: “We also are planning to talk more about our adoption of NVIDIA BlueField-3 DPU as well as fantastic performance (we are pushing over 7 GBps to 1 thread!) with NVIDIA Grace CPUs to enable new efficiencies in the datacenter.”

“DDN will be leveraging DPUs both in DDN EXAScaler and DDN Infinia, but also starting to use DPUs in a novel way for much higher gains in performance and overall infrastructure efficiency.”

“DDN Infinia is a storage platform that is implemented entirely in containers, with different low-level storage functions running in different containers and all those containers working together to form a single scalable service. This architecture allows us to pull out some services from the core storage and execute them in DPUs, even DPUs hosted on [the] compute side. We do this to take advantage of the scaling, additional CPU resources and to change the whole end-to-end datapath, reducing latency and infrastructure for the entire system.”

VAST Data has recently adopted BlueField-3 DPUs to run its storage controller software.