UnifabriX taking CXL external memory mainstream

Israel-based UnifabriX, founded in 2020 by Ronen Hyatt (CEO and chief architect), Danny Volkind (CTO), and Micha Rosling (chief business officer), has taken in $11 million in seed funding to develop Smart Memory Fabric systems based around CXL external memory sharing and pooling technology. The intention is to sidestep the memory capacity limitations of individual CPU and GPU server systems by connecting them to external memory pools using CXL (Compute Express Link), which runs over the PCIe physical layer.

UnifabriX and Panmnesia are two of the most active CXL-focused startups. We looked at Panmnesia yesterday and now turn our attention to UnifabriX.

It had developed a Smart Memory Node with 32 TB of DDR5 DRAM in a 2RU chassis by April 2023, and now has its MAX (Memory Accelerator) composable memory device based on UnifabriX software and semiconductor IP.

UnifabriX appliance

MAX provides a software-defined memory fabric pool with adaptive memory sharing, using CXL and UALink cabling and concepts, several of which are mentioned in the slide above. We’ll look at the system-level architecture and then try to make sense of the cabling spaghetti.

UnifabriX diagram

Hyatt talked about this slide: “On top of our FabriX Memory OS, which is a hardened Linux … we have a stream processor that can manipulate the stream of data and the stream of protocols as they come into the memory pool. And this is programmable hardware. You can think of it like the P4 concept that grew in switches and internet switches where you can parse the data as it goes on the fly and edit the protocol messages as they go in and out. 

Ronen Hyatt, UnifabriX

“So you see here the frontend ports, the six frontend ports go to the host. Today these are CXL 1.1 and 2.0. We have backend fabric ports and we accelerated the link there to 112G, much faster than CXL supports today. This is NVLink 4-equivalent in terms of speed and we are working on prototyping 224G, which is the equivalent of NVLink 5. Yes, it’s the bandwidth. We wanted to get the highest bandwidth possible on the backend side, on the fabric, when you connect multiple MAX appliances, one to each other.”
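The stream processor Hyatt describes above parses and rewrites protocol messages in flight as they enter and leave the pool, P4-style. As a rough, hypothetical illustration of the idea (not UnifabriX’s actual pipeline, and not a real CXL message format), the Python sketch below parses a fixed-layout request header, remaps its address field, and re-emits the message; in MAX this class of work is done in programmable hardware rather than in software.

```python
import struct

# Hypothetical fixed-layout request header: opcode, host id, address, length.
# This is an illustrative stand-in, not a real CXL flit or message format.
HEADER = struct.Struct(">BBQI")  # 1-byte opcode, 1-byte host id, 8-byte address, 4-byte length

def rewrite(msg: bytes, addr_offset: int) -> bytes:
    """Parse the header, remap the address field, and re-emit the message."""
    opcode, host_id, address, length = HEADER.unpack_from(msg)
    remapped = address + addr_offset  # e.g. translate a host address into the pool's window
    return HEADER.pack(opcode, host_id, remapped, length) + msg[HEADER.size:]

# Example: remap a read request from host 3 into an assumed pool address window.
request = HEADER.pack(0x01, 3, 0x0000_1000, 64)
print(rewrite(request, addr_offset=0x4000_0000).hex())
```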

CXL cabling situation

The PCIe, CXL, and UALink situation is complex. We should note that there have been five CXL specification releases between CXL 1.0 and CXL 3.1, plus a sixth, CXL 3.2, which is now available. CXL 3.2 adds optimized memory device monitoring and management, extended security, and performance monitoring, and is backwards-compatible with prior CXL specifications.

Hyatt tells us: “PCIe was originally built to live inside a platform, serving as a short-distance interconnect superseding PCI, between a CPU and peripheral devices, therefore it does not have a developed ecosystem of cabling. Larger-scale use cases of PCIe emerged only later, with ‘PCIe Fabrics’ that pooled and disaggregated devices such as NVMe storage, NICs, and GPUs.

“Those use cases did not require a lot of bandwidth, and therefore were comfortable with utilizing narrow x4 switch ports and x4 SFF-8644 (mini-SAS) cabling. A few examples here and here.

“The emergence of CXL over PCIe Gen 5 created a new demand for high-performance PCIe cabling that is capable of delivering much higher bandwidth for memory transactions. Since PCIe did not have such solutions ready, the market found interim solutions by utilizing cabling systems from the Ethernet domain, such as:

  • QSFP-DD MSA (x8) – a denser form factor of QSFP, originally created for Ethernet, Fibre Channel, InfiniBand and SONET/SDH. Some people used it (and still use it today) for PCIe x8 connections. See here.
  • CDFP MSA (x16) – originally developed for 400G Ethernet (16 x 25G lanes), but later certified de-facto for PCIe Gen 5. See here and here.

“Today, the PCIe ecosystem is aligning around the OSFP MSA cabling system, with OSFP (x8) and its denser variant OSFP-XD (x16) that both support the latest signaling rate of 224G PAM4 per lane (for example, 8 x 200G = 1.6 Tbps Ethernet), and are therefore also compatible with PCIe Gen 5/CXL 1.1, 2.0 (32G NRZ), PCIe Gen 6/CXL 3.x (64G PAM4), and PCIe Gen 7/CXL 4.x (128G PAM4). i.e. this OSFP cabling system is future-proof for at least two generations ahead in the PCIe domain. It is also ready for UALink that reuses Ethernet IO at the electrical level. One cable to rule them all.”
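For a sense of scale, the hedged arithmetic below turns the per-lane rates quoted above into raw per-direction link bandwidth for the x8 (OSFP) and x16 (OSFP-XD) widths. Encoding and protocol overhead are ignored, so real payload rates are lower; 224G PAM4 signaling, for instance, carries 200G of Ethernet payload per lane.

```python
# Raw per-lane signaling rates (Gbps per direction) from the cabling discussion above,
# before encoding and protocol overhead.
lane_rates = {
    "PCIe Gen 5 / CXL 1.1, 2.0 (32G NRZ)": 32,
    "PCIe Gen 6 / CXL 3.x (64G PAM4)": 64,
    "PCIe Gen 7 / CXL 4.x (128G PAM4)": 128,
    "224G PAM4 (Ethernet / UALink-class)": 224,
}

for name, per_lane in lane_rates.items():
    x8 = per_lane * 8    # OSFP carries 8 lanes
    x16 = per_lane * 16  # OSFP-XD carries 16 lanes
    print(f"{name}: x8 = {x8} Gbps, x16 = {x16} Gbps (~{x16 / 8:.0f} GB/s raw per direction)")
```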

Nvidia showed a way forward here, with Hyatt explaining: “It takes a lot of market education to bring memory fabrics into the datacenter. Nvidia jumped in to help when it introduced the DGX GH200 system with its NVLink memory fabric, creating a large, disaggregated 144 TB pool of memory. CXL and UALink are the open comparables of NVLink. They all support native load/store memory semantics.

“Nvidia taught the world that memory fabrics (by NVLink) are superior to networks (by InfiniBand). We tend to agree.”
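In practice, “native load/store memory semantics” means pooled memory is reached with ordinary CPU loads and stores rather than explicit network transfers. On Linux, CXL-attached memory is typically exposed as a CPU-less NUMA node; the sketch below, which assumes libnuma is installed and that the pool appears as NUMA node 2 (both assumptions), binds an allocation to that node and then touches it directly.

```python
import ctypes

# Assumes libnuma is installed and the CXL memory pool is exposed as NUMA node 2
# (check with `numactl --hardware`); both are assumptions for this sketch.
libnuma = ctypes.CDLL("libnuma.so.1")
if libnuma.numa_available() < 0:
    raise RuntimeError("NUMA support not available on this system")

libnuma.numa_alloc_onnode.restype = ctypes.c_void_p
libnuma.numa_alloc_onnode.argtypes = [ctypes.c_size_t, ctypes.c_int]
libnuma.numa_free.argtypes = [ctypes.c_void_p, ctypes.c_size_t]

CXL_NODE = 2     # assumed node number for the pooled memory
SIZE = 1 << 20   # 1 MiB

ptr = libnuma.numa_alloc_onnode(SIZE, CXL_NODE)
if ptr:
    buf = (ctypes.c_ubyte * SIZE).from_address(ptr)
    buf[0] = 0xAB                                                  # an ordinary store into pooled memory
    print(f"read back 0x{buf[0]:02x} from NUMA node {CXL_NODE}")   # an ordinary load back
    libnuma.numa_free(ptr, SIZE)
```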

He said: “UnifabriX developed a Fabric Manager (FM) compliant with CXL 3.2 FM APIs including support for DCD (Dynamic Capacity Device), i.e. it is capable of provisioning and de-provisioning memory dynamically, on-demand, using standard, open, CXL APIs. I haven’t seen another DCD Fabric Manager out there, so this may be one of the first FMs that you would encounter that actually does the work.”
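For orientation, the CXL 3.x DCD model has the fabric manager adding capacity extents to a host at runtime and reclaiming them later. The Python sketch below is a simplified conceptual model of that add/release flow, with invented names throughout; it is not UnifabriX’s Fabric Manager or the standard CXL FM API.

```python
from dataclasses import dataclass, field

@dataclass
class Extent:
    """A contiguous chunk of pooled capacity offered to one host (simplified)."""
    offset: int  # byte offset within the pool
    length: int  # bytes

@dataclass
class DynamicCapacityPool:
    """Toy model of DCD-style provisioning: add capacity to a host, then release it back."""
    total: int
    free: int = 0
    assigned: dict = field(default_factory=dict)  # host_id -> list of Extents

    def __post_init__(self):
        self.free = self.total

    def add_capacity(self, host_id: str, length: int) -> Extent:
        if length > self.free:
            raise MemoryError("pool exhausted")
        # Simplistic bump allocation; a real pool would track and reuse freed regions.
        extent = Extent(offset=self.total - self.free, length=length)
        self.assigned.setdefault(host_id, []).append(extent)
        self.free -= length
        return extent

    def release_capacity(self, host_id: str, extent: Extent) -> None:
        self.assigned[host_id].remove(extent)
        self.free += extent.length

# Example: lend 256 GiB to a host for a job, then hand it back to the pool.
pool = DynamicCapacityPool(total=32 << 40)           # a 32 TiB appliance
extent = pool.add_capacity("host-0", 256 << 30)
print(f"host-0 holds {extent.length >> 30} GiB; {pool.free >> 30} GiB remain free")
pool.release_capacity("host-0", extent)
```

The point of the toy is the lifecycle: capacity moves from the pool to a host and back on demand, which is the behavior the CXL FM APIs standardize.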

There are a couple of other points. Hyatt said: “We are able to mix and match CXL ports and UALink ports, meaning we can provide memory on demand to both CPUs and to GPUs. The UALink connector is based on Ethernet IO, so the same connector, the same OSFP and OSFP XD, is going to be used for both CXL and UALink. You just change the personality of the port.”
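Changing “the personality of the port” amounts to keeping the physical OSFP or OSFP-XD cage and its lanes fixed while choosing which protocol stack drives them. A minimal, hypothetical configuration sketch (the names here are illustrative, not UnifabriX’s API):

```python
from dataclasses import dataclass
from enum import Enum

class Personality(Enum):
    CXL = "cxl"        # memory expansion/pooling toward CPUs
    UALINK = "ualink"  # accelerator-side fabric toward GPUs

@dataclass
class FabricPort:
    cage: str                  # physical connector, e.g. "OSFP-XD 0"
    lanes: int                 # the electrical lanes stay the same either way
    personality: Personality

    def set_personality(self, p: Personality) -> None:
        # In hardware this would retrain the link under the new protocol;
        # here we only record the selection.
        self.personality = p

port = FabricPort(cage="OSFP-XD 0", lanes=16, personality=Personality.CXL)
port.set_personality(Personality.UALINK)  # same connector, different protocol stack
print(port)
```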

Working silicon

The company has demonstrated its memory pool dynamically changing in size as capacity is composed out to host processors on demand and then returned to the pool. UnifabriX is already earning revenue, with deployments in data analytics, high-performance computing, and public and private cloud environments.

UnifabriX slide

Hyatt said: “We have a few hyperscaler customers [where] the system is there running with the real workloads currently on Emerald Rapids platform and shifting soon towards Granite Rapids and Turin systems with AMD.”

“We have quite a few new customers in different segments of the market, not just the hyperscalers and the national labs. We have drug discovery companies, DNA sequencing. Turns out there are a lot of use cases that sit under the HPC umbrella where people need a lot of memory. Sometimes they need bandwidth, sometimes they need capacity. But having the ability to grow memory on demand and doing it dynamically brings a lot of value, not just on the TCO side.”

He explained: “You see the cloud, the public cloud, national labs. We started with the national labs and animation studios. There’s a lot of digital assets and you need to do rendering and processing, and they’re all working with fast storage systems these days, but they’re not fast enough for what they need. So having a memory pool in between helps to accelerate the whole process.”

Processing in memory

Hyatt talked about MAX being able to do some processing: “It has processing capabilities, which we found very useful for HPC. So we have processing-in-memory or near-memory capabilities. This works great for sparse memory models, for instance, in HPC where you have very large models that fit into petabytes and you need to abstract the memory address space. So you actually expose a huge address space externally. 

“But internally you do the mapping. And this is part of the memory processing that we do here. And this is one example. We have an APU, which is an application processing unit which is exposed to the customer, where the customer can run their own code over containers. So if they want to do something on the memory, like, for instance, checking for malicious code, checking for some abnormal patterns within the memory, this is something that they can run internally. We provide that capability.”
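The sparse-model point is that the pool can advertise an address space far larger than the DRAM actually installed and back only the regions that are touched, with the mapping done internally. A toy illustration of that indirection, not the MAX implementation:

```python
class SparseAddressSpace:
    """Advertise a huge logical address space; back 2 MiB pages only when touched."""
    PAGE = 2 << 20  # 2 MiB

    def __init__(self, logical_size: int):
        self.logical_size = logical_size
        self.pages = {}  # logical page number -> bytearray backing that page

    def _page(self, addr: int) -> bytearray:
        if not 0 <= addr < self.logical_size:
            raise IndexError("address outside the advertised space")
        return self.pages.setdefault(addr // self.PAGE, bytearray(self.PAGE))

    def write(self, addr: int, data: bytes) -> None:
        # Toy model: accesses are assumed not to span a page boundary.
        page, off = self._page(addr), addr % self.PAGE
        page[off:off + len(data)] = data

    def read(self, addr: int, length: int) -> bytes:
        page, off = self._page(addr), addr % self.PAGE
        return bytes(page[off:off + length])

# Advertise a 1 PiB logical space while only consuming memory for the pages touched.
space = SparseAddressSpace(logical_size=1 << 50)
space.write(0x2000_0000_0000, b"sparse")
print(space.read(0x2000_0000_0000, 6), f"({len(space.pages)} page actually backed)")
```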

Go to market

How does UnifabriX go to market? Hyatt said: “Currently, we work directly with end customers. And the reason we do it is because this is part of the product definition, like getting the feedback of what customers need. So you don’t want the channel in between because then you lose a lot of the feedback. 

“But we are already engaged with partners. Some of them are platform OEMs that want to have a memory pool as part of their product portfolio. So think about all the big guys that have storage systems and think of a memory pool as a storage server, but it works on memory. So most of the paradigms and the semantics that go with storage would be replicated to the memory world and we are working with them. 

“And on top of that we have several channels, some are specialized for HPC. There are OEM vendors that build unique servers and unique appliances for the HPC market. And HPC is really interested in having the memory bandwidth that CXL provides. There are several system integrators that build the whole racks and ship systems with GPUs and with a lot of compute power. And they actually pack together GPUs, servers, storage, and memory together, and ship it as a rack.”

UnifabriX is planning a new funding round in the second half of 2025. 

The fab process side is developing, with Hyatt saying: “Currently, our silicon is seven nanometer and we plan to have a five nanometer TSMC silicon later, in 2026, early 2027.” This aligns with PCIe Gen 6, as Hyatt pointed out: “CXL itself is moving from PCIe Gen 5 to Gen 6, so we have to upgrade the process. Gen 6 comes with mixed signals … that needs five nanometer to be efficient on power.”

We’ll follow up with an article looking at UnifabriX’s MAX device.

Bootnote

QSFP – Quad Small Form-factor Pluggable, a standard for optical fiber or copper cabling transceivers providing four times the speed of the corresponding SFP (Small Form-factor Pluggable) standard. The QSFP28 variant was published in 2014 and allowed speeds up to 100 Gbps, while the QSFP56 variant was standardized in 2019, doubling the top speed to 200 Gbps. A larger variant, Octal Small Form Factor Pluggable (OSFP), had products released in 2022 capable of 800 Gbps links between network equipment.

OSFP MSA – Octal Small Form Factor Pluggable (OSFP) Multi Source Agreement (MSA). The OSFP (x8) and its denser OSFP-XD (x16) variants both support the latest signaling rate of 224G PAM4 per lane (for example 8 x 200G = 1.6 Tbps Ethernet). They are compatible with PCIe Gen5 / CXL 1.1, 2.0 (32G NRZ), PCIe Gen6 / CXL 3.x (64G PAM4) and PCIe Gen7 / CXL 4.x (128G PAM4). This OSFP cabling system is future-proof for 2 generations ahead in the PCIe domain. It is also ready for UALink that reuses Ethernet IO at the electrical level.

CDFP – short for 400 (CD in Roman numerals) Form-factor Pluggable, designed to provide a low-cost, high-density 400 Gigabit Ethernet connection.