Panmnesia expands GPU memory pool with CXL

South Korea’s Panmnesia has shown CXL memory sharing with GPUs for AI workloads at the OCP Global Summit.

CXL (Compute Express Link) is a PCIe-based interconnect technology that provides a way to expand memory outside a server or workstation and, with the v3.0 and v3.1 standards, to share a pool of external memory between servers and workstations across scale-out CXL fabrics. Panmnesia is developing a CXL 3.1 switch chip and SoC (System on Chip), and claims its CXL 3.1 IP has latency below 100 nanoseconds. The company demonstrated the technology at the OCP Summit with its CXL-GPU concept, in which GPU compute nodes share external CXL-accessed memory, and suggests it can be used for GenAI training and inference work.
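
To a host operating system, CXL-attached memory typically appears as a CPU-less NUMA node, which is why existing software can use the expanded capacity without modification. As a minimal illustration (not Panmnesia code, and assuming the CXL pool shows up as NUMA node 1 on a Linux host), a buffer can be placed on the expander with libnuma:

// Host-side sketch: allocate 1 GiB on a hypothetical CXL-backed NUMA node.
// Assumption: the CXL memory expander is exposed as NUMA node 1.
#include <numa.h>      // link with -lnuma
#include <cstdio>
#include <cstring>

int main() {
    if (numa_available() < 0) {
        std::fprintf(stderr, "NUMA support not available\n");
        return 1;
    }

    const int    cxl_node = 1;          // placeholder node ID
    const size_t size     = 1UL << 30;  // 1 GiB

    char *buf = static_cast<char*>(numa_alloc_onnode(size, cxl_node));
    if (!buf) {
        std::fprintf(stderr, "allocation on node %d failed\n", cxl_node);
        return 1;
    }

    std::memset(buf, 0, size);          // touch pages so they land on the node
    std::printf("1 GiB allocated on NUMA node %d (CXL expander)\n", cxl_node);

    numa_free(buf, size);
    return 0;
}

Panmnesia's CXL-GPU concept extends this kind of load/store access to GPUs over a CXL 3.1 fabric rather than limiting it to host CPUs.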

A spokesperson stated that Panmnesia provided “a blueprint for practical adoption of CXL technologies in AI datacenters.” A conceptual diagram illustrates the idea:

B&F conceptual Panmnesia diagram

Currently, if a GPU-based AI training or inference workload is memory-bound, one answer is to endure the slowness as data is transferred into memory from storage. Another is to add a further GPU with high-bandwidth memory (HBM) to the server, add more DRAM if that is possible, or even add another GPU server. Panmnesia instead enables compute and memory disaggregation so that each can be scaled up or down separately, with terabytes of memory potentially shared. It says its CXL-enabled AI cluster can “reduce the inference latency by about six times compared to the existing storage/RDMA (Remote Direct Memory Access)-based system.”
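
Panmnesia has not published its GPU-side CXL logic, but the disaggregation idea can be roughly approximated on today's hardware: keep the working set on the (assumed) CXL-backed NUMA node and map it into a GPU's address space so kernels read it in place, rather than staging it in from storage or over RDMA. A hedged CUDA sketch, with the node ID and buffer size as placeholders:

// Rough analogue, not Panmnesia's implementation: a GPU kernel reads a
// buffer resident on a hypothetical CXL-backed NUMA node (node 1) that has
// been pinned and mapped into the GPU's address space. Error checks omitted.
#include <cuda_runtime.h>
#include <numa.h>
#include <cstdio>

__global__ void sum_kernel(const float *data, size_t n, float *out) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i) acc += data[i];  // single-thread read pass
    *out = acc;
}

int main() {
    const int    cxl_node = 1;              // placeholder CXL node ID
    const size_t n        = 1UL << 26;      // 64M floats = 256 MiB
    const size_t bytes    = n * sizeof(float);

    // Place the working set on the CXL-attached node...
    float *host_buf = static_cast<float*>(numa_alloc_onnode(bytes, cxl_node));
    for (size_t i = 0; i < n; ++i) host_buf[i] = 1.0f;

    // ...pin it and expose a device-side view so the GPU reads it directly.
    cudaHostRegister(host_buf, bytes, cudaHostRegisterMapped);
    float *dev_view = nullptr;
    cudaHostGetDevicePointer(reinterpret_cast<void**>(&dev_view), host_buf, 0);

    float *dev_out = nullptr;
    cudaMalloc(&dev_out, sizeof(float));
    sum_kernel<<<1, 1>>>(dev_view, n, dev_out);

    float result = 0.0f;
    cudaMemcpy(&result, dev_out, sizeof(float), cudaMemcpyDeviceToHost);
    std::printf("sum = %.0f\n", result);    // expected: 67108864

    cudaFree(dev_out);
    cudaHostUnregister(host_buf);
    numa_free(host_buf, bytes);
    return 0;
}

The claimed advantage of a native CXL 3.1 fabric over a workaround like this is that GPU accesses go through low-latency switch hardware rather than a driver-managed pinning path, and the same memory pool can be shared by many GPU nodes at once.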

Further, this CXL memory sharing can be configured dynamically, with GPU and memory resources composed for specific workloads.

The company said customers “can reduce the construction cost of AI datacenters since it is possible to expand the memory capacity by simply equipping additional memory and CXL devices for memory expansion, without purchasing unnecessary server components.”

Potential customers showed interest in the technology. The spokesperson said: “Executives and employees from global IT companies attending the OCP Global Summit such as Google, Microsoft, Supermicro, and Gigabyte showed significant interest in our CXL 3.1 solutions. … Several server manufacturers expressed strong demand for Panmnesia’s CXL 3.1 Switch SoC, which is planned for availability in the second half of next year.”

Check out a video showing a RAG workload running on a Panmnesia shared memory system here.