Kioxia and WEKA help calculate Pi to record 300 trillion decimal places

The Linus Tech Tips team has calculated Pi, the ratio of a circle’s circumference to its diameter, to a record 300 trillion decimal places with the help of AMD, Micron, Gigabyte, Kioxia, and WEKA.

Linus Sebastian and Jake Tivy of the Canadian YouTube channel produced a video about their Guinness World Records-qualified result.

From left, Linus Sebastian and Jake Tivy

The setup involved a cluster of nine Gigabyte storage servers and a single Gigabyte compute node, all running the Ubuntu 22.04.5 LTS operating system. The memory-intensive calculation was done using the Y-cruncher application, which uses external storage as RAM swap space, and nine storage servers were needed to provide sufficient capacity.

The storage servers were 1 RU Gigabyte R183-Z95-AAD1 systems with 2 x AMD EPYC 9374F CPUs and around 1 TB of DDR5 ECC memory. They were fitted with dual ConnectX-6 200 GbE network cards, giving each server 400 Gbps of network bandwidth. Kioxia NVMe CM6 and CD6 series SSDs were used for storage. Overall, there was a total of 2.2 PB spread across 32 x 30.72 TB CM6 SSDs and 80 x 15.36 TB CD6 SSDs – 245 TB per server.
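As a sanity check, the drive counts quoted above multiply out to the stated totals (a back-of-the-envelope sketch; TB here means decimal terabytes):

```python
# Raw capacity from the drive counts quoted above (decimal TB).
cm6_tb = 32 * 30.72   # 32 x 30.72 TB Kioxia CM6 SSDs
cd6_tb = 80 * 15.36   # 80 x 15.36 TB Kioxia CD6 SSDs

total_tb = cm6_tb + cd6_tb     # ~2211.84 TB, i.e. ~2.2 PB
per_server_tb = total_tb / 9   # nine storage servers -> ~245.76 TB each
```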

The compute node was a Gigabyte R283-Z96-AAE1 with dual 96-core EPYC 9684X 3D V-Cache CPUs, giving 192 threads per CPU. There was 3 TB of DRAM, made up of 24 x 128 GB sticks of Micron ECC DDR5 5600 MT/s CL46 memory. It was equipped with 4 x Nvidia ConnectX-7 200 GbE network cards, each with 2 x 200 GbE ports and a x16 PCIe Gen 5 slot capable of approximately 64 GBps of bidirectional throughput. That gave a total of 1.6 Tbps of network throughput, around 100 GBps to each 96-core CPU.
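The aggregate network figures follow from simple arithmetic (a quick sketch using the numbers above; GBps denotes gigabytes per second):

```python
# Aggregate network bandwidth of the compute node described above.
cards = 4
ports_per_card = 2
gbps_per_port = 200  # 200 GbE per port

total_gbps = cards * ports_per_card * gbps_per_port
total_tbps = total_gbps / 1000        # 1.6 Tbps aggregate
total_gbytes_per_s = total_gbps / 8   # bits -> bytes: 200 GBps, ~100 GBps per CPU
```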

WEKA parallel-access file system software was used, presenting a single file system spanning all nine servers to Y-cruncher. The WEKA software was tricked into thinking each server was two servers so that the most space-efficient data striping scheme could be used. Each chunk of data was divided into 16 pieces plus two parity segments – 16+2 equaling 18 – matching the 18 virtual nodes achieved by running two WEKA instances per server.
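The striping arithmetic works out as follows (an illustrative sketch using the figures above; the usable-capacity fraction is the standard data/(data+parity) ratio for erasure coding, not a number from the article):

```python
# Why two WEKA instances per server enable 16+2 striping.
servers = 9
instances_per_server = 2                      # each server presented as two nodes
virtual_nodes = servers * instances_per_server  # 18 virtual nodes

data_pieces = 16
parity_pieces = 2
stripe_width = data_pieces + parity_pieces    # 18 -> one piece per virtual node

# Fraction of raw capacity left for data under 16+2 erasure coding (~88.9%).
usable_fraction = data_pieces / stripe_width
```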

They needed to limit the amount of data flowing between the two CPUs, both to avoid latency build-up when one CPU sends data through the other CPU’s network cards and because memory bandwidth was limited.

Avoiding cross-CPU memory transfers

Tivy configured two 18 GB WEKA client containers – instances of the WEKA application running on the compute node – to access the storage cluster. Each container had 12 cores assigned to it to match the 3D V-Cache layout of the Zen 4 CPU. Each CPU has 12 chiplets, each with 32 MB of L3 cache plus 64 MB of V-Cache layered above it for 96 MB per chiplet. Tivy did not want Y-cruncher’s buffers to spill out of that cache, because that would mean more memory copies and more memory bandwidth would be needed.
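The cache figures add up like this (a quick sketch from the numbers above; per-chiplet and per-CPU totals are straightforward multiplication):

```python
# L3 cache arithmetic for the EPYC 9684X as described in the article.
ccds = 12           # Zen 4 chiplets (CCDs) per CPU
base_l3_mb = 32     # L3 on the chiplet itself
stacked_l3_mb = 64  # 3D V-Cache layered on top

per_ccd_mb = base_l3_mb + stacked_l3_mb  # 96 MB per chiplet
total_l3_mb = ccds * per_ccd_mb          # 1152 MB of L3 per CPU
```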

WEKA’s write throughput was around 70.14 GBps – achieved over the network through a file system. Read testing showed up to 122.63 GBps with a 2 ms latency. The system was initially configured with four NUMA nodes per CPU, which Y-cruncher leveraged for better memory locality. When reconfigured to a single NUMA node per socket, bandwidth increased to 155.17 GBps.

The individual Kioxia CD6 drive read speeds were in excess of 5 GBps. Tivy set the record for the single fastest client usage, according to WEKA. Tivy said WEKA subsequently broke that record with GPUDirect storage and RDMA.

Tivy says Y-cruncher uses the storage like RAM, as swap space – that’s why so much capacity is needed. The actual Y-cruncher output of 300 trillion digits is only about 120 TB compressed, while the system used about 1.5 PB of capacity at peak. Y-cruncher is designed for direct-attached storage without hardware or software RAID, as it implements its own internal redundancy mechanisms.
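A rough information-theoretic estimate shows why 300 trillion decimal digits compress to roughly that size – each decimal digit carries log₂10 ≈ 3.32 bits (a back-of-the-envelope sketch, not a description of Y-cruncher’s actual output format):

```python
import math

# Minimum storage for 300 trillion decimal digits, assuming
# near-optimal packing at log2(10) bits per digit.
digits = 300e12
bits_per_digit = math.log2(10)            # ~3.32 bits of information per digit

bytes_total = digits * bits_per_digit / 8
tb_total = bytes_total / 1e12             # ~124.6 TB, close to the ~120 TB figure
```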

The Linus Tech Tips Pi project started its run on August 1, 2024, and crashed 12 days later due to a multi-day power outage. The run was restarted and then executed past shorter-term power cuts and air-conditioning failures, meaning the calculation had to stop and restart, to complete 191 days later. At that time, they discovered that the 300 trillionth digit of Pi is 5.

Pi calculation history

Here is a brief history of Pi calculation records:

  • Around 250 BCE: Archimedes approximated π as 22/7 (~3.142857).
  • 5th century CE: Chinese mathematician Zu Chongzhi calculated π to 7 digits (3.1415926), a record that lasted almost 1,000 years.
  • 1596: Ludolph van Ceulen calculated π to 20 digits using Archimedes’ polygon method, ultimately working with polygons of up to 2⁶² sides.
  • 1706: John Machin developed a more efficient arctangent-based formula, calculating π to 100 digits.
  • 1844: Johann Martin Zacharias Dase computed π to 200 digits.
  • 1873: William Shanks calculated π to 707 digits, though only the first 527 were correct due to a computational error.
  • 1949: The ENIAC computer calculated π to 2,037 digits, the beginning of computer-based π calculations.
  • 1989: The Chudnovsky brothers calculated π to over 1 billion digits on a supercomputer.
  • 1999: Yasumasa Kanada computed π to 206 billion digits, leveraging the Chudnovsky algorithm and advanced hardware.
  • 2016: Peter Trueb computed π to 22.4 trillion digits, taking 105.524 days.
  • 2019: Emma Haruka Iwao used Google Cloud to compute π to 31.4 trillion digits (31,415,926,535,897 digits).
  • 2021: Researchers at the University of Applied Sciences of the Grisons, Switzerland, calculated π to 62.8 trillion digits, using a supercomputer, taking 108 days and nine hours.
  • 2022: Google Cloud, again with Iwao, pushed the record to 100 trillion digits, using optimized cloud computing.

Wikipedia has a more detailed chronology.

Comment

This record could surely be exceeded by altering the Linus Tech Tips configuration to one using GPU servers, with an x86 CPU host, GPUs with HBM, and GPUDirect links to faster and direct-attached (Kioxia CM8 series) SSDs. Is a 1 quadrillion decimal place Pi calculation within reach? Could Dell, HPE, or Supermicro step up with a GPU server? Glory awaits.