WEKA moves file data at 2TB/sec on Oracle’s cloud

WEKA says it has moved file data on Oracle’s cloud at close to 2TB/sec to servers using its scale-out, parallel file system software.

The Oracle Cloud Infrastructure (OCI) public cloud provides bare-metal servers for compute shapes. A shape is a template specifying the type of CPU, number of CPU cores, RAM, and networking speed. Oracle and WEKA validated WEKA’s performance when running inside OCI.

Oracle’s Principal Solutions Architect, Pinkesh Valdria, claimed in a blog late last week: “The performance that WEKA and OCI can provide to customer workloads is fantastic… this combination of performance and scale along with the elasticity that OCI provides, allows you to successfully host modern EDA, life sciences, financial analysis, and even more traditional enterprise workloads on OCI.”

WEKA says of Oracle’s cloud: “Oracle Cloud Infrastructure (OCI) is a top-tier Hyperscale cloud that provides XaaS compute and application services to Oracle customers, and offers a lot more than you might expect, including AI/ML and GPU workloads.”

The validation test involved runs using either 80 or 373 bare-metal (BM.Optimized3.36) compute shapes, each with 36 cores, 512GB of RAM, 3.8TB of local NVMe SSD, 100Gbit/s RoCE (RDMA over Converged Ethernet), and 2 x 50Gbit/s network links. WEKA was configured to use six cores, leaving 30 for running the OS and application software on the same server.

Test data was generated using Flexible IO (Fio) with 1MB blocks for large block workloads and 4KB ones for small block workloads. Here is a table of the results followed by a chart:

[Table: WEKA test results]
[Chart: WEKA performance]

No doubt there were intermediate server number configurations, but WEKA and Oracle have highlighted the 80 and 373-server results. A near 2TB/sec throughput is certainly a hero number.
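
As a rough sanity check on that figure, the sketch below spreads a round 2TB/sec evenly across 373 servers and compares each server's share with the line rate of a 100Gbit/s connection. It assumes the headline number came from the 373-server run, that 1TB is 1,000GB, and that each server has 100Gbit/s of usable network bandwidth. None of these assumptions is confirmed by WEKA or Oracle, and the function names are ours.

```python
def per_server_gb_per_sec(total_tb_per_sec, servers):
    """Even share of aggregate throughput per server, in GB/sec (1TB = 1,000GB)."""
    return total_tb_per_sec * 1000 / servers


def line_rate_gb_per_sec(gbit_per_sec):
    """Convert a link speed in Gbit/s to GB/sec (8 bits per byte, no protocol overhead)."""
    return gbit_per_sec / 8


# Assumed inputs: a round 2TB/sec across 373 servers, 100Gbit/s per server.
share = per_server_gb_per_sec(2.0, 373)
nic = line_rate_gb_per_sec(100)

print(round(share, 2), nic, round(share / nic, 2))  # 5.36 12.5 0.43
```

On those assumptions each server would move a little over 5GB/sec, well under half of what a 100Gbit/s link could carry in theory.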

One possible comparison is with WEKA’s Nvidia GPUDirect performance: 97.9GB/sec of throughput to 16 Nvidia A100 GPUs and 113.1GB/sec to an Nvidia DGX-2 server. Presumably WEKA could deliver higher bandwidth if more Nvidia GPUs were targeted, which would require more WEKA nodes.

Analyst house ESG has validated WEKA performance on a number of benchmarks but doesn’t specifically call out IOPS and GB/sec bandwidth. Similarly, the many WEKA STAC benchmarks typically don’t identify general OPS or throughput numbers.

WEKA’s own stats for its AWS performance call out large file (1MB) read performance of over 100GB/sec and small file (4KB) read performance of 5 million-plus IOPS across 16 EC2 instances. [Conveniently for Oracle’s marketers, the lower, 80-server result of 5.5 million read IOPS exceeds the AWS 5 million-plus IOPS result. It wouldn’t look so good if the OCI result were lower than the AWS one.]

The 16 instances in the AWS test are a long way short of the 80 and 373 servers used in the OCI test runs, but that’s no reason to think that WEKA’s AWS performance is less than its OCI performance.

Our thinking is that WEKA could scale to the same OCI levels of performance on AWS if it increased the number of compute instances; it is scale-out software, after all. Assuming linear scaling, it would need roughly 56 AWS compute instances, of the type used in the 16-instance test, to reach the OCI 373-server result of 17.4 million IOPS.
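
The arithmetic behind that estimate can be sketched as follows. It takes the AWS "5 million-plus" figure as exactly 5 million IOPS, which is a lower bound, and assumes IOPS grows in direct proportion to instance count, which no published test confirms. The function name is ours.

```python
import math


def instances_for_target(target_iops, measured_iops, measured_instances):
    """Instances needed to hit target_iops, assuming perfectly linear scaling."""
    per_instance = measured_iops / measured_instances
    return target_iops / per_instance


# Figures quoted in the article; 5 million is the floor of "5 million-plus".
needed = instances_for_target(
    target_iops=17_400_000,   # OCI 373-server result
    measured_iops=5_000_000,  # AWS small file (4KB) read result
    measured_instances=16,    # EC2 instances in the AWS test
)

print(f"{needed:.1f}")    # 55.7
print(math.ceil(needed))  # 56
```

Because the AWS figure is a floor, the true instance count under linear scaling could be somewhat lower than 56.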

We could not find equivalent data sheets for WEKA running in the Azure or GCP clouds on WEKA’s resources web page.