Penguin Computing has launched a hefty, fast Ceph-based object storage system for analytics and machine learning workloads. And it has a hefty name to match – ‘Penguin Computing DeepData solution with Red Hat Ceph Storage’.
DeepData combines Penguin servers, Ceph from Red Hat and Seagate’s Exos E 5U84 disk chassis into a Ceph cluster. The cluster can hold petabytes of object data and stream it out in parallel from its nodes.
Seagate said it conducted extensive testing with Penguin Computing and Red Hat to optimise DeepData’s storage performance.
The Exos E 5U84 is classed as a JBOD (Just a Bunch of Disks) and contains 84 drives in a 5U rack chassis. It has a 12Gbit/s SAS interface with a maximum of 14.4GB/sec deliverable from a single I/O controller.
A blog by Red Hat Principal Solutions Architect Kyle Bader states: “We were able to achieve a staggering 79.6 GiB/s aggregate throughput from the 10-node Ceph cluster utilised for our testing.” That’s 85.5GB/sec from a disk-based data set composed of 350 million objects.
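The GiB-to-GB conversion above can be checked with a few lines of Python. This is a minimal sketch: the per-node figure assumes the load was spread evenly across all 10 nodes, which the blog does not state.

```python
# Binary (GiB) vs decimal (GB) units
GIB = 2**30   # bytes in one gibibyte
GB = 10**9    # bytes in one gigabyte

def gib_to_gb(gib_per_s: float) -> float:
    """Convert a rate in GiB/s to GB/s."""
    return gib_per_s * GIB / GB

aggregate_gib_s = 79.6   # figure quoted from the Red Hat blog
nodes = 10               # cluster size quoted from the Red Hat blog

aggregate_gb_s = gib_to_gb(aggregate_gib_s)
# Assumption: throughput is divided evenly between nodes
per_node_gb_s = aggregate_gb_s / nodes

print(f"{aggregate_gb_s:.1f} GB/s aggregate, {per_node_gb_s:.2f} GB/s per node")
```

On that even-split assumption, each node delivers roughly 8.5GB/sec, which sits below the 14.4GB/sec ceiling quoted for a single E 5U84 I/O controller.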
Each of the cluster’s storage nodes was configured with an E 5U84 equipped with 84 x 16TB disk drives, giving roughly 12PiB (13.4PB) of raw capacity across the cluster. The servers ran Ceph software and used TLC SSDs to store object metadata (block allocations, checksums and bucket indexes) and so provide faster object data access. The server hardware is not specified in the blog.
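The raw capacity can be worked out from the drive counts. The sketch below assumes all 10 nodes carry a fully populated 84-drive chassis and that drive capacity is quoted in decimal terabytes, as disk vendors normally do.

```python
# Raw capacity of the test cluster from the figures in the article
nodes = 10               # 10-node cluster (from the Red Hat blog)
drives_per_chassis = 84  # Exos E 5U84
drive_tb = 16            # decimal terabytes per drive

raw_bytes = nodes * drives_per_chassis * drive_tb * 10**12

raw_pb = raw_bytes / 10**15   # decimal petabytes
raw_pib = raw_bytes / 2**50   # binary pebibytes

print(f"{raw_pb:.2f} PB raw = {raw_pib:.2f} PiB raw")
```

Under those assumptions the total comes to 13.44PB, or just under 12PiB, which lines up with the roughly 12PiB figure. This is raw capacity, before any erasure coding overhead.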
Bader writes: “We combined these [16TB] drives with Ceph’s space-efficient erasure coding to maximise cost efficiency. Extracting every bit of usable capacity is especially important at scale. … We fine-tuned the radosgw, rados erasure coding, and bluestore to work together towards our goals.”
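To illustrate why erasure coding matters for cost efficiency, the sketch below compares usable capacity under erasure coding with three-way replication. The 4+2 profile is a hypothetical example only; the blog excerpt does not say which profile was used. The 13.44PB raw figure is 10 nodes x 84 drives x 16TB.

```python
def usable_fraction_ec(k: int, m: int) -> float:
    """Fraction of raw capacity left for data with k data + m coding chunks."""
    return k / (k + m)

def usable_fraction_replica(copies: int) -> float:
    """Fraction of raw capacity left for data when each object is stored `copies` times."""
    return 1 / copies

raw_pb = 13.44  # 10 nodes x 84 drives x 16TB, in decimal PB

# Hypothetical profile for illustration; not taken from the blog
k, m = 4, 2

ec_usable_pb = raw_pb * usable_fraction_ec(k, m)
replica_usable_pb = raw_pb * usable_fraction_replica(3)

print(f"EC {k}+{m}: {ec_usable_pb:.2f} PB usable")
print(f"3x replication: {replica_usable_pb:.2f} PB usable")
```

With this illustrative profile, erasure coding leaves two thirds of the raw capacity usable, against one third for three-way replication. Real usable capacity would also depend on metadata overhead and headroom, which this sketch ignores.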