Cisco has validated Minio as an object store for its scale-out Data Intelligence Platform. Minio joins Cloudian, Scality, Ceph and SwiftStack as approved suppliers.
Cisco launched the Data Intelligence Platform for business analytics, big data and AI workloads in June last year. The system concept includes two storage tiers, a Hadoop data lake layer for hot data and an HDFS/object store for warm data.
Each of the three elements in the diagram above can scale out independently.
Cisco has published a Solution Overview document describing how Minio fits in as a high-performance object store.
Cisco and MinIO
Cisco states that the MinIO system is fast, citing a benchmark result of 12.8GB/sec aggregate read throughput. The company says there is “a clear case for the Hadoop stack to operationalize data tiering from the Hadoop Distributed File System (HDFS) to MinIO, offering an attractive alternative for today’s critical workloads. The current design offers a proven deployment model for enterprise Hadoop while enabling a second, highly economical tier of warm storage.”
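Cisco's document doesn't show the mechanics, but HDFS-to-object tiering of this kind is typically done with Hadoop's S3A connector, which speaks MinIO's S3 API. A minimal sketch, assuming a MinIO endpoint at `http://minio.example.com:9000`, default `minioadmin` credentials, and a bucket named `warm-tier` (all hypothetical):

```shell
# Copy a cold dataset from the HDFS hot tier to a MinIO warm tier.
# The fs.s3a.* properties point Hadoop's S3A connector at the MinIO
# server; path-style access is needed because MinIO does not use
# virtual-hosted bucket names by default.
hadoop distcp \
  -D fs.s3a.endpoint=http://minio.example.com:9000 \
  -D fs.s3a.access.key=minioadmin \
  -D fs.s3a.secret.key=minioadmin \
  -D fs.s3a.path.style.access=true \
  hdfs:///data/clickstream/2019 \
  s3a://warm-tier/clickstream/2019

# Once the copy is verified, reclaim hot-tier HDFS capacity:
hdfs dfs -rm -r -skipTrash /data/clickstream/2019
```

Analytics jobs can then read the tiered data back through the same `s3a://warm-tier/...` URI without any application changes.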
For Cisco, the MinIO tier complements the Hadoop tier. That’s the same role played by the Ceph, Cloudian, Scality and SwiftStack object stores when the platform was launched in June. Yet MinIO has demonstrated that its software is faster than Hadoop.
Data Intelligence Platform buyers that use a single MinIO storage tier could avoid buying the Data Lake/Hadoop part of the system, saving a lot of rack space and money too.
For Cisco, the hardware is based around three types of its UCS server. The AI/compute engine box uses UCS C240 M5 (2-socket, 2U, Xeon SP, Optane support, 26 x 2.5-inch drives) and C480 M5 ML (8 x NVIDIA GPUs, 4U, Xeon SP, Optane support, 24 x 2.5-inch drives) servers.
The apps running in the compute engine can access a Hadoop data lake tier, again using C240 M5 servers or an object storage tier using S3260 servers. These come in a 4U box holding up to 56 x 3.5-inch drives with dual Xeon SP or E5-2600 v4 controllers.
A hardware picture from a Cisco reference architecture document makes things clearer:
According to Cisco, the Data Intelligence Platform provides:
- Extremely fast ingestion and engineering of data performed at the data lake
- An AI computing farm, allowing different types of AI frameworks and computing resources (GPU, CPU, and FPGA) to work on this data for additional analytics processing
- A storage tier, allowing the gradual retirement of data that has been worked on to a dense storage system with a lower cost per terabyte, reducing TCO.