MinIO is faster than Hadoop. Farewell, NAS and SAN?

Object storage software house MinIO has demonstrated its storage can run up to 93 per cent faster than a Hadoop system.

In its latest benchmarks, published this week, MinIO was faster than a Hadoop Distributed File System (HDFS) configuration. Both systems were run in the Amazon public cloud. After an initial data generation procedure, the execution times of three Hadoop jobs (Sort, Terasort and Wordcount) were measured, first using HDFS and then using MinIO software.

MinIO was slower than HDFS at the data generation step but faster in the Sort, Terasort and Wordcount tests. It was also faster overall, based on summing the time taken for the data generation and the three test runs.

In a blog post announcing the results, MinIO CMO Jonathan Symonds exclaimed: “Basically, modern, high performance object storage has the flexibility, scalability, price, APIs, and performance. With the exception of a few corner cases, object storage will win all of the on-prem workloads.”

In other words, “Goodbye, NAS and SAN.”

Here’s the table of test results:

The benchmarks are charted below:

MinIO’s best result came in the Sort run, where it was 93 per cent faster than HDFS.

Adding the data generation times to the test run times for each system, we see that MinIO (3,700 secs) was faster overall than HDFS (4,337 secs). You can check out the MinIO HDFS benchmark details.
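To put the overall gap in perspective, here is a minimal sketch of the arithmetic, using only the two totals quoted above. The function and constant names are our own, for illustration, and are not taken from MinIO's benchmark write-up.

```python
# Overall totals quoted in this article (data generation plus the three test runs), in seconds.
MINIO_TOTAL_SECS = 3700
HDFS_TOTAL_SECS = 4337


def percent_less_time(faster_secs: float, slower_secs: float) -> float:
    """Time saved by the faster run, as a percentage of the slower run's time."""
    return (slower_secs - faster_secs) / slower_secs * 100


def percent_more_time(faster_secs: float, slower_secs: float) -> float:
    """Extra time taken by the slower run, as a percentage of the faster run's time."""
    return (slower_secs - faster_secs) / faster_secs * 100


if __name__ == "__main__":
    saved = HDFS_TOTAL_SECS - MINIO_TOTAL_SECS
    less = percent_less_time(MINIO_TOTAL_SECS, HDFS_TOTAL_SECS)
    more = percent_more_time(MINIO_TOTAL_SECS, HDFS_TOTAL_SECS)
    print(f"MinIO finished {saved} secs sooner overall")
    print(f"MinIO took {less:.1f} per cent less time than HDFS")
    print(f"HDFS took {more:.1f} per cent more time than MinIO")
```

On these totals MinIO finished 637 secs sooner, which is about 14.7 per cent less time than HDFS, or put the other way, HDFS took about 17.2 per cent longer. The overall margin is therefore far narrower than the 93 per cent headline figure, which applies to the Sort run alone.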

Last month the company published benchmarks showing its S3-compatible open source software was faster than Presto and Apache Spark.

Background

Hadoop systems rely on multiple combined compute and storage nodes, each handling a subset of the overall data set. HDFS keeps three copies of the raw data for reliability, and the number of nodes grows as data sets increase in size. This can mean hundreds of servers.

An object storage system is inherently more reliable at holding data than a Hadoop system and does not need to make three copies. Compute is also separate from storage, so the amount of compute resource used to run an analysis can be tailored to the workload instead of being tied to the HDFS nodes.

MinIO said analytics using its object storage software can typically run on fewer servers than an HDFS system, and needs less disk or SSD capacity to hold the data. This saves time and money. 

MinIO’s software can run on-premises or in the public cloud.