IDC brings out AI-Ready Data Storage Infrastructure paper

An IDC paper discussing AI-Ready Data Storage Infrastructure (AI-RDSI) has landed and is being distributed by Hammerspace.

The paper is the first in a 4-part series, with the other parts covering the Voice of the Customer, Competitive Landscape, and Market Size and Forecast.

The AI-RDSI document’s introductory IDC Opinion section says that “less than half of AI pilot projects advance into production.” It declares that “organizations must approach AI projects from a data-centric perspective.” The authors also say “vendors must be prepared to operate within an ecosystem of partners and competitors to provide a full-stack AI infrastructure offering.”

An AI-RDSI is defined as:

The IDC authors talk about data logistics, the journey of data from creation or ingest throughout an organization’s data processing environment, with a diagram illustrating the concept:

The AI system requires a single source of data truth, by either “having a Copy Data Management capability, or single unified metadata environment across all storage.”

There are five primary attributes of such a data infrastructure:

  • Performance – data throughput, IOPS, latency, network bandwidth and performance-intensive computing demands. The paper notes: “Achieving high throughput may require the use of technologies such as parallel file systems or parallel NFS (pNFS).”
  • Scale
  • Service levels – with 99.999 percent cited as a common requirement.
  • Data logistics
  • Data trust 
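For a sense of what the 99.999 percent service level means in practice, here is a minimal sketch that converts an availability percentage into permitted downtime per year. It assumes the conventional time-based reading of availability and a 365-day year; both are our assumptions, not something taken from the IDC paper, and the function name is purely illustrative.

```python
# Assumption: availability is measured as uptime over a 365-day year.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def downtime_seconds_per_year(availability_percent: float) -> float:
    """Return the downtime allowed per year at the given availability."""
    unavailable_fraction = 1 - availability_percent / 100
    return SECONDS_PER_YEAR * unavailable_fraction


if __name__ == "__main__":
    for pct in (99.9, 99.99, 99.999):
        minutes = downtime_seconds_per_year(pct) / 60
        print(f"{pct} percent -> {minutes:.2f} minutes of downtime per year")
```

On this reading, five nines allows a little over five minutes of downtime in a year.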

There is much more detail as the analysts dive into each of these attributes, and talk about an AI-RDSI ontology and software taxonomy. They wrap things up by providing advice to IT suppliers and IT buyers. A final synopsis declares that “Far too many AI projects fail. …we believe insufficient attention to the storage infrastructure, resulting in projects stymied by data silos, poor data quality, and insufficient storage performance.”

IDC Research VP, Infrastructure Software Platforms, Worldwide Infrastructure Research, Phil Goodwin, states at the end: “This study helps IT suppliers to define AI-ready data storage product requirements and IT buyers to identify the appropriate solutions for their needs.”

Hammerspace liked the content of this IDC primary research paper so much they obtained a reprint license.

Comment

We note the IDC paper ignores fast-access object storage using flash hardware and GPUDirect for Objects – see Cloudian, Scality and MinIO – positioning object stores as suitable for moderate or lower-performance needs.

It declares that data availability is important.

With 1 PB of data and 99.999 percent availability, we calculate that 0.001 percent of the data is at risk of being unavailable: 0.001 percent of 1 PB = 0.00001 x 1 x 10^15 = 1 x 10^10 bytes, or 10 GB.

In the object storage world, Scality’s RING and Cloudian’s HyperStore offer 14 nines (99.999999999999 percent) of data durability and availability. By the same calculation, 1 x 10^-14 of 1 PB, or just 10 bytes, would be unavailable. That is 0.0000001 percent of 10 GB, which is better.
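The comparison can be checked with a short calculation. This is a rough sketch that follows our simplification above, treating N nines as an unavailable fraction of 10^-N applied directly to capacity, with a decimal petabyte of 10^15 bytes. The function and constant names are ours, chosen for illustration.

```python
from fractions import Fraction

PETABYTE = 10**15  # decimal petabyte in bytes (assumption: 1 PB = 10^15 bytes)


def bytes_at_risk(capacity_bytes: int, nines: int) -> Fraction:
    """Bytes matching the unavailable fraction for a given number of nines.

    Simplification: N nines is treated as an unavailable fraction of 10^-N,
    applied directly to the stored capacity.
    """
    unavailable_fraction = Fraction(1, 10**nines)
    return capacity_bytes * unavailable_fraction


if __name__ == "__main__":
    five = bytes_at_risk(PETABYTE, 5)       # 99.999 percent
    fourteen = bytes_at_risk(PETABYTE, 14)  # 99.999999999999 percent
    print(f"5 nines:  {int(five):,} bytes")
    print(f"14 nines: {int(fourteen):,} bytes")
    print(f"ratio:    {float(fourteen / five):.0e}")
```

Exact fractions are used rather than floats so that the tiny 10^-14 term is not lost to rounding. For 1 PB this gives 10,000,000,000 bytes at five nines and 10 bytes at 14 nines.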