Unstructured data management is becoming a major focus for startups, and supplier product development is bound to reflect competitive forces and a quest for differentiation.
IDC forecasts that by 2025 the global datasphere will grow to 163ZB (a zettabyte is a trillion gigabytes). That’s 10 times the 16.1ZB of data generated in 2016. According to Forbes, Gartner analysts estimate that around 80 per cent of enterprise data today is unstructured, meaning it is not held in structured databases.
Unstructured data is growing fast as its machine-generated component, from online sales and other transactions for example, increases. The unstructured data market is vast. Rubrik reckons that the Copy Data Management part of it alone, comprising data protection and recovery, archiving and replication, is worth $48 billion.
The best-known players are Cohesity, Rubrik and Komprise. But there is still all to play for, and a bunch of venture-backed “fast followers” have emerged in their wake. These include Igneous Systems, Elastifile and WekaIO.
Igneous Systems has launched a trio of products to address the Unstructured Data Management as-a-Service (UDMaaS) market.
The Seattle-based company was founded in 2013 and has raised $41.6m to date in early-stage venture funding. In the early days it set out to build object-storing nanoservers – disk drives with attached ARM processors. From there it evolved into a hybrid cloud storage vendor, and now it is pivoting again into unstructured data management.
In August 2018 the company released a set of NAS integrations with Pure Storage, Qumulo and Isilon, and these underpin its new services.
So how do you pronounce ‘UDMaaS’?
The SaaS offerings are DataDiscovery, DataFlow and DataProtect, which can use NAS and object storage systems on- and off-premises as source repositories.
Igneous’ announcement stated:
- DataDiscovery is underpinned by an Igneous InfiniteIndex and provides a unified and intelligent system of record for all file and object data. It creates aggregated views of the unstructured data, utilising AdaptiveScan and InfiniteIndex to answer specific questions like “How much?,” “How old?,” “Where is it?,” and “How fast is it growing?”
- DataProtect – Backup and Archive for Primary NAS, Object, and Cloud. This uses the high-performance movement of IntelliMove (multi-threaded, latency-aware data movement) with API integration for NetApp, Isilon, Qumulo and Pure Flashblade. Metadata is stored in the InfiniteIndex. AdaptiveScan discovers change.
- DataFlow provides self-service and automated dataset movement for analysis, simulations and collaboration, applicable to any operation requiring intelligent data movement at scale, including machine learning pipelines, IoT data workflows and BioInformatics pipelines. It’s driven by end-users such as scientists and engineers.
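To make the DataDiscovery questions concrete, here is a minimal sketch, assuming a locally mounted file tree, of how “How much?”, “How old?” and “Where is it?” can be answered by aggregating file metadata. It is not Igneous’s InfiniteIndex or AdaptiveScan; the function names `summarise_tree` and `age_bucket` and the age thresholds are hypothetical.

```python
import os
import time
from collections import Counter, defaultdict
from typing import Optional

# Hypothetical age buckets in days; the thresholds are illustrative only.
AGE_BUCKETS = [(30, "<30d"), (365, "30d-1y"), (float("inf"), ">1y")]


def age_bucket(mtime: float, now: float) -> str:
    """Map a modification time (epoch seconds) to a coarse age bucket."""
    age_days = (now - mtime) / 86400
    for limit, label in AGE_BUCKETS:
        if age_days < limit:
            return label
    return AGE_BUCKETS[-1][1]


def summarise_tree(root: str, now: Optional[float] = None) -> dict:
    """Walk `root` and aggregate file counts and bytes.

    by_location answers "where is it?" (keyed by top-level directory,
    "." for files directly under root); by_age answers "how old?" (bytes
    per age bucket); the totals answer "how much?".
    """
    now = time.time() if now is None else now
    by_location = defaultdict(lambda: {"files": 0, "bytes": 0})
    by_age = Counter()
    total_files = total_bytes = 0
    stack = [root]
    while stack:
        path = stack.pop()
        with os.scandir(path) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    st = entry.stat(follow_symlinks=False)
                    rel = os.path.relpath(entry.path, root)
                    top = rel.split(os.sep, 1)[0] if os.sep in rel else "."
                    by_location[top]["files"] += 1
                    by_location[top]["bytes"] += st.st_size
                    by_age[age_bucket(st.st_mtime, now)] += st.st_size
                    total_files += 1
                    total_bytes += st.st_size
    return {"files": total_files, "bytes": total_bytes,
            "by_location": dict(by_location), "by_age": dict(by_age)}
```

“How fast is it growing?” would then come from comparing two such summaries taken at different times; a product operating at tens of petabytes would keep these aggregates in a persistent index rather than recomputing them with an in-memory walk.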
Igneous says it “delivers superior data visibility, efficient data protection and automated data movement to data-centric organizations … regardless of where datasets reside.” That means “unstructured” datasets, i.e. files and objects, at scale, i.e. tens of petabytes.
It provides “continuous data visibility, search and classification; backup, archive and disaster recovery of all files and objects; and automated dataset movement based on data lifecycle requirements … on a single ‘as-a-Service’ solution, for all unstructured data.”
I have made a schematic diagram that shows these three services:
Christian Smith, VP of Products at Igneous, provides a canned quote: “Our customers get the benefits of visibility, classification, analysis, protection and data mobility without having to rip and replace their existing storage investments. And our unprecedented performance and support for any NAS or object interface and private or public cloud service gives agility and choice back to our customers.”
Did you notice that “unprecedented performance [and] support for any NAS or object interface and private or public cloud service”? We’ll take these claims literally.
Igneous says its products are capable of delivering data scan-and-compare rates of over “419,000 files/second per job, data-movement rates of over 21,000 files/second per job, [and] throughput starting at 1.2GB/sec for a basic configuration.”
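The “scan-and-compare” rate quoted above refers to two steps: scanning a filesystem’s metadata and comparing it with the previous scan to find what changed. Below is a minimal sketch of that generic technique, assuming each scan is held as an in-memory dictionary of path to (size, mtime). It is not Igneous’s AdaptiveScan, and `take_snapshot` and `diff_snapshots` are hypothetical names.

```python
import os
from typing import Dict, List, Tuple

# A snapshot maps relative path -> (size in bytes, mtime in nanoseconds).
Snapshot = Dict[str, Tuple[int, int]]


def take_snapshot(root: str) -> Snapshot:
    """Scan phase: record size and mtime for every non-directory entry under root."""
    snap: Snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full, follow_symlinks=False)
            snap[os.path.relpath(full, root)] = (st.st_size, st.st_mtime_ns)
    return snap


def diff_snapshots(prev: Snapshot, curr: Snapshot) -> Dict[str, List[str]]:
    """Compare phase: classify paths as added, modified or deleted."""
    added = sorted(p for p in curr if p not in prev)
    deleted = sorted(p for p in prev if p not in curr)
    modified = sorted(p for p in curr if p in prev and curr[p] != prev[p])
    return {"added": added, "modified": modified, "deleted": deleted}
```

Only the added and modified paths then need to be moved by a backup job. At hundreds of millions of files, a real system would persist snapshots in an index and parallelise the scan rather than hold two dictionaries in memory.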
Our Q&A with Igneous
To find out more we asked Igneous some questions about its three UDMaaS services and spokesperson Mike Bradshaw provided the answers.
B&F: Where do these services run?
Igneous: Services run on-premises, or in the cloud.
B&F: Are agents used?
Igneous: No – this is all agentless. Data is moved and scanned through native file or object interfaces (NFS, SMB, S3, etc.) utilising API integration where available.
B&F: What will drive customers to buy these services?
Igneous: UDMaaS customers have fast-growing data, or a data density, that caused legacy data management models to break down. Igneous customers were facing the following problems:
- DataProtect: The data protection in their environment used legacy technology, was cumbersome and often did not meet SLAs
- DataFlow: Having built a pipeline, they got stuck trying to move data fast between different data platforms
- DataDiscovery: Didn’t have a handle on the data they already had
Increasingly, unstructured data is targeted to the public cloud as part of a workflow orchestration or automating a protection/archiving solution.
B&F: Just unstructured data?
Igneous: Igneous does not protect any structured data sources or VMs and has no plans to do so. Our future is focused on unstructured data: where it lives, where it’s going to live, and how to manage it.
B&F: Can you provide a performance example?
Igneous: A recent benchmark with Pure Flashblade delivered 226,517 files/sec in one job for an export with 1B files and a 10 per cent change rate, while source-side system latency stayed under 5ms.
This performance continues to scale with additional jobs.
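To put the benchmark figure in perspective, here is a quick back-of-the-envelope calculation of wall-clock time for a 1-billion-file export at 226,517 files/sec. The multi-job line assumes throughput scales linearly with the number of jobs, which is our simplifying assumption and not a figure Igneous supplied.

```python
def scan_seconds(files: int, files_per_sec_per_job: float, jobs: int = 1) -> float:
    """Wall-clock seconds to process `files`, assuming rate scales linearly with jobs."""
    if files_per_sec_per_job <= 0 or jobs < 1:
        raise ValueError("rate must be positive and jobs must be >= 1")
    return files / (files_per_sec_per_job * jobs)


one_job = scan_seconds(1_000_000_000, 226_517)
four_jobs = scan_seconds(1_000_000_000, 226_517, jobs=4)  # assumes linear scaling
print(f"1 job: {one_job:,.0f} s ({one_job / 60:.1f} min)")      # 1 job: 4,415 s (73.6 min)
print(f"4 jobs: {four_jobs:,.0f} s ({four_jobs / 60:.1f} min)")  # 4 jobs: 1,104 s (18.4 min)
```

In other words, a single job would get through the whole billion-file export in a little over an hour at the quoted rate.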
B&F: What characterises Intellimove?
Igneous: Data movement is about keeping networks full; Igneous IntelliMove is about using concurrency and intelligent handling of all file sizes to keep networks full.
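The idea of using concurrency to keep the network full can be illustrated with a generic thread-pool copier. The sketch below is not IntelliMove, whose internals Igneous has not described beyond this answer. It simply runs many copies in parallel and submits the largest files first (a longest-job-first heuristic), so that one big file does not straggle on its own at the end while small files fill in around the big transfers. The name `copy_tree_concurrently` and the default worker count are hypothetical.

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import List, Tuple


def copy_tree_concurrently(src: str, dst: str, workers: int = 16) -> Tuple[int, int]:
    """Copy every file under src to dst using a thread pool.

    Files are submitted largest-first so big transfers start early and
    small files fill the remaining worker slots. Returns (files, bytes).
    """
    jobs: List[Tuple[int, str, str]] = []
    for dirpath, _dirs, names in os.walk(src):
        for name in names:
            s = os.path.join(dirpath, name)
            d = os.path.join(dst, os.path.relpath(s, src))
            jobs.append((os.path.getsize(s), s, d))
    jobs.sort(reverse=True)  # largest files first

    def copy_one(size: int, s: str, d: str) -> int:
        os.makedirs(os.path.dirname(d), exist_ok=True)  # safe under concurrency
        shutil.copy2(s, d)  # copies data plus timestamps and permission bits
        return size

    files = total = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(copy_one, *job) for job in jobs]
        for fut in as_completed(futures):
            total += fut.result()  # re-raises any copy error
            files += 1
    return files, total
```

Threads suit this workload because each copy spends most of its time waiting on I/O. A latency-aware mover, as Igneous describes IntelliMove, would presumably also adjust concurrency to the source system’s response times, which this sketch does not attempt.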
B&F: What’s your view of your competition?
Igneous: [We don’t] see many competing in our space of UDMaaS. These customers recognize best of breed in UDMaaS for performance, scale, and as-a-Service delivery. Very few comparisons on performance exist publicly.
However, a few vendors have been trying to claim file capabilities for data protection. Most recently Rubrik did a webinar where it talked about protection performance with Pure Flashblade.
The best rate they talked about was 40,000 files/sec for SCAN only, with no mention of the compare rate needed to determine change. The largest fileset tested had 3.125M files in a balanced directory structure.
They mentioned the impact to Pure Flashblade was 10ms – this is a huge impact for an all-flash array.
They talked about breaking up large exports with hundreds of millions of files into more exports, but never gave numbers for 750 million files. Igneous does not believe data protection should drive application requirements, and all customers have that “problem” export that has hundreds of millions of files.
We believe this shows their market is one where structured data protection is the dominant priority and the customer has only “some” files.
B&F: What about Cohesity?
Igneous: They appear to be going after protecting structured data, and replacing NetApp as a primary NAS data source. If they are successful in the latter, Igneous will treat them like any other data source for Igneous DataDiscovery, Igneous DataFlow and Igneous DataProtect.
So far, so very good. With the DataDiscovery metadata farming facility and its InfiniteIndex metadata store, Igneous has built a foundation to add many more data services.
The rationale is that customers have so many heterogeneous data stores that no one supplier’s data management system can manage everything. Hence a separate supplier’s dedicated product is needed. This is similar to the Hammerspace view of the world.
We also think Komprise’s metadata handling for file lifecycle management will bring it into play in the same market, and IBM will enter it with Spectrum Discovery.