The Datadobi StorageMAP v7.2 release adds metadata and reporting facilities to help customers lower costs, reduce their carbon footprint, and track a wider range of object storage data.
StorageMAP software scans and lists a customer’s file and object storage estates, both on-premises and in the public cloud, and can identify orphaned SMB protocol data. There are new archiving capabilities that allow customers to identify and relocate old or inactive data to archive storage, freeing up primary data stores on flash or disk. Datadobi cites a Gartner study, saying: “By 2028, over 70 percent of I&O [infrastructure and operations] leaders will implement hybrid cloud storage strategies, a significant increase from just 30 percent last year.” The implication is that managing a hybrid on-prem/public cloud file and object data estate efficiently will become more important.

CRO Michael Jack stated: “Unstructured data continues to grow at an unprecedented pace, yet many I&O leaders still struggle to gain appropriate levels of visibility and control over their environments.”
StorageMAP comprises a metadata scanning engine (mDSE) with parallelized multi-threaded operations, a metadata query language (mDQL), an unstructured data workflow engine (uDWE), and an unstructured data mobility engine (uDME) that moves data between storage tiers and locations. It works across on-premises and public cloud environments, converts SMB and NFS files to S3 objects, and is deployed as a Linux VM.
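To make the SMB/NFS-to-S3 conversion concrete, here is a minimal sketch of the general technique: each file path becomes a flat S3 key, and intrinsic file attributes travel along as object metadata. The bucket name, mount point, and metadata keys are hypothetical; this is an illustration of the pattern, not Datadobi's uDME implementation.

```python
# Sketch: convert a file tree to S3 objects, carrying intrinsic
# metadata along. Bucket name, mount point, and metadata keys are
# hypothetical; this is not Datadobi's uDME.
from pathlib import Path

import boto3

s3 = boto3.client("s3")
BUCKET = "example-archive"   # hypothetical target bucket
ROOT = Path("/mnt/share")    # hypothetical NFS/SMB mount

for path in ROOT.rglob("*"):
    if not path.is_file():
        continue
    st = path.stat()
    key = path.relative_to(ROOT).as_posix()  # directory path -> flat key
    s3.upload_file(
        str(path), BUCKET, key,
        ExtraArgs={"Metadata": {
            "mtime": str(int(st.st_mtime)),  # preserve modification time
            "uid": str(st.st_uid),           # preserve owner UID
        }},
    )
```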
Datadobi file scanning uses multi-threading to process directory structures in parallel. Because object storage, with its flat address space, has no nested file/directory structure, StorageMAP’s scanning engine splits the object namespace into subsets and scans them in parallel to shorten scan times.
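As a rough illustration of that prefix-splitting approach, the sketch below partitions a flat S3 namespace by leading key character and lists each partition on its own thread. The bucket name and single-character partitioning scheme are assumptions for illustration; Datadobi's actual engine is not public.

```python
# Sketch: split a flat object namespace into prefix subsets and scan
# them in parallel. Bucket and partitioning scheme are hypothetical.
import concurrent.futures
import string

import boto3

s3 = boto3.client("s3")
BUCKET = "example-bucket"  # hypothetical bucket

def scan_prefix(prefix: str) -> int:
    """List every object whose key starts with one prefix subset."""
    count = 0
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=prefix):
        count += len(page.get("Contents", []))
    return count

# One subset per leading character, scanned concurrently.
prefixes = list(string.ascii_lowercase + string.digits)
with concurrent.futures.ThreadPoolExecutor(max_workers=16) as pool:
    total = sum(pool.map(scan_prefix, prefixes))
print(f"objects scanned: {total}")
```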
New metadata facilities in v7.2 let customers use StorageMAP tags to track costs, carbon emissions, and other attributes with greater precision.
v7.2 introduces automated discovery for Dell ECS and NetApp StorageGRID object stores, enabling customers to identify their tenants and associated S3 buckets. It also extends orphan data functionality, previously limited to SMB, to NFS environments, identifying and reporting on data not currently owned by any active employee; the feature now covers data accessed over both the SMB and NFS protocols.
The new software can find and classify data suitable for GenAI processing, “enabling businesses to feed data lakes with relevant, high-quality datasets” for use in retrieval-augmented generation (RAG). An enhanced licensing model lets customers scale their use of StorageMAP’s features according to their specific requirements.
Datadobi told us that, today, data is found based on customer-supplied criteria covering both intrinsic metadata and assigned, enriched metadata such as StorageMAP tags. Customers can search for files satisfying certain metadata properties: last access time, owner (active or deactivated), file path, file type, and so on. This is what Datadobi calls “intrinsic metadata”, i.e. metadata that comes with the file (or object) in the storage system. These metadata sources help identify which data is not relevant for feeding into AI training or for use in RAG-based queries.
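As a simple illustration of an intrinsic-metadata search, the sketch below walks a file tree and flags files by last access time, owner, and type, using only stat() fields. The cutoff, extension list, and mount point are hypothetical, and this is ordinary POSIX metadata filtering rather than StorageMAP's mDQL.

```python
# Sketch: search files on intrinsic metadata only (last access,
# owner, path, type). Thresholds and paths are hypothetical.
import os
import time
from pathlib import Path

CUTOFF = time.time() - 3 * 365 * 24 * 3600   # ~3 years, hypothetical
DOC_TYPES = {".pdf", ".docx", ".txt"}        # hypothetical type filter

def candidates(root: str):
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = Path(dirpath, name)
            st = path.stat()
            # Intrinsic metadata comes straight from the filesystem.
            if st.st_atime < CUTOFF and path.suffix.lower() in DOC_TYPES:
                yield path, st.st_uid

for path, uid in candidates("/mnt/share"):   # hypothetical mount point
    print(f"{path} (owner uid {uid}) matches the criteria")
```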
In the future, StorageMAP will employ algorithms that examine patterns in metadata to flag potentially interesting files/objects as candidates for deep scanning, which analyzes their content. Output from that analysis can result in tags being assigned, and can also guide which files/objects to copy to storage feeding GenAI, agentic AI, or other next-generation applications. The core problem with deep scanning massive numbers of files/objects is the time the scans require, so creative methods are needed to constrain deep scans to subsets of data from which meaningful insight can be derived in a timely fashion. Solutions that execute brute-force scans of billions of files/objects would run for years, which is not tenable.
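The two-stage idea can be sketched as a pipeline in which cheap metadata heuristics pick a small candidate set, and only those files receive the expensive content scan. The looks_interesting() heuristic and the classify_content() placeholder are hypothetical stand-ins, not StorageMAP internals.

```python
# Sketch: metadata pre-filter (cheap) followed by a deep content scan
# (expensive) on the surviving candidates only. All heuristics here
# are hypothetical stand-ins.
from pathlib import Path

def looks_interesting(path: Path) -> bool:
    """Stage 1: metadata-only heuristic (type plus size window)."""
    size = path.stat().st_size
    return path.suffix.lower() in {".csv", ".json"} and 1_000 < size < 50_000_000

def classify_content(path: Path) -> str:
    """Stage 2: deep scan placeholder; a real system would parse the
    content here and assign tags from the result."""
    with path.open("rb") as f:
        head = f.read(4096)
    return "tabular" if b"," in head else "unknown"

def pipeline(root: str):
    for path in Path(root).rglob("*"):
        if path.is_file() and looks_interesting(path):
            yield path, classify_content(path)

# The expensive stage touches only the pre-filtered subset, so the
# pipeline avoids brute-force scanning of the whole estate.
for path, tag in pipeline("/mnt/share"):     # hypothetical mount point
    print(path, "->", tag)
```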
Bootnote
The Gartner study is titled “Modernize File Storage Data Services with Hybrid Cloud.” The report has three recommendations:
- Implement hybrid cloud data services by leveraging public cloud for disaster recovery, burst capacity, burst processing, and storage standardization.
- Build a three-year plan to integrate your unstructured file data with the public cloud infrastructure as a service (IaaS) to match your objectives, SLAs, and cost constraints.
- Choose a hybrid cloud file provider based on its ability to deliver additional value-add services, such as data mobility, data analytics, cyber resilience, life cycle management, and global access.