
VDURA: AI training and inference needs optimized file and object balance

Assertions that object storage, rather than file storage, is best for AI training and inference have sparked much interest across the storage world. VAST Data co-founder Jeff Denworth and Microsoft AI Infrastructure Architect Glenn Lockwood have both put forward this point of view. Hammerspace Marketing SVP Molly Presley disagrees, and so does VDURA CEO Ken Claffey.

VDURA provides a parallel file system for supercomputing and for institutional and enterprise HPC. Ken Claffey thinks the file-versus-object data access debate in the AI training and inference markets is misplaced. He believes both have their roles, and he discussed that with us in an interview.

Blocks & Files: What started you thinking about this issue?

Ken Claffey, VDURA

Ken Claffey: VAST Data’s Jeff Denworth recently made a bold claim that “no one needs a file system for AI training” and that S3-based object storage is the future. While it’s true that AI workloads are evolving, the assertion that file systems are obsolete is misleading at best. 

Blocks & Files: What do you think are the realities of AI storage needs and the role of parallel file systems in high-performance AI training at scale?

Ken Claffey: At VDURA, we don’t see AI storage as a binary choice between file and object. Our architecture is built on a high-performance object store at its core, with a fully parallel file system front end. This means users get the best of both worlds: the scalability and durability of object storage with the high-performance access required for AI training.

With our latest v11 release, we have further enhanced our platform by integrating a high-performance distributed key-value store. This new addition optimizes metadata operations and enables ultra-fast indexing, further enhancing AI and HPC workloads. Additionally, VDURA provides a high-performance S3 interface that allows seamless access to the same files and data across both file and object protocols. This ensures maximum flexibility and investment protection for enterprises scaling AI infrastructure.

Blocks & Files: Does object storage have a role here?

Ken Claffey: Glenn Lockwood from Microsoft Azure recently argued that large-scale AI language models are increasingly trained with object storage, rather than file storage. His perspective aligns with a growing shift toward object-based architectures, but it’s important to examine the nuances of AI training workflows before jumping to conclusions.

Lockwood outlines the four major phases of AI model training:

  1. Data ingestion: Collecting vast amounts of unstructured data, best suited for object storage due to its immutability and scalability.
  2. Data preparation: Transforming and cleaning the data, which is largely an in-memory and analytics-driven task.
  3. Model training: Running tokenized data through GPUs and checkpointing model weights, requiring fast storage access.
  4. Model deployment and inferencing: Distributing trained models and handling real-time queries, often optimized through key-value stores.

While Lockwood asserts that parallel file systems are not required for these workloads, his argument centers around cost-effectiveness rather than raw performance. Object storage is well suited for data ingestion and preparation due to its scale and cost efficiency. However, for model training and real-time inferencing, a hybrid approach – like VDURA’s – delivers the best of all worlds.

Blocks & Files: What is Nvidia’s perspective on this as you see it?

Ken Claffey: As Nvidia releases next-generation GPUs and DGX platforms, it continues to emphasize high-performance storage requirements. According to Nvidia’s own guidance for DGX, the leading AI platform, the recommended storage configuration is:

  • “High-performance, resilient, POSIX-style file system optimized for multi-threaded read and write operations across multiple nodes.”

Did we miss the S3 requirement? Nowhere does Nvidia state that AI training should rely solely on object storage. In fact, their own high-performance AI architectures are designed around file systems built for multi-threaded, high-throughput access across distributed nodes.

Blocks & Files: Is checkpointing encouraging object storage use?

Ken Claffey: Denworth referenced Nvidia’s “S3 Checkpointer” as evidence of a shift toward object storage for AI training. However, he conveniently left out a critical detail. The very next part of Nvidia’s own documentation states: “The async feature currently does not check if the previous async save is completed, so it is possible that an old checkpoint is removed even when the current save fails.”

What does this mean in practice? Using async checkpointing may result in a recovery point further back in time. This significantly reduces the reliability of checkpoints and increases the risk of lost training progress. The value of synchronous, consistent checkpointing cannot be overstated – something that parallel file systems have been optimized for over decades.
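
To make the trade-off concrete, here is a minimal, generic sketch of synchronous versus fire-and-forget asynchronous checkpoint pruning. It is not Nvidia’s S3 Checkpointer; the file naming scheme and the use of torch.save are illustrative assumptions only.

```python
import os
import threading

import torch


def save_checkpoint_sync(model, step, keep_last=1):
    """Synchronous checkpointing: prune the old checkpoint only after
    the new one has been written successfully."""
    torch.save(model.state_dict(), f"ckpt_{step}.pt")   # blocks until the write completes
    old = f"ckpt_{step - keep_last}.pt"
    if step >= keep_last and os.path.exists(old):
        os.remove(old)                                   # safe: the new checkpoint is on disk


def save_checkpoint_async(model, step, keep_last=1):
    """Async variant: the save runs in the background, but the old checkpoint
    is pruned immediately. If the background save later fails, the most recent
    good checkpoint may already be gone - the risk the Nvidia docs flag."""
    state = {k: v.detach().clone() for k, v in model.state_dict().items()}
    threading.Thread(target=torch.save, args=(state, f"ckpt_{step}.pt")).start()
    old = f"ckpt_{step - keep_last}.pt"
    if step >= keep_last and os.path.exists(old):
        os.remove(old)                                   # risky: the new save is unconfirmed
```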

Blocks & Files: How are you optimizing VDURA storage?

Ken Claffey: Rather than framing the debate as “file vs object,” VDURA has built a solution that integrates:

  • A high-performance object store to handle large-scale data ingestion and archival efficiently.
  • A fully parallel file system front-end to optimize AI model training with low-latency, high-bandwidth access.
  • A distributed key-value store to accelerate metadata lookups, vector indexing, and inferencing.
  • A high-performance S3 interface ensuring multi-protocol access across AI workflows.

This architecture addresses Lockwood’s concerns while also meeting the needs of enterprises that demand the highest levels of performance and scalability. While object storage plays a key role, dismissing parallel file systems entirely ignores the practical realities of AI training at scale.

Blocks & Files: How do you see the future for AI storage?

Ken Claffey: Denworth and Lockwood both make strong cases for object storage, but they downplay the performance-critical aspects of AI training. The future of AI storage is hybrid:

  • Parallel file systems provide the speed and efficiency necessary for training.
  • Object storage is useful for archival, sharing, and retrieval workloads.
  • Multi-protocol solutions bridge the gap, but that doesn’t mean file systems are obsolete – far from it.
  • High-performance distributed key-value stores enhance metadata management and indexing, further optimizing AI workflows.

VDURA’s approach acknowledges this reality: a high-performance object store at its core, a fully parallel file system front-end, an integrated key-value store, and a high-performance S3 interface – all working together to deliver unmatched efficiency for AI and HPC workloads. Unlike VAST’s claim that object storage alone is the future, we recognize that AI training at scale requires the best of all storage paradigms.

Enterprises deploying AI at scale need a storage infrastructure that actually meets performance requirements, not just theoretical flexibility. While object storage plays a role, parallel file systems remain the backbone of high-performance AI infrastructure, delivering the speed, consistency, and scale that today’s AI workloads demand.

The industry isn’t moving away from file systems – it’s evolving to embrace the best combination of technologies. The question isn’t “file or object,” but rather, “how do we best optimize?” At VDURA, we’re building the future of AI storage with this balance in mind.

Storage news ticker – February 7

Alluxio Enterprise AI v3.5 speeds AI model training and streamlines operations with a new Cache Only Write Mode to accelerate checkpoints, advanced cache management, and enhanced Python SDK integrations. TTL (time-to-live) Cache Eviction Policies allow admins to enforce TTL settings on cached data, ensuring less frequently accessed data is automatically evicted based on defined policies. Alluxio’s S3 API now supports HTTP persistent connections (HTTP keep-alive), TLS encryption, and multi-part upload (MPU). MPU splits files into multiple parts and uploads each part separately to improve throughput for large files.
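
For context on MPU, the sketch below shows the generic S3 multi-part upload flow using the boto3 client; it is not Alluxio-specific, and the endpoint, bucket, key, and part size are hypothetical.

```python
import boto3

# Generic S3 multi-part upload: split a large file into parts, upload each
# part separately, then stitch them together with a completion call.
s3 = boto3.client("s3", endpoint_url="http://s3-gateway.example:9000")  # hypothetical endpoint
bucket, key = "training-data", "corpus.bin"                              # hypothetical names
part_size = 64 * 1024 * 1024                                             # 64 MiB parts

mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)
parts, part_number = [], 1
with open("corpus.bin", "rb") as f:
    while chunk := f.read(part_size):
        resp = s3.upload_part(Bucket=bucket, Key=key, PartNumber=part_number,
                              UploadId=mpu["UploadId"], Body=chunk)
        parts.append({"PartNumber": part_number, "ETag": resp["ETag"]})
        part_number += 1

s3.complete_multipart_upload(Bucket=bucket, Key=key, UploadId=mpu["UploadId"],
                             MultipartUpload={"Parts": parts})
```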

Data compression startup Atombeam has raised $20 million in an A+ funding round, bringing total funding to $35 million. Atombeam’s codeword technology uses cryptography and compression to translate raw data into a set of codewords within an AI/ML-created codebook. Data is transmitted as codewords and the receiver decrypts and decompresses it. Atombeam has been issued more than 84 patents for its technology, with an additional 115 pending. There are similarities between Atombeam’s codewords and the now extinct Formulus Black/Symbolic IO‘s bit markers. LinkedIn shows Atombeam founder Asghar Riahi was a Master Technologist at HP from 1999 to 2012. SymbolicIO founder Brian Ignomirello was StorageWorks CTO at HP from Dec 2007 to Dec 2011. The two overlapped at HP over a four-year period. 

AIOps automation developer CloudFabrix is changing its name to Fabrix.ai and rolling out a framework for agentic AIOps. Agents need to be developed, deployed, orchestrated, and managed, and Fabrix.ai has developed a framework for this. The main components are:

  • Agent Orchestration and Lifecycle Management 
  • AI Guardrails 
  • Managing Data and Action Privileges for Agents 
  • Visibility and Observability of Agents 
  • Agent Quality Control and Assurance 
  • Reasoning LLMs 

There is an AI Fabric – “an AI agent-driven distributed orchestrator that enables customers to securely build, deploy, and manage Agents’ lifecycles, ensuring guardrails and quality controls. It integrates with disparate large and small models, curated datasets, and automation to drive Agentic Workflows.”

Wendy Stusrud

DDN has appointed Wendy Stusrud as VP for Worldwide Channel Sales. She comes from being VP Global Partner Sales at Pure Storage for more than three years.

In-memory computing clustered node supplier GridGain says its new GridGain for AI enables customers to use GridGain as a low-latency data store for real-time AI workloads. Real-time AI requires low-latency data access to ensure fast retrieval of inputs, such as features and embeddings for inference, enterprise- and user-specific context to augment LLM queries, prediction caching to reduce computation, and dynamic model loading. GridGain’s pitch for AI is a single, distributed platform delivering low-latency performance, scalability, and reduced integration overhead to streamline deployments and improve overall system efficiency for modern AI applications.

Hitachi Vantara has won speciality pharmaceutical firm FarmaMondo as a customer for its Virtual Storage Platform One (VSP One) Block product, upgrading from an existing Hitachi VSP G200 storage system.

Erik Frieberg

Streaming log data supplier Hydrolix has shipped a new Apache Spark connector for Databricks, designed to make it easy to replace current log data infrastructure with Hydrolix. Now Databricks users can “economically store full-fidelity event data, such as logs, in Hydrolix and rapidly extract information from both real-time and historical data.”

Open-source object storage supplier MinIO has appointed a new CMO, Erik Frieberg. Former CMO Jonathan Symons, appointed in 2019, has moved to a full-time advisor role. Frieberg has been SVP marketing at Pantheon for the past 15 months and has held CMO and SVP roles at VMware, Puppet, and MongoDB, according to his LinkedIn CV.

Tom Murch

Quobyte has appointed Tom Murch as the new regional director of sales to head its New York office. Murch has held director-level positions at Penguin Computing, Toshiba, CiaraTech, Smart Storage, and Sanmina. By expanding into New York, Quobyte is trying to strengthen its presence in the financial sector where organizations need high-performance, scalable storage to support demanding workloads. 

Eric Soffin

Multi-site file collaboration vendor Resilio has appointed Eric Soffin as VP Sales. He comes from being VP worldwide Sales at Nasuni for more than five years.

Object storage supplier Scality has appointed Emilio Roman as its new global CRO. He comes from being SVP global sales and channels at BitDefender, where he spent five years. Before that, he was SVP EMEA, APAC and global alliances at … Scality. Roman’s appointment follows “the highly impactful tenure of Peter Brennan, who built the company’s sales and channel organisation to achieve consecutive years of record growth.”

Emilio Roman.

Brennan has left for a senior sales leadership role with an unnamed network technology company, and will remain a member of Scality’s Advisory Board. Scality said: “We are thrilled to welcome Emilio back to the Scality family. Emilio’s extensive cybersecurity experience and outstanding leadership skills make him the ideal choice to lead our global sales efforts.”

SW RAID shipper Xinnor has won the University of Pisa as a customer. The university used E4 Computer Engineering to integrate Xinnor’s xiRAID with BeeGFS, which ships data to and from Nvidia DGX servers. The system includes dual storage nodes, each powered by xiRAID in a RAID 6 configuration, built to deliver a fail-safe environment for large-scale AI operations. Read speeds reach 29.2 GBps and write speeds 25.8 GBps in tests involving up to 128 processes per node.

 

Xinnor BeeGFS University of Pisa setup

Hammerspace challenges object storage norms for AI

Data orchestrator Hammerspace is challenging the conventional wisdom that object storage is the optimal solution for AI training and inference, arguing that universal, protocol-agnostic data access is far more crucial.

In a sense, that would be natural as Hammerspace has AI model training customers, such as Meta. Its technology is based on parallel NFS and it supports Nvidia’s GPUDirect fast file access protocol. However, Hammerspace supports S3 data access as well as file access. It has a partnership with object storage supplier Cloudian so that its HyperStore object storage repository can be used by Hammerspace’s Global Data Platform software. HyperStore supports Nvidia’s GPUDirect for object storage, designed to provide faster object access.

Molly Presley, Hammerspace SVP for marketing, discussed the file-vs-object AI topic with Blocks and Files, and moved on to making data suitable for AI processing – vectorization and how data should be organized for the AI LLM/agent era.

Molly Presley, Hammerspace

Blocks & Files: Why is Hammerspace focused on a hybrid data platform instead of just file or object storage?

Molly Presley: In Glenn Lockwood’s article, he calls out the pain points of parallel file systems due to their proprietary nature and needing specialized headcount. This is a huge reason why Hammerspace, with over 2,400 contributions to the Linux kernel, is so focused on a standards-based data platform. The choice for customers is not limited to just object storage if they need standards-based access without proprietary clients and silos.

It’s not about choosing between file systems and object storage interfaces; the conversation is also about scalability, efficiency at scale, understanding data sources, and seamlessly orchestrating data regardless of its format.

Focusing solely on storage interfaces and file vs object storage trivializes the complexity of today’s AI demands. Each workload has different performance requirements, is connected to different applications with different storage interface requirements, and may use data sources from a wide variety of locations. The optimal platform delivers performance through orchestration, scalability, and intelligent workload-specific optimizations.

Blocks & Files: Are AI infrastructure purchase decisions primarily being made around training workloads?

Molly Presley: No. As organizations assess their AI investments, they are thinking beyond just training. Data architecture investments for most organizations need to accommodate far more than training. They need to span inference, RAG, real-time analytics, and more. Each requires specific optimizations that go beyond generic, one-size-fits-all storage systems. A data platform is needed and must adapt to each phase of AI workloads, not force them into outdated storage paradigms.

A data platform must provide real-time data ingestion (aka data assimilation), intelligent metadata management, security, and resilience. Storage interfaces alone don’t solve the full challenge – data must be fluid, orchestrated, and dynamically placed for optimal performance across workloads.

Blocks & Files: We have been concerned about the spread of LLMs as that implies the LLMs need access, in principle, to an organization’s entire data estate. Will an organization’s entire data estate need to be vectorized? If not all, which parts? Mission-critical, near-time, archival?

Molly Presley: At Hammerspace, we don’t see vectorization as the immediate challenge or top-of-mind concern for buyers and architects – it’s global access and orchestration. Organizing data sets, ensuring clean data, and moving data to available compute are much more urgent in today’s training, RAG, and iteration workloads.

The need to vectorize an organization’s entire data estate is highly use-case and industry-specific. While the answer varies, full vectorization is typically unnecessary. Mission-critical and near-time data are the primary candidates, while archival data can be selectively sampled to identify relevance or patterns that justify further vectorization.

The key to effective implementation is enabling applications to access all data across storage types at a metadata control plane level – without requiring migrations or centralization. This ensures scalability and efficiency.

Blocks & Files: Will an organization’s chatbots/AI agents need, collectively and in principle, access to its entire data estate? How do they get it?

Molly Presley: Chatbots and AI agents typically don’t need access to an organization’s entire data estate – only a curated subset relevant to their function. Security and compliance concerns make unrestricted access impractical. Instead, leveraging global data access with intelligent orchestration ensures AI tools can access the right data without uncontrolled sprawl.

Even if an organization vectorized everything, the resulting data store would be near-real-time, not truly real-time. Performance is constrained by update latency – vector representations are only as current as their latest refresh. API integration and fast indexing can help, but real-time responsiveness depends on continuous updates. Hammerspace’s relevant angle remains metadata-driven, automated orchestration rather than full-scale vectorization.

Blocks & Files: Will the prime interface to data become LLMs for users in an organization that adopts LLM agents?

Molly Presley: Good question. LLMs are rapidly becoming an important interface for data in organizations adopting AI agents. Their ability to process natural language and provide contextual insights makes them a powerful tool for accessibility and decision-making. However, they won’t replace traditional BI and analytics tools – rather, they will integrate with them. Enterprises require structured reporting, governance, and auditability, which remain best served by established standards. The near-term (next few years at least) future lies in a hybrid approach: LLMs will enhance data interaction and discovery, while enterprise-grade analytics tools ensure precision, compliance, and operational control. 

Blocks & Files: In a vector data space, do the concepts of file storage and object storage lose their meaning?

Molly Presley: File and object storage don’t disappear; they evolve. In a vector data space, data is accessed by semantic relationships, not file paths or object keys. However, storage type still matters in terms of performance, cost, and scale.

Blocks & Files: Will we see a VQL, Vector Query Language, emerge like SQL?

Molly Presley: Yes, a Vector Query Language will emerge, though it may not take the exact form of SQL. Standardization is critical. Just as SQL became the universal language for structured data, vector search will need a standardized query language to make it more accessible and interoperable across tools and platforms.

APIs and embeddings aren’t enough. Right now, vector databases rely on APIs and embedding models for similarity search, but businesses will demand more intuitive, high-level query capabilities as adoption grows. Hybrid queries will be key. Future AI-driven analytics will need queries that blend structured (SQL) and unstructured (VQL) data, allowing users to seamlessly pull insights from both.
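
No VQL standard exists yet, but the hybrid pattern can already be approximated today. The Python sketch below assumes a Postgres database with the pgvector extension installed; the table, columns, and connection string are hypothetical and purely illustrative.

```python
import psycopg2  # assumes a Postgres instance with the pgvector extension

conn = psycopg2.connect("dbname=analytics")      # hypothetical connection string
query_embedding = [0.12, -0.08, 0.33]            # would normally come from an embedding model

with conn.cursor() as cur:
    # A structured predicate (classic SQL) combined with vector-similarity
    # ordering (pgvector's <-> distance operator) in a single statement.
    cur.execute(
        """
        SELECT doc_id, title
        FROM documents
        WHERE region = %s
        ORDER BY embedding <-> %s::vector
        LIMIT 10
        """,
        ("EMEA", "[" + ",".join(str(x) for x in query_embedding) + "]"),
    )
    results = cur.fetchall()
```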

Blocks & Files: Can a storage supplier provide a data space abstraction covering block, file, and object data?

Molly Presley: Some storage vendors can abstract storage types across file and object, and some offer block as well – but that’s not a true global data space. They create global namespaces within their own ecosystem but fail to unify data across vendors, clouds, and diverse formats (structured, unstructured, vectorized).

Standards are a critical part of this conversation as well. Organizations are typically unwilling to add software to their GPU servers or change their approved IT build environments. Building the data layer client interface into Linux as the most adopted OS is critical, and using interfaces like pNFS, NFS, and S3, which applications natively write to, is often mandated.

A global data space is about universal access, not just storage abstraction. It must integrate rich metadata, enable advanced analytics, and orchestrate data dynamically – without migrations, duplication, or vendor lock-in.

Bottom line: storage type is irrelevant. Without true global orchestration, data stays siloed, infrastructure-bound, and inefficient.

Blocks & Files: How do we organize an organization’s data estate and its storage in a world adopting LLM-based agents?

Molly Presley: We need a tiered approach to data, organized not in traditional HSM (Hierarchical Storage Management) terms of time, but with rich contextual relevance to automate orchestration of curated subsets of data non-disruptively from anywhere to anywhere when needed. 

Focus on the data, not the storage. Especially in LLM-based ecosystems, the storage type is opportunistic and workflow-driven. All storage types have their uses, from flash to tape to cloud. When the type of storage is abstracted with intelligent, non-disruptive orchestration, then the storage decisions can be made tactically based on cost, performance, location, preferred hardware vendor, etc. 

Unified access via standard protocols and APIs that can bridge all storage types and locations. This provides direct data access, regardless of where the data is today, or moves to tomorrow. In this way, data is curated in place so that applications can access the relevant subset of the data estate without requiring disruptive and costly migrations.

There is rich metadata in files and objects that typically goes unused in traditional storage environments. Custom metadata, semantic tagging, and other rich metadata can be used to drive more granularity in the curation of datasets. Combining this metadata in the global file system to trigger automated data orchestration minimizes unnecessary data movement, reduces underutilized storage costs, and improves accuracy and contextual insights for LLM-based use cases.
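
As a toy illustration of metadata-driven curation (the tag names and in-memory catalog below are hypothetical and stand in for a real global file system’s metadata layer):

```python
# Select files for orchestration by custom metadata tags rather than by path
# or storage tier. Tag names and values are hypothetical.
catalog = [
    {"path": "/projects/alpha/scan_001.tif", "tags": {"modality": "mri", "pii": False}},
    {"path": "/projects/alpha/notes.docx",   "tags": {"modality": "text", "pii": True}},
    {"path": "/projects/beta/scan_777.tif",  "tags": {"modality": "mri", "pii": False}},
]

def curate(catalog, **required_tags):
    """Return the subset of the estate whose tags match the requested values."""
    return [entry["path"] for entry in catalog
            if all(entry["tags"].get(k) == v for k, v in required_tags.items())]

# Orchestrate only non-PII MRI data for a training run, wherever it lives.
print(curate(catalog, modality="mri", pii=False))
```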

Data mobility and the ability to scale linearly are essential. LLM workflows inevitably result in data growth but, more importantly, may require cloud-based compute resources when local GPUs are unavailable. Modern organizations must put their data in motion without the complexity and limitations of traditional siloed and vendor-locked storage infrastructures.

WEKA restructures for the GenAI era

Scale-out file system provider WEKA is laying off around 50 employees, according to sources, as it restructures its go-to-market functions for the era of generative AI.

Liran Zvibel, WEKA

CEO Liran Zvibel blogged about the shake-up, saying privately owned WEKA enjoyed a “milestone” 2024 in which it raised $140 million in E-round funding, giving it a valuation of $1.6 billion, and surpassed $100 million in annual recurring revenue.

However, WEKA’s market is changing from traditional HPC and enterprise analytics due to generative AI. Zvibel says: “The generative AI and enterprise AI markets have continued to explode. The rate of innovation by AI industry titans – many of whom are WEKA’s partners and customers – has been astonishing, creating a once-in-a-generation opportunity for commercial enterprises, governments, and research organizations alike.”

He thinks WEKA has to reflect this so it can exploit the opportunity, saying “the token economy is here”. A token is a unit of data processed by AI models, typically after being converted into a vector representation.

The competition to supply data and provide data pipelines for GenAI model training and inference is intense. Examples include DDN’s March 17 AI Data Summit announcement, VAST Data with its data infrastructure engineered for AI, and all the mainstream storage suppliers piling into the market along with the rush of fast object storage suppliers, such as Cloudian, MinIO, and Scality. And then there are the data managers and orchestrators – Arcitecta, Hammerspace, Komprise, and others – building AI-focused data pipelines. Across the storage industry, generative AI is driving product development.

Zvibel says: “To accelerate innovation with our customers and partners, we have initiated a strategic restructuring of our go-to-market functions. While change is difficult, we are confident that this will position WEKA and our customers for long-term success as we navigate the dynamic and rapidly evolving AI market.”

Go-to-market (GTM) functions can include marketing, product management, sales, channel strategy and management, customer success, partnerships and alliances, and other aspects such as revenue operations.

B&F understands roughly 50 employees are affected and WEKA expects to grow headcount by approximately 120 in the coming year. Specifically, the company told B&F: “This is not a retraction in headcount; this is a strategic realignment of our people and investments to focus our resources on pursuing large-scale enterprise AI and GPU acceleration deployments worldwide. We are continuing to hire across the business – including our GTM functions.” 

WEKA positions
A sample of open WEKA slots.

WEKA currently has some 75 open slots globally in Customer Success, Sales, R&D, G&A, Product Management, and Marketing. Zvibel says: “In the coming year, we will deliver bold innovation to our customers, scale our investments in R&D, product innovation, and customer success, and expand our team across all business functions to accelerate our growth trajectory.”

Qumulo takes on mainstream data fabric suppliers

Qumulo has launched Cloud Data Fabric (CDF), a central file and object data core repository with coherent cache at the edge.

Scale-out file system supplier Qumulo has unified its on-premises and public cloud software under CDF, including a distributed file and object data core storage cluster that runs on systems from most hardware vendors or on public cloud infrastructure. This is accessed by coherent caching sites at the edge, which connect in parallel to the elements of the data core. Strict consistency between the core and edge sites comes from file system awareness, block-level replication, distributed locking, access control authentication, and logging.

Qumulo president and CEO Douglas Gourlay said: “In 2012, Qumulo set out to build the world’s most advanced file system…From the datacenter to the cloud, unbound by legacy limitations, we envisioned a future where everything is available, everywhere, instantaneously. 

Doug Gourlay, Qumulo

“Our clients can now do magical and amazing things – from bringing together artists and storytellers from around the globe to work on a feature film, to sharing cutting-edge medical research with top physicians, to harvesting data from fleets of autonomous vehicles, making the roads safer.” 

The core global file system operates on all data – in the public cloud and on-premises datacenters – as large elastic pools, enabling hierarchical storage management, tiering, and replication without affecting user access or data location. This “frees applications and data to scale and move independently, supporting seamless growth and evolution.”

Together with the strictly consistent edge sites, this allows “collaboration with diverse tools and applications.”

Aaron Passey, Qumulo

Qumulo says it has more than 1,000 production clients and exabytes of data under management. Co-founder and chief architect Aaron Passey said: “Yesterday’s file systems were not built for today’s workflows and rapid data growth. We’re freeing users from the constraints of proprietary hardware and operating systems allowing choice and flexibility, enabling innovation without limits – Qumulo users can now develop breakthroughs using data from any source, on any infrastructure.”

Qumulo’s Cloud Data Fabric is available worldwide through major IT infrastructure resellers and system vendors including HPE and Supermicro, distributors, and most major public clouds, with both prepay and pay-as-you-go options. Pricing is based on the actual data stored and shared across the data core.

DataCore buys Arcastream parallel file system from Kalray

Software-defined storage provider DataCore is buying the Arcastream parallel file business from French supplier Kalray.

Startup Kalray was founded in 2008 as a fabless semiconductor business spun off from CEA, the French Atomic Energy Commission. It developed MPPA (Massively Parallel Processing Array) chip and card technology, and a data processing unit (DPU) accelerator. It bought all the shares of UK-based Arcapix in January 2022 for around €1 million ($1 million) and gained its software-defined Arcastream storage intended for data-intensive workloads.

Arcastream is a unified system that combines IBM Storage Scale-based software, flash, disk, tape, and cloud storage. Now Kalray is offloading all of its Arcastream assets, including the Ngenea business, to DataCore. Customers include Framestore, Red Bee Media, and Imperial College London, and Arcastream has an ongoing partnership with Dell.

Dave Zabrowski, DataCore

Dave Zabrowski, CEO at DataCore, said in a statement: “Integrating robust file storage capabilities into our portfolio, this acquisition reinforces our role as a universal storage leader – offering block, file, and object storage to support workflows seamlessly across core, edge, and cloud environments.”

DataCore inherits Arcastream’s agreement with Dell, in which its software is integrated into the Dell EMC Ready Solution for HPC PixStor Storage.

Arcastream software is generally sold to customers in the entertainment, media, academia, and HPC markets.

In July last year, DataCore raised $60 million to “fuel the integration of AI technologies”. In theory, there is an AI opportunity for Arcastream to feed data to LLMs. If such workloads run at edge sites, then DataCore’s Perifery business could also benefit.

Kalray is also looking to sell its Ngenea Data Acceleration Platform business, which includes its DPU processors and acceleration cards and associated software. In June last year, merger talks between Kalray and Israeli accelerator card and software startup Pliops were called off. No buyer has yet been announced for the DPU business.

The Arcastream acquisition price was disclosed in a Kalray release, which said it is up to $20 million, comprising $12.5 million in cash, a $2.5 million service contract, and a potential $5 million earn-out.

PEAK:AIO drives UK effort to bring down cost of medicine

PEAK:AIO, a software specialist in data infrastructure for AI and GPU applications, is now involved in the University of Strathclyde’s MediForge Hub, an initiative aiming to “redefine” pharmaceutical manufacturing by reducing raw material usage and waste by 60 percent.

Supported by an £11 million award from the UKRI Engineering and Physical Sciences Research Council (EPSRC) – a UK government body – and led by CMAC, MediForge is a seven-year initiative.

Bringing a single drug to market costs an estimated $200 million, and MediForge aims to address this challenge by creating scalable and sustainable manufacturing systems.

At the foundation of the project is PEAK:AIO’s AI Data Server, delivering real-time analytics, “hyper-fast” data insights, and “integration” with digital twin and cyber-physical systems.

“The team are committed to developing innovative solutions that can accelerate patient access to cost-effective new treatments, that can allow more agile responses to medicine shortages or pandemics,” said Professor Alastair Florence, MediForge project lead and director of CMAC. “Through addressing sustainability at each stage, we can ensure that making medicines does not cost the Earth.”

Mark Klarzynski, PEAK:AIO

The initiative integrates cyber-physical infrastructure that combines AI and robotics, a novel pharmaceutical data fabric designed to centralize information flow, and digital twin technology to optimize manufacturing processes.

PEAK:AIO keeps this system fed with data via GPUDirect, which the supplier says is “crucial” for advancing these technologies. A second PEAK:AIO installation phase is scheduled for March.

Mark Klarzynski, chief strategic officer and co-founder of PEAK:AIO, said: “The evolved world of GPU-driven innovation demands equally evolved infrastructure.”

VDURA talks up energy-efficient HPC systems for utilities

VDURA, a specialist in HPC data infrastructure and management solutions, is to highlight its next-generation data platform at the 18th annual Energy High Performance Computing (HPC) Conference later this month.

The event is hosted by Rice University’s Ken Kennedy Institute in Houston, Texas. VDURA is pledging “acceleration” in energy innovation with faster data processing, data durability, and ease of use for on-premises, cloud, and hybrid environments.

Energy companies face the challenge of managing and analyzing massive datasets to identify energy reserves, optimize renewable solutions, and drive sustainability. The VDURA Data Platform pitches a hybrid architecture that consolidates and accelerates energy exploration and production workflows. We’re told its features “ensure” faster results, reduced operational complexity, and data durability.

VDURA’s hybrid model combines the cost-efficiency of HDDs with the higher performance of SSDs, letting energy companies optimize both operational expenses and workload speed. And VDURA’s advanced algorithms automatically place data on the most efficient and cost-effective storage media, beefing up performance for HPC and AI/ML workloads.
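
As a simplified illustration of access-based placement (a toy policy, not VDURA’s actual algorithms; the names and thresholds are invented):

```python
from dataclasses import dataclass

@dataclass
class DataObject:
    name: str
    accesses_last_7d: int
    size_gb: float

def choose_tier(obj: DataObject, hot_threshold: int = 50) -> str:
    """Place frequently accessed objects on flash, colder bulk data on disk."""
    return "ssd" if obj.accesses_last_7d >= hot_threshold else "hdd"

objects = [
    DataObject("seismic_run_042.h5", accesses_last_7d=180, size_gb=750.0),
    DataObject("archive_2019_survey.tar", accesses_last_7d=2, size_gb=4200.0),
]
for obj in objects:
    print(obj.name, "->", choose_tier(obj))   # seismic run -> ssd, archive -> hdd
```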

According to VDURA, the Data Platform also offers robust protection for critical datasets to reduce concerns about data loss or unplanned downtime, vowing data durability of up to 11 nines. It is designed to handle high-concurrency workloads and massive data volumes, and eliminates data silos, consolidating information into a unified global namespace to improve accessibility, collaboration, and decision-making across teams.

“With intuitive management tools, a single administrator can oversee data environments without requiring specialized HPC expertise, reducing operational overhead,” claimed the company.

In a prepared remark, Zhaobo Meng, founder and CEO of In-Depth GEO, said: “The work we do requires highly complex parallel workflows, and VDURA has exceeded our expectations in delivering the high-performance storage and networking we need to manage the incredible volume of data we process on a daily basis. The VDURA launch and this rare new option to mix and match different storage types in one platform comes at the perfect time for us.”

VDURA Data Platform diagram.

The Data Platform combines proprietary tech like Velocity Layered Operations (VeLO), Virtualized Protected Object Devices (VPODs), and adaptive capacity. Such features let energy organizations more efficiently move, manage, and analyze data across geographically dispersed sites with confidence, we are told.

A VPOD is a discrete, virtualized, and protected storage unit in hybrid storage nodes, which combine flash (SSDs) and hard disk drives (HDDs) for cost-effective, high-performance storage. VPODs operate within a unified global namespace to scale infinitely as more nodes are added to the system. They use a multi-layered approach to data protection, incorporating erasure coding both within each VPOD and across multiple VPODs in a cluster. The company said this provides up to 11 nines of durability (99.999999999 percent).
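
As a back-of-the-envelope illustration of how two erasure-coding layers can compound durability (the probabilities below are hypothetical, not VDURA’s engineering figures, and assume independent failure domains):

```python
# Hypothetical annual data-loss probabilities for the two erasure-coding layers.
p_loss_within_vpod = 1e-6    # the stripe inside a single VPOD cannot be rebuilt
p_loss_across_vpods = 1e-5   # the cross-VPOD layer also cannot recover the data

# With independent layers, both must fail for data to be lost.
p_combined_loss = p_loss_within_vpod * p_loss_across_vpods   # 1e-11
durability = 1 - p_combined_loss

print(f"{durability:.11f}")  # 0.99999999999 -> roughly "eleven nines"
```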

VPODs work in conjunction with VeLO, a key-value store used in the Director layer of the VDURA platform for handling small files and metadata operations. It is intended to deliver efficient data management and high IOPS for AI and HPC workloads.

Rex Tanakit.

Rex Tanakit, vice president of technical services at VDURA, said: “Our participation in the Energy HPC Conference showcases our dedication to enabling the industry to harness data effectively and sustainably.”

In other VDURA news, the firm has been granted a US patent for “collaborative multi-level erasure coding for maximizing durability and performance of storage systems.”

Company developers Scott Milk, Christopher Girard, Shafeeq Sinnamohideen, and Michael Barrell worked on the patented technology.

The Energy High Performance Computing (HPC) Conference takes place from 25 to 27 February 2025.

Panzura’s upgraded software has halfway-house high availability and faster remote office data access

Just two months after Panzura released v8.4 of its CloudFS core technology, it has pumped out v8.5, claiming it delivers instant recovery and “unparalleled resilience.”

CloudFS is the underlying global filesystem technology for Panzura Data Services, which offers cloud file services, analytics, and governance. It has a global namespace and metadata store, plus distributed file locking and Cloud Mirroring, a high-availability (HA) capability providing redundancy and disaster recovery for file data. V8.5 features so-called Adapt technology, providing Instant Node replacement or migration for business continuity, plus localized Regional Stores for faster local access to objects and better performance.

Sundar Kanthadai

Panzura CTO Sundar Kanthadai stated: “The Instant Node feature in CloudFS 8.5 Adapt delivers persistent business operations and continuity. It provides the ability to deploy, migrate, restore, or rebuild any node with unprecedented speed and ease.”

Instant Node attempts to strike a price and performance balance between traditional high-availability setups and “the cost-effectiveness of using readily available local hardware resources, like servers, workstations or virtual machines (VMs), to achieve a high level of resilience.” Panzura says dedicated HA infrastructure needs redundant servers and storage arrays, which cost money to buy and maintain.

The Instant Node feature supports node-to-node restoration for automated hardware upgrades, planned migrations, infrastructure changes, and other IT initiatives. It also has a REST application programming interface (API) for integration with existing and external IT infrastructure and automation tools. 

The Regional Store feature provides accelerated access to Panzura-stored data for outlying users and remote edge sites – “teams located a significant distance from the primary data store.” It provides faster access to data “not yet cached by CloudFS.” Admins can configure up to four object storage buckets across their preferred provider’s cloud regions to cut access latency and avoid building high-speed network links to distant data stores.

The Regional Store buckets are synchronized by the cloud provider, “ensuring data consistency across regions.” AWS has 36 cloud regions worldwide, 8 in the USA, and Azure has over 60 around the globe.

Panzura says this CloudFS v8.5 Adapt capability helps optimize data placement for “high-performance computing, artificial intelligence (AI) pipeline support including large language model (LLM) training, real-time collaboration and co-authoring, and disaster recovery preparedness. Storage costs can be optimized by selecting the most appropriate storage class based on access and usage patterns.”

V8.5 also has extended Role-Based Access Control (RBAC) support with Single Sign-On (SSO) capabilities integrated with Okta as an identity provider. It adds storage tier support for Azure, enabling Azure storage class tiering; such tiering is already available for Panzura on AWS.

Veritas NetBackup data to be stored in Cohesity filesystem and codebase unification coming

Interview: Now that Veritas and Cohesity are one company under the Cohesity brand, there is a combined base of more than 12,000 customers, with 8,000 from Veritas and 4,500 from Cohesity. They have been assured that no customer will be left behind, but what happens now in a product sense?

Sanjay Poonen

We had a conversation with Cohesity CEO Sanjay Poonen to look at that issue, noting that there are Cohesity applications, such as FortKnox and DataHawk, working on top of a Cohesity backup store. Will these applications be extended to work with Veritas backup stores?

Poonen said yes, they would, and that cyber recovery orchestration tools and analytics and AI tools, such as Gaia, will be extended to work with either NetBackup or Cohesity DataProtect data.

We asked if Cohesity would be developing a single management plane to cover both the Cohesity and Veritas environments. Poonen answered: “Yes. In fact, I would take it a step further because with every month and quarter that we’re together now the engineers are spending more time … working on a roadmap.” 

“Our management control plane … Helios, the best in the industry, with ease of use and very API rich, will become the management control plane for both actors.” We could think of them as a Mercedes (Veritas) and a Tesla (Cohesity).

NetBackup from Veritas and Cohesity’s DataProtect will both have a common management plane: “That’s a huge step. Nobody will be able to provide that management consistency to both NetBackup and DataProtect.”

There’s more: “At the bottom of it is the file system and the place where the data is stored. That will become our platform because we left the Veritas file system with Arctera.”

The Veritas customers won’t get left behind: “Certainly we will support our customers. We have the ability to do that with Veritas files for the customers that are using that. But the go-forward platform for where the data is stored is SpanFS from Cohesity.” Poonen said he thinks the SpanFS data platform is the best in the industry.

Veritas NetBackup has a source connector count advantage, with Poonen saying: “One of the advantages NetBackup had was they had hundreds of connectors, almost 500, more than anybody else. We can now bring that common connector framework to DataProtect.”

“Both companies will have connector parity and that gives us tremendous advantage, to take all of those connectors and bring it to both products very quickly. It’s not a complicated project.”

There’s more to come again, with Poonen saying: “The actual backup applications themselves … you can start to containerize that where they actually start to dissolve together. You can then start using components of each other and then you have almost one codebase.”

“This then pretty much gives customers a seamless path to the next generation. … It’s a very exciting time. We think we can unify the codebase of the products very quickly. We’re giving customers the path to the future.”

Cohesity is saying to both its NetBackup and DataProtect customer bases that it can lower their costs. Customers need not worry that, with Veritas and Cohesity joining together, their costs will rise. Poonen said: “Our view is that the total cost of ownership can go down and that’s a very gratifying thing.”

On the generative AI front, we asked if Cohesity is looking at agentic AI. Poonen said: “Yes, absolutely. In fact, we’ve been playing with some of the latest work of Operator AI. We’ll have a little demo that you’ll see in the next week or two. It’s not very difficult to do and our vision is that you could think of Gaia not just as an app but also as an agent framework.”

Interestingly: “The way the name Gaia started was that we were looking for a code name for the project and we called it generative AI app, generative AI agent.”

“It can be an app or an agent and, in fact, you’ll see with the prototype with Operator AI, the agent framework of OpenAI, [it] can drive our management control plane and we’ve been able to get that working. We’re using it also internally for areas like support where we can search documents.” Perhaps in the future customers will be able to use it too.

Pondering the arrival of DeepSeek, Poonen wonders whether the result is a cylinder or hourglass, telling us about his model of this: “We see the AI stack at three layers, roughly three layers. There’s an AI going bottom up, top down. There’s an AI hardware and software infrastructure layer. That’s where Nvidia, both the hardware of Nvidia and the software of Nvidia system, other middleware components, exist.”

“Then the middle is the large language model foundation model … and the top of the stack is AI applications.” He has apps like Salesforce in mind there.

“What the world is trying to figure out is that, in economic value, is this a cylinder or an hourglass? The hourglass means the middle of it is valuable but there’s not a lot of economic value created there. The economic value is created at the bottom for infrastructure companies and hardware players like Nvidia. …and account value created [at the top] by software players like Salesforce.”

He thinks “in that model all of these techniques really drive the company that has the most data.”

So, getting back to the Cohesity-Veritas combo: “We protect hundreds of exabytes of data, which is significantly more than all of our main competitors combined. We think that ultimately creates value. I call that a very similar market to Salesforce.”

“All of these technologies, agent AI and so on, need to find ultimately something that creates economic value. And for the app companies it will be the company that has the data.”

“So companies like Databricks, Cohesity in our world, and then in the app world, companies like SAP and Salesforce, and then at the bottom layer, the hardware companies, I do think all of them, starting with Nvidia, will have value. So that’s how we view the AI framework.”

There are 5,500 employees in the new Cohesity. The three big parts of the two original businesses are being combined: sales, customer support, and R&D. Poonen said: “We’ve got to make that 5,500 employee base really engaged. They’ve got to be excited. I’ve always believed you serve your employees first. They take care of customers, but it always starts with that employee.”

A second priority is to build product innovation: “five or ten x better than any alternative. … And then the third is customer obsession.”

Summary

There is very little overlap between the Veritas and Cohesity customer bases. What we are seeing here is that Veritas customers’ NetBackup-protected data will be stored in the Cohesity filesystem in the future, with a common Veritas-Cohesity management plane being developed around the Helios technology. Software containerization will be used to unify the NetBackup and DataProtect codebases so that a single backup product emerges.

Flash-centric Regatta OLxP database working on faster access for AI agents

Israeli startup Regatta is building a scale-out, transactional (OLTP), analytic (OLAP) relational database (OLxP) with extensibility to semi-structured and unstructured data. The company says it is a drop-in replacement for Postgres and has been designed from day one to support SSD storage. Its architecture is discussed in a blog by co-founder and CTO Erez Webman, formerly CTO of ScaleIO, acquired by EMC in 2013.

This OLTP+OLAP combination has been pursued by other suppliers, such as SingleStore, which has added indexed vector search to speed AI queries. SAP HANA, Oracle Database with its in-memory option, Microsoft SQL Server with in-memory OLTP, Amazon Aurora with Redshift Spectrum, and PostgreSQL with the Citus or TimescaleDB extensions all provide combined transactional and analytical database functions as well. Regatta is entering a fairly mature market and reckons it has an edge because of its architecture.

Boaz Palgi (left) and Erez Webman, Regatta

Webman says: “Regatta is mainly a scale-out shared-nothing clustered architecture where heterogeneous nodes (servers, VMs, containers, etc.) cooperate and can perform lengthy (as well as short) SQL statements in a parallel/distributed manner, with many-to-many data propagation among the cluster nodes (i.e. intermediate data doesn’t need to pass via a central point).” Each storage drive is accessible “only by a single node in the cluster.”

A Regatta cluster, designed to support thousands of nodes, supports differently sized and configured nodes, which can provide compute+storage, compute-only, or storage-only functions. The database can be hosted in on-premises physical or virtual servers and in the public cloud, and can be consumed as a service.

Regatta differs from scale-out-by-sharding databases, such as MongoDB, by supporting distributed JOINs across node boundaries, and ensures strong ACID guarantees even when rows reside on different nodes. (Read a Regatta blog about scale-out sharding limitations here.)

The company has developed its own Concurrency Control Protocol (CCP) providing a fully serializable and externally consistent isolation level. Where a database supports concurrent user or application access, the different users’ operations need to be kept separate and not interfere with each other. This is the intent of concurrency control, which can have either a pessimistic or optimistic design. Pessimism assumes data access conflicts between transactions are likely to occur, and uses locks to ensure that only one transaction can access or modify data at any one time.

Optimism assumes that transaction data access conflicts are rare and allows transactions to proceed without restriction until it’s time to commit changes. Before committing, each transaction undergoes a validation phase where it checks if its read data has been modified by another transaction since it was initially read (using timestamps or versions for data).
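
As a generic illustration of the optimistic approach, here is a minimal version-based validation sketch in Python. It is not Regatta’s CCP, which notably avoids aborting on detected conflicts; this toy version simply reports a failed validation so the caller can retry.

```python
import threading

class Record:
    def __init__(self, value):
        self.value = value
        self.version = 0
        self.lock = threading.Lock()          # held only for the short commit step

def occ_transaction(record, update_fn):
    """Optimistic concurrency control: read without locking, do the work,
    then validate at commit time that the record's version is unchanged."""
    read_version = record.version
    new_value = update_fn(record.value)       # work proceeds with no locks held
    with record.lock:                         # brief critical section for commit
        if record.version != read_version:    # another transaction committed first
            return False                      # validation failed: caller retries
        record.value = new_value
        record.version += 1
        return True

# Example: increment a counter; on conflict the caller would simply retry.
acct = Record(100)
assert occ_transaction(acct, lambda v: v + 10)
```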

Webman says Regatta’s CCP “is mainly optimistic, although unlike most optimistic protocols, it doesn’t cause transactions to abort on detected conflicts (well, except, of course, for deadlock cases in which both optimistic and pessimistic protocols tend to abort a transaction per deadlock-cycle).” It is snapshot-free and does not require clock synchronization. 

Short or lengthy consistent/serializable read-only queries can be performed on real-time, up-to-the-second transactional data without blocking writing transactions from progressing.

Regatta implements its own row store data layouts directly on top of raw block storage to optimize I/O performance, and does not need any underlying file system. This is a log-structured data layout that operates very differently from an LSM tree design. It is built for extensibility to support other types of row stores, as well as column store, blob store, etc. Webman says its “first row store data layout type is specifically optimized for flash media. It allows us to optimally support both traditional small-rows-with-more-or-less-fixed-size, and variable-sized-large-rows-with-a-large-dynamic-range-of-sizes (within the same table).”

We’re told: “Regatta’s B+Trees (that are used, for example, for indexes) massively leverage the high read-concurrency of flash media, allowing meaningfully faster and more efficient B+Tree accesses than algorithms that would assume more ‘generic’ underlying storage (i.e. magnetic HDD).”

There are more details in Webman’s blog about Regatta’s distributed SQL database.

CEO and co-founder Boaz Palgi tells us that Regatta’s system is looking to ensure that you can:

  1. Execute complex and real-time queries on completely up-to-the-second transactional data – think agents in a telco that get a question regarding roaming from a subscriber that just added roaming to their plan.
  2. Execute transactions such that the same agent understands that the subscriber should have added roaming for Italy as well, not just for France, and needs to correct this.
  3. Linearly increase both transactional and analytical performance without changing even a single line of code in your business logic by just adding more nodes. This will be important to keep running your business while adding many agents to the mix.

He says: “Traditional databases cannot deliver the performance to handle that type of agent-generated load, and most of them cannot combine OLAP with OLTP in the same database. Data warehouses cannot support the agents’ transactional workloads. ETL is a problem when you want agents to do more than just deal with stale archive-based data.”

For generative AI, “we are not doing anything specific today, although we will add some capabilities.” 

The metadata advantage: Unlocking insights and efficiency in enterprise IT

COMMISSIONED: When it comes to enterprise IT infrastructure, metadata is the secret asset most people don’t think about.

Think of it as an invisible hand guiding the movement, organization, and accessibility of your data universe. It’s metadata – the data that describes your data – that enables enterprises to transform chaos into opportunity.

Tragically, metadata is often underutilized, but that paradigm is changing. Businesses are beginning to uncover its extraordinary potential for optimizing workflows, enhancing decision-making, and gaining a competitive edge.

Metadata is essentially a summary of your data. It’s the label on the jar that tells you what’s inside without having to open it. Scaling this concept to enterprise IT, metadata becomes the roadmap that helps IT teams identify, locate, and efficiently utilize vast amounts of information.

To put its value into perspective, research published in IDC’s “The State of Enterprise Data” report in 2022 shows that businesses waste 20-30 percent of their time searching for information. That’s at least a full day each week spent navigating disjointed file systems and digital silos. Even worse, 60 percent of organizations admit to not knowing where their critical data resides, according to Gartner’s 2023 “Metadata Management in the Digital Age” report, something which can impact compliance, productivity, and overall decision-making.

Metadata is more than a useful tool; it’s a strategic enabler for transformation across all dimensions of enterprise IT. Here’s how it drives innovation for modern organizations:

– Better visibility: With metadata, your team knows exactly where every piece of data resides, reducing the time spent on manual searches.

– Deeper context: Metadata reveals how data points are connected, offering insights that can drive smarter decisions and uncover new opportunities.

– Faster execution: By indexing data in meaningful ways, metadata accelerates workflows and analytics, helping businesses move from idea to action more quickly.

And the beauty of metadata? It’s self-enhancing. Over time, it can grow richer, adding contextual layers that evolve with your infrastructure.

Why metadata is essential right now

Several technology trends make metadata more critical than ever. AI and analytics thrive on well-curated, contextualized data, and metadata is foundational for delivering clean datasets. Additionally, hybrid and multi-cloud ecosystems have introduced new challenges around data discovery and access. Metadata becomes the glue that ensures these diverse systems work cohesively.

What’s more, figures from Statista’s Global Data Growth Forecast in 2024 suggest the rate of data growth is staggering, with global data volumes increasing by 23 percent annually. Managing this complexity hinges on having an intelligent metadata strategy in place.

Recognizing these challenges, Dell Technologies has introduced PowerScale MetadataIQ, a cutting-edge tool designed for today’s complex data landscapes. MetadataIQ is a global metadata management solution incorporating the Elasticsearch database and the Kibana visualization dashboard. It enables indexing and querying of combined metadata from multiple geographically distributed clusters.

This isn’t just another tool for managing your files; MetadataIQ redefines how enterprises think about and use metadata:

– Fast filesystem search: Enable search of data on a PowerScale cluster without having to do a real-time tree walk.

– Geographically distributed search: Access data across PowerScale deployments by searching a combined metadata store (a simple query sketch follows this list).

– System metadata extraction: Allow customers to export metadata to external tools (e.g., message queues, data catalogs).

– Classify and protect sensitive data: Detect and tag sensitive information, applying appropriate governance.

– Lifecycle management: Use system- and user-defined metadata to identify data assets for tiering and archival.

– Create datasets to train AI/ML/GenAI models: Search for relevant unstructured data by querying system- and user-defined metadata.
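
As an illustration of what querying a combined metadata index can look like, here is a minimal sketch using the Elasticsearch Python client. The index name, fields, and filters are hypothetical and do not reflect the actual MetadataIQ schema.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")   # hypothetical endpoint

# Find recently modified DICOM files on one cluster by querying metadata only,
# with no tree walk of the file system itself.
resp = es.search(
    index="powerscale-metadata",              # hypothetical index name
    query={
        "bool": {
            "must": [
                {"term": {"file_extension": "dcm"}},
                {"range": {"mtime": {"gte": "now-90d"}}},
            ],
            "filter": [{"term": {"cluster": "emea-site-1"}}],
        }
    },
    size=20,
)
for hit in resp["hits"]["hits"]:
    print(hit["_source"]["path"])
```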

Simply put, MetadataIQ aligns with the evolving needs of enterprises, helping them find, manage, and utilize their data with unprecedented speed and precision.

Why it matters for enterprises

Imagine this scenario – your pharmaceutical company needs to analyze clinical trial data spread across multiple continents. Without metadata management, this process could take weeks. With MetadataIQ, you can visualize and analyze relevant metadata in minutes. That’s game-changing efficiency, particularly in industries reliant on agility and innovation.

The time savings and precision offered by MetadataIQ enable organizations to focus on higher-value tasks, such as developing data-driven applications and executing mission-critical projects.

Think of metadata as your ultra-efficient librarian – it knows where every piece of data “lives,” tags it for relevance, and even makes suggestions for what you might need. Unlike humans, it doesn’t need coffee breaks or days off.

Remember the agony of hunting for a lost file from last quarter? Metadata just liberated 19 of those wasted minutes.

Metadata isn’t just an IT utility anymore; it’s becoming a business-critical resource. With tools like MetadataIQ, enterprises can harness metadata to accelerate their AI initiatives, optimize hybrid cloud strategies, and scale in harmony with expanding data volumes.

With innovation through metadata, businesses aren’t just storing data – they’re unleashing the full potential of every byte. Tools like PowerScale MetadataIQ act as both a compass and a map, ensuring organizations stay ahead in an increasingly data-driven world.

To learn how Dell PowerScale and MetadataIQ can redefine your enterprise IT setup, visit us at www.delltechnologies.com/powerscale.

Brought to you by Dell Technologies.