
Graid going for Nvidia RAID gold

Graid occupies a technology niche with its Nvidia GPU-powered RAID cards and is powering ahead with a development roadmap featuring AI and high-performance computing (HPC) products.

The company, which says it now has thousands of customers worldwide, has three SupremeRAID products: the SR-1010 enterprise performance card, the SR-1000 enterprise mainstream card, and the SR-1001 workstation and edge card. Graid claims it doubled its revenue in 2024, when it shipped around 5,000 cards, compared to 2023, and it thinks it may grow 60 percent over 2024 this year, with expanded OEM and reseller engagements, possibly reaching a $15 million revenue total and 7,000 or so cards sold.

Graid says the SupremeRAID products protect data against drive loss and remove bottlenecks from the storage IO path. They don’t, in themselves, accelerate anything. Its SupremeRAID roadmap, as presented to an IT Press Tour in Silicon Valley, has five components:

  • SE (Simple Edition) beta for desktops
  • AE (AI Edition) for GPU servers and AI workloads
  • SR-1000 AM (Ampere) with NVMe RAID on Nvidia RTX A1000 GPU
  • SR-1001 AM with NVMe RAID on Nvidia RTX A400 GPU
  • HE (HPC Edition) and NVMe RAID with array migration for cross-node high availability (HA).

SupremeRAID SE is Graid’s first entry into the Bring-Your-Own-GPU market. It will provide enterprise-grade RAID protection and performance for PC/workstation users via a subscription scheme and will be available this year. It will support up to eight NVMe SSDs and use a compatible GPU already in the PC. Suggested workloads include video editing, post-production, 3D rendering, animation, visual effects for games, CAD, and architecture, engineering and construction (AEC) applications.

The AE version for AI supports GPUDirect for direct NVMe-to-GPU memory transfers, and data offload to NVMe SSDs. Graid says it “integrates seamlessly with BeeGFS, Lustre, and Ceph, simplifying large-scale dataset management without data migration.” It achieves over 95 percent of raw NVMe drive performance with RAID protection.

Thomas Paquette.

Thomas Paquette, Graid’s SVP and GM Americas & EMEA, said: “We don’t have the failover built into this release. It will be built into the next release. So that’ll give you the opportunity to put two copies of the software, each one on a different GPU, and have one failover for your read.”

Graid says a fully enabled Nvidia H100 accelerator has 144 SMs (Streaming Multiprocessors), each roughly equivalent to an x86 core, managing multiple threads and cores for parallel processing. SupremeRAID-AE uses six SMs on one GPU and doesn’t affect the others; it’s implemented with Time Division Multiplexing (TDM). Paquette said AE could use fewer SMs: “In low IO, we can go to sleep and give the GPU even more, and we implemented this with TDM.”

He added: “This was the best way for us to deploy and it works quite well. It’s in the labs at Supermicro. They’re SKU-ing it up right now. We’re testing it with Dell, and it’s going in the labs in Lenovo on a product project that Jensen [Huang] is going to be working on.”

Paquette envisages Graid becoming a software-only company, just selling licenses.

The Nvidia RTX A1000 and A400 single-slot desktop GPUs were launched in April last year and feature the Ampere architecture. The A400 introduced accelerated ray tracing, with 24 Tensor Cores for AI processing and four display outputs. The A1000, with 72 Tensor Cores and 18 RT Cores, is more powerful.

The SR-1000 AM combines SR-1000 and SR-1010 functionalities and performance. The SR-1001 AM follows on from the SR-1001 with equal performance and enhanced efficiency.

These AM products will have a new Graid software release, v1.7, which features a new GUI, a RESTful API, journalling for data integrity, bad block detection, and an improved error retry mechanism.

SupremeRAID HE is optimized for the BeeGFS, Ceph, and Lustre environments found in the HPC world. It eliminates data replication across nodes and supports array migration. Graid and Supermicro have produced a Solution Brief document about a BeeGFS SupremeRAID HE system, which uses SupremeRAID HE with a SupremeRAID SR-1010 in each node.

The document says: “SupremeRAID HE, integrated with Supermicro’s SSG-221E-DN2R24R and BeeGFS, redefines NVMe storage standards. Leveraging array migration for cross-node high availability (HA), it delivers peak performance in a 2U system with two 24-core CPUs, saturating two 400Gb/s networks. By eliminating cross-node replication, it reduces NVMe costs and offers scalable adaptability. Achieving 132 GB/s read and 83 GB/s write locally – near theoretical limits post-RAID – and up to 93 GB/s read and 84 GB/s write from clients, this solution is excellent for your high performance storage needs, including HPC, analytics, and enterprise applications and is validated by rigorous benchmarks.”

Paquette said: “Supermicro calls this potentially a WEKA killer, and we’d have to scale pretty good to be a WEKA killer, but there’s nothing stopping us from doing that.” 

Graid said the next software release will increase the number of supported drives beyond 32, potentially to 64, and there will be new Linux and Windows software releases later this year.

The company will also support new PCIe generations, with Paquette saying: “Each time PCIe changes three to four, four to five, … five to six, we automatically gain an exponential performance without doing anything to the product. Because the only bottleneck that we experience in a server today is the PCIe infrastructure. We can saturate it. So when six comes, we’ll saturate that as well.” 

Erasure coding is on its roadmap but there is no committed date. We also understand the existing Dell and Graid partnership could be developing further. Graid is also a participant in an Nvidia Storage-Next initiative.

Could Graid support non-Nvidia GPUs? Paquette said: “We know that we could get it to work on an Intel GPU. We know that we can get it to work on an AMD GPU, but it’s a ground up rewrite, and we’ve got too much other stuff going on to play with.” 

Find out more about SupremeRAID SE here and AE here.

Analysis finds Volumez block storage outpaces cloud giants for PostgreSQL

Volumez block storage for PostgreSQL on AWS, Azure, and Oracle Cloud Infrastructure (OCI) delivers more transactions per second at lower latency than the cloud providers’ own block storage instances.

PostgreSQL is an open source relational database. PostgreSQL, with SQL support, succeeded POSTGRES, which stood for Post-Ingres; Ingres was an earlier relational database. The PostgreSQL database has become widely deployed. Volumez pools ephemeral cloud block storage instances and provides block storage as a shared service to its customers more efficiently and at lower cost than the raw AWS, Azure, or OCI services on which it is based.

ArchitectingIT is a storage technology analysis service, and its analyst Chris Evans has tested Volumez DIaaS (Data Infrastructure-as-a-Service) block storage for PostgreSQL. Evans says DIaaS “abstracts away the implementation details of cloud storage, replacing it with the capability to define application-specific metrics including bandwidth, throughput (IOPS) and latency per storage volume, irrespective of volume capacity and based on dynamic policies set by the administrator (most public cloud storage aligns capacity and IOPS in a linear scale).”

For example, Volumez separates capacity from performance, which are typically coupled in clouds like AWS and Azure, where you have to over-provision capacity to reach a given performance level.

Evans measured PostgreSQL transactions per second and latency with the pgbench evaluation tool. He ran “pgbench against a set of AWS, Azure and OCI configurations, mapping the results against the underlying cost of deploying that infrastructure.”
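For readers who haven’t used pgbench, the shape of such a run can be sketched as follows. This is a minimal illustration only: the host, database name, scale factor, client count, and duration below are assumptions for the example, not the parameters Evans used in his report.

```python
# Minimal sketch of driving pgbench against a PostgreSQL instance backed by a
# given block-storage configuration. Host, database, scale factor, client
# count, and run time are illustrative assumptions, not the report's settings.
import subprocess

HOST = "pg-under-test.example.com"   # hypothetical database server
DB = "pgbench_db"                    # hypothetical database name

def init_dataset(scale: int = 1000) -> None:
    # Populate the standard pgbench tables at the chosen scale factor.
    subprocess.run(["pgbench", "-i", "-s", str(scale), "-h", HOST, DB], check=True)

def run_benchmark(clients: int = 64, threads: int = 16, seconds: int = 600) -> str:
    # Timed run; pgbench reports TPS and average latency when it finishes.
    result = subprocess.run(
        ["pgbench", "-c", str(clients), "-j", str(threads),
         "-T", str(seconds), "-P", "60", "-h", HOST, DB],
        check=True, capture_output=True, text=True,
    )
    return result.stdout   # contains "tps = ..." and "latency average = ... ms"

if __name__ == "__main__":
    init_dataset()
    print(run_benchmark())
```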

There are seven tests in his report:

  • AWS EBS io1 storage attached to the database server
  • AWS EBS io2 Block Express storage installed on the database server
  • Microsoft Azure PV2 SSD storage connected to the database server
  • Volumez DIaaS storage on AWS connected to the database server
  • Volumez DIaaS storage on Azure connected to the database server
  • Volumez DIaaS storage on OCI connected to the database server
  • Amazon Aurora storage directly configured as a cluster volume

Evans saw that block storage costs differed with each cloud, OCI being the cheapest and 40 percent less than AWS.

The AWS tests showed that Volumez-PostgreSQL delivered the highest transactions per second (TPS) rating.

He notes: “This data is interesting in that it highlights the relative instability of io1 storage, which significantly underperforms the alternate configurations. This is no surprise, as io1 is a much earlier implementation of SSD-based storage on AWS, compared to io2 Block Express or Volumez DIaaS using NVMe devices.”

The Volumez-PostgreSQL combo also delivered the lowest latency of the alternatives on AWS.

Evans examined the pricing of the various options, saying: “When the pricing is also taken into consideration ($34,000 per month for io2 Express vs $5,000 for the Volumez media nodes, plus 20 percent licensing charge), then the Volumez configuration clearly provides greater value for money.” 

Volumez DIaaS also outperformed raw Azure block storage, in both TPS and latency terms.

When Evans looked at Volumez and PostgreSQL on OCI he found that “with greater throughput at lower latency, OCI is the best choice for implementing high-performance PostgreSQL applications using Volumez across AWS, Azure and OCI,” as “OCI delivered the lowest latency results compared to AWS and Azure, while also being the cheapest option by a considerable margin.” 

In general, “Volumez not only out-performs native block-storage options, but also delivers this capability at a fraction of the cost,” with an approximate “80 percent reduction highlighted for the AWS platform.” 

Evans concludes: “Any business using the public cloud at scale, particularly for traditional database applications, should be using Volumez DIaaS as the primary choice for application storage, gaining operationally from flexible storage deployments and being able to gain a level of performance and resiliency that cannot be achieved on the public cloud alone.” 

Volumez has published a blog post offering its perspective on the results.

Nvidia extends LLM memory with tiered KV caching and Dynamo engine

Nvidia GPUs store vectors as key-value pairs in a large language model (LLM) memory cache – KV cache – which is tiered out in a multi-level structure ending with network-attached SSDs.

Vectors are encoded values of multi-dimensional aspects of an item – word, image, video frame, sound – that an LLM deals with in its semantic searching for responses to input requests. Such requests are themselves vectorized and the LLM processes them and looks for elements in a vector store to build its response. These elements are key-value pairs held in a GPU’s high-bandwidth memory as a KV cache. Problems occur when the vectors needed in a particular response session are larger than the GPU memory available. Then existing vectors are evicted and, if needed again, recomputed – which takes time. It’s better to move them down the memory-storage hierarchy so that they can be read back into GPU memory when needed, rather than being recomputed. That’s what tiered KV caching accomplishes, and Nvidia’s Dynamo software achieves it.

An LLM has two phases when it processes a response: prefill and decode. During the prefill phase, the input request is broken down into tokens – basic words or sections of words – and these are vectorized and represented in memory as KV pairs. This process is computationally intensive and can be parallelized. The decode phase is where the LLM builds its output, a token at a time, in a sequential operation. Each new token is predicted based on previously generated tokens, and the result is stored in the KV cache. The first output token depends on all prompt tokens. The second output token depends on all prompt tokens plus the first output token. The third output token again depends on all the prompt tokens plus the first and second output tokens, and so on.
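The growth of the KV cache during decode can be seen in a stripped-down sketch of the loop. The model and tokenizer objects here are hypothetical placeholders supplied by the caller, not any particular inference engine’s API.

```python
# Illustrative decode loop showing why the KV cache grows one entry per token.
# The model/tokenizer objects are assumed placeholders, not a real Nvidia API.
def generate(model, tokenizer, prompt: str, max_new_tokens: int = 128) -> str:
    tokens = tokenizer.encode(prompt)

    # Prefill: compute keys/values for every prompt token in one parallel pass.
    kv_cache = model.prefill(tokens)          # one (key, value) pair per token

    output = []
    for _ in range(max_new_tokens):
        # Decode: each new token attends to ALL cached keys/values so far,
        # i.e. the prompt tokens plus every previously generated token.
        next_token, new_kv = model.decode_step(tokens[-1], kv_cache)
        kv_cache.append(new_kv)               # the cache grows with every step
        tokens.append(next_token)
        output.append(next_token)
        if next_token == tokenizer.eos_token_id:
            break
    return tokenizer.decode(output)
```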

When the output is complete, the KV cache contents are still in GPU memory and may need to be retained for follow-up questions from the user or for use by an iterative reasoning LLM. But then a new request comes in and the KV cache contents are evicted. Unless they are held somewhere else, they have to be recomputed if needed again. Techniques like vLLM and LMCache offload the GPU’s KV cache to the GPU server’s CPU DRAM – a second memory tier that can be larger than the available GPU memory.
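A toy sketch of the offload idea follows, with the tier names and eviction policy invented for illustration. Production engines such as vLLM, LMCache, and Dynamo are far more elaborate, but the principle is the same: an evicted session’s cache is copied down a tier and reloaded later, trading a transfer for a recompute.

```python
# Toy two-tier KV cache: a small "GPU" tier backed by a larger "host" tier.
# Eviction copies the entry down a tier instead of discarding it, so a
# returning session re-loads its cache rather than recomputing it.
from collections import OrderedDict

class TieredKVCache:
    def __init__(self, gpu_slots: int):
        self.gpu = OrderedDict()   # session_id -> cached keys/values (hot tier)
        self.host = {}             # larger, slower tier (CPU DRAM, then SSD...)
        self.gpu_slots = gpu_slots

    def put(self, session_id, kv_blob):
        if len(self.gpu) >= self.gpu_slots:
            victim, blob = self.gpu.popitem(last=False)  # evict least recently used
            self.host[victim] = blob                     # offload, don't drop
        self.gpu[session_id] = kv_blob

    def get(self, session_id):
        if session_id in self.gpu:
            self.gpu.move_to_end(session_id)
            return self.gpu[session_id]
        if session_id in self.host:                      # reload beats recompute
            self.put(session_id, self.host.pop(session_id))
            return self.gpu[session_id]
        return None                                      # must recompute via prefill
```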

Nvidia Dynamo diagram

Dynamo is a low-latency KV cache offload engine that works in multi-node systems. It supports vLLM and other inference engines such as TRT-LLM and SGLang and large-scale, distributed inferencing. Dynamo works across a memory and storage hierarchy, from HBM, through a CPU’s DRAM, to direct-attached SSDs and networked external storage.

It has four features: Disaggregated Serving, Smart Router, Distributed KV Cache Manager, and Nvidia Inference Transfer Library (NIXL). Nvidia says: “Disaggregating prefill and decode significantly boosts performance, gaining efficiency the more GPUs that are involved in inference.”

Version 1.0 of Dynamo enabled KV cache offloading to system CPU memory, and is being extended to support SSDs and networked object storage in subsequent releases. It is open source software.

Many storage suppliers support Nvidia’s AI Data platform and its included Nvidia AI Enterprise software with NIM microservices, of which Dynamo is part. We understand that Cloudian, DDN, Dell, Hitachi Vantara, HPE, IBM, NetApp, PEAK:AIO, Pure Storage, VAST Data, and WEKA will all be supporting Dynamo, as will Cohesity. Hammerspace and Pliops also support KV cache tiering.

As examples of this:

  • Cloudian will be supporting KV cache tiering
  • DDN says its Infinia object storage system “is engineered to serve KV cache at sub-millisecond latency.”
  • VAST Data has a blog about its Dynamo support. This says: “The distributed architecture behind Dynamo naturally supports implementing disaggregated prefill and decode. This serves as another strategy for enhancing scheduling across accelerated computings to boost inference throughput and minimize latency. It works by assigning a set of GPUs to run prefill and having NIXL move the data using RDMA to a different set of GPUs that will perform the decode process,” as seen in the diagram below. 
  • A WEKA blog discusses its approach to tiered KV caching with the Augmented Memory Grid concept, noting that “when storing the cache outside of HBM, WEKA Augmented Memory Grid stores KV cache rapidly and asynchronously to maximize efficiency.” As a performance example, it says: “Based on testing within our Lab with a eight-host WEKApod with 72 NVMe drives a single eight-way H100 (with tensor parallelism of eight) demonstrated a retrieval rate of 938,000 tokens per second.”
VAST Data Dynamo diagram

goHardDrive RMA customer details vulnerability

Developer Michael Lynch returned a disk drive to supplier goHardDrive after finding a flaw in its Return Merchandise Authorization (RMA) process that he claimed could have exposed the personal details of thousands of customers. The reseller responded by improving access control, and later disabled the lookup feature entirely to mitigate the risk.

The goHardDrive business is a direct-to-consumer disk and SSD sales operation based in La Puente in the greater Los Angeles area, California, with its own website and sales also made through e-commerce sites such as Amazon, eBay, Newegg and Walmart. It says it focuses on quality of services and lowest prices, “specializing in providing computer-related excess inventory, manufacturer-closeouts, high-demand and unusual computer components and peripherals at highly-discounted prices to you!”

Michael Lynch.

Lynch claims that “goHardDrive Leaked Personal Data for Thousands of Customers” based on his own experience. He returned three purchased disk drives to the etailer, two of which were dead-on-arrival, using its RMA procedure, getting a 5-digit RMA number. He used this to check the status of his return and the website window showed this along with his name, postal address, email address, order number and date, products being returned, and the reason for their return. It would also have included his phone number, but he had not provided that. So far so good.

But he repeated the check, mistyping the last digit of his RMA number, and says he got another customer’s RMA details. Lynch notes that the RMA check form was “public and had no authentication, rate limits, or CAPTCHA.”

“It would be trivial to write a script that sends an HTTP GET request replacing 12345 with every number from 00001 to 99999 and scrapes the personal details of every goHardDrive customer who had requested a return.”

Lynch says he “emailed goHardDrive about this issue on May 21, 2025. To their credit, they responded within two hours to acknowledge the issue and confirm that they would deploy a fix within three to five business days.” The firm altered its RMA form to require customers to enter their postal (ZIP) code and house number before fetching any details.

This was insufficient to deter any determined hacker, he adds, noting there are ~42k valid ZIP codes and the majority of house numbers “are likely to fall in the range of 1 to 100.” He calculates that “the worst case is that an attacker has to try about 42k x 100 = 4.2M possible combinations to leak details associated with an RMA number. Optimizing by common ZIP codes and house numbers probably means the attacker has >50 percent chance of success after about 50k guesses.”

He “followed up with goHardDrive to tell them that I thought the new mitigations were insufficient.” The RMA status page should not reveal any customer details, only the RMA status. As a result goHardDrive removed “the RMA status check from their website entirely” and said “customers could just email them for status updates.”

Lynch says there is no goHardDrive bug bounty but the etailer “refunded $20 of my $330 purchase as a thank you.” He also complains about the firm’s general RMA procedure in his blog.

There’s no evidence that anyone besides Lynch used the alleged flaw to obtain other customers’ personal details.

A goHardDrive spokesperson told us: “The article’s title suggests a large-scale customer data leak, which we believe is misleading. There was no confirmed data breach or compromise. The assumptions made in the article were based on guesswork involving our RMA number sequence. Additionally, the RMA website is completely separate from our shopping cart platform and does not store any credit card or sensitive payment data. It is used strictly for processing product returns.

“In reality, fewer than 1 percent of our customers use the RMA website. Most customers reach out to us directly via email for returns or RMA status updates. Moreover, the majority of our customers are from platforms like Amazon and eBay, where returns are managed by those marketplaces themselves, limiting any direct interaction with our RMA system or contact us directly via email. 

“We do appreciate Michael’s efforts in identifying a potential vulnerability in our RMA status lookup tool. In response, we promptly added verification fields (Street Name and ZIP Code) to improve access control. Ultimately, we decided to disable the lookup feature entirely, as it was rarely used (approximately 10~20 searches per month) and posed unnecessary risk.

“We’ve been in business for over 17 years and always strive to deliver excellent service while safeguarding our customers’ data. Although our RMA database never stored sensitive information like credit card details, we take all concerns seriously and are grateful that this issue was brought to our attention. As a small reseller business, we are not a software company; feedback like this helps us improve.”

VAST Data cracks into HPC with Doudna supercomputer win

VAST Data is supplying an AI-focussed storage system for the Doudna supercomputer, sharing the storage billing with IBM’s Storage Scale, a traditional high-performance computing (HPC) parallel file system.

Doudna is a NERSC-10 supercomputer, NERSC being the National Energy Research Scientific Computing Center. It is operated by the Lawrence Berkeley National Laboratory for the United States Department of Energy’s Office of Science, with the University of California managing the Lawrence Berkeley National Lab. NERSC was founded to support fusion research, and Doudna will provide supercomputational power for research into fusion energy, as well as quantum computing simulation. It will also enable DOE-funded researchers to integrate large-scale AI into their simulation and data analysis workflows. 

U.S. Secretary of Energy Chris Wright said: “The Doudna system represents DOE’s commitment to advancing American leadership in science, AI, and high-performance computing. It will be a powerhouse for rapid innovation that will transform our efforts to develop abundant, affordable energy supplies and advance breakthroughs in quantum computing. AI is the Manhattan Project of our time, and Doudna will help ensure America’s scientists have the tools they need to win the global race for AI dominance.”

The NERSC-10 system is called Doudna after Jennifer Doudna, the Berkeley Lab-based biochemist who was awarded the 2020 Nobel Prize for Chemistry in recognition of her work on the gene-editing technology CRISPR. Doudna’s storage performance will be up to five times faster than NERSC’s current – Perlmutter or NERSC9 – system and offer performance guarantees for time-sensitive science. Doudna’s computational performance will be ten times greater than Perlmutter.

It is a Dell system, with ORv3 direct liquid-cooled server technology, using Nvidia’s Vera Rubin platform composed of “Rubin” GPUs and “Vera” Arm CPUs. It has two roles: supporting large-scale HPC workloads like those in molecular dynamics and high-energy physics, and, secondly, AI training and inference.

Doudna’s architects have selected two storage systems, one for each role: a quality-of-service storage system (QSS) for the AI work and a platform storage system (PSS) for the traditional HPC storage needs.

PSS storage will be provided by IBM’s Storage Scale, a long-established and popular high-performance parallel file system for modeling and simulation workloads running at scale. The QSS will use VAST Data. Both are all-flash systems. This is a big win for VAST, marking its entry into the HPC parallel file system citadel.

VAST’s win was hinted at in June. Now it’s confirmed that Doudna will use VAST Data’s AIOS (AI Operating System) which “unifies data storage, database, compute, messaging, and reasoning capabilities into a single, data-centric infrastructure built from the ground up for AI and agentic workflows.”

VAST Co-founder Jeff Denworth stated: “With the VAST AI Operating System, NERSC is pioneering a new model for Doudna, where users get guaranteed performance, security, real-time access, and built-in data services – without the operational friction of traditional HPC systems. Together, NERSC and VAST are setting the blueprint for exascale computing, enabling breakthrough capabilities that will define the next era of scientific computation.”

NERSC HPC architecture and performance engineer Stephen Simms said: “The addition of quality-of-service [QSS] will provide predictable performance through fine-grained control of file system capability. This partnership will further our aim to enhance the user experience in the service of science.” 

The IBM Storage Scale system will be an all-flash parallel scratch filesystem supporting file and object access. It, NERSC said, “delivers high speed, scalable performance, and automated efficiency designed to help eliminate bottlenecks and streamline data workflows, empowering researchers to focus on discovery instead of infrastructure management.”

IBM’s Vanessa Hunt, GM, Technology, US Federal Market, bigged up Storage Scale’s management ease, stating: “IBM Storage Scale is purpose-built to support the next wave of American innovation – delivering the speed, flexibility, and reliability needed to power breakthrough discoveries, while simplifying data management in even the most complex HPC environments.” 

Denworth posted a comment on X: “First met NERSC in Jan of ’18. Longest sales cycle ever 🙂 I’ve been telling everyone for years that NFS is suitable for hyperscale HPC, now it’s happening.  If you want “Quality” choose @VAST_Data.” 

It doesn’t actually appear to be happening though, as NERSC is using Storage Scale for the trad HPC workloads with VAST selected for the AI work.

Bootnote

(1) The Next Platform published a Doudna article looking at the compute side here.

(2) We understand that Doudna’s storage architecture includes tiering, which optimizes data placement across different storage layers to balance performance and capacity. This suggests there is a disk-based backing store.

Storage news collection – July 3 

Commvault announced a strategic partnership with Bytes, “a leading provider of IT solutions across the UK and Ireland. This partnership will focus on providing the joint customer base with the tools and services needed to advance their cyber resilience in the UKI.” “We are thrilled to announce our strategic partnership with Commvault. This collaboration is rooted in a shared belief that our combined expertise offers a unique and effective approach to combating the relentless threat of cyber-attacks,” said Hayley Mooney, Chief Commercial Officer at Bytes. “By leveraging Commvault’s cyber resiliency solutions, alongside Bytes’ tailored professional services, we are confident that this partnership will provide unparalleled support and security for our customers.”

Geo-distributed cloud storage enabler Cubbit announced that the TV channels of De Agostini Editore (DeAKids and DeAJunior), and KidsMe have migrated 1 PB of data to WIIT’s European Cloud Vault, a fully-managed cloud object storage service enabled by Cubbit’s DS3 Composer technology. Cubbit enabled the WIIT Group to deliver this service by integrating its cloud technology into WIIT’s data centres. It says “the result is a service that combines the performance and security of a hyperscaler with greater cost efficiency, ease of use, and full data sovereignty — enabled by a technology entirely designed, developed, and operated in Europe.”

HPE is partnering with data manager and migrator Datadobi to integrate the enterprise-grade object replication capabilities in its StorageMAP software into HPE’s ecosystem. Datadobi’s replication offering, built into StorageMAP’s Unstructured Data Mobility Engine, delivers replication for S3-compatible object data. StorageMAP replication works in multi-vendor, multi-site, and hybrid cloud configurations. HPE says it and Datadobi “offer a seamless method for organizations to replicate critical object data into the HPE Alletra Storage MP X10000 system, enabling customers to address the challenge of moving data from in-family or third-party object storage platforms with confidence and control.”

Gartner has named DDN a Sample Vendor in two categories in its Hype Cycle for Storage Technologies, 2025 (subscription required): Storage Platforms for Generative AI and Open-Source Storage Software.

Edge website accelerator Harper announced v4.6 of its composable application software, adding “vector indexing for the efficient storing and retrieving of high-dimensional vector data – essential for bringing contextual depth to AI models like smart search.” The capability, “powered by the Hierarchical Navigable Small World (HNSW) algorithm, allows for quick and accurate nearest-neighbor search, which is essential in applications like recommendation systems, personalized content feeds, chatbot retrieval, image recognition, and natural language processing. The addition of vector indexing to the Harper platform eliminates the need for third-party vector databases – semantic caching can be done natively in Harper, helping bring down the overall costs of running AI models.”

Other new features include a new plugins API with support for dynamic loading, HTTP logging for improved formatting, control and debugging, a new data loader for pre-loading content, and Resource API updates. Read the Release Notes for more info.

Cloud data backup target and data protector Keepit is now backing up Atlassian’s Jira and Confluence Cloud. Its service features automated backups and granular restores with monitoring of snapshot data to automatically detect anomalies. Users can compare backup snapshots to identify records added, modified, or deleted over time, enabling precise recovery.

Keepit has hired Thierry Bedos as VP South EMEA. He was previously the SVP EMEA at cyber-security supplier Forcepoint. Data sovereignty issues are expected to accelerate demand for localized cloud storage in Europe.

NetApp has been named a winner of the 2025 SE Labs Award for Enterprise Data Protection, “reinforcing its status as the most secure storage on the planet.” NetApp ONTAP Autonomous Ransomware Protection with Artificial Intelligence (ARP/AI) was tested and validated by SE Labs. It demonstrated 99 percent detection of tested, advanced full-file encryption ransomware attacks with zero false positives, indicating a strong ability to operate in a business context without contributing to alert fatigue.

OWC Guardian

OWC announced its hardware-encrypted Guardian portable SSD with a USB-C interface. It is compatible with USB-C Macs, iPad Pros, and PCs. Users connect the device, enter a password using the built-in color touchscreen, and can then transfer files via drag and drop. There is no software installation required. OWC Guardian uses 256-bit AES OPAL hardware encryption to automatically encrypt data during writing and decrypt it upon authorized access. The touchscreen allows PIN or passphrase entry, and additional features include multi-user access, read-only mode, secure erase, and randomized keypad layouts. It provides up to 1,000 MB/s in real-world transfer speeds and capacities up to 4.0 TB. It is now GA, starting at $219.99 for 1.0 TB, $329.99 for 2.0 TB, and $529.99 for 4.0 TB.

Fabless startup Panmnesia has unveiled a lineup of Link Solution products designed for AI infrastructure. The lineup encompasses the entire technology stack required for AI infrastructure design, including hardware, silicon IPs, network topology, and software. By delivering cross-stack optimization, it says its Link Solution “significantly reduces communication overhead and enhances overall system performance” and enables a composable architecture, “allowing seamless addition, removal, and reconfiguration of diverse devices such as GPUs, AI accelerators, and memory modules, to meet the evolving needs of AI applications.” It also accelerates large-scale AI workloads by minimizing inter-device communication overhead. There’s more information on its website.

Percona announced GA of the Transparent Data Encryption (TDE) extension for Percona for PostgreSQL – the first time that fully open source TDE will be available. PostgreSQL is the only major open source database that doesn’t support TDE in its open source version. Percona will make TDE available as part of its PostgreSQL distribution, Percona for PostgreSQL – this provides access to TDE without restrictions. This will also integrate with key management services from the likes of Hashicorp, Thales, Fortanix and OpenBao – so it is easier to manage the encryption over time.

Data integrity supplier Precisely introduced an MCP server to its AI ecosystem for data integrity. The vendor claims its API “enables AI applications to easily access and integrate trusted location intelligence tools and property, location, and consumer datasets with enterprise AI using natural language. Built on the open-source Model Context Protocol (MCP) developed by Anthropic, Precisely’s MCP server makes it dramatically easier for enterprises to build spatially aware, data-driven AI solutions.”

Tiger Technology announced that Tiger Bridge, its flagship hybrid cloud data management software, is now officially available on AWS Marketplace. Customers can now procure, deploy and manage Tiger Bridge directly through their own AWS account, with flexible subscription models adapted to diverse hybrid storage needs.

Kioxia tunes SSD-based vector search for RAG workloads

Kioxia has tweaked its AiSAQ SSD-based vector search by enabling admins to vary vector index capacity and search performance for different workloads.

RAG (retrieval-augmented generation) is an AI large language model (LLM) technique in which an organization’s proprietary data is used to make the LLM’s responses more accurate and relevant to a user’s requests. It relies on processing that information into multi-dimensional vectors – numerical representations of its key aspects – that are mapped into a vector space, where similar vectors are positioned closer together than dissimilar ones. An LLM looks into this space during its response generation in a process called semantic search to find similar vectors to its input requests, which have also been vectorized.

Kioxia RAG diagram

AiSAQ (All-in-Storage ANNS with Product Quantization) software carries out ANNS (Approximate Nearest Neighbor Search), used for semantic searches across vector indices, on SSD instead of in DRAM, saving DRAM occupancy and increasing the size of a searchable vector index set beyond DRAM capacity limits to the capacity of a set of SSDs. Kioxia has updated the open source AiSAQ software to enable search performance to be tuned in relation to the size of the vector index.
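Kioxia’s paper describes a product-quantized, graph-based index, but the basic payoff of keeping vectors on SSD rather than in DRAM can be illustrated with a deliberately naive sketch: the index lives in a memory-mapped file, so only the pages a query touches are read into DRAM. The dimensionality and brute-force cosine scan below are illustrative assumptions, not how AiSAQ actually searches.

```python
# Deliberately simple illustration of searching vectors that live on SSD:
# the index is a memory-mapped file, so only the pages touched by a query
# are pulled into DRAM. AiSAQ itself uses product quantization and graph-
# based ANNS, not the brute-force scan shown here.
import numpy as np

DIM = 768                                   # assumed embedding dimensionality

def top_k(index_path: str, num_vectors: int, query: np.ndarray, k: int = 10):
    # Vectors stay on SSD; np.memmap reads them on demand.
    index = np.memmap(index_path, dtype=np.float32, mode="r",
                      shape=(num_vectors, DIM))
    q = query / np.linalg.norm(query)
    scores = index @ q                      # cosine similarity if rows are unit-norm
    best = np.argpartition(-scores, k)[:k]
    return best[np.argsort(-scores[best])]  # ids of the k nearest vectors
```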

Neville Ichaporia, Kioxia

Neville Ichaporia, Kioxia America SVP and GM of its SSD business unit, stated: “With the latest version of Kioxia AiSAQ software, we’re giving developers and system architects the tools to fine-tune both performance and capacity.”

As the number of vectors grows, increasing search performance (queries per second) requires more SSD capacity per vector – which is limited by the system’s installed SSD capacity. This results in a smaller number of vectors. Conversely, to maximize the number of vectors, SSD capacity consumption per vector needs to be reduced, which results in lower performance. The optimal balance between these two opposing conditions varies depending on the specific workload. 

Kioxia graphic

The new release enables admins to select the balance for a variety of contrasting workloads within the RAG system, without altering the hardware. Kioxia says its update makes AiSAQ technology a suitable SSD-based ANNS for other vector-hungry applications, such as offline semantic searches, as well as RAG. SSDs with capacities of 122 TB or more may be particularly well-suited for large-scale RAG operations.

Kioxia graphic

You can read the scientific paper explaining AiSAQ here, and download the AiSAQ open source software here.

Progress snaps up Nuclia for agentic RAG tech

Business app AI tool supplier Progress Software is buying Nuclia, a Spanish startup with RAG-as-a-Service software.

Nuclia was founded in Barcelona, Spain, in 2019 by CEO Eudald Camprubí and CTO Ramon Navarro. They raised €5.4 million in seed funding in 2022 from Crane Venture Partners in the UK and Elaia in France. No additional funding rounds have been disclosed. The company’s software vectorizes unstructured data documents and stores the vectors in a Nuclia-operated cloud or an on-prem open source NucliaDB database. Source files can be Word documents, PowerPoint slides, images that can be scanned with OCR, and audio files that can be transcribed. Multiple languages are supported.

Progress says customer organizations can use Nuclia’s RAG service to vectorize their unstructured data, enabling them to automatically use their proprietary information to retrieve verifiable, accurate answers using GenAI.

Yogesh Gupta, Progress Software

Yogesh Gupta, Progress Software CEO, stated: “Nuclia’s easy-to-use, self-service SaaS product democratizes the use of trustworthy and verifiable GenAI. Small to mid-sized businesses, as well as large global corporations, can quickly and easily reap the benefits of sophisticated agentic RAG capabilities using Nuclia SaaS without the need for significant upfront investment.”

Camprubí added: “The rapid evolution of AI has transformed how organizations interact with information, creating new possibilities for more accurate, dynamic, and context-aware systems. Agentic RAG is a cutting-edge approach that combines the power of large language models (LLMs) with businesses’ own proprietary data to provide accurate and trustworthy answers. Our team at Nuclia is proud of what we have built, and we are excited to join Progress to continue to advance this important technology.”

Eudald Camprubí, Nuclia

Progress says Nuclia will extend the end-to-end value of its Progress Data Platform while creating opportunities to reach a broader market of organizations looking to take advantage of agentic RAG technology. This likely refers to small and medium-sized businesses. The acquisition of Nuclia is expected to enhance multiple Progress product lines with advanced AI capabilities.

The company reported a 36 percent increase year-over-year in total revenue for its second fiscal 2025 quarter to $237 million. Gupta said on an earnings call: “With Nuclia, we have accelerated the R&D process by purchasing great technology that addresses an urgent market need, and we will rapidly integrate it with our products. This will allow us to incorporate additional agentic RAG AI features that help our existing customers speed up their own GenAI initiatives, thereby enabling us to continue to drive strong customer retention. You’ll hear more about Nuclia in the coming quarters as we integrate this cutting-edge technology with our products.”

He added: “This acquisition was primarily driven as an investment in our product portfolio. AI is sort of third major wave in the enterprise software that we see dramatically changing the landscape. This is a rather modest purchase price for a leading-edge technology around agentic RAG solutions for genAI. So we feel really, really good about it. Both for adding the technology as well as bringing on a strong team that can help us continue to move this technology forward, integrate it with our products and go to market.”

The acquisition was signed and closed on June 30 and is not expected to have a material impact on Progress’s financials. The price was not disclosed but a figure of $50 million has been floated.

DDN on Infinia object storage and solving the POSIX problem

Interview: We spoke to DDN SVP for Products James Coomer and CTO Sven Oehme about the status of its relatively new Infinia object store and about details on its claimed performance, multi-tenancy, and resilience.

Blocks & Files: What’s the status of Infinia?

James Coomer, DDN

James Coomer: We’ve now been in production for quite a while. v2.2 is coming out shortly and it has strong multi-tenancy features with security/resilience, performance and capacity SLAs.

In Infinia’s architecture everything is a key-value store at the base of the system, unlike other systems which may have a block storage base layer, then a file system and then exporters for objects.

By using a key-value store, it allows us to basically distribute all the data in the system equally across all the devices because keys and values are very easily distributable, you can basically spread them across all the devices.

We are using a Bε-tree (B-epsilon tree) data structure, which is a very advanced key-value store essentially that has a nice balance between reads and writes. If you look at previous data structures, either they optimize for reads or they optimize for writes.

That means in Infinia [that] every data source that we provide – it doesn’t matter if this is object, various forms of object, we are going to come out with different data services besides block that we already support – everything is a peer to each other. Instead of layering things on top of each other, everything is a peer.

DDN Infinia slide

Blocks & Files: How about its performance?

James Coomer: We’ve traditionally been very strong for throughputs, obviously, and IOPS and single threaded throughput; those are all very important and still are in the world of AI. But with Infinia we’re also concentrating on the other end of the spectrum, which is listing performance: time to first byte and latency. Listing performance is when you have a million things in a bucket. I want to find all of those objects which are images with pictures of cats.

That’s like the stereotypical requirement inside a RAG query. You run your RAG query and it’s got to search a massive knowledge base and find things relevant within a fraction of a second. What you need is a storage environment, a data platform, which is going to allow you to search, list and retrieve, not just do IOPS and not just do throughput. You’ve got to find it first. And finding things first is like the new challenge of AI. It’s how quickly can I find this data, which is millions or even hundreds of millions of objects.

Sven Oehme, DDN

Sven Oehme: When you look at AWS and a lot of the other object stores, what they basically do is they put all of those things in a database on the side. Even if you don’t see that, there is something running in the background that is typically an in-memory database to do some of these things. Still, despite that, they have not been able to actually make this really fast.

With Infinia this is actually a property of the way we lay out the data structures in the key-value store because, with the key-value store, we can actually auto index, prefix and load balance all the objects that are created – because it is a one-to-one mapping between the object name and the key value in the key-value store.

That allows us to basically do things like prefix rendering and also prefix prefetching. And so when you run an object listing on AWS, you get about 6,000 objects per second if you’re really lucky, maybe 10,000 per second. We’ve demonstrated in production, at one of our first customers, object listing with a single thread of over 80,000 per second. Where we use multiple threads, which doesn’t actually scale on AWS because they have no way to actually parallelize the listing, we were able to, on that same data set, improve this to 600,000 per second. So we are basically about 100x faster than AWS in object listing. It doesn’t stop there. If you actually run this across multiple buckets, in parallel on one of our systems in production, we are able to do 30 million objects per second listing across multiple buckets.
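The listing speed-up Oehme describes follows from the one-to-one mapping of object names onto ordered keys, which turns a prefix listing into a range scan. A minimal sketch of that idea is below; Infinia’s Bε-tree implementation is, of course, far more sophisticated than this in-memory toy.

```python
# Minimal sketch: when object names map one-to-one onto sorted keys, listing
# a bucket or prefix is a contiguous range scan, not a database lookup.
# Infinia's distributed Be-tree key-value store is far more sophisticated.
import bisect

class SortedKVStore:
    def __init__(self):
        self.keys = []      # kept sorted; object names are the keys
        self.values = {}    # key -> object metadata/extents

    def put(self, key: str, value) -> None:
        if key not in self.values:
            bisect.insort(self.keys, key)
        self.values[key] = value

    def list_prefix(self, prefix: str):
        # Range scan: jump to the first key >= prefix, walk until the prefix ends.
        start = bisect.bisect_left(self.keys, prefix)
        for key in self.keys[start:]:
            if not key.startswith(prefix):
                break
            yield key, self.values[key]

store = SortedKVStore()
store.put("bucket1/images/cat-001.jpg", {"size": 12345})
store.put("bucket1/images/dog-001.jpg", {"size": 23456})
store.put("bucket2/logs/2025-07-03.log", {"size": 999})
print(list(store.list_prefix("bucket1/images/")))  # only bucket1 image objects
```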

Blocks & Files: And latency?

Sven Oehme: If you look at latency, the time to first byte, [then] when you go to AWS S3, your latency is about a hundred milliseconds per request. If you go to a S3 Express, which is about 12 times more expensive than S3, you get about a 10 millisecond response time. If you deploy Infinia in AWS on AWS VMs, your latency is about a millisecond.

Basically from S3, through S3 Express to Infinia there is a 100x improvement in latency, while Infinia actually has a lower cost point running on AWS than AWS S3 Express itself.

Now, when we are talking about [a] put and get operation in a millisecond or less, you are in file system latency territory. You can basically now run very interactive workloads on an object store that before you could only run on file systems.

What we believe is we are going to have a very low latency object store implementation that allows you to do distributed training in lots of different sites, maintaining very low latency compared to a file system, but in a distributed way like an object store can operate.

James Coomer: We’ve kind of got Stockholm syndrome when it comes to POSIX and file. We think we love it, but we only love it because we’ve been beaten up and punished by it for 20-plus years. And actually if you stand outside the circle and look at it, it’s awful. POSIX is terrible in today’s world with massive aggregated access, distributed data, huge amounts of metadata. It does nothing well and object does everything well, apart from latency and that single-threaded throughput. And those are the challenges we’ve been able to solve.

Blocks & Files: Tell me about Infinia’s multi-tenancy

James Coomer: When incoming requests come in, we can take a look at, okay, what tenant is it? What subtenant? And then we can completely distribute this [data] across all the available resources. While if you take the traditional approach, typically people have volumes and volumes that are assigned to a particular tenant. So there is a direct linkage between capacity, performance and a consumer. While in our system there is no linkage.

[This] means we can do crazy stuff like you get 99 percent of the performance but you only get 1 percent of the capacity or you only get 1 percent of performance, but you get 99 percent of capacity.

These are extremes that you can’t do on legacy systems because it’s basically more resource allocation than SLA-based management. We can really, truly do SLA-based management on performance, on capacity, and also very importantly, on resilience.

DDN Infinia slide

You can say this particular set of data I’m storing here for this particular tenant, it’s so critical it needs to survive site failures. But these other data sets over here, they’re not critical. I want to get the highest performance, so therefore they should be locally stored wherever that data and consumer resides.

Being able to dial this in on every single data service [and] in one gigantic shared infrastructure is something that is very, very unique.

Blocks & Files: Tell me about Infinia’s resilience

Sven Oehme: We basically talk about resilience in the form of how many failures we can survive. And if you have a small system that is all in one rack, the only things it can really survive are node and drive failures. But if you deploy an Infinia system across physical sites, [say] five physical sites, you can define a data set that is so critical, it needs to survive a site failure.

Then we automatically either apply erasure coding that is wide enough to cover the site failure or we dynamically apply replication. And that depends on the size of the IO you do and what is more efficient from a resilience versus capacity planning point of view. But the system does this all automatically. So all the end user does is apply an SLA for resilience and then the system automatically takes care of how that protection can be handled in the most efficient way.
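As a rough illustration of the kind of decision Oehme describes, the sketch below picks replication for small writes and wide erasure coding for large ones, based on a resilience requirement. The thresholds and stripe widths are invented for the example; Infinia derives its layout automatically from the SLA.

```python
# Illustrative-only policy: replicate small writes (where the per-stripe
# overhead of wide erasure coding would dominate) and erasure-code large
# writes. Thresholds and widths are invented; Infinia chooses automatically
# from the resilience SLA.
def choose_protection(io_size_bytes: int, sites: int, survive_site_failure: bool):
    SMALL_IO = 64 * 1024                       # assumed cut-over point
    if not survive_site_failure:
        # Local protection only: survive node and drive failures (schematic widths).
        return {"scheme": "erasure", "data": 8, "parity": 2, "scope": "local"}
    if io_size_bytes < SMALL_IO:
        # Replicate across sites: capacity-hungry but cheap for small IO.
        return {"scheme": "replication", "copies": min(3, sites), "scope": "cross-site"}
    # Erasure-code with enough parity that all shards on one site can be lost:
    # with shards spread evenly, parity >= shards-per-site. Widths are schematic.
    shards_per_site = 2
    return {"scheme": "erasure",
            "data": (sites - 1) * shards_per_site,
            "parity": shards_per_site,
            "scope": "cross-site"}
```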

Blocks & Files: How does Infinia treat SLAs?

James Coomer: When it comes to multi-tenancy, typically people want a service level which defines a minimum service, right? They’re never going to get less than X, Y or Z. And it seems to me what everybody else has done is basically implemented something that’s quite crude, which is a maximum. Which is kind of the opposite of what people are asking for. They want to know that applications are never going to be experiencing a downfall in performance because they’re getting interactions with other tenants. What they don’t want is to be limited to an artificial ceiling when there’s lots of resources available.

Sven Oehme: Our implementation of quality of service actually assigns priorities amongst the tenants and then, within the tenants, amongst their subtenants and individual data services. So that means if you’re not using your share in some of these other peer neighbours, you can take full advantage of the system, but as soon as your peer neighbours within that same quality of service group start using their parts, everybody gets assigned into their fair share lane, which means if you have multiple medium priority, a bunch of low and one high priority workload, if that high priority workload doesn’t run, everybody else can use the full set of performance of the system.

As soon as the high priority workload comes in, it gets the maximum share possible and everybody else still doesn’t get stalled out, but they get the appropriate value to it. And so basically we have a notion of up to 64 individual levels of quality of service priority. We don’t expose them to the end user because it’s typically too complex for people to deal with that amount of high granularity. So we provide three preset defaults, which are basically a high priority default, a low priority default and a medium priority default.

The end user could, if they want to, do this through the REST interface or CLI to adjust this, but these are the preset values.
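The behaviour described above can be modelled with a toy weighted fair-share calculation. The preset weights below are invented for illustration and are not Infinia’s actual values.

```python
# Toy fair-share model: idle tenants don't constrain anyone, and when busy
# tenants compete, bandwidth is split in proportion to priority weights.
# The weights are invented; Infinia ships high/medium/low presets over
# 64 internal priority levels.
PRESET_WEIGHTS = {"high": 8, "medium": 4, "low": 1}

def allocate(total_bandwidth: float, tenants: dict) -> dict:
    """tenants maps name -> (preset, is_active). Returns bandwidth per tenant."""
    active = {name: PRESET_WEIGHTS[preset]
              for name, (preset, is_active) in tenants.items() if is_active}
    if not active:
        return {name: 0.0 for name in tenants}
    weight_sum = sum(active.values())
    return {name: total_bandwidth * active.get(name, 0) / weight_sum
            for name in tenants}

# With the high-priority tenant idle, the others share the whole system;
# when it becomes active, everyone is squeezed back to their fair share.
print(allocate(100.0, {"ai-train": ("high", False),
                       "analytics": ("medium", True),
                       "backup": ("low", True)}))
print(allocate(100.0, {"ai-train": ("high", True),
                       "analytics": ("medium", True),
                       "backup": ("low", True)}))
```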

Blocks & Files: DDN is changing its spots a little bit. It’s no longer about just providing the fastest possible parallel storage for HPC and AI sites. It’s developing a software stack.

Sven Oehme: Absolutely. And in fact, you see this in a lot of other places. I don’t know if you’ve seen our RAG pipeline demo that we’ve done on AWS. We built an entire end-to-end RAG pipeline. We deployed this on AWS and we used only AWS services for the first version of it. And then what we did is we basically replaced two components. So everything was using just AWS internal services for GPUs for the storage. We used AWS S3 Express and then what we did is we added a highly optimized version of the Milvus database using GPU offloading that we’ve co-developed together with Nvidia. And then we replaced the S3 Express object store with our Infinia object store interface and we were able to speed up that entire RAG pipeline by 22x and we were able to do this while reducing the cost by over 60 percent. You don’t need to run on on-prem super highly-powered hardware.

DDN Infinia slide

Blocks & Files: Are there built-in services?

James Coomer: We do data reduction by default. There is not even a knob or any tuning or anything to tweak with it. It’s just on all the time. We encrypt all the data all the time. Also here, there is no knob, there is no tweak you need to do whenever you set up an Infinia system and automatically there is encryption. If there is no hardware acceleration for encryption, we do everything in software. If there is hardware, we take the hardware encryption.

Blocks & Files: Can you discuss your roadmap at all?

James Coomer: There is a lot of work we are doing together with Nvidia right now to integrate in a lot of upcoming features like KV Cache support. There’s upcoming support for GPUDirect for Object. We were a little bit late for that, but there is a very good reason for it because our Infinia software stack provides a software development kit which already did RDMA offloading of data transfers into the system. And given that this is the main feature that is provided by GPUDirect for Object, we are just adding this now.

Cristie verifies backup recoverability in clean room for major platforms

Cristie Software can automatically verify the recoverability of backups and systems from Rubrik, Cohesity, IBM, and Dell within a clean room environment.

UK-based Cristie is a system recoverability and replication supplier with some 3,000 customers, including a third of Fortune 500 companies, and patented tech dealing with automation and applying machine learning during recovery and replication processes. The new Continuous Recovery Assurance feature is part of its BMR (Bare Machine Recovery) product set and performs a full system recovery to a clean room environment whenever a new backup is detected on a protected system. This simplifies operational testing and reduces risk.

Scott Sterry, Cristie

Scott Sterry, VP Business Development at Cristie Software, stated: “As the cybersecurity threat landscape evolves, organisations must go beyond basic backup and recovery. They need verifiable, proactive assurance that physical and virtual systems can be recovered at any time, under any conditions.”

Organizations need to be able to trust the recoverability of their backup datasets, the vendor said, pointing out that the middle of a ransomware attack is the worst time to learn your backups are unusable.

Sterry added: “Many businesses are waiting until a disaster to test their system recovery, often due to resource constraints, or overconfidence in their system and data backup processes. By integrating an isolated clean room environment with automated recovery validation for protected systems, Cristie overcomes these problems to strengthen enterprise resilience against ransomware, data corruption, and infrastructure failures.” 

Cristie Bare Machine recovery diagram

Physical systems plus major cloud and virtualization platforms can be supported with Continuous Recovery Assurance, which supports Rubrik Security Cloud, Cohesity DataProtect, Dell Technologies’ Avamar, NetWorker, and PowerProtect Data Manager, and IBM Storage Protect and Storage Defender.

This Continuous Recovery Assurance function is a companion to Cristie’s existing Advanced Anomaly Detection utility, which delivers an early warning of malware file encryption within backup files.

Cristie does not partner with Commvault, which has its own backup recoverability testing features through its Cleanroom Recovery and CommServe Recovery Validation service. Nor does it partner with Veeam, which also has its own clean room recovery offering. Druva has a Curated Recovery function, which can automatically step back in time through backup jobs and extract the latest clean copy of the data, compiling it into a brand new composite “GOLD” copy. HYCU can restore backups – Nutanix ones, for example – to a clean room for forensic analysis.

Continuous Recovery Assurance is available in Cristie Virtual Appliance release 5.2.1 and above.

Cloudian: AI inference will need an immense amount of storage

Interview: In a briefing with Cloudian founder and CEO Michael Tso, we explored the data storage demands of AI and learned that inference could require vast amounts of context data to be stored online and that compute will have to come to this data. This conversation has been edited for brevity and flow.

Blocks & Files: Do you think that AI is going to become really, really important? It’s not just a bubble?

Michael Tso: I think it’s going to be world-changing. I don’t want to exaggerate, but I think this is one of the things where it’s a kind of James Watt-type moment.

Michael Tso

I’m not sure where that leaves humanity because we’re basically automating ourselves right now. We’re automating all the work that we normally do and I think it is happening at the pace that the soft tissue, the biology, is not able to adapt. So I think that’s a problem. I think we are going to have to figure out very quickly what are we going to do with ourselves.

Blocks & Files: Are you using some of the advanced AI inside Cloudian, for instance, for doing some programming work?

Michael Tso: It’s always supervised. We’re having AI write some of the code and it’s very good at analyzing code and fixing things and telling you why it’s not working. We were just doing some testing on some GPU server, and we were stuck there. It looked like some kind of hardware setup wasn’t right. So we just asked the AI, what should be the right kind of file setup? Normally stuff like this would take ages because it’s a new box, you’re not going to find anything on it. But it gave us a bunch of suggestions. We tried it and we got beyond where we were stuck on and things got much faster. It’s like any tool, we need to learn how to use it to our advantage.

Blocks & Files: How do you see the future here? Will Cloudian carry on providing a fast, reliable storage layer to feed AI and other applications?

Michael Tso: I believe compute’s going to come to the data. I believe that data gets so big, compute’s going to have to come to the data.

I think what we have right now in hyperconverged is one end of the spectrum where the compute is using small amounts of data and you would suck the data into the compute and it’s a lot faster. Cloudian all these years has been working on the opposite end of the spectrum, with the idea that data gets so heavily gravitational that it’s going to attract the compute. Nvidia actually shares this vision.

Where we are going right now is we’re building Cloudian into a full-fledged data processing platform. We’re no longer a storage-only platform. The idea is that we’ll take data in and we will turn it into different formats that can be easily consumed and easily processed by different AI tools. Think of it this way: normally you would just take documents from a company, we would store them and put a legal hold and all that on them. But now, when data comes in, we will vectorize it and put it in a vector database.
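
To make the vectorize-on-ingest idea concrete, here is a minimal sketch assuming hypothetical embed() and VectorStore interfaces; it is illustrative only and not Cloudian’s actual API.

```python
# Minimal sketch of vectorize-on-ingest, assuming hypothetical embed() and
# VectorStore interfaces; illustrative only, not Cloudian's actual API.
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class VectorStore:
    """Toy in-memory vector index keyed by object name."""
    vectors: Dict[str, List[float]] = field(default_factory=dict)

    def upsert(self, key: str, vector: List[float]) -> None:
        self.vectors[key] = vector

    def search(self, query: List[float], top_k: int = 3) -> List[Tuple[str, float]]:
        # Toy similarity: negative squared distance (higher is more similar).
        def score(v: List[float]) -> float:
            return -sum((a - b) ** 2 for a, b in zip(query, v))
        ranked = sorted(self.vectors.items(), key=lambda kv: score(kv[1]), reverse=True)
        return [(k, score(v)) for k, v in ranked[:top_k]]


def ingest(obj_key: str, payload: bytes, object_store: Dict[str, bytes],
           embed: Callable[[bytes], List[float]], index: VectorStore) -> None:
    """Store the object as usual, then vectorize it so it is searchable later."""
    object_store[obj_key] = payload          # normal object write (legal hold, etc. would apply here)
    index.upsert(obj_key, embed(payload))    # side index for AI retrieval
```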

The way we see Cloudian is, Cloudian is a true platform in a sense that you can ingest data and you can plug in these modules that will process the data and these modules can then create their own data. We’ve built prototypes of this very early on. We were interested in video. We could do automatic tagging on a video.

That’s one example. But imagine now doing this at a very big scale, on any kind of data, with any kind of plug-in. And the plug-ins we are working on right now are the Nvidia inferencing microservices, the NeMo Retriever and NIMs.

What we are working on now is the inferencing pipeline. You take a user question and you first add all the context to the question. Then you take the context and pass it to your AI model: first to your local model that has all your enterprise knowledge, then to the global model that has knowledge of the universe. You combine these and then you go back and check if this is the answer they’re looking for. And if it’s not right, then you’ve got to fix something and go through that again. That’s the inferencing pipeline.
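
Tso’s description maps loosely onto a retrieval-augmented loop. The sketch below is a minimal illustration, with every model and retrieval function passed in as an assumed callable rather than a real API:

```python
# Minimal sketch of the inferencing loop Tso describes: add context, ask the
# local (enterprise) model, then the global model, check, and retry.
# All callables are assumptions supplied by the caller, not real APIs.
from typing import Callable


def inference_pipeline(question: str,
                       retrieve_context: Callable[[str], str],
                       local_model: Callable[[str], str],
                       global_model: Callable[[str], str],
                       answer_is_good: Callable[[str, str], bool],
                       max_rounds: int = 3) -> str:
    answer = ""
    prompt = question
    for _ in range(max_rounds):
        context = retrieve_context(prompt)                       # enterprise + user context
        enriched = f"{context}\n\nQuestion: {prompt}"
        draft = local_model(enriched)                            # enterprise knowledge
        answer = global_model(f"{enriched}\n\nDraft: {draft}")   # general knowledge
        if answer_is_good(question, answer):                     # go back and check
            return answer
        prompt = f"{question}\n(Previous attempt was unsatisfactory: {answer})"
    return answer
```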

And at the end you come out with an answer, and then you go back again. So storage is incredibly important in this pipeline. One, that whole “my data” concept is key; that’s always going to stay inside the enterprise. Two, the thing that people miss is the user data. In order for me to know what you are asking, I actually have to load all the past conversations that I had with you.

Blocks & Files: You have to have context for the user to enable you to interpret the question?

Michael Tso: That’s exactly right. [With the AI tools] in the beginning it was a Q&A thing. It did not remember anything. But now it remembers everything about you. You might have heard of things called LM cache and KV cache. What these things do is basically cache your previous conversations. Essentially it’s caching the token input and the token output so it doesn’t have to recalculate that again. And it’s caching that in a vectorized way with a very fast search.
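
The caching Tso refers to normally lives inside the model-serving stack, holding attention key/value tensors, but the effect he describes can be sketched at the application level: keep each user’s prior turns with embeddings so relevant context is looked up rather than recomputed. A toy illustration, with embed() as an assumed helper:

```python
# Toy per-user conversation cache, illustrating the effect Tso describes:
# previous turns are stored with embeddings so relevant context can be
# fetched by similarity search instead of being recomputed each time.
# embed() is an assumed helper, not a real library call.
from typing import Callable, Dict, List, Tuple

Turn = Tuple[str, str, List[float]]  # (user_text, model_text, embedding)


class ConversationCache:
    def __init__(self, embed: Callable[[str], List[float]]):
        self._embed = embed
        self._turns: Dict[str, List[Turn]] = {}

    def add_turn(self, user_id: str, user_text: str, model_text: str) -> None:
        vec = self._embed(user_text + " " + model_text)
        self._turns.setdefault(user_id, []).append((user_text, model_text, vec))

    def relevant_history(self, user_id: str, query: str, top_k: int = 5) -> List[Turn]:
        qv = self._embed(query)
        def sim(v: List[float]) -> float:
            return -sum((a - b) ** 2 for a, b in zip(qv, v))
        turns = self._turns.get(user_id, [])
        return sorted(turns, key=lambda t: sim(t[2]), reverse=True)[:top_k]
```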

Blocks & Files: So in a particular AI question response session you need the tokens for that response session when the response is first made, and you need the tokens for that user’s context. Which means you have to store them.

Michael Tso: That’s right – and you have to store that forever, for billions of people.

Blocks & Files: On what? Tape?

Michael Tso: It cannot be on tape. This is online. It has to be online. I think it’s going to be a tiered solution. I think it’s going to be difficult to store all our lives on NAND.

Blocks & Files: You’re talking exabytes.

Michael Tso: Yes, exactly. When Nvidia first came to Cloudian and said they wanted to work with us, I was like, “Hey, why do you need to work with us? You seem quite happy with DDN and VAST?” They said: “Well, we need you guys for training because data is getting bigger, but more importantly, we need you guys for inferencing, and inferencing needs a lot of storage.”

I’m like, really? The models are kind of small, aren’t they? So I didn’t understand that at the time, [but then] I realized, oh my God, who wants to talk to an AI that every time I need to tell it, OK, I’m a 50-something-year-old male. And it’s stupid, right? The great thing about having an assistant, someone who’s worked for you for 20 years, they know everything about you. So you ask for something, they give you the right answer. So that’s what we expect our AI to be. We expect them to remember everything about us forever. Everything we told it – it should know, right? So if you think of storing this, I mean, this is immense. This is an immense amount of storage.

Blocks & Files: You must, I’m thinking, have talked to customers and potential customers about this, and they must have sat across the table from you. What did they say?

Michael Tso: Everybody’s reaction is: “Wow, we never thought about it like this.” This is real. This is exactly everybody’s reaction.

Basically, Nvidia realized a year and a half ago that the money in AI is initially all in training, but eventually it’s in inferencing, because everybody’s going to have to do it. And to make money in inferencing, you actually have a storage problem, not only a compute problem. They have a storage problem and they need distributed, big-scale storage.

And what does that mean? Distributed, big-scale storage at reasonable cost. That’s object storage. So that was why they came talking to us. You are going to want something that’s tiered, that allows you to start small and grow.

We come from the enterprise. So we look at it and say, oh, that’s interesting. What does that mean for us? It means that the architecture we already have, a peer-to-peer distributed architecture that can spread around the world, is perfect. Right? We already have a way to plug in and do compute on it. Really perfect. What we have to do now is integrate way more compute into the platform, sooner than we thought we were going to have to.

Do you put GPUs in every Cloudian storage node, which right now only has CPUs, NICs, and storage?

There is another way: within our storage cluster, you build a compute cluster. We traditionally believe that you want compute as close as practical to the data. So we will end up with what we do now, which is some compute running on the data on the nodes, with the heavy stuff running on our internal compute cluster.

Because that way we can size that out easily. And that way we can also add it on later. Because a customer who buys a Cloudian system may not know on day one that they’re going to be running a vector database. They may not want to do that, but somewhere down the road they’ll want to do this inferencing.

Blocks & Files: You could implement this in the cloud?

Michael Tso: Yes. 

Blocks & Files: And your vision is?

Michael Tso: Our vision is that a customer can choose whatever inferencing modules they want on top of our platform. We will provide some, but they can provide others; it’s an open platform. So we will move on from being your long-term data storage and archive platform. We’re already doing a lot of microservices and modern apps, so we’re already in tier one storage. But when we get into this interesting AI world, we’re now part application software. That makes sense because, essentially, the infrastructure is upleveling. If you are only doing storage a couple of years from now, you’re not going to be competitive.

Western Digital sees HAMR capacity advantage in roadmap

Interview: We had the opportunity to speak with Ahmed Shihab, Western Digital chief product and engineering officer, and he told us that WD’s HAMR tech is progressing well, that OptiNAND can provide a capacity advantage, and that the company has a way to increase the bandwidth per terabyte of disk drives.

Shihab joined Western Digital in March, following more than a year as a corporate VP for Microsoft Azure storage and eight years as an AWS VP for infrastructure hardware. Azure and AWS are two of the largest hyperscaler buyers of nearline disk drive storage, which is WD’s biggest market. Shihab will have detailed knowledge of how these hyperscalers evaluate disk drive product transitions and what they look out for, which is invaluable to Western Digital as it follows Seagate with its own HAMR technology transition. It will not want to endure Seagate’s multi-year HAMR drive qualification saga.

Blocks & Files: Could you start by talking about the state of Western Digital and HAMR, how that’s going?

Ahmed Shihab: Actually, I was a little afraid when I first came in. It’s like, what will I find? But I was really happy with what I found because actually HAMR technology works. That’s the good news. Obviously there’s a lot of work to do to get it to the reliability levels and density levels and things like that. So there’s a lot of engineering work. So the physics, we got that done. The basics in terms of the media and the heads and all the recording technology, we got that done. There is a bunch of manufacturing stuff we have to get done to dial everything in. You know how that works. A new technology; you’ve got to do some cleanup from the initial concept. 

Ahmed Shihab

We have a couple of customers helping us out, so that’s been very gratifying. They’re looking at the drives and giving us a bunch of feedback.

One of the lessons we learned is to engage your customers early, and being a former customer, I really appreciate that because I always wanted to know what was coming. I don’t want to be surprised at the end of the day when somebody comes to me and says, well, here’s our technology, now could you start using it? It takes a long time to qualify. So in one of my prior roles, we always engaged early. I encourage my team to engage early with technology so that we can get on the leading edge of its release, and that’s what we’re doing.

Blocks & Files: Do you anticipate that the qualification period with hyperscaler customers for HAMR will be as long as the one Seagate has been enduring?

Ahmed Shihab: I hope not. The thing is there’s an advantage to being a fast follower because, if you think about it, AWS was a fast follower for a long time and it did us well. We didn’t start leading until the 2017, 2018 sort of time frame, and that is a very good philosophy in the sense that a lot of the lessons were already learned. Customers will tell us what to expect, what works and what doesn’t, and we’ve certainly benefited from that experience. So that’s one of the things that’s really helping us accelerate. And we have been working on HAMR for a long time. The ecosystems were mature. We had developed the technology. It wasn’t as much of a focus because we had the density roadmap in ePMR, which is also unique to us. That takes us into the high thirties, early forties (TB). So we needed to accelerate it. That’s what we’re doing now.

Blocks & Files: Do you think that with an 11-platter technology you have more headroom for HAMR development? The areal density needed to match any particular capacity point is less demanding than for Seagate with its 10-platter technology.

Ahmed Shihab: It certainly gives us more headroom to play with. We can use the extra platter to give us the overall capacity at lower density. So it gives us more headroom. It means we can go to market faster than they can. It’s not a trivial thing to do, obviously, to operate on 11 versus 10. It sounds easy, but all the tolerances, I’m sure you can appreciate, get tight. So we think that is an advantage and we’re certainly taking advantage of it.

Blocks & Files: Will you use what I understand to be the same tactic as Seagate, which is to provide lower-capacity drives by stripping out platters and heads while using much of the same HAMR technology and manufacturing?

Ahmed Shihab: I think in time, maybe. Right now we don’t need to. Our ePMR technology is mature, it works, it’s high yield, it’s good margins with the OptiNAND and everything else we’re doing. We can actually continue to deliver that technology for some years.

Blocks & Files: Will OptiNAND give you particular opportunities with HAMR drives that don’t accrue to Seagate or Toshiba? 

Ahmed Shihab: Let’s look at what OptiNAND does. UltraSMR (shingling) is what’s really enabled by OptiNAND, and UltraSMR is an algorithmic gain in capacity. Because of OptiNAND, we can actually get more capacity per platter than our competition can. So it’s an algorithmic thing that translates across the technologies, and we expect it to apply equally to HAMR. We’re probably going to be a little less aggressive in the beginning, but it has the headroom, and it’s already implemented in the drives because we take the technology from PMR to HAMR. It’s just the recording technology that’s different.

Blocks & Files: Does OptiNAND mean that, again, you’ve got a little bit more headroom with HAMR density than you would without it?

Ahmed Shihab: Yes. So one of the things we want to do is to be able to return that capacity to customers, and in the beginning you’ll probably see us being a little conservative. We generally are more conservative. Our roots are in IBM and HGST and places like that, where we are very, how shall I put this? We’re not flamboyant. We always deliver what we say we’ll deliver. As a customer, I’ll give you an example: we had HGST and IBM and WD drives that lasted way, way longer and with lower failure rates than the competition. So we’ve always been appreciative of that, and that is part of our culture.

We’re not flamboyant. We’re going to get on with it. We’re going to match and exceed Seagate’s capacity because we can win with things like OptiNAND that all our customers have qualified. They have qualified UltraSMR on ePMR. That’s already qualified in their software and we don’t have to do anything different when they come to the new drives with HAMR in them. They’ll qualify them for physical vibrations, all the usual things we help them with. But beyond that, there’s no new software they have to create.

Blocks & Files: How do you see the disk drive market developing?

Ahmed Shihab: One of the things I would say from being a customer is this: disk drives are really the bedrock of all the data economy that’s been developed. Data is constant. If you think about the cloud players’ object stores, S3 and Blob stores, all this data is constantly moving around: moving up and down the tiers, moving to different regions, and moving for maintenance purposes. It’s all transparent to the user. Nobody sees it. You just apply a policy. You might have a trillion objects but it’s one policy, and in the background all this data is moving around, being scrubbed and checked for bit rot and things like that.

It’s a very active environment in the background. So this is where we feel that disk drives have stood the test of time, because they can deal with the read/write endurance, the performance, the bandwidth. Performance per terabyte is actually more important than IOPS in this space, because it’s mostly large objects or large chunks that are moving around. So we see that as continuing. Disk drives are going to continue to be very relevant. There are some Google papers, and a couple of quotes from a distinguished engineer now at Microsoft, talking about how important disk drives are and will continue to be for the world. SSDs have their place, they absolutely have their place, and it’s more a question of better together than one versus the other.
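
As a back-of-envelope illustration of why bandwidth per terabyte matters here, the sketch below uses assumed round numbers, not WD specifications: if capacity grows faster than sequential bandwidth, the time to read, scrub, or rebuild a whole drive stretches out.

```python
# Back-of-envelope illustration with assumed round numbers, not WD specs:
# full-drive read time grows as capacity outpaces sequential bandwidth.
def hours_to_read_full_drive(capacity_tb: float, bandwidth_mb_s: float) -> float:
    seconds = (capacity_tb * 1_000_000) / bandwidth_mb_s  # 1 TB = 1,000,000 MB
    return seconds / 3600


# e.g. ~20 TB at ~280 MB/s vs ~40 TB at ~300 MB/s (assumed figures)
print(round(hours_to_read_full_drive(20, 280), 1))  # ~19.8 hours
print(round(hours_to_read_full_drive(40, 300), 1))  # ~37.0 hours
```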

Blocks & Files: How about for video surveillance, smaller network attached storage drives and that kind of thing?

Ahmed Shihab: If you think about video surveillance, it’s a very write-intensive workload. The endurance is a big deal and it’s also a very cost-sensitive world. So being 6x cheaper than QLC NAND and having practically infinite endurance, it makes perfect sense for disk drives to be in that world. That’s definitely what we continue to see in that workload. SSDs tend to do well in very small block-based use cases, in caching use cases, and in read-intensive use cases.

Blocks & Files: Do you think HDDs will continue to play a role in gaming systems?

Ahmed Shihab: In some, I would say yes. It’s hard to say. The gaming market changes quite quickly. I don’t really have an opinion just yet on that one.

Blocks & Files: Disk drives are pretty much in a fixed format; it’s a 3.5-inch drive bay. And that format has a long life ahead of it because you continually are going to be able to increase capacity. And so the cost per terabyte will continue to slowly edge down and your customers will get more value from disk drives than they can get from any alternate technology, whether it be SSDs above them in the performance space or tape drives below them. Is that pretty much it?

Ahmed Shihab: We actually tried to look at five-and-a-quarter-inch drives, things like that. But the three-and-a-half-inch, one-inch-high form factor has really endured, mostly because you have existing infrastructure and you have people who know how to handle it. There’s a whole bunch of operational considerations to do with it. And it wasn’t for lack of trying to change it, but there are practical limitations to do with the rate of spin, the size of the platters, the mechanics of the platters, the vibrations, and things like that. So it’s here to stay, in my opinion.

The nice thing is we actually know the physics of HAMR’s recording technology is going to take us at least another 10, 15, 20 years. There is a whole roadmap ahead of us in that world, and that’ll continue to deliver the dollars per terabyte that customers want. And there are opportunities for us to invent new capabilities. One of the things that we’re very excited about is something new in how we’ve managed to increase the bandwidth per terabyte. I’m not going to say much more about it; I’m just going to tease it out there. We’ve talked about it publicly also, but it’s something that we’re very excited about.

Blocks & Files: Do you think there’s a future for NVMe access drives?

Ahmed Shihab: The only reason I can think of that NVMe will become important is when the bandwidth exceeds the capability of SATA. I think it’s going to be a practical thing that changes it. There are a lot of ideological conversations around NVMe versus SATA versus SAS and, as a company and as a team, we’re very rooted in pragmatism. If there’s no need for it, don’t do it, because it’s disruptive to customers. And that’s something we’re very careful about, in the sense that we want to reduce the friction that customers feel when they’re adopting new technology.

If you think about SMR and what we’ve done with OptiNAND and things like that, we learned a bunch of lessons from that, which is how to make it really super low-friction for customers to get into new drive technology. And we’ve been working with customers for a long time helping them write code, deliver libraries so that they can take better advantage of all the capabilities of drives. So we very much want to work in lockstep with those customers. 

So just changing the interface for the sake of it; can we do it? Of course we can do it. Doing NVMe now is not really hard. It’s changing the wire and the protocol. It’s not a big deal. The question is, is the market, our customers, ready for it? Is it necessary? And customers will only do it when it’s necessary.