
Sputtering Seagate, storage giant wants to buy Intevac

Seagate plans on spending $119 million to buy Intevac, a maker of sputtering tools used to coat disk drive platters with thin films.

Intevac’s 200 Lean machine applies thin films, formed from microscopic particles, to disk drive platters to create PMR and HAMR recording media coatings. In the sputtering process, atomic-scale particles are ejected from a source material when it is bombarded with energetic particles, typically ions from a plasma, and they land on a platter’s surface to form the film. Seagate is the main customer for this tool; Intevac also counts Resonac (formerly Showa Denko) and Western Digital as customers, and Resonac in turn counts Toshiba as a customer.

Intevac’s 200 Lean sputtering tool

Seagate and Intevac have entered into a definitive agreement for Intevac to be acquired by Seagate in an all-cash $4/share deal.

Intevac will pay a one-time special dividend of $0.052 per share, which makes the deal worth $4.102/share to Intevac stockholders. This is a premium of 45 percent to Intevac’s closing price of $2.83 per share on December 11, 2024, the date on which Intevac said it was pursuing strategic options. Intevac ran into serious problems when its TRIO generalized glass substrate coating business expansion effort was shut down. It decided to retreat to its core HDD sputtering tool business and look for a buyer or partner.

Seagate will now launch a $4/share all-cash tender offer for all of Intevac’s outstanding shares “as promptly as reasonably practicable,” it said.

Seagate needs to have more HAMR sputtering capacity as it increases HAMR drive production. Wedbush analyst Matt Bryson told subscribers the deal could: “Reduce [Seagate’s] capital costs given the need to update equipment to accommodate HAMR.” It signals Seagate’s confidence in its HAMR technology.

Bryson suggests this deal could also hinder moves into HAMR technology by both Toshiba and Western Digital, by reducing the availability of sputtering machinery, given that they wouldn’t necessarily want to buy from Seagate.

However, either Western Digital or Resonac or both could object to the deal on reduced competition grounds, which could delay its completion, or even prevent the transaction taking place. We imagine Seagate’s legal team have already examined and discounted this extreme possibility.

The transaction is expected to close in late March or early April 2025, subject to the satisfaction of customary closing conditions. Seagate says it will be accretive to its earnings in the future.

How to reduce AI hallucinations

SPONSORED POST: Accurate data is fundamental to the success of generative AI (GenAI) – and so is Retrieval-Augmented Generation (RAG).

RAG can enhance the accuracy of GenAI workloads by processing large amounts of information pulled from external vector databases which the learning models at their core would not usually access. In doing so, it not only supplements the underlying large language models (LLMs) and small language models (SLMs) with fresh data but also reduces the need to continually retrain them.

That can provide a significant boost for GenAI applications which rely on constantly changing datasets, in sectors such as healthcare or finance, or in any scenario which uses virtual assistants, chatbots, or knowledge engines. But in order to retrieve accurate, up-to-date responses rather than AI hallucinations from those dynamic datasets, RAG inferencing also needs a fast, scalable compute architecture.
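To make the retrieval step concrete, here is a minimal, vendor-neutral sketch of a RAG lookup in Python. The toy embedding, the in-memory index, and the sample documents are illustrative stand-ins rather than any particular product’s implementation.

```python
import math
from collections import Counter

def embed(text: str, dims: int = 64) -> list[float]:
    """Toy embedding: hash words into a fixed-size vector (stand-in for a real model)."""
    vec = [0.0] * dims
    for word, count in Counter(text.lower().split()).items():
        vec[hash(word) % dims] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

# A hypothetical knowledge base the underlying LLM has never been trained on.
documents = [
    "The Q3 maintenance window moves to the first Saturday of the month.",
    "Version 4.2 of the claims system adds a fraud-scoring endpoint.",
    "RAG retrieves fresh context from a vector store at query time.",
]
index = [(doc, embed(doc)) for doc in documents]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

query = "When is the Q3 maintenance window?"
context = "\n".join(retrieve(query))
# The augmented prompt, not the raw question, is what gets sent to the LLM.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)
```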

Not every enterprise has that architecture in-house, and many lack the budget to implement it. You can watch this video interview to hear The Register’s Tim Philips talk to Infinidat CMO Eric Herzog about the infrastructure cost and complexity barriers which have stopped many organizations from building their LLMs in-house, and how to get around them.

Introduced last November, Infinidat’s RAG workflow deployment architecture is designed specifically to address those challenges by working in tandem with existing InfiniBox and InfiniBox SSA enterprise storage systems to optimize the output of AI models without the need to invest in specialized equipment. It can also be configured to harness RAG in multi-cloud environments, using Infinidat’s InfuzeOS Cloud Edition, and comes with embedded support for common cloud-hosted vector database engines such as Oracle, Postgres, MongoDB, and DataStax Enterprise. Infinidat’s RAG solution will also work with non-Infinidat storage systems whose NFS-based data can be integrated into the overall RAG configuration.

You can read more about Infinidat’s RAG workflow deployment architecture here, alongside details on potential use cases which range from AI Ops, business intelligence, chatbots and educational tools to healthcare information, industrial automation, legal research and support.

Sponsored by Infinidat.

Quantum reports steep losses, says better times lie ahead

Data storage supplier and protector Quantum reported a revenue upturn in its third fiscal quarter of 2025 and a massive loss as it clears the books and looks ahead.

For the three months ended December 31, Quantum recorded a one percent rise in revenue to $72.6 million and a $71.4 million net loss. The bulk of that loss was due to a non-cash adjustment of $61.6 million to the fair market value of warrant liabilities. Take that away and the loss was a more acceptable $9.8 million, slightly worse than the year-ago $9.4 million loss. We’ll look at warrant liabilities in a moment.

CEO Jamie Lerner stated: “Third quarter revenue increased sequentially and was above the midpoint of guidance as recent bookings momentum and customer wins were converted into realized sales.”

The revenue rise was led by higher DXi deduplicating backup appliance sales, offset by supply chain hiccups.

In sales pitch mode, Lerner said of the hardware: “Dell does not have all-flash Data Domain products. IBM does not have all-flash deduplication products. Hitachi does not have that. NetApp does not have that. There just are no major storage vendors that have that technology available today. I assume they will chase us, but they don’t have it today. Certainly, smaller vendors like an ExaGrid don’t have anything like that today. And we are just taking share with that technology.”

Quantum revenues

He claimed: “Pure Storage would say, sure, you can back up against us, and they have an all-flash appliance that is 3x to 4x more expensive than any of us operate in the backup space, and they don’t have anywhere near the efficacy of deduplication that a product like a variable length deduplication algorithm like DXi has.”

Quantum reported $20.6 million in cash, cash equivalents, and restricted cash, compared to $24.5 million a year ago. Net debt was $133 million. Gross margin was 43.8 percent, better than the year-ago 40.6 percent. Lerner talked about Quantum having “generated improving free cash flow” and “a significant reduction in operating expenses,” down 6 percent annually.

The switch to subscription sales is progressing, with annual recurring revenue (ARR) of $141 million, of which subscription ARR rose 29 percent annually to $21.3 million. Over 90 percent of new product unit sales were subscription-based.

Lerner mentioned the recent SEPA deal, saying: “This quarter represented tangible evidence of improved financial performance from our ongoing business transformation and operational efficiency initiatives over the past year.”

CFO Ken Gianella added that the SEPA deal with Yorkville Advisors “gives Quantum the right to access additional capital at the company’s discretion over a three-year period.”

Warrant liabilities arise from warrants that give their holders the right to buy Quantum’s stock at a fixed price before the warrants expire, with terms that require the warrants’ fair value to be re-assessed each reporting period. Holders pay cash to the company when they exercise the warrants. In Lerner’s view: “This is just a tool at our disposal to give us some …  growth capital as well as pay down debt,” and allow Quantum to meet growth objectives. The company pays “millions of dollars of interest … per quarter.”

Gianella said in the earnings call that a $61.6 million charge resulted “from the significant increase in our stock price during the quarter as well as a positive non-cash impact of $1.2 million inter-company foreign currency adjustment.” Quantum’s stock price was $3.2 at the start of the quarter but finished at $53.92.
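For readers unfamiliar with the accounting, here is a sketch of how a warrant fair-value remeasurement produces a non-cash charge. The warrant count and per-warrant values below are invented for illustration; they are not Quantum’s actual figures.

```python
# Hypothetical illustration of a warrant-liability remeasurement (not Quantum's numbers).
warrants_outstanding = 1_000_000       # assumed number of warrants
fair_value_start = 2.50                # model-derived per-warrant fair value at quarter start, $
fair_value_end = 64.10                 # rises sharply with the share price, $

liability_start = warrants_outstanding * fair_value_start
liability_end = warrants_outstanding * fair_value_end

# The increase in the liability is booked as a loss, even though no cash changes hands.
non_cash_charge = liability_end - liability_start
print(f"Non-cash charge for the quarter: ${non_cash_charge:,.0f}")
```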

In Lerner’s view, this quarter has seen Quantum “executing on improving our operational process and productivity, combined with significant steps toward being cash flow positive and moving the company forward to become debt-free.”

Answering an analyst’s question, Lerner said: “We want to get the company to be debt-free. Secondly, we want to stop burning cash and make cash, and we are getting very close to that. We think this year we’ll cross over to where we’re no longer a cash consumer, but a cash producer.”

However, the slight revenue growth in Q3 is unlikely to be maintained in the short term, as the outlook for Quantum’s fourth quarter is for revenues of $66 million ± $2 million, roughly 8 percent lower than the year-ago Q4 and, sequentially, the lowest quarterly revenue Quantum has recorded in the last 15 years. Quantum said the decrease was due to seasonality and “temporary manufacturing headwinds,” particularly with the new Raptor i7 tape library. On the library, Lerner said: “This is kind of a pretty amazing new product that has 2,008 tape cartridges in a standard rack. No one has achieved anything even close to that. It is the highest and it is the largest and most dense storage appliance ever built by man. It just is.”

There’s a worry about new US import tariffs affecting its future costs as some of the components supplied by Dell and Supermicro for Quantum are made in Mexico.

Quantum promised that “efficiency gains and portfolio refresh have us poised for profitable growth into FY ’26.” Gianella said Quantum continues “to prioritize annual recurring revenue, which we expect to be a key driver for delivering increasing profitability over time.”

At the midpoint, the Q4 outlook makes for $280.4 million in revenues for the full FY 2025, down 10 percent year-on-year. Lerner commented: “We think this is the year where we’ve stemmed some of the declines in our business and we go back into growth mode.”

He added: “Our investors know that we had a very old company with old products, and we had to dig in deep with some lenders to spend the money to refresh these products, but this is the year where we are committing, and we are pretty confident that the company, because of these investments, moves into a growth mode.”

HPE Alletra X10000 redefines scale-out hardware

Analysis: The Alletra MP X10000 object storage system from HPE is representative of a new class of scale-out storage hardware with a disaggregated shared everything (DASE) architecture pioneered by VAST Data. Dimitris Krekoukias, a Global Technology and Strategy Architect at HPE, has written a blog cataloging its main features and explaining why they were included in its design.

He lists RDMA and GPUDirect for S3, Data Service Partitions, key-value store underpinnings, single bucket speed, balanced read and write speed for high-throughput and small transactions, small write buffering, and more.

It is built with HPE technology, not OEM’d software or licensed hardware, and is constructed from ProLiant-based storage server controller nodes and separate all-flash storage or capacity nodes, hooked up across an internal NVMe fabric and presenting a global namespace. The containerized OS provides object storage in its initial incarnation, but additional protocol layers could be added on top of its base log-structured key-value store.

Krekoukias says: “These protocol layers are optimized for the semantics of a specific protocol, treating each as a first-class citizen. This allows X10000 to take advantage of the strengths of each protocol, without inheriting the downsides of a second protocol or running protocols on top of each other (like Object on top of File or vice versa).”

HPE needed the system to provide fast access to object data using RDMA (remote direct memory access) and Nvidia’s GPUDirect for S3 protocol to provide direct storage drive access to GPU servers. Krekoukias writes: “This technology will both greatly improve performance versus TCP and significantly reduce CPU demands, allowing you to get a lot more out of your infrastructure – and eliminate NAS bottlenecks from GPUDirect pipelines.” 

HPE Alletra Storage MP X10000 architecture

He claims: “Extreme resiliency and data integrity are ensured with Triple+ Erasure Coding and Cascade Multistage Checksums (an evolution of the protection first seen in Nimble and then Alletra 5000 and 6000).” The write buffer is in the SSDs and not NVRAM, “eliminating the HA pair restriction.” There’s more about this here.

Each drive is partitioned into so-called small logical drives or disklets, as small as 1 GB, which are put in RAID groups. These can be confined to a JBOF (storage node) or span JBOFs to protect against failure. Incoming data is compressed and “good space efficiency is ensured by using 24-disklet RAID groups.”
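To picture the disklet idea, here is a small sketch that carves drives into fixed-size logical slices and assembles a 24-member group spread across enclosures. The slice size, drive counts, and placement policy are assumptions for illustration, not HPE’s published layout.

```python
from dataclasses import dataclass
from itertools import cycle

@dataclass(frozen=True)
class Disklet:
    jbof: int      # enclosure the physical drive lives in
    drive: int     # drive within that enclosure
    index: int     # which 1 GB slice of that drive this is

def carve_disklets(jbofs: int, drives_per_jbof: int, disklets_per_drive: int) -> list[Disklet]:
    """Partition every drive into fixed-size logical slices ('disklets')."""
    return [Disklet(j, d, i)
            for j in range(jbofs)
            for d in range(drives_per_jbof)
            for i in range(disklets_per_drive)]

def build_group(disklets: list[Disklet], width: int = 24) -> list[Disklet]:
    """Pick 'width' disklets, spreading them across enclosures so the group
    can survive the loss of a whole JBOF (an illustrative policy only)."""
    by_jbof: dict[int, list[Disklet]] = {}
    for dl in disklets:
        by_jbof.setdefault(dl.jbof, []).append(dl)
    group, sources = [], cycle(sorted(by_jbof))
    while len(group) < width:
        jbof = next(sources)
        if by_jbof[jbof]:
            group.append(by_jbof[jbof].pop())
    return group

pool = carve_disklets(jbofs=4, drives_per_jbof=12, disklets_per_drive=8)
group = build_group(pool)
print(sorted({dl.jbof for dl in group}))   # members drawn from all four enclosures
```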

Workloads are sliced up into Data Service Partitions (DSPs) and the DSP shards run on separate controller nodes. If one or more controller nodes are lost, the affected DSPs “just get nicely and evenly redistributed among the remaining controllers.” Disklet RAID slices are dynamically allocated to DSPs as needed.

Krekoukias points out: “Because all state is only persisted within JBOFs and nodes are completely stateless, this movement of DSPs takes a few seconds and there is no data movement involved … Since objects are distributed across DSPs based on a hash, performance is always load-balanced across the nodes of a cluster.” Compute and storage capacity can be scaled independently.
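The hash-based placement Krekoukias describes can be sketched generically as follows. The shard count and the round-robin reassignment are assumptions for illustration, not HPE’s actual algorithm; the point is that re-mapping DSPs to surviving controllers involves no data movement.

```python
import hashlib

NUM_DSPS = 64                      # assumed shard count, for illustration only

def dsp_for(object_key: str) -> int:
    """Objects hash to a Data Service Partition, so load spreads evenly across shards."""
    digest = hashlib.sha256(object_key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_DSPS

def assign_dsps(controllers: list[str]) -> dict[int, str]:
    """Spread DSPs over the surviving controller nodes. Because all state
    lives in the JBOFs, a reassignment moves ownership, not data."""
    return {dsp: controllers[dsp % len(controllers)] for dsp in range(NUM_DSPS)}

placement = assign_dsps(["node-a", "node-b", "node-c"])
print(placement[dsp_for("bucket1/object-42")])

# If node-c fails, its DSPs are simply re-mapped to the remaining controllers.
placement = assign_dsps(["node-a", "node-b"])
print(placement[dsp_for("bucket1/object-42")])
```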

Beyond GPUDirect for S3, performance has been a major focus of the system’s design. The blog tells us that “typical unstructured workloads such as analytics and data protection assume a single bucket or a small number of buckets per application unit such as a single warehouse or backup chain.” Thanks to the DSP shard concept, there is no need to spread data across multiple small buckets to improve I/O performance.

He writes: “X10000’s ability to scale a single bucket linearly means individual applications benefit from X10000’s scale out ability just the same as a large number of applications or tenants.”

The X10000 has balanced performance, HPE says, as it “is designed to provide balanced read vs write performance, both for high throughput and small transactional operations. This means that for heavy write workloads, one does not need a massive cluster. This results in an optimized performance experience regardless of workload and provides the ability to reach performance targets without waste.”

The “X10000’s log-structured Key-Value store is extent-based,” and extents are variable-sized. An extent is a contiguous block of storage that holds multiple key-value pairs, thereby increasing access speed. Metadata and data accesses can also be adapted to application boundaries.
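A crude way to picture a variable-sized extent holding many key-value pairs is shown below; the field names are illustrative, not the X10000’s on-disk format.

```python
from dataclasses import dataclass, field

@dataclass
class Extent:
    """A contiguous, variable-sized region holding many key-value pairs,
    so one read can pull in a whole batch of related entries."""
    offset: int                       # byte offset within the log-structured store
    length: int                       # variable size, ideally matched to application boundaries
    entries: dict[str, bytes] = field(default_factory=dict)

ext = Extent(offset=0, length=1 << 20)
ext.entries["bucket1/object-42#meta"] = b'{"size": 4096}'
ext.entries["bucket1/object-42#data"] = b"\x00" * 4096
print(len(ext.entries), "entries packed into one", ext.length, "byte extent")
```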

Krekoukias adds: “X10000’s write buffer and indexes are optimized for small objects. X10000 implements a write buffer to which a small PUT is first committed, before it is destaged to a log-structured store and metadata updates are merged into a Fractal Index Tree for efficient updates.”

Small object writes (PUTs) “are committed to X10000’s write buffer prior to destaging them to the log-structured, erasure-code-protected store. The commit to a write buffer reduces the latency of small PUTs and reduces write amplification. The write buffer is stored on the same SSDs as the log-structured store and formed out of a collection of disklets.”
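Here is a hedged sketch of that small-object write path, with the destage threshold and in-memory structures invented for illustration; in the real system the buffer lives on SSD disklets and the index is a Fractal Index Tree rather than a Python dict.

```python
class SmallObjectStore:
    """Illustrative write path: small PUTs are acknowledged from a write
    buffer, then destaged in batches to a log-structured store."""

    def __init__(self, destage_batch: int = 4):
        self.write_buffer: dict[str, bytes] = {}   # on SSD disklets in the real system
        self.log: list[tuple[str, bytes]] = []     # append-only, erasure-code protected
        self.index: dict[str, int] = {}            # key -> position in the log
        self.destage_batch = destage_batch

    def put(self, key: str, value: bytes) -> None:
        # Committing here keeps small-PUT latency low and reduces write amplification.
        self.write_buffer[key] = value
        if len(self.write_buffer) >= self.destage_batch:
            self._destage()

    def _destage(self) -> None:
        for key, value in self.write_buffer.items():
            self.log.append((key, value))
            self.index[key] = len(self.log) - 1    # metadata merged into the index on destage
        self.write_buffer.clear()

    def get(self, key: str) -> bytes:
        if key in self.write_buffer:               # not yet destaged
            return self.write_buffer[key]
        return self.log[self.index[key]][1]

store = SmallObjectStore()
for i in range(5):
    store.put(f"obj-{i}", b"x" * 100)
print(len(store.log), "objects destaged,", len(store.write_buffer), "still buffered")
```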

Krekoukias says: “An SSD-based write buffer can deliver the same high reliability and low latency as prior approaches such as NVDIMM, even for latency-sensitive structured data workloads.”

All these design points mean that small X10000 configurations, ones with only 192 TB of raw capacity and 3.84 TB SSDs, can achieve good performance; it is not necessary to scale up the cluster to reach high performance.

Sandisk investor day outlines roadmap post WD spin-off

Western Digital’s soon-to-be-spun-off Sandisk business unit held an investor day session and revealed details of three new SSDs coming later this year, along with plans for High-Bandwidth Flash (HBF), a NAND equivalent of high-bandwidth memory (HBM).

It needs to reassure investors that it’s a solid business ahead of the spinout. As part of this effort, Sandisk execs presented their view of the market, technology trends, and product strategies to show that the company’s technology will appeal to customers and remain relevant.

Sandisk’s latest 3D NAND generation is called BiCS8. It has 218 layers and is used to make a 2Tb QLC die, said to be the world’s highest-capacity NAND die in production. Sandisk previewed a future BiCS generation with more than 300 layers, which will be used to make a 1Tb TLC die. Generally speaking, WD/Sandisk uses TLC NAND for performance drives and QLC NAND for capacity-centric drives. It showed a slide predicting QLC’s growing presence in the client SSD market, and also the deepening penetration of PCIe gen 5:

This will feed into the first new drive, a PC QLC product with 512GB, 1TB and 2TB capacity levels, using a PCIe gen 4 interconnect. 

This capacity-focused drive will ship later this year, as will a performance-centered PC gen 5 product using faster TLC NAND and coming with 512GB, 1TB, 2TB, and 4TB capacity levels. There is initial performance data on the slide image above.

A capacity-centric datacenter drive was also previewed, the UltraQLC DC SN670:

The UltraQLC angle refers to the controller having hardware accelerators, being scalable to the 64 Die/Channel level, able to scale power according to workload demand and including an “integrated advanced toggle mode bus Mux control.” Toggle mode NAND uses a double data rate interface for faster data transfers and a multiplexer (Mux) manages the data lanes. Sandisk’s version of this technology will be managing the NAND-SSD controller data flow more efficiently.

The SN670 uses the PCIe gen 5 bus and will have either 64TB or 128TB capacities, meaning 61.44TB or 122.88TB usable respectively. It is scheduled to ship in the third quarter of 2025. Both Solidigm and Phison announced 122.88TB SSDs in November last year.

Sandisk displayed a capacity eSSD progression slide with 128TB in calendar 2025, 256TB in 2026, 512TB in 2027 and 1PB after that but with no year identified.

On the future technology front, Sandisk talked about 3D Matrix Memory (DRAM) to help solve the memory wall problem: the growing disparity between memory capacity and bandwidth. It is working with IMEC on this technology in a CMOS development project targeting 4-8Gbit capacities.

It is also developing bandwidth-optimised NAND with its HBF concept, which it suggests could provide equivalent HBM bandwidth with 8 to 16 times HBM capacity at the same cost. 

In effect, stacked HBM DRAM layers would be replaced in whole or in part by NAND layers, connecting to a host GPU/CPU/TPU via a logic die and an interposer. An 8-high stack could have six 512GB NAND dies and two HBM ones to provide 3.12TB of total capacity, which compares to an 8Hi HBM chip with 192GB of total capacity.

An all-HBF 8Hi stack would have 4TB of capacity. Sandisk said a c1.8 trillion parameter LLM with 16-bit weights needs 3.6TB of memory and could fit in an 8Hi HBF chip. If we envisage an LLM running on a smartphone, then a 64 billion parameter MoE model could fit inside an HBF chip in the phone. Sandisk foresees three HBF generations, with gen 2 having 1.5x the capacity of gen 1 and 1.45x the read bandwidth, and gen 3 being 2x gen 1 in both categories. Energy efficiency would also improve generation by generation.
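The capacity claims are straightforward arithmetic once the stated figures are plugged in: 512GB per HBF die and 192GB for an 8Hi HBM stack, which implies 24GB per HBM die. The short check below is ours, not additional Sandisk data.

```python
GB_PER_HBF_DIE = 512
GB_PER_HBM_DIE = 192 // 8            # a 192GB 8Hi HBM stack implies 24GB per DRAM die

mixed_stack_gb = 6 * GB_PER_HBF_DIE + 2 * GB_PER_HBM_DIE   # 3,120 GB, i.e. 3.12TB
all_hbf_stack_gb = 8 * GB_PER_HBF_DIE                      # 4,096 GB, i.e. roughly 4TB

params = 1.8e12                      # c1.8 trillion parameters
bytes_per_weight = 2                 # 16-bit weights
model_tb = params * bytes_per_weight / 1e12                # 3.6TB, fits in one all-HBF stack

print(mixed_stack_gb / 1000, all_hbf_stack_gb / 1000, model_tb)   # 3.12 4.096 3.6
```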

HBF is not drop-in compatible with HBM but does have the same electrical interface “with minor protocol changes.”

Sandisk wants to help form an open HBF standard ecosystem and is setting up a technical advisory board “of industry luminaries and partners.”

If the HBF idea spreads, with Micron, Samsung and/or SK hynix adopting it, then we could have a near-storage-class memory concept. Download the full Sandisk investor day slide set here.

Veeam co-founder invests in agentic AI start-up Integrail

Integrail – an agentic AI company – today confirmed Ratmir Timashev, Veeam and Object First co-founder, as its Executive Chairman and the primary investor behind a $10 million injection of seed funding.

The business was created in September by Anton Antich, who has now exited the CEO post to become Chief Product Officer. Entering the CEO’s office is Peter Guagenti, the former President and CMO at Tabnine, an AI-powered code completion tool developer.

Guagenti will oversee the rollout of Integrail’s no-code, AI Studio platform used to build AI agents. “The term Agentic AI is on everyone’s lips for an important reason; this capability will fundamentally change work by autonomously executing complex tasks across every line of business,” he said in a canned statement.

From left: Peter Guagenti, Ratmir Timashev and Anton Antich.

“It is hard to overstate how big this transformation will be. There are low-value tasks being done in enterprises by contractors or employees today that will soon be delegated to armies of AI agents, allowing humans to focus on new and higher value work. More interestingly, there are tasks that could make a business more competitive that simply don’t get addressed due to a lack of resources, but assigning these efforts to AI agents will finally make them not only possible but extremely cost-effective.”  

We understand that Tabnine was originally developed by Israeli company Codota, which built AI-assisted software development tools. Codota rebranded as Tabnine in 2022.

Integrail AI Studio diagram.

AI Studio has pre-built agents that are the starting point for the development of custom agents. It can automatically connect AI applications with existing apps and tools like CRM, Marketing Automation, Customer Support, and ERPs for workflow automation. Users describe a work process and its requirements and AI Studio will then create the agent. Examples, we’re told, “include finance use cases like analyzing and acting on contracts and terms, operational uses like review and alerting on dashboards, and HR tasks like candidate screening and prioritization or automatically executing employee onboarding and offboarding across a variety of systems.”

Integrail claimed: “With agents armed with rich contextual awareness of how a business works, they perform like trained employees.” The Integrail platform has already been employed by “tens of thousands of users to deploy hundreds of thousands of AI agents,” it claimed.

AI Studio supports all major LLMs and can be deployed as SaaS or on-premises. Integrail provides training, advice, and professional services to its customers.

A VAST effort: Data estate vectorization and storage for the AI era

All data storage companies are having to respond to the wave of generative AI and intelligent agents interrogating an organization’s data estate, be it block, file, or object, and structured, semi-structured, or unstructured. VAST Data has built an AI-focused software infrastructure stack on its storage base. We asked the company about its approach to the issues we can see.

Blocks & Files: How large is the vector data store for a vectorized multi-format data estate? What are the considerations in making such a judgment?

VAST Data: The overall “cost” or overhead for vectorization is determined by the specific details outlined during the AI design specification phase. Typically, this overhead ranges between 5 and 15 percent for a standard embedding. The exact percentage depends on several factors, including the use cases being addressed, the types of data involved, and the specificity of unique data elements required to effectively meet the needs of those use cases. These considerations ensure the vectorization process is both efficient and tailored to the enterprise’s requirements.
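As a back-of-the-envelope illustration of how an overhead figure in that range can arise, consider embedding fixed-size chunks of raw data. The chunk size, embedding width, and metadata allowance below are our assumptions, not VAST parameters.

```python
# Assumed parameters -- not VAST's actual defaults.
chunk_bytes = 16_384          # each chunk of source data that gets embedded
embedding_dims = 384          # width of the embedding vector
bytes_per_dim = 4             # float32
metadata_bytes = 256          # chunk pointer, ACLs, labels, and so on

per_chunk_vector = embedding_dims * bytes_per_dim + metadata_bytes
overhead = per_chunk_vector / chunk_bytes
print(f"Vector store adds roughly {overhead:.1%} on top of the raw data")   # ~10.9%
```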

Blocks & Files: Will an organization’s entire data estate need to be vectorized? If not all, which parts? Mission-critical, near-real-time, archival?

VAST Data: While any given AI project may only require a subset of data, over time, as AI penetrates all functions of a business (marketing, HR, support, sales, finance, etc.), the answer is simple: every piece of data should be cataloged and vectorized to unlock its potential for modern applications. Without vectors or proper labeling, data is a liability, not an asset – like a book in a box without a label. Mission-critical and near-real-time data will naturally take priority for vectorization, but even archival data can yield value when cataloged.

Challenges arise from diverse data sources – files, databases, or SaaS platforms like social media. The VAST Data Platform uniquely supports all data types from TB to EB scale and bridges these gaps with file triggers and real-time monitoring, ensuring data changes trigger immediate vectorization. For external sources, event-based or batch processing delivers adaptability for varying latencies.

Blocks & Files: Will an organization’s AI agents need, collectively and in principle, access to its entire data estate? How do they get it?

VAST Data: A key component of the design of the VAST Insight Engine is the preservation of each individual’s data access rights. An AI interface to data, like a chatbot, must respect the user’s assigned data rights and then continue adhering to the data governance rules of the organization. So while agents in aggregate may access all of the data, any given agent will only be able to access data it has specifically been granted permission to access.

Many AI deployments built on third-party, non-integrated tools have a hard time achieving this because once you remove the existing classifications on data, rebuilding them based on context can have unpredictable consequences.

The VAST Insight Engine attaches the AI data to the file as an attribute, meaning that the respective ACLs and control metadata are never lost. When consuming from VAST Data, the user interacts with the chatbot, the chatbot passes the question to the Insight Engine, and the engine only returns the files, or chunks of files, that the user has rights to. It’s seamless to the user and the only reliable way to ensure adherence to governance rules. Of course, all of this user access is logged to the VAST DB for compliance as needed.
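The permission-preserving retrieval flow can be sketched generically like this. The ACL model, the pre-computed similarity scores, and the filtering step are illustrative, not the Insight Engine’s actual implementation.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    source_file: str
    allowed_groups: frozenset[str]    # ACL carried with the chunk as an attribute

def search(scored_chunks: list[tuple[Chunk, float]], user_groups: set[str], k: int = 3) -> list[Chunk]:
    """Rank by similarity, but only over chunks the caller is entitled to see,
    so the chatbot can never surface data outside the user's rights."""
    visible = [(chunk, score) for chunk, score in scored_chunks
               if chunk.allowed_groups & user_groups]
    visible.sort(key=lambda pair: pair[1], reverse=True)
    return [chunk for chunk, _ in visible[:k]]

# Scores here stand in for the vector-similarity results of one query.
scored = [
    (Chunk("FY26 salary bands ...", "/hr/comp.xlsx", frozenset({"hr"})), 0.92),
    (Chunk("Q3 pipeline review ...", "/sales/pipeline.pdf", frozenset({"sales", "exec"})), 0.88),
]
print([c.source_file for c in search(scored, user_groups={"sales"})])
# -> ['/sales/pipeline.pdf']; the HR chunk is filtered out before the LLM sees anything.
```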

Blocks & Files: Does all the information in a vector data store become effectively real-time? Is a vector data store a single real-time response resource?

VAST Data: Think of the VAST Platform Insight Engine as the beating heart of your data ecosystem, delivering near-real-time responses within its environment. While internal data written to the VAST DataStore pulses instantly through AI pipelines, external sources bring a natural delay based on their own rhythm, creating a dynamic yet highly responsive system for enterprise use. Regardless of the data’s source, once the chunks and vectors are saved to the VAST DB, they are instantly available for inference operations.

Blocks & Files: Will the prime interface to data become LLMs for users in an organization that adopts LLM agents?

VAST Data: AI-powered interfaces are the inevitable future of data access. Users seek simplicity – asking questions and receiving precise answers, without navigating complex systems. As LLM agents mature, they’ll transform how we interact with data, replacing traditional CRUD applications with intuitive, conversational experiences that make data truly accessible to everyone.

Blocks & Files: Must a vector data store, all of it, be held on flash drives for response time purposes? Is there any role for disk and tape storage in the vectorized data environment?

VAST Data: In the AI era, the speed of insight defines competitiveness. Traditional vector databases are loaded into RAM and often sharded across multiple servers to scale, adding complexity and losing performance as they grow. Even so, this is preferred over swapping to HDDs because of the significant impact that would have on query latency. VAST enables exabyte scale while delivering linearly scaling, low-latency performance by distributing the chunks and indexes across NVMe flash storage for near-instantaneous response times, aligning with business-critical, real-time needs.

Blocks & Files: In a vector data space, do the concepts of file storage and object storage lose their meaning?

VAST Data: The shift to vectorized data reframes how we think about storage entirely. File and object storage, once foundational concepts, lose their meaning in the eyes of users. What matters now is data accessibility and performance, with storage evolving to support these priorities invisibly in the background.

Blocks & Files: Can you vectorize structured data? If not, why not?

VAST Data: Yes, structured data can and should be vectorized when possible, and doing so often delivers better results than text-to-SQL queries. Vectorization better captures complex, non-linear relationships, while text-to-SQL is limited to relationships defined in the schema. While its organization in rows and columns serves traditional applications, vectorization prepares structured data for the future of AI and machine learning. By converting structured data into numerical vectors, organizations can unlock advanced analytics, cross-domain integration, and more powerful AI-driven insights.
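One common, and here purely illustrative, way to vectorize structured rows is to serialize each row into a descriptive sentence and embed that text alongside its primary key; the embed() step mentioned in the comment is a placeholder for whatever embedding model is in use.

```python
def row_to_text(table: str, row: dict) -> str:
    """Serialize a structured row into text that an embedding model can consume."""
    fields = ", ".join(f"{col} is {val}" for col, val in row.items())
    return f"In table {table}: {fields}."

row = {"customer": "Acme", "region": "EMEA", "arr_usd": 1_250_000, "churn_risk": "low"}
text = row_to_text("accounts", row)
print(text)
# A real pipeline would now call embed(text) and upsert the vector plus the row's
# primary key into the vector store, so semantic queries can surface this record.
```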

Blocks & Files: Can you vectorize knowledge graphs? If not, why not?

VAST Data: Yes, vectorizing knowledge graphs is not only possible but essential for AI applications. By embedding nodes, edges, and their relationships into vector space, organizations can unlock advanced analytics, enabling their knowledge graphs to power recommendation systems, semantic search, and reasoning tasks in a scalable, AI-ready format.

Blocks & Files: Will we see a VQL, Vector Query Language, emerge like SQL?

VAST Data: The creation of a Vector Query Language would signify a pivotal moment for vector databases, akin to SQL’s role in the relational era. However, such a standard would require both a unifying purpose and cooperation among vendors – a challenge in today’s competitive, rapidly evolving market. If history teaches us anything, it’s that demand for simplicity and interoperability often drives innovation.

Blocks & Files: Will high-capacity flash drives, ones over 60 TB, need to be multi-port, not just single or dual-port, to get their I/O density down to acceptable levels in a real-time data access environment?

VAST Data: The short answer is no – high-capacity flash drives won’t need more than dual ports. Modern SSDs have already outgrown the limitations of HDDs by scaling PCIe lane bandwidth alongside capacity, maintaining a consistent bandwidth-per-GB ratio.

While concepts like Ethernet-enabled drives are interesting for expanding connectivity, most real-time access environments don’t require more than two logical connections. This design simplicity ensures performance, reliability, and scalability without unnecessary complexity.

Blocks & Files: Can a storage supplier provide a data space abstraction covering block, file, and object data?

VAST Data: Yes, a storage supplier can provide a unified data space abstraction across block, file, and object data. The VAST Data Platform does precisely that, creating a seamless environment that integrates all data types into a single namespace.

Blocks & Files: How does a data space differ from a global namespace?

VAST Data: We have in the past used the VAST DataSpace to mean the VAST global namespace. However, we are redefining DataSpace to include all the features that connect multiple VAST clusters plus VAST on Cloud clusters and instances that are most useful when participating in the DataSpace with other VAST clusters. 

  • Snap-to-Object – A VAST feature that replicates data to S3-compatible object storage. 
  • Global Clone – A clone made on one VAST cluster based on a snapshot of a folder on a different VAST cluster. Global Clones can be full, transferring the full contents of the folder, or lazy where the remote cluster only fetches data from the snapshot on the original cluster when data is read. Writes to global clones are always local. 
  • Asynchronous replication – Replication from a source VAST cluster to one or more (1:1 and 1:many) other VAST clusters based on snapshots. Frequently called native replication because Snap-to-Object was developed first so this was native VAST-to-VAST. 
  • Synchronous Replication – Active-Active VAST clusters with synchronous replication.
  • Global Access – The feature name/GUI menu that manages the VAST global namespace making Global Folders available on multiple VAST clusters.
  • Global Folder – A folder that’s made available on multiple VAST clusters via Global Access, the VAST global namespace 
  • Origin – A VAST cluster holding a full copy of a Global Folder. VAST 5.2 supports one origin per Global Folder but future releases will support multiple origins with replication. 
  • Satellite – A VAST cluster that presents a Global Folder for local access caching the folder’s contents on its local storage. 
  • Write Lease – A write lease grants a VAST cluster the right to write to an element, or byte-range within an element. In VAST 5.2, the Origin cluster holds the Write Lease to the entire contents of a Global Folder and so all writes are proxied to the Origin which can apply them to the data. 
  • Read Lease – A read lease is a guarantee of cache currency. When a satellite cluster fetches data from a Global Folder, it takes out a read lease on that data and registers that lease with the write lease holder. If the data should change, the read lease is invalidated and satellites will have to fetch the new version.
     

Blocks & Files: In other words, how do we organize an organization’s data estate and its storage in a world adopting LLM-based chatbots/agents?

VAST Data: In the era of LLM-based chatbots and agents, the VAST Insight Engine on the VAST Data Platform offers a transformative way to organize an organization’s data estate. It seamlessly vectorizes data for AI-driven workflows while preserving user attributes, file permissions, and rights. This ensures secure, role-based access to insights, enabling real-time responsiveness without compromising compliance or data governance.

Backblaze reveals best and worst disk drives with 2024 failure rate stats

Backup and cloud storage supplier Backblaze has released its quarterly disk drive stats report.

The company had 301,120 hard drives used to store data as of the end of last year. A chart of the Q4 2024 disk drive annual failure rates (AFR) for various suppliers’ drive types shows a pronounced 4.5-plus percent AFR for four drives. These outliers were a 12 TB HGST model (HUH721212ALN604) and three Seagate drives with 10 TB (ST10000NM0086), 12 TB (ST12000NM0007), and 14 TB (ST14000NM0138) capacities:

The average AFR across all the drives was 1.35 percent.

There were positive outliers as well, with five drive models having zero failures for the quarter: a 4 TB HGST (HMS5C4040ALE640), Seagate 8 TB (ST8000NM000A), 14 TB (ST14000NM000J), 16 TB (ST16000NM002J), and 24 TB (ST24000NM002H). Blogger Andy Klein says: “The 24 TB Seagate drives join the 20 TB Toshiba and 22 TB WDC drive models in the 20-plus club as we continue to dramatically increase storage capacity while optimizing existing storage server space.”

Backblaze also presents full 2024 year AFR numbers, which we have charted as well:

Seagate’s 12 TB (ST12000NM0007) drive has by far the worst AFR in both the Q4 and full-year 2024 stats, with its 14 TB (ST14000NM0138) and 10 TB (ST10000NM0086) models, as well as HGST’s 12 TB (HUH721212ALN604), also faring badly in the full-year AFR table.

Klein writes: “There were no qualifying drive models with zero failures in 2024. That said, the 16 TB Seagate model (ST16000NM002J) got close by recording just one drive failure back in Q3, giving the drive an AFR of 0.22 percent for 2024.” The average full 2024 AFR was 1.57 percent, better than 2023’s 1.7 percent. Klein expects the 2025 AFR to be lower still.
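Backblaze computes AFR from drive days rather than raw drive counts, so a rough cross-check of the 0.22 percent figure from a single failure looks like this (the implied fleet size is backed out of the rounded numbers, not taken from the report).

```python
def annualized_failure_rate(failures: int, drive_days: float) -> float:
    """Backblaze-style AFR: failures per drive-year, expressed as a percentage."""
    return failures / drive_days * 365 * 100

# Back out the implied exposure from the rounded figures quoted above:
# one failure at a 0.22 percent AFR.
implied_drive_days = 1 * 365 / 0.0022
print(f"{implied_drive_days:,.0f} drive-days "
      f"(~{implied_drive_days / 365:,.0f} drives running all year)")
print(f"{annualized_failure_rate(1, implied_drive_days):.2f}% AFR")
```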

Backblaze installed 53,337 drives in the year, averaging 26 drives per hour per technician. It tracked AFR by drive size over the past few quarters:

The 10 TB drives had the highest AFR in 2024, followed by the 12 TB and then the 8 TB drives. Klein says: “The 8 TB (gray line) drives and 12 TB (purple line) drives range in age from five to eight years, and as such their overall failure rates should be increasing over time.”

“The 14 TB (green line) and 16 TB (azure line) drives comprise 57 percent of the drives in service and on average they range in age from two to four years. They are in the prime of their working lives. As such, they should have low and stable failure rates, and as you can see, they do.”

Klein cuts the numbers another way to reveal AFRs by supplier:

Andy Klein, Backblaze

HGST climbed to the top of this particular tree in 2024, delivering a worse AFR than the previously worst supplier, Seagate, with Toshiba second best and Western Digital providing the most reliable drives of all. HGST’s lousy result is down to its 12 TB drives. Remove them from the equation, and the 2024 AFR for HGST drives would be 0.55 percent.

Klein is now retiring after several years of presenting these illuminating HDD failure rate stats, which are not available anywhere else. He deserves a terrific vote of thanks for this service.

The complete dataset used to create the tables and charts in this report is available on Backblaze’s Hard Drive Test Data page. You can download and use this data for free for your own purposes, but must cite Backblaze as the source.

Optera Data stores data in fluorescence with spectral holes

University of South Australia researcher Dr Nicolas Riesen is working on technology that opens the door to potential archival storage he believes could be ten times cheaper than currently available optical storage by 2030.

Nick Riesen

The technology depends upon selectively altering the fluorescence in wavebands at recording areas on an optical medium such that the presence or absence of a lower level of fluorescence in a band can be used to encode data. The number of lower-level wavebands present, also known as spectral holes, could be used in multi-bit storage, similar to multi-level cell NAND. The depth of a lower level could also encode information.

The way this works is by filling a recording area with nanoparticles such as 4H- and 6H-SiC (four- and six-layer hexagonal silicon carbide crystals) doped with vanadium, such that the vanadium ions’ energy levels affect the crystals’ fluorescence (light emission) when hit by a laser beam. Vanadium ions can have V4+ and V5+ oxidation states, with either four or five electrons lost, and these states affect the particle’s emission capability.

The recording area’s various nanoparticles are set to emit a group of light emission frequencies when excited by a laser beam. These can be viewed as a set of overlapping waves in a spectrum which collectively have a flat top, labeled (a) in the diagram below.

The ability of particular nanoparticle variants to emit light at their characteristic frequency can be altered by hitting them with an appropriately tuned laser at a high enough energy level. That means that, when the recording area’s emission frequencies are read and collated, there can be spectral holes where the frequency strength (emission brightness) is less than it would otherwise be, labeled (b) in the diagram above.

Geoff Macleod-Smith

Entrepreneur Geoff Macleod-Smith has set up Optera Data to try to commercialize this technology. We can envisage a recording medium, such as a disc, being shuttled to and from drives that read and write data. Writing could require multiple spectral frequencies to be set, while reading would require multiple fluorescence frequencies to be read and collated to detect the holes and thereby the bit value or values.
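To illustrate the multi-bit idea, here is a toy decoder that treats each probed waveband as either burned (a spectral hole) or untouched and reads the pattern back as bits, much like reading levels from an MLC NAND cell. The band count and brightness threshold are invented for illustration, not Optera Data parameters.

```python
BANDS = 4                 # wavebands probed per recording area (illustrative)
THRESHOLD = 0.5           # relative brightness below which we call a band a 'hole'

def decode_area(brightness: list[float]) -> int:
    """Each band contributes one bit: hole burned (dim) = 1, untouched = 0.
    Encoding several hole depths per band would yield more bits per area."""
    assert len(brightness) == BANDS
    value = 0
    for level in brightness:
        value = (value << 1) | (1 if level < THRESHOLD else 0)
    return value

# A flat-topped spectrum with holes burned in the second and fourth bands -> 0b0101.
print(decode_area([0.95, 0.30, 0.92, 0.25]))   # 5
```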

The technology is discussed in an ACS Photonics journal paper, “Data Storage in a Nanocrystalline Mixture Using Room Temperature Frequency-Selective and Multilevel Spectral Hole-Burning” by Nicolas Riesen, Kate Badek, and Hans Riesen, published in 2021. The two Riesens have patented the invention.

Optera Data claims its technology, when developed, will achieve: 

  • 1TB per $1 total cost of ownership
  • 100x efficiency of incumbent HDD storage
  • WORM immutability & offline storage security
  • Manufacture-ready & backwards compatible

Tom Coughlin is an advisor to the company and has written a white paper available on its website. He suggests that terabyte-class discs are feasible with particulate media, adding: “A thin film single layer write-once, archival disc with high volume manufacturing costs of $1/10TB ($0.10/TB) is possible before the end of the decade.”

He concludes: “Optera Data’s projection of storage costs of less than $0.10/TB or better by the end of this decade are at least 10X lower than competing optical storage projections from Folio Photonics and Cerabyte and 25X lower than projections for tape storage.

“This cost advantage, combined with archival storage advantages such as lower energy consumption in a robotic library, media longevity and the use of conventional optical disc substrates (and thus easier integration into existing drives and library systems), makes the Optera Data optical storage technology an attractive option for archive storage.”

That option has to be perceived as capable of generating substantial revenues if it is to help Optera Data raise development cash.

Data protection vendor market share history and changes

Cohesity-Veritas has the largest share of the data protection market, with Veeam in second place. Veeam is growing its share fastest, with Datto-Kaseya and Rubrik in joint second place for share growth.

These IDC market share numbers have been revealed in a 53-page document sent to subscribers by William Blair analyst Jason Ader, who analyzes Rubrik’s market and technology situation.

Ader writes: “Perhaps more than any other vendor, Rubrik deserves credit for changing the way customers view the backup market and making the term ‘cyber resilience’ stick. While the effectiveness of using backup to counteract ransomware helped here, Rubrik successfully pioneered the perception shift from ‘boring old backup’ to a must-have cyber resilience platform, on which the operations of the business depend.”

Ader includes an IDC table of vendor positions, using “IDC Worldwide Semiannual Software Tracker 1H24; October 2024” data, to show the changes from 2019 to the first half of 2024: 

We have charted this to make the trends more apparent:

Veeam is clearly the fastest-growing vendor in market share terms, with Rubrik accelerating its growth rate from 2023 onward. Datto-Kaseya grew its share up until 2023 then dropped back. We also see two vendors losing market share – Dell and IBM. Plotting the market share changes between the starting and ending periods makes the contrasts between the suppliers quite vivid:

We think there is more scope to grow market share at the low end of the market with small and medium businesses and organizations, as well as by protecting SaaS applications.

Ader says of Rubrik: “The DP (data protection) space is only getting more competitive – with Cohesity’s acquisition of Veritas, Commvault’s revival, and Veeam’s recent financing round – but channel feedback suggests that Rubrik has the leading brand and most market momentum at present. The company’s value proposition is built on the operational simplicity and broad, security-centric feature set of its products and its top-notch go-to-market organization.”

This is now eight-month-old IDC data. Rubrik was positioned seventh in the market in the first half of 2024, and Ader points out: “Given the company’s above-industry growth in 2024 (and expected 25 percent-plus growth in 2025), we suspect that its share in the DP market has continued to improve.”

Given the relatively low market share growth rates, we would suggest that, were Rubrik to acquire another data protection vendor, it would grow its share faster than otherwise, as Cohesity has found out with its Veritas acquisition. Were Rubrik to acquire Commvault or Datto-Kaseya, it could become a top three player. 

NetApp accelerates all-flash SAN A-Series arrays

NetApp has refreshed the lower part of its all-SAN storage array line with more affordable and powerful systems. 

There are two all-flash NetApp array product lines: the AFF (All-flash FAS) unified file and block array A-Series and the ASA (All-flash SAN) block-access-only A-Series. The basic hardware is the same, with the AFF products being upgraded in two phases – first, by new controller processors for the mid-to-high-end systems and, second, for the mid-to-low-end systems a few months later. The ASA models get the same two-step range upgrade some months after the AFF A-Series.

Thus the company refreshed its AFF A-Series product line last year, with the A400, A800, and A900 replaced by the A70, A90, and A1K in May 2024, and the lower-end A150 and A250 succeeded by the A20, A30, and A50 in November.

Sandeep Singh, NetApp

Sandeep Singh, NetApp Enterprise Storage SVP and GM, claimed: “In less than a year, NetApp has refreshed our entire unified, block-optimized, and object portfolio … We offer systems that are faster, simpler, more scalable, and more affordable than the competition – tailored to any workload or budget.”

NetApp upgraded the mid-to-high-end ASA A-Series in September last year, introducing the A70, A90, and A1K alongside the ASA A400, A800, and A900. Now it has refreshed the mid-to-lower-end segment of the ASA A-Series with the new A20, A30, and A50 models.

The company says: “They are ideal for smaller deployments including remote or branch offices with a starting price as low as $25K.” It claims “upfront costs 30-50 percent lower than competitive systems,” a “better return on investment driven by up to 97 percent lower power consumption, and low operational overhead when modernizing to all-flash ASA.”

NetApp ASA line
From top: ASA A20, A30, and A50

Customers can deploy them “in minutes, provision in seconds, and protect with one click.”

The new ASA systems will also be available in a FlexPod converged infrastructure, with pre-tested and validated architectures.

Here is a datasheet table for the ASA A-Series model range:

There is a lot of competition in this area with all-flash SAN-capable products available from Dell, Hitachi Vantara, HPE, Huawei, IBM, and Pure Storage. NetApp said it was increasing its wallet share among customers with its all-flash products in its FY 2025 second quarter results last November. These three new ASA A-Series products should help sustain that momentum.

NetApp said it’s going to increase its ASA product’s cyber-resiliency by providing ONTAP Autonomous Ransomware Protection with artificial intelligence (ARP/AI) for block storage later this year. This update will build on the existing capabilities of ARP/AI, the first real-time threat detection and response for NAS systems, expanding its cyber-resiliency protections to SAN customers. NetApp does not use Index Engines technology in the ARP/AI offering.

It is also launching a Ransomware Detection Confidence Program, saying: “If we miss an attack with our AI-powered ransomware detection, we’ll make sure you don’t experience data loss.” In the event that certain ransomware attacks are not detected, this program assists with recovery using NetApp Professional Services at no initial charge. 

There will be more information about the new ASA A-Series products on a NetApp product update webpage and also in a blog post.

WEKA tops SPECstorage Solution 2020 benchmark

WEKA says it has topped all five component test scenarios using HPE PCIe Gen 5 hardware in the SPECstorage Solution 2020 benchmark.

There are five workload scenarios in this benchmark: AI image processing, representative of AI TensorFlow image processing environments; Electronic Design Automation (EDA); Genomics; Software Builds; and Video Data Acquisition (VDA). The workload results include jobs or builds, overall response time (ORT), and other measures detailed in a supplier’s submission on the results webpage. WEKA submitted mostly winning on-premises results using Samsung SSDs in January 2022. It then beat other suppliers in four of the five categories in March 2024. Now it is in front on every workload.

Boni Bruno, WEKA director for Performance Engineering and Technical Marketing, blogs: “WEKA on the … HPE Alletra Storage Server 4110, powered by Intel Xeon processors, set new records on January 28, 2025, securing the No. 1 ranking across all five SPECstorage Solution 2020 benchmark workloads … Our combined solution not only raised the bar by setting new records for jobs and streams across all workloads – it also delivered significantly lower latency – in some cases up to 6.5x lower than previous records.”

The results were better than WEKA’s March 2024 runs, where it used public cloud instances. Two tables show the vendor submitted results for the AI Image and EDA workloads:

In its latest test results, WEKA slightly more than doubled its AI Image performance with lower latency, and increased the number of its EDA blended job sets by 2.7x, again with lower latency.

Charts, plotting job output counts against ORT (latency), show the distance between WEKA and the other vendors:

Scores farthest to the right and lowest down are better

We believe that this surge in WEKA’s benchmark speed is due to the accelerative effect of the PCIe Gen 5 bus, linking the NVMe SSDs to the system’s memory, used in the Alletra 4110 storage server, rather than to vastly improved WEKA software since March last year.

Bruno says: “The records were achieved using a single, consistent configuration across the five SPECstorage benchmarks without the need for workload-specific tuning … These improvements translate to faster AI training, reduced semiconductor simulation delays, quicker genomic analysis, and more responsive video analytics.”

His blog details the Alletra 4110 hardware configuration and provides more details on the individual benchmark runs.

As far as AI training and inference are concerned, this benchmark is less hotly contested than the MLPerf benchmark, where DDN and Hammerspace, for example, do better than WEKA in terms of keeping a number of GPUs active with 90 percent or greater utilization. 

WEKA’s advantage in the SPECstorage Solution benchmark is that it has accomplished a clean sweep across all the workloads. Now we wait and watch until other suppliers, such as Qumulo and NetApp, try out their software with PCIe Gen 5 hardware.