
Pure Storage expects hyperscale sales bump this year

Pure Storage’s first quarter revenues rose by double digits with no signs of order pull-ins from tariff uncertainty.

Revenues grew 12 percent year-on-year to $778.5 million in the quarter ended May 4, beating the $770 million outlook. Pure reported a net loss of $13.9 million, better than the year-ago $35 million loss. This is in line with seasonal trends – Pure has made a loss in its Q1 for three consecutive years while making profits in the other quarters, as the below chart indicates:

CEO Charlie Giancarlo said the business experienced “steady growth”.

He said: “Pure delivered solid performance in Q1, delivering double digit growth within a dynamic macro environment.” There were “very strong Evergreen//One and Evergreen//Forever sales” in the quarter. Evergreen//One is storage-as-a-service, while Evergreen//Forever is a traditional storage offering with continuous hardware and software upgrades via subscription.

Financial summary:

  • Gross margin: 68.9 percent, down from 73.9 percent a year ago
  • Free cash flow: $211.6 million vs $173 million a year ago
  • Operating cash flow: $283.9 million vs $221.5 million a year ago
  • Total cash, cash equivalents, and marketable securities: $1.6 billion vs $1.72 billion a year ago
  • Remaining Performance Obligations: $2.69 billion, up 17 percent year-on-year

The hyperscale business is progressing but not delivering revenue increases yet. Giancarlo said in the earnings call: “Our hyperscale collaboration with Meta continues to advance. Production validation testing is on schedule with strong progress in certifying our solutions across multiple performance tiers. We remain on track to deliver our anticipated 1-2 exabytes of this solution in the second half of the year, as planned.” 

This would be revenue on a license fee model, not on full product sales. Getting revenues from these projects takes time, Giancarlo said: “The reason is because it’s not the testing of our product specifically that’s taking a long time. It is their design cycle of their next-generation data center, which goes well beyond just the storage components of it.”

He added: “It generally takes us somewhere between 18 months and two years to design a new product here at Pure. It’s the same for these hyperscalers who are designing their next-generation data center.” It’s a co-engineering process.

Giancarlo said Pure is working well with the other hyperscalers: “We’re making steady progress there [at] about the pace that we expected. Hard to predict when one of those would turn into what we would call a fully validated design win. We are in some POCs [proof of concept] that should be an indicator. …we think we’re on track, but there’s still more work to be done before we can declare victory.”

Pure confirmed CFO Kevan Krysler is leaving for a new opportunity, and will stay in place until a new CFO is appointed.

New customer additions in the quarter totalled 235, the lowest increase for seven years. 

Next quarter’s outlook is for $845 million in revenues, a 10.6 percent year-on-year rise, with a $3.5 billion full-year revenue forecast, equating to an 11 percent rise. Giancarlo said: “Our near-term view for the year remains largely unchanged, although we are navigating increased uncertainty.” Overall: “We are confident in our continued momentum to grow market share and strengthen our leadership in data storage and management.”

Pure will announce new products at its forthcoming Accelerate conference – June 17-19, Las Vegas – to enable customers “to create their own enterprise data cloud, allowing them to focus more on their business outcomes rather than their infrastructure.”

Analyst Jason Ader told subscribers: “We believe that Pure Storage will steadily take share in the roughly $35+ billion enterprise storage market based on: 1) clear product differentiation (as evidenced by industry-leading gross margins); 2) strong GTM organization and deep channel partnerships; 3) secular trend toward all-flash arrays (AFAs), in which Pure has been a pioneer; and 4) Pure’s leverage to robust spending areas, including SaaS data centers, HPC, and AI/ML use-cases.”

Storage news round-up – May 29

Alluxio, supplier of an open source virtual distributed file system, announced Alluxio Enterprise AI 3.6. This delivers capabilities for model distribution, model training checkpoint writing optimization, and enhanced multi-tenancy support. It can, we’re told, accelerate AI model deployment cycles, reduce training time, and ensure data access across cloud environments. The new release uses Alluxio Distributed Cache to accelerate model distribution workloads; by placing the cache in each region, model files need only be copied from the Model Repository to the Alluxio Distributed Cache once per region rather than once per server. v3.6 debuts a new ASYNC write mode, delivering up to 9GB/s write throughput in 100 Gbps network environments, and adds a web-based Management Console designed to enhance observability and simplify administration, along with multi-tenancy support, multi-availability-zone failover, and virtual path support in FUSE.
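
The saving from per-region caching is easy to picture in outline. The sketch below is purely illustrative logic rather than Alluxio’s API – the Region class and copy counters are hypothetical – and it simply counts repository-to-cache copies when every server pulls a model directly versus once per regional cache.

```python
# Illustrative only: repository fetches without and with a per-region cache.
# Not Alluxio code; the class and field names are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Region:
    name: str
    servers: int
    cache: set = field(default_factory=set)   # model files already cached in this region

def copies_without_region_cache(regions, model):
    # Every server pulls the model straight from the central repository.
    return sum(r.servers for r in regions)

def copies_with_region_cache(regions, model):
    # The model is copied from the repository once per region; servers then hit the local cache.
    copies = 0
    for r in regions:
        if model not in r.cache:
            r.cache.add(model)
            copies += 1
    return copies

regions = [Region("us-east", 64), Region("eu-west", 48), Region("ap-south", 32)]
print(copies_without_region_cache(regions, "llama-70b"))  # 144 repository reads
print(copies_with_region_cache(regions, "llama-70b"))     # 3 repository reads, one per region
```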

AWS announced GA of Aurora DSQL, a serverless, distributed SQL database enabling customers to create databases with high availability, multi-Region strong consistency, and PostgreSQL compatibility. Until now, customers building globally distributed applications faced difficult trade-offs when selecting a database: existing systems offered either low latency without strong consistency, or strong consistency with high latency, but never both low latency and strong consistency in a highly available SQL database. With Aurora DSQL, AWS claims customers no longer need to make these trade-offs. It is now GA in eight AWS Regions, with availability in additional regions coming soon.
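
Because Aurora DSQL is PostgreSQL-compatible, standard PostgreSQL drivers should be able to connect to it. Here is a minimal sketch assuming a hypothetical cluster endpoint and an IAM-generated auth token used as the password; check the AWS documentation for the exact connection parameters.

```python
# Minimal sketch: querying an Aurora DSQL cluster with a standard PostgreSQL driver.
# The endpoint and token are placeholders; Aurora DSQL uses IAM-based auth tokens.
import psycopg2

conn = psycopg2.connect(
    host="your-cluster-id.dsql.us-east-1.on.aws",  # hypothetical endpoint
    port=5432,
    dbname="postgres",
    user="admin",
    password="<iam-auth-token>",  # generated with the AWS SDK/CLI, not a static password
    sslmode="require",
)
with conn.cursor() as cur:
    cur.execute("SELECT now()")
    print(cur.fetchone())
conn.close()
```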

…  

Fervent AI-adoptee Box announced revenue of $276 million, up 4 percent Y/Y for its Q1 fy2026 quarter with $3.5 million net income. Billings came in at $242.3 million, up 27 percent. Next quarter’s revenue is expected to be $290 million to $291 million, up 8 percent Y/Y. Aaron Levie, co-founder and CEO of Box, claimed: “We are at a pivotal moment in history where AI is revolutionizing work and business. In this AI-first era, organizations are embracing this shift to stay competitive… Earlier this month, we unveiled our largest set of AI innovation yet, including new AI Agents that integrate with the leading models and software platforms to accelerate decision-making, automate workflows, and boost productivity.”

Cloudera announced an extension of its Data Visualization capability to on-premises environments. Features include:

  • Out-of-the-Box Imaging: Drag-and-drop functions that facilitate graph and chart creation for use cases ranging from customer loyalty shifts to trading trends.
  • Built-in AI: Unlock visual and structured reports with natural language querying thanks to AI Visual.
  • Predictive Application Builder: Create applications pre-built with machine learning models served by Cloudera AI, as well as Amazon Bedrock, OpenAI and Microsoft Azure OpenAI.
  • Enterprise Security: use enterprise data from anywhere without moving, copying or creating security gaps as part of the Cloudera Shared Data Experience (SDX).
  • Robust Governance: Gain complete control over data used for visualization with advanced governance features.

Datadobi took part in the SNIA Cloud Object Storage Plugfest, sponsored by CST, at the end of April in Denver. It says: “we explored emerging challenges in cloud object storage around the AWS S3 protocol, where continuous server-side changes, evolving APIs and fragmented third-party implementations often lead to compatibility issues. Plugfest allowed us to brainstorm, plan, run compatibility tests, experiment, and work collectively on addressing these pain points.”

DataOps.live announced the launch of the Dynamic Suite, which includes two new Snowflake Native Apps designed to solve data engineering challenges faced by many Snowflake customers: continuous integration and deployment (CI/CD) of Snowflake Objects, and the operationalization of dbt projects. The Dynamic Suite of Snowflake Native Apps is available on the Snowflake Marketplace. They are the first two deliverables of the Dynamic Suite family, with additional ones to follow.

HighPoint Technologies is introducing a portable, near-petabyte, table-top NVMe storage product with 8 x Solidigm D5-P5336 122TB SSDs in a RocketStor 6542AW NVMe RAID Enclosure to deliver 976TB of storage capacity in a compact box. HighPoint says it’s designed to provide scalable, server-grade NVMe storage for a variety of data-intensive applications, including AI, media production, big data analytics, enterprise data backup, and HPC.

RocketStor 6542AW

MariaDB is acquiring Codership and its flagship product Galera Cluster, an open-source, high-availability database. This marks the next chapter in a long-standing relationship between MariaDB and Codership as, for over a decade, Codership’s technology has been integrated into MariaDB’s core platform. As such, this deal is expected to come with no disruption to service. It says that, with the Galera team now formally onboard, MariaDB will accelerate development of new features for high availability and scalability across the Enterprise Platform. Galera Cluster is becoming a more integral part of the platform, enabling a more seamless and robust customer experience going forward. Customers will gain access to deeper expertise and more responsive support for database needs directly from the team behind Galera.

At Taiwan’s Computex, Phison showed new hardware:

  • Pascari X200Z Enterprise SSD: engineered for endurance (up to 60 DWPD) and SCM-like responsiveness, the new X200Z is designed to provide enterprise-grade performance for AI, analytics, and database workloads. It’s Phison’s most advanced Gen5 SSD for write-intensive environments to date. 
  • aiDAPTIVGPT: a plug-and-play inference services suite for on-premises LLMs. It offers conversational AI, code generation, voice services, and more. It fills a critical market gap for SMBs, universities, and government agencies seeking localized AI performance without public cloud dependencies.
  • E28 SSD Controller: the E28 is Phison’s flagship PCIe Gen5 SSD controller, now with integrated AI compute acceleration for faster model updates and unmatched Gen5 performance. 

Researchers from Pinecone, University of Glasgow and University of Pisa recently published “Efficient Constant-Space Multi-Vector Retrieval,” introducing “ConstBERT” – an approach that reduces the storage requirements of multi-vector retrieval by ~50% through fixed-size document representations. We’re told that ConstBERT reduces memory and compute cost while retaining strong retrieval quality. For most practical applications, especially those involving large-scale candidate reranking, it offers no meaningful compromise in quality but substantial gains in efficiency. As AI applications increasingly rely on effective retrieval for accuracy (particularly RAG systems), this approach offers a promising direction for deploying efficient retrieval at scale.

The paper’s abstract says: “Multi-vector retrieval methods, exemplified by the ColBERT architecture, have shown substantial promise for retrieval by providing strong trade-offs in terms of retrieval latency and effectiveness. However, they come at a high cost in terms of storage since a (potentially compressed) vector needs to be stored for every token in the input collection. To overcome this issue, we propose encoding documents to a fixed number of vectors, which are no longer necessarily tied to the input tokens. Beyond reducing the storage costs, our approach has the advantage that document representations become a fixed size on disk, allowing for better OS paging management. Through experiments using the MSMARCO passage corpus and BEIR with the ColBERT-v2 architecture, a representative multi-vector ranking model architecture, we find that passages can be effectively encoded into a fixed number of vectors while retaining most of the original effectiveness.”
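
The practical effect of encoding every document to a fixed number of vectors can be illustrated with the usual late-interaction (MaxSim) scoring. The numpy sketch below is our own illustration under assumed dimensions, not the authors’ code: each document holds exactly k vectors, so storage per document is a constant k x d floats.

```python
# Illustrative late-interaction (MaxSim) scoring with fixed-size document representations.
# Not the ConstBERT implementation; dimensions and data are made up.
import numpy as np

def maxsim_score(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """query_vecs: (m, d) query token embeddings; doc_vecs: (k, d) fixed-count document vectors."""
    sims = query_vecs @ doc_vecs.T           # (m, k) similarities
    return float(sims.max(axis=1).sum())     # best-matching document vector per query token, summed

rng = np.random.default_rng(0)
d, m, k, n_docs = 128, 8, 32, 1000           # k is constant, so every document costs k * d floats on disk
query = rng.standard_normal((m, d))
corpus = rng.standard_normal((n_docs, k, d))

scores = np.array([maxsim_score(query, doc) for doc in corpus])
print(np.argsort(-scores)[:10])              # indices of the ten best-scoring documents for reranking
```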

Pure Storage is partnering with SK hynix “to deliver state-of-the-art QLC flash storage products that meet the high-capacity, energy efficient requirements for data-intensive hyperscaler environments.” SK hynix recently announced a 60TB-class PS1012 SSD and says it will develop 122TB and 244TB follow-on drives to meet AI storage demand. The 60TB and 122TB drives use SK hynix’s 238-layer 3D NAND while the 244TB one will use newer 321-layer 3D NAND.

Pure says it can deliver future DirectFlash Module products with SK hynix’s QLC NAND flash memory that will be purpose-built for demanding hyperscaler environments. Pure also has agreements with Micron and Kioxia for NAND chips.

Pure Storage blogs about its latest support for Nvidia, saying the NVIDIA AI Data Platform reference design has been implemented with FlashBlade//EXA and Portworx. It says “By leveraging accelerated compute through NVIDIA Blackwell, NVIDIA networking, retrieval-augmented generation (RAG) software, including NVIDIA NeMo Retriever microservices and the AI-Q NVIDIA Blueprint, and the metadata-optimized architecture of Pure Storage, organizations reduce time to insight from days to seconds while maintaining very high inference accuracy in production environments.”

Quantum announced a major refresh of its Professional Services portfolio, including:

  • Subscription-Based Value Packages – Combining Quantum’s training, updates, guidance, and more at up to 40 percent savings.
  • Deployment Services – Installation and configuration support from certified Quantum experts
  • On-Demand Services – Commitment-free support that includes health checks, workflow integration, system migration (including non-Quantum environments) to optimize hybrid and cloud storage setups.

Rubrik and Rackspace Technology have launched Rackspace Cyber Recovery Service, a new fully-managed service aimed at improving cyber resilience for businesses operating in the public cloud. By integrating Rubrik’s data protection and recovery technologies with Rackspace’s DevOps expertise, the service is intended to help organizations recover from ransomware attacks.

  • Accelerated and automated recovery: Enables restoration of cloud workloads using orchestrated, codified workflows and DevOps best practices. 
  • Enhanced cyber resilience: Combines immutable backups, zero-trust architecture, and AI-driven threat detection to ensure data recovery. 
  • Fully Managed end-to-end support: Offers professional services, continuous optimisation, and guidance for policy management, compliance, and infrastructure recovery.

Singularity Hub published an article about molecular plastic storage as an alternative to DNA storage. It said: “a team from the University of Texas at Austin took a page from the DNA storage playbook. The researchers developed synthetic molecules that act as ‘letters’ to store data inside custom molecules. Compared to DNA sequences, these molecular letters are read using their unique electrical signals with minimal additional hardware. This means they can be seamlessly integrated into existing electronic circuits in our computers.” Read the piece here.

Symply announced LTO-10 products with up to 30TB uncompressed and 75TB compressed per cartridge and native performance of up to 400MB/sec. CEO Keith Warburton said: “LTO-10 is a sea change in technology with a completely new design, including new head technology, and new media substrate.” The three new products are:

  • SymplyPRO Ethernet – Desktop and Rackmount standalone LTO drives with 10Gb Ethernet connectivity. 
  • SymplyPRO Thunderbolt – Desktop and Rackmount standalone LTO drives with Thunderbolt & SAS connectivity. 
  • SymplyPRO XTL 40 and 80 – Mid-range and Enterprise modular tape libraries featuring SAS, Fibre Channel, Ethernet, and Thunderbolt interfaces. 

The products will begin shipping in mid-June 2025 and are compatible with media and entertainment backup and archive software applications such as Archiware, Hedge, and YoYotta. They are initially available in standalone desktop and rackmount formats with Ethernet, SAS, and Thunderbolt interfaces. The SymplyPRO XTL 40- and 80-slot modular libraries will follow in Q4 2025, featuring interoperability with enterprise backup and archive applications.

Symply desktop and rackmount LTO-10 tape drives.

Pricing: SymplyPRO XTL 80 Library from $26,995, SymplyPRO SAS LTO-10 from $11,995, SymplyPRO Ethernet LTO-10 from $15,995, SymplyPRO Thunderbolt LTO-10 from $12,995, and SymplyPRO XTL 40 Library from $19,995.

Synology’s PAS7700 is an active-active NVMe all-flash storage product. Combining active-active dual-controllers with 48 NVMe SSD bays in a 4U chassis, PAS7700 scales to 1.65 PB of raw capacity with the addition of 7 expansion units. Synology says it uses an all NVMe array to deliver millisecond-grade low latency and up to 2 million IOPS and 30GB/s sequential throughput, and supports a range of file and block protocols including SMB, NFSv3/4 (including RDMA versions), iSCSI, Fibre Channel, NVMe-oF TCP, NVMe-FC, and NVMe-RoCE. It features redundant memory that is upgradable to 2,048 GB across both controllers (1,024 GB per node) and support for high-speed 100GbE networking. It  offers immutable snapshots, advanced replication and offsite tiering. 

Synology PAS7700.

Yugabyte is supporting DocumentDB, Microsoft’s document database-compatible Postgres extension. By bringing NoSQL workloads to PostgreSQL, Yugabyte claims it is both reducing database sprawl and providing developers with more flexibility to replace MongoDB workloads with YugabyteDB – this means they can avoid vendor lock-in issues and take advantage of the advanced vector search capabilities needed to build next-generation AI applications.

Hitachi Vantara primes partnership-driven AI push

Hitachi has re-evaluated its Hitachi Vantara subsidiary’s setup so that it can respond faster to the market’s adoption of AI and AI’s accelerating development, according to the subsidiary’s Chief Technology Officer.

In late 2023, Hitachi Vantara was reorganized, with a services unit spun off, a new direction set, and Sheila Rohra taking control as CEO.

It built a unified storage product data plane, a unified product control plane, and an integrated data-to-agent AI capability. VSP One, the unified data plane, was launched in April last year, with all-QLC flash and object storage products added late in 2024. The unified control plane, VSP 360, was announced a week ago. The soup-to-nuts AI capability is branded Hitachi iQ – not VSP One iQ nor Hitachi Vantara iQ – as it will be applied across the Hitachi group’s product portfolio.

Jason Hardy

Jason Hardy, Hitachi Vantara’s CTO for AI and a VP, presented Hitachi iQ at an event in Hitachi Vantara’s office.

VSP One

The data plane includes block, file and object protocols as well as mainframe storage. VSP One includes separate products for these, with:

  • VSP One Block appliance – all-flash running SVOS (Storage Virtualization Operating System)
  • VSP One SDS Block – all-flash
  • VSP One SDS Cloud – appliance, virtual machine or cloud offering running cloud-native SVOS
  • VSP One File (old HNAS) and VSP One SDS File
  • VSP One Object – the HCP (Hitachi Content Platform) product

A new VSP Object product is coming later this year, S3-based, developed in-house by Hitachi Vantara, and set to replace the existing HCP-based object storage product, which will be retired.

Hitachi Vantara is also unifying VSP One file and object with its own in-house development. This started a year ago. Up until now there has been no demand to unify block with file and object.

The data plane is hybrid, covering the on-premises world and will use the three main public clouds: AWS, Azure and GCP (Google Cloud Platform). The current public cloud support status is:

  • VSP One SDS Block – available on AWS and GCP with Azure coming
  • VSP One SDS Cloud – available on AWS
  • VSP One File – roadmap item for AWS, Azure and GCP
  • VSP One Object – roadmap item for AWS, Azure and GCP

VSP 360

The recently announced VSP 360 single control plane is an update or redevelopment of the existing Ops Center, and will play a key role in how AI facilities in the business are set up to use VSP One and how they are instantiated.

VSP 360 gives observability of and insight into VSP One’s carbon footprint. This is read-only now. The next generation of the product will enable user action. A user could, for example, choose a more sustainable option if VSP 360 reveals that the footprint is getting high and alternatives are available. This will be driven by agentic AI capabilities, implying that more than one agent will be involved and the interaction between the agents cannot be pre-programmed.

The VSP One integrated and hybrid storage offerings, managed, observed and provisioned through VSP 360, form the underlying data layer used by Hitachi iQ.

Hitachi iQ

Hitachi Vantara says it has been working with Nvidia on several projects, including engagements with Hitachi group businesses such as its Rail unit, where an HMAX digital asset management system, using Nvidia GPUs in its IGX industrial AI platform, has – we’re told – enabled a 15 percent lowering of maintenance costs and a 20 percent reduction in train delays. Hitachi Vantara also has an Nvidia BasePOD certification. A blog by Hardy provides some background here.

 

Hitachi Vantara featured in Jensen Huang’s agentic AI pitch at Computex as an AI infrastructure player along with Cisco, Dell, HPE, Lenovo, and NetApp

Hitachi Vantara says its iQ product set is developing so fast that the marketing aspect – ie, telling the world about it – has been a tad neglected.

Hardy told B&F that Hitachi iQ is built on three pillars: Foundation, Enrichment and Innovation. The Foundation pillar has requirements aligned with Nvidia and Cisco and is an end-to-end offering, equivalent to an AI Factory, for rapid deployment. Enrichment refers to additional functionality, advisory services, and varied consumption models. A Hammerspace partnership extends the data management capabilities, a WEKA deal provides high-performance parallel file capabilities, and the data lake side is helped by a Zetaris collaboration. Innovation refers to vertical market initiatives, such as Hitachi iQ AI Solutions use case documentation, and projects with partners and Nvidia customers.

A Hitachi iQ Studio offering is presented as a complete AI solution stack spanning infrastructure to applications and running across on-prem, cloud, and edge locations. It comprises an iQ Studio Agent and Studio Agent runtime with links to Nvidia’s NIM and NeMo Retriever microservices, extracting and embedding data from VSP One Object and File and storing the vectors in a Milvus vector database.

There is a Time Machine feature in iQ which is, we understand, unique to Hitachi V and developed in-house. It enables the set of vectors used by a running AI training or inference job to be modified during the job’s execution, without stopping the job.

As we understand it, incoming data is detected by iQ and embedding models run to vectorize it, with the vectors stored in a Milvus database. The embedding is done in such a way as to preserve, in metadata vectors, the structure of the incoming data. For example, an arriving document file has content and file-level metadata: author, size, date and time created, name, and so on. The content is vectorized, as is the metadata, so the document’s structure is represented in the Milvus database alongside its content.

Hitachi iQ Studio components graphic

This means that if a set of vectors that includes the document’s content vectors becomes invalid during an AI inference or training run – for example because the document breaks a privacy rule – those content vectors can be identified and removed from the run’s vector set in a rollback-type procedure. That’s why this feature is called a Time Machine – note the time metadata entries in the iQ Studio graphic above.
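
As a rough illustration of how such a roll-back could look against a Milvus collection, the sketch below tags content and metadata vectors with a document ID and later deletes everything belonging to that document by filter. This is our own pymilvus example under assumed field names, not Hitachi’s Time Machine implementation.

```python
# Illustrative only: tagging vectors with a document ID so they can be pulled from a
# Milvus collection mid-run. Collection and field names are hypothetical; this is not
# Hitachi iQ's Time Machine code.
from pymilvus import MilvusClient

client = MilvusClient("http://localhost:19530")
client.create_collection(collection_name="iq_vectors", dimension=768)

rows = [
    {"id": 1, "vector": [0.0] * 768, "doc_id": "contract-42", "kind": "content"},
    {"id": 2, "vector": [0.0] * 768, "doc_id": "contract-42", "kind": "metadata"},
]
client.insert(collection_name="iq_vectors", data=rows)

# If contract-42 later breaks a privacy rule, remove all of its vectors by filter
# so the running job's vector set no longer contains them.
client.delete(collection_name="iq_vectors", filter='doc_id == "contract-42"')
```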

What we’re taking away from this is that the Hitachi iQ product set is moving the company into AI storage and agentic work, hooking up with Nvidia, and joining players such as Cisco, Dell, HPE, Lenovo, NetApp and Pure Storage. It’s done this with a combination of partnering and in-house development, leaving behind its previous (Hitachi Data Systems) acquisition mindset – remember the Archivas, BlueArc, Cofio, ParaScale, Pentaho and Shoden Data Systems purchases?

Hitachi V is in a hurry and AI is its perceived route to growth, increased relevance and front rank storage system supplier status. It is hoping the iQ product range will give it that boost.

Salesforce snaffles Informatica to feed Agentforce with data

SaaS giant Salesforce is entering into an agreement to buy data integrator and manager Informatica in a bid to strengthen its Agentforce’s data credentials, paying $25/share (totalling around $8 billion). Informatica’s current stock price is $23.86.

Salesforce says buying Informatica will give it a stronger trusted data foundation needed for deploying “powerful and responsible agentic AI.” It wants its Agentforce agents to work at scale across a customer’s Salesforce data estate. Salesforce has a unified data model and will use Informatica tech to provide a data catalog, integration and data lineage, plus data quality controls, policy management and master data management (MDM) features.

Marc Benioff.

Marc Benioff, Chair and CEO of Salesforce, said: “Together, Salesforce and Informatica will create the most complete, agent-ready data platform in the industry. By uniting the power of Data Cloud, MuleSoft, and Tableau with Informatica’s industry-leading, advanced data management capabilities, we will enable autonomous agents to deliver smarter, safer, and more scalable outcomes for every company, and significantly strengthen our position in the $150 billion-plus enterprise data market.”

Salesforce claims that buying Informatica will:

  • Strengthen Data Cloud’s leadership as a Customer Data Platform (CDP), ensuring data from across the organization is not just unified but clear, trusted, and actionable.
  • Provide a foundation for Agentforce’s autonomous AI agents to interpret and act on complex enterprise data, building a system of intelligence.
  • Enhance Salesforce CRM applications, giving customer teams the confidence to deliver more personalized customer experiences.
  • Ensure data flowing through MuleSoft APIs is connected and standardized, using Informatica’s data quality, integration, cataloging, and governance.
  • Give Tableau users richer, context-driven insights through access to a more accessible and better-understood data landscape.

Salesforce CTO Steve Fisher said: “Truly autonomous, trustworthy AI agents need the most comprehensive understanding of their data. The combination of Informatica’s advanced catalog and metadata capabilities with our Agentforce platform delivers exactly this.”

“Imagine an AI agent that goes beyond simply seeing data points to understanding their full context — origin, transformation, quality, and governance. This clarity, from a unified Salesforce and Informatica solution, will allow all types of businesses to automate more complex processes and make more reliable AI-driven decisions.”

Amit Walia.

Informatica CEO Amit Walia said: “Joining forces with Salesforce represents a significant leap forward in our journey to bring data and AI to life by empowering businesses with the transformative power of their most critical asset — their data. We have a shared vision for how we can help organizations harness the full value of their data in the AI era.”

Background

Informatica was founded by Gaurav Dhillon and Diaz Nesamoney in 1993. It went public on Nasdaq six years later and was subsequently acquired in 2015 by private equity house Permira and the Canada Pension Plan Investment Board for $5.3 billion. Microsoft and Salesforce Ventures participated in this deal, meaning Salesforce has had a close acquaintance with Informatica since then. Amit Walia was promoted to CEO in January 2020 from his role as President, Products and Marketing. Informatica then returned to public ownership with an NYSE listing in 2021.

Informatica stock price history chart.

Eric Brown, Informatica’s CFO, quit in early 2023 for other opportunities, at a time when the company laid off 7 percent of its headcount, around 450 employees. It bought data management tool supplier Privitar in mid-2023. Five months later, deciding to focus on its Intelligent Data Management Cloud (IDMC) business, it announced a restructuring plan, laying off 10 percent of its employees and cutting down on its real estate occupancy. A revenue history and profits chart shows the 2023 growth tailing off somewhat in 2024, with GAAP profits declining as well.

IDMC is a database/lake and cloud-agnostic product supporting AWS, Azure, Databricks, Google Cloud, MongoDB, Oracle Cloud, Salesforce, Snowflake, and (Salesforce-owned) Tableau. Using its CLAIRE AI engine, it can ingest and aggregate structured and unstructured data from many different sources, with more than 70 source data connectors. Ingested data can be cleansed, classified, governed and validated. IDMC provides metadata management and a master data management service.

Informatica integrated its AI-powered IDMC into Databricks’ Data Intelligence Platform in mid-2024. It announced a strengthened partnership with Databricks in January this year. At the end of the first 2025 quarter Informatica was a $1.7 billion annual recurring revenue business with more than 2,400 cloud subscription customers and 119 trillion cloud transactions per month, up from 200 billion/month in 2015.

Salesforce was rumoured to want to buy Informatica for less than its $11 billion market capitalisation in April last year and now looks set to take over the business for $8 billion.

The transaction has been approved by the boards of directors of both companies and is expected to complete early in Salesforce’s fiscal year 2027, subject to the customary closing conditions. It will be funded through a combination of cash on Salesforce’s balance sheet and new debt.

When the deal completes, Salesforce will integrate Informatica’s technology stack — including data integration, quality, governance, and unified metadata for Agentforce, and a single data pipeline with MDM on Data Cloud — embedding this “system of understanding” into the Salesforce ecosystem.

Comment

Will other large suppliers with agentic AI strategies and a need for high-quality data aggregated from multifarious sources now be looking at getting their own data ingestion and management capabilities through acquisition? If so, that would put Arcitecta, Datadobi, Data Dynamics, Diskover, Hammerspace and Komprise on a prospective purchase list.

PEAK:AIO uses CXL memory to rescue HBM-limited AI models

PEAK:AIO claims it is solving GPU memory limitations for AI inference models with CXL memory instead of offloading KVCache contents to NVMe flash drives.

The UK-based AI and GPU data infrastructure specialist says AI workloads are evolving “beyond static prompts into dynamic context streams, model creation pipelines, and long-running agents,” and the workloads are getting larger, stressing the limited high-bandwidth memory (HBM) capacity of GPUs and making the AI jobs memory-bound.

This causes a job’s working memory contents, its KVCache, to overflow HBM capacity, meaning tokens get evicted and have to be recomputed when needed again, lengthening job run-time. Various suppliers have tried to augment HBM capacity by having, in effect, an HBM memory partition on external flash storage, similar to a virtual memory swap space, including VAST Data with VUA, WEKA with its Augmented Memory Grid, and Pliops with its XDP LightningAI PCIe-add-in card front-ending NVMe SSDs.

PEAK:AIO is developing a 1RU token memory product using CXL memory, PCIe Gen 5, NVMe, and GPUDirect with RDMA.

Eyal Lemberger.

Eyal Lemberger, Chief AI Strategist and Co-Founder of PEAK:AIO, said in a statement: “Whether you are deploying agents that think across sessions or scaling toward million-token context windows, where memory demands can exceed 500GB per model, this appliance makes it possible by treating token history as memory, not storage. It is time for memory to scale like compute has.”

PEAK:AIO says its appliance enables:

  • KVCache reuse across sessions, models, and nodes
  • Context-window expansion for longer LLM history
  • GPU memory offload via CXL tiering
  • Ultra-low latency access using RDMA over NVMe-oF

It claims that by harnessing CXL memory-class performance it delivers token memory that behaves like RAM, not files. The other suppliers listed – Pliops, VAST, and WEKA – cannot do this. Mark Klarzynski, Co-Founder and Chief Strategy Officer at PEAK:AIO, said: “This is the token memory fabric modern AI has been waiting for.”

We’re told the tech gives AI workload developers the ability to build a system that can cache token history, attention maps, and streaming data at memory-class latency. PEAK:AIO says it “aligns directly with Nvidia’s KVCache reuse and memory reclaim models” and “provides plug-in support for teams building on TensorRT-LLM or Triton, accelerating inference with minimal integration effort.”

In theory PCIe gen 5 CXL controller latency can be around 200 nanoseconds while GPUDirect-accessed NVMe SSD access latency can be around 1.2ms (1,200,000ns); 6,000 times longer than a CXL memory access. Peak’s token memory appliance can provide up to 150 GB/sec sustained throughput at <5 microsecond latency.

Lemberger claimed: “While others are bending file systems to act like memory, we built infrastructure that behaves like memory, because that is what modern AI needs. At scale, it is not about saving files; it is about keeping every token accessible in microseconds. That is a memory problem, and we solved it [by] embracing the latest silicon layer.”

The PEAK:AIO token memory appliance is software-defined, using off-the-shelf servers, and is expected to enter production by the third quarter.

The elephant in the storage supplier room: How to deal with AI

Comment: The big issue for storage suppliers is AI – how to store and make data accessible to AI agents and models. Here’s a look at how they are responding to this.

Using AI in storage management is virtually a no-brainer. It makes storage admins more effective and is becoming essential for cybersecurity. The key challenge is storing AI data so that it’s quickly accessible to models and upcoming agents through an AI data pipeline. Does a supplier of storage hardware or software make special arrangements for this or rely on standard block, file, and object access protocols running across Fibre Channel, Ethernet, and NVMe, with intermediate AI pipeline software selecting and sucking up data from their stores using these protocols?

There are degrees of special arrangements for base storage hardware and software suppliers, starting with the adoption of Nvidia GPUDirect support to send raw data to GPUs faster. This was originally limited to files but is now being extended to objects with S3 over RDMA. There is no equivalent to GPUDirect for other GPU or AI accelerator hardware suppliers. At each stage in the pipeline the raw data is progressively transformed into the final data set and format usable by the AI models, which means vector embeddings for the unstructured file and object data.

The data is still stored on disk or SSD drive hardware, but the software managing that can change from storage array controller to database or data lake and to a vector store, either independent or part of a data warehouse, data lake or lakehouse. All this can take place in a public cloud, such as AWS, Azure or GCP, in which case storage suppliers may not be involved. Let’s assume that we’re looking at the on-premises world or at the public cloud with a storage supplier’s software used there and not the native public cloud storage facilities. The data source may be a standard storage supplier’s repository or it may be some kind of streaming data source such as a log-generating system. The collected data lands on a storage supplier’s system or a database, data lake or data lakehouse. And then it gets manipulated and transformed.

Before a generative AI large language model (LLM) can use unstructured data – file, object or log – it has to be identified, located, selected, and vectorized. The vectors then need to be stored, which can be in a specialized vector database, such as Milvus, Pinecone or Qdrant, or back in the database/lake/lakehouse. All this is in the middle and upper part of the AI pipeline, which takes in the collected raw data, pre-processes it, and delivers it to LLMs.
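
The vectorization step at this point in the pipeline is conceptually simple. Below is a minimal sketch using sentence-transformers and a brute-force in-memory index purely for illustration; a production pipeline would write the vectors to Milvus, Pinecone, Qdrant, or a lakehouse table instead, and the model choice here is an assumption, not a recommendation.

```python
# Minimal sketch of the "vectorize and store" step of an AI data pipeline.
# Model choice and the in-memory index are illustrative only.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

chunks = [
    "Q1 revenue grew 12 percent year-on-year.",
    "The backup policy retains immutable snapshots for 30 days.",
]
vectors = model.encode(chunks, normalize_embeddings=True)    # one embedding per chunk

def search(query: str, top_k: int = 1):
    q = model.encode([query], normalize_embeddings=True)
    sims = (vectors @ q.T).ravel()                           # cosine similarity via dot product
    return [(chunks[i], float(sims[i])) for i in np.argsort(-sims)[:top_k]]

print(search("how long are snapshots kept?"))
```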

A base storage supplier can say they store raw data and ship it out using standard protocols – that’s it. This is Qumulo’s stance: no GPUDirect support, and AI – via its NeuralCache – used solely to enhance its own internal operations. (But Qumulo does say it can add GPUDirect support quickly if needed.) Virtually all enterprise-focused raw storage suppliers do support GPUDirect and then have varying amounts of AI pipeline support. VAST Data goes the whole hog and has produced its own AI pipeline with vector support in its database, real-time data ingest feeds to AI models, event handling, and AI agent building and deployment facilities. This is diametrically opposite to Qumulo’s stance. The other storage system suppliers are positioned at different places on the spectrum between the Qumulo and VAST Data extremes.

GPUDirect for files and objects is supported by Cloudian, Dell, DDN, Hammerspace, Hitachi Vantara, HPE, IBM, MinIO, NetApp, Pure Storage, Scality, and VAST. The support is not necessarily uniform across all the file and object storage product lines of a multi-product supplier, such as Dell or HPE.

A step up from GPUDirect support is certification for Nvidia’s BasePOD and SuperPOD GPU server systems. Suppliers such as Dell, DDN, Hitachi Vantara, HPE, Huawei, IBM, NetApp, Pure Storage, and VAST have such certifications. Smaller suppliers such as Infinidat, Nexsan, StorONE, and others currently do not hold such certifications.

A step up from that is integration with Nvidia Enterprise AI software with its NIM and NeMo retriever microservices, Llama Nemotron model, and NIXL routines. Dell, DDN, Hitachi Vantara, HPE, NetApp, Pure, and VAST do this.

Another step up from this is to provide a whole data prep and transformation, AI model support, agent development and agentic environment, such as what VAST is doing with its AI OS, with Dell, Hitachi Vantara and HPE positioned to make progress in that direction via partners, with their AI factory developments. No other suppliers appear able to do this as they are missing key components of AI stack infrastructure, which VAST has built and which Dell, Hitachi Vantara and HPE could conceivably develop, at least in part. From a storage industry standpoint, VAST is an outlier in this regard. Whether it will remain alone or eventually attract followers is yet to be answered.

This is all very Nvidia-centric. The three main public clouds have their own accelerators and will ensure fast data access by these to their own storage instances, such as Amazon’s S3 Express API. They all have Nvidia GPUs and know about GPUDirect and should surely be looking to replicate its data access efficiency for their own accelerators.

Moving to a different GPU accommodation tactic might mean looking at KV cache. When an AI model is being executed in a GPU, it stores its tokens and vectors as keys and values in the GPU’s high-bandwidth memory (HBM). This key-value cache is limited in capacity. When it is full and fresh tokens and vectors are being processed, old ones are over-written and, if needed, have to be recomputed, lengthening the model’s response time. Storing evicted KV cache contents in direct-attached storage on the GPU server (tier 0), or in networked, RDMA-accessible external storage (tier 1), means they can be retrieved when needed, shortening the model’s run time.
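
The mechanics can be sketched in a few lines: keep a bounded in-HBM cache, spill the oldest entries to an offload tier instead of dropping them, and pull them back on a miss rather than recomputing them. This is a toy illustration of the idea, not any vendor’s implementation.

```python
# Toy illustration of KV cache offload: evicted entries go to a slower tier (tier 0/1)
# instead of being dropped and recomputed. Not any vendor's implementation.
from collections import OrderedDict

class OffloadedKVCache:
    def __init__(self, hbm_capacity: int):
        self.hbm = OrderedDict()   # fast tier, standing in for limited GPU HBM
        self.offload = {}          # slow tier, standing in for local NVMe or RDMA-attached storage
        self.capacity = hbm_capacity

    def put(self, token_id, kv):
        self.hbm[token_id] = kv
        self.hbm.move_to_end(token_id)
        if len(self.hbm) > self.capacity:
            old_id, old_kv = self.hbm.popitem(last=False)  # evict the oldest entry...
            self.offload[old_id] = old_kv                  # ...but keep it instead of discarding it

    def get(self, token_id):
        if token_id in self.hbm:
            self.hbm.move_to_end(token_id)
            return self.hbm[token_id]
        if token_id in self.offload:                       # slower fetch, still cheaper than
            kv = self.offload.pop(token_id)                # recomputing the keys and values
            self.put(token_id, kv)
            return kv
        return None                                        # truly absent: would have to be recomputed

cache = OffloadedKVCache(hbm_capacity=2)
for t in range(4):
    cache.put(t, f"kv-{t}")
print(cache.get(0))   # served from the offload tier rather than recomputed
```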

Such offloading of the Nvidia GPU server’s KV cache is supported by Hammerspace, VAST Data, and WEKA, three parallel file system service suppliers. This seems to be a technique that could be supported by all the other GPUDirect-supporting suppliers. Again, it is Nvidia-specific and this reinforces Nvidia’s position as the overwhelmingly dominant AI model processing hardware and system software supplier.

The cloud file services suppliers – CTERA, Egnyte, Nasuni, and Panzura – all face the need to support AI inference with their data and that means feeding it to edge or central GPU-capable systems with AI data pipelines. Will they support GPUDirect? Will Nvidia develop edge enterprise AI inference software frameworks for them?

The data management and orchestration suppliers such as Arcitecta, Datadobi, Data Dynamics, Diskover, Hammerspace, and Komprise are all getting involved in AI data pipeline work, as selecting, filtering, and moving data is a core competency for them. We haven’t yet seen them partnering with or getting certified by Nvidia as stored data sources for its GPUs. Apart from Hammerspace, they appear to be a sideshow from Nvidia’s point of view, like the cloud file services suppliers.

Returning to the mainstream storage suppliers, all of the accommodations noted above apply to data stored in the supplier’s own storage, but there is also backup data, with access controlled by the backup supplier, and archival data, with access controlled by its supplier. We have written previously about there being three separate AI data pipelines and the logical need for a single pipeline, with backup suppliers positioned well to supply it.

We don’t think storage system suppliers can do much about this. There are numerous backup suppliers and they won’t willingly grant API access to their customers’ data in their backup stores.

If we imagine a large distributed organization, with multiple storage system suppliers, some public cloud storage, some cloud file services systems, some data protection suppliers, an archival vault, and some data management systems as well, then developing a strategy to make all its stored information available to AI models and agents will be exceedingly difficult. We could see such organizations slim down their storage supplier roster to escape from such a trap.

Lenovo boasts near record revenues but profits slump

Lenovo reported its second-highest-ever revenues for the fourth fiscal 2025 quarter as unexpected tariff rises affected profitability.

Revenues in the quarter, ended March 31, were up 23 percent Y/Y to $17 billion, with GAAP net income of $90 million, down 64 percent from the year-ago $248 million.

Full fy2025 revenues were 21 percent higher at $69.1 billion, its second-highest annual total ever, with GAAP net income of $1.4 billion, up 36 percent and representing just 2 percent of revenues. Dell earned $95.6 billion in revenues in its fiscal 2025, with 4.8 percent of that as profit: $4.6 billion. Lenovo could be viewed as similar to Dell but without its strong storage hardware and software revenues. The Infrastructure Solutions Group (ISG) in Lenovo experienced hyper-growth, with revenue up 63 percent Y/Y to a record $14.5 billion.

Yuanqing Yang

Yuanqing Yang, Lenovo’s Chairman and CEO, stated: “This has been one of our best years yet, even in the face of significant macroeconomic uncertainty. We achieved strong top-line growth with all our business groups and sales geographies growing by double digits, and our bottom-line increased even faster. Our strategy to focus on hybrid AI has driven meaningful progress in both personal and enterprise AI, laying a strong foundation for leadership in this AI era.”

He thought it was “particularly remarkable that we achieved such results amid a volatile and challenging geopolitical landscape and the industry environment.”

Yang finished his results presentation with a particularly Chinese sentiment: “No matter what the future may hold, remember, while the tides answer to forces beyond us, how we sail the ship is always our decision.” Very true. If he were playing bridge, he would prefer to make no-trump bids.

Lenovo experienced an unexpected and strong impact from tariffs and that, we understand, contributed to its profits decline in the quarter. Yang said tariff uncertainty was a bigger worry than tariffs themselves, as Lenovo, with its distributed manufacturing base, can adjust production once it knows what tariffs are and has a stable period in which to make changes. But President Trump’s tariff announcements are anything but stable.

Lenovo has three business units, Intelligent Devices Group (IDG) focusing on PCs, Infrastructure Solutions Group (ISG) meaning servers and storage, and SSG, the Solutions and Services Group.

It said IDG, Lenovo’s largest business unit by far, “enlarged its PC market leadership” in the quarter, gaining a percentage point of market share over second-placed Dell. IDG quarterly revenues grew 13 percent. ISG “achieved profitability for the 2nd consecutive quarter, with revenue hypergrowth of more than 60 percent year-on-year.” SSG delivered 18 percent revenue growth year-on-year.

Yang said that, with ISG: “We have successfully built our cloud service provider or CSP business into a scale of USD 10 billion and self sustained profitability. Meanwhile, our traditional Enterprise SMB business also gained strong momentum with 20 per cent year-on-year growth, driving the revenue to a record high.”

The Infinidat business, when that acquisition completes, will enable Lenovo to enter the mid-to-upper enterprise storage market. CFO Winston Cheng indirectly referred to this in a comment: “Going forward, ISG will continue to focus on increasing volume and profitability for its E/SMB business through its streamlined portfolio, enhanced channel capabilities and high-value 3S offerings across storage, software and services.”

New ISG head Ashley Gorakhpurwalla answered a question in the earnings call about ISG’s Enterprise/SMB business: “A very new and refreshed set of compute and storage products from Lenovo offer all of our enterprise and small, medium customers a very rapid return on investment during their technology refresh cycles. So we believe firmly that we are on track for continued momentum in this space and improved profitability in the enterprise infrastructure.”

Lenovo said all its main businesses saw double-digit revenue growth in fy2025. It made significant progress in personal and enterprise AI, noting: “The AI server business … achieved hypergrowth thanks to the rising demand for AI infrastructure, with Lenovo’s industry-leading Neptune liquid cooling solutions as a key force behind this rapid growth.” 

The company referred to hybrid AI fueling its performance, meaning AI that integrates personal, enterprise and public data. 

It is increasing its R&D budget, up 13 percent Y/Y to $2.3 billion, and said it made an AI super agent breakthrough in the quarter, referring to its Tianxi personal agent, which it claims can handle intricate commands across various platforms and operating systems.

Nexsan launches Unity NV4000 for small and remote deployments

The new Unity NV4000 is a flexible, multi-port, straightforward storage array from Nexsan.

The company has a focus on three storage product lines: E-Series block storage for direct-attached or SAN use, Assureon immutable storage to keep data compliant and safe in the face of malware, and the Unity NV series of unified file, block, and object systems. There are three Unity systems, with the NV10000 topping the range as an all-flash, 24-drive system. It is followed by the mid-range 60-disk drive NV6000 with a massive 5.18 PB capacity maximum, and now we have the NV4000 for branch offices and small/medium businesses.

Vincent Phillips, Nexsan
Vincent Phillips

CEO Vincent Phillips tells us: “Unity is versatile storage with unmatched value. We support multiple protocols, we support the flexibility to configure flash versus spinning drive in the mix, and that works for whatever the enterprise needs. Every system has dual controllers and redundancy for high availability and non-destructive upgrades. There’s no upcharges for individual features and unlike the cloud, the Unity is cost-effective for the life of the product and even as you grow.” 

Features include immutable snapshots and S3 object-locking. The latest v7.2 OS release adds a “cloud connector, which allows bi-directional syncing of files and directories with AWS S3, with Azure, Unity Object Store and Google Cloud coming soon.”

There are non-disruptive upgrades and an Assureon connector. With the latest v7.2 software, admins can now view file access by user via the SMB protocol as well as NFS.

Phillips said: “Every system has dual controllers and redundancy for high availability and non-destructive upgrades, but we also let customers choose between hardware and encryption on the drive with SED or they can utilize more cost-effective software-based encryption.”

All NV series models have FASTier technology to boost data access by caching. It uses a modest amount of solid-state storage to boost the performance of underlying hard disk drives by up to 10x, resulting in improved IOPS and throughput while maintaining cost-effectiveness and high capacity. FASTier supports both block (e.g. iSCSI, Fibre Channel) and file (e.g. NFS, SMB, FTP) protocols, allowing unified storage systems to handle diverse workloads, such as random I/O workloads in virtualized environments, efficiently.
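
The “up to 10x” figure is easiest to see as simple cache arithmetic: the effective latency of a hybrid system is the hit-rate-weighted average of SSD and HDD latency. A back-of-the-envelope sketch follows; the latency figures are generic assumptions, not Nexsan FASTier measurements.

```python
# Back-of-the-envelope effective latency for an SSD cache in front of HDDs.
# Latency figures are generic assumptions, not Nexsan FASTier measurements.
HDD_LATENCY_MS = 8.0    # typical random-read seek plus rotation
SSD_LATENCY_MS = 0.1    # typical flash read

def effective_latency_ms(hit_rate: float) -> float:
    """Average latency when hit_rate of reads are served from the SSD cache."""
    return hit_rate * SSD_LATENCY_MS + (1.0 - hit_rate) * HDD_LATENCY_MS

for hit_rate in (0.0, 0.5, 0.9, 0.99):
    speedup = HDD_LATENCY_MS / effective_latency_ms(hit_rate)
    print(f"hit rate {hit_rate:4.0%}: {effective_latency_ms(hit_rate):5.2f} ms ({speedup:4.1f}x faster)")
```

At a 90 percent hit rate this arithmetic gives roughly a 9x improvement in average latency, which is the ballpark such caching claims tend to sit in.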

The NV10000 flagship model “is employed by larger enterprises. It supports all flash or hybrid with performance up to 2 million IOPS, and you can expand with our 78-bay JBODs and scale up to 20 petabytes. One large cancer research facility that we have is installed at 38 petabytes and growing. They’re working on adding another 19 petabytes.”

The NV6000 was “introduced in January, and it’s been a mid-market workhorse. It can deliver one and a half million IOPS and it can scale up to five petabytes. We’ve seen the NV 6000 manage all kinds of customer workloads, but one that we consistently see is it utilized as a backup target for Veeam, Commvault and other backup software vendors.”

Nexsan Unity
Unity NV10000 (top), NV6000 (middle), and NV4000 (bottom)

Phillips said: “The product family needed a system that met the requirements at the edge or for smaller enterprises. That’s where the NV4000 that we’re talking about today comes in. It has all the enterprise features of the prior two models, but in a cost-effective model for the small organization or application or for deployment at the edge of an enterprise. The NV4000 is enterprise class, but priced and sized for the small or medium enterprise. It manages flexible workloads, backups, and S3 connections for hybrid solutions, all in one affordable box. The NV4000 can be configured in all flash or hybrid and it can deliver up to 1 million IOPS and it has the same connectivity flexibility as its bigger sisters.”

Nexsan Unity NV4000 specs

The NV4000 comes in a 4RU chassis with front-mounted 24 x 3.5-inch drive bays. The system has dual Intel Xeon Silver 4309Y CPUs (2.8 GHz, 8 cores, 16 threads) and 128 GB DDR4 RAM per controller. There is a 2 Gbps backplane and connectivity options include 10/25/40/100 GbE and 16/32 Gb Fibre Channel. The maximum flash capacity is 737.28 TB while the maximum capacity with spinning disk is 576 TB. The highest-capacity HDD available is 24 TB whereas Nexsan supplies SSDs with up to 30.72 TB.

That disparity is set to widen as disk capacities approach 30 TB and SSD makers supply 61 and 122 TB capacities. Typically these use QLC (4 bits/cell flash). Phillips said: “We don’t support the QLC today. We are working with one of the hardware vendors and we’ll begin testing those very shortly.”

Nexsan has a deduplication feature in beta testing and it’s also testing Western Digital’s coming HAMR disk drives.

There are around a thousand Nexsan channel partners and its customer count is in the 9,000-10,000 area. Partners and customers should welcome the NV4000 as a versatile edge, ROBO, and SMB storage workhorse.

Druva expands Azure coverage with SQL and Blob support

SaaS data protector Druva has expanded its coverage portfolio to include Azure SQL and Blob data stores, saying it’s unifying protection across Microsoft workloads with a single SaaS platform.

Druva already has a tight relationship with AWS, under which it offers managed storage and security on the AWS platform. It announced a stronger partnership with Microsoft in March, with a strategic relationship focused on deeper technical integration with Azure cloud services. At the time, Druva said this will protect and secure cloud and on-premises workloads with Azure as a storage target. It protects Microsoft 365, Entra ID, and Azure VMs, and now it has announced coverage for two more Azure services.

Stephen Manley, Druva
Stephen Manley

Stephen Manley, CTO at Druva, stated: “The need for cloud-native data protection continues to grow, and Druva’s support for Azure SQL and Azure Blob storage delivers customers the simplicity, security, and scalability they need to stay resilient in today’s threat landscape. By unifying protection across Microsoft workloads within a single SaaS platform, Druva continues to lead by delivering simplified, enterprise-grade cyber resilience with zero egress fees, zero management, and zero headaches.”

From a restore point of view, the zero egress fees sound good. To emphasize this, Druva says it “offers a unified cloud-native platform with cross-region and cross-cloud protection, without the added cost or complexity of egress fees.”

The agentless Azure SQL coverage includes SQL Database, SQL Managed Instance, and SQL Server on Azure VMs. The Blob protection features granular, blob-level recovery, policy-based automation, and built-in global deduplication. Druva says it delivers “secure, air-gapped backups that protect critical data against ransomware, accidental deletion, and insider threats.”

Druva is now listed on the Azure Marketplace and is expanding support for additional Azure regions in North America, EMEA, and APAC. Druva protection for SQL Server and Blob storage was listed in the marketplace at the time of writing, with Druva protection for Enterprise Workloads including Azure VMs, SQL, and Blob.

Druva graphic

It is stepping into a field of other suppliers protecting Azure SQL and Blob, including Acronis, Cohesity, Commvault, Rubrik, and Veeam.

Support for Azure SQL is generally available today, as is Azure Blob.

Komprise CTO on how to accelerate cloud migration

Interview: Data manager Komprise takes an analytics-first approach with its Smart Data Migration service, which involves scanning and indexing all unstructured data across environments, then categorizing it based on access frequency (hot, warm, or cold), file types, data growth patterns, departmental ownership, and sensitivity (e.g. PII or regulated content).

Using this metadata, enterprises can:

  • Place data correctly: Ensure active data resides in high-performance cloud tiers, while infrequently accessed files move to lower-cost archival storage.
  • Reduce scope and risk: By offloading cold data first or excluding redundant and obsolete files, the total migration footprint is much smaller.
  • Avoid disruption: Non-disruptive migrations ensure that users and applications can still access data during the transfer process.
  • Optimize for compliance: Proper classification helps ensure sensitive files are placed in secure, policy-compliant storage.

We wondered about the cut-off point between digitally transferring files and physically transporting storage devices, and asked Komprise field CTO Ben Henry some questions about Smart Data Migration.

Blocks & Files: A look at the Smart Data Migration concept suggests that Komprise’s approach is to reduce the amount of data migrated to the cloud by filtering the overall dataset.

Ben Henry: Yes, we call this Smart Data Migration. Many of our customers have a digital landfill of rarely used data sitting on expensive storage. We recommend that they first tier off the cold data before the migration; that way they are only migrating the 20-30 percent of hot data along with the dynamic links to the cold files. In this way, they are using the new storage platform as it is meant to be used: for hot data that needs fast access.

Ben Henry, Komprise

 
Blocks & Files: Suppose I have a 10 PB dataset and I use Komprise to shrink the amount actually sent to the cloud by 50 percent. How long will it take to move 5 PB of data to the cloud? 

Ben Henry: Komprise itself exploits the available parallelism at every level (volumes, shares, VMs, threads) and optimizes transfers to move data 27x faster than common migration tools. Having said this, the actual time taken to move data depends significantly on the topology of the customer environment. Network and security configurations can make a tremendous difference, as can where the data resides; if it is spread across different networks, that can impact transfer times. We can use all available bandwidth when we are executing the migration, if the customer chooses to do so.
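
As a loose sketch of the parallelism idea rather than Komprise's engine, the example below copies a batch of files concurrently with a thread pool; a real migration tool would also parallelize across volumes, shares, and VMs, and add retries and integrity checks.

```python
# Minimal illustration of parallel file transfer with a thread pool.
# Real migration engines also fan out across volumes, shares, and nodes.
import shutil
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def copy_one(src: Path, dst_root: Path) -> Path:
    dst = dst_root / src.name
    shutil.copy2(src, dst)  # copy2 preserves timestamps where the OS allows
    return dst


def migrate(files: list, dst_root: Path, workers: int = 16) -> None:
    dst_root.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(copy_one, f, dst_root): f for f in files}
        for fut in as_completed(futures):
            print("copied", futures[fut], "->", fut.result())
```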

 
Blocks & Files: Is there compute time involved at either end to verify the data has been sent correctly? 

Ben Henry: Yes. We do a checksum on the source and then on the destination and compare them to ensure that the data was moved correctly. We also provide a consolidated chain-of-custody report so that our customer has a log of all the data that was transferred, for compliance reasons. Unlike legacy approaches that delay all data validation to the cutover, Komprise validates incrementally through every iteration as data is copied, making cutovers seamless. We are able to provide a current estimate of the final iteration because Komprise does all the validation up front as data is copied, not at the end during time-sensitive cutover events.
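
To make the validation step concrete, here is a minimal sketch of per-file checksum comparison between source and destination. It is an illustration only; Komprise's pipeline adds incremental validation and chain-of-custody reporting, which this toy example omits.

```python
# Toy per-file integrity check: hash source and destination, then compare.
import hashlib
from pathlib import Path


def sha256(path: Path, chunk: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()


def verify(src: Path, dst: Path) -> bool:
    ok = sha256(src) == sha256(dst)
    print(f"{src} -> {dst}: {'OK' if ok else 'MISMATCH'}")
    return ok
```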

Blocks & Files: What does Komprise do to ensure that the data is moved as fast as possible? Is it compressed? Is it deduplicated? 

Ben Henry: Komprise has proprietary, optimized SMB and NFS clients that allow the solution to analyze and migrate data much faster. Komprise Hypertransfer optimizes cloud data migration performance by minimizing the WAN roundtrips using dedicated channels to send data, mitigating the SMB protocol issues.

Blocks & Files: At what capacity point is it better to ship physical disk drives or SSDs (124 TB ones are here now) to the cloud, as that would be quicker than transmitting the data across a network? Can Komprise help with this?

Ben Henry: The cost of high-capacity circuits is now a fraction of what it was a few years ago. Most customers have adequate networks set up for hybrid cloud environments to handle data transfer without needing to use physical drives and doing things offline. It’s common for enterprises to have 1, 10, or even 100 Gb circuits. 

Physical media gets lost or corrupted, and offline transfers may not be any quicker. For instance, sending 5 PB via Amazon Snowball could easily take 25 shipments, since one Snowball only holds 210 TB. That’s painful to configure and track versus “set and forget” networking. Sneakernet is, in many scenarios, a thing of the past. In fact, I was just talking with an offshore drilling customer who now uses satellite-based internet to transmit data from remote oil rigs that lack traditional network connectivity.

Blocks & Files: That’s a good example, but for land-based sites Seagate says its Lyve mobile offering can be used cost-effectively to physically transport data.

Ben Henry: Yes, we are not suggesting that you will never need offline transfer. It is just that the situations where this is needed have reduced significantly with the greater availability of high-speed and satellite internet. The need for offline transfers is now more niche, largely confined to high-security installations.

Blocks & Files: You mention that sending 5 PB by Amazon Snowball needs 25 shipments. I think sending 5 PB across a 100Gbit link could take around five days, assuming full uninterrupted link speed and, say, 10 percent network overhead.

Ben Henry: A steady, single stream of data sent over a 100Gbit link with 50 ms of average latency is limited to roughly 250 Mbps, so the transfer could take years. Komprise isn’t single stream; it’s distributed and multithreaded. So, instead of just using 250 Mbps of a 100Gbit link, we can utilize the entire circuit, bringing the job down to days.
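
For readers who want to check the arithmetic, the back-of-the-envelope calculation below compares a single ~250 Mbps stream with a fully used 100Gbit link, assuming decimal units and roughly 10 percent protocol overhead. It broadly matches both figures quoted above.

```python
# Back-of-the-envelope transfer times for 5 PB (decimal units assumed).
BITS = 5 * 1e15 * 8                 # 5 PB expressed in bits

def days(rate_bps: float) -> float:
    return BITS / rate_bps / 86400

single_stream = 250e6               # ~250 Mbps effective single TCP stream at 50 ms RTT
full_link = 100e9 * 0.9             # 100 Gbit/s minus ~10% overhead

print(f"single stream: {days(single_stream) / 365:.1f} years")   # roughly 5 years
print(f"full link:     {days(full_link):.1f} days")              # roughly 5 days
```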

Blocks & Files: With Snowball Edge Storage Optimized devices, you can create a single 16-node cluster with up to 2.6 PB of usable S3-compatible storage capacity. You would need just two 16-node clusters for 5 PB.

Ben Henry: Yes, there are still scenarios that need edge computing or have high-security constraints where network transfer is not preferred. For these scenarios, customers are willing to invest in additional infrastructure for edge storage and compute, such as the options you mention. Our point is simply that the market has shifted: we are not seeing great demand for offline transfers, largely because the bandwidth-driven need for them has fallen sharply with the availability of high-speed and satellite internet.

Storage news roundup – May 22

Arctera is collaborating with Red Hat and its InfoScale cyber-resilience product is certified on Red Hat OpenShift Virtualization.

Hitachi Vantara recently suffered a malware attack and we asked: “In the light of suffering its own malware attack, what would Hitachi Vantara say to customers about detecting, repelling and recovering from such attacks?”

The company replied: “Hitachi Vantara’s recent experience underscores the reality that no organization is immune from today’s sophisticated cyber threats, but it reinforces that how you detect, contain, and respond to such events is what matters most. At Hitachi Vantara, our focus has been on acting with integrity and urgency.

“We would emphasize three key lessons:

1. Containment Measures Must Be Quick and Decisive. The moment we detected suspicious activity on April 26, 2025, we immediately activated our incident response protocols and engaged leading third-party cybersecurity experts. We proactively took servers offline and restricted traffic to our data centers as a containment strategy.

2. Recovery Depends on Resilient Infrastructure. Our own technology played a key role in accelerating recovery. For example, we used immutable snapshot backups stored in an air-gapped data center to help restore core systems securely and efficiently. This approach helped reduce downtime and complexity during recovery.

3. Transparency and Continuous Communication Matter. Throughout the incident, we’ve prioritized open communication with customers, employees, and partners, while relying on the forensic analysis and our third-party cybersecurity experts to ensure decisions are based on verified data. As of April 27, we have no evidence of lateral movement beyond our environment.

“Ultimately, our experience reinforces the need for layered security, rigorous backup strategies, and well-practiced incident response plans. We continue to invest in and evolve our security posture, and we’re committed to sharing those insights to help other organizations strengthen theirs.”

HYCU has extended its support for Dell’s PowerProtect virtual backup target appliance to protect SaaS and cloud workloads with backup, disaster recovery, data retention, and offline recovery.

The addition of support for Dell PowerProtect Data Domain Virtual Edition by HYCU R-Cloud SaaS complements existing support for Dell PowerProtect Data Domain with R-Cloud Hybrid Cloud edition. HYCU says it is the first company to offer the ability to protect data across on-premises, cloud, and SaaS to the most efficient and secure storage in the market: PowerProtect. HYCU protects more than 90 SaaS apps, and says that what’s new here is that, of those 90-plus offerings, only a handful of backup suppliers offer customer-owned storage.

OWC announced the launch and pre-order availability of the Thunderbolt 5 Dock with up to 80Gb/s of bi-directional data speed and 120Gb/s for higher display bandwidth needs. You can connect up to three 8K displays or dual 6K displays on Macs. It works with Thunderbolt 5, 4, 3, USB4, and USB-C devices, and delivers up to 140W of power to charge notebooks. The dock has 11 versatile ports, including three Thunderbolt 5 (USB-C), two USB-A 10Gb/s, one USB-A 5 Gbps, 2.5GbE Ethernet (MDM ready), microSD and SD UHS-II slots, and 3.5mm audio combo. The price is $329.99 with Thunderbolt 5 cable and external power supply.

At Taiwan’s Computex, Phison announced the Pascari X200Z enterprise SSD, a PCIe Gen 5 drive with near-SCM latency and – get this – up to 60 DWPD endurance, designed for the high write-endurance demands of generative AI and real-time analytics. It also announced aiDAPTIVGPT, supporting generative tasks such as conversational AI, speech services, code generation, web search, and data analytics. Phison also launched aiDAPTIVCache AI150EJ, a GPU memory extension for AI edge and robotics systems that enhances edge inference performance by improving Time to First Token (TTFT) and increasing the number of tokens processed.

Phison’s E28 PCIe 5.0 SSD controller, built on TSMC’s 6nm process, is the first in the world to feature integrated AI processing, achieving up to 2,600K/3,000K random read/write IOPS, over 10 percent higher than comparable products. It has up to 15 percent lower power consumption than competing 6nm-based controllers.

The E31T DRAM-less PCIe 5.0 SSD controller is designed for ultra-thin laptops and handheld gaming devices with M.2 2230 and 2242 form factors. It delivers high performance, low power consumption, and space efficiency.

Phison also announced PCIe signal IC products:

  • The world’s first PCIe 5.0 Retimer certified for CXL 2.0
  • PCIe 5.0 Redriver with over 50% global market share
  • The industry’s first PCIe 6.0 Redriver
  • Upcoming PCIe 6.0 Retimer, Redriver, SerDes PHY, and PCIe-over-Optical platforms co-developed with customers

IBM-owned Red Hat has set up an open source llm-d project and community, with llm-d standing, we understand, for Large Language Model Development. It is focused on AI inference at scale and aims to make production generative AI as omnipresent as Linux. It features:

  • vLLM, the de facto standard open source inference server, providing support for emerging frontier models and a broad list of accelerators, including Google Cloud Tensor Processing Units (TPUs).
  • Prefill and Decode Disaggregation to separate the input context and token generation phases of AI into discrete operations, where they can then be distributed across multiple servers.
  • KV (key-value) Cache Offloading, based on LMCache, shifts the memory burden of the KV cache from GPU memory to more cost-efficient and abundant standard storage, like CPU memory or network storage.
  • Kubernetes-powered clusters and controllers for more efficient scheduling of compute and storage resources as workload demands fluctuate, while maintaining performance and low latency.
  • AI-Aware Network Routing for scheduling incoming requests to the servers and accelerators that are most likely to have hot caches of past inference calculations.
  • High-performance communication APIs for faster and more efficient data transfer between servers, with support for NVIDIA Inference Xfer Library (NIXL).
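
Since llm-d builds on vLLM as its inference engine, a minimal single-node vLLM example, using vLLM’s standard offline Python API rather than llm-d’s distributed components, gives a sense of the layer the features above extend; the model name is just a small placeholder.

```python
# Minimal single-node vLLM run; llm-d layers prefill/decode disaggregation,
# KV cache offload, and Kubernetes scheduling on top of this kind of engine.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")           # any Hugging Face model ID works
params = SamplingParams(temperature=0.7, max_tokens=64)

outputs = llm.generate(["What does KV cache offloading do?"], params)
for out in outputs:
    print(out.outputs[0].text)
```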

CoreWeave, Google Cloud, IBM Research and NVIDIA are founding contributors, with AMD, Cisco, Hugging Face, Intel, Lambda and Mistral AI as partners. The llm-d community includes founding supporters at the Sky Computing Lab at the University of California, originators of vLLM, and the LMCache Lab at the University of Chicago, originators of LMCache. Red Hat intends to make vLLM the definitive open standard for inference across the new hybrid cloud.

Red Hat has published a tech paper entitled “Accelerate model training on OpenShift AI with NVIDIA GPUDirect RDMA.” It says: “Starting with Red Hat OpenShift AI 2.19, you can leverage networking platforms such as Nvidia Spectrum-X with high-speed GPU interconnects to accelerate model training using GPUDirect RDMA over Ethernet or InfiniBand physical link. …this article demonstrates how to adapt the example from Fine-tune LLMs with Kubeflow Trainer on OpenShift AI so it runs on Red Hat OpenShift Container Platform with accelerated NVIDIA networking and gives you a sense of how it can improve performance dramatically.”

SK hynix says it has developed a UFS 4.1 product adopting the world’s highest-layer-count 321-layer 1Tb triple-level cell 3D NAND for mobile applications. It has a 7 percent improvement in power efficiency compared with the previous generation, based on 238-layer NAND, and a slimmer 0.85mm thickness, down from 1mm before, to fit into ultra-slim smartphones. It supports a data transfer speed of 4,300 MBps, the fastest sequential read for a fourth-generation UFS, while improving random read and write speeds by 15 percent and 40 percent, respectively. SK hynix plans to win customer qualification within the year and ship in volume from Q1 2026 in 512GB and 1TB capacities.

Snowflake’s latest quarterly revenues (Q1fy2026) showed 26 percent Y/Y growth to $1 billion. It’s still growing fast and its customer count is 11,578, up 19 percent. There was a loss of $429,952,000, compared to the previous quarter’s $325,724,000 loss – 32 percent worse. It’s expecting around 25 percent Y/Y revenue growth next quarter.

Data lakehouser Starburst announced a strategic investment from Citi without revealing the amount.

Starburst announced new Starburst AI Agent and new AI Workflows across its flagship offerings: Starburst Enterprise Platform and Starburst Galaxy. AI Agent is an out-of-the-box natural language interface for Starburst’s data platform that can be built and deployed by data analysts and application-layer AI agents.  AI Workflows connect the dots between vector-native search, metadata-driven context, and robust governance, all on an open data lakehouse architecture. With AI Workflows and the Starburst AI Agent, enterprises can build and scale AI applications faster, with reliable performance, lower cost, and greater confidence in security, compliance and control. AI Agent and AI Workflows are available in private preview.

Veeam’s Kasten for Kubernetes v8 release has new File Level Recovery (FLR) for KubeVirt VMs, allowing granular restores so organizations can recover individual files from backups without restoring entire VM clones. A new virtual machine dashboard offers a workload-centric view across cluster namespaces to simplify the process of identifying each VM’s Kubernetes-dependent resources and makes configuring backup consistency easier. KforK v8 supports x86, ARM, and IBM Power CPUs and is integrated with Veeam Vault. It broadens support for the NetApp Trident storage provisioner with backup capabilities for ONTAP NAS “Economy” volumes. A refreshed user interface simplifies onboarding, policy creation, and ongoing operations.

Veeam has released Kasten for Modern Virtualization, a tailored pricing option designed to align seamlessly with Red Hat OpenShift Virtualization Engine. Veeam Kasten for Kubernetes v8 and Kasten for Modern Virtualization are now available.

… 

Wasabi’s S3-compatible cloud storage service is using Kioxia CM7 Series and CD8 Series PCIe Gen 5 NVMe SSDs for its Hot Cloud Storage service.

Zettlab launched its flagship product, the Zettlab AI NAS, a high-performance personal cloud system combining advanced offline AI, enterprise-grade hardware, and a sleek, modern design. Now live on Kickstarter, it is pitched as a smarter, more secure way to store, search, and manage digital files, with complete privacy and an intuitive experience. Zettlab says it transforms traditional NAS into a fully AI-powered data management platform with local computing, privacy-first AI tools, and a clean, user-friendly operating system, available to early backers at a special launch price.

Zettlab AI NAS

It’s a full AI platform running locally on powerful hardware:

  •   Semantic file search, voice-to-text, media categorization, and OCR – all offline
  •   Built-in Creator Studio: plan shoots, auto-subtitle videos, organize files without lifting a finger
  •   ZettOS: an intuitive OS designed for everyday users with pro-level power
  •   Specs: Intel Core Ultra 5, up to 200TB, 10GbE, 96GB RAM expandable

AI and virtualization are two major headaches for CIOs. Can storage help solve them both?

It’s about evolution, not revolution, says Lenovo

CIOs have a storage problem, and the reason can seem pretty obvious.

AI is transforming the technology industry, and by implication, every other industry. AI relies on vast amounts of data, which means that storage has a massive part to play in every company’s ability to keep up. 

After all, according to Lenovo’s CIO Playbook report, data quality issues are the top inhibitor to AI projects meeting expectations.

There’s one problem with this answer: It only captures part of the picture. 

CIOs are also grappling with myriad other challenges. One of the biggest is the upheaval to their virtualization strategies caused by Broadcom’s acquisition of VMware at the close of 2023, and its subsequent licensing and pricing changes.

This has left CIOs contemplating three main options, says Stuart McRae, executive director and GM, data storage solutions at Lenovo. Number one is to adapt to the changes and stick with VMware, optimizing their systems as far as possible to ensure they harvest maximum value from those more expensive licenses. 

Another option is to look at alternative platforms to handle at least some of their virtualization workloads. Or they can simply jump the VMware ship entirely.

But options two and three will mean radically overhauling their infrastructure either to support new platforms or get the most from their legacy systems.

So, AI and virtualization are both forcing technology leaders to take a long hard look at their storage strategies. And, says McRae, these are not discrete challenges. Rather, they are intimately related.

This is because, as Lenovo’s CIO Playbook makes clear, tech leaders are not just looking to experiment with AI or start deploying the technology. The pressure is on to produce business outcomes, in areas such as customer experience, business growth, productivity and efficiency. At the same time, they are looking to make decision-making data-driven.

And this will mean their core legacy platforms, such as SAP, Oracle, and in-house applications, will come into play, McRae says. This is where that corporate data lives, after all.

“They still have those systems,” he says. “AI will become embedded in many of those systems, and they will want to use that data to support their efforts in their RAG models.”

Storage is a real-world problem

It is precisely these systems that are running on enterprise virtualization platforms, so to develop AI strategies that deliver real world business value, CIOs need to get their virtualization strategy in order too. That means storage infrastructure that can deliver for both AI and virtualization.

One thing that is clear, McRae says, is that enterprises’ AI and virtualization storage will overwhelmingly be on-prem or co-located. These are core systems with critical data, and companies need to have hands-on control over them. Lenovo’s research shows that less than a quarter of enterprises are taking a “mainly” public cloud approach to their infrastructure for AI workloads.

But McRae explains, “If you look at the storage that customers have acquired and deployed in the last five years, 80 percent of that is hard drive-based storage.”

“The challenge with AI, especially from their storage infrastructure, is a lot of their storage is old and it doesn’t have the performance and resiliency to support their AI investments on the compute GPU side.”

From a technology point of view, a shift to flash is the obvious solution. The performance advantages are straightforward when it comes to AI applications: AI relies on data, which in most enterprises will flow from established applications and systems, and having GPUs idling while waiting for that data is massively wasteful. NVIDIA’s top-end GPUs can consume roughly as much energy in a year as a domestic household.

But there are broader data management implications as well. “If I want to use more of my data than maybe I did in the past, in a traditional model where they may have hot, warm, and cold data, they may want to make that data all more performant,” says McRae.

This even extends to backup and archive, he says. “We see customers moving backup data to flash for faster recovery.”

Flash offers substantial power and footprint advantages as well. The highest-capacity HDD announced at the time of writing is around 36TB, while enterprise-class SSDs now exceed 100TB. More importantly, SSDs draw far less power than their moving-part cousins.

This becomes critical given the concerns about overall datacenter power consumption and cooling requirements, and the challenges many organizations will face simply finding space for their infrastructure.

McRae says a key focus for Lenovo is to enable unified storage, “where customers can unify their file, block and object data on one platform and make that performant.”

That has a direct benefit for AI applications, allowing enterprises to extract value from the entirety of their data. But it also has a broader management impact by removing further complexity. 

“They don’t have different kits running different storage solutions, and so that gives them all the advantages of a unified backup and recovery strategy,” he says.

But modern flash-based systems offer resiliency benefits as well. McRae says a contemporary 20TB hard drive can take five to seven days to rebuild in the event of a failure. A flash drive will take maybe 30 hours.

Securing the future

In a similar vein, as AI becomes more closely intertwined with the broader, virtualized enterprise landscape, security becomes critical.

As McRae points out, while niche storage platforms might have a role to play in hyperscalers’ datacenters where the underlying LLMs are developed and trained, this is less likely to be the case for AI-enriched enterprise computing.

“When you’re deploying AI in your enterprise, that is your enterprise data, and other applications are using that data. It requires enterprise resiliency and security.” 

With Lenovo’s range, AI has a critical role to play in managing storage itself. Along with features such as immutable copies and snapshots, for example, “Having storage that provides autonomous, AI-driven ransomware protection to detect any anomalies or that something bad’s happening is really important for that data.”

So, it certainly makes sense for technology leaders to modernize their storage infrastructure. The question remains: Just how much storage will they need?

This is where Lenovo’s managed services offerings and its TruScale strategy come into play. They allow storage and other infrastructure to be procured on an OpEx, consumption-based basis, and capacity to be unlocked and scaled up or down over time.

“Every business is different based on their own capital model and structure,” says McRae. “But the consumption models work well for uncertain application and deployment rollouts.”

After all, most customers are only just beginning to roll out new virtualization and AI workloads. “We typically learn stuff as we start deploying it,” says McRae. “And it may not act exactly like we had planned. That flexibility and ability to scale performance and capacity is really important.”

Equally important, he says, is being able to call on experts who understand both the technology and the specifics of individual businesses. So, while AI can appear a purely technological play, McRae says Lenovo’s network of business partners is critical to its customers’ success.

“Working with a trusted business partner who’s going to have the day-to-day interaction with the customer and knowledge of their business is really important,” he adds.

AI will undoubtedly be revolutionary in the amount of data it requires, while VMware’s license changes have set off a revolution of their own. But McRae says that, data size apart, storage vendors need to ensure that upgrading enterprise storage to keep pace isn’t a dramatic step change.

“Your normal enterprise is going to go buy or license a model to use, and they’re going to go buy or license a vector database to pair it with it, and they’re going to get the tools to go do that,” he concludes. “So that’s what we have to make easy.”

Making modern IT easy means providing a storage infrastructure that offers end-to-end solutions encompassing storage, GPU, and computing capabilities that integrate to handle both AI and other applications using enterprise data. With over four decades’ experience in the technology sector, Lenovo is presenting itself as a go-to partner that will keep its customers at the cutting edge in fast-moving times.

Sponsored by Lenovo.