
Redis Vector Sets and LangCache speed GenAI models

Redis has announced the Vector Sets data type and fully managed LangCache semantic caching service for faster and more effective GenAI model development and execution.

Redis (Remote Dictionary Server) is the developer and supplier of the eponymous real-time, in-memory key-value store, usable as a cache, database, or message broker, which can also write persistent data to attached storage drives. It supports strings, lists, sets, sorted sets, hashes, bitmaps, streams, and vectors, and now adds a Vector Set data type.

Rowan Trollope

The Vector Set datatype complements “Redis’ existing vector similarity search, offering a lower-level way to work with vectors,” while LangCache lets developers integrate Redis-based LLM response caching into applications. This reduces calls to LLMs by storing and reusing prompts and responses to minimize cost, improve prompt accuracy, and accelerate responsiveness.

CEO Rowan Trollope said that the “Vector Set is modelled on a sorted set but it’s hyper-dimensional.” A vector set has string elements, like a sorted set, but they’re associated with a vector instead of a score. 

The Vector Set was devised by Redis creator Salvatore Sanfilippo, now one of Trollope’s advisors. He invented an algorithm that works with quantized vectors, smaller than standard 32-bit float vectors, enabling more vectors to be held in RAM and so making semantic search more effective.

Sanfilippo tweet on X.

Trollope says the fundamental goal of vector sets is to make it possible to add items, and later get a subset of the added items that are the most similar to a specified vector. He tells us: “We can do natural language search of this Redis Vector Set.”

In more detail, Vector Sets implement:

  • Quantization: vector float embeddings are quantized by default to 8-bit values. This can be changed to no quantization or binary quantization when adding the first element.
  • Dimensionality reduction: the number of dimensions in a vector can be reduced by random projection by specifying the option and the number of target dimensions.
  • Filtering: “Each element of the vector set can be associated with a set of attributes specified as a JSON blob via the VADD or VSETATTR command. This allows the ability to filter for a subset of elements using VSIM that are verified by the expression.” (A sketch of these commands follows this list.)
  • Multi-threading: vector sets speed up vector similarity requests by splitting the work across threads to provide faster results.
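
As an illustration of how these commands fit together, here is a minimal sketch assuming a Redis server built with Vector Set support and the redis-py client. The key, element names, and vectors are invented, and the exact command syntax may differ from this sketch:

    # Hypothetical sketch: add elements with embeddings to a vector set,
    # attach a JSON attribute, then run a similarity query.
    import redis

    r = redis.Redis(host="localhost", port=6379)

    # Add two 3-dimensional example vectors as elements of the "docs" vector set.
    r.execute_command("VADD", "docs", "VALUES", "3", "0.1", "0.9", "0.3", "doc:1")
    r.execute_command("VADD", "docs", "VALUES", "3", "0.8", "0.1", "0.4", "doc:2")

    # Attach a JSON attribute blob that VSIM filter expressions can test.
    r.execute_command("VSETATTR", "docs", "doc:1", '{"category": "news"}')

    # Return the elements most similar to a query vector.
    print(r.execute_command("VSIM", "docs", "VALUES", "3", "0.2", "0.8", "0.3"))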

Redis claims that its quantization enables the use of int8 embeddings to reduce memory usage and cost by 75 percent and improve search speed by 30 percent, while maintaining 99.99 percent of the original search accuracy.

Redis example of Vector Set commands.

Trollope said the Vector Set is “a more fundamental representation of vectors than in other vector databases. … We don’t store the original vector [as] we don’t believe the full vector is needed. … We quantize the vector with 1 byte quantization and variants, such as FP32 to binary – a 32x reduction.” Which quantization method is used depends on the use case.
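
Using a hypothetical 1,024-dimension embedding, a quick back-of-the-envelope calculation shows where those quantization ratios come from:

    # Memory footprint of one hypothetical 1,024-dimension embedding.
    dims = 1024
    fp32_bytes = dims * 4        # 4,096 bytes at 32 bits per component
    int8_bytes = dims * 1        # 1,024 bytes: the 75 percent reduction Redis cites
    binary_bytes = dims // 8     # 128 bytes: one bit per component

    print(fp32_bytes / int8_bytes)     # 4.0, i.e. 75 percent smaller
    print(fp32_bytes / binary_bytes)   # 32.0, the "32x reduction" quoted above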

Redis now has two complementary search capabilities:

  • Redis Query Engine for general search and querying
  • Vector Set for specialized vector similarity search

LangCache

Trollope blogs that LangCache: “provides a hosted semantic cache using an API connection that makes AI apps faster and more accurate.” It is a REST API and “includes advanced optimizations to ensure highly accurate caching performance.”

LangCache uses a custom fine-tuned model from Redis and configurable search criteria, including the search algorithm and threshold distance. Developers can generate embeddings through their preferred model provider, eliminating the need to separately manage models, API keys, and model-specific variables.
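
LangCache’s internals aren’t public, but the pattern it describes (embed the prompt, compare it against cached prompts, and return a stored response when the distance is under a configurable threshold) can be sketched as follows. The embed() and call_llm() functions and the threshold value are placeholders, not LangCache’s API:

    # Conceptual sketch of semantic caching; not LangCache's actual API.
    import math

    def cosine_distance(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(x * x for x in b))
        return 1.0 - dot / (norm_a * norm_b)

    cache = []        # list of (prompt_embedding, llm_response) pairs
    THRESHOLD = 0.15  # configurable distance threshold

    def cached_completion(prompt, embed, call_llm):
        vec = embed(prompt)
        for cached_vec, cached_response in cache:
            if cosine_distance(vec, cached_vec) <= THRESHOLD:
                return cached_response       # semantically similar prompt: reuse the answer
        response = call_llm(prompt)          # cache miss: call the LLM once
        cache.append((vec, response))
        return response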

LangCache can manage responses so that apps only return data that’s approved for the current user, eliminating the need for separate security protocols as part of the app.

Redis tools and Redis Cloud update

Redis has introduced more AI developer tools and features. 

  • A Redis Agent Memory Server is an open source service that provides memory management for AI apps and agents. Users can manage short-term and long-term memory for AI conversations, with features like automatic topic extraction, entity recognition, and context summarization (a minimal illustration follows this list).
  • Redis hybrid search combines full-text search with vector similarity search to deliver more relevant results.
  • A portfolio of native integrations for LangGraph has been specifically designed for agent architectures and agentic apps. Developers can use Redis to build a LangGraph agent’s short-term memory via checkpointers, and long-term memory via Store, vector database, LLM cache, and rate limiting.
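
The Agent Memory Server’s own API isn’t shown here, but the short-term/long-term split described above can be illustrated with plain Redis data structures via redis-py. The key names, trimming policy, and expiry are illustrative assumptions, not the product’s behavior:

    # Illustrative sketch only; not the Agent Memory Server's actual API.
    import redis

    r = redis.Redis(host="localhost", port=6379)

    def remember_turn(session_id, role, text):
        # Short-term memory: recent conversation turns in a capped, expiring list.
        key = f"session:{session_id}:turns"
        r.rpush(key, f"{role}: {text}")
        r.ltrim(key, -20, -1)   # keep only the last 20 turns
        r.expire(key, 3600)     # let the working context lapse after an hour

    def store_fact(user_id, topic, summary):
        # Long-term memory: durable summaries per user, keyed by topic.
        r.hset(f"user:{user_id}:memory", topic, summary)

    def recall(user_id, session_id):
        recent = [t.decode() for t in r.lrange(f"session:{session_id}:turns", 0, -1)]
        facts = {k.decode(): v.decode() for k, v in r.hgetall(f"user:{user_id}:memory").items()}
        return recent, facts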

Some Redis Cloud updates provide GenAI app-building facilities:

  • Redis Data Integration (RDI) is Redis’ change data capture offering, which automatically syncs data between cache and database to deliver data consistency in milliseconds.
  • Redis Flex on Cloud Essentials is Redis rearchitected to natively span across RAM and SSD, “delivering the fastest speeds from the first byte to the largest of dataset sizes. Developers can store up to 5X more data in their app and database for the same price as before.”
  • Redis Insight on Cloud: Developers can now view, update, query, and search the data in Redis directly from their browser. Redis Insight gives access to the Redis developer environment, including the Workbench and tutorials, and new query autocompletion which pulls in and suggests schema, index, and key names from Redis data in real time, letting developers write queries faster and more easily.

The Vector Set will be included in the Redis 8 Community Edition beta, due May 1. RDI is in private preview – sign up here. Redis Flex is in public preview. A Redis blog discusses LangCache, Vector Sets, and the various tools; another blog discusses Vector Sets in more detail.

Keepit answers SaaS app backup scheme questions

How is self-hosted SaaS backup service business Keepit going to back up hundreds of different SaaS apps by 2028, starting from just seven this year? 

We asked its CTO, Jakob Østergaard, three questions to find out more, and this is what he said:

Jakob Østergaard.

Blocks & Files: Could Keepit discuss how its SaaS app connector production concept differs from that of HYCU (based around R-Cloud)?

Jakob Østergaard: While we lack detailed insight into the concrete mechanics of how HYCU is adding workload support to its RCloud, we can at least offer some perspective into how Keepit approaches the problem.

In the early days, everyone started out the same way, implementing direct support for each workload using traditional software development methodologies: writing one line of code at a time in their general-purpose programming language of choice. We believe HYCU, Keepit, and others in the industry started very similarly in this respect.

HYCU announced a push in 2023 to support new workloads with Generative AI. From an engineering standpoint this is an interesting idea. If this can be made to work, it would potentially be a major productivity boost, allowing the vendor to more quickly add new workloads.

However, the real-world challenges of supporting a new workload go far beyond the (potentially AI-supported) implementation of API interactions. A vendor will need to understand the workload’s ecosystem and, more importantly, the workload’s users.

To back up, say, Miro, merely interacting with the Miro API is only a small piece of the puzzle. One needs to understand how an enterprise uses Miro in order to build a solution that properly addresses the customers’ needs. This, and many other equally complex deliberations, are not easily solved with AI today, so while this is an interesting idea, the reality is more complicated.

At Keepit, we have been focusing on improving the “developer ergonomics” of workload creation – so that in the future, we could allow second or third parties to develop new workload support. Our focus is on removing the need for creating complicated code, rather than automating its creation.

To illustrate the approach Keepit has taken, it is perhaps most useful to compare it to SQL. A relatively simple SQL statement is developed and sent to the database server – the advanced query planner in the SQL server then devises an actual executable plan, a piece of software if you will, that will produce the results described in the original SQL statement.

The benefit of this approach is that the amount of code that needs to be maintained (the “SQL statement” in the example) is minimal, and that the execution engine (the “query planner” in the example) can be upgraded and improved without the need to rewrite the workload code.

It is clear that creating a workload can neither be fully automated nor built with zero code. No matter the approach, there is no completely free lunch when considering adding workload support. There are many possible ways to improve how more workloads can be supported by a platform; HYCU and Keepit have picked two of them.

The future is certainly interesting – we will be watching which strategies players in the industry undertake in the future to broaden workload support. Keepit has been following its own strategy to more effectively add serious support for a broader set of workloads.

Blocks & Files: Who produces the SaaS app connectors using DSL – Keepit or the SaaS app supplier?

Jakob Østergaard: With our DSL [Domain-Specific Language] technology, Keepit is currently responsible for the development of new connectors. As the technology and tooling around it matures, there is a lot of future potential in allowing second- or third-party development of connectors, and there are a number of interesting business models that could support this. For the time being, however, Keepit does the development.

Blocks & Files: What’s involved in writing a DSL-based SaaS app connector?

Jakob Østergaard: Where “classical” connector development involves writing a lot of code in a general-purpose programming language, the DSL, being a “domain-specific language,” lends itself better to the specific job of connector development.

For example, where typical programming languages (like C++ or Python) are strictly imperative (“if A then do B”), and some other languages (like SQL or Prolog) are declarative (“find all solutions satisfying X criteria”), we have been able to mix and match the paradigms we needed the most into our DSL.

Therefore, there are places where we wish to describe relationships (in a declarative fashion) and have the system “infer” the appropriate actions to take during backup, and there are other places where we write more classic imperative code. Having a language that naturally caters to the problem domain at hand has the potential to offer significant productivity benefits in developing new connectors.
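
Keepit’s DSL itself isn’t public, but the imperative-versus-declarative contrast Østergaard draws can be shown in miniature with Python’s built-in sqlite3 module; the table and query are invented for the example:

    # Invented example illustrating the imperative/declarative contrast; not Keepit's DSL.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER, owner TEXT, modified TEXT)")
    conn.executemany("INSERT INTO items VALUES (?, ?, ?)",
                     [(1, "alice", "2025-04-01"), (2, "bob", "2025-03-01")])

    # Imperative: spell out every step ("if A then do B").
    changed = []
    for item_id, owner, modified in conn.execute("SELECT id, owner, modified FROM items"):
        if modified >= "2025-04-01":
            changed.append(item_id)

    # Declarative: describe the result; the engine plans how to produce it.
    declared = [row[0] for row in conn.execute(
        "SELECT id FROM items WHERE modified >= '2025-04-01'")]

    assert changed == declared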

This has been pioneering work that we started working on more than a year ago. New technology takes time to mature and we are currently getting ready to release the first workloads built using this new technology. This technology will help us on our journey to support the hundreds of workloads that the average business today is already using in the cloud, and we are very excited to launch the first of these new workloads a little later this year.

China’s YanRong integrates KVCache with its filesystem to accelerate AI inferencing

By integrating KVCache into its filesystem, YanRong says it has dramatically improved the KV cache hit rates and long-context processing, making AI inferencing cheaper.

Chinese storage software supplier YanRong provides the YRCloudFile distributed shared file system for HPC and AI workloads. It supports all-flash drives and Nvidia’s GPUDirect protocol. The KVCache is a way of storing intermediate results during an AI model’s inferencing stage so that they don’t have to be recomputed at every step, which would lengthen response time.

We understand that the KVCache in the YRCloudFile system likely serves as a distributed in-memory layer across a cluster of GPU servers, storing frequently accessed key-value pairs.
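
YanRong hasn’t published implementation details, but the general idea of a prefix-keyed KV cache (reuse previously computed attention key/value tensors instead of recomputing them) can be sketched like this; compute_kv() stands in for the model’s prefill work:

    # Conceptual sketch of prefix-keyed KV caching; not YRCloudFile's implementation.
    import hashlib

    kv_store = {}  # in a real system this tiers across GPU memory, RAM, and shared storage

    def prefix_key(token_ids):
        return hashlib.sha256(repr(tuple(token_ids)).encode()).hexdigest()

    def get_kv(token_ids, compute_kv):
        # Return cached key/value tensors for this token prefix,
        # computing and storing them only on a cache miss.
        key = prefix_key(token_ids)
        if key not in kv_store:
            kv_store[key] = compute_kv(token_ids)   # expensive prefill computation
        return kv_store[key]                        # a hit skips the recomputation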

To see how its YRCloudFile KVCache performs, YanRong simulated realistic workloads, using publicly available datasets, industry-standard benchmarking tools, and NVIDIA GPU hardware. It found that YRCloudFile KVCache supports significantly higher concurrent query throughput, and offers concrete, quantifiable value for inference workloads.

YanRong conducted multi-phase tests comparing native vLLM performance against vLLM plus YRCloudFile KVCache across varying token counts and configurations.

One test evaluated total response time for a single query with from 8,000 to around 30,000 tokens as its context input. YRCloudFile KVCache provided a 3x to more than 13x performance improvement in TTFT (Time to First Token) as the context length increased:

A second test measured how many concurrent queries were supported with a TTFT value of 2 seconds or less:

It found YRCloudFile KVCache enabled 8x more concurrent requests compared to native vLLM.

A third test result showed that, under high concurrency, YRCloudFile KVCache achieved over 4x lower TTFT across different context lengths.

YanRong says that these results show “how extending GPU memory via distributed storage can break traditional compute bottlenecks – unlocking exponential improvements in resource utilization.” All-in-all, “YRCloudFile KVCache redefines the economics of AI inference by transforming storage resources into computational gains through PB-scale cache expansion.”

You can find more details here.

Comment

We think YRCloudFile with KVCache shares some similarities with WEKA’s Augmented Memory Grid (AMG). This is a software-defined filesystem extension, which provides exascale cache capacity at microsecond latencies with multi-terabyte-per-second bandwidth, delivering near-memory speed performance.  

A WEKA blog says it “extends GPU memory to a token warehouse in the WEKA Data Platform to provide petabytes of persistent storage at near-memory speed. … The token warehouse provides a persistent, NVMe-backed store for tokenized data, allowing AI systems to store tokens and retrieve them at near-memory speed.”

This “enables you to cache tokens and deliver them to your GPUs at microsecond latencies, driving the massive-scale, low-latency inference and efficient reuse of compute necessary for the next generation of AI factories.” The AMG is: “Persistently storing tokenized data in NVMe” and “tokens are stored, and pulled “off the shelf” at inference time, instead of continuously being re-manufactured on-demand for every single request.”

AMG “extends GPU memory into a distributed, high-performance memory fabric that delivers microsecond latency and massive parallel I/O – critical for storing and retrieving tokens at scale in real-time.”

A YanRong spokesperson told us: “As WEKA has not disclosed further details about their Augmented Memory Grid, we have no way to make a direct comparison between the two systems’ implementations. However, when it comes to the general purpose and the impact on LLM inferencing, both YRCloudFile KVCache and WEKA’s Augmented Memory Grid share a similar goal, which is to extend the expensive HBM to a persistent, high-bandwidth, low-latency, and scalable parallel file system, so that a large number of KVs needed during the inferencing phase can be cached in the storage, avoiding repeated calculations and improving overall performance. To achieve this goal, we need to implement a mechanism in our product so that vLLM or other inference framework can read and write KV data.”

Cohesity goes Googlewards with Gaia, Gemini and Mandiant

Data protector Cohesity is integrating its Gaia GenAI search assistant with Google’s Gemini AI model and its Agentspace, using Google Threat Intelligence and working with Mandiant on incident response and a Google Cloud recovery environment.

These announcements came at the ongoing Google Cloud Next 2025 event in Las Vegas. Gemini is a family of multi-modal large language models (LLMs) covering text, images, audio, video and code and trained on Google’s own TPU hardware. Google subsidiary Mandiant is a cyber-security business unit in Google’s Cloud division. 

Agentspace is a Gemini-powered hub linking LLMs, Gemini itself obviously included, with data sources such as Google Workspace, Salesforce, Jira, and SharePoint. It helps with the creation of custom LoB-focused agents to automate multi-step jobs and supports Google’s Agent2Agent (A2A) protocol for agent-to-agent comms. The complementary Anthropic MCP protocol supports agent-to-tool comms.

Cohesity is adding its Gaia agent to Agentspace so it can be used to analyse customers’ Cohesity-generated and other proprietary data. It says customers will be able to search across enterprise data regardless of where it’s hosted, using secure data APIs. They’ll “be able to unlock advanced reasoning capabilities powered by Google Cloud’s Gemini models, enabling deeper insights and smarter decision-making.”

This Gaia Agentspace integration will improve compliance, data security and data discovery, plus the Gaia-Gemini model combo will provide “more intelligent data analysis, discovery, and management.”

There are four security aspects to the Google-Cohesity partnership:

  • Google Threat Intelligence integrated in the Cohesity Data Cloud will enable customers to “rapidly detect new threats in their backup data … [and] significantly improve Cohesity’s existing threat detection and incident response capabilities.” 
  • Cohesity’s Cyber Events Response Team (CERT) and Google’s Mandiant Incident Response teams can now work together to help customers minimise business downtime during incidents. The two provide more comprehensive incident response engagements for joint customers. Using data from Cohesity, Mandiant can expedite the containment, investigation, and mitigation of an attack from the customer’s primary infrastructure, while Cohesity secures the backup infrastructure.
  • Cohesity customers can work with Mandiant to establish, secure, and validate a Cloud Isolated Recovery Environment (CIRE) in Google Cloud before an incident occurs. 
  • Cohesity Data Cloud integration with Google’s security operations “for improved data resiliency and enhanced security posture management.”

The Google Threat Intelligence service uses Mandiant Frontline threat knowledge, the VirusTotal crowd-sourced malware database, Google’s own threat expertise and awareness plus Gemini AI model-powered analysis to alert users to new threats. It’s available on its own or integrated into Google Security Operations.

Cohesity’s integrations with Google Cloud for cyber resilience, AI model data sourcing, and analysis are expected to be available by the summer. Its incident response partnership with Mandiant and the integration of the Cohesity Data Cloud with Google’s Security Operations are available now.

There’s more in a Cohesity blog.

Bootnote

Cohesity competitor Rubrik has also announced a way for its customers to establish a Cloud-based Isolated Recovery Environment (CIRE) by working with Mandiant. You can read about this in a Rubrik blog.

Rubrik’s Annapurna feature provides data from its Rubrik Security Cloud to large language model (LLM) AI Agents. Annapurna will use Agentspace to provide “easy, secure access to data across cloud, on-premises, and SaaS environments.” It will offer:

  • API-driven secure access to enterprise-wide data for AI training and retrieval
  • Anomaly detection and access monitoring to prevent AI data leaks and unauthorized use
  • Seamless AI data pipelines to combine Google Cloud AI models with enterprise data
  • Automated compliance enforcement to protect sensitive AI training data 

Rubrik also tells us it has been named the 2025 Google Cloud Infrastructure Modernization Partner of the Year for Backup and Disaster Recovery.

Snowflake tethers itself to Iceberg

Cloud data warehouser Snowflake is supporting the Apache Iceberg open table format alongside its own native data table formats.

Iceberg is an open source table format for large-scale datasets in data lakes, layered above file formats like Parquet, ORC, and Avro and cloud object stores such as AWS S3, Azure Blob, and Google Cloud Storage. It provides database-like features to data lakes, such as ACID support, partitioning, time travel, and schema evolution, and enables SQL querying of data lake contents.
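
For a sense of what “database-like features” means in practice, here is a hedged sketch using the open source pyiceberg client; the catalog, namespace, and table names are invented, and configuration will vary by deployment:

    # Hedged sketch with the open source pyiceberg client; names are placeholders.
    from pyiceberg.catalog import load_catalog

    catalog = load_catalog("my_catalog")             # e.g. a REST catalog
    table = catalog.load_table("analytics.events")   # hypothetical namespace.table

    # Read the current table state as Arrow data.
    current = table.scan().to_arrow()

    # Time travel: read the table as of an earlier snapshot.
    first_entry = table.history()[0]
    as_of = table.scan(snapshot_id=first_entry.snapshot_id).to_arrow()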

Christian Kleinerman

Christian Kleinerman, Snowflake’s EVP of Product, stated: “The future of data is open, but it also needs to be easy.” 

“Customers shouldn’t have to choose between open formats and best-in-class performance or business continuity. With Snowflake’s latest Iceberg tables innovations, customers can work with their open data exactly as they would with data stored in the Snowflake platform, all while removing complexity and preserving Snowflake’s enterprise-grade performance and security.”

Snowflake says that, until now, organizations have either relied on integrated platforms, like Snowflake, to manage their data or used open, interoperable data formats like Parquet. It says its Iceberg support means that “customers now gain the best of both worlds. Users can store, manage, and analyze their data in an open, interoperable format, while still benefiting from Snowflake’s easy, connected, and trusted platform.”

Snowflake with Iceberg accelerates lakehouse analytics, applying its compute engine to Iceberg tables with two go-faster features coming soon: a Search Optimization service and a Query Acceleration service. It is also extending its data replication and syncing to Iceberg tables, now in private preview, so that customers can restore their data in the event of a system failure, cyberattack, or other disaster.

Snowflake says it’s working with the Apache Iceberg community to launch support for VARIANT data types. It’s also focused on working with other open source projects.

In June last year, Snowflake announced its Polaris Catalog, a managed service for Apache Polaris, a vendor-neutral, open catalog implementation for Apache Iceberg. Apache Polaris is an open source catalog for Apache Iceberg, implementing Iceberg’s REST API and enabling multi-engine interoperability across a range of platforms, including an Apache trio: Doris, Flink, and Spark, plus Dremio, StarRocks, and Trino. Now Snowflake is getting even closer to Iceberg.

Other Snowflake open source activities include four recent acquisitions:

  • Apache NiFi: Datavolo (acquired by Snowflake in 2024), built on NiFi, simplifies ingestion, transformation, and real-time pipeline management.
  • Modin: Snowflake accelerates pandas workloads with Modin (acquired by Snowflake in 2023), enabling seamless scaling without code change.
  • Streamlit: Snowflake’s integration with Streamlit (acquired by Snowflake in 2022) allows users to build and share interactive web applications, data dashboards, and visualizations with ease.
  • TruEra: TruEra (acquired by Snowflake in 2024) boosts AI explainability and model performance monitoring for bias detection, compliance, and performance insights.

Competitor Databricks acquired Tabular, and its data management software layer based on Iceberg tables, last year. Iceberg and Databricks’ Delta Lake are both based on Apache Parquet. Snowflake has now, like Databricks, recognized Iceberg is beginning to dominate.

The Ultra Accelerator Link Consortium has released its first spec

The Ultra Accelerator Link Consortium has released its 200G v1.0 spec – meaning competition for Nvidia’s BasePOD and SuperPOD GPU server systems from pods containing AMD and Intel GPUs/accelerators is coming closer.

The UALink consortium was set up in May last year to define a high-speed, low-latency interconnect specification for close-range scale-up communications between accelerators and switches in AI pods and clusters. It was incorporated in October 2024 by AMD, Astera Labs, AWS, Cisco, Google, HPE, Intel, Meta, and Microsoft. Alibaba Cloud Computing, Apple, and Synopsys joined at board level in January this year. Other contributor-level members include Alphawave Semi, Lenovo, Lightmatter and, possibly, Samsung. We understand there are more than 65 members in total.

The members want to foster an open switch ecosystem for accelerators as an alternative to Nvidia’s proprietary NVLink networking. This v1.0 spec enables 200G per lane scale-up connection for up to a theoretical 1,024 accelerators in a pod. Nvidia’s NVLink supports up to 576 GPUs in a pod.

Kurtis Bowman

Kurtis Bowman, UALink Consortium Board Chair and Director, Architecture and Strategy at AMD, stated: “UALink is the only memory semantic solution for scale-up AI optimized for lower power, latency and cost while increasing effective bandwidth. The groundbreaking performance made possible with the UALink 200G 1.0 Specification will revolutionize how Cloud Service Providers, System OEMs, and IP/Silicon Providers approach AI workloads.”

This revolution depends first and foremost on UALink-supporting GPUs and other accelerators from AMD and Intel being used in preference to Nvidia products by enough customers to make a dent in the GPU/accelerator market.

NVLink is used by Nvidia as a near- or close-range link between CPUs and GPUs and between its GPUs. It’s a point-to-point mesh system which can also use an NVSwitch as a central hub.

UALink v1.0 provides a per-lane bidirectional data rate of 200 GTps (200 GBps) and allows 4 lanes per accelerator connection, meaning the total connection bandwidth is 800 GBps.

NVLink 4.0, the Hopper GPU generation, delivers 900 GBps aggregate bidirectional bandwidth across 18 links, each running at 50 GBps. This is 100 GBps more than UALink v1.0.

NVLink v5.0, as used with Blackwell GPUs, provides 141 GBps per bidirectional link and up to 18 links per GPU connection, meaning a total of 2,538 GBps per connection, more than 3 times higher than UALink v1.0.

NVLink offers higher per-GPU bandwidth by supporting more links (lanes) than UALink, which can, in theory, scale out to support more GPUs/accelerators than NVLink.
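
Taking the per-link figures quoted above at face value, the per-connection totals work out as follows:

    # Aggregate per-connection bandwidth using the figures quoted in this article.
    ualink_v1 = 4 * 200      # 4 lanes x 200 GBps   = 800 GBps
    nvlink_4 = 18 * 50       # 18 links x 50 GBps   = 900 GBps
    nvlink_5 = 18 * 141      # 18 links x 141 GBps  = 2,538 GBps

    print(ualink_v1, nvlink_4, nvlink_5)
    print(round(nvlink_5 / ualink_v1, 1))   # roughly 3.2x, the "more than 3 times" above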

Should Nvidia be worried about UALink? Yes, if UALink encourages customers to use non-Nvidia GPUs. How might it respond? A potential Rubin GPU generation NVLink 6.0 could increase per-link bandwidth to match/exceed UALink’s 200 GBps and also extend scalability out to the 1,024 area. That could be enough to prevent competitive inroads into Nvidia’s customer base, unless its GPUs fall behind those of AMD and Intel.

UALink v1.0 hardware is expected in the 2026/2027 period, with accelerators and GPUs from, for example, AMD and Intel supporting it along with switches from Astera Labs and Broadcom.

You can download an evaluation copy of the UALink v1.0 spec here.

Google Cloud’s NetApp Volumes will link to Vertex AI

Google Cloud and NetApp are extending the NetApp Volumes storage service to work better with Vertex AI, support larger datasets, separately scale capacity and performance, and meet regional compliance needs.

Google Cloud NetApp Volumes, GCNV for short, is a fully managed file service based on NetApp’s ONTAP operating system running on the Google Cloud Platform as a native GCP service. It supports NFS v3 and v4.1, and SMB, and provides snapshots, clones, replication, and cross-region backup. Google’s Vertex AI is a combined data engineering, data science, and ML engineering workflow platform for training, deploying, and customizing large language models (LLMs), and developing AI applications. It provides access to Google’s Gemini models, which work with text, images, video, or code, plus other models such as Anthropic’s Claude and Llama 3.2.

Pravjit Tiwana.

NetApp SVP and GM for Cloud Storage, Pravjit Tiwana, states: “Our collaboration with Google Cloud is accelerating generative AI data pipelines by seamlessly integrating the latest AI innovations with the robust data management capabilities of NetApp ONTAP.”

He reckons: “The new capabilities of NetApp Volumes help customers scale their cloud storage to meet the demands of the modern, high-performance applications and datasets.”

The new capabilities in detail are:

  • Coming NetApp Volumes integration with Google Cloud’s Vertex AI Platform, so customers will be able to build custom agents without needing to build their own data pipeline management for retrieval-augmented generation (RAG) applications.
  • Improvements for the Premium and Extreme service levels in all 14 regions where they are offered. Customers can now provision a single volume starting at 15 TiB that can be scaled up to 1 PiB with up to 30 GiB/s of throughput. This means customers can move petabyte-scale datasets for workloads like EDA, AI applications, and content data repositories to NetApp Volumes without partitioning data across multiple volumes.
  • A preview of the Flex service level’s independent scaling of capacity and performance, so customers can avoid over-provisioning capacity to meet their performance needs. Users can create storage pools by individually selecting capacity, throughput, and IOPS, with the ability to scale throughput up to 5 GiB/s and IOPS up to 160K to optimize costs.
  • NetApp Volumes will soon support the Assured Workloads framework that Google Cloud customers use to configure and maintain controlled environments operating within the parameters of a specific compliance regime, meeting the data residency, transparent access control, and cloud key management requirements specific to their region.

GCNV flex, standard, premium and extreme service level offerings can be researched here. The GCNV-Vertex AI integration is coming “soon.”

Proprietary data in GCNV will be able to be used via Vertex AI to implement model agent RAG capabilities. 

NetApp has received the 2025 Google Cloud Infrastructure Modernization Partner of the Year for Storage award, which is a nice pat on the back.

Sameet Agarwal, Google Cloud Storage GM and VP, said: “Organizations can leverage their NetApp ONTAP on-premises data and hybrid cloud environments. By combining the capabilities of Google Cloud’s Vertex AI platform with Google Cloud NetApp Volumes, we’re delivering a powerful solution to help customers accelerate digital transformation and position themselves for long-term success.”

Google Cloud offering managed Lustre service with DDN

DDN is partnering with Google Cloud on its “Google Cloud Managed Lustre, powered by DDN” offering.

The Lustre parallel file system enables Google Cloud to offer file storage and fast access services for enterprises and startups building AI, GenAI, and HPC applications. It provides up to 1 TB/s throughput and can scale from terabytes to petabytes.

Alex Bouzari, Co-Founder and CEO of DDN, bigged this deal up by stating: “This partnership between DDN and Google Cloud is a seismic shift in AI and HPC infrastructure—rewriting the rules of performance, scale, and efficiency. … we’re not just accelerating AI—we’re unleashing an entirely new era of AI innovation at an unprecedented scale. This is the future, and it’s happening now.”

DDN says on-prem Lustre customers “can now extend their AI workloads to the cloud effortlessly.”

You might think that this is a revolution but, one, Google already has Lustre available on its cloud, just not as a managed service, and, two, its main competitors also offer Lustre services.

Google’s existing Lustre on GCP can be set up using deployment scripts or through DDN’s EXAScaler software, built on Lustre, which is available through the Google Cloud marketplace. Now it has moved on with this fully managed Lustre service offering which makes it easier for its customers to use Lustre.

AWS offers FSx for Lustre as well as FSx for OpenZFS and BeeGFS on AWS. Azure also offers Azure Managed Lustre plus BeeGFS on Azure and GlusterFS on Azure. You are spoilt for choice.

Google Cloud Managed Lustre (GCML) links to Google Cloud’s Compute Engine, GKE (Google Kubernetes Engine), Cloud Storage and other services for an integrated deployment. DDN and Google say it can speed up data pipelines for AI model training, tuning and deployment, and enable real-time inferencing.

Google Cloud also has DAOS-powered ParallelStore available, DAOS being the open source Distributed Asynchronous Object Storage parallel file system.

GCML comes with 99.999 percent uptime and has a scalable pricing scheme. It can be seen at the Google Cloud Next 2025 event at the Mandalay Bay Convention Center, Las Vegas, April 9 to 11, where DDN is also demoing its Infinia object storage software.

Dell refreshes storage and server lines for AI workloads

Against a background of disaggregated IT and rising AI trends, Dell has announced refreshes of its PowerEdge, PowerStore, ObjectScale, PowerScale, and PowerProtect storage systems.

Dell is announcing both server and storage advances. It says its customers need to support existing and traditional workloads as well as provide IT for generative AI tasks. A disaggregated server, storage, and networking architecture is best suited for this and builds on three-tier and hyperconverged infrastructure designs, with separate scaling for the three components collected together in shared resource pools.

Arthur Lewis

Dell Infrastructure Solutions Group president Arthur Lewis stated: “From storage to servers to networking to data protection, only Dell Technologies provides an end-to-end disaggregated infrastructure portfolio that helps customers reduce complexity, increase IT agility, and accelerate datacenter modernization.” 

Dell’s PowerEdge R470, R570, R670, and R770 servers are equipped with Intel Xeon 6 processors with performance cores. These are single- and dual-socket servers in 1U and 2U form factors designed for traditional and emerging workloads like HPC, virtualization, analytics, and AI inference.

Our focus here is on the storage product announcements, which cover the unified file and block PowerStore arrays, cloud-native ObjectScale, scale-out clustered PowerScale filer system, and Dell’s deduplicating backup target PowerProtect systems developed from prior Data Domain arrays.

PowerStore

A PowerStore v4.1 software release provides AI-based analytics to detect potential issues before they occur, automatic support ticket opening, carbon footprint forecasting, DoD CAC/PIV smart card support, automated certificate renewal, and improved PowerProtect integration through Storage Direct Protection. This enables up to 4x faster backup restores and support for the latest PowerProtect systems: the DD6410 appliance and All-Flash Ready Nodes (see below).

Dell PowerStore node

The software provides better storage efficiency tracking, now covering both file and block data, and ransomware-resistant snapshots, supplementing the existing File Level Retention (FLR) and other local, remote, and cloud-based protection methods.

It offers file system QoS with more granular performance controls. Dell Unity customers migrating to PowerStore can preserve their existing Cloud Tiering Appliance (CTA) functionality. Archived files remain fully accessible, and customers can create new archiving policies for migrated file systems on PowerStore. 

Read a PowerStore 4.1 blog for more details.

ObjectScale

ObjectScale is scale-out, containerized object storage software running on ECS hardware nodes. Up until now there were three ECS hardware boxes: EX500 (12-24 HDDs, to 7.68 PB/rack), EX5000 (to 100 HDDs, to 14 PB/rack) and all-flash EXF900 (12-24 NVMe SSDs, to 24.6 PB/rack).

New ObjectScale v4.0 software boasts smart rebalancing, better space reclamation, and capacity utilization. It also has expanded system health metrics, alerting, and security enhancements. Dell claims it offers “the world’s most cyber-secure object storage.”

There are two new ObjectScale systems. The all-flash XF960 is said to be designed for AI workloads and is an evolution of the EXF900. It has extensive hardware advances based on PowerEdge servers and delivers up to 2x greater throughput per node than the closest but unnamed competitor, and up to 8x more density than the EXF900. 

ObjectScale X560 top and XF960 bottom

The HDD-based X560 accelerates media, backup, and AI model training ingest workloads with 83 percent higher small object read throughput than the EX500 running v3.8 software.

Dell is partnering with S3-compatible cloud storage supplier Wasabi to introduce Wasabi with Dell ObjectScale, a hybrid cloud object storage service with tiers starting from 25 TB of reserved storage per month. Wasabi has a global infrastructure, with more than 100,000 customers in 15 commercial and two government cloud regions worldwide.

More ObjectScale news is expected at the upcoming Dell Technologies World conference.

PowerScale

PowerScale all-flash F710 and F910 nodes get 122 TB Solidigm SSD support, doubling storage density. This, with 24 bays in their 2RU chassis and 2:1 data reduction, provides almost 6 PB of effective capacity per node. Dell says it’s the first supplier to offer an enterprise storage system with such SSDs.
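
The “almost 6 PB” figure follows directly from the quoted specs:

    # Effective capacity per node from the quoted drive count, drive size, and reduction ratio.
    bays = 24
    drive_tb = 122
    reduction = 2                      # the assumed 2:1 data reduction

    effective_tb = bays * drive_tb * reduction
    print(effective_tb)                # 5,856 TB, i.e. roughly 5.9 PB per node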

Dell PowerScale F910 (top), A310 (middle), H7100 (bottom)

The PowerScale archive A and hybrid H series nodes – H710, H7100, A310, A3100 – have lower latencies and faster performance with a refreshed compute module for HDD-based products. Dell says the A-Series is optimized for TCO, while the H-series provides a balanced cost/performance mix. The updated nodes feature:

  • Fourth-generation Intel Xeon Sapphire Rapids CPUs
  • DDR5 DRAM with up to 75 percent greater speed and bandwidth
  • NVMe M.2 persistent flash vault drives providing faster cache destage and recovery
  • Improved thermal operation reducing heat and stress on components
  • Updated drive carrier with 100 percent greater speed for SAS drives

Dell will introduce support for 32 TB HAMR disk drive technology later this year with “extended useful life.”

A PowerScale 1RU A110 Accelerator Node is a successor to the previous generation P100 and B100 performance and backup accelerators. It’s designed to solve CPU bottlenecks and boost overall cluster performance with higher cluster bandwidth. The A110 can be independently scaled in single node increments.

PowerProtect

There are three main developments here. First, the PowerProtect DD6410 is a new entry-level system with a capacity of 12 TB to 256 TB. It’s aimed at commercial, small business, and remote site environments, with up to 91 percent faster restores than the DD6400, up to 65x deduplication, and scalability for traditional and modern workloads. 

Secondly, the PowerProtect All-Flash Ready Node has 220 TB capacity with over 61 percent faster restore speeds, up to 36 percent less power, and a 5x smaller footprint than the PowerProtect DD6410 appliance. It does not support the 122 TB SSDs, built with QLC 3D NAND, because their write speed is not fast enough.

Both the DD6410 and All-Flash Ready Node support the Storage Direct Protection integration with PowerStore and PowerMax, providing faster, efficient, and secure backup and recovery.

PowerProtect DD6410 (top) and All-Flash Ready Node (bottom)

Thirdly, a PowerProtect DataManager software update reduces cyber-security risks with anomaly detection. This has “machine learning capabilities to identify vulnerabilities within the backup environment, enabling quarantine of compromised assets. It provides early insights in detecting threats in the backup environment while complementing the CyberSense deep forensics analysis of isolated recovery data in the Cyber Recovery vault, providing end-to-end cyber resilience of protected resources.”

As well as VMware, DataManager now manages Microsoft Hyper-V and Red Hat OpenShift Virtualization virtual machine backups. When we suggested future Nutanix AHV support, Dell acknowledged it as a possibility.

DataManager archives data to ObjectScale for long-term retention. This is not tiering with a stub left behind. The archived data can be restored directly without first being rehydrated to a PowerProtect system. The archiving is to S3-compatible object stores.

DataManager also has Multi-System Reporting which offers centralized visibility and control across up to 150 PowerProtect Data Manager instances.

Availability

  • PowerProtect Data Manager updates are available now.
  • PowerEdge R470, R570, R670, and R770 servers are available now.
  • PowerStore software updates are available now.
  • ObjectScale is available now as a software update for current Dell ECS environments.
  • HDD-based ObjectScale X560 will be available April 9, 2025.
  • All-Flash ObjectScale XF960 appliances will be available beginning in Q3 2025.
  • The Wasabi with Dell ObjectScale service is available in the United States. UK availability begins this month, with expansion into other regions planned in the coming months.
  • PowerScale HDD-based nodes will be available in June 2025.
  • PowerScale with 122 TB drives will be available in May 2025.
  • PowerProtect DD6410 and All-Flash Ready Node will be available in April 2025.

ExaGrid posts record Q1 as on-prem backup demand climbs

As the on-premises backup target market grows, so too does ExaGrid – which just posted its best ever Q1.

The company supplies deduplicating backup appliances with a non-deduped landing zone for faster restores of recent data. Deduped data is moved to a non-network-facing area for further protection. Its appliances can be grouped with cross-appliance deduplication raising storage efficiency.

Bill Andrews, ExaGrid

At the end of 2025’s first quarter, ExaGrid was free cash flow (FCF) positive, P&L positive, and EBITDA positive for its 17th consecutive quarter, and has no debt. CEO Bill Andrews emphasized this, telling us: “We have paid off all debt. We have zero debt. We don’t even have an account receivable line of credit (don’t need it).”

It recruited 155 new logos, taking its total well past 4,600 active upper mid-market to large enterprise customers. The company says it continues to have 75 percent of its new logo customer bookings come from six- and seven-figure purchase orders. Andrews tells us: “For the last 8 quarters, each quarter 75 percent of our new logo customer bookings dollars come from deals over $100K and over $1M. Only 25 percent of new customer bookings dollars come from deals under $100K.”

Andrews stated: “ExaGrid continues to profitably grow as it keeps us on our path to eventually becoming a billion-dollar company. We are the largest independent backup storage vendor and we’re very healthy … ExaGrid continues to have an over 70 percent competitive win rate replacing primary storage behind the backup application, as well as inline deduplication appliances such as Dell Data Domain and HPE StoreOnce.”

The company has a 95 percent net customer retention rate and an NPS score of +81. Andrews tells us: “Our customer retention is growing and is now at 95.3 percent. We think perfection is 96 percent because you can’t keep every customer as some go out of business, some get acquired, some move everything to the cloud, etc.”

For ExaGrid’s top 40 percent of customers, its largest, “we have a 98 percent retention which is very high for storage.” He adds: “99 percent of our customers are on maintenance and support, also very high for the industry.”

The 5,000 customer level is in sight and Andrews left us with this thought: “Things are going well and shy of the tariffs throwing us into a depression, we should have yet another record bookings and revenue year. … The goal is to keep growing worldwide as there is a lot of headroom in our market.”

Bootnote

For reference Dell has more than 15,000 Data Domain/Power Protect customers.

Qumulo cranks up AI-powered NeuralCache

Qumulo has added a performance-enhancing NeuralCache predictive caching feature to its Cloud Data Fabric.

The Cloud Data Fabric (CDF) was launched in February and has a central file and object data core repository with coherent caches at the edge. The core is a distributed file and object storage cluster that runs on most hardware vendors’ systems or public cloud infrastructure. Consistency between the core and edge sites comes from file system awareness, block-level replication, distributed locking, access control authentication, and logging.

NeuralCache uses a set of supervised AI and machine learning models to dynamically optimize read/write caching, with Qumulo saying it’s “delivering unparalleled efficiency and scalability across both cloud and on-premises environments.”

Kiran Bhageshpur

CTO Kiran Bhageshpur states: “The Qumulo NeuralCache redefines how organizations manage and access massive datasets, from dozens of petabytes to exabyte-scale, by adapting in real-time to multi-variate factors such as users, machines, applications, date/time, system state, network state, and cloud conditions.”

NeuralCache, Qumulo says, “continuously tunes itself based on real-time data patterns. Each cache hit or miss refines the model, improving efficiency and performance as more users, machines, and AI agents interact with it.”
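
Qumulo hasn’t published NeuralCache’s model details, but “each cache hit or miss refines the model” describes, in outline, an online-learning prefetch policy. A minimal sketch, with invented features and weights rather than Qumulo’s actual signals:

    # Illustrative online-learning prefetch scorer; not Qumulo's implementation.
    weights = {"recent_access": 0.5, "same_client": 0.3, "time_of_day": 0.2}
    LEARNING_RATE = 0.05

    def score(features):
        # features: dict of 0..1 signals, e.g. {"recent_access": 1.0, "same_client": 0.0}
        return sum(weights[name] * features.get(name, 0.0) for name in weights)

    def should_prefetch(features, threshold=0.4):
        return score(features) >= threshold

    def record_outcome(features, was_hit):
        # Each hit or miss nudges the weights, so the policy adapts to the workload.
        error = (1.0 if was_hit else 0.0) - score(features)
        for name in weights:
            weights[name] += LEARNING_RATE * error * features.get(name, 0.0)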

It “intelligently stacks and combines object writes, minimizing API charges in public cloud environments while optimizing I/O read/write cycles for on-premises deployments – delivering significant cost savings without compromising durability or latency.”

The NeuralCache software “automatically propagates changed data blocks in response to any write across the Cloud Data Fabric” so that “users, machines, and AI agents always access the most current data.”

Bhageshpur says this “enhances application performance and reduces latency while ensuring data consistency, making it a game-changer for industries relying on data-intensive workflows, including AI research, media production, healthcare, pharmaceutical discovery, exploratory geophysics, space and orbital telemetry, national intelligence, and financial services.”

Qumulo says NeuralCache excels at dataset scales from 25 PB to multiple exabytes, “learning and improving as data volume and workload complexity grows.”

This predictive caching software was actually included in the February CDF release, but a Qumulo spokesperson told us it “wasn’t fully live and we were just referring to it generically as ‘Predictive Caching.’ Since then, we have had a customer test it out and provide feedback like a Beta test. And we formally named it NeuralCache.”

Interestingly, high-end storage array provider Infinidat has a caching feature that is similarly named but based on its array controller’s DRAM. Back in June 2020, we wrote that its array software has “data prefetched into a memory cache using a Neural Cache engine with predictive algorithms … The Neural Cache engine monitors which data blocks have been accessed and prefetches adjacent blocks into DRAM.” It enables more than 90 percent of the array data reads to be satisfied from memory instead of from much slower storage drives.

Despite the similarity in naming, however, Qumulo’s NeuralCache tech is distinct from Infinidat’s patented Neural Cache technology.

Qumulo’s NeuralCache is available immediately as part of the vendor’s latest software release and is seamlessly integrated into the Qumulo Cloud Data Fabric. Existing customers can upgrade to it with no downtime. Find out more here.

Starburst CEO: In AI, it’s data access that wins

Interview: Startup Starburst develops and uses Trino open source distributed SQL to query and analyze distributed data sources. We spoke to CEO Justin Borgman about the company’s strategy.

A little history to set the scene, and it starts with Presto. This was a Facebook (now Meta) open source project from 2012 to provide analytics for its massive Hadoop data warehouses by using a distributed SQL query engine. It could analyze Hadoop, Cassandra, and MySQL data sources and was open sourced under the Apache license in 2013.

The four Presto creators – Martin Traverso, Dain Sundstrom, David Phillips, and Eric Hwang – left in 2018 after disagreements over Facebook’s influence on Presto governance. 

They then forked the Presto code to PrestoSQL. Facebook donated Presto to the Linux Foundation in 2019, which then set up the Presto Foundation. By then, thousands of businesses and other organizations were Presto users. PrestoSQL was rebranded to Trino to sidestep potential legal action after Facebook obtained the “Presto” trademark. The forkers set up Starburst in 2019, with co-founder and CEO Justin Borgman, to supply Trino and sell Trino connectors and support. 

Borgman co-founded SQL-on-Hadoop company Hadapt in 2010. Hadapt was bought by Teradata in 2014 with Borgman becoming VP and GM of its Hadoop portfolio unit. He resigned in 2019 to join the other Starburst founders.

Eric Hwang is a distinguished engineer at Starburst. David Phillips and Dain Sundstrom both had CTO responsibilities, but they left earlier this year to co-found IceGuard, a stealth data security company. Martin Traverso is Starburst’s current CTO.

Starburst graphic

Starburst has raised $414 million over four rounds in 2019 ($22 million A-round), 2020 ($42 million B-round), 2021 ($100 million C-round), and 2022 ($250 million D-round).

It hired additional execs in early 2024 and again later that year to help it grow its business in the hybrid data cloud and AI areas.

Earlier this year, Starburst reported its highest global sales to date, including significant growth in North America and EMEA, with ARR per customer over $325,000. There was increased adoption of Starburst Galaxy, its flagship cloud product, by 94 percent year-over-year, and it signed its largest ever deal – a multi-year, eight-figure contract per year, with a global financial institution.  

Blocks and Files: Starburst is, I think, a virtual data lakehouse facility in that you get data from various sources and then feed it upstream to whoever you need to.

Justin Borgman, Starburst

Justin Borgman: Yeah, I like that way of thinking about it. We don’t call ourselves a virtual lakehouse, but it makes sense.

Blocks and Files: Databricks and Snowflake have been getting into bed with AI for some time, with the last six to nine months seeing frenetic adoption of large language models. Is Starburst doing the same sort of thing?

Justin Borgman: In a way, yes, but maybe I’ll articulate a couple of the differences. So for us, we’re not focusing on the LLM itself.

We’re basically saying customers will choose their own LLM, whether that’s OpenAI or Anthropic or whatever the case may be. But where we are playing an important role is in those agentic RAG workflows that are accessing different data sources, passing that on to the LLM to ensure accurate contextual information. 

And that’s where we think we actually have a potential advantage relative to those two players. They’re much larger than us, so I can see they’re further along. But as you pointed out, we have access to all the data in an enterprise, and I think in this era of agents and AI, it’s really whoever has the most data that wins, I think, at the end of the day. And so that’s really what we provide is access to all of the data in the enterprise, not just the data in one individual lake or one individual warehouse, but all of the data.
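
To make that concrete, this is the kind of federated query Trino, and hence Starburst, runs across catalogs. A hedged sketch using the open source trino Python client; the coordinator host, catalog, schema, and table names are invented and depend on local configuration:

    # Hedged sketch with the open source "trino" Python client; names are placeholders.
    import trino

    conn = trino.dbapi.connect(
        host="starburst.example.com",   # hypothetical coordinator
        port=8080,
        user="analyst",
    )
    cur = conn.cursor()

    # One SQL statement joining data that lives in two different systems:
    # a PostgreSQL database and an Iceberg table in object storage.
    cur.execute("""
        SELECT c.region, sum(o.total) AS revenue
        FROM postgresql.public.customers AS c
        JOIN iceberg.sales.orders AS o ON o.customer_id = c.id
        GROUP BY c.region
    """)
    for row in cur.fetchall():
        print(row)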

Blocks and Files: That gives me two thoughts. One is that you must already have a vast number of connectors connecting Starburst to data sources. I imagine an important but background activity is to make sure that they’re up to date and you keep on connecting to as many data sources as possible.

Justin Borgman: That’s right.

Blocks and Files: The second one is that you are going to be, I think, providing some kind of AI pipeline, a pipeline to select data from your sources, filter it in some way. For instance, removing sensitive information and then sending it upstream, making it available. And the point at which you send it upstream and say Starburst’s work stops could be variable. For example, you select some filters, some data from various sources, and there it is sitting in, I guess, some kind of table format. But it’s raw data, effectively, and the AI models need it tokenized. They need it vectorized, which means the vectors have to be stored someplace and then they use it for training or for inference. So where does Starburst activity stop?

Justin Borgman: Everything you said is right. I’m going to quantify that a little bit. So we have over 50 connectors to your earlier point. So that covers every traditional database system you can think of, every NoSQL database, basically every database you can think of. And then where we started to expand is adding large SaaS providers like Salesforce and ServiceNow and things of that nature as well. So we have access to all those things. 

You’re also correct that we provide access control across all of those and very fine grain. So row level, column level, we can do data masking and that is part of the strength of our platform, that the data that you’re going to be leveraging for your AI can be managed and governed in a very fine-grained manner. So that’s role-based and attribute-based access controls. 

To address your question of where does it stop, the reason that’s such a great question is that actually in May, we’re going to be making some announcements of going a bit further than that, and I don’t want to quite scoop myself yet, but I’ll just say that I think in May you will see us doing pretty much the entire thing that you just described today. I would say we would stop before the vectorization and that’s where we stop today.

Blocks and Files:  I could see Starburst, thinking we are not a database company, but we do access stored vaults of data, and we probably access those by getting metadata about the data sources. So when we present data upstream, we could either present the actual data itself, in which case we suck it up from all our various sources and pump it out, or we just use the metadata and send that upstream. Who does it? Do you collect the actual data and send it upstream or does your target do that?

Justin Borgman: So we actually do both of the things you described. First of all, what we find is a lot of our customers are using an aspect of our product that we call data products, which is basically a way of creating curated datasets. And because, as you described it, we’re this sort of virtual lakehouse, those data products can actually be assembled from data that lives in multiple sources. And that data product is itself a view across those different data sources. So that’s one layer of abstraction. And in that case, no data needs to be moved necessarily. You’re just constructing this view. 

But at the end of the day, when you’re executing your RAG workflows and you’re passing data on, maybe as a prompt, to an LLM calling an LLM function, in those cases, we can be moving data. 

Blocks and Files: If you are going to be possibly vectorizing data, then the vectors need storing someplace, and you could do that yourself or you could ring up Pinecone or Milvus or Weaviate. Is it possible for you to say which way you are thinking?

Justin Borgman: Your questions are spot on. I’m trying to think of what I should say here … I’ll say nothing for today. Other than that, that is a perfect question and I will have a very clear answer in about six weeks.

Blocks and Files: If I get talking to a prospect and the prospect customer says, yes, I do have data in disparate sources within the individual datacenters and across datacenters and in the public cloud and I have SaaS datasets, should I then say, go to a single lakehouse data warehouse supplier, for example, Snowflake or Databricks or something? Or should I carry on using where my data currently is and just virtually collect it together as and when is necessary with, for example, Starburst? What are the pros and cons of doing that?

Justin Borgman: Our answer is actually a combination of the two, and I’ll explain what I mean by that. So we think that storing data in object storage in a lake in open formats like Iceberg tables is a wonderful place to store large amounts of data. I would even say as much as you reasonably can because the economics are going to be ideal for you, especially if you choose an open format like Iceberg, because the industry has decided that Iceberg is now the universal format, and that gives you a lot of flexibility as a customer. So we think data lakes are great. However, we also don’t think it is practical for you to have everything in your lake no matter what. Right? It is just a fantasy that you’ll never actually achieve. And I say this partly from my own experience…

So we need to learn from our past mistakes. And so I think that the approach has to have both. I think a data lake should be a large center of gravity, maybe the largest individual center of gravity, but you’re always going to have these other data sources, and so your strategy needs to take that into account.

I think that the notion that you have to move everything into one place to be able to have an AI strategy is not one that’s going to work well for you because your data is always going to be stale. It’s never going to be quite up to date. You’re always going to have purpose-built database systems that are running your transactional processing and different purposes. So our approach is both. Does that make sense?

Blocks and Files: It makes perfect sense, Justin. You mentioned databases, structured data. Can Starburst support the use of structured data in block storage databases?

Justin Borgman: Yes, it can.

Blocks and Files: Do you have anything to do or any connection at all with knowledge graphs for representing such data?

Justin Borgman: We do have connectors to a couple of different graph databases, so that is an option, but I wouldn’t say it’s a core competency for us today.

Blocks and Files: Stepping sideways slightly. Backup data protection companies such as Cohesity and Rubrik will say, we have vast amounts of backed-up data in data stores, and we’re a perfect source for retrieval-augmented generation. And that seems to me to be OK, up to a point. If you met a prospect who said, well, we’ve got lots of information in our Cohesity backup store, we’re using that for our AI pipelines, what can you do there? Or do you think it is just another approach that’s got its validity, but it’s not good enough on its own?

Justin Borgman: From our customer base, I have not seen a use case that was leveraging Cohesity or Rubrik as a data source, but we do see tons of object storage. So we have a partnership in fact with Dell, where Dell is actually selling Starburst on top of their object storage, and we do work with Pure and MinIO and all of these different storage providers that have made their storage really S3 compatible. It looks like it’s S3, and those are common data sources, but the Cohesity and Rubriks of the world, I haven’t seen that. So I’m not sure if the performance would be sufficient. It’s a fair question, I don’t know, but probably the reason that I haven’t seen it would suggest there’s probably a reason I haven’t seen it, is my guess.

Blocks and Files: Let’s take Veeam for a moment. Veeam can send its backups to object storage, which in principle gives you access to that through an S3-type connector. But if Veeam sends its backups to its own storage, then that becomes invisible to you unless you and Veeam get together and build a connector to it. And I daresay Veeam at that point would say, nice to hear from you, but we are not interested.

Justin Borgman: Yes, I think that’s right.

Blocks and Files: Could I take it for granted that you would think that although a Cohesity/Rubrik-style approach to providing information for RAG would have validity, it’s not real-time and therefore that puts the customers at a potential disadvantage?

Justin Borgman: That’s my impression. Yes, that’s my impression.