Storage news ticker – June 7

Storage news

Open source data mover Airbyte announced the launch of its Snowflake Cortex connector for Snowflake users who are interested in building generative AI (GenAI) capabilities directly within their existing Snowflake accounts. With no coding required, users can create a dedicated vector store within Snowflake (compatible with OpenAI) and load data from more than 300 sources. Users can create a new Airbyte pipeline in a few minutes. The Airbyte protocol handles incremental processing automatically, ensuring that data is always up to date without manual intervention.

Data analyzer Amplitude announced GA of its Snowflake native offering. This allows companies to use Amplitude’s product analytics capabilities without their data ever leaving Snowflake, making it faster and easier to understand what customers are doing, build better products, and drive growth. 

Data protector Cohesity announced that Cohesity Data Cloud now supports AMD EPYC CPU-powered all-flash and hybrid servers from Dell, HPE, and Lenovo.

IDC’s “China Enterprise Solid-State Drive Market Share, 2023” report reveals that DapuStor has secured the fourth position in China’s enterprise SSD market share for 2023 (PCIe/SATA included). DapuStor serves more than 500 customers, spanning telecommunications operators, cloud computing, internet, energy and power, finance, and banking sectors, covering a range of datacenters and intelligent computing centers. 

Lakehouse supplier Databricks announced new and expanded strategic partnerships for data sharing and collaboration with industry-leading partners, including Acxiom, Atlassian, Epsilon, HealthVerity, LiveRamp, S&P Global, Shutterstock, T-Mobile, Tableau, TetraScience, and The Trade Desk. Data sharing has become critically important in the digital economy as enterprises need to easily and securely exchange data and AI assets. Collaboration on the Databricks Data Intelligence Platform is powered by Delta Sharing, an open, flexible, and secure approach for sharing live data to any recipient across clouds, platforms, and regions.

AI development pipeline software supplier Dataloop announced its integration with Nvidia NIM inference microservices. Users can deploy custom NIM microservices right into a pipeline up to 100 times faster with a single click, integrating them smoothly into any AI solution and workflow. Common use cases for this integration include retrieval-augmented generation (RAG), large language model (LLM) fine-tuning, chatbots, reinforcement learning from human feedback (RLHF) workflows, and more.

Data mover Fivetran announced an expanded partnership with Snowflake, including Iceberg Table Support and a commitment to build Native Connectors for Snowflake. Fivetran’s support of Iceberg Tables provides Snowflake customers the ability to create a lakehouse architecture with Apache Iceberg, all within the Snowflake AI Data Cloud. Fivetran will build its connectors using Snowflake’s Snowpark Container Services and they will be publicly available in the Snowflake Marketplace.

Data orchestrator Hammerspace said its Global Data Platform can be used to process, store, and orchestrate data in edge compute environments on Gryf, a suitcase-sized AI supercomputer co-designed by SourceCode and GigaIO. Gryf plus Hammerspace is highly relevant for use cases such as capturing large map sets and other geospatial data in tactical edge environments for satellite ground stations and natural disaster response. It is an effective way to transport large amounts of data quickly.

Hammerspace says it can run on a single Gryf appliance alongside other software packages, such as cyber, geospatial, and AI analytics applications, and Kubernetes containerized applications. Its standards-based parallel file system architecture combines extreme parallel processing speed with the simplicity of NFS, making it ideal for ingesting and processing the large amounts of unstructured data generated by sensors, drones, satellites, cameras, and other devices at the edge. The full benefit of Hammerspace is unlocked when multiple Gryf appliances are deployed across a distributed edge environment, so that Hammerspace can join multiple locations into a single Global Data Platform.

Streaming data lake company Hydrolix has launched a Splunk connector that users can deploy to ingest data into Hydrolix while retaining query tooling in Splunk. “Splunk users love its exceptional tooling and UI. It also has a reputation for its hefty price tag, especially at scale,” said David Sztykman, vice president of product management at Hydrolix. “With the average volume of log data generated by enterprises growing by 500 percent over the past three years, many enterprises were until now faced with a dilemma: they can pay a growing portion of their cloud budget in order to retain data, or they can throw away the data along with the insights it contains. Our Splunk integration eliminates this dilemma. Users can keep their Splunk clusters and continue to use their familiar dashboards and features, while sending their most valuable log data to Hydrolix. It’s simple: ingesting data into Hydrolix and querying it in Splunk. Everybody wins.” Read more about the Hydrolix Splunk implementation and check out the docs.

IBM has provided updated Storage Scale 6000 performance numbers: 310 GBps read bandwidth and 155 GBps write bandwidth. It tells us that Storage Scale's abstraction capabilities, powered by Active File Management (AFM), can virtualize one or more storage environments into a single namespace. The system can integrate unstructured data and existing dispersed storage silos, delivering data when and where it is needed, transparently to the workload. This common namespace spans isolated data silos in legacy third-party data stores, providing transparent access to all data with scale-out POSIX access, and supports multiple data access protocols: POSIX (IBM Storage Scale client), NFS, SMB, HDFS, and Object.

AFM also serves as a caching layer that can be deployed as a high-performance tier on top of less performant storage, delivering faster access to data. AFM can connect to on-prem, cloud, and edge storage deployments. A Storage Scale file system with AFM provides a consistent cache that acts as a single source of truth with no stale data copies, supporting multiple use cases.

Enterprise cloud data management supplier Informatica announced Native SQL ELT support for Snowflake Cortex AI Functions, the launch of Enterprise Data Integrator (EDI), and Cloud Data Access Management (CDAM) for Snowflake. These new offerings on the Snowflake AI Data Cloud will enable organizations to develop GenAI applications, streamline data integration, and provide centralized, policy-based access management, simplifying data governance and ensuring control over data usage.

Kioxia Europe announced that its PCIe 5.0 NVMe SSDs have been successfully tested for compatibility and interoperability with Xinnor RAID software, demonstrating up to 25x higher performance in degraded mode running PostgreSQL than software RAID solutions with the same hardware configuration. The setup is being demonstrated at the Kioxia booth at Computex Taipei.

Micron makes GDDR6 memory that pumps out data at 616 GBps of bandwidth to a GPU. Its GDDR7 product, sampling this month, runs at 32 Gbps per pin and delivers over 1.5 TBps of system bandwidth – up to 60 percent higher than GDDR6 – with four independent channels to optimize workloads. GDDR7 also provides a greater than 50 percent power efficiency improvement over GDDR6 to improve thermals and battery life, while a new sleep mode reduces standby power by up to 70 percent.
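The quoted GDDR7 figures are easy to sanity-check: per-pin rate times interface width gives total bandwidth. A quick back-of-the-envelope calculation, assuming a 384-bit memory interface (typical of high-end GPUs) and a 20 Gbps per-pin GDDR6 baseline – both assumptions, as the announcement states only the per-pin rate and the total:

```python
# Back-of-the-envelope check of the GDDR7 bandwidth figures.
# Assumptions (not from the announcement): a 384-bit memory interface
# and a 20 Gbps per-pin GDDR6 baseline for the comparison.

def system_bandwidth_gbps(per_pin_gbps: float, bus_width_bits: int) -> float:
    """Total bandwidth in GB/s: per-pin rate (Gbit/s) x pins / 8 bits per byte."""
    return per_pin_gbps * bus_width_bits / 8

gddr7 = system_bandwidth_gbps(32, 384)  # 1536 GB/s, i.e. just over 1.5 TB/s
gddr6 = system_bandwidth_gbps(20, 384)  # 960 GB/s at an assumed 20 Gbps per pin

print(f"GDDR7: {gddr7:.0f} GB/s, GDDR6: {gddr6:.0f} GB/s")
print(f"Uplift: {gddr7 / gddr6 - 1:.0%}")
```

With those assumptions the arithmetic lands on 1,536 GBps and a 60 percent uplift, matching the "over 1.5 TBps" and "up to 60 percent higher" claims.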

Wedbush analyst Matt Bryson tells subscribers: “According to DigiTimes, Nvidia’s upcoming Rubin platform (2026) will feature gen6 HBM4, with Micron holding a competitive edge in orders into Rubin due to core features such as capacity and transmission speed (expected to be competitive with South Korean products) and US status/support. Currently, SK Hynix and Samsung are also heavily investing in HBM4.”

Mirantis announced a collaboration with Pure Storage, enabling customers to use Mirantis Kubernetes Engine (MKE) with Pure's Portworx container data management platform to automate, protect, and unify modern data and applications at enterprise scale – reducing deployment time by up to 50 percent. The combination of Portworx Enterprise and MKE makes it possible for customers to deploy and manage stateful containerized applications, with the option of deploying Portworx Data Services for a fully automated database-as-a-service. MKE runs on bare metal and on-premises private clouds, as well as AWS, Azure, and Google public clouds. With the Portworx integration, containerized applications can be migrated between different MKE clusters, or between different infrastructure providers, and the data storage can also move without compromising integrity.

MSP data protector N-able has expanded Cove Data Protection's disaster recovery flexibility by bringing Standby Image to VMware ESXi. The Standby Image recovery feature also supports Hyper-V and Microsoft Azure, providing MSPs and IT professionals with better Disaster Recovery as a Service (DRaaS) for their end users. Standby Image is Cove's virtualized backup and disaster recovery (BDR) capability. It works by automatically creating, storing, and maintaining an up-to-date copy of protected data and system state in a bootable virtual machine format with each backup. These images can be stored in Hyper-V, Microsoft Azure, and now also VMware ESXi.

AI storage supplier PEAK:AIO has launched PEAK:ARCHIVE, an all-Solidigm QLC flash AI storage, archive, and compliance offering that packs 1.4 PB per 2U, with plans to double capacity in 2025. It integrates with the PEAK:AIO Data Server with automated archive, eliminating the need for additional backup servers or software. At the click of a button, an administrator can present immutable archived data for review and instant readability without any need to restore. Learn more here.

PEAK:AIO storage offering

William Blair analyst Jason Ader talked to Pure Storage CTO Rob Lee and tells subscribers Lee “highlighted the company’s opportunity to replace mainstream disk storage across hyperscaler datacenters, noting that it has moved into the co-design/co-engineering phase of discussions. …  management spoke to three primary factors that have tipped its discussions with the largest hyperscalers, including the company’s ability to reduce TCO (power, space, cooling, and maintenance) thanks to its higher-performance AFAs, improve the reliability and longevity of hyperscalers’ storage, and decrease power consumption within data centers as power constraints become more evident in the face of AI-related activity and GPU cluster buildouts. Given these technical advantages, Pure confirmed that it still expects a hyperscaler design win by the end of this fiscal year.”

Seagate announced the expansion of its Lyve Cloud S3-compatible object storage as a service with a second datacenter in London. It will support growing local customer needs, allowing easy access to Lyve Cloud and a portfolio of edge-to-cloud solutions, including secure mass data transfer and cloud import services. Seagate has also broadened its UK Lyve Cloud channel through a new distribution agreement with Climb Channel Solutions and a long-term collaboration with Exertis, offering a full range of advanced cloud and on-prem storage offerings.

SnapLogic announced new connectivity and support for Snowflake vector data types, Snowflake Cortex, and Streamlit to help companies modernize their businesses and accelerate the creation of GenAI applications. Customers can leverage SnapLogic’s ability to integrate business critical information into Snowflake’s high-performance cloud-based data warehouse to build and deploy large language model (LLM) applications at scale in hours instead of days.

Justin Borgman, co-founder and CEO at Starburst, tells us: “As the dust starts to settle around the Tabular and Databricks news, a lot of us are left wondering what this means for the future of Apache Iceberg. As someone who has been a part of the data ecosystem for two decades, I think it is important to remember three things:

  • Iceberg is a community-driven project. Top committers span from a range of companies including Apple, AWS, Alibaba and Netflix. This is in stark contrast to Delta Lake, which is effectively an open sourced Databricks project. Iceberg is not Tabular, and Tabular is not Iceberg.
  • The next race will be won at the engine layer. We see Trino – and Icehouse – as the winner. Iceberg was built at Netflix to be queried by Trino. The largest organizations in the world use Trino and Iceberg together as the bedrock for analytics. There’s a reason some of the biggest Iceberg users like Netflix, Pinterest, and Apple all talked about the Trino Icehouse at the Apache Iceberg Summit just three weeks ago.
  • It will take time. Moving from legacy formats to Apache Iceberg is not an overnight switch and has the potential to become as much of a pipe dream as ‘data centralization’ has been. The community needs platforms that will support their entire data architecture, not just a singular format. We originally founded Starburst 7 years ago to serve this mission when the battle was between Hadoop and Teradata, and the challenges are just as real today.”

Enterprise data manager Syniti announced that Caldic, a global distribution solutions provider for the life and material science markets, will use Syniti’s Knowledge Platform (SKP) to help improve its data quality and build a global master data management (MDM) platform for active data governance. This will allow Caldic to work with clean data, now and in the future.

Synology announced new ActiveProtect appliances, a purpose-built data protection lineup that combines centralized management with a scalable architecture. ActiveProtect centralizes organization-wide data protection policies, tasks, and appliances to offer a unified management and control plane. Comprehensive coverage for endpoints, servers, hypervisors, storage systems, databases, and Microsoft 365 and Google Workspace services dramatically reduces IT blind spots and the need to operate multiple data protection solutions. IT teams can deploy ActiveProtect appliances in minutes and create comprehensive data protection plans via global policies using a centralized console. Each ActiveProtect appliance can operate in standalone or cluster-managed modes. Storage capacity can be tiered with Synology NAS/SAN storage solutions, C2 Object Storage, and other ActiveProtect appliances in the cluster. The appliances leverage incremental backups with source-side, global, and cross-site deduplication to ensure fast backups and replication with minimal bandwidth usage. ActiveProtect will be available through Synology distributors and partners later in 2024. Check it out here.
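Source-side deduplication of the kind described works, in outline, by hashing chunks of data on the client and transferring only chunks the backup repository has not already stored. A minimal sketch of the general technique – illustrative only, not Synology's implementation:

```python
import hashlib

# Toy illustration of source-side deduplication: the backup client hashes
# fixed-size chunks and sends only chunks the repository does not already
# hold. This shows the general technique, not Synology's implementation.

CHUNK_SIZE = 4  # tiny for the example; real systems use KB-to-MB chunks


def backup(data: bytes, repo: dict) -> int:
    """Store data as deduplicated chunks; return bytes actually transferred."""
    sent = 0
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in repo:  # only previously unseen chunks cross the wire
            repo[digest] = chunk
            sent += len(chunk)
    return sent


repo = {}
first = backup(b"AAAABBBBAAAACCCC", repo)  # "AAAA" repeats, so 12 of 16 bytes sent
second = backup(b"AAAABBBBDDDD", repo)     # only "DDDD" is new: 4 bytes sent
print(first, second)
```

The incremental effect compounds across sites: once a chunk digest exists anywhere in the cluster's index, no appliance needs to transfer that chunk again.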

Synology ActiveProtect appliances

Veeam announced the introduction of Lenovo TruScale Backup with Veeam, a cloud-like experience on-premises that helps secure workloads regardless of their location and enables customers to scale infrastructure up or down as needed. TruScale Backup with Veeam combines Lenovo ThinkSystem servers and storage, Veeam Backup & Replication, Veeam ONE, and Lenovo TruScale services to provide data protection as a service for a hassle-free on-premises or co-located deployment. TruScale Backup with Veeam is available now.

Veeam announced that its Backup & Replication product will be available on Linux, potentially in the first half of 2025.

Archival software supplier Versity announced the GA release of Versity S3 Gateway, an open source S3 translation tool for inline translation between AWS S3 object commands and file-based storage systems. Download a white paper about it here.
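In outline, a gateway of this kind maps S3 buckets and object keys onto filesystem paths, so existing file storage can answer S3 PUT and GET requests. A minimal sketch of that mapping – the class and layout here are illustrative assumptions, not Versity's design:

```python
import pathlib
import tempfile

# Minimal sketch of the object-to-file mapping an S3 gateway performs:
# bucket -> directory, object key -> file path under it. Illustrative
# only; this is not Versity S3 Gateway's implementation.


class FileBackedS3:
    def __init__(self, root: pathlib.Path):
        self.root = root

    def put_object(self, bucket: str, key: str, body: bytes) -> None:
        path = self.root / bucket / key  # slashes in the key become subdirectories
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(body)

    def get_object(self, bucket: str, key: str) -> bytes:
        return (self.root / bucket / key).read_bytes()


store = FileBackedS3(pathlib.Path(tempfile.mkdtemp()))
store.put_object("logs", "2024/06/07/app.log", b"hello")
print(store.get_object("logs", "2024/06/07/app.log"))
```

A real gateway sits in front of such a backend and speaks the S3 wire protocol, so unmodified S3 clients can read and write data that actually lives on file storage.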

Regarding the new Western Digital SSDs and HDD, Wedbush analyst Matt Bryson says: “We have validated that WDC is working with FADU on some new SSD products and would not be surprised if that controller vendor is supporting one or both of the new SSDs. … On the HDD front, we expect the new drives will likely include an 11th platter. We believe WDC has a near- to intermediate-term advantage around areal density (as STX works through the go-to-production problems with HAMR that have led management to suggest an initial hyperscale qualification might have to wait until CQ3).”