Storage news ticker – March 15

ETL connector biz Airbyte announces that there are more than 5,000 data connectors, created by users with the Airbyte no-code builder, in active use. Its revenues increased fourfold over the past six months. The company has appointed Jonathan Whitney as chief product officer. He has 20 years of experience in technology product development and strategy, having previously worked at AppDynamics (acquired by Cisco), BladeLogic and BMC Software.

AWS announces the S3 Connector for PyTorch now supports saving PyTorch Lightning model checkpoints directly to Amazon S3. Model checkpointing typically requires pausing training jobs so the time needed to save a checkpoint directly impacts end-to-end model training times. PyTorch Lightning is an open source framework that provides a high-level interface for training and checkpointing with PyTorch.

Amazon Elastic File System (EFS) has an increased elastic throughput limit up to 20GB/sec for read operations and 5GB/sec for writes. Amazon FSx for NetApp ONTAP has increased the maximum throughput capacity per file system by 2x (from 36GB/sec to 72GB/sec).

AWS has decided to waive data transfer out to the internet (DTO) charges when you want to move your data outside of AWS – meaning no egress fees in that situation. This waiver on DTO charges also follows the direction set by the European Data Act and is available to all AWS customers around the world and from any AWS Region.

AWS, in partnership with InfluxData, has announced Amazon Timestream for InfluxDB, a new managed offering for AWS customers to run InfluxDB within the AWS console but without the overhead that comes with self-managing InfluxDB. AWS and InfluxData are giving developers a simple and logical entry point to build and scale time series workloads in the cloud. AWS recognizes the rapid growth of this category and the need to give developers purpose-built tools to manage time series data with the lowest barrier to entry.

Analyst firm Coldago has produced its Coldago Map 2023 for File Storage, segmented into three categories: enterprise, high performance and cloud file storage. There are 29 distinct active vendors selected, listed per segment in alphabetical order:

  • Enterprise File Storage (EFS): DDN, Dell, Huawei, IBM, iXsystems, Microsoft, NetApp, Pure Storage, Qumulo, SUSE and Vast Data. Huawei joins this category this year.
  • High Performance File Storage (HPFS): DDN, Dell, Fujitsu, Hammerspace, HPE, Huawei, IBM, NEC, Panasas, Pure Storage, Quantum, Qumulo, Quobyte, ThinkParQ, Vast Data and Weka. Hammerspace is the only new one here.
  • Cloud File Storage (CFS): AWS, Cohesity, CTera Networks, Egnyte, Hammerspace, LucidLink, Microsoft, Nasuni, NetApp, Panzura, Peer Software and Tiger Technology. Five players were removed: Buurst, Juice Data, Morro Data, ObjectiveFS and XenData.

You can buy the report from Coldago for $7,990.

Databricks has announced a partnership and participation in the Series A funding of Mistral AI, a European provider of generative AI offerings. Databricks and Mistral AI now offer Mistral AI’s open models natively integrated within the Databricks Data Intelligence Platform and Databricks customers can access Mistral AI’s models in the Databricks Marketplace, interact with these models in the Mosaic AI Playground, use them as optimized model endpoints through Mosaic AI Model Serving, and customize them on their own data through adaptation.

Mistral 7B is a small, powerful and dense transformer model, trained with 8K context length. It has a relatively small size of 7 billion parameters, and its model architecture leverages grouped query attention (GQA) and sliding window attention (SWA). To learn more about Mistral 7B, check out Mistral’s blog post.

Mixtral 8x7B is a sparse mixture of experts model (SMoE), supporting a context length of 32K, and capable of handling English, French, Italian, German, and Spanish. It outperforms Llama 2 70B on most benchmarks, while boasting 6x faster inference thanks to its SMoE architecture, which activates only 12 billion parameters during inference out of a total of 45 billion trained parameters. To learn more about Mixtral 8x7B, click here.

DataStax has open sourced the Astra Assistants API server, its drop-in replacement for the OpenAI Assistants API. It lets users deploy on-premises or self-hosted and point to their own self-managed Cassandra/DSE databases or locally hosted LLM inference servers. More info here.

DataStax has announced a partnership with Airbyte to simplify the process for building production-ready GenAI applications that use both structured and unstructured data. Developers can use Airbyte Cloud to easily ingest and vectorize data from hundreds of sources directly into Astra DB. This speeds up the process of building RAG applications and frees up developers to focus on creating AI experiences for their end users. DataStax also announced that it has achieved the AWS Generative AI Competency certification.

A Dell data protection blog, Leading the Way Through Data Protection Industry Changes, by product management VP David Noy reflects on the Cohesity-Veritas acquisition news. It reads: “In these uncertain times, it’s critical for our customers to have confidence in their data protection and cyber resiliency infrastructure. They need solutions with a proven track record and a promising future. Dell is committed to helping our customers address their data management challenges, both now and in the future. Our solutions offer the operational simplicity, resilience, efficiency and innovation required to navigate the complexities of the digital era seamlessly. For our valued customers currently leveraging Veritas as their backup solution, Dell extends an invitation to engage in a discussion regarding PowerProtect Data Manager.”

Decentralized storage provider Filebase has introduced a content delivery network. This CDN features multiple Points of Presence (PoPs) in locations across North America, Europe, and Asia. It features geolocation-based load balancing, and automatically directs traffic to the nearest datacenter, minimizing latency and increasing content delivery speed – whether over HTTPS or Bitswap. This CDN’s global network ensures that no matter where users are located, they can access IPFS content quickly and reliably. A Filebase case study can be found here.

Filebase is updating its IPFS pricing structure. The Free plan is not changing and continues to include 5 GB of storage. The Starter plan is being upgraded to include 800 GB of storage and bandwidth, up from 200/400 GB. The cost for extra storage is being lowered from $0.15/GB to $0.08/GB. The monthly price for this plan remains at $20. The Pro plan is being upgraded to include 4 TB of storage and bandwidth, up from 1000/2000 GB. The cost for extra storage is being lowered from $0.12/GB to $0.04/GB. The number of included dedicated IPFS gateways is also being increased from 3 to 5. The monthly price for this plan remains at $100. More details here.
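The new plan figures can be turned into a quick cost comparison. The sketch below is illustrative only: it assumes extra storage is billed per GB above the included amount, that 4 TB means 4,000 GB, and it ignores bandwidth charges. The `Plan` class and `monthly_cost` function are hypothetical names, not part of any Filebase API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    monthly_price_usd: float
    included_storage_gb: float
    extra_storage_usd_per_gb: float


# Figures taken from the updated pricing described above.
STARTER = Plan("Starter", 20.0, 800.0, 0.08)
PRO = Plan("Pro", 100.0, 4000.0, 0.04)  # assumption: 4 TB = 4,000 GB


def monthly_cost(plan: Plan, stored_gb: float) -> float:
    """Base price plus per-GB overage (assumed billing model, storage only)."""
    extra_gb = max(0.0, stored_gb - plan.included_storage_gb)
    return plan.monthly_price_usd + extra_gb * plan.extra_storage_usd_per_gb


if __name__ == "__main__":
    for stored_gb in (500, 1000, 2000, 5000):
        starter = monthly_cost(STARTER, stored_gb)
        pro = monthly_cost(PRO, stored_gb)
        print(f"{stored_gb:>5} GB  Starter ${starter:.2f}  Pro ${pro:.2f}")
```

Under these assumptions the Starter plan with overage stays cheaper than the Pro plan until stored data reaches roughly 1,800 GB.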

Effective September 30, 2024, IBM will end manufacturing and support for all 3592 Generation 3 Type C tape media products, including IBM 3592 Tape Media JC-JK-JY. The discontinued products include standard, economy, and WORM cartridge media supplies for IBM TS1140 tape drives. See IBM’s EoL notice here.

IBM has run Trino TPC-DS benchmarks using Ceph with the Object S3-select feature enabled. S3-select improves the efficiency of SQL processing of data stored in object storage. By pushing the query down to the IBM Storage Ceph cluster, S3-select can enhance performance, processing queries faster and minimizing resource costs (network/CPU). On average, queries run 2.5x faster. In some cases IBM achieved a 9x improvement, with a network data processing reduction of 144TB compared to using Trino without the S3-select feature enabled. Combining IBM Storage Ceph (S3-select) with Trino/Presto can enhance data lake performance, reduce costs, and simplify data access for organizations. Read the blog here.
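The idea behind query pushdown can be illustrated with a toy comparison. This is a conceptual sketch only, not IBM's, Ceph's or Trino's implementation: the CSV object, the filter on `region`, and both function names are made up for illustration, and the "storage side" is simulated in the same process.

```python
import csv
import io

# Hypothetical object contents standing in for a large CSV object in a bucket.
OBJECT_BODY = "id,region,amount\n" + "".join(
    f"{i},{'EU' if i % 10 == 0 else 'US'},{i * 2}\n" for i in range(1000)
)


def scan_without_pushdown(body: str):
    """Client downloads the whole object, then filters locally."""
    bytes_transferred = len(body.encode("utf-8"))
    rows = [r for r in csv.DictReader(io.StringIO(body)) if r["region"] == "EU"]
    return rows, bytes_transferred


def scan_with_pushdown(body: str):
    """Storage side filters first and sends back only the matching rows."""
    matching = [r for r in csv.DictReader(io.StringIO(body)) if r["region"] == "EU"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["id", "region", "amount"])
    writer.writeheader()
    writer.writerows(matching)
    bytes_transferred = len(out.getvalue().encode("utf-8"))
    return matching, bytes_transferred


if __name__ == "__main__":
    full_rows, full_bytes = scan_without_pushdown(OBJECT_BODY)
    push_rows, push_bytes = scan_with_pushdown(OBJECT_BODY)
    print(f"rows returned: {len(full_rows)} vs {len(push_rows)}")
    print(f"bytes over the network: {full_bytes} vs {push_bytes}")
```

Both paths return the same rows; the difference is how much data crosses the network before the filter is applied, which is where the reported savings would come from.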

IDrive has released unlimited storage for Office 365 personal users to back up OneDrive, Exchange, Word, Excel, PowerPoint, and OneNote data, all for $20 per seat per year.

Infinidat has been named a 2024 Gartner Peer Insights Customers’ Choice in North America for the second consecutive year. As of March 5, Infinidat has received 492 reviews for the InfiniBox and the InfiniBox SSA, with an average overall score of 4.9 stars out of 5 in the Primary Storage Arrays market. Furthermore, 98 percent of customers indicated their willingness to recommend Infinidat to their peers.

 …

Composable systems vendor Liqid has appointed a new CEO, Edgar Masri. The former CEO, co-founder Sumit Puri, becomes president and chief strategy officer, retaining his board position. Masri has been a Liqid board advisor since January, and was previously a board member at Spirent Communications and CEO at Accton, a Taiwanese networking gear manufacturer. Before that, he was CEO and president of inertial sensor specialist Qualtré Inc. and, earlier still, CEO and president of digital electronics manufacturer 3Com. Masri will relocate to the greater Denver area and work on growing Liqid in this AI-driven era. Its composable technology includes GPUs. Liqid says securing marquee partnership opportunities demands ongoing, in-person engagement and hands-on oversight for demanding development projects.

Microsoft has followed AWS and Google in opting to scrap egress fees for cloud migration to comply with the EU’s Data Act.

Data Center Intelligence Group (DCIG) has named the Panzura CloudFS global file management system a “DCIG TOP 5 Enterprise Multi-site File Collaboration Solution” in a report. The other storage software providers who earned a DCIG TOP 5 award (in alphabetical order) were CTERA Enterprise File Services Platform, Nasuni File Data Platform, NetApp Cloud Volumes Edge Cache, and Qumulo Scale Anywhere Platform.

Pure Storage Cloud Block Store for AVS (Azure VMware Solution) has reached General Availability. It improves TCO by independently scaling storage and compute, and allows customers to move or extend data-intensive VMware-based workloads from on-premises datacenters to Azure. Read a Pure blog here.

Last November, VAST Data raised $118 million in a Series E financing round and will be using the funds raised to launch into the APJ region.

Veeam announced a five-year strategic partnership with Microsoft to develop new AI-enhanced data protection and ransomware recovery offerings for over 18 million existing Microsoft 365 users. They will bring additional AI capabilities to Veeam backup and recovery products, including integrating Microsoft Copilot for automated data analysis, cost-effective insights powered by AI and easier data visualization. The companies will bring to market data protection for Microsoft 365 and Microsoft Azure with the recently announced Veeam Data Cloud.

SW RAID supplier Xinnor has released a white paper titled “Saturating InfiniBand bandwidth with xiRAID, to keep NVIDIA DGX busy.” It details Xinnor’s collaboration with German system integrator DELTA Computer Products GmbH to develop a high-performance storage setup tailored explicitly for AI and HPC tasks. The key components are Micron NVMe SSDs, software RAID from Xinnor, and 400Gbit InfiniBand controllers from Nvidia. With a 2U dual-socket server equipped with 24 Micron 7400 NVMe SSDs of 15.36 TB each, it offers storage capacity of up to 368 TB and theoretical access speeds of up to 50GBps. Get the white paper via a Xinnor blog.
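The headline numbers can be sanity-checked with simple arithmetic. The sketch below assumes the capacity figure is raw (before any RAID parity overhead), uses decimal units, and assumes the 50GBps figure corresponds to the line rate of one 400Gbit link; the function names are illustrative and do not come from the white paper.

```python
DRIVE_COUNT = 24
DRIVE_CAPACITY_TB = 15.36
LINK_SPEED_GBIT_PER_S = 400


def raw_capacity_tb(drive_count: int, drive_capacity_tb: float) -> float:
    """Raw capacity with no RAID parity or filesystem overhead deducted."""
    return drive_count * drive_capacity_tb


def link_bandwidth_gbyte_per_s(link_speed_gbit_per_s: float) -> float:
    """Convert a link's line rate from gigabits to gigabytes per second."""
    return link_speed_gbit_per_s / 8


if __name__ == "__main__":
    capacity = raw_capacity_tb(DRIVE_COUNT, DRIVE_CAPACITY_TB)
    bandwidth = link_bandwidth_gbyte_per_s(LINK_SPEED_GBIT_PER_S)
    print(f"raw capacity: {capacity:.2f} TB")
    print(f"single-link line rate: {bandwidth:.0f} GB/s")
```

The raw capacity works out to about 368.64 TB, in line with the quoted "up to 368 TB", and a single 400Gbit link tops out at 50 GB/s.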