
Storage news ticker – March 22

Storage news

Following its IPO, Astera Labs saw its shares, initially priced at $36 apiece, soar 72 percent on March 20 to close at $62.03. The surge valued the company at $10.4 billion, taking into account stock options and restricted share units. Astera will hire another 200 engineers and look at M&A as a way to grow its engineering team.

The Backblaze product portfolio has been added to Carahsoft’s National Association of State Procurement Officials (NASPO) ValuePoint contract. This enables Carahsoft and Backblaze to provide the company’s cloud storage to participating states, local governments, and educational institutions. This certification is augmented by the recent addition of Backblaze to Carahsoft’s NYOGS and OMNIA-approved vendor lists.

Data manager Datadobi launched StorageMAP 6.7. A blog says it has extended the REST API, letting users:

  • Add or configure file or object servers, streamlining the setup process for large data management projects
  • Dynamically adjust server throttling, allowing for precise control over performance and resource utilization
  • Retrieve real-time status updates of ongoing data management jobs, including critical information such as status and error counts

v6.7 also integrates replication functionality into StorageMAP. It now encompasses both N2N (NAS-to-NAS) and O2O (object-to-object) replication capabilities, consolidating all replication functionalities in a single facility.

Data recovery services supplier Fields Data Recovery has launched a Partnership Program offering:

  • Access to state-of-the-art data recovery technologies and tools
  • Comprehensive training and support to enhance partners’ proficiency in data recovery processes 
  • Flexible partnership models tailored to meet the unique needs and objectives of each partner 
  • Marketing and sales assistance to help partners promote their data recovery services effectively
  • Priority access to Fields Data Recovery’s expert team for technical assistance and consultation

More info here.

China-based SmartX has released its SMTX Kubernetes Service 1.2 (SKS), a production-ready container management and service product. Updates include enhanced support for high-performance computing scenarios such as AI, multiple CPU architectures, and optimized management and use of container images. With these updates, SKS 1.2 helps users apply AI scenarios in containers based on easily created Kubernetes clusters. 

Swissbit says its new PS-66(u) DP SD and microSD cards have proven robustness and industrial suitability, and are certified CmReady by Wibu-Systems. This enables the integration of CodeMeter technology for software protection and license management. The use of the Swissbit cards as additional license containers within the CodeMeter ecosystem offers a way for developers of embedded and IoT devices to protect their software and monetize their licenses.

Taiwan-based TeamGroup has launched the MagSafe-compatible PD20M Mag Portable SSD and the ULTRA CR-I MicroSD Memory Card Reader. The PD20M Mag Portable SSD weighs 40 g and measures 70 x 62 x 8.2 mm. It stores up to 2 TB, has a USB 3.2 Gen 2x2 Type-C interface, is compatible with the iPhone 15 Pro, and supports Thunderbolt ports with a maximum transfer speed of 20 Gbps. The ULTRA CR-I MicroSD Memory Card Reader supports UHS-I interface slots and USB Type-C. The maximum data transfer speed is 180 MBps when used with TeamGroup MicroSD memory cards. It is meant for use with smartphones and other mobile devices.

Veritas announced enhancements to Backup Exec, which has more than 45,000 small and midsize business (SMB) users worldwide. The updates include:

  • Malware detection – powered by Microsoft Defender, can be used to scan both VMware and Hyper-V backup sets at any time or prior to recovery
  • Role-based security – limits access to data based on a user’s specific role. In the event an account is compromised, hackers can only corrupt the small volume of data associated with that specific user’s account
  • Faster backup and recovery – optimizes protection performance with forever incremental backup of VMware and Hyper-V. Protection of virtual machines is now faster with parallel backup of multiple virtual disks and the included ability to recover virtual machines instantly

Backup Exec is available as a subscription service and can be installed in ten minutes or less, we’re told. Once installed, completing the first backup takes only five minutes. Find a Veritas-certified partner or sign up for the Veritas Backup Exec free 60-day trial.

UK disaster recovery specialist virtualDCS has launched CloudCover Guardian for Azure, which protects more than 250 configurable items in an established Microsoft 365 estate. Configurations covered by CloudCover Guardian for Azure span user accounts, access privileges, and unique security groups across popular applications such as SharePoint, Teams, OneDrive, Exchange, and Entra ID (formerly Azure Active Directory). Microsoft does not currently offer a backup service for these configurations. The system captures every vital configuration, meaning a Microsoft 365 environment can be proactively managed from a single console for a complete, clean recovery after an incident.

The company has also launched a “Clean Room” service for customers that need to restore their systems in a sterile and isolated environment in the event of a ransomware attack.

GenAI represents a Fourth Industrial Revolution, says Wedbush financial analyst Daniel Ives. He told subscribers: “With a March Madness-like atmosphere at the Nvidia GTC Conference with the Godfather of AI Jensen leading the way, it’s crystal clear to us and the Street that the use cases for enterprise generative AI are exploding across the landscape. The partnerships and spending wave around AI represents a generational tech spending wave with now Micron the latest example of beneficiaries from the 2nd/3rd/4th derivatives of AI just like we saw from Oracle a week ago. The ripple impact that starts with the golden Nvidia chips is now a tidal wave of spending hitting the rest of the tech world for the coming years. We estimate $1 trillion-plus of AI spending will take place over the next decade as the enterprise and consumer use cases proliferate globally in this Fourth Industrial Revolution.”

The GenAI hype is getting relentless. Is it a dotcom-type boom and bust, though? Ives asserts this is “NOT a 1999 Bubble Moment.” 

Redis expands data management capabilities with Speedb acquisition

Real-time in-memory database vendor Redis is acquiring Speedb and its data storage engine.

A storage engine is used to write data to and read data from storage drives, and is written in low-level code. Speedb, a 2020 startup, has developed an API-compatible plug-in replacement for the RocksDB storage engine that accelerates applications using RocksDB. Redis is a distributed NoSQL database, keeping data on storage drives for persistence and promoting it to DRAM when needed. It can be used as a database – with vector functions for use cases such as RAG – or as a cache, streaming engine, or message broker. Redis Enterprise Cloud, a real-time managed database-as-a-service, is offered on AWS.
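As a rough illustration of the abstraction (a toy sketch, not the RocksDB or Speedb API): a storage engine exposes key-value primitives that the database layers on top of, with writes persisted to drive-backed storage.

```python
import json
import os
import tempfile

class ToyStorageEngine:
    """Minimal key-value storage engine: writes go to an append-only log
    on disk, reads are served from an in-memory index. Real engines such
    as RocksDB or Speedb add LSM trees, compaction, and caching."""

    def __init__(self, path):
        self.path = path
        self.index = {}
        if os.path.exists(path):
            # Rebuild the in-memory index by replaying the log.
            with open(path) as f:
                for line in f:
                    key, value = json.loads(line)
                    self.index[key] = value

    def put(self, key, value):
        # Persist first, then update the index.
        with open(self.path, "a") as f:
            f.write(json.dumps([key, value]) + "\n")
        self.index[key] = value

    def get(self, key):
        return self.index.get(key)

# Usage: persist a key, then reopen the engine from the same log file.
path = os.path.join(tempfile.mkdtemp(), "kv.log")
engine = ToyStorageEngine(path)
engine.put("user:1", "alice")
```

The append-only log is the part that touches the storage drive, which is where an engine like Speedb claims its IO acceleration.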

Rowan Trollope, Redis

Redis CEO Rowan Trollope said: “Modern app development and most recently generative AI are fundamentally changing how enterprises value and use their data, and as a result, the demands on developers and data platforms are intensifying. Whether it’s regarding development cycles or user experiences, a lot of this pressure revolves around speed.

“Acquiring Speedb takes Redis beyond RAM, and opens the door for us to serve an integral role in powering today’s most exciting and innovative applications, and supporting the developers that build them.”

The Speedb storage engine fits right into the Redis software stack, and can be used to accelerate its storage drive IO. In fact, Speedb was integrated as the default storage engine in the enterprise auto-tiering functionality launched in Redis 7.2. The GenAI angle is pertinent as Redis has developed its low-latency vector database for AI workloads such as semantic search, LLM semantic caching, and session management. Redis has developed a Redis Vector Library (RedisVL), a streamlined client that enables the use of Redis in AI-driven tasks.
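LLM semantic caching, one of the workloads mentioned, can be sketched generically. This toy (our own illustration, not the RedisVL API) returns a cached answer when a new query's embedding is close enough to a stored one:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

class SemanticCache:
    """Toy LLM semantic cache: if a new query's embedding is within the
    similarity threshold of a cached query, reuse the stored answer
    instead of calling the model again."""

    def __init__(self, threshold=0.9):
        self.entries = []  # list of (embedding, answer) pairs
        self.threshold = threshold

    def store(self, embedding, answer):
        self.entries.append((embedding, answer))

    def lookup(self, embedding):
        best = max(self.entries,
                   key=lambda e: cosine(e[0], embedding),
                   default=None)
        if best and cosine(best[0], embedding) >= self.threshold:
            return best[1]
        return None

# Usage: cache one answer keyed by a (made-up) query embedding.
cache = SemanticCache(threshold=0.9)
cache.store([1.0, 0.0, 0.0], "Paris")
```

A production cache would hold the embeddings in a vector index (which is what Redis's vector database provides) rather than a linear scan.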

Adi Gelvan, Speedb
Adi Gelvan in the foreground

Speedb co-founder and CEO Adi Gelvan said: “As we join the Redis team, we’re excited to scale our technology to thousands of organizations across the globe, and play a critical role in powering the emerging applications that will shape the future.”

A Redis statement says: “Taking full advantage of the advancements in SSD storage and transfer rates that Speedb leverages, Redis will serve the full spectrum of performance and cost requirements for enterprise applications where DRAM is not required – empowering developers across the Redis ecosystem to utilize Redis in more use cases.”

The acquisition cost has not been revealed but is unlikely to exceed seven figures. Speedb has taken in a relatively low $4 million in funding across a November 2020 seed round and a non-equity assistance raise in November 2023.

In the immediate future, Redis is developing Redis Community Edition 7.4, with the addition of hash field expiration and client consolidation features. It will formally release the first version of its Speedb-integrated product later this year with Redis 8.

Quantum announces i7 Raptor tape library for hyperscalers

Scalar tape library supplier Quantum has made an early announcement of an i7 Raptor library, designed for hyperscale customers’ AI workloads.

Update: Quantum clarifies i7 Raptor status versus i6H. 25 March 2024.

Its present Scalar line has three products: 25–400 tape slot i3 with 1–24 half-height drives, 50–800 slot i6 with 1–24 full-height drives, and high-end i6000 with 100–14,100 slots and 1–192 full-height drives. There is also a scale-out i6H product, dating from 2022, which is for web-scale companies – hyperscalers in other words – and based on a Redundant Array of Independent Libraries (RAIL) design providing RAID-like protection and performance. The drive numbers and tape slot counts for the i6H have not been revealed, but we think its constituent 48RU-high racks each have 24 drives and 800 tape slots with in-rack robotics. There are many similarities between the i6H and i7.

A canned quote from Bruno Hald, VP of secondary storage at Quantum, states: “Large enterprises need a low cost, highly secure archival storage system that acts as the backbone of private and hybrid clouds, and creates data lakes to fuel AI models and initiatives. As the pioneers in tape technology, we used our expertise and input from our major hyperscale customers to develop the most advanced tape solution built for these emerging use cases.”

Quantum claims the i7 RAPTOR – it capitalizes the name – delivers the highest storage density of any tape library on the market, and offers unique anti-ransomware features like Tape Blocking, which prevents a library robot from moving tape cartridges from their slots via software commands.

The i6H also supported this logical tape block feature.

The i7 features AI-driven and automated predictive analytics that can monitor drive-media interactions, predict robotic failures, and gather system data to learn and help analyze and improve performance. 

The i6H also featured predictive analytics, but now it’s 2024 so it’s mandatory to use the term “AI”.

Quantum has an ActiveScale object storage system and the i7 integrates with this, offering an S3 interface, enabling integration into S3-compatible AI work streams.

The i6H also integrated with the ActiveScale repository.

The March 2022 i6H webpage URL now takes you to an i7 Raptor webpage.

Quantum claims that, because of the i7’s high storage density, fewer libraries are needed to achieve maximum storage capacity, saving on datacenter floorspace. We’re also told that the product has the industry’s lowest power consumption and employs sustainable materials and processes throughout its lifecycle, from manufacture to delivery to operation to maintenance to disposition. It’s a good green story.

There are so many i6H and i7 similarities that we think the i7 may be a productized i6H, and have asked Quantum if this is the case – also asking for drive type and number data and the tape slot minimum to maximum range. A Quantum spokesperson told us: “The Scalar i7 is a completely new library with a new design from the i6H, and it’s also much denser.”

Quantum picture of i6H library.

We asked how many tapes it supports and the answer was: “Quantum will release more details on slot numbers when the product is generally available but anticipate it will be the densest library on the market.”

How many bulk load slots (I/E) does it support? “It will support bulk import/export of tape cartridges, and the numbers will be available closer to GA.”

What is its Capacity on Demand (CoD)? “It will have CoD, and Quantum expects it to be in 100 slot increments.”

What number and type of tape drives does it support? “LTO-9 FH drives at launch. LTO-10 will be added when it becomes available.”

Please share an image of the product. “We will have product images to share later this year.”

The i7 Raptor is available as either an opex model – with monthly, usage-based payment schedules – or an as-a-service model, which provides more flexibility and scalability. It will be generally available early in 2025, with units going into customer testing and certification by late this year.

Comment

Quantum has announced an intention to ship a tape library product in ten to twelve months’ time, but this is not an actual product announcement. The i7 Raptor has an as-yet-undefined number of tape slots, tape drives, and bulk-load slots, and there is no product image available.

Database startup Regatta emerges from stealth mode

Chad Sakac and fellow execs at Regatta are getting ready to discuss the OLTP and OLAP (Online Transaction Processing and Online Analytical Processing) database for developers that, we understand, is scheduled to be released later this year.

Also known as Captain Canada after a famous costumed zipline descent into an astounded 2017 Dell EMC World conference hall, Sakac previously served as Dell’s VP for VMware Tanzu, among the many senior roles he held at the company. He left in March 2021 to take a sabbatical, and by October that year he had moved to an unnamed startup still locked in stealth mode.

That fledgling business was Regatta, where Sakac became Vice President for Go To Market. The company was founded in 2019 by Boaz Palgi, CTO Erez Webman, and VP R&D Eran Borovik. The three have been involved in Topio (acquired by NetApp), XtremIO (acquired by EMC), Storwize (acquired by IBM), and ScaleIO (acquired by EMC).

There is some information available about its funding: Pitchbook mentions a $12.5 million A-round in 2020 and a $45.8 million B-round in 2022. Some 50 engineers are developing the product. OLTP is generally used for real-time processing of online transactions, while OLAP is used for multi-dimensional, multi-dataset analysis. OLAP databases may get their data from OLTP databases via Extract, Transform, and Load (ETL) procedures, which can be complex and take time.
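The ETL hop the paragraph describes can be shown in miniature with SQLite and a made-up orders schema: extract rows from the OLTP store, transform them into an aggregate, and load the result into a separate OLAP-style summary table.

```python
import sqlite3

# Toy OLTP source: individual transactions (made-up schema).
oltp = sqlite3.connect(":memory:")
oltp.execute("CREATE TABLE orders (region TEXT, amount REAL)")
oltp.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("EU", 100.0), ("EU", 50.0), ("US", 75.0)])

# Extract + Transform: aggregate per region for analysis.
rows = oltp.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region").fetchall()

# Load into the OLAP target: a separate, denormalized summary store.
olap = sqlite3.connect(":memory:")
olap.execute("CREATE TABLE sales_by_region (region TEXT, total REAL)")
olap.executemany("INSERT INTO sales_by_region VALUES (?, ?)", rows)
```

Regatta's pitch is that this copy step disappears because analytic queries run against the transactional data directly.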

Chad Sakac, Regatta
Chad “Captain Canada” Sakac soars down to the stage at the 2017 Dell EMC World Event in Las Vegas

Regatta is ACID (Atomicity, Consistency, Isolation, Durability) compliant, meaning database operations are completed properly or do not complete at all, ensuring the database is in a consistent state. In other words, a bank’s OLTP database will ensure that money has either been paid out from your account or has not. There is no in-between state.
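The bank-transfer guarantee can be sketched with SQLite (from the Python standard library) standing in for the database – an illustrative toy, not Regatta's engine:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
db.executemany("INSERT INTO accounts VALUES (?, ?)",
               [("alice", 100), ("bob", 0)])
db.commit()

def transfer(conn, src, dst, amount):
    """Debit src and credit dst atomically: both writes commit, or
    neither does, so no in-between state is ever visible."""
    try:
        # The connection context manager commits on success and
        # rolls back the whole transaction on an exception.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src))
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?",
                (src,)).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")  # forces rollback
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst))
        return True
    except ValueError:
        return False
```

A failed transfer leaves both balances exactly as they were: the debit that already executed inside the transaction is rolled back along with everything else.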

Regatta is pitched as being linearly scalable, running across a cluster of nodes either on-premises or in the public cloud, and analytic queries execute against up-to-date transactional data. OLAP queries do not interfere with OLTP operations. The database can, Regatta says, be dropped in as a Postgres replacement, having Object-Relational Mapping (ORM), Client Libraries, and ANSI SQL support. It is a high-availability product, we’re told, with auto-healing and geographical multi-site disaster recovery.

Chad Sakac, Regatta

Sakac blogged about Regatta’s database in October last year, saying developers will like it because it won’t become complex and messy like some existing databases, urging them to try it out.

In a separate blog post, he wrote: “Regatta can store multiple types of data as well as perform transactional, analytical, and high-ingress workloads – and it can do them simultaneously. That eliminates the operational headaches of ETL, as well as the complexities of managing silos of data.”

Regatta is running a webcast on March 28 to introduce itself with Palgi, Webman, and Sakac speaking. We understand that Regatta will announce availability of its product later this year.

Micron roars back to profitability thanks to AI boom


Flash memory and storage producer Micron topped its own financial guidance for Q2 of fiscal 2024, ended February 29, on the back of ever-growing demand for IT infrastructure that trains and powers artificial intelligence.

Revenues of $5.8 billion were 57.6 percent higher than a year ago and Micron brought home $793 million in net profit versus a $2.3 billion net loss in the corresponding quarter of the prior financial year.

Sanjay Mehrotra, president and CEO, said in a statement: “Micron delivered fiscal Q2 revenue, gross margin and EPS (earnings per share) well above the high end of guidance. Micron has returned to profitability and delivered positive operating margin a quarter ahead of expectation.”

Micron revenue and profit
Micron is climbing out of its revenue trough faster than it expected – thanks to the Gen AI boom.

Wedbush financial analyst Matt Bryson commented: “We believe Micron earnings commentary ticked every box in the investor checklist (and then some).”

Mehrotra thinks Micron is one of the biggest beneficiaries in the semiconductor industry of the multi-year opportunity enabled by AI. This is due to DRAM and NAND demand for AI training and also inference, both in the datacenter and at the edge.

In Micron’s results presentation, server demand is said to be driving rapid growth in HBM, DDR5, and datacenter SSDs, which is tightening leading-edge supply availability for DRAM and NAND. This is resulting in a positive ripple effect on pricing across all memory and storage end markets.

The price rises are unlikely to slow down or stop during Micron’s financial year, it indicated. Mehrotra said in prepared remarks: “Our current profitability levels are still well below our long-term targets, and significantly improved profitability is required to support the R&D and capex investments needed for long-term innovation and supply growth.”

Financial summary

  • Gross margin: 18.5 percent vs -0.7 percent last year 
  • Operating cash flow: $1.22 billion vs year-ago $343 million
  • Free cash flow: -$29 million
  • Cash, marketable investments & restricted cash: $9.72 billion vs $12.1 billion last year
  • Diluted EPS: $0.71 vs -$1.12 last year
  • Dividend: $0.115/share

DRAM pulled in $4.2 billion, up 53.8 percent annually, while NAND revenues of $1.6 billion grew even faster at 80.7 percent. Mehrotra said: “We see strong customer pull and expect a robust volume ramp for our mono-die 128 GB product, with several hundred million dollars of revenue in the second half of fiscal 2024. We also started sampling our 256 GB MCRDIMM module, which further enhances performance and increases DRAM content per server.”

Micron DRAM and NAND revenues

He discussed HBM3E memory, used in Nvidia GPUs, saying: “Industry-wide, HBM3E consumes approximately three times the wafer supply as [DDR5] to produce a given number of bits in the same technology node; a 3:1 trade ratio. With increased performance and packaging complexity across the industry, Micron expects the trade ratio for HBM4 to be even higher than the trade ratio for HBM3E.”
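The trade ratio translates directly into lost conventional-DRAM capacity. A back-of-envelope sketch, using illustrative units and our own helper function `dram_bits_remaining`:

```python
def dram_bits_remaining(total_capacity_bits, hbm_bits, trade_ratio=3):
    """Wafer capacity left for conventional DRAM after HBM output,
    in DDR5-equivalent bits, given the quoted 3:1 trade ratio: every
    HBM3E bit consumes the wafer supply of three DDR5 bits."""
    return total_capacity_bits - hbm_bits * trade_ratio

# Shipping 10 units of HBM3E bits from a node capacity of 100
# DDR5-equivalent units leaves only 70 units for DDR5 output.
leftover_hbm3e = dram_bits_remaining(100, 10)            # 3:1 ratio
# Micron expects HBM4's ratio to be higher still, squeezing harder.
leftover_hbm4 = dram_bits_remaining(100, 10, trade_ratio=4)  # assumed 4:1
```

The 4:1 figure for HBM4 is our own placeholder for "even higher"; Micron has not quoted a number.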

That will limit the fab capacity available for DRAM and, we reckon, have an upward effect on prices. Micron is planning to increase its manufacturing capacity overall. Announced projects in China, India, and Japan are proceeding as planned, and Micron’s FY24 capex plans assume CHIPS Act grants for potential fab expansion in Idaho and New York.

Micron commenced volume production and recognized its first revenue from HBM3E in fiscal Q2 and has now begun high-volume shipments of its HBM3E product. Micron expects 12-high HBM3E will start ramping in high volume production and increase in mix throughout 2025.

End markets

  • Compute and Networking: $2.1285 billion, up 59 percent year-over-year
  • Mobile: $1.598 billion, up 69 percent
  • Embedded: $1.111 billion, up 28 percent
  • Storage: $905 million, up 79 percent

DRAM and NAND demand has sent the compute and networking BU’s revenues up sharply, with the mobile and storage BUs benefiting as well. The embedded sector, less affected by AI, is growing more slowly.

The PC market is starting to grow slowly, in the low single digits, after two years of double-digit declines. Mehrotra said momentum was developing for “next-generation AI PCs, which feature high-performance Neural Processing Unit chipsets and 40 to 80 percent more DRAM content versus today’s average PCs. We expect next-gen AI PC units to grow and become a meaningful portion of total PC units in calendar 2025.” 

Micron said smartphone unit volumes in calendar 2024 remain on track to grow at a low to mid single digit rate. It expects so-called AI phones to carry 50 to 100 percent more DRAM compared to non-AI flagship phones today. Mehrotra said: “The Honor Magic 6 Pro features the Magic LM, a seven-billion parameter large language model, which can intelligently understand a user’s intent based on language, image, eye movement and gestures, and proactively offer services to enhance and simplify the user experience.” 

B&F expects the other DRAM and NAND fabbers and SSD suppliers to report similar revenue uplifts in the coming weeks.

Outlook

Micron expects DRAM and NAND pricing levels to increase further throughout calendar 2024 and is guiding for record revenue and much improved profitability in fiscal 2025. It says we are in the very early innings of a multi-year growth phase driven by AI, as this technology may well transform every aspect of the business world. On that basis, its outlook for the next quarter is for revenues of $6.6 billion, give or take $200 million, a 76 percent increase on the year-ago Q3.

Samsung to offer SSDs on subscription

Samsung office

Samsung has published a blog discussing customers renting its petabyte-scale SSD (PBSSD) architecture products through a MinIO partnership.

It’s not that Samsung actually has a petabyte-capacity SSD – more that it wants to overcome customer resistance to capex purchases of large amounts of SSD capacity by changing to an opex subscription-style approach.

The blog notes that “applications today are demanding both high performance and capacity in excess of 10PB” due to analytics and AI/ML training workloads. Analytics workloads, it claims, can need from 1PB to 1EB of flash storage.

Samsung discusses a PBSSD server design – based on fourth-gen AMD EPYC CPUs with 32–84 cores, a single socket, and 128 PCIe 5 lanes. It can support 16x E3.S 15.36 TB NVMe SSDs in a 1 RU chassis – 244 TB in total. What’s more: “An upcoming system design will support 1PB in a 2 RU chassis.”

The system provides 232GB/sec sequential read bandwidth and 98GB/sec sequential write bandwidth, 9.5 million random read IOPS, and 5.1 million random write IOPS. Samsung proposes to offer this on a subscription basis: “Customers place an order for the desired storage capacity in increments of 244TB (one PBSSD unit) and subscription duration (1, 3, or 5 years) at a fixed monthly fee. No hidden or usage-based costs.”

Samsung notes that the MinIO partnership means MinIO object storage software holds the data on the SSDs. “MinIO uses familiar HTTP GET and PUT calls to initiate object transfer and management sequence. This cloud native approach is far more efficient than using half-century old file-oriented commands like copy, move, and delete. Moreover, MinIO bakes content, metadata, version, and security into the recipe for handling objects, simplifying the task of maintaining data integrity and recoverability across servers anywhere.”
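The object-handling semantics described – content, metadata, and version carried with each object, addressed by bucket and key rather than a file path – can be modeled in miniature. A conceptual toy, not MinIO's API:

```python
import hashlib

class ToyObjectStore:
    """Conceptual model of object PUT/GET semantics: each object keeps
    its content, metadata, a version number, and a content hash (ETag),
    addressed by (bucket, key) rather than a file path."""

    def __init__(self):
        self.objects = {}  # (bucket, key) -> list of versions

    def put(self, bucket, key, data, metadata=None):
        # S3-style stores compute a content hash on ingest.
        etag = hashlib.md5(data).hexdigest()
        versions = self.objects.setdefault((bucket, key), [])
        versions.append({"data": data, "metadata": metadata or {},
                         "etag": etag, "version": len(versions) + 1})
        return etag

    def get(self, bucket, key, version=None):
        versions = self.objects[(bucket, key)]
        obj = versions[-1] if version is None else versions[version - 1]
        return obj["data"], obj["metadata"], obj["version"]

# Usage: two PUTs to the same key yield two retrievable versions.
store = ToyObjectStore()
store.put("backups", "vm1.img", b"v1")
store.put("backups", "vm1.img", b"v2", metadata={"job": "nightly"})
```

Keeping every version addressable is what makes recovery after corruption or accidental overwrite straightforward, which is the integrity point the quote is making.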

According to the blog, MinIO and Samsung are creating “a seamless storage fabric for the most demanding workloads. Sixteen directly connected PCIe 5 lanes, compute power up to 84 cores, and sophisticated power reduction techniques result in breathtaking performance with sustainability. MinIO operations, such as maintaining data integrity and rebuilding after a data loss event, take place with virtually no performance impact on active requests.”

However, Samsung is not locked into MinIO. “When ordering products, customers can take advantage of Samsung partnerships with software-defined storage providers to deliver pre-certified object and file solutions like MinIO, Weka, and vSAN. Customers are also free to engage separately with the SDS software provider of their choosing, or even to use open source software like Ceph.”

Samsung’s PBSSD concept can be viewed in the exhibition area of Nvidia’s GTC event at booth #528.

Storage news ticker – March 21

An AWS blog explains how you can use InfluxDB as a database engine in Amazon Timestream. This makes it easier to run near-real-time time-series applications using InfluxDB and open source APIs, including open source Telegraf agents that collect time-series observations.

PCIe retimer and CXL device startup Astera Labs set the price of its IPO of 19.8 million shares at $36 apiece, making the offering worth around $712.8 million in total, with gross proceeds to the company expected to be around $604.4 million. That would give it a valuation of about $5.5 billion. The shares are expected to begin trading on the Nasdaq Global Select Market under the ticker symbol “ALAB” on March 20.

CloudCasa by Catalogic announced the newest version of its CloudCasa software. It adds new migration and replication workflows to simplify Kubernetes use cases such as migrating on-premises clusters to cloud, migrating cloud to cloud, replicating production environments for test/dev and disaster recovery, and migrating locally between various Kubernetes configurations. It has new cloud integration and manageability features, extending and improving the backup, restore, and disaster recovery capabilities of CloudCasa, as well as its ability to centrally manage Velero installations in large and complex environments.

IBM’s latest Storage Scale System exceeds 340GB/sec sequential read bandwidth and 160GB/sec write bandwidth with current code and firmware from just 4U of rack space. There are 16x 200GbitE links in total. Client nodes use RoCE and each has a 200GbitE link. Check out a LinkedIn post here.

An update to IBM Power VS (Virtual Server) provides DR and Backup-as-a-Service as well as SAP installation, cloud security and compliance, and automated migration for IBM AIX and i. Details in a blog. (IBM Power Virtual Server is a family of configurable multi-tenant virtual IBM Power servers with access to IBM Cloud services.)

Cloud backup supplier Keepit announced the results of a study of organizations leveraging Keepit SaaS data protection. It found customers get up to 90 percent faster targeted restore times following a ransomware attack, and that Keepit limits the impact of a ransomware attack for the composite organization the study based its calculations on “by allowing it to recover and restore data quickly, preventing data loss and reducing downtime. This benefit is worth $819,100.”

MSP backup supplier N-able announced Cove’s “Master of Disaster Recovery” Class – a free online course aimed at supporting disaster preparedness for MSPs worldwide. The class leverages Cove’s Disaster Recovery as a Service (DRaaS) capabilities to impart essential knowledge and skills to the MSP community in the event of a data breach. The classes will be conducted via GoToWebinar every two weeks on an ongoing basis. Upcoming 2024 sessions include April 2. 

HCI and hybrid multi-cloud vendor Nutanix has released the findings of its sixth annual Enterprise Cloud Index (ECI) survey and research report, which measures global enterprise progress with cloud adoption. The use of hybrid multi-cloud models is forecast to double over the next one to three years. Get more details here.

Veeam-focused object storage backup target supplier Object First is having its Ootbi products sold by Pedab in 11 Northern European countries.

German open source vector database startup Qdrant says its Rust-based database is being used by Elon Musk’s open release of the Grok AI model. There have been more than five million downloads of the Qdrant database.

The Turing Trust, the technology recycling and education charity founded by the family of Alan Turing, has received a donation of 100 hard drives from Seagate Technology. The hard drives will be installed in computers destined for Malawi to increase access to digital skills by students in primary and secondary schools across the country.

Ceph-based array manufacturer SoftIron announced VM Squared – a virtualization software as an alternative to VMware’s vSphere product suite. It installs in 30 minutes or less and there is a VM Squared migration tool that migrates an entire VMware vSphere estate quickly and easily. More details here.

Broadcom’s VMware unit has announced the GA of VMware Live Recovery, providing cyber and data resiliency for VMware Cloud Foundation environments. It combines enterprise-grade disaster recovery and purpose-built cyber recovery with a unified management experience across clouds. There are two underlying technologies: VMware Live Cyber Recovery (formerly VMware Cloud Disaster Recovery/Ransomware Recovery) and VMware Live Site Recovery (formerly VMware Site Recovery Manager). Check out a VMware web page to learn more.

Cloud storage provider Wasabi and Network-as-a-Service supplier Console Connect are collaborating to support seamless on-premises-to-cloud migration and cloud-to-cloud migration as well as instant multi-cloud provisioning.

CXL a no-go for AI training

Analysis. Compute Express Link (CXL) technology has been pushed into the backseat by the Nvidia GTC AI circus, yet Nvidia’s GPUs are costly and limited in supply. Increasing their memory capacity to enable them to do more work would seem a good idea, so why isn’t CXL – and its memory pooling – front and center in the Nvidia GPU scramble?

CXL connects pools of DRAM across the PCIe bus. There are three main variants:

  • CXL 1 provides memory expansion, letting x86 servers access memory on PCIe-linked accelerator devices such as smartNICs and DPUs;
  • CXL 2 provides memory pooling between several server hosts and a CXL-attached device with memory;
  • CXL 3 provides memory sharing between servers and CXL devices using CXL switches.

All three have a coherent caching mechanism, meaning that the local CPU level 1 data and instruction caches, containing a subset of what is in memory, have uniform content. CXL 1 and 2 are based on the PCIe 5 bus, with CXL 3 using the PCIe 6 bus. Access to external memory via CXL adds latency.

All the memory that is accessed, shared or pooled in a CXL system needs a CXL access method, meaning PCIe 5 or 6 bus access and CXL protocol support. The DRAM in x86 servers and the GDDR memory in GPUs is suitable. However, high-bandwidth memory (HBM) integrated with GPUs via an interposer in Nvidia’s universe is not suitable, as it has no PCIe interface.

AMD’s Instinct MI300A accelerated processing unit (APU), with combined CPU and GPU cores and a shared memory space, has a CXL 2 interface. Nvidia’s Grace Hopper superchip, with Armv9 Grace CPU and Hopper GPUs, has a split memory space.

SemiAnalysis analyst Dylan Patel writes about CXL and GPUs in his subscription newsletter. He observes that Nvidia’s H100 GPU chip supports NVLink, C2C (to link to the Grace CPU) and PCIe interconnect formats. But the PCIe interconnect scope is limited. There are just 16 PCIe 5 lanes, which run overall at 64 GB/sec, whereas NVLink and C2C both run at 450 GB/sec – seven times faster. Patel notes that the I/O part of Nvidia’s GPUs is space-limited and Nvidia prefers bandwidth over standard interconnects such as PCIe.

Therefore the PCIe area on the chip will not grow in future, and may shrink.
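Patel’s seven-times figure is easy to sanity-check with back-of-the-envelope arithmetic, assuming roughly 4 GB/sec of usable throughput per PCIe 5 lane:

```python
# Interconnect comparison using the figures cited above.
PCIE5_LANE_GBPS = 4               # ~32 GT/s per lane is roughly 4 GB/s usable
pcie_x16 = 16 * PCIE5_LANE_GBPS   # 16 lanes -> 64 GB/s aggregate
nvlink = 450                      # GB/s for NVLink and C2C, per the article

ratio = nvlink / pcie_x16
print(f"PCIe 5 x16: {pcie_x16} GB/s, NVLink: {nvlink} GB/s, "
      f"ratio: {ratio:.1f}x")     # ~7x, matching Patel's figure
```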

There’s much more detail in Patel’s newsletter but it’s behind a subscription paywall.

The takeaway is that there will be no CXL access to an Nvidia GPU’s high-bandwidth memory. However, x86 CPUs don’t use NVLink and having extra memory in x86 servers means memory-bound jobs can run faster – even with added latency for external memory access.

It then follows that CXL will not feature in AI training workloads when they run on GPU systems fitted with HBM – but it may have a role in datacenter x86/GDDR-GPU servers running AI tuning and inference workloads. We also may not see CXL having a role in edge systems, as these will be simpler in design than datacenter systems and require less memory overall.

Storage news ticker – March 20

Data streamer Confluent announced that its Flink service is now available on AWS, Google Cloud, and Microsoft Azure. The cloud-native service enables reliable, serverless stream processing, which allows customers like Airbnb, Uber, and Netflix to gain timely insights from live data streams. This helps them to offer consumers real-time services – from personalized recommendations to dynamic pricing. It also announced the release of Tableflow, software that unites analytics and operations with data streaming in a single click to feed data warehouses, data lakes, and analytics engines.
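Confluent’s Flink service is programmed in SQL or Java; purely as an illustration of the kind of computation such stream-processing jobs perform, here is a toy tumbling-window aggregation in Python (the event data, keys, and window size are invented):

```python
from collections import defaultdict

def tumbling_window_counts(events, window_secs=60):
    """Toy tumbling-window aggregation: count events per key per window.
    Illustrates the concept only; this is not Confluent's or Flink's API."""
    counts = defaultdict(int)
    for timestamp, key in events:
        # Each event falls into exactly one fixed-size, non-overlapping window.
        window_start = (timestamp // window_secs) * window_secs
        counts[(window_start, key)] += 1
    return dict(counts)

# e.g. dynamic-pricing events keyed by city (invented data)
events = [(0, "sf"), (30, "sf"), (61, "sf"), (10, "nyc")]
print(tumbling_window_counts(events))
# {(0, 'sf'): 2, (60, 'sf'): 1, (0, 'nyc'): 1}
```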

NoSQL cloud database supplier Couchbase announced financial results for its fourth quarter and fiscal year ended January 31. Total revenue for the quarter was $50.1 million – an increase of 20 percent year-over-year, with a net loss of $21.4 million. Subscription revenue for the quarter was $48.1 million – an increase of 26 percent year-over-year. Total revenue for the year was $180 million – an increase of 16 percent year-over-year, with a loss of $80.2 million. Subscription revenue for the year was $171.6 million, an increase of 20 percent year-over-year. It expects next quarter’s revenue to be $48.1-$48.9 million.

Decentralized storage provider Cubbit announced general availability of its DS3 Composer, which is used to build virtual private S3-compatible public clouds. It uses DS3 technology – a multi-tenant, S3-compatible object store developed by Cubbit. DS3 Composer collects and aggregates new and recycled resources across the edge, on-prem, and public cloud – exposing them as an S3 object store repository via a SaaS control plane. Resources are organized in geo-distributed networks, and each network node can provide access and capacity via the S3 protocol. Cubbit has technology partnerships with HPE and with Equinix.
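Cubbit has not published how DS3 places objects across its geo-distributed nodes; consistent hashing is one common technique for spreading keys over a changing set of nodes, sketched here purely for illustration (the node names are invented):

```python
import hashlib
from bisect import bisect

class ConsistentHashRing:
    """Minimal consistent-hash ring: a common way to map object keys onto
    geo-distributed nodes. Illustrative only; not Cubbit's actual algorithm."""
    def __init__(self, nodes, vnodes=100):
        # Each physical node gets many virtual points for even distribution.
        self.ring = sorted(
            (self._hash(f"{n}#{i}"), n)
            for n in nodes for i in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(s):
        return int(hashlib.sha256(s.encode()).hexdigest(), 16)

    def node_for(self, key):
        # First ring point clockwise from the key's hash owns the key.
        i = bisect(self.keys, self._hash(key)) % len(self.keys)
        return self.ring[i][1]

ring = ConsistentHashRing(["edge-milan", "onprem-dc1", "cloud-eu"])
print(ring.node_for("bucket/object.bin"))  # deterministic node choice
```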

Mainframe supplier Fujitsu and Amazon Web Services (AWS) announced an expanded partnership to provide assessment, migration, and modernization of legacy mission-critical applications running on on-premises mainframes and Unix servers onto the AWS Cloud.

NAKIVO’s agent-based backup support for Proxmox virtual machine data is now available.

Alessandra Yockelson of storage company NetApp

NetApp has hired a chief human resources officer – Alessandra Yockelson, who has over 25 years of experience – to drive a new chapter of NetApp’s culture and growth globally. She comes from a similar role at Pure Storage, where she led organizational change efforts that resulted in global performance scaling. Before that she was chief talent officer at HPE. Normally we’d pass over such an appointment and its accompanying praise, but NetApp describes her as a technology industry titan. What does that mean? We think NetApp is in for enhanced diversity in its hiring practices as it seeks to broaden the talent pool for managers, directors, and execs.

Paul Hiemstra of storage company Panasas

Panasas has hired a new CFO, Paul Hiemstra, declaring his role will be instrumental in shaping strategic direction for the executive team and board, ultimately driving profitability, growth, and value creation. Ken Claffey became Panasas CEO in September last year and the biz has been searching for a CFO for three months. Elliot Carpenter was Panasas CFO from March 2016 to August 2020.

Hiemstra’s background includes 11 years at Cray, where he ascended from corporate treasurer to head of investor relations. In this latter position, he played a pivotal role in the 2019 integration of Cray and HPE, serving as CFO of HPE’s HPC and AI divisions. 

Earlier this month, Pure Storage announced self-service capabilities across its Pure1 storage management platform and Evergreen portfolio. More than 30 percent of Pure’s customer base uses ActiveCluster. Upgrades to the Purity operating environment for ActiveCluster required time-consuming manual effort, as customers jumped from one upgrade to another in an effort to keep servers in sync. With Autonomous Upgrades, customers can simply invoke a Purity upgrade, leaving the heavy lifting to Pure Storage via its Pure1 platform, and freeing up the time previously spent managing the process themselves. 

Also, in the event of a ransomware anomaly, Pure1 now recommends snapshots from which customers can restore their affected data (both locally and remotely), eliminating cumbersome, manual snapshot catalog reviews, which can take anywhere from hours to days. 
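The recommendation boils down to finding the most recent snapshot taken before the detected anomaly. A minimal sketch of that selection logic (not Pure1’s actual implementation, which will also weigh integrity signals; the timestamps are invented):

```python
from datetime import datetime

def recommend_snapshot(snapshots, anomaly_time):
    """Pick the most recent snapshot taken before the detected anomaly.
    Simplified sketch of the selection Pure1 automates; illustrative only."""
    clean = [s for s in snapshots if s < anomaly_time]
    return max(clean) if clean else None

# Four daily snapshots; anomaly detected mid-afternoon (invented data).
snaps = [datetime(2024, 3, 20, h) for h in (0, 6, 12, 18)]
anomaly = datetime(2024, 3, 20, 14, 30)
print(recommend_snapshot(snaps, anomaly))  # 2024-03-20 12:00:00
```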

Computational storage supplier ScaleFlux says it will integrate the Arm Cortex-R82 processor in its forthcoming line of enterprise Solid State Drive (SSD) controllers, following the newly announced SFX 5016. It says this is a strategic move to leverage the processor’s high performance and energy efficiency. The Cortex-R82 is the highest-performance real-time processor from Arm and the first to implement the 64-bit Armv8-R AArch64 architecture, representing, ScaleFlux claims, a significant advancement in processing power and efficiency for enterprise storage systems. Perhaps ScaleFlux computational storage drives will do more than compression in the future.

SK hynix has begun volume production of HBM3E, the newest AI memory product with ultra-high performance, for supply to a customer (Nvidia) from late March.

Software RAID supplier Xinnor has a white paper titled “High Performance Storage Solution for PostgreSQL Database in Virtual Environment, Boosted by xiRAID Engine and KIOXIA PCIe5 Drives.” The paper presents detailed benchmarking results comparing the performance of different storage configurations, including vHOST Kernel Target with Mdadm and SPDK vhost-blk target protected by Xinnor’s xiRAID Opus (Optimized Performance in User Space). It says xiRAID provides 30–40 percent more transactions per second than Mdadm in select-only benchmarks, and outperforms Mdadm by over 20 times in degraded mode, ensuring high performance even in the event of drive failures. 

xiRAID also demonstrates superior performance in write operations, outpacing Mdadm by six times in small block writes and five times in TPC-B-like script benchmarks. The scalability of xiRAID on virtual machines allows for the consolidation of servers, resulting in significant cost savings and simplified storage infrastructure. Download the white paper here.
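“Degraded mode” means the array keeps serving reads by reconstructing the failed drive’s data from the surviving drives and parity. The underlying parity math – shown here independently of xiRAID’s user-space implementation, with invented data blocks – is plain XOR:

```python
def xor_blocks(*blocks):
    """XOR together byte strings of equal length."""
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

# RAID-5 style parity across a stripe: P = D0 ^ D1 ^ D2
d0, d1, d2 = b"aaaa", b"bbbb", b"cccc"
parity = xor_blocks(d0, d1, d2)

# Degraded mode: the drive holding d1 fails; rebuild it from the survivors.
rebuilt = xor_blocks(d0, d2, parity)
assert rebuilt == d1  # every read of d1 now costs reads of d0, d2 and parity
```

The extra reads and XOR work per request are why degraded-mode performance is where RAID implementations differ most.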

Open source vector database supplier Zilliz says its Milvus 2.4 release has a groundbreaking GPU indexing feature powered by Nvidia’s CUDA-Accelerated Graph Index for Vector Retrieval (CAGRA). It claims GPU indexing represents a significant milestone in vector database technology, propelling Milvus 2.4 further ahead of traditional CPU-based indexes like HNSW. Leveraging the power of GPU acceleration, Milvus 2.4 delivers remarkable performance gains – particularly with large datasets – ensuring lightning-fast search responses and unparalleled efficiency for developers.

Milvus 2.4 also introduces support for GPU-based brute force search, further enhancing recall performance without sacrificing speed. Milvus 2.4 is now available for download. Explore the latest features and enhancements on the Milvus website.
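Brute-force search simply scores the query against every stored vector – exact and with perfect recall, but O(n) per query, which is why offloading it to a GPU pays off. An illustrative CPU version (not Milvus’s implementation; the vectors are invented):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def brute_force_search(query, vectors, k=2):
    """Exhaustive nearest-neighbour search: score every vector, keep top-k.
    Exact (100 percent recall) but linear in collection size per query."""
    scored = sorted(((cosine(query, v), i) for i, v in enumerate(vectors)),
                    reverse=True)
    return [i for _, i in scored[:k]]

vecs = [(1.0, 0.0), (0.0, 1.0), (0.7, 0.7)]
print(brute_force_search((1.0, 0.1), vecs, k=2))  # [0, 2]
```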

HPE pushes packet of next-gen AI products in partnership with Nvidia

HPE is the latest vendor to roll out a portfolio of GenAI training and inference products amid plans to use Nvidia GPUs and microservices software announced at Nvidia’s GTC 2024 event this week.

HPE brings an edge-to-datacenter, hybrid on-premises, and public cloud approach to the GenAI table, along with its Cray-based supercomputing capabilities, enterprise ProLiant servers, Aruba networking, Ezmeral data fabric, and GreenLake for file storage. Where competitor Dell is stronger in storage, HPE is stronger in supercomputing and edge networking. The two are roughly equal in server tech, and HPE is arguably further advanced in cloud computing with its GreenLake scheme.

HPE CEO and president Antonio Neri said: “From training and tuning models on-premises, in a colocation facility or the public cloud, to inferencing at the edge, AI is a hybrid cloud workload. HPE and Nvidia have a long history of collaborative innovation, and we will continue to deliver co-designed AI software and hardware solutions that help our customers accelerate the development and deployment of GenAI from concept into production.”

HPE is announcing: 

  • Availability of GenAI supercomputing systems with Nvidia components
  • Availability of GenAI enterprise computing systems with Nvidia components
  • Enterprise retrieval-augmented generation (RAG) reference architecture using Nvidia’s NeMo microservices
  • Preview of Machine Learning Inference Software using Nvidia’s NIM microservices
  • Planned future products based on Nvidia’s Blackwell platform

The supercomputing system was announced at SC23 as a turnkey and pre-configured system featuring liquid-cooled Cray AMD EPYC-powered EX2500 supercomputers, with EX254n blades, each carrying eight Nvidia GH200 Grace Hopper chips. It includes Nvidia’s AI Enterprise software and the system can scale to thousands of GH200s. A solution brief doc has more information.

HPE Cray supercomputer

The turnkey version is a limited configuration supporting up to 168 GH200s and is meant for GenAI training. The obvious comparison is with Nvidia’s SuperPOD; the DGX GH200 version of that supports up to 256 GH200s. Dell has no equivalent to the Cray supercomputer in its compute arsenal and is a full-bodied SuperPOD supporter.

HPE’s enterprise GenAI system was previewed at HPE’s Discover Barcelona 2023 event in December and is focused on AI model tuning and inference. It’s rack-scale and pre-configured, being built around 16 x ProLiant DL380a x86 servers, 64 x Nvidia L40S GPUs, BlueField-3 DPUs, and Nvidia’s Spectrum-X Ethernet networking. The software includes HPE’s machine learning and analytics software, Nvidia AI Enterprise 5.0 software with new NIM microservices for optimized inference of GenAI models, NeMo Retriever microservices, and other data science and AI libraries.

It’s been sized to fine-tune a 70 billion-parameter Llama 2 model. A 16-node system will fine-tune the model in six minutes, we’re told.

The HPE Machine Learning Inference Software is in preview and enables customers to deploy machine learning models at scale. It will integrate with Nvidia’s NIM microservices to deliver foundation models using pre-built containers optimized for Nvidia’s environment.

The enterprise RAG reference architecture, geared to bringing a customer’s proprietary digital information into the GenAI fold, consists of Nvidia’s NeMo Retriever microservices, HPE’s Ezmeral data fabric software, and GreenLake for File Storage (Alletra MP storage hardware twinned with VAST Data software).

This ref architecture is available now and will, HPE says, offer businesses a blueprint to create customized chatbots, generators, or copilots.
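Stripped to its essentials, a RAG pipeline retrieves the most relevant chunks of a customer’s proprietary data and stuffs them into the prompt sent to the model. A toy sketch of that control flow, using word overlap where a NeMo Retriever deployment would use dense embeddings (the documents and query are invented):

```python
def retrieve(query, docs, k=2):
    """Toy retriever: rank documents by word overlap with the query.
    A production pipeline would use dense embeddings and a vector index;
    the control flow is the same."""
    qwords = set(query.lower().split())
    scored = sorted(docs,
                    key=lambda d: len(qwords & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query, docs):
    """Stuff the retrieved context into the prompt passed to the LLM."""
    context = "\n".join(retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = ["Alletra MP ships with VAST software",
        "GreenLake offers file storage",
        "Unrelated release notes"]
print(build_prompt("What storage does GreenLake offer?", docs))
```

The grounding in retrieved documents is what lets a chatbot answer from company data the base model never saw.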

Nvidia has announced its Blackwell architecture GPUs and HPE will support this with future products.

Comment

What’s absent from HPE’s GTC news is full-throated support for Nvidia’s BasePOD and SuperPOD GPU supercomputers. HPE’s storage does not support GPUDirect, apart from the OEM’d VAST Data software forming its GreenLake for File Storage service. Competitors Dell, DDN, Hitachi Vantara, NetApp, Pure Storage, VAST Data, and WEKA are all signed up members of the SuperPOD supporters’ club. Their collective Cray support is a lot weaker.

WEKA intros turnkey appliance to plug into SuperPOD

Parallel file system supplier WEKA has devised an on-premises WEKApod storage appliance to plug into Nvidia’s SuperPOD.

Update. SuperPOD details and image updated. 19 March 2024.

SuperPOD is Nvidia’s rack-scale architecture designed for deploying its GPUs. It houses 32 to 256 DGX H100 AI-focused GPU servers and connects them over an InfiniBand NDR network. A DGX H100 is an 8RU chassis containing 8 x H100 GPUs – meaning up to 2,048 GPUs in a SuperPOD – with 640 GB of GPU memory, dual Xeon 8480C CPUs, and IO-accelerating BlueField DPUs.

Nilesh Patel, WEKA

WEKA chief product officer Nilesh Patel said: “WEKA is thrilled to achieve Nvidia DGX SuperPOD certification and deliver a powerful new data platform option for enterprise AI customers … Using the WEKApod Data Platform Appliance with DGX SuperPOD delivers the quantum leap in the speed, scale, simplicity, and sustainability needed for enterprises to support future-ready AI projects quickly, efficiently, and successfully.”

WEKApod is a turnkey hardware and software appliance, purpose-built as a high-performance data store for the DGX SuperPOD. Each appliance consists of pre-configured storage nodes and software for simplified and faster deployment. A 1 PB WEKApod configuration starts with eight storage nodes and scales up to hundreds. It uses Nvidia’s ConnectX-7 400 Gbps InfiniBand network card and integrates with Nvidia’s Base Command manager for observability and monitoring.

Nvidia SuperPOD.

WEKA has already supplied storage for BasePOD use, and its WEKApod is certified for the SuperPOD system, delivering up to 18.3 million IOPS, 720 GBps sequential read bandwidth, and 186 GBps write bandwidth from eight nodes. That’s 90 GBps/node when reading and 23.3 GBps/node when writing.

WEKApod

WEKA claims its Data Platform’s AI-native architecture delivers the world’s fastest AI storage, based on SPECstorage Solution 2020 benchmark scores. The WEKApod’s performance numbers for an 8-node 1 PB cluster are 720 GBps read bandwidth and 18.3 million IOPS. Per 1RU node that means 90 GBps read and 23.3 GBps write bandwidth, and 2.3 million IOPS. A WEKA slide shows the hardware spec.
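The per-node figures follow directly from dividing the certified eight-node cluster numbers:

```python
nodes = 8
read_total = 720            # GBps, WEKA's certified eight-node read figure
write_total = 186           # GBps write
iops_total = 18_300_000     # IOPS

print(read_total / nodes)   # 90.0 GBps per node
print(write_total / nodes)  # 23.25 GBps per node (23.3 rounded)
print(iops_total / nodes)   # 2287500.0 - roughly 2.3 million IOPS per node
```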

A WEKA blog declares: “Over the past three decades, computing power has experienced an astonishing surge, and more recent advancements in compute and networking have democratized access to AI technologies, empowering researchers and practitioners to tackle increasingly complex problems and drive innovation across diverse domains.”

But with storage, WEKA says: “Antiquated legacy storage systems, such as those relying on the 30-year-old NFS protocol, present significant challenges for modern AI development. These systems struggle to fully utilize the bandwidth of modern networks, limiting the speed at which data can be transferred and processed.” 

They also struggle with large numbers of small files. WEKA says it fixes these problems. In fact, it claims: “WEKA brings the storage leg of the triangle up to par with the others.”

Find out more about the SuperPOD-certified WEKApod here.

Comment

Dell, DDN, Hitachi Vantara, HPE, NetApp, Pure Storage, and VAST Data have all made announcements relating to their products’ support for and integration with Nvidia’s GPU servers at GTC. Nvidia GPU hardware and software support is now table stakes for storage suppliers wanting to play in the GenAI space, particularly for GenAI training workloads. Any supplier without such support faces being knocked out of a storage bid for AI workloads running on Nvidia gear.

Nvidia GTC storage news roundup

Cohesity announced a collaboration with NVIDIA for its Gaia GenAI software to include NVIDIA’s NIM microservices. Gaia will be integrated with NVIDIA’s AI Enterprise software. NVIDIA has also invested in Cohesity. Gaia customers now get:

  • Domain-specific and performant generative AI models based on the customers’ Cohesity-managed data, using NVIDIA NIM. Customers can fine-tune their large language models with their data and adapt them to fit their organisation.
  • A tool for customers to query their data via a generative AI assistant to gain insights from their data, such as deployment and configuration information, security, and more.
  • Cohesity’s secondary data to build Gen AI apps that provide insights based on a customer’s own data.
  • Gen AI intelligence added to data backups and archives with NVIDIA NIM.

Distributor TD SYNNEX’s Hyve hardware building business unit has announced an array of products tailored for the AI lifecycle at NVIDIA GTC. They include:

  • NVIDIA MGX / OCP DC-MHS 2U Optimized Generational Platforms for Compute, Storage and Ruggedized Edge 
  • OAM-UBB 4U Scalable HPC-AI Reference Platform                                     
  • Next Generation AI-Optimized ORv3 Liquid-Cooled Enclosure                                                                                   
  • Various 1U Liquid-Cooled Inference Platforms

Real-time streaming analytics data warehouse supplier Kinetica, which has integrated ChatGPT into its offering, is adding a real-time RAG capability based on NVIDIA’s NeMo Retriever microservices and low-latency vector search using NVIDIA RAPIDS RAFT technology. Kinetica has built native database objects that allow users to define the semantic context for enterprise data. An LLM can use these objects to grasp the referential context it needs to interact with a database in a context-aware manner. All the features in Kinetica’s GenAI offering are exposed to developers via a relational SQL API and LangChain plugins.

Kinetica claims its real-time generative AI offering removes the requirement for reindexing vectors before they are available for query. Additionally, we’re told it can ingest vector embeddings 5X faster than the previous unnamed market leader, based on the VectorDBBench benchmark.

Lenovo has announced that hybrid AI systems, built in collaboration with NVIDIA, and already optimized to run NVIDIA AI Enterprise software for production AI, will now provide developers access to the just-announced NVIDIA microservices, including NIM and NeMo Retriever. Lenovo has expanded the ThinkSystem AI portfolio, featuring two new 8-way NVIDIA GPU systems that are purpose-built to deliver massive computational capabilities. 

They are designed for Gen AI, natural language processing (NLP), and large language model (LLM) development, with support for the HGX AI supercomputing platform, including H100 and H200 Tensor Core GPUs and the Grace Blackwell GB200 Superchip, as well as Quantum-X800 InfiniBand and Spectrum-X800 Ethernet networking platforms. The new Lenovo ThinkSystem SR780a V3 is a 5U system that uses Lenovo Neptune liquid cooling.

Lenovo claims it’s the leading provider of workstation-to-cloud support for designing, engineering and powering OVX systems and the Omniverse development platform. It’s partnering with NVIDIA to build accelerated models faster using MGX modular reference designs. 

NVIDIA launched enterprise-grade generative AI microservices that businesses can use to create and deploy custom applications on their own platforms. The catalog of cloud-native microservices, built on top of NVIDIA’s CUDA software, includes NIM microservices for optimized inference on more than two dozen popular AI models from NVIDIA and its partner ecosystem.  

NIM microservices provide pre-built containers powered by NVIDIA inference software — including Triton Inference Server and TensorRT-LLM — which, we’re told, enable developers to reduce deployment times. They provide industry-standard APIs for domains such as language, speech and drug discovery to enable developers to build AI applications using their proprietary data hosted in their own infrastructure. Customers will be able to access NIM microservices from Amazon SageMaker, Google Kubernetes Engine and Microsoft Azure AI, and integrate with popular AI frameworks like Deepset, LangChain and LlamaIndex.
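NIM endpoints follow the familiar OpenAI-style chat completions pattern. A sketch that builds (but does not send) such a request – the host URL and model name here are placeholders for whatever a given deployment exposes:

```python
import json
import urllib.request

# Build an OpenAI-style chat-completion request against a NIM endpoint.
# The URL and model id below are invented placeholders, not real endpoints.
payload = {
    "model": "meta/llama2-70b",   # placeholder: model ids vary per NIM
    "messages": [{"role": "user", "content": "Summarize our Q4 backups."}],
    "max_tokens": 128,
}
req = urllib.request.Request(
    "http://nim.example.internal:8000/v1/chat/completions",  # placeholder
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
print(req.full_url, req.get_method())
# Sending it with urllib.request.urlopen(req) is left out deliberately.
```

Because the API shape matches what frameworks like LangChain and LlamaIndex already speak, pointing them at self-hosted infrastructure is a matter of swapping the base URL.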

Box, Cloudera, Cohesity, Datastax, Dropbox, NetApp and Snowflake are working with NVIDIA microservices to help customers optimize their RAG pipelines and integrate their proprietary data into generative AI applications. 

NVIDIA accelerated software development kits, libraries and tools can now be accessed as CUDA-X microservices for retrieval-augmented generation (RAG), guardrails, data processing, HPC and more. NVIDIA separately announced over two dozen healthcare NIM and CUDA-X microservices.

NVIDIA has introduced a storage partner validation program for its OVX computing systems. OVX servers have L40S GPUs and include AI Enterprise software with Quantum-2 InfiniBand or Spectrum-X Ethernet networking, as well as BlueField-3 DPUs. They’re optimized for generative AI workloads, including training for smaller LLMs (for example, Llama 2 7B or 70B), fine-tuning existing models and inference with high throughput and low latency.
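Why those model sizes suit the L40S is simple arithmetic: FP16 weights take two bytes per parameter, and an L40S carries 48 GB of memory. A rough sketch (weights only; activations, KV cache and any training state add substantially more):

```python
def weight_gb(params_billion, bytes_per_param=2):
    """Rough GPU memory for model weights alone (FP16 = 2 bytes/param).
    Ignores activations, KV cache and optimizer state."""
    return params_billion * 1e9 * bytes_per_param / 1e9

print(weight_gb(7))    # 14.0 GB: fits comfortably on one 48 GB L40S
print(weight_gb(70))   # 140.0 GB: needs several GPUs even for inference
```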

NVIDIA-Certified OVX servers are available and shipping from GIGABYTE, HPE and Lenovo.

The validation program provides a standardized process for partners to validate their storage appliances. They can use the same framework and testing that’s needed to validate storage for the DGX BasePOD reference architecture. Each test is run multiple times to verify the results and gather the required data, which is then audited by NVIDIA engineering teams to determine whether the storage system has passed. The first OVX validated storage partners are DDN, Dell (PowerScale), NetApp, Pure Storage and WEKA.

SSD controller array startup Pliops has a collaboration with composability supplier Liqid to create an accelerated vector database offering to improve performance, capacity and resource requirements. Pliops says its XDP-AccelKV addresses GPU performance and scale by both breaking the GPU memory wall and eliminating the CPU as a coordination bottleneck for storage IO. It extends HBM memory with fast storage to enable terabyte-scale AI applications to run on a single GPU. XDP-AccelKV is part of the XDP Data Services platform, which runs on the Pliops Extreme Data Processor (XDP).

Pliops has worked with Liqid to create an accelerated vector database product, based on Dell servers with Liqid’s LQD450032TB PCIe 4.0 NVMe SSDs, known as the Honey Badger, and managed by Pliops XDP. Sumit Puri, president, chief strategy officer & co-founder at Liqid, said: “We are excited to collaborate with Pliops and leverage their XDP-AccelKV acceleration solution to address critical challenges faced by users running RAG applications with vector DBs.”

Snowflake and NVIDIA announced an expanded agreement around adding NVIDIA AI products to Snowflake’s data warehouse. NVIDIA accelerated compute powers several of Snowflake’s AI products:

  • Snowpark Container Services
  • Snowflake Cortex LLM Functions (public preview)
  • Snowflake Copilot (private preview)
  • Document AI (private preview)

We have updated our table of suppliers’ GPUDirect file access performance in the light of GTC announcements:

A chart shows the suppliers’ relative positions better:

There is a divide opening up between DDN, Huawei, IBM, VAST Data and WEKA on the one hand and slower performers such as Dell (PowerScale), NetApp and Pure Storage on the other. Recharting to show performance density per rack unit shows the split more clearly.

We note that no analysis or consultancy business such as ESG, Forrester, Futurum, Gartner, or IDC has published research looking at suppliers’ GPUDirect performance. That means there is no authoritative external validation of these results.