
OCP 2023 Summit unveils cutting-edge storage solutions

The Open Compute Project (OCP), the open hardware foundation founded by Meta (then Facebook), held its 2023 Global Summit in San Jose last week, showcasing a variety of storage products.

Distributor TD SYNNEX is in the storage business through its Hyve Solutions Corp subsidiary. Hyve said it will support OCP through motherboards and systems compliant with the OCP’s DC-MHS (Data Center Modular Hardware System) specification.

It also showed a roadmap of future products, including P-core and E-core-based Intel Xeon processors. Steve Ichinaga, Hyve president, said: “The DC-MHS modular server architecture is yet another way OCP works with the community to create and deliver innovation to the hyperscale market.”

NAND fabricator and SSD supplier Kioxia exhibited:

● XD7P Series EDSFF E1.S SSDs. 

● LD2-L Series E1.L 30.72TB NVMe SSD prototype, with PCIe 3.0, supporting 983 terabytes in a single 32-slot rack unit. That’s near enough 1PB per RU, meaning up to 40PB in a standard rack. An FIO sequential read workload saw throughput from 32 SSDs top 100GB/sec. 

Kioxia SSD specs from OCP summit
Random read performance is modest at 210,000 IOPS, but the density is very high. More details will no doubt be forthcoming.

● CD8P Series U.2 Data Center NVMe SSDs, which are among the first datacenter-class PCIe 5.0 SSDs in the market. 

● The first software-defined flash storage hardware device in an E1.L form factor, supporting the Linux Foundation Software-Enabled Flash Project.

Kioxia claims software-defined technology allows developers to maximize as yet untapped capabilities in flash storage, with finer-grained data placement and lower write amplification.
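The E1.L density arithmetic above can be sanity-checked in a few lines. The per-SSD capacity and slot count come from the article; the 40 usable rack units per standard rack is our assumption, inferred from the "up to 40PB" figure.

```python
# Back-of-the-envelope check of Kioxia's E1.L density claim.
# Figures from the article: 30.72 TB per SSD, 32 slots per rack unit.
# The 40 usable RU per rack is an assumption inferred from the
# "up to 40PB in a standard rack" claim.

tb_per_ssd = 30.72
slots_per_ru = 32
usable_ru_per_rack = 40

tb_per_ru = tb_per_ssd * slots_per_ru                # 983.04 TB, near enough 1 PB
pb_per_rack = tb_per_ru * usable_ru_per_rack / 1000  # about 39.3 PB, i.e. "up to 40PB"
print(f"{tb_per_ru:.2f} TB per RU, {pb_per_rack:.1f} PB per rack")
```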

Phison Electronics said it had PCIe 5.0, CXL 2.0-compatible, redriver and retimer data signal conditioning IC products. These have been designed to meet the demands of artificial intelligence and machine learning, edge computing, high-performance computing, and other data-intensive, next-gen applications.

Michael Wu, president and general manager for Phison US, said: “Phison has focused … R&D efforts on developing in-house, chip-to-chip communication technologies since the introduction of the PCIe 3.0 protocol, with PCIe 4.0 and PCIe 5.0 solutions now in mass production, and PCIe 6.0 solutions now in the design phase.”

Phison products from OCP summit

All of Phison’s redriver solutions are certified by PCI-SIG – the consortium that owns and manages PCI specifications as open industry standards. Phison uses machine learning technology to work out the best system-level signal optimization parameters to store in the redriver’s non-volatile memory for each customer’s environment.

Pliops showed how its XDP-Rocks tech helps solve datacenter infrastructure problems, and how its AccelKV data service can take key-value performance to higher levels.

SK hynix and its Solidigm subsidiary showed HBM (HBM3/HBM3E), MCR DIMM, DDR5 RDIMM, and LPDDR CAMM (Low Power Double Data Rate Compression Attached Memory Module) memory tech products. 

The HBM3 product is used in Nvidia’s H100 GPU. CAMM is a next-generation memory standard for laptops and mobile devices with a single-sided configuration half as thick as conventional SO-DIMM modules. SK hynix claims it offers improved capacity and power efficiency.

It also showed its Accelerator in Memory (AiM) – its processor-in-memory (PIM) semiconductor product brand, which includes GDDR6-AiM and is aimed at large language model processing. There was a prototype of AiMX, a generative AI accelerator card based on GDDR6-AiM.

SK hynix CXL products were exhibited including pooled memory with MemVerge software and a CXL-based memory expander applied to Meta’s software caching engine, CacheLib.

It also showed off its computational memory solution (CMS) 2.0. This integrates computational functions into CXL memory, using near-memory processing (NMP) to minimize data movement between the CPU and memory.

The SK hynix stand featured a 176-layer PS1010 E3.S SSD that first appeared in January at CES.

XConn Technologies exhibited its Apollo CXL/PCIe switch in a demonstration with AMD, Intel, MemVerge, Montage, and SMART Modular. It is a CXL 2.0-based Composable Memory System featuring memory expansion, pooling, and sharing for HPC and AI applications. XConn says the Apollo switch is the industry’s first and only hybrid CXL 2.0 and PCIe 5 interconnect, and that on a single 256-lane SoC it offers the industry’s lowest port-to-port latency and lowest power consumption per port, at a low total cost of ownership.

XConn says the switch supports a PCIe 5 only mode for AI-intensive applications and is a key component in Open Accelerator Module (OAM), Just-a-Bunch-of-GPUs (JBOG), and Just-a-Bunch-of-Accelerators (JBOA) environments. The Apollo switch is available now.

VAST Data bags second GPU cloud customer, Lambda Labs

GPU cloud service provider Lambda Labs is using VAST Data to store customers’ AI training data across its 60,000+ Nvidia GPU server cloud and co-location datacenters.

This deal closely follows a similar one between VAST and CoreWeave last month. Like CoreWeave, Lambda is seeing a sustained rise in demand for its GPU services as customers use them for large language model training. It offers a hyperscale GPU cloud, co-location-housed, GPU-focused datacenters, and customer-housed Echelon hardware systems, with an open source Lambda software stack covering all three.

Mitesh Agrawal, head of cloud and COO at Lambda, said: “The VAST Data Platform enables Lambda customers with private cloud deployments to burst swiftly into Lambda’s public cloud as workloads demand. Going forward, we plan to integrate all of the features of VAST’s Data Platform to help our customers get the most value from their GPU cloud investments and from their data.” 

VAST Lambda graphic
‘Private Cloud’ means Lambda’s Echelon hardware systems

The deal takes advantage of VAST Data’s Nvidia SuperPod certification and also its DataBase and DataSpace, a global file and object namespace across private and public cloud VAST deployments. Customers can store, retrieve, and process data consistently within this namespace.

Renen Hallak, VAST Data founder and CEO, said: “We could not be happier to partner with a company like Lambda, who are at the forefront of AI public and private cloud architecture. Together, we look forward to providing organizations with cloud solutions and services that are engineered for AI workloads, offering faster LLM training, more efficient data management and enabling global collaboration.”

VAST’s DataSpace means that customer data is accessible across the three Lambda environments. In a blog we saw before publication, VAST co-founder Jeff Denworth says: “We’re already working with several large Lambda customers who want to realize this vision of hybrid cloud AI data management.”

Bootnote

Lambda Labs provides access to Nvidia H100 GPUs and 3,200 Gbps InfiniBand from $1.89/hour, the claimed lowest public price in the world.

It says it’s a deep learning infrastructure company building a huge GPU cloud for AI training. It was founded in 2012 by Michael Balaban and CEO Stephen Balaban. They developed a machine learning-based facial recognition API for Google Glass, a wearable camera, Echelon and Tensor AI hardware, and an internal GPU cloud. That internal cloud became the Lambda GPU Cloud in 2018, letting people use its Nvidia GPU compute resources in Amazon compute instance style. It also offers co-location space in San Francisco and Allen, Texas.

Lambda has had several funding rounds totaling $112 million, including a $15 million round in July 2021 and a $44 million B-round in March this year. As the GenAI surge accelerates, a follow-on $300 million round is being teed up, according to outlets such as The Information, so it can buy more GPU server systems.

It expects 2023 revenues in the $250 million area and $600 million for 2024.

Download a Lambda Stack PDF doc here.

Komprise adds zero cutover downtime to data migration

Data manager Komprise has enabled system availability when a migration source is switched over to the destination, and added bootstrap and consolidation support.

The final stage of migrating data from a source to a destination system is when the last changes made on the source are copied to the destination. This can mean both systems are read-only for a time, seconds to minutes to hours, to make sure they are in sync. There are situations, though, where data-generating sources cannot be switched off. For example, medical imaging equipment, lab instruments, and IoT sensors must be able to always write the data they generate. 

Komprise’s v5.0 Elastic Data Migration now allows users and applications to add, modify, and delete data on the destination shares during this warm cutover period, eliminating downtime while the migration process is completed and the source data access switched off.
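Conceptually, a warm cutover is an iterative delta sync that leaves the destination writable throughout. Here is a minimal sketch under a simple snapshot-diff model; the function names and the source-wins conflict rule are illustrative assumptions, not Komprise’s actual API.

```python
# Hedged sketch of a "warm cutover" final sync. Shares are modeled as
# path -> content dicts; function names are illustrative, not Komprise's API.

def delta(source: dict, dest: dict) -> dict:
    """Files present or changed on the source but not yet on the destination."""
    return {path: data for path, data in source.items() if dest.get(path) != data}

def warm_cutover_sync(source: dict, dest: dict, max_passes: int = 10) -> dict:
    """Iteratively copy source deltas while the destination stays writable.

    Writes made directly on the destination are preserved unless the same
    path also changed on the source (source wins during cutover, an
    assumption for this sketch)."""
    for _ in range(max_passes):
        changes = delta(source, dest)
        if not changes:
            break  # in sync: safe to switch off source data access
        dest.update(changes)
    return dest

source = {"/scan/001.dcm": b"v1", "/scan/002.dcm": b"v2"}
dest = dict(source)              # bulk copy already done
source["/scan/003.dcm"] = b"v3"  # instrument keeps writing to the source
dest["/report.txt"] = b"new"     # users already writing to the destination
warm_cutover_sync(source, dest)
# dest now holds all three scans plus the new report
```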

Komprise v5.0 Elastic Data Migration software
Partial screen grab from v5.0 Elastic Data Migration software showing warm cutover setup

Co-founder and CEO Kumar Goswami said: “Komprise eliminates the dread of unstructured data migrations with analytics-first intelligent automation and the fastest WAN migration speeds for data center or cloud consolidations.”

v5 also lets users bring in Komprise when they have already moved some data over, using a tool like Rsync, AWS Snowball, or Azure Data Box, for example. Komprise may be used to complete this jump-started migration. Another jump-start situation v5 handles is where customers have a mirrored copy of data, using NetApp SnapMirror for instance. Komprise can now finish the migration using that data, without having to move it all over again. 

A third migration scenario supported by v5 is the consolidation migration, with data from multiple sources being combined in a single destination. This might happen when dealing with consolidating shares and sites during an acquisition, or a data center consolidation project, or to migrate to the public cloud. Komprise’s v5 software fully automates the process of consolidating multiple migrations to the same destination.

The Komprise migration process includes copying all the access control permissions, performing data integrity checks and creating necessary reports.

Komprise Elastic Data Migration 5.0 is available as a standalone offering and is included in the Komprise Intelligent Data Management platform. Find out more about the company’s Elastic Data Migration here.

HPE and Dell diverge on AI strategy

An HPE Securities Analyst Meeting presentation provided a good look into the AI strategy differences between HPE and its closest competitor, Dell.

Update: Note about Dell and supercomputers added as Bootnote 2; 25 Oct 2023.

Both rate AI as the biggest driver of IT infrastructure revenue growth in the next few years and both have a strategy that extends across edge sites and hybrid cloud datacenters. Both see x86 servers doing AI inferencing, but Dell’s strategy extends its AI inferencing down to PCs, which it makes and sells but HPE does not. HPE’s strategy extends upwards into supercomputers at the Cray level, which it makes and sells but Dell does not.

That means Dell customers need access to Nvidia GPU server systems for large-scale AI training while HPE has its Cray systems for that. An HPE statement said it “believes it is differentiated from its competition in the ability to capture significant value from the growing AI market through its IP, trusted expertise, and long-term sustained market leadership in supercomputing.”

HPE president and CEO Antonio Neri set the scene by saying HPE has a growing total addressable market (TAM) with three components: 

  • Edge and networking growing 1.5x from $62 billion in 2022 to $94 billion by 2026
  • Hybrid cloud growing 1.7x from $97 billion in 2022 to $164 billion
  • AI growing 2.4x from $62 billion in 2022 to $146 billion in 2026
HPE TAM slide

Within hybrid cloud, storage has a 2 percent constant currency compound annual growth rate (CAGR) between 2022 and 2026. A year ago it had told analysts there was a -3 percent storage infrastructure CAGR from 2022 to 2025. Dell does not forecast revenues in constant currency; due to inflation, HPE’s constant currency numbers generally translate into lower real-world currency numbers.

Storage looks substantial, but the chart bars are indicative and not to scale. We can only conclude growth is slowing. A slide deck appendix identified the actual storage TAM as $58 billion in fiscal 2022 and $66 billion in 2026.
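Those appendix numbers imply a slightly higher growth rate than the 2 percent constant currency figure. The standard CAGR formula makes the check easy; 58 and 66 are HPE’s slide figures in $ billions, over the four years from fiscal 2022 to 2026.

```python
# Implied growth rate of HPE's storage TAM figures:
# $58B in fiscal 2022 to $66B in 2026, four years of compounding.

def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values."""
    return (end / start) ** (1 / years) - 1

implied = cagr(58, 66, 4)
print(f"{implied:.1%}")  # about 3.3% per year
```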

Storage did not feature as a separate topic in the Edge & Networking section of HPE’s presentation deck, where it noted that GreenLake has >2 EB of data under management. But it did feature in the hybrid cloud section.

Fidelma Russo, EVP for Hybrid Cloud and CTO, said HPE wanted to capture market sub-sector share in hybrid cloud, calling out storage:

HPE hybrid cloud slide

She said there are three hybrid cloud challenges: transform through data, modernize IT infrastructure, and simplify hybrid cloud operations.

The transform through data concept was based on the Alletra 900, 600, and 500 series, and Alletra MP’s block and file storage platform, with file based on VAST Data’s software technology. This Alletra base provides a scalable platform with a cloud-native architecture and AIOps support.

Modernizing IT infrastructure means moving to cloud-like operations through GreenLake with hybrid cloud orchestration, private clouds, and AI-optimized infrastructure.

GreenLake for Backup and Recovery and Zerto (DR) appeared in the Simplifying Hybrid Cloud Operations section, where the overall theme was SaaS.

In the AI section of the presentation, which was given by Justin Hotard, EVP and GM for HPC, AI & Labs, HPE said its focus was on capturing growth through investing in its own IP in supercomputing and AI. 

It wants high-end AI training models to run on its own Cray hardware and says the x86-based compute AI Inference TAM CAGR will be 22 percent from 2022 to 2026. Non-AI x86 compute TAM has a 2 percent CAGR in comparison.

So we have Cray hardware for AI training and x86 servers (ProLiants) with accelerators like GPUs for AI inferencing – nice and simple.

A slide shows that Nvidia GPUs are present as accelerator components throughout HPE’s AI compute portfolio, as are GPUs from AMD and Intel but to a lesser extent.

HPE AI portfolio

SVP and interim CFO Jeremy Cox talked about all this from the finance point of view. For the hybrid cloud storage area, he said HPE wanted to drive a portfolio mix shift to margin-rich, owned-IP offerings, which poses a potential problem for storage partners such as Scality, Weka, and Qumulo as they represent non-HPE-owned IP.

Cox said HPE wanted to expand its storage portfolio into newer addressable markets such as file and object storage, and expand its SaaS portfolio, mentioning Alletra, AIOps management, SaaS data protection services, and unified analytics. This storage area will grow revenues in line with the market, which differs from Russo’s point about growing storage market share.

Overall, Cox said, HPE sees a 2-4 percent revenue CAGR from fiscal 2023 to 2026. Dell projects a higher 3-4 percent long-term revenue CAGR; given the constant currency point made earlier, the real-world gap in Dell’s favor is likely to be wider still.

Unlike Dell, HPE has not appointed a chief AI officer, nor is it putting major effort into bringing out customizable and validated GenAI systems and services. For now, it appears that Dell is forecasting higher AI-driven revenue growth over the next few years than HPE.

Get a recording of the meeting webcast here.

Bootnote

Because of its Cray and ClusterStor technology, HPE has a unique position among the mainstream system vendors in not having to cosy up to Nvidia for GPU-based AI accelerators like SuperPod for AI training workloads. Its Alletra systems do not, for example, support Nvidia’s GPUDirect storage access protocol. This means that HPE is absent from the GPUDirect supporters club whose members include Dell, DDN, IBM, NetApp, Pure Storage, Weka, and VAST Data.

With its OEM deal for VAST Data’s Universal Storage software for file storage, though, HPE could take advantage of the GPUDirect support in that software.

Bootnote 2

Dell told us industry analysts recognize HPE and Dell as the two leading providers of supercomputers/HPC systems. It cited HPC-focused analyst firm Hyperion, which said around ISC 2023 earlier this year, regarding supercomputer sales: “HPE tallied $5.1 billion in server revenue in 2022 while Dell came in at $3.6 billion. They were followed by Lenovo ($1.2 billion), Inspur ($1.1 billion) and Sugon at $600 million. Other leading HPC server vendors, in order of sales: IBM, Atos (Eviden), Fujitsu, NEC and Penguin.”

We did not mean to imply that Dell was not a supercomputer supplier, but wished to highlight that HPE’s AI strategy specifically includes Cray-level supercomputers while Dell’s does not.

Druva releases GenAI copilot for SaaS backup


Druva has released a GenAI-powered Dru copilot for its SaaS backup services, making the grand claim that the tool helps seasoned admins make smarter decisions and novices perform like experts.

Update: Dru conversation screen image added. Oct 20, 2023.

A GenAI-powered copilot is a large language model-based machine learning system that is trained on a set of specific subject area issues, in this case, Druva’s SaaS backup. Users interact with the copilot software using conversational text to request system reports and analyses or solve problems. The copilot generates detailed system interface requests, including SQL command strings, to fulfill the request and return results to the user. Dru is built on AWS Bedrock, which itself hosts generative AI models built by AI21 Labs, Anthropic, and Stability AI.
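The flow described here (conversational text in, generated system queries out) can be sketched generically. The prompt wording, function names, and stubbed model below are illustrative assumptions, not Druva’s actual implementation; a production system would call a hosted LLM rather than the stand-in used for the demo.

```python
# Generic copilot pattern: wrap the user's request in a task prompt,
# have a language model generate a structured query, return the result.
# fake_model is a stand-in keyed on the request text; Druva's real
# system calls models hosted on AWS Bedrock.

def fake_model(prompt: str) -> str:
    # Stand-in for an LLM call, for demonstration only.
    if "failed backups" in prompt:
        return "SELECT job_id, error FROM backup_jobs WHERE status = 'failed';"
    return "SELECT 1;"

def copilot_query(user_text: str, model=fake_model) -> str:
    """Translate an admin's conversational request into SQL over backup metadata."""
    prompt = f"Translate this admin request into SQL over backup metadata: {user_text}"
    return model(prompt)

sql = copilot_query("show me yesterday's failed backups")
print(sql)  # SELECT job_id, error FROM backup_jobs WHERE status = 'failed';
```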

Druva says Dru grants IT teams a new way to access information and drive actions through simple conversations to increase their productivity. Users can request custom reports, ask follow-up questions to refine report variables, and act on AI-powered suggestions to remediate backup failures within user reports.

Jaspreet Singh, Druva
Jaspreet Singh

Jaspreet Singh, Druva CEO, said in a statement: “Our customers … now with generative AI integrated directly into the solution … can further expedite decision-making across their environment.”

The company says Dru uses Druva backup metadata, but does not have access to actual customer data, and is built to ensure the AI learning process respects user control and permissions.

Dru provides:

  • Conversational interface: makes it easier for IT teams to interact with the system, analyze data, and find the information they’re looking for.
  • Interactive reporting: streamlined data access and reporting with an interactive and visual design.
  • Assisted troubleshooting: with simple written prompts, Dru can analyze logs and troubleshoot errors.
  • Intelligent responses: Dru proactively prompts users with recommendations and best practices customized to their specific environments. It also advises users how to use new and advanced functionality to get the most value out of the platform.
  • Simplified admin operations: Dru can execute simple protection tasks for users, from creating new backup policies to triggering new backups of specific workloads.
  • Customized navigation: Dru is, we’re told, architected with simplicity in mind, allowing users to navigate their custom data protection environments with an easy-to-use conversational interface.
Dru conversation session.

Druva wants to reduce detailed user interaction with its services even more, as Singh explains: “We believe that the future is autonomous.”

Druva says Dru is designed to equip every user with the insight and foresight to make data protection more autonomous.

Storage news ticker Oct 19

It’s the season for reports as suppliers try to pump up their relevance in a crowded market.

Alluxio introduced Alluxio Enterprise AI, a high-performance data platform for making data available to AI. Alluxio has developed open source data store virtualization software, a virtual distributed file system with multi-tier caching. Alluxio Enterprise AI is a new product that builds on the previous Alluxio Enterprise Editions with a new architecture optimized for AI/ML workloads. It has a distributed system architecture with decentralized metadata to eliminate bottlenecks when accessing massive numbers of small files, typical of AI workloads. A distributed cache is tailored to AI workload I/O patterns, unlike traditional analytics. The product supports analytics and full machine learning pipelines – from ingestion to ETL, pre-processing, training, and serving. The enhanced set of APIs for model training can deliver up to 20x performance over commodity storage.

To integrate Alluxio with the existing platform, users can deploy an Alluxio cluster between compute engines and storage systems. On the compute engine side, Alluxio integrates with popular machine learning frameworks like PyTorch, Apache Spark, TensorFlow and Ray. Enterprises can integrate Alluxio with these compute frameworks via REST API, POSIX API or S3 API. On the storage side, Alluxio connects with all types of filesystems or object storage in any location, whether on-premises, in the cloud, or both. Supported storage systems include Amazon S3, Google GCS, Azure Blob Storage, MinIO, Ceph, HDFS, and more.  

Alluxio works both on-premises and in the cloud, in either bare-metal or containerized environments. Supported cloud platforms include AWS, GCP, and Azure.

Find out more here.

Data protector Asigra says the SaaS app backup market will grow to $232 billion by 2024.

A Commvault commissioned IDC report surveyed 500+ security and IT operations leaders worldwide to show a current view of how organisations are perceiving modern security threats and approaching cyber resilience. It found:

  • Only 33 percent of CEOs/Managing Directors and 21 percent of line-of-business leaders are highly involved in cyber preparedness  
  • 61 percent believe that data loss within the next 12 months due to increasingly sophisticated attacks is “likely” to “very likely”   
  • Exfiltration attacks occur almost 50 percent more often than encryption attacks, whilst phishing was ranked as the most concerning threat to address  
  • 57 percent of organisations have limited automation for key functions; only 22 percent report being fully automated  

Data lakehouse supplier Dremio published an academic paper with arXiv, titled “The Data Lakehouse: Data Warehousing and More,” exploring the data lakehouse model. It says the idea through this preprint publication is to gather feedback from the open source research and scientific community and make it available to the wider community of practitioners. Get the paper here.

Everspin STT MRAM chips.

Everspin is expanding its high-density EMxxLX STT-MRAM product family. The extended family of devices is now available in densities from 4 to 64 megabits, with new, smaller packaging for the 4-to-16-megabit products. The EMxxLX, announced last year, is the only commercially available persistent memory with full read and write bandwidth of 400 megabytes per second via eight I/O signals at a clock frequency of 200MHz.

NetApp released its 2023 Data Complexity Report with a focus on business need for unified data storage. It found that, of all tech executives with plans to migrate workloads to the cloud, three out of four still have most of their workloads stored on-premises. However, AI adoption is the biggest driver for cloud migration, and cloud is a major enabler for AI adoption. Seventy-four percent of respondents said they’re using public cloud services for AI and analytics. Tech executives globally (39 percent) say their top need for Flash innovation is to optimize AI performance, cost and efficiency. Get a copy here.

Victoria Grey

Recovering and now private equity-owned storage array supplier Nexsan announced Victoria Grey and Andy Hill are returning to the company as CMO and EVP of Sales Worldwide, respectively. Nexsan was purchased by Serene Investment Management earlier this year and it appointed Dan Shimmerman as CEO. Since then, we’re told, it has achieved positive operational cashflow and increases in both sales and profitability. Grey returns to Nexsan for the third time, having previously served as CMO in 2016-2017 and Senior VP of Marketing in 2010-2013. She has held marketing executive, leadership, and advisory positions at FalconStor, APARAVI, GridStore (HyperGrid), Quantum, and Dell EMC. Hill previously served as Nexsan’s EVP Sales, Europe, Middle East & Africa from 2006-2013, and has held executive sales leadership positions at Sequent, VERITAS, Komprise and Virtuozzo.

Nyriad’s GPU-accelerated storage controller array product is now available as a service. The company announced UltraIO-as-a-Service, an on-premises Storage-as-a-Service (STaaS) offering. The user experience is defined by three decisions: Contract Term, Data Services and Reserve Commitment. From there, Nyriad and its resellers handle implementation and ongoing 24/7/365 proactive monitoring, alerting and customer support. There is real-time flexibility to increase the Reserve Capacity throughout the term as business needs evolve. Customers can scale the Reserve Capacity back down within the term as long as it remains above the initial contracted amount. To learn more visit: www.nyriad.io/staas.

Ismail Elmas

Data protector Rubrik has appointed Ismail Elmas as Group VP of International Business. He joins after a four-year stint at Zscaler following periods at AppDynamics and BMC Software. 

Wells Fargo analyst Aaron Rakers tells subscribers regarding Seagate and Western Digital nearline disk drive shipments: “We expect STX (and WDC) to point to some signs of a nearline recovery (albeit slow) looking forward. … We think investors have become increasingly focused on the risk of accelerating Flash cannibalization driven by significant Flash price declines (albeit now trending higher), increasing eSSD Flash density (e.g. Pure Storage roadmap), and the insertion of AI (i.e., data velocity).”

Scality announced the integration of its RING object storage with VMware Cloud Director. It has become the first S3 object storage vendor to become an OSIS-compliant partner for VMware Cloud Director. VMware created OSIS (Object Storage Interoperability Service) to help partners integrate with VMware’s existing Object Storage Extension (OSE) platform. Plug-and-play interoperability between Scality RING and VMware Cloud Director can be used to operate new object storage services for a range of data protection workloads, helping service providers increase revenue streams from advanced storage services without a huge engineering lift. 

It says that, by using the new Scality RING OSIS Adapter, Scality delivers rapid deployment of a variety of object storage service offerings including Storage-as-a-Service (STaaS), Backup-as-a-Service (BaaS), and Ransomware-protection-as-a-Service (RPaaS).

Block storage software supplier StorMagic announced a SaaS tool so admins can monitor and manage their SvSAN clusters globally. StorMagic Edge Control simplifies day-to-day SvSAN cluster administration, reducing the time admins spend managing their edge sites, whether they are using VMware, Microsoft or KVM hypervisors. SvSAN customers can download and begin using the software immediately, free of charge.

Veritas research shows that 45 percent of organizations may be miscalculating the severity of threats to their business. The study, Data Risk Management: The State of the Market – Cyber to Compliance, which polled 1,600 executives and IT practitioners across 13 global markets, provides insights into the most pressing risks, their impacts and how organizations plan to navigate them. When survey respondents were initially asked whether their organizations were currently at risk, almost half (48 percent) said no. But, after being presented with a list of individual risk factors, respondents of all levels recognized the challenges facing their organizations, with 97 percent then identifying a risk. 15 percent of all those surveyed did not believe their organizations could survive another 12 months, given the risks they currently face. Download your copy here.

Weebit Nano signed its second foundry agreement after SkyWater, licensing its back-end-of-line (BEOL) ReRAM technology IP to DB HiTek, a contract chip manufacturer. DB HiTek will offer Weebit ReRAM as embedded non-volatile memory (NVM) in its 130nm BCD process – used for many analog, mixed-signal and power designs. The integration of ReRAM in BCD will allow analog, power and mixed-signal designers to increase system integration, integrating power management monolithically with an MCU and NVM. Weebit will receive manufacturing license fees, project use fees and royalties based on production volumes.

SK hynix said to have concerns over Kioxia-Western Digital merger


The long-running and ongoing merger discussions between Bain Capital-owned Kioxia and its manufacturing joint-venture partner Western Digital’s NAND and SSD business are reportedly being upset by SK hynix’s concerns.

SK hynix is a partner in the Bain Capital consortium that took over Kioxia, then called Toshiba Memory, in 2018, paying $18 billion to do so. The other players in this high-stakes poker game are Toshiba, which owns 40.64 percent of Kioxia and, being about to be taken private, has a strong interest in selling that stake for wads of cash, and activist investor Elliott Management.

Elliott Management already has its financial hooks into Western Digital, saying WD is undervalued because its NAND/SSD component business, unlike its disk drive business, is discounted by investors. Set it free from the disk drive mother ship, the argument goes, and investors would see just how good it is and value its shares, and Elliott’s holdings in them, much higher.

A combined Kioxia/Western Digital NAND business would have a 34 percent revenue share of the NAND market, according to TrendForce numbers from November 2022, more than current market leader Samsung’s 31 percent share. The merger makes market sense in this regard. The merged business would have its stock traded on Nasdaq and also pursue a Tokyo exchange listing.

What these players see is that, if Western Digital spun out its NAND/SSD business and that business was financed enough to buy into a merger with Kioxia, then many investors would get tasty morsels.

  • Toshiba could get cash for its Kioxia stake – tick.
  • Bain and its consortium members would get cash for their Kioxia stake – tick.
  • Elliott would get cash for its stake in Western Digital’s spun off NAND/SSD business – tick.
  • Western Digital would get Elliott Management off its back – tick.

As the long-running reports have progressed there have been leaks of details in the media, such as one in Bloomberg about financing the deal. It reported Kioxia was looking to refinance a ¥2 trillion ($14 billion) loan arrangement pursuant to the potential merger. Some of the loan funds would fund dividends for Kioxia shareholders. The deal was rumoured to complete this month.

Western Digital would own 50.1 percent of the merged operation, which would be headquartered in Japan and registered in the USA, and Kioxia the remaining 49.9 percent. Kioxia’s President Nobuo Hayasaka would be president of the merged business and most of the board members would be from Kioxia.

As a side point, we would say this means Western Digital’s CEO, David Goeckeler, may not have an executive operational role in the merged entity. That would mean playing second string to Kioxia’s Hayasaka, which would surely be unattractive, so Goeckeler will likely stay with the WD hard disk drive business.

Now, reports in the FT and Nikkei say that SK hynix is worried the merged entity would not be strong enough to compete with Samsung.

The Nikkei points out that SK hynix is currently number 2 in the NAND market, behind Samsung and ahead of Kioxia and Western Digital. It would drop to number 3 if the merger goes ahead.

Comment

We think there is something else going on here. SK hynix could want to merge with or buy Kioxia itself, adding it alongside its Solidigm unit, bought for $9 billion from Intel in January 2022. But it would need to deconstruct the Kioxia-WD manufacturing JV or take it over and persuade WD to continue the arrangement.

SK hynix+Solidigm+Kioxia would have 40 percent of the global NAND market, leaving Samsung with 31 percent and Western Digital with 13 percent. That would look good from SK hynix headquarters, but it would not unlock the rivers of cash flowing to Elliott Management from a Kioxia-Western Digital deal, and it would leave Western Digital in a difficult situation. We can’t see Goeckeler agreeing to that.

It’s hard not to think that Bain has been traversing this ground over and over again with its consortium members and everyone else involved, getting all its ducks lined up in a row. Now, at the last minute, SK hynix steps out of line and starts quacking. Does it just want more cash for its holding? Is there a new view of the deal emerging in its boardroom? What on earth does it really and realistically want?

None of the participants in this corporate dance responded to requests for comment. It’s big business poker; you keep your hands hidden and present a poker face to the public watchers.

Seagate pips Toshiba and WD to 24 TB drive

Seagate Exos X24

Seagate has announced the Exos X24 at 24 TB, the highest capacity nearline disk drive using conventional perpendicular magnetic recording (PMR) technology, giving it a 2 TB capacity advantage over rivals Toshiba and Western Digital.

Update: point about WD sampling an unannounced 28 TB SMR drive added, 19 October 2023.

The X24 succeeds the Exos X20, a 20 TB maximum capacity drive announced at the end of 2021. Both drives are helium-filled, with ten platters spinning at 7,200 rpm, and use single-port 6 Gbps SATA or dual-port 12 Gbps SAS interfaces. They have five-year warranties and a 2.5 million hour mean time between failures (MTBF) rating. The X20 is fitted with a 256 MB cache, which the X24 doubles to 512 MB.
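That 2.5 million hour MTBF figure is easier to interpret as an annualized failure rate (AFR). A quick back-of-the-envelope conversion (our arithmetic, not a Seagate-published figure):

```python
# Convert a drive's MTBF rating into an approximate annualized failure rate.
# The approximation AFR ≈ powered-on hours per year / MTBF holds while AFR is small.

def mtbf_to_afr(mtbf_hours: float, hours_per_year: float = 8766.0) -> float:
    """Return the approximate annualized failure rate as a fraction."""
    return hours_per_year / mtbf_hours

print(f"AFR ≈ {mtbf_to_afr(2_500_000):.2%}")  # about 0.35% of drives per year
```

In other words, in a large population of X24s running around the clock, roughly one drive in 285 would be expected to fail in a given year, all else being equal.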

The latest model comes in 12, 16, 20, and 24 TB capacity points and features SED (self-encrypting drive), SED-FIPS, and instant secure erase (ISE). A 28 TB shingled magnetic recording (SMR) version is available for a few cloud hyperscaler customers.

Seagate says the X24 has “enhanced caching that performs up to three times better than solutions that only utilize read or write caching.” Despite this, the X24 delivers the same maximum sustained data rate of 285 MBps as the X20.

A few months ago, Seagate CFO Gianluca Romano told financial analysts that this drive was coming and said it would be Seagate’s last PMR drive as increasing the areal density beyond 2.4 TB/platter was just not feasible. New HAMR (heat-assisted magnetic recording) technology will replace it, which, Seagate now says, is on track to begin ramping production in early 2024.

Toshiba’s highest capacity disk drive is its MG10F at 22 TB, announced last month. Western Digital has 22 TB Gold, Purple Pro and Red Pro drives that it announced in July 2022, along with the 26 TB Ultrastar shingled magnetic recording (SMR) variant. These companies will now try to leapfrog Seagate with their energy-assisted magnetic recording tech.

Back in August we noted that WD said it was about to begin sampling a new, as yet unannounced 28 TB SMR drive. This suggests a 24 TB conventional non-SMR drive might be coming as well.

Seagate says Exos X24 qualification drives are shipping to key customers and production drives will be available in volume for tech distributors in December.

StorPool: erasure coding protects against drive and node failure

StorPool Storage has added erasure coding in v21 of its block storage software, meaning its data should be able to survive more device and node failures.

The storage platform biz says it has also added cloud management integration improvements and enhanced data efficiency. StorPool has built a multi-controller, scalable block storage system based on standard server nodes, running applications as well as storage. It’s programmable, flexible, integrated, and always on. It claims that its erasure coding implementation protects against drive failure or corruption with virtually no impact on read/write performance.

StorPool CEO Boyan Ivanov told us: “Contrary to the common belief, most vendors only have replication or RAID in the same chassis, not Erasure Coding between multiple nodes or racks.”

StorPool produced the chart below that surveys the erasure coding competitive landscape from its point of view:

StorPool’s erasure coding needs a minimum of five all-NVMe server nodes to deliver four features:

  1. Cross-Node Data Protection – information is protected across servers with two parity objects so that any two servers can fail and data remains safe and accessible.
  2. Per-Volume Policy Management – volumes can be protected with triple replication or Erasure Coding, with per-volume live conversion between data protection schemes.
  3. Delayed Batch-Encoding – incoming data is initially written with three copies and later encoded in bulk to greatly reduce data processing overhead and minimize impact on latency for user I/O operations.
  4. Always-On Operations – up to two storage nodes can be rebooted or brought down for maintenance while the entire storage system remains running with all data remaining available.

Customers can now select a more granular data protection scheme for each workload, right-sizing the data footprint for each individual use case. In large-scale deployments, customers can perform cross-rack Erasure Coding to enable their storage systems to benefit from data efficiency gains while simultaneously ensuring data survives the failure of up to two racks.
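StorPool’s two-parity scheme is conceptually similar to classic dual-parity (RAID 6-style) erasure coding over the Galois field GF(2^8): with k data shards plus two parity shards spread across servers, any two shards can be lost and the data rebuilt. The sketch below is a purely illustrative toy model of that idea, not StorPool’s implementation:

```python
# Toy dual-parity erasure code over GF(2^8), illustrating how two parity
# objects (P and Q) let any two of k+2 shards fail with no data loss.
# Not StorPool's code - just the underlying RAID 6-style algebra.

def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8), reducing by the 0x11d polynomial."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1D
        b >>= 1
    return p

def gf_pow(a: int, n: int) -> int:
    r = 1
    for _ in range(n):
        r = gf_mul(r, a)
    return r

def gf_inv(a: int) -> int:
    return gf_pow(a, 254)  # a^254 = a^-1 in GF(2^8)

def encode(shards):
    """Return (P, Q) parity shards for a list of equal-length byte shards."""
    length = len(shards[0])
    P, Q = bytearray(length), bytearray(length)
    for i, shard in enumerate(shards):
        g_i = gf_pow(2, i)  # distinct generator coefficient per shard
        for j, byte in enumerate(shard):
            P[j] ^= byte
            Q[j] ^= gf_mul(g_i, byte)
    return bytes(P), bytes(Q)

def recover_two(shards, P, Q, x, y):
    """Rebuild data shards x and y (both lost) from survivors plus P and Q."""
    length = len(P)
    # Fold the surviving shards out of both parities
    Pxy, Qxy = bytearray(P), bytearray(Q)
    for i, shard in enumerate(shards):
        if i in (x, y) or shard is None:
            continue
        g_i = gf_pow(2, i)
        for j, byte in enumerate(shard):
            Pxy[j] ^= byte
            Qxy[j] ^= gf_mul(g_i, byte)
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    denom = gf_inv(gx ^ gy)
    dx, dy = bytearray(length), bytearray(length)
    for j in range(length):
        # Solve dx + dy = Pxy and gx*dx + gy*dy = Qxy for the two unknowns
        dx[j] = gf_mul(denom, gf_mul(gy, Pxy[j]) ^ Qxy[j])
        dy[j] = Pxy[j] ^ dx[j]
    return bytes(dx), bytes(dy)
```

Real implementations work on whole drives or nodes rather than byte strings and use vectorized Galois-field arithmetic, but the recovery algebra is the same: two independent parity equations allow solving for two unknown shards, which is why any two servers can fail.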

The v21 release also includes:

  • Improved iSCSI Scalability – allowing customers to export up to 1,000 iSCSI targets per node, especially useful for large-scale deployments.
  • CloudStack Plug-In Improvements – support for CloudStack’s volume encryption and partial zone-wide storage that enables live migration between compute hosts.
  • OpenNebula Add-On Improvements – supports multi-cluster deployments where multiple StorPool sub-clusters behave as a single large-scale primary storage system with a unified global namespace.
  • OpenStack Cinder Driver Improvements – enables deployment and management of StorPool Storage clusters backing Canonical Charmed OpenStack and OpenStack instances managed with kolla-ansible.
  • Deep Integration with Proxmox Virtual Environment – with the integration, any company utilizing Proxmox VE can benefit from end-to-end automation.
  • Additional hardware and software compatibility – increased the number of validated hardware and operating systems resulting in easier deployment of StorPool Storage in customers’ preferred environments.

The company’s StorPool VolumeCare backup and disaster recovery function is now installed on each management node in the cluster for improved business continuity. VolumeCare runs on every management node, but only the instance on the active management node executes snapshot operations.

Dell PowerMax update boosts green credentials

A v10.1 software update to the high-end PowerMax storage array from Dell reduces its electricity consumption, the company says.

PowerMax is a mission-critical array supporting open systems, mainframe, and virtualized environments. Compute and media elements can be scaled independently with integrated Nvidia BlueField Data Processing Unit (DPU) technology. There are four models: 2000, 2500, 8000 and 8500. This new software release comes 15 months after Dell announced PowerMaxOS 10 software, which provided more than 200 new features in the storage intelligence, automation, cybersecurity, and resiliency fields. 

Ben Jastrab, Dell
Ben Jastrab

Ben Jastrab, director of Storage Product Marketing at Dell, said: “We are making it easier than ever for organizations to improve storage efficiency and strengthen cybersecurity, while providing the flexibility to seamlessly scale capacity and performance to keep pace with new business demands.”

Most storage suppliers could say the same things about their software releases these days.

Dell reckons the latest software provides real-time power (voltage, current, frequency) and environmental (temperature, humidity) monitoring, dynamic workload mobility, improved 5:1 data deduplication for open systems, and up to 2.8x better performance per watt, potentially saving $207,000 in electricity costs per array and reducing greenhouse gases by up to 82 percent.

We’re told there are enhanced security measures with federal certification, stronger Transport Layer Security (TLS) encryption, mainframe intrusion detection – claimed to be an industry first – anomaly detection, ignition key support, and NIST-compliant data erasure. Some of these almost seem like fixing niche vulnerabilities, such as Ignition Key Support, a data-at-rest capability leveraging external key managers to protect against physical theft of the array.

Dell PowerMax power and environment monitoring dashboard
PowerMax power and environment monitoring dashboard

zCID (z-mainframe Cyber Intrusion Detection) monitors PowerMax’s mainframe data access patterns from a zero-trust baseline and can detect anomalies.

Dell has also improved the automation features with AI-driven health checks, Dell’s CloudIQ AIOps, automated storage provisioning via REST APIs, and a software-defined NVMe/TCP utility that reduces setup time by up to 44 percent.

This is a nice incremental feature set release from Dell for PowerMax users, albeit with a flavor of tinkering at the edges. Very few are going to get that excited about ignition key support.

We doubt it will be of much concern to Infinidat, which competes with PowerMax. Another competitor, Hitachi Vantara, is unifying its various storage products, including the top end VSP array under a Virtual Storage Platform One scheme. Dell could do the same using its APEX concept.

Read a blog post by Jastrab titled “Save energy, accelerate cyber resiliency with PowerMax innovation” to find out more details.

DDN’s Paul Bloch: Race to AI has companies making purchase orders just weeks after they decide they need it

Mosaic AI

AI is developing so fast and being applied across the board so widely that its accelerative effect will be exponential, said DDN co-founder Paul Bloch, and it will need tens of thousands of GPUs and exabytes of storage.

Bloch, DDN’s president, was presenting to the faculty of Purdue University at a Cyberinfrastructure Symposium last month. He said that modern AI’s arrival and speed was astounding.

He had a chart that shows “how LLM has evolved over the past few years. And, what you can see here is that in the short three years, the models’ size increased over a thousand times and the GPU memory has increased times five.”

But this is small potatoes: “Picture the fact that right now ChatGPT uses data that’s about four or five years old, so ChatGPT doesn’t even have the data for the past four years. We’ve probably created much more data than the years before over the past four years. So, this is just the infancy.” 

Paul Bloch

He likened Gen AI stages to biological brains: “I recall AlexNet being able to classify in a smart way. … It’s about 262 petaflops in AI terms. It’s not the same as HPC petaflops. But this is equivalent to a worm brain.”

He said: “Then, you have autonomous driving. It’s a race to get to full autonomy and to do that, you need to have about 6 billion miles proven or in simulation. So, you need to show 6 billion miles on your software without any accident or any problem. That requires a lot of compute and storage as well, and what’s interesting is a large-scale SuperPOD is still the size of a fly brain.”

Accelerating storage bandwidth

GPUs need fast data transfer from storage, with Bloch saying: “As an overall rule, one GPU will require about one to two gigabytes a second of data storage. … If you buy a thousand GPUs, you’re going to need about a terabyte a second. … We’re deploying systems today, over the past couple of quarters, that are delivering 5 terabytes, 10 terabytes a second.”

He added: “Some of the larger systems that we’ve deployed … in the past two-three months is 20,000 GPUs, and the only reason why it’s not more is because they cannot get their hands fast enough on them. So, 20,000 GPUs and a hundred of our systems. So, a hundred of those systems deliver about 10 terabytes a second. And, the capacity is about 40 or 50 petabytes and they’re just starting.” 

It’s not going to be enough. Storage delivery speeds are going to have to increase, as are the capacities needed: “We’ve been thinking gigabytes a second. Now it’s going to be hundreds of gigabytes a second or terabytes a second or tens of terabytes a second. We will have a handful of systems running at 50 to 100 terabytes a second before the end of 2024. And, they will have probably between 100 and 500 petabytes of data attached to it. That’s the hot file system, right? I’m not even talking about the cold file system behind it, which will be in the exabytes.”
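Bloch’s rule of thumb makes the sizing arithmetic simple. A toy calculation using his figures (the per-GPU rate is his estimate, not a DDN specification):

```python
# Back-of-the-envelope storage bandwidth sizing from Bloch's rule of thumb:
# each GPU needs roughly 1-2 GB/s of sustained bandwidth from the file system.

def required_bandwidth_gbps(num_gpus: int, per_gpu_gbps: float = 1.0) -> float:
    """Aggregate storage bandwidth (GB/s) needed for a GPU cluster."""
    return num_gpus * per_gpu_gbps

# 1,000 GPUs at 1 GB/s each: ~1 TB/s, matching Bloch's example
print(required_bandwidth_gbps(1_000) / 1_000, "TB/s")

# The 20,000-GPU deployment delivering 10 TB/s works out to 0.5 GB/s per GPU,
# just under the low end of the rule of thumb
print(required_bandwidth_gbps(20_000, 0.5) / 1_000, "TB/s")
```

At the high end of the rule, a 20,000-GPU cluster would want 40 TB/s, which shows why Bloch expects hot-tier file systems to reach tens of terabytes a second.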

Accelerated AI buy decisions

Generative AI development speed and potential is so huge that AI-related IT purchasing decisions are being made much faster than others. Bloch said: “We started the year saying, ‘Okay, it’s going to be the same year as usual.’ Great, having fun… and then ChatGPT happened and the world changed. At all levels, it’s already changed. That means that what we have seen in the second quarter, and in the third quarter, is totally inordinate. We are seeing companies go from ‘we need to get into AI, we need to do either internal ChatGPT or we need to do LLM’ and so forth to a purchase decision within two weeks.

“From the first talks to those companies to an actual purchase order to deliver systems…about two weeks. We’ve never seen that. We don’t know if it’s going to keep on going, but this is basically the way industry sees AI. 

“They see it as a ‘do or die’ situation and a lot of them, the big guys, are putting their money where their mouth is.”

Part of the urgency they feel is “because if you don’t do it, and someone’s going to have a better service, better access to customers, better service to customers, better response, more efficiency people-wise – you’re going to have an issue with your company.”

Bloch thinks: “We are in a new era because the world has been delivering a lot of R&D, a lot of advancements across the board. You can talk about transportation, precision therapies, digital wallets, biology, bioinformatics, 3D printing, autonomous – you name it. All these technologies have already been developed and delivered, but now they’re going to be catalyzed with AI.”

He says: “You can get to a whole different level if you apply AI to any and all of these technologies.” 

And the future?

The accelerated AI development rate is huge. Bloch said: “What’s next? I don’t even know. All I know is that the speed of creation is going to accelerate… So, the cycles are shortening themselves very fast.”

What’s next for DDN’s filesystem? “We’ve been working on the next generation file system for the past six-seven years. It’s not replacing the current one we have, but it’s going to be a, we call it, a next generation for the future. So, on-prem, in the cloud, instantiating as software or appliances, that’s going to be able to be workload-centric, intelligent in fashion, on a guarantee of service, and being able to deliver access to data in a simpler, more efficient and cost-efficient manner.”

More of the same but faster, smarter and simpler to use.

Micron introduces advanced 7500 datacenter SSD

Micron is launching a 7500 datacenter SSD using 232-layer 3D NAND technology, exceeding the prior 7450 drive on all counts except random write IOPS where both drives are equal.

The 176-layer 7450, with a PCIe Gen 4 interface, was launched in March 2022 as a successor to the 7400 and its 96-layer 3D NAND. It came in U.3, E1.S, and M.2 formats, whereas the 7500 is solely available in the U.3 format. Micron is focusing consistently on reliably low latencies, with the 7450 delivering sub-2ms latency and the 7500 improving on that.

Micron 7500
The 7500 drive has a cross-hatched design for cooling flexibility

Alvaro Toledo, VP and GM of Micron’s Data Center Storage group, said: “We have achieved a breakthrough in latency, enabling response times below 1ms for 6x9s QoS in mainstream drives. This means our customers can run their data-intensive workloads faster, more efficiently and with more predictability than ever before.”

Micron 7450/7500 specs

Compared to the 7450, the 7500 has faster sequential read and write bandwidth and random read IOPS. Speeds in our table are “up to” speeds and vary with capacity. 

The 7500 has broad support for the Open Compute Project (OCP) SSD 2.0 specification, which provides intelligent management, performance optimization, seamless integration, and error handling for datacenter environments.

Micron provided charts showing how the 7500’s performance compares with other suppliers’ drives in the same market category: 

We understand the competing drives may be Solidigm’s DC P5430 (red line), an SK hynix SSD (light grey) and a Kioxia drive (dark grey). The chart shows the 7500 being equivalent to Kioxia’s drive but eventually beating it, while it is always faster than the SK hynix drive and much better than Solidigm’s SSD. That’s not surprising as Solidigm’s device uses QLC NAND, not TLC as all the others do.

A RocksDB competitive comparison shows the 7500 having better performance again:

The light grey and dark grey lines represent the same competitors as above.

Micron has equipped the 7500 with many security features:

  • Administrative commands to allow standardized control over functions such as namespaces and security, which easily integrate with OCP-compliant management systems.
  • Latency monitoring to improve performance by enabling the tracking and diagnosis of latency issues reported through the storage stack.
  • Error recovery and error injection features to enable rapid recovery of the drive and simulation of errors commonly encountered in servers.
  • Self-encrypting drive (SED) options with AES-256 hardware-based data encryption, running at line rate, SHA-512 and RSA to keep data safe.
  • Secure Encrypted Environment (SEE) to provide dedicated security processing hardware with physical isolation for improved security.
  • SPDM 1.2 attestation verifies device identity and firmware integrity to validate trust in the SSD from manufacturing through deployment.
  • Options for FIPS 140-3 Level 2 and TAA compliance to meet U.S. federal government procurement requirements.

The 7500 is available now through Micron’s OEMs and channel partners. Check out a product brief here.