
TD SYNNEX gets into data migration transport biz

Distributor TD SYNNEX has a physical data migration offering using a Western Digital flash server chassis and MinIO object storage.

This is quite different from software-defined data migration services from suppliers such as Cirrus Data, Datadobi, Data Dynamics, and Komprise, which rely on network data transmission. It is akin to Seagate’s Lyve Drive Mobile Array offering, with its six drive bays for physically transporting data on the drives.

Matt Dyenson, SVP, Product Management at TD SYNNEX said: “Speed and efficiency are crucial to avoiding system downtime and, consequently, lost revenue during data migration, which can be a costly, frustrating and risky process for any organization.”

The SYNNEX service is based on a rental deal to physically migrate data using Western Digital’s transportable Ultrastar Edge server, which comes in a wheeled transport case. This has a 2RU chassis containing 40 CPU cores (2 x 20-core Xeon Gold 6230T 2.1GHz processors), a Tesla T4 GPU, and 512GiB of memory fronting 8 x 7.68TB Ultrastar DC SN640 NVMe SSDs, plus 100GbitE networking.

Western Digital Ultrastar Edge transportable server.

That totals 61.44TB (8 x 7.68TB) – not an especially large dataset capacity, given that Solidigm has announced a single 61.44TB SSD. Data is stored using MinIO object storage with its erasure coding, encryption, and object locking.
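
Because MinIO exposes a standard S3 API, loading the migration dataset onto the chassis can be scripted. Below is a minimal sketch using the MinIO Python SDK; the endpoint, credentials, and bucket name are placeholders, not details of the SYNNEX service.

```python
from minio import Minio

# Placeholder endpoint and credentials for the MinIO service on the chassis
client = Minio("ultrastar-edge.local:9000",
               access_key="MIGRATION_KEY",
               secret_key="MIGRATION_SECRET",
               secure=True)

# Create a bucket with object locking enabled (WORM-style protection)
if not client.bucket_exists("migration-set"):
    client.make_bucket("migration-set", object_lock=True)

# Upload a file as an object; erasure coding and encryption are applied
# server-side according to the MinIO deployment's configuration
client.fput_object("migration-set", "db/backup-2023-08.tar",
                   "/data/backup-2023-08.tar")
```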

Kris Inapurapu, Chief Business Officer at MinIO, played the repatriation card, saying: “Our high-performance, cloud-agnostic object storage perfectly complements TD SYNNEX’s suite of services. As customers migrate data and repatriate from the cloud they need a combination of resilience, security and logistical support – this solution delivers just that.”

Businesses can schedule windows to take delivery of the Western Digital hardware and pay for what they need during the rental period.

TD SYNNEX’s website has a Data Migration Service microsite that says the “offering delivers an integrated, tested solution that lets you safely and securely ship all the required components you need overnight to provide a quick, easy and cost-effective path for physically migrating data.”

The software-defined services mentioned above provide a framework within which data sources are scanned to identify a migration dataset, the files (or blocks, in Cirrus Data’s case) are transmitted to a target system, the data movement is verified and, when migration is complete, a cutover process is run. These framework elements are not included in the SYNNEX offering.
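
For context, a bare-bones version of that scan-transfer-verify loop might look like the sketch below. The paths and the copy mechanism are purely illustrative and not any vendor’s implementation.

```python
import hashlib
import shutil
from pathlib import Path

def sha256(path: Path) -> str:
    """Hash a file in chunks so large files don't exhaust memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def migrate(source: Path, target: Path) -> list[Path]:
    """Scan the source tree, copy each file, and verify each copy by checksum."""
    failures = []
    for src in source.rglob("*"):          # scan
        if not src.is_file():
            continue
        dst = target / src.relative_to(source)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)             # transfer
        if sha256(src) != sha256(dst):     # verify
            failures.append(src)
    return failures                        # cut over only if this list is empty
```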

Instead SYNNEX says customers “select the date(s) for your data migration and we will ship the transportable migration platform to you directly. Rentals begin at a ten-day minimum.” There is nothing about how the data is extracted from the Ultrastar/MinIO chassis and moved to the destination system. This will be a manual process.

Tintri launches software-only VMstore and managed infrastructure service

DDN’s Tintri unit has unveiled a software-only rendition of its VMstore product and now offers VMstore through a managed infrastructure service framework.

Historically, Tintri’s VM-aware VMstore operated exclusively on its T7000 series hardware, using VMware storage abstractions. The separation of the VMstore software from Tintri’s hardware was announced as a Virtual Series project a year ago. Now the hardware-independent VMstore software is termed the Tintri Cloud Engine (TCE), while the Tintri Cloud Platform (TCP) delivers VMstore as managed infrastructure.

Phil Trickovic, Tintri SVP of Revenue, said: “The new Tintri Cloud Platform and Tintri Cloud Engine offerings are a testament to our commitment to providing customers with all of the tools they need to manage their infrastructure, no matter the size of their workloads or where they are on their hybrid cloud transformation journey.” 

Customers now have three avenues to access VMstore: provisioned on purchased Tintri hardware; consumed as managed infrastructure via TCP; or as software running in the public cloud through TCE.

TCE is containerized and runs on the AWS public cloud, serving as an AWS VM with EBS storage. Its primary purpose is to provide public cloud storage for on-premises Tintri workload-based snapshots, facilitating recovery from interruptions and ransomware incursions. TCE features real-time deduplication and compression, copy data management, and both real-time and predictive analytics.

TCP is marketed as a turnkey offering that promises “host-in-cloud and process-in-cloud capabilities.” Tintri says TCP’s potential uses include virtual datacenter (VDC), Infrastructure-as-a-Service (IaaS), and Disaster Recovery-as-a-Service (DRaaS). The VDC service is a private virtual resource pool, based on VMstore, with a self-service portal, unlimited internet traffic, and firewall protection. The resource pool consists of CPU, RAM, all-flash storage, and network delivered through an enterprise-grade managed cloud platform co-located in two carrier-neutral datacenters in the US – one in Reno, Nevada, and the other in Grand Rapids, Michigan.

Tintri datacenter locations

TCP IaaS includes compute, network, storage, security, backup, recovery, and disaster recovery with the flexibility to scale as needed.

TCP DRaaS, tailored for customers using VMware on-site or within the TCP VDC, facilitates the replication of virtual workloads either from on-premises sources to TCP or between TCP regions. Additionally, users can integrate various applications directly into their VDC from a dedicated marketplace.

All of a customer’s VMstore systems can be managed from a single console with Tintri Global Center. 

TCP is available today for new and existing customers. TCE is available for new and existing Tintri VMstore T7000 customers running on AWS. There’s a TCE datasheet here and TCP-VDC, TCP-IaaS, and TCP-DRaaS datasheets here.

Kioxia has killed its Kumoscale networked flash system

Quietly and with no fanfare, Kioxia has killed its Kumoscale networked flash array or JBOF. The deed was done three months ago, in May, with a note for partners: “Thank you for interest in KumoScale software (“Product”). There is no plan for enhancement beyond Version 3.22 as the Product has transitioned to maintenance only, and no new evaluation or production licenses will be granted. If you have any questions, please contact us.”

Joel Dedrick

“Kumo” is the Japanese word for cloud, and Toshiba Memory set up its cloud-scale SSD software business seven years ago.

Kioxia America, or Toshiba Memory America as it then was, recruited Joel Dedrick – previously a consultant at Intel and before that at SanDisk – to become VP and GM for its networked storage software business unit in September 2016. He says on LinkedIn that he was “recruited to build and drive a new product line,” which became the KumoScale networked block storage software, and that he “defined [a] new product category, ‘networked block-storage node,’ to distinguish KumoScale from all-flash arrays and ‘JBOFs’.”

But KumoScale was, to all intents and purposes, a JBOF. The networked block storage node concept signalled it was equivalent to a flash storage chassis in an external block storage array with some controller software functionality – an inferior external and scale-out all-flash SAN, in other words.

Dedrick’s team built up a software stack to run the hardware, with much use of OpenStack. There was no reinventing of existing software wheels. Instead the software stack was consistently enhanced. For example:

  • Nov 2020 – integrated the KumoScale flash storage array into the Kubernetes world.
  • June 2021 – added up-to-date OpenStack access control and open source integrations, and increased network access availability and bandwidth with a preview of multi-path networking support for NVMe-oF storage over TCP/IP networks.
  • Dec 2021 – added admin tools and support for the latest version of OpenStack.
  • April 2022 – v2.0 included additional bare metal deployment options, seamless support for OpenID Connect 1.0, and support for NVIDIA Magnum IO GPUDirect Storage (GDS).
  • July 2023 – a cluster-wide Command Line Interface (Cluster CLI), compatibility with OpenStack Yoga multipathing, and interoperability with Microsoft Azure Active Directory.

The big issue was that disk and SSD supplier Toshiba did not want to make a full-scale SAN storage array, because to do so would pit it in competition against its own OEM customers who built their SANs with Toshiba SSDs. Rule number 1: suppliers shouldn’t compete with customers. So Dedrick attempted to define a middle ground product market category, between drives and full SAN arrays, which sidestepped that trap, but it did not have enough substance. Seven years after it was founded, the business unit has had its main product put into maintenance.

We have asked Kioxia and it reiterated its earlier statement: “While KIOXIA is still supporting existing customers and KumoScale deployments there is no plan for enhancement beyond Version 3.22 as the product has transitioned to maintenance only, and no new evaluation or production licenses will be granted. We cannot comment on any additional details.”

Western Digital example

Western Digital faced the same kind of problems when it tried to build a datacenter storage business. It produced IntelliFlash, based on acquired Tegile disk and NVMe SSD array technology, and the ActiveScale archival array. ActiveScale came from HGST’s 2015 Amplidata object storage acquisition, WD having bought HGST in 2012.

The WD datacenter products business was killed off in September 2019, with the IntelliFlash product sold to DDN and the ActiveScale archival array to Quantum. WD still has its Ultrastar Edge Server rack chassis containing 8 x 7.68TB SN640 NVMe SSDs for edge data collection and physical transport to a datacenter.

Storage drive manufacturers should not generally compete with their channel of storage hardware/software system builders. WD learnt that lesson in 2019 and now, four years later, Kioxia has too.

Seagate still has product lines built on and around its disk drives, including the Lyve Drive Mobile Array, Lyve Cloud, and Exos RAID array products.

Pure Storage extends Azure Cloud Block Store to lower-cost SSD instances

Pure Storage’s Azure Cloud Block Store, its FlashArray Purity operating environment in Microsoft’s cloud, now supports Premium SSD v2 Disk Storage instances. This feature separates storage from compute to reduce costs and is in preview for the Azure VMware Solution (AVS).

Update: Pure Storage table entries added and cost saving quote updated. Lightbits table entries updated, 23 Aug 2023. Silk table entries updated, 24 Aug 2023. Volumez table entries updated, 25 Aug 2023.

Introduced in 2021, Cloud Block Store (CBS) for Azure enhances Microsoft’s fully managed VMware-as-a-Service offering – the Azure VMware Solution (AVS). Purchased by customers based on host nodes, AVS integrates compute, memory, network, and vSAN storage. However, this integration can escalate costs for those simply seeking increased storage since it necessitates the concurrent purchase of added compute and networking resources.

Pure chief product officer Ajay Singh said: “This expanded partnership between Pure Storage and Microsoft creates a significant milestone, ushering in a new age of cloud migration, and ultimately driving faster, more cost effective adoption of cloud services.”

Upon its debut, CBS for Azure employed Azure Ultra Disk Storage instances, block-level storage volumes paired with Azure Virtual Machines. The Azure alternatives encompass premium SSDs, Premium SSD v2, standard SSDs, standard HDDs, and the Elastic SAN (currently in preview), which consolidates a customer’s Azure storage needs. A table summarizes their characteristics:

Pure now offers CBS on Premium SSD v2

As the table shows, Premium SSD v2 offers 80,000 IOPS, half that of Ultra Disk yet quadruple that of a regular Premium SSD instance (20,000 IOPS). Its throughput is 1,200MBps, below Ultra Disk’s 4,000MBps but above the Premium SSD instance’s 900MBps.
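
Premium SSD v2 decouples capacity from performance: IOPS and throughput are provisioned per disk rather than implied by its size. A hedged sketch using the azure-mgmt-compute SDK is below; the resource group, disk name, zone, and the exact model field names are assumptions to check against the SDK version in use, not anything Pure or Microsoft publishes.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

compute = ComputeManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Create a 1TiB Premium SSD v2 disk with explicit IOPS/throughput targets
poller = compute.disks.begin_create_or_update(
    "my-resource-group",
    "cbs-backend-disk-01",
    {
        "location": "eastus",
        "zones": ["1"],                    # Premium SSD v2 disks are zonal
        "sku": {"name": "PremiumV2_LRS"},
        "creation_data": {"create_option": "Empty"},
        "disk_size_gb": 1024,
        "disk_iops_read_write": 20000,     # provisioned independently of size
        "disk_m_bps_read_write": 600,      # MBps, also provisioned independently
    },
)
disk = poller.result()
print(disk.provisioning_state)
```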

Pure says a new version of CBS makes its use of Premium SSD v2 instances as fast as the previous Ultra Disk-based implementation. Cody Hosterman, Pure’s senior director of product management for cloud, told us: “Based on Premium SSD v2 inside of Azure, we were able to take this less expensive tier versus Ultra without any performance change on our side.”

Further benefits of CBS on Premium SSD v2 include immutable Safemode snapshots, compression, deduplication, thin-provisioning, multi-tenancy, encryption, disaster recovery, and high availability. Importantly, it allows for flexible storage scalability independent of compute resources, a limitation of the Ultra disk instances. Hosterman said “Our new premium model is 1/3 the cost of the previous model, … but the overall savings to their cloud storage bill is usually more like 40 percent.”

He says this enables customers to migrate on-premises workloads more cost-effectively, with potential drivers including business continuity and DR, datacenter expansion into the cloud or the reverse – datacenter reduction – and VDI.

Microsoft and Pure’s collaborative efforts have paved the way for CBS for Azure to serve as an external block storage option for AVS. Pure says this is the first external block storage for VMware Cloud. Microsoft built a framework with its PowerShell to enable vSphere VMFS (Virtual Machine File System) support, and Pure integrated this AVS PowerShell with its own PowerShell SDK to produce a new version of its plugin for VMware Cloud Manager, enabling AVS customers to use Cloud Block Store.

Hosterman said a 100TB AVS configuration using vSAN could need 10 AV36 host nodes. Moving to Pure’s CBS reduces that number to three and provides cost savings, he claimed – 56 percent on hourly commit, 50 percent on one-year reserved and 42 percent on three-year reserved rates.

Singh said: “Pure Cloud Block Store for Azure VMware Solution is just the beginning. By optimizing performance and cost at scale, we look forward to unlocking the number of mission-critical use cases that we can serve in the coming years.”

In simpler terms, with these enhancements, migrating on-premises, storage-intensive database workloads to Azure becomes more economical.

Competition

Azure’s block storage space also hosts other third-party suppliers: Lightbits, Silk, and Volumez. A preliminary comparison, based on publicly available documentation, outlines their offerings relative to Pure on the Azure platform. However, this should be regarded as a basic overview rather than a comprehensive analysis:

For comparison, AWS block storage competitors include the likes of Infinidat and Dell’s PowerFlex (formerly ScaleIO).

Databricks reportedly seeking VC funding

AI-focused analytics lakehouse supplier Databricks wants to raise more funding to continue its breakneck expansion and aims to overtake Snowflake as the largest data analytics company in the world.

Databricks supplies a data lakehouse, a combination of a data warehouse and data lake, and was founded in 2013 by the original creators of the Apache Spark in-memory big data processing platform. It has raised $3.5 billion in funding through nine funding events and is heavily focussed on AI/ML analytics workloads. Databricks had a $38 billion valuation in 2021. In August 2022 Databricks said it had achieved $1 billion in annual recurring revenues, up from $350 million ARR two years prior, but it did not say it had a positive cash flow.

Both Silicon Angle and The Information report Databricks wants to raise hundreds of millions of dollars. The two mention sources close to the company that say Databricks made an operating loss of $380 million in its fiscal 2023, which ended in January, and has lost around $900 million in its fy2023 and fy2022 combined.

Databricks pulled in a massive $2.6 billion in funding in 2021 and embarked on an acquisition spree to buy in AI/ML-related software technology:

  • October 2021 – 8080 Labs – a no-code data analysis tool built for citizen data scientists.
  • April 2022 – Cortex Labs – an open source platform for deploying and managing ML models in production.
  • October 2022 – DataJoy – which had raised a $6 million seed round for its ML-based revenue intelligence software.
  • May 2023 – Okera – definitive agreement to buy the AI-centric data governance platform.
  • June 2023 – Rubicon – storage infrastructure for AI.
  • June 2023 – Mosaic – definitive agreement for $1.3 billion in what Databricks tells us is a mostly stock deal; the buy completed on 19 July 2023.

Charting these with its funding rounds gives an indication of Databricks’ fund raising and acquisition spending:

Blocks & Files chart.

The cost of the Mosaic acquisition was $1.3 billion while the other five acquisitions were for undisclosed amounts. In essence Databricks pulled in a lot of cash in 2021 and then spent a lot of it in 2021 (buying 8080 Labs), 2022 (buying Cortex Labs and DataJoy) and so far this year (buying Okera, Rubicon and Mosaic). That was in addition to its normal cash burn growing the company with marketing spend on trade shows, etc.

Databricks sees the generative AI market as a huge opportunity to grow its business substantially. For that it needs more cash. We’ve asked the company for a comment and were told: ”Databricks won’t comment on this occasion.”

DRAM, that’s fast: SK hynix reaches for HBM3E sky

South Korean memory fabber SK hynix is sampling an HBM3E chip, a month after Micron’s gen 2 HBM3 chip was unveiled.

HBM3E is High Bandwidth Memory gen 3 Extended and follows the HBM3 standard, which industry body JEDEC issued in January 2022. Such memory is built from stacks of DRAM dies placed above a logic die, which is attached to an interposer connecting them to a GPU or CPU. Alternatively, the memory dies can be stacked directly on the GPU. Either way, the DRAM-to-GPU/CPU bandwidth is higher than in the traditional x86 arrangement of DRAM DIMMs connected to the processor through sockets. Now vendors, incentivized by the AI and ML boom, are rushing to make the standard out of date.

SK hynix is facing depressed revenues because of memory and NAND over-supply in a low-demand market, though the memory market is starting to show some signs of recovery. Sungsoo Ryu, Head of DRAM Product Planning at SK hynix, said: “By increasing the supply share of the high-value HBM products, SK hynix will also seek a fast business turnaround.”

HBM generations table.

The company characterizes itself as one of the world’s only “mass producers” of HBM3 product and plans to volume produce HBM3E from the first half of next year. SK hynix talked up its production of memory for the AI market, currently being significantly enlarged by demand for ChatGPT-type Large Language Models (LLMs). SK hynix believes that LLM processing is memory limited and aims to rectify that.

Details of the SK hynix product are few, with the company only saying it can process data at up to 1.15 terabytes (TB) a second, equivalent to processing more than 230 full-HD movies of 5GB each in a second (1.15TB ÷ 5GB ≈ 230). Micron announced a more than 1.2TBps HBM3 gen 2 product last month, suggesting that SK hynix has work to do.

Micron’s HBM3 gen 2 product has 24GB capacity using an 8-high stack, with a 36GB capacity 12-high stack version coming. SK hynix announced a 12-stack HBM3 product in April, with 24GB of capacity.

We suspect that SK hynix’s HBM3E product may be developed from this 24GB capacity, 12-stack offering and could achieve 36GB.

SK hynix says the HBM3E product is backwards-compatible with HBM3; just drop it into an existing design and the system goes faster.

HDDs may be greener than SSDs

Green computing

Research house Futurum has backed a recent research paper that suggests HDDs could be greener than SSDs. The paper states that the biggest carbon emissions happen at the time of manufacture, with production of SSDs creating more carbon than disk drives.

In a research note “Are SSDs really more sustainable than HDDs?” Futurum analyst Mitch Lewis claims “the manufacturing process for the flash devices used in SSDs is highly energy-intensive [which is] about 8x higher embodied cost compared to Hard Disk Drives (HDDs) with an identical capacity.”

He is referencing a recent study presented at HotCarbon 2022, titled “The Dirty Secret of SSDs: Embodied Carbon,” co-authored by University of Wisconsin–Madison comp sci prof Swamit Tannu and Prashant J Nair, an assistant comp sci professor at the University of British Columbia.

The study claims: “Manufacturing a gigabyte of flash emits 0.16 Kg CO2 and is a significant fraction of the total carbon emission in the system… the flash and DRAM fabrication centers have limited renewable electricity supply, forcing fabs to use electricity generated from carbon-intensive sources.” This 0.16kg of CO2 per gigabyte is the embodied carbon cost of SSD flash. 

Despite being physically bulkier, “compared to SSDs, the embodied carbon cost of HDDs is at least an order of magnitude lower.” This is because HDD manufacturing is much less semiconductor-intensive than SSD fabrication and semiconductor manufacturing needs more electricity than HDD manufacturing. Also, “most semiconductor fabrication plants rely on the electricity generated by coal and natural gas plants” rather than renewable sources.

Deciding how green an SSD or HDD is isn’t simply a matter of assessing its operational electricity usage and “one must account for the embodied [carbon] cost when deciding on the storage architecture.”

The study assesses and compares the lifetime emissions of HDDs and SSDs, with five-year and 10-year life span periods, in a table:

SSD and HDD emissions table

CO2e is the emitted carbon cost in kilograms. We charted the total lifetime HDD and SSD emissions numbers (rightmost two columns) to show the result graphically:

SSD and HDD emissions chart

According to this study, HDDs emit less carbon than SSDs both during manufacture and across their operational life, and are thus more acceptable from an environmental viewpoint.

Some caveats:

  1. The study assumes average HDD power consumption to be 4.2W, whereas an SSD consumes 1.3W.
  2. The opex (operational life) CO2e number uses a total energy consumption emission factor specified by the US EPA – 0.7kg/kWh from 2019 data (the sketch after this list reproduces this arithmetic).
  3. The capex is the embodied carbon cost from manufacturing.
  4. The 10-year calculation includes a second capex outlay, as both the SSD and HDD are assumed to have five-year lifetimes.
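
As a minimal sketch, the opex portion of the study’s numbers can be reproduced from nothing more than the caveat figures above; the embodied (capex) carbon comes from the study’s table and is not recomputed here.

```python
HOURS_PER_YEAR = 24 * 365
EMISSION_FACTOR = 0.7  # kg CO2e per kWh (US EPA, 2019 data)

def opex_co2e(avg_power_watts: float, years: int) -> float:
    """Operational emissions: energy used over the period times the grid factor."""
    kwh = avg_power_watts / 1000 * HOURS_PER_YEAR * years
    return kwh * EMISSION_FACTOR

for name, watts in [("HDD", 4.2), ("SSD", 1.3)]:
    for years in (5, 10):
        print(f"{name} {years}-year opex: {opex_co2e(watts, years):.1f} kg CO2e")

# HDD 5-year opex comes to roughly 128.8kg and SSD 5-year opex to roughly 39.9kg;
# the study's total lifetime figures add the embodied carbon on top of these.
```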

Lewis writes: “The common thought is that SSDs are generally more sustainable than HDDs because they are more power efficient, primarily due to the lack of moving parts. This energy efficiency argument is heavily used in vendor marketing, specifically from storage vendors selling all-flash storage systems.”

But the research study changes things. “This is a fairly surprising result. The vast majority of vendors claim that SSDs are far more sustainable than HDDs, yet this report seems to show otherwise.”

Lewis suggests the authors may not have considered that “the technology refresh cycles of HDDs and SSDs have grown apart… A 4- or 5-year refresh cycle… is what is typically used for HDDs… [Longer vendor warranties] are allowing IT organizations to keep SSD devices for up to 10 years.”

Making that assumption gives the SSD a 10-year carbon cost of 209.2kg CO2e, which is much closer to the HDD figure, though still in excess of it. 

As SSDs increase their capacity (density) faster than HDDs “this difference in density may allow IT organizations to reduce their overall physical footprint, and therefore total emissions [and this] would make up for the difference between HDD and SSD emissions.”

“Still, the difference in the sustainability of SSDs and HDDs is likely much closer than many flash storage vendors would like to admit.” But in the future “denser devices that provide significant footprint consolidation may also improve the total carbon emissions in the datacenter,” meaning SSDs.

Samsung has 300-layer NAND coming, with 430 layers after that – report

Samsung is planning to build 300-layer 3D NAND, according to a paywalled DigiTimes Asia report.

This would make Samsung the second 3D NAND fabber, after SK hynix, to reach the 300-layer point. SK hynix announced its 321-layer NAND a few days ago at FMS 2023. Micron has 232-layer technology, SK hynix subsidiary Solidigm is at 192 layers, while Kioxia/WD are at the 218-layer level with their BiCS gen 8 technology. China’s YMTC has had its technology progress halted by US technology export controls. 

Blocks & Files table

Generally speaking, the more layers in 3D NAND, the higher the capacity of the die, assuming the cell dimensions don’t change, and the lower the production cost per TB of flash. This leads to fewer NAND chips being needed to produce flash drives at existing capacity levels, higher-capacity flash drives and, hopefully, a lower cost in $/TB terms.

The 300L Samsung V-NAND device will be made by string-stacking two 150-layer components (strings of cells) together. As more layers are added to a 3D NAND element on a flash wafer, it is necessary to etch holes (vias) down through the layers and line them with chemical substances as part of die fabrication. These need to be perpendicular to the plane of the wafer and keep a regular cross-section and shape as they penetrate the myriad layers involved.

As the layer count increases, it becomes more and more difficult to ensure these characteristics and the yield of the wafer, in terms of good dies versus bad dies, decreases. Stacking two 150-layer components will be easier in manufacturing terms than building a single 300-layer product, although manufacturing takes longer.

The SK hynix 321-layer product is formed from 3 strings, or plugs, stacked together according to a PC Watch photo of an SK hynix slide shown at FMS 2023. Each string has 107 layers. The existing hynix 238-layer technology has two strings, each with 119 layers.

Samsung’s 10th 3D NAND generation could be a 430-layer die, and that could also use three-string stacking technology.

Micron and WD/Kioxia will be encouraged to move to 300-layer technology because, unless they do, their production costs will be higher than both Samsung and SK hynix, putting them at a price disadvantage. Similarly, SK hynix subsidiary Solidigm will face the same pressures.

Hammerspace CEO pushes NFS SSD concept

David Flynn, Hammerspace

Hammerspace founder and CEO David Flynn has proposed an NFS SSD – an Ethernet-attached drive that speaks NFS. The Ethernet SSD concept has been floated before: Kioxia, for example, put a Marvell Ethernet controller into an SSD in 2020 and addressed it as an NVMe device using RoCE, and it had previously promoted an Ethernet SSD concept in 2018 as Toshiba Memory. 

The Hammerspace CEO says that when NFS clients access conventional NVMe SSDs, each NFS access passes through a chatty chain of up to nine PCIe and network connections. 

The X device is a PCIe switch.

This can be simplified. The DENTRY-to-INODE mapping (the DENTRY being the in-memory representation of a directory entry in Linux) can be carried out by the storage system CPU, separate from the client-to-NVMe SSD link chain. This mapping identifies where the data being accessed is located on the target SSD.

Hammerspace NFS SSD

The SSD is then accessed over Ethernet and carries out the mapping operations to find the file data’s address. Part of the file system now resides in the NFS SSD, which needs software running in its controller processor. The end result is only three connections between the accessing client and the NVMe SSD, which should lower cost and could improve data access performance and reduce write amplification.

With an array of NFS SSDs you could get parallel access, speeding up IO.
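
Purely as an illustration of that split – not Hammerspace’s or any vendor’s code – the sketch below separates path resolution from data access: a metadata service maps a path to device extents once, and the client then fetches those extents directly, in principle in parallel across an array of drives. All names and the in-memory “drives” are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class Extent:
    device: str      # network address of the NFS/Ethernet SSD
    offset: int      # byte offset on the device
    length: int

class MetadataService:
    """Stands in for the storage-system CPU doing the DENTRY/inode lookup."""
    def __init__(self, layout: dict[str, list[Extent]]):
        self.layout = layout

    def resolve(self, path: str) -> list[Extent]:
        return self.layout[path]

def read_file(meta: MetadataService, fetch, path: str) -> bytes:
    # One metadata round trip, then data flows client <-> drive directly;
    # with several drives the per-extent fetches could run in parallel.
    return b"".join(fetch(e.device, e.offset, e.length)
                    for e in meta.resolve(path))

# Toy demo: two "drives" backed by in-memory byte strings
drives = {"ssd-a": b"hello ", "ssd-b": b"world"}
meta = MetadataService({"/mnt/f": [Extent("ssd-a", 0, 6), Extent("ssd-b", 0, 5)]})
print(read_file(meta, lambda d, o, n: drives[d][o:o + n], "/mnt/f"))  # b'hello world'
```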

Flynn presented his ideas at the 2023 IEEE Massive Storage Systems and Technology (MSST) conference in Santa Clara. Analyst Tom Coughlin wrote about them here.

Bootnote

In November 2021 Kioxia launched EM6 SSDs accessed over RoCE NVMe-over-Fabric links and installed in a 24-slot EBOF — an Ethernet Bunch of Flash box — capable of pumping out 20 million random read IOPS. This provided block access to Ethernet SSDs. The EM6 is a native NVMe SSD fitted with a Marvell 88SN2400 NVMe-oF SSD converter controller that provides dual-port 256Gbit/sec Ethernet access. This product appears to have gone away; we’ve asked Kioxia to confirm this and the company said: “The EM6 is no longer available and it was never released to production.”

IBM developing S3 interface tape library

IBM is adding a server and software to its Diamondback tape library to build an on-premises S3 object storage archive.

The Diamondback (TS6000), introduced in October last year, is a single-frame tape library with up to 14 TS1170 tape drives and 1,458 LTO-9 tape cartridges, storing 27.8PB of raw data and 69.6PB with 2.5:1 compression. It transfers data at up to 400MBps per drive and has a maximum 17.2TB/hour transfer rate with 12 drives active.

IBM Diamondback S3

Diamondback S3 has an added x86 server, as the image shows, which provides the S3 interface and the S3-object-to-tape cartridge/track mapping. Client systems send Get (read) and Put (write) requests to Diamondback S3, and it reads S3 objects from, or writes them to, a tape cartridge mounted in one of the drives.
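
Because the front end speaks standard S3, clients need nothing tape-specific. A minimal boto3 sketch is below; the endpoint URL, credentials, and bucket are placeholders, and a Get may take far longer than on disk if the right cartridge has to be mounted first.

```python
import boto3

# Placeholder endpoint for the tape library's S3 server
s3 = boto3.client(
    "s3",
    endpoint_url="https://diamondback-s3.example.internal",
    aws_access_key_id="ARCHIVE_KEY",
    aws_secret_access_key="ARCHIVE_SECRET",
)

# Put: the server stages the object and writes it out to tape
with open("q2-results.parquet", "rb") as f:
    s3.put_object(Bucket="cold-archive",
                  Key="projects/2023/q2-results.parquet", Body=f)

# Get: may block while a cartridge is mounted and positioned
obj = s3.get_object(Bucket="cold-archive",
                    Key="projects/2023/q2-results.parquet")
data = obj["Body"].read()
```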

IBM’s Advanced Technology Group Tape Team is running an early access program for this Diamondback S3 tape library. Julien Demeulenaere, sales leader EMEA – Tape & High-End Storage, says Diamondback S3 will be a low-cost repository target for a secure copy of current or archive data. It will enable any user familiar with S3 to move their data to Diamondback S3. A storage architect can sign up for a 14-day shared trial on a Diamondback S3 managed by IBM, so they can verify the behavior of S3 for tape.

The S3-object-on-tape idea is not new, as seen with Germany’s PoINT Software and Systems and its Point Archival Gateway product. This provides unified object storage with software-defined S3 object storage for disk and tape, presenting their capacity in a single namespace. It is a combined disk plus tape archive product with disk random access speed and tape capacity.

Archiving systems supplier XenData has launched an appliance which makes a local tape copy of a public cloud archive to save on geo-replication and egress fees.

Quantum has an object-storage-on-tape tier added to its ActiveScale object storage system, providing an on-premises Amazon S3 Glacier-like managed service offering. SpectraLogic’s BlackPearl system can also provide an S3 interface to a backend tape library.

Diamondback S3 does for objects and tape what LTFS (Linear Tape File System) does for files and tape, with its file:folder interface to tape cartridges and libraries. Storing objects on tape should cost less than storing them on disk, once a sufficient amount has been put on the tapes – but at the cost of longer object read and write times. IBM suggests it costs a quarter as much as AWS Glacier with, of course, no data egress fees.

Demeulenaere told us: “There is no miracle, we can’t store bucket on tape natively. It’s just a software abstraction layer on the server which will present the data as S3 object to the user. So, from a user point of view, they just see a list of bucket and can only operate it with S3 standard command (get/put). But it is still files that are written by the tape drive. The server will be accessed exclusively through Ethernet; dual 100GB port for the S3 command, one GB Ethernet port for admin.

“The server is exclusively for object storage. It can’t be a file repository target. For that, you will need to buy the library alone (which is possible) and operate it as everybody is doing (FC, backup server).”

Data recovery service for corrupted WD SanDisk SSDs

Secure Data Recovery (SDR) is offering to rescue files up to 256KB from corrupted SanDisk Extreme SSDs for free. A $79.99 licence fee is required for larger file recoveries.

Some 2TB and 4TB Western Digital SanDisk Extreme Pro, Extreme Portable, Extreme Pro Portable and MyPassport SSDs, released in 2023, have suffered data loss due to a possible firmware bug. This has sparked a lawsuit which hopes to become a class action. Secure Data Recovery thinks it has a way to salvage most if not all of the data from such drives.

SDR reckons the firmware bug messes up allocation metadata which stops the SSD managing data distribution among its flash pages and cells. SDR says its “software detects traces of scrambled metadata and file indexes. Once located, it pieces the information together and retrieves the data.” It is providing a process to recover the lost data with users paying a license fee if files above a certain size are recovered. We haven’t tried this process or verified it and include it here for information purposes only. 

SanDisk Extreme Pro SSD.

The sequence of operations a user needs to carry out, after connecting the SSD to a PC or Mac host, is this:

  1. Download the Secure Recovery software (free).
  2. Select the location containing lost data – the C: drive for most users.
  3. Select Quick File Search or Full Scan.
  4. Choose the relevant file or folder.
  5. Continue with the relevant subfolder if applicable.
  6. Select the file from the list on the right.
  7. Click to select the output folder for recoverable files – the E: drive for most users.
  8. Click Recover in the bottom-right corner of the window. A status update will appear if successful.
  9. Click Finish in the bottom-right corner of the window.

SDR says that “sometimes, the parent folder structure is lost in the process. However, the files inside them are unaffected. Most subfolders remain intact. Recovered files from missing directories are assigned to a Lost and Found folder.”

The company is a SanDisk platinum partner for its recovery services under a no-data-no-recovery-fee deal. WD’s data recovery webpage lists four US-based recovery operators: Ontrack Data Recovery, DriveSavers Data Recovery, Datarecovery.com and Secure Data Recovery.

Storage news ticker – August 16

Ascend.io’s CEO/founder Sean Knapp says he believes that the data ingestion market won’t exist within a decade, because cloud data players will provide free connectors and different ways to connect to external sources without moving data. He thinks consolidation in the data stack industry is reaching new heights as standalone capabilities are getting absorbed into the major clouds.

SaaS app and backup provider AvePoint reported Q2 revenue of $64.9 million, up 16 percent year on year. SaaS revenue was $38.3 million, up 39 percent year on year,  and its total ARR was $236.2 million, up 26 percent. There was a loss of $7.1 million, better than the year-ago $11.1 million loss. It expects Q3 revenues to be $67.6 to $69.6 million, up 9 percent year on year at the mid-point.

Backblaze, which supplies cloud backup and general storage services, has hired Chris Opat as SVP for cloud operations. Backblaze has more than 500,000 customers and three billion gigabytes of data under management. Opat will oversee cloud strategy, platform engineering, and technology infrastructure, enabling Backblaze to scale capacity and improve performance and provide for the growing pool of larger-sized customers’ needs. Previously, he was SVP at StackPath, a specialized provider in edge technology and content delivery. He also spent time at CyrusOne, CompuCom, Cloudreach, and Bear Stearns/JPMorgan.

An IBM Research paper and presentation [PDF] proposes to decouple a file system client from its backend implementation by virtualizing it with an off-the-shelf DPU using the Linux virtio-fs/FUSE framework. The decoupling allows the offloading of the file system client execution to an ARM Linux DPU, which is managed and optimized by the cloud provider, while freeing the host CPU cycles. The proposed framework – DPFS, or DPU-powered File System Virtualization – claims to be 4.4× more CPU efficient per I/O, delivers comparable performance to a tenant with zero-configuration or modification to their host software stack, while allowing workload-specific backend optimizations. This is currently only available with the limited technical preview program of Nvidia BlueField.

MongoDB has launched Queryable Encryption, with which data can be kept encrypted while it is being searched. Customers select the fields in MongoDB databases that contain sensitive data needing to be encrypted while in use. With this, the content of the query and the data in the referenced field remain encrypted when traveling over the network, while stored in the database, and while the query processes the data to retrieve relevant information. The MongoDB Cryptography Research Group developed the underlying encryption technology behind Queryable Encryption, and it is open source.

MongoDB Queryable Encryption can be used with AWS Key Management Service, Microsoft Azure Key Vault, Google Cloud Key Management Service, and other services compliant with the key management interoperability protocol (KMIP) to manage cryptographic keys.
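
A hedged sketch of what this looks like from PyMongo follows; it assumes a MongoDB 7.0 Atlas or Enterprise deployment with the automatic-encryption shared library available, the helper names are as recalled from recent PyMongo releases and should be checked against the driver docs, and the local master key, namespace, and field choices are demo placeholders rather than a recommended setup.

```python
import os
from bson.codec_options import CodecOptions
from pymongo import MongoClient
from pymongo.encryption import AutoEncryptionOpts, ClientEncryption

kms_providers = {"local": {"key": os.urandom(96)}}   # demo-only local master key
key_vault_ns = "encryption.__keyVault"

# Declare which fields are encrypted and which queries they allow (equality on "ssn")
fields = {"fields": [
    {"path": "ssn", "bsonType": "string", "queries": [{"queryType": "equality"}]},
]}

plain = MongoClient()
ce = ClientEncryption(kms_providers, key_vault_ns, plain, CodecOptions())

# Creates the collection plus its data keys; returns the field map with key IDs filled in
_, ef_with_keys = ce.create_encrypted_collection(plain["hr"], "employees", fields,
                                                 kms_provider="local")

# An auto-encrypting client: values and queries on "ssn" are encrypted before leaving the app
client = MongoClient(auto_encryption_opts=AutoEncryptionOpts(
    kms_providers, key_vault_ns,
    encrypted_fields_map={"hr.employees": ef_with_keys}))

coll = client["hr"]["employees"]
coll.insert_one({"name": "Ada", "ssn": "123-45-6789"})    # ssn stored encrypted
print(coll.find_one({"ssn": "123-45-6789"}))              # equality match on encrypted data
```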

Nexsan – the StorCentric brand survivor after its Chapter 11 bankruptcy and February 2023 purchase by Serene Investment Management – has had a second good quarter after its successful Q1. It said it accelerated growth in Q2 by delivering on a backlog of orders that accumulated during restructuring after it was acquired. Nexsan had positive operational cash flow and saw growth, particularly in APAC, and recorded a 96 percent customer satisfaction rating in a recent independent survey.

CEO Dan Shimmerman said: “Looking ahead, we’re recruiting in many areas of the company, including key executive roles, and expanding our sales and go-to-market teams. Additionally, we’re working on roadmaps for all our product lines and expect to roll these out in the coming months.” 

Nyriad, which supplies UltraIO storage arrays with GPU-based controllers, is partnering with RackTop to combine its BrickStor SP cyber storage product with Nyriad’s array. The intent is to safeguard data from modern cyber attacks, offering a secure enterprise file location accessible via SMB and NFS protocols and enabling secure unstructured data services. The BrickStor Security Platform continually evaluates trust at the file level, while Nyriad’s UltraIO storage system ensures data integrity at the erasure coded block level. BrickStor SP grants or denies access to data in real time without any agents, detecting and mitigating cyberattacks to minimize their impact and reduce the blast radius. Simultaneously, the UltraIO storage system verifies block data integrity and dynamically recreates any failed blocks seamlessly, ensuring uninterrupted operations. More info here.

Cloud file services supplier Panzura has been ranked at 2,075 on the 2023 Inc. 5000 annual list of fastest-growing private companies in America, with a 271 percent increase year on year in its ARR. Last year Panzura was ranked 1,343 with 485 percent ARR growth. This year the Inc. 5000 list also mentions OwnBackup at 944 with 625 percent revenue growth, VAST Data at 2,190 with 254 percent growth, and Komprise at 2,571 with 212 percent growth. SingleStore and OpenDrives were both on the list last year but don’t appear this year.

Real-time database supplier Redis has upgraded its open source and enterprise products to Redis v7.2, adding enhanced storage of vector embeddings and a high-performance index and query search engine. It is previewing a scalable search feature that enables higher query throughput, including vector similarity search (VSS) and full-text search, exclusively as part of its commercial offerings. It blends sharding for seamless data expansion with efficient vertical scaling, ensuring optimal distributed processing across the cluster and improving query throughput by up to 16x compared with what was previously possible.
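
As an illustration of the vector side, here is a small redis-py sketch assuming a Redis Stack or Redis 7.2 instance with the search module loaded; the index layout, field names, and tiny four-dimensional vectors are invented for the example.

```python
import numpy as np
import redis
from redis.commands.search.field import TextField, VectorField
from redis.commands.search.indexDefinition import IndexDefinition, IndexType
from redis.commands.search.query import Query

r = redis.Redis()

# An HNSW vector index over hash keys prefixed "doc:"
r.ft("docs").create_index(
    [TextField("title"),
     VectorField("embedding", "HNSW",
                 {"TYPE": "FLOAT32", "DIM": 4, "DISTANCE_METRIC": "COSINE"})],
    definition=IndexDefinition(prefix=["doc:"], index_type=IndexType.HASH),
)

# Store a document with its embedding as packed float32 bytes
vec = np.array([0.1, 0.2, 0.3, 0.4], dtype=np.float32)
r.hset("doc:1", mapping={"title": "hello", "embedding": vec.tobytes()})

# K-nearest-neighbour query against the stored embeddings
q = (Query("*=>[KNN 3 @embedding $vec AS score]")
     .sort_by("score").return_fields("title", "score").dialect(2))
print(r.ft("docs").search(q, query_params={"vec": vec.tobytes()}))
```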

The latest version of the Redis Serialization Protocol (RESP3) is now supported across source-available, Redis Enterprise cloud and software products for the first time. Developers can now program, store, and execute Triggers and Functions within Redis using Javascript. With Auto Tiering, operators can keep heavily used data in memory and move less frequently needed data to SSD. Auto Tiering offers more than twice the throughput of the previous version while reducing the infrastructure costs of managing large datasets in DRAM by up to 70 percent.

The preview mode Redis Data Integration (RDI) transforms any dataset into real-time accessibility by seamlessly and incrementally bringing data from multiple sources to Redis. Customers can integrate with popular data sources such as Oracle Database, Postgres, MySQL, and Cassandra.

Silicon Motion has confirmed the termination of the merger agreement with MaxLinear and intends to pursue substantial damages in excess of the agreement’s termination fee due to MaxLinear’s willful and material breaches of the merger agreement. A July 26 MaxLinear statement said: “MaxLinear terminated the Merger Agreement on multiple grounds, including that Silicon Motion has experienced a material adverse effect and multiple additional contractual failures, all of which is clearly supported by the indisputable factual record. MaxLinear remains entirely confident in its decision to terminate the Agreement.” A July 26 8-K SEC filing contains a little more information.

HA and DR supplier SIOS has signed with ACP IT Solutions GmbH Dresden to distribute its products in Germany, Switzerland, and Austria.

SK hynix is mass-producing its 24GB LPDDR5X DRAM, the industry’s largest capacity.

Robert Scoble

Decentralized storage provider Storj has a partnership with Artificial Intelligence-driven mental health tech company MoodConnect. The two intend to unveil a state-of-the-art mental health tracker designed to help individuals and corporations capture, store, and share sentiment data securely. MoodConnect empowers users to track, store and share their own emotion data from conversations and to securely possess this data for personal introspection, to share with healthcare professionals, with friends and family, or to gauge organizational sentiment within a company. It will use Storj distributed storage. MoodConnect has appointed Robert Scoble as its AI advisor.