
Storage news ticker – October 10

Cloud and backup storage vendor Backblaze has hired Robert Fitt as Chief Human Resources Officer, its first such position. He will lead the company’s strategic advancement for all HR functions including talent management, culture, compensation and benefits, organizational design, health and wellness, diversity, equity, and inclusion efforts. Fitt’s career in HR spans more than 20 years, including leadership roles at Turntide Technologies, 360 Behavioral Health, Mobilite, Broadcom Corporation, and others.

Managed infrastructure provider 11:11 Systems has bought assets of bankrupt Sungard Availability Services, including its Cloud and Managed Services (CMS) business. This is 11:11's seventh acquisition, following purchases including Unitas Global's managed services and cloud infrastructure assets and iland's cyber resiliency and disaster recovery unit. It is backed by Tiger Infrastructure Partners, a middle-market private equity firm that invests in growing infrastructure platforms. Learn more at 1111Systems.com.

We hear that Airbyte is looking to aggressively build out its list of data connectors (currently around 150) and has made a new low-code SDK (alpha version) available, plus financial incentives for its user community. The goal is to have a large number of new connectors by the end of the month.

BittWare has announced three PCIe 5.0 CXL-capable accelerator cards based on Intel Agilex M-Series and I-Series FPGAs. One is the IA-860m accelerator card using the M-Series FPGA, with high-speed HBM2e DRAM integrated into the FPGA package. It is for high throughput and memory-intensive workloads and can be programmed using Intel oneAPI and Data Parallel C++ (DPC++). The other two are the single-width IA-640i and IA-440i accelerator cards using I-Series FPGAs. The IA-640i is designed for accelerating datacenter, networking, and edge-computing workloads. The entry-level and half-height IA-440i is a mini version of the IA-640i.  All three cards can be used as coherent hardware accelerators within datacenter servers, networks, and edge servers equipped with CXL-supporting CPUs. Adding CXL features to these three cards requires purchase and activation of a CXL IP license from either Intel or a third-party IP supplier. BittWare provides application reference designs for all three cards.

BittWare PCIe gen 5 accelerator cards

Research house DCIG has announced availability of its 2022-23 TOP 5 report on AWS Cloud Backup Solutions. It provides small and mid-sized enterprises with guidance on the best backup solutions for recovering applications, data, and workloads they host in AWS. The full report can be accessed at no charge, with registration, on DCIG's website.

We have learned that Neridio’s CloudRAID does encryption, compression, and deduplication in addition to Reed-Solomon (RS) erasure coding. IO performance takes a hit of roughly three times that of standard IO. The company told us: “RS coding and encryption always introduces a latency tax, which is one reason why it was not adopted by primary storage by the industry.” Neridio’s software is “in the latency tolerant path, as we are not dealing with primary storage. We are only focusing on safety and security of a Gold copy, data foundation, which is Tier 2 or secondary storage tier… We are targeting those markets where security, cyber resilience, and an attack-proof data layout is more important than performance.” Neridio’s software distributes the workloads in parallel. It has Active Storage Threat response against ransomware attacks, and storage security in motion with Exclusive Path routing at the content level, it said.
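To make the latency-tax point concrete, here is a minimal sketch of a compress → encrypt → Reed-Solomon-encode pipeline. It is a generic illustration, not Neridio's implementation, and assumes the third-party Python packages cryptography and reedsolo are installed.

```python
# Generic sketch of a compress -> encrypt -> Reed-Solomon encode pipeline,
# showing why each stage adds latency. NOT Neridio's implementation; assumes
# the third-party 'cryptography' and 'reedsolo' packages are installed.
import zlib
from cryptography.fernet import Fernet
from reedsolo import RSCodec

key = Fernet.generate_key()
fernet = Fernet(key)
rsc = RSCodec(32)   # 32 parity bytes per codeword: corrects up to 16 byte errors

def protect(data: bytes) -> bytes:
    compressed = zlib.compress(data)        # reduce size before encryption
    encrypted = fernet.encrypt(compressed)  # confidentiality
    return bytes(rsc.encode(encrypted))     # add Reed-Solomon parity for resilience

def recover(blob: bytes) -> bytes:
    decoded = rsc.decode(bytearray(blob))[0]  # recent reedsolo returns (msg, msg+ecc, errata)
    return zlib.decompress(fernet.decrypt(bytes(decoded)))

original = b"gold copy block" * 100
stored = protect(original)
assert recover(stored) == original
```

Each of the three stages runs on every IO, which is where the extra latency relative to a plain read or write comes from.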

Ocient has said its Hyperscale Data Warehouse is generally available in the AWS Marketplace. Customers can deal directly with Ocient in AWS Marketplace, or with Ocient partners and resellers who offer Ocient deployed on AWS to their network of customers.

At its Samsung Tech Day 2022 event, Samsung said it expects to build 3D NAND with more than 1,000 layers by 2030. Jung-bae Lee, Head of Memory Business, said: “One trillion gigabytes is the total amount of memory Samsung has made since its beginning over 40 years ago. About half of that trillion was produced in the last three years alone.” Its most recent 512Gb eighth-generation V-NAND features a bit density improvement of 42 percent, the industry’s highest among 512Gb triple-level cell (TLC) memory products. The world’s highest capacity 1Tb TLC V-NAND will be available by the end of the year. Ninth-generation V-NAND (TLC and QLC) is under development and slated for mass production in 2024. Samsung will accelerate the transition to quad-level cell (QLC) while enhancing power efficiency.

Blocks & Files table of 3D NAND supplier layer count generations

Western Digital’s SanDisk has announced the Pro-G40 external SSD with a single port for either USB-C or Thunderbolt 3. It’s a rugged drive capable of withstanding dust, dirt, sand, and submersion at a depth of 1.5 meters (about five feet) for up to 30 minutes. It can withstand up to 4,000 pounds of pressure and survive a fall of up to three meters (about 9.8 feet) onto a concrete floor. It can reach up to 2,700 MB/sec read and 1,900 MB/sec write speeds over Thunderbolt 3. USB 3.2 Gen 2 provides a 10Gbit/s interface, a quarter of Thunderbolt 3’s 40Gbit/s, so transfers over USB will be slower. The Pro-G40 is available at $300 for 1TB and $450 for 2TB, with a five-year limited warranty.

SanDisk Pro-G40 external SSD

IBM’s Spectrum Scale distributed parallel file system can now be used with the IBM Cloud. We understand that, from a tile in the IBM Cloud catalog, customers can fill out a configuration page specifying the parameters of their desired storage and compute clusters. They then click Start, and a process using IBM Cloud Schematics, Terraform, and Ansible deploys and configures many IBM Cloud VPC resources, culminating in less than an hour in cloud-based high-performance parallel filesystem storage and compute clusters that are ready to run, we are told.

SIOS has said its LifeKeeper for Linux clustering software has achieved SAP recertification for integration with both NetWeaver and S/4HANA. SIOS’s specialized Application Recovery Kit (ARK) modules come with LifeKeeper to provide application-specific intelligence and automation of configuration steps. This enables configuration of HANA clusters without the complexity of error-prone manual scripting. ARKs also ensure that cluster failovers maintain SAP best practices for fast, reliable recovery of operations, SIOS said.

China’s TerraMaster has launched the new two-bay F2-223 and four-bay F4-223 NAS products with TRAID, with upgraded specs including Intel’s Celeron N4505 dual-core processor and the latest TOS 5 operating system, which offers more than 50 new functions and 600 improvements over the previous generation.

TRAID offers flexible use of disk space, redundancy protection against hard disk failure, and automatic capacity expansion. This elastic strategy provides higher disk space utilization than traditional RAID modes, we are told. Storage space can be expanded by replacing a hard disk with a larger-capacity model and/or increasing the number of hard disks. TRAID allows a maximum of one hard disk to fail. You can migrate TRAID to TRAID+, with redundant protection against two hard drive failures, by upping the number of hard drives.
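A rough illustration of why an elastic layout utilizes mixed-size disks better than classic RAID 5: with one disk of redundancy, classic RAID 5 truncates every member to the smallest disk, while an elastic scheme can roughly use everything except the largest disk. This is an approximation of TRAID-like behavior, not TerraMaster's published algorithm.

```python
# Illustrative comparison of usable capacity with mixed-size disks (TB).
# Approximation of elastic single-redundancy behavior, not TerraMaster's algorithm.
def raid5_usable(disks):
    return (len(disks) - 1) * min(disks)       # every disk truncated to the smallest

def elastic_usable(disks):
    return sum(disks) - max(disks)             # rough upper bound with one-disk redundancy

disks = [4, 8, 8, 12]                          # TB
print("RAID 5 :", raid5_usable(disks), "TB usable")    # 12 TB
print("Elastic:", elastic_usable(disks), "TB usable")  # 20 TB
```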

The two-bay TerraMaster F2-223 is available in the US via Amazon for $299.99 and the 4-bay F4-223 for $439.99.

The US Commerce Department will likely deny requests by US suppliers to send equipment to Chinese firms like YMTC and ChangXin Memory Technologies if they are making advanced DRAM or flash memory chips. License requests to sell equipment to foreign companies making advanced memory chips in China will be reviewed on a case-by-case basis. US suppliers seeking to ship equipment to China-based semiconductor companies would not need a license if selling to outfits producing DRAM chips above the 18 nanometer node, NAND flash chips below 128 layers, or logic chips above 14 nanometers. This would likely affect YMTC, which competes directly with Micron and Western Digital.

NIS2

NIS2 – The EU cybersecurity Network and Information Security 2 directive updates the 2016 NIS directive and takes effect on October 18, 2024. It has stricter security requirements, faster incident reporting, a focus on supply chain security, harsher penalties for non-compliant organizations, harmonized rules across the EU, and better member state information sharing. NIS2 aims to improve the cyber-resilience of critical infrastructure and services across the European Union. Operators of NIS-affected services have to set up risk management practices and report significant incidents. EU member states have to set up national cybersecurity strategies and Computer Security Incident Response Teams (CSIRTs), and there is an EU-wide Cooperation Group to encourage cybersecurity information sharing.

DORA

DORA – The EU’s Digital Operational Resilience Act is a set of EU regulations to enhance the cyber resilience of financial institutions aiming to ensure they can continue to function during cyberattacks or other potentially disastrous IT incidents. It is scheduled to come into force from January 2025. It sets standards for managing cybersecurity risks, incident reporting, and digital resilience for banks, insurers, payment service providers, and other financial entities. DORA emphasizes harmonized rules across the EU, covering risk management frameworks, third-party ICT service providers, and regulatory oversight. Its goal is to ensure financial firms can withstand, respond to, and recover from ICT-related disruptions and cyber threats.

NeMo

Nemo – NeMo (Neural Modules) is an Nvidia framework and toolkit designed to build, train, and deploy large-scale language models, speech recognition, and generative AI models. It’s specifically tailored for natural language processing (NLP), automatic speech recognition (ASR), and text-to-speech applications. NeMo provides pre-trained models, customizable pipelines, and tools for fine-tuning, allowing developers to integrate models into various applications such as chatbots, virtual assistants, and speech translation systems.
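As a flavor of how the toolkit is used, here is a minimal sketch of transcribing audio with a pre-trained NeMo ASR model. It assumes the nemo_toolkit[asr] package is installed and a local sample.wav file exists; the model name and the exact transcribe() signature may vary between NeMo releases, so check the current documentation.

```python
# Minimal sketch: speech-to-text with a pre-trained NeMo ASR model.
# Assumes nemo_toolkit[asr] is installed; model name and transcribe()
# signature may differ between NeMo releases.
import nemo.collections.asr as nemo_asr

# Download a pre-trained CTC speech recognition model
asr_model = nemo_asr.models.EncDecCTCModel.from_pretrained("QuartzNet15x5Base-En")

# Transcribe one or more 16kHz mono WAV files
transcripts = asr_model.transcribe(["sample.wav"])
print(transcripts[0])
```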

NeMo should not be confused with Nim, the general-purpose, statically typed programming language, nor is it the same thing as Nvidia’s NIM inference microservices (see below); NeMo is a toolkit for building, training, and deploying large-scale AI models.

LDPC

LDPC – Low-Density Parity Check – a coding algorithm that uses linear error correcting codes (ECC) featuring iterative belief propagation decoding to detect and correct errors in read or transmitted information. It is said to be suitable for error correction with large block sizes transmitted via very noisy channels. Decoders feature parallelism. LDPC code has been used in commercial hard disk drives and SSDs. 
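Full LDPC decoding is involved, but the underlying parity-check relationship is simple: a received word is a valid codeword when the sparse parity-check matrix H multiplied by the word is all-zero modulo 2. The toy numpy sketch below shows that syndrome check with a small example matrix; real LDPC codes use far larger, sparser matrices and iterative belief-propagation decoding to correct the errors the syndrome flags.

```python
# Toy syndrome check for a parity-check code: a received word r is a valid
# codeword iff H @ r == 0 (mod 2). Real LDPC codes use much larger, sparser
# H matrices and iterative belief-propagation decoding.
import numpy as np

H = np.array([[1, 1, 0, 1, 0, 0],    # small example parity-check matrix
              [0, 1, 1, 0, 1, 0],
              [1, 0, 1, 0, 0, 1]])

def syndrome(word):
    return H.dot(word) % 2           # all-zero syndrome means "no error detected"

codeword = np.array([1, 0, 1, 1, 1, 0])
print(syndrome(codeword))            # [0 0 0] - the word satisfies all checks

received = codeword.copy()
received[2] ^= 1                     # flip one bit to simulate a read error
print(syndrome(received))            # non-zero syndrome flags the error
```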

NIM

NIM – Nvidia Inference Microservices – A software library of containerized services created by GPU hardware and software supplier Nvidia to package and serve generative AI large language models (LLMs) and other models for inference. These prebuilt containers support a broad spectrum of AI models—from open-source community models to Nvidia AI Foundation models, as well as custom AI models. NIM microservices are deployed with a single command for integration into enterprise-grade AI applications using standard APIs and just a few lines of code. They are built on foundations including inference engines like Triton Inference Server, TensorRT, TensorRT-LLM, and PyTorch, and NIM is engineered to facilitate seamless AI inferencing at scale, ensuring that you can deploy AI applications on-premises or in the cloud.
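To illustrate the "standard APIs and just a few lines of code" point, here is a sketch of calling a locally deployed NIM LLM container over an OpenAI-compatible REST endpoint. The port, path, and model identifier below are placeholders; check the documentation for the specific NIM container you deploy.

```python
# Sketch of calling a locally deployed NIM LLM container via an
# OpenAI-compatible endpoint. Port, path, and model name are placeholders.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",   # assumed local NIM endpoint
    json={
        "model": "meta/llama3-8b-instruct",        # example model identifier
        "messages": [{"role": "user", "content": "Summarize CXL memory pooling."}],
        "max_tokens": 128,
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```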

Nvidia NIM Agent Blueprints is a catalog of pretrained, customizable AI workflows that equip enterprise developers with a suite of software for building and deploying generative AI applications for canonical use cases, such as customer service avatars, retrieval-augmented generation and drug discovery virtual screening. They include sample applications built with NeMo, NIM and partner microservices, reference code, customization documentation and a Helm chart for deployment.

Higher capacity disk drives lead to lower unit volumes shipped

TrendFocus data for calendar Q3 disk drive shipments show that about two-fifths fewer drives were shipped than a year ago as big cloud buyers moved to 18TB drives.

The industry’s three suppliers shipped around 38.6 million drives, according to TrendFocus’ preliminary estimates for the quarter, which would be 42 percent fewer units than a year ago and down 14 percent on the previous quarter. 

TrendFocus splits the data into nearline (high-capacity 3.5-inch) drives, mission-critical (10K rpm 2.5-inch) drives, and client PC and consumer electronics (retail) shipments. Nearline drives represent 80 percent of the total drives shipped and 71 percent of HDD industry revenue. Wells Fargo analyst Aaron Rakers tracks this, and his chart shows the inexorable rise of the nearline disk drive category:

A plateau effect can be seen in the capacity shipped from 2021’s Q2 onwards. Large-scale buyers of high-capacity disk drives buy capacity rather than units, and higher-capacity drives lead to fewer units shipped unless overall capacity demand rises faster than per-drive capacity.

That has not been happening recently, and Rakers said TrendFocus “noted a strong positive mix shift with cloud companies moving to >18TB [which], along with lower mid-capacity model shipments to OEMs, resulted in ~250EB of capacity shipped, down ~2 percent y/y and q/q.”
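A back-of-the-envelope calculation makes the point: if exabyte demand holds roughly flat while the average drive gets bigger, unit shipments mechanically fall. The average-capacity figures below are illustrative, not TrendFocus data.

```python
# Back-of-the-envelope illustration: flat capacity demand plus rising
# per-drive capacity means fewer units shipped. Numbers are illustrative.
capacity_demand_tb = 250 * 1_000_000          # ~250EB expressed in TB
for avg_drive_tb in (14, 16, 18):
    units = capacity_demand_tb / avg_drive_tb
    print(f"{avg_drive_tb}TB average drive -> ~{units / 1e6:.1f}M drives shipped")
# 14TB -> ~17.9M, 16TB -> ~15.6M, 18TB -> ~13.9M
```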

There were some 2 million mission-critical drives shipped in the quarter, down 46 percent annually although up 27 percent quarter-on-quarter. Shipments in the 2.5-inch mobile and CE markets of 8.5-9 million also declined on a quarterly basis, by 10-15 percent, the same percentage fall as 3.5-inch desktop PC and CE shipments of 12.5 million.

We’ve tabulated and charted supplier shipment shares:

Note that TrendFocus retroactively corrects its preliminary numbers.

Toshiba’s curve has turned up while both Seagate and WD have turned down. 

The biggest threat to the HDD industry is that penta-level cell (PLC) NAND – with 5 bits per cell – could become cheap enough, with high enough endurance, to start replacing nearline drives used for backup and archiving. At the moment SSDs carry a near-5x price premium per TB compared to disk drives – see this Gartner chart – and PLC SSDs are still in the future, with their properties unknown.

Storage and the Supercloud

Analyst firm Wikibon has defined a new layer of technology – the Supercloud – and storage has a role within it.

Update: Dave Vellante note added at the end regarding the many contributors to the Supercloud concept. 8 October 2022.

Wikibon has been working on the Supercloud concept for some months with IT suppliers and analysts to come up with a definition of what it is and what it means for product and service suppliers.

Late last month analyst Dave Vellante wrote: “Supercloud is a term introduced to describe an evolving architecture in computing… It is a natural evolution of today’s multi-cloud and hybrid computing models.”

Dave Vellante

The Supercloud “comprises a set of services abstracted from the underlying primitives of hyperscale clouds (e.g. compute, storage, networking, security, and other native resources) to create a global system spanning more than one cloud.”

It has three properties. Firstly, it runs as a set of services across more than one cloud. Secondly, it has a purpose-built SuperPaaS (Platform-as-a-Service) layer that abstracts the underlying primitives of the native PaaS layer within each cloud and creates a common experience across clouds for developers, operators, users and/or ecosystem partners. 

The third property is that it has metadata intelligence that has an awareness of cost, latency, bandwidth, governance, security, data sovereignty, or other attributes in each supported cloud platform. This can be used to “run workloads efficiently across federated cloud platforms, and explicitly serves the intended purpose of the Supercloud.” 

Vellante sees three ways a Supercloud can be deployed: 

  • Single cloud instantiation with a control plane running its service on one cloud but supporting data plane interactions with more than one other cloud.
  • Multi-cloud, multi-region instantiation in which a “full stack of services is instantiated on individual clouds and regions. A unified interface supports interactions across more than one cloud.”
  • Global instantiation – “a single global instantiation of services spans multiple cloud provider regions. An example is a data platform that enables governed and secure data sharing across clouds and regions. Snowflake and Oracle Database Service for Microsoft Azure are examples.”

 The Supercloud can be consumed in three service models. One is Infrastructure-as-a-Service (IaaS) and this is “the ability to provision a service including compute, storage, networking, security or other computing resources across multiple clouds and on which workloads can be provisioned and managed without knowledge of the underlying cloud infrastructure.” Vellante identifies NetApp’s Cloud Volumes service as an example. 

A second is Platform-as-a-Service. “The developer and operational experience is identical across clouds with no need to manage the underlying compute, storage, network and security controls of the cloud provider.” VMware Cloud Foundation is the quoted example.

The third service model is Software-as-a-Service in which “users access applications from a Web browser or mobile application that invokes services running in more than one cloud. The user has no knowledge or control over the underlying cloud infrastructure.” SAP HANA is the example for this model.

Why do we need a new architecture for the hybrid multi-cloud world? Vellante writes: “Supercloud is an attempt to describe a new architecture that integrates infrastructure, unique platform attributes and software to solve specific problems that public cloud vendors aren’t directly addressing.”

There is little incentive for individual public cloud service suppliers to support other clouds, but users would like freedom from cloud lock-in, and application and data portability between public clouds. Vellante writes: “Hyperscale clouds are walled gardens and generally cloud providers want to keep data in their clouds.”

The Supercloud concept is a work in progress and Wikibon says it will continue to evolve it.

Bootnote.

Dave Vellante told me: “I do want to point out that I didn’t do this alone and don’t deserve all the credit. There were *many* contributors – I was a catalyst.”

AWS sets up Lustre-based caching filesystem

AWS is running a Lustre-based caching filesystem to provide fast file access for cloud compute that needs to process distributed file and object datasets, including on-premises ones.

Update: AWS’s Sébastien Stormacq has updated the pricing section of his blog. 8 October 2022.

Amazon File Cache has a POSIX interface to NFS v3-accessed origin files that can be on-premises or in the public cloud in one or more regions, and also to S3 buckets which store object data.

Sébastien Stormacq, AWS

AWS Principal Developer Advocate Sébastien Stormacq writes that Amazon File Cache “transparently loads file content and metadata (such as the file name, size, and permissions) from the origin and presents it to your applications as a traditional file system. File Cache automatically releases the less recently used cached files to ensure the most active files are available in the cache for your applications.” 

It uses a parallel Lustre filesystem behind the scenes and a Lustre client needs to be downloaded to your AWS account to set up the file cache.

There can be up to eight NFS filesystems or eight S3 buckets to a cache – it has to be uniformly NFS or S3 – and they are exposed or presented as a unified set of files and directories. Stormacq says: “The connection between File Cache and your on-premises infrastructure uses your existing network connection, based on AWS Direct Connect and/or Site-to-Site VPN.”
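For a sense of what creating a cache looks like programmatically, here is a hedged boto3 sketch using the FSx CreateFileCache API to link an NFS v3 origin. The parameter names reflect my reading of that API and the values are placeholders; verify both against the current AWS documentation before using.

```python
# Sketch of creating an Amazon File Cache with an NFS v3 origin via boto3.
# Parameter names follow my reading of the FSx CreateFileCache API and the
# values are placeholders - verify against the current AWS documentation.
import boto3

fsx = boto3.client("fsx", region_name="us-east-1")

response = fsx.create_file_cache(
    FileCacheType="LUSTRE",
    FileCacheTypeVersion="2.12",
    StorageCapacity=1200,                      # GiB; the 1.2TiB starting size
    SubnetIds=["subnet-0123456789abcdef0"],    # placeholder subnet
    LustreConfiguration={
        "DeploymentType": "CACHE_1",
        "PerUnitStorageThroughput": 1000,
        "MetadataConfiguration": {"StorageCapacity": 2400},
    },
    DataRepositoryAssociations=[{
        "FileCachePath": "/ns1",                               # path exposed inside the cache
        "DataRepositoryPath": "nfs://10.0.0.5/export/data",    # on-prem NFS origin
        "NFS": {"Version": "NFS3"},
    }],
)
print(response["FileCache"]["FileCacheId"])
```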

There are two options for uploading data from the origin sources to the file cache. Stormacq says: “Lazy load imports data on demand if it’s not already cached, and preload imports data at user request before you start your workload. Lazy loading is the default.”

The cached data can be accessed for processing by AWS compute services (instances) in containers or virtual machines. According to Stormacq: “Applications benefit from consistent, sub-millisecond latencies, up to hundreds of GB/sec of throughput, and up to millions of operations per second.” The performance depends upon the size of the cache, with bigger being better for throughput, and it scales from a starting 1.2TiB (1.32TB) up to the pebibyte level in 2.4TiB increments.

Stormacq’s blog has demos of him setting up the file cache using two Amazon FSx for OpenZFS file systems. He points out: “File Cache encrypts data at rest and supports encryption of data in transit. Your data is always encrypted at rest using keys managed in AWS Key Management Service (AWS KMS). You can use either service-owned keys or your own keys (customer-managed CMKs).”

The pricing is complex. AWS bills users for the provisioned cache storage capacity and metadata storage capacity and details can be found on a pricing page. Stormacq told us: “We do not charge S3 and Direct Connect and network transfer charges. These are all costs that depends on options chosen. If [the] customer use S3 they will be charged for S3 storage and data transfer. If they use their on-prem NFS server with a DX connection, they will be charged for DX etc.” Enjoy working this out.

File Cache is available in US East (Ohio), US East (N Virginia), US West (Oregon), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Canada (Central), Europe (Frankfurt), Europe (Ireland), and Europe (London).

Dell has Liqid route to CXL memory pooling


Dell has shown how its MX7000 composable server chassis can be used with Liqid technology to add PCIe gen 4-connected GPUs and other accelerators to the composable systems mix, with an open road to still faster PCIe gen 5, CXL, and external pooled memory.

The four-year-old MX7000 is an 8-bay, 7RU chassis holding PowerEdge MX server sleds (aka blades) that can be composed into systems with Fibre Channel or Ethernet-connected storage. The servers connect directly to IO modules instead of via a mid-plane, and these IO modules can be updated independently of the servers. Cue Liqid upgrading its IO modules to PCIe gen 4.

Kevin Houston, Dell

Liqid has supported the MX7000 since August 2020, with PCIe gen 3 connectivity to GPUs and other devices via a PCIe switch. Kevin Houston, a Dell principal engineer and Field CTO, writes: “The original iteration of this design incorporated a large 7U expansion chassis built upon PCIe Gen 3.0. This design was innovative, but with the introduction of PCIe Gen 4.0 by Intel, it needed an update. We now have one.”

He showed a schematic of such a system: 

Dell Liqid diagram

The MX7000 chassis is at the top with eight upright server sleds inside it. A Liqid IO module is highlighted: a PCIe HBA (LQD1416) wired to a Liqid 48-port PCIe gen 4 fabric switch. This connects to a Liqid PCIe gen 4 EX-4400 expansion chassis, which can hold either 10 Gen 4 x16 full-height, double-wide (EX-4410) or 20 Gen 4 x8 full-height, single-wide (EX-4420) accelerators.

The accelerator devices can be GPUs (Nvidia V100, A100, RTX, and T4), FPGAs, SSD add-in cards or NICs.

Houston writes: “Essentially, any blade server can have access to any [accelerator] device.  The magic, though, is in the Liqid Command Center software, which orchestrates how the devices are divided up over [PCIe].”

Liqid’s Matrix software allocates accelerators to servers, with up to 20 GPUs allocated across the eight servers in any combination, even down to 20 GPUs to a single server.

Comment

It seems to us at Blocks & Files that this MX7000 architecture and Liqid partnership means that PCIe gen 5, twice as fast as PCIe gen 4, could be adopted, opening the way to CXL 2.0 and memory pooling.

This would require Dell to equip the MX7000 with PowerEdge servers using Sapphire Rapids (4th Gen Xeon SP) processors – or PCIe gen 5-supporting AMD CPUs. Then Liqid would need a PCIe gen 5 HBA and switch. Once at this stage, it could provide CXL support and memory pooling with CXL 2.0.

When memory pools exist on CXL fabrics, composability software will be needed to dynamically allocate pooled memory to servers. Suppliers like Dell, HPE, Lenovo, Supermicro, etc. could outsource that to third parties such as Liqid, or decide that the technology is core to their products and build it, acquire it, or OEM it.
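To show what that allocation job involves conceptually, here is a toy sketch of the bookkeeping such composability software has to do: track free capacity on each pooled device and bind slices of it to servers on demand. This is purely illustrative, not Liqid's or any vendor's software.

```python
# Toy illustration of pooled-memory bookkeeping for a composability layer:
# track free capacity per pooled CXL device and bind slices to servers.
# Purely conceptual - not any vendor's actual software.
class MemoryPool:
    def __init__(self, devices):                  # e.g. {"cxl-dev-0": 512} in GB
        self.free = dict(devices)
        self.bindings = {}                        # server -> [(device, GB), ...]

    def allocate(self, server, gb_needed):
        grants = []
        for dev, avail in self.free.items():
            if gb_needed == 0:
                break
            take = min(avail, gb_needed)
            if take:
                self.free[dev] -= take
                gb_needed -= take
                grants.append((dev, take))
        if gb_needed:                             # roll back if the pool can't satisfy
            for dev, take in grants:
                self.free[dev] += take
            raise RuntimeError("insufficient pooled memory")
        self.bindings.setdefault(server, []).extend(grants)
        return grants

pool = MemoryPool({"cxl-dev-0": 512, "cxl-dev-1": 512})
print(pool.allocate("server-3", 768))   # [('cxl-dev-0', 512), ('cxl-dev-1', 256)]
```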

CXL memory pooling looks likely to be the boost that composability needs to enter mainstream enterprise computing and support use cases such as extremely large machine learning models. How the public cloud suppliers will use memory pooling, both internally and externally, as memory-pooled compute instances, is an interesting topic to consider. 

LRU

LRU – Least Recently Used is a cache replacement algorithm that evicts the least recently used cache entries when the cache is full. That means an LRU-using system has to timestamp cache entry accesses, or use counters, both of which add complexity and may need hardware support. First in, first out (FIFO) is an alternative cache entry replacement method.
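A minimal software sketch of the idea, using Python's OrderedDict so the dictionary's recency order stands in for the timestamps or counters a hardware cache would need:

```python
# Minimal LRU cache: an OrderedDict keeps entries in recency order, so the
# least recently used item is the one at the front when we need to evict.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)          # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict the least recently used entry

cache = LRUCache(2)
cache.put("a", 1); cache.put("b", 2)
cache.get("a")                                 # touch "a" so "b" is now LRU
cache.put("c", 3)                              # evicts "b"
print(cache.get("b"))                          # None
```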

Hammerspace extends Global Data Environment

Hammerspace has extended its Global Data Environment so that IT operations can set global data policies applying to all data regardless of vendor storage or where it is located, and users can create and act on custom metadata from their desktops.

The Global Data Environment is an abstraction layer, based on a parallel global filesystem, covering file and object storage on external filers, object systems, and hyperconverged infrastructure, both on-premises and in the public clouds. Users and IT admins operate within it as if they were operating the underlying storage, without needing to know the intricacies involved in touching the storage layer components. The latest announcement makes it easier for both users and IT operations (ITOps) staff to operate in this environment.

Hammerspace architecture

The new components in v4.6.6 of the software include:

  • Metadata plugin
  • User initiated file protection
  • Automated file reservation
  • Global audit
  • User-level search for files in GDE via integration with Alchemi Data Elasticsearch

The Metadata Plugin enables users to add their own custom tags to files and also manage metadata-driven workloads from their Windows desktops. Linux and macOS support will come in a future release. Such custom metadata can enrich files to trigger downstream workflows, data protection activity, and other actions at a file-granular level. For example, a file can be tagged as containing personally identifiable information which needs masking if generally available copies are made. Cost center information could also be added to departmental files.
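As a generic illustration of the tag-then-act idea (this is not Hammerspace's API), the sketch below attaches a custom PII flag to a file via Linux extended attributes and has a downstream step act on it:

```python
# Generic illustration of file-level custom metadata driving a workflow,
# using Linux extended attributes. NOT Hammerspace's API - just a sketch of
# tagging a file and acting on the tag downstream.
import os

path = "report.csv"
open(path, "a").close()                              # ensure the file exists

# Tag the file as containing personally identifiable information
os.setxattr(path, "user.contains_pii", b"true")      # Linux-only xattr call

def downstream_workflow(path):
    try:
        pii = os.getxattr(path, "user.contains_pii") == b"true"
    except OSError:
        pii = False
    if pii:
        print(f"{path}: masking required before copies are distributed")
    else:
        print(f"{path}: no special handling")

downstream_workflow(path)
```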

Hammerspace screen grab

The custom metadata could automatically trigger workflows, such as a report run, saving ITOps effort.

If a file is corrupted by malware, for example, users themselves can roll back to a previous version via Windows desktop mouse clicks (again, Linux and macOS support is coming). The users can roll back to the previous version of the file they need, no matter which storage system the file is stored on within the Global Data Environment. There is no need for ITOps help.

Users can also reserve files which are likely to have shared write access when write updates are being made. This is designed to avoid write collisions.

The Hammerspace software supports system ACLs (Access Control Lists) across both SMB and NFS to create a Global Audit log of filesystem operations such as file/folder deletes, renames, and other actions. This enables persistent system ACLs to be applied across the Global Data Environment, regardless of the storage type or location on which the file instances reside. Hammerspace software manages data placement across different sites behind the file system, ensuring that security enforcement is not broken by moving or copying data to other sites or platforms.

The user-level search facility means users can run keyword searches, via Alchemi Data’s Elasticsearch, to find files and documents they need. Alchemi Data can also scan documents, including MS Word and PDF files, to locate user-defined keywords. Hammerspace can harvest these keywords and add them to its metadata, making it easier to manage and access files for projects, research, organization, corporate governance and compliance.  

Hammerspace chart showing distributed teams’ access to a range of underlying storage vaults

Komprise

Hammerspace competitor Komprise recently announced user-level extensions to its feature and function set. We can position the two separately by saying that Komprise approaches the global file data problem from a system-wide angle of tiering and moving file (unstructured) data to lower-cost storage, whereas Hammerspace’s starting point is the need for a global data environment for all data, primary and secondary, with lifecycle management a secondary aspect of this.