
Retrospect backup platform adds cloud storage

StorCentric Retrospect

StorCentric’s Retrospect has announced its own Storage Cloud and integrated it as a backup target into Retrospect Backup v19. It has also updated its VMware and Hyper-V virtual machine backup and restore facility to Retrospect Virtual 2022.

There are several cloud storage offerings used as backup targets, such as Amazon’s S3, Azure Blob, Backblaze B2, and Wasabi. Several backup-as-a-service suppliers sell their own cloud and examples include Clumio (AWS), Datto, and Druva (based on AWS). Generally speaking, backup industry practice is not to have a tightly coupled cloud storage target. 

General manager JG Heithcock said: “Retrospect Cloud delivers a seamless one-click cloud backup and ransomware protection experience to our customers with a global footprint to minimize latency and adhere to local data regulations.” He said Retrospect’s SMB customers will appreciate having a single subscription covering both the backup software and the cloud storage (perpetual licensing is still available as an option).

Retrospect has not actually built its own cloud infrastructure and is relying on the Wasabi S3-compatible cloud, with 13 datacenters around the globe. Each one will be certified. Wasabi positions its cloud as a high-availability offering significantly less expensive than AWS. This contract with Retrospect probably helps explain why Wasabi has been able to raise extra funding for its cloud development.

In effect, Retrospect is reselling the Wasabi cloud as an integral part of Retrospect Backup 19 and Virtual 2022. We may see similar arrangements from other backup suppliers, or even more such deals with backup suppliers by Wasabi.

Retrospect says its cloud has a major focus on combating ransomware, with security-focused features such as immutable backups, anomaly detection, multi-factor authentication, and AES-256 at-rest encryption. Sensitive data can be backed up to Retrospect Cloud Storage and is guaranteed to remain private from the underlying infrastructure providers.

Heithcock told us: “We say, hey, you’ve been doing backups for a month. And normally, you backup 5, 10 percent of your files. Today, you’re wanting to backup 80 percent. Something’s weird about that, right? That’s not normal, probably ransomware. And so we will flag that backup and send out alerts and emails.”
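
The heuristic Heithcock describes can be pictured as a simple ratio check: compare the share of files changed in today's backup against the historical norm and flag a large spike. The sketch below is illustrative only (thresholds, function names, and data shapes are assumptions), not Retrospect's actual implementation.

```python
# Hypothetical anomaly check: flag a backup when the share of changed files
# jumps far above the historical baseline (e.g. 5-10% normally, 80% today).

def is_anomalous(changed_files: int, total_files: int,
                 history: list[float], spike_factor: float = 4.0) -> bool:
    """Return True if this backup changes far more files than usual."""
    if not history or total_files == 0:
        return False                                 # not enough data to judge
    current_ratio = changed_files / total_files      # e.g. 0.80 today
    baseline = sum(history) / len(history)           # e.g. 0.05-0.10 normally
    return current_ratio > baseline * spike_factor

# A month of backups touching ~5-10 percent of files, then 80 percent today.
history = [0.05, 0.07, 0.10, 0.06, 0.08]
if is_anomalous(changed_files=8_000, total_files=10_000, history=history):
    print("Possible ransomware activity: flag backup and send alerts")
```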

Retrospect 19 OS compliance monitoring screenshot

v19 also supports on-premises NAS devices, including the latest Nexsan EZ-NAS unit, with Retrospect built-in, and tape libraries offering LTO-9 support. Heithcock said: “You can also set this up as a file server and actually backup files on it. In fact, you don’t have to just do backups, you can use Retrospect as a data migration tool.”

Retrospect 19 also includes:

  • Backup Comparison to check what’s changed between backups. Used together with anomaly detection, it lets administrators identify exactly which files changed to trigger an anomaly and evaluate their contents to validate suspected ransomware infections.
  • OS Compliance Checks to identify systems that are out of compliance with the latest version of each operating system.
  • One certified Wasabi cloud storage location.

If anomaly detection flags possible ransomware, the backup comparison feature can be used to identify infected files and avoid them, restoring earlier clean versions instead.
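
At its core, a backup comparison is a diff between two backup manifests. The sketch below shows the idea with a hypothetical path-to-hash manifest; the format and function names are assumptions, not Retrospect's own data structures.

```python
# Minimal backup-comparison sketch: diff two manifests (path -> content hash)
# to list files that are new or changed between backup runs.

def changed_between(previous: dict[str, str], current: dict[str, str]) -> list[str]:
    """Return paths that are new or whose content hash changed."""
    return [path for path, digest in current.items()
            if previous.get(path) != digest]

prev_manifest = {"docs/plan.docx": "a1b2", "img/logo.png": "c3d4"}
curr_manifest = {"docs/plan.docx": "ffee", "img/logo.png": "c3d4", "note.txt": "9a9a"}

suspects = changed_between(prev_manifest, curr_manifest)
print(suspects)  # ['docs/plan.docx', 'note.txt'] - candidates to inspect or roll back
```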

Retrospect Backup screenshot with cloud storage option

A subsequent v19.1 update release will add:

  • Flexible immutable retention periods to extend the period on past backups instead of including that data in new backups.
  • Certification of each of Wasabi’s data centers around the world, meaning an additional 12 locations for Retrospect Cloud Storage.
  • Microsoft Azure for Government: blob storage on Azure for Government for state and local agencies looking for data protection in a US-based high-security data center.

Retrospect Management Console, the hosted backup analytics service, has been updated with multi-factor authentication support, a redesigned dashboard that better aggregates information for larger environments, separate user roles such as Administrators and Viewers, and a user action audit log.

Retrospect Virtual 2022 gets significantly faster backups and restores (up to three times faster), data deduplication to reduce storage capacity consumption, and certified Hyper-V 2022 support.

All in all, Retrospect has significantly upgraded and strengthened its backup software, with dedupe, cloud storage, anti-ransomware features, and much faster virtual machine backup.

Retrospect availability and pricing

Retrospect Backup 19.0 and Virtual 2022 will be generally available on July 12, and Backup 19.1 and Management Console updates will be generally available on August 30. Capacity-based pricing includes 500GB at $12/month and 1TB at $20/month. There is a separate add-on for existing perpetual licenses.

Jupiter supercomputer’s storage innards

Jupiter, Europe’s first exascale supercomputer, looks like it could be using NVMe SSDs, disk, IBM’s Spectrum Scale parallel file system, TSM backup, and LTO tape for its storage infrastructure.

The Jupiter system will have a modular design with three storage components: a parallel file system, a parallel and high-bandwidth flash module, and a backup/archive system. The system will be housed in Germany’s Forschungszentrum Jülich (FZJ) Supercomputing Centre in North Rhine-Westphalia.

JSC’s Jupiter module diagram

Jupiter stands for “Joint Undertaking Pioneer for Innovative and Transformative Exascale Research”.

Dr Thomas Eickermann, head of the communications system division at the Jülich Supercomputing Centre (JSC), told B&F: “The details of the storage configuration are not yet fixed and will be determined during the procurement of the exascale system.

“The parallel high bandwidth flash module will be optimized for performance and will therefore be tightly integrated with the compute modules via their high-speed interconnect. For the high capacity parallel file system, capacity and robustness will be further important selection criteria. This file system will most likely mainly be based on (spinning) hard disks. For the high capacity backup/archive system, we target a substantial extension of JSC’s existing backup/archive that currently offers 300PB of tape capacity.”

JSC already operates the existing JUWELS and JURECA supercomputers. JUWELS is a multi-petaflop modular supercomputer and currently consists of two modules. The first deployed in 2018 is a Cluster module; a BullSequana X1000 system with Intel Xeon Skylake-SP processors and Mellanox EDR InfiniBand. The second, deployed in 2020, is a Booster module; a BullSequana XH2000 system with second-generation AMD EPYC processors, Nvidia Ampere GPUs, and Mellanox HDR Infiniband.

Germany’s fastest supercomputer JUWELS at Forschungszentrum Jülich, which is funded in equal parts by the Federal Ministry of Education and Research (BMBF) and the Ministry of Culture and Science of the State of North Rhine-Westphalia (MKW NRW) via the Gauss Centre for Supercomputing (GCS). (Copyright: Forschungszentrum Jülich / Sascha Kreklau)

JURECA (pronounced “Eureka” and short for Juelich Research on Exascale Cluster Architectures) started in 2015. It uses Intel’s 12-core Xeon E5-2680 Haswell CPUs, deployed in a total of 1,900 dual-CPU nodes. The processors are equipped with Nvidia K80 GPU accelerators and connected by Mellanox 100Gbit/s Infiniband interconnects across 34 water-cooled cabinets. The nodes run a CentOS Linux distribution.

The Next Platform thinks “Atos will almost certainly be the prime contractor on the exascale machine going into FZJ. And if that is the case, Jupiter will probably be a variant of the BullSequana XH3000 that was previewed back in February.” These will support HDR 200Gbit/s and NDR 400Gbit/s InfiniBand from Nvidia as well as Atos’s BXI v2 interconnect. Jupiter’s flash modules will have to hook up to the XH3000’s compute nodes using either InfiniBand or the BXI interconnect.

According to a cached Google search result, JSC uses several GPFS (Spectrum Scale) scale-out, parallel file systems and disk drive media for some user data. The data is backed up to tape using TSM-HSM (Tivoli Storage Manager – Hierarchical Storage Manager).

In March 2018, JSC tweeted about its 17-frame IBM TS4500 tape library with 20 LTO-8 drives installed and more than 20,000 LTO-7 M8 tape cartridges inside. This extends its 2 x Oracle SL8500 tape libraries by 180PB capacity for backing up and archiving HPC user data.

It looks as if the easy button for JSC’s parallel file system is Spectrum Scale, and the one for the backup/archive is Spectrum Protect and an IBM tape library, using modern LTO-9 tapes and drives. There doesn’t appear to be a pre-existing flash module though. We suspect boxes of NVMe SSDs will be used.

Storage news ticker – June 21

Library of Congress image. Rights Advisory: No known restrictions on publication.

Spectra Logic has updated its Vail data management software, delivering access and placement of data across multi-site and multi-cloud storage: 

  • A fully qualified on-premises Glacier-like solution that enables any software with an S3 interface to have direct access to local tape or nearline object storage using S3 Glacier-like commands that mimic AWS (a minimal access sketch follows this list) 
  • Simple S3 disk and S3 Glacier tape to modernize data protection and backup applications  
  • Flexible implementation of either a global cloud-based object database for universal access or a local on-premises object database for optimized performance 
  • Support for AWS Object Lock using on-premises storage targets to protect data against ransomware and malicious activity 
  • Direct integration of any cloud services, including artificial intelligence and machine learning (AI/ML), with on-premises data using intelligent object synchronization 
  • Scheduled and instant data placement and orchestration using life cycle rules and staging interfaces 
  • Global object placement and visibility to optimize the performance, economics, and location of data
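
To make the "Glacier-like" claim concrete, the sketch below shows how an existing S3 application might issue standard Glacier-style restore calls against an on-premises, S3-compatible endpoint of the kind Vail presents. The endpoint URL, bucket, key, and credentials are hypothetical; restore_object and get_object are standard boto3 S3 client calls. Whether Vail accepts these exact parameters is an assumption.

```python
import boto3

# Point a normal S3 client at a hypothetical on-prem, S3-compatible endpoint.
s3 = boto3.client(
    "s3",
    endpoint_url="https://vail.example.internal",   # hypothetical endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Ask for an archived object (e.g. resident on tape) to be staged back to disk.
s3.restore_object(
    Bucket="archive-bucket",
    Key="projects/2021/render-output.tar",
    RestoreRequest={"Days": 7, "GlacierJobParameters": {"Tier": "Standard"}},
)

# Once the restore completes, the object can be read with a normal GET.
obj = s3.get_object(Bucket="archive-bucket", Key="projects/2021/render-output.tar")
```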

Startup Ahana has extended its August 2021 $20 million A-round by raising an additional $7.2 million. The latest round came from Liberty Global Ventures with participation from existing investor GV. Total funding is now $32 million. Ahana will use the funding to continue to grow its technical team and product development; evangelize the Presto community; and develop go-to-market programs.

Ahana also announced Ahana Cloud for Presto Community Edition, immediately available, including users of the 100,000+ downloads of Ahana’s PrestoDB Sandbox on DockerHub. It provides distributed Presto cluster provisioning and tuned out-of-the-box configurations, bringing Presto to data teams of all sizes for free. 

Canonical’s Ubuntu Core 22, the fully containerized Ubuntu 22.04 LTS variant optimized for IoT and edge devices, is now generally available for download. The release includes a fully pre-emptible kernel to ensure time-bound responses. The Ubuntu 22.04 LTS real-time kernel, now available in beta, delivers high performance, ultra-low latency, and workload predictability for time-sensitive industrial, telco, automotive, and robotics use cases, Canonical said.

DataStax has secured $115 million in private equity funding led by the Growth Equity business within Goldman Sachs, giving it a $1.6 billion valuation. DataStax’s valuation was $830 million in 2014. Total funding is now $343 million. It will use the capital to accelerate global expansion and for development of its Astra DB multi-cloud database and Astra Streaming streaming service, which are part of the company’s open data stack for building and running real-time applications on any cloud.

The Iguazio MLOps Platform and built-in Feature Store now offer connectivity to the Snowflake Data Cloud, with enterprise customers including Fortune 500 companies already using the joint solution. Iguazio claimed its MLOps platform can accelerate the data science process up to 12x and make more efficient use of AI resources, like GPUs, through better orchestration and automation. It now comes with a built-in Snowflake connector that powers the built-in online and offline feature store in Iguazio with data from Snowflake, allowing enterprises to access the Data Cloud to build, store, and share features that are ready for use in machine learning applications.

Kioxia has announced a 512GB product in its lineup of EXCERIA high-endurance microSD memory cards. The new product delivers sufficient performance and endurance for continuous high-resolution 4K video recordings of dashboard and surveillance cameras. It’s capable of up to 17,000 hours of cumulative use and up to 10 hours and 29 minutes of continuous 4K video recording. With a read speed of up to 100MB/sec and write speed of up to 85MB/sec, the new card supports the UHS Speed Class 3 (U3) and Video Speed Class 30 (V30) specifications, making it suitable for 4K video recording.

Cloud-based server fleet manager Nebulon has unveiled ImmutableBoot, a “reboot-to-recover” ransomware solution for bare-metal Linux. ImmutableBoot is built to protect application infrastructure from a ransomware attack or a misconfigured operating environment with a server reboot to a known, good operating system version. It allows operations teams to protect their application infrastructure with immutable, or “frozen,” server software, reverting infected or misconfigured operating systems and application configurations to a known, good version each time the server reboots.

Rockset, which supplies a real-time analytics platform for the cloud, today announced a new Snowflake connector for low-latency, high-concurrency analytics across streaming data from sources such as Apache Kafka, Amazon DynamoDB or MongoDB, and historical data from Snowflake. Rockset organizes data in a Converged Index, which is optimized for real-time data ingestion and low-latency analytical queries. Ingest rollups enable developers to pre-aggregate real-time data using SQL, without the need for complex real-time data pipelines. Customers can use Rockset for real-time model serving as part of the machine learning pipeline with Snowflake.

China-based SmartX has released CloudTower 2.0, a central management platform for multiple SmartX HCI product clusters across datacenters. This product update includes:

  • Cross-cluster VM migration
  • Content library
  • VM user view
  • Access control and security settings
  • Interface optimization

The SNIA’s Networking Storage Forum presents an xPU Accelerator Offload Functions webinar on June 29 at 11am PT. The presenters are Joseph White, Dell Technologies; John Kim, Nvidia; Mario Baldi, Pensando Systems; Yadong Li, Intel; and David McIntyre, Samsung. These new accelerators (xPUs) have multiple names such as SmartNIC, DPU, IPU, APU, NAPU. This second webcast in the series will take a deeper dive into the accelerator offload functions of the xPU, what problems xPUs are coming to solve, where in the system they live, and the functions they implement, focusing on:

  • Network Offloads
  • Security Offloads
  • Compute Offloads
  • Storage Offloads

Register here.

Synology has released the 4U, 60-bay HD6500 designed for super-sized storage needs. HD6500 offers petabyte-level storage with support for up to 960TB per chassis. It can be paired with up to four additional RX6022sas 60-bay expansion units for over 4PB of storage in 20U. HD6500 can deliver over 6,688MB/sec in sequential read, and 6,662MB/sec in sequential write. It’s a solution for scenarios requiring massive quantities of data such as large-scale storage for studios, video surveillance backup, and enterprise office PC backup. It has an MSRP of $16,999.99. More info here.

ReRAM developer Weebit Nano will publicly demonstrate its ReRAM IP module for the first time at the Leti Innovation Days event. It will show Weebit ReRAM functioning as a non-volatile memory block, being fed live images and retaining this data while powered off, then displaying the data separately. It will also show the speed of the ReRAM module, highlighting its faster write speed compared to typical flash memory technology. The module’s Direct Program/Erase capability and byte addressability contribute to its faster write throughput time compared to flash, which needs to access entire data sectors every time it erases/writes. The ReRAM IP module includes the ReRAM array, control logic, decoders, IOs (Input/Output communication elements), and error correcting code (ECC), as well as patent-pending analog and digital smart circuitry running algorithms which significantly enhance the memory array’s technical parameters.

Weebit Nano IP Module demo video.

… 

Veeam Backup for Google Cloud v3 is now available. v3 adds support for MySQL backup and recovery, featuring flexible policy-based protection and recovery options. It has snapshot, backup and archive options, and is designed to restore a chosen database to the original or a different location. It features:

  • Role-based access control
  • Configuration backup and restore
  • New overview dashboard
  • Full page wizard
  • Worker’s redesign
  • Health check for backup files

The latest version of Veeam Backup for Google Cloud is available here.

Pure well-placed for hyperscale growth

Pure Storage has a potentially humungous hyperscale opportunity in front of it, with a startling 600PB footprint possibility at Meta, plus more at Azure using Equinix colo-resident Pure arrays for EDA work.

This comes from modelling by Wells Fargo analyst Aron Rakers, who has been poring over Pure numbers for his subscribers. We learnt about Pure’s adoption by Meta (Facebook as was) in January, with a phase 1 total of 175PB of FlashArray and 10PB of FlashBlade storage capacity. Phase 2 is underway, and during 2022 Meta will grow its AI Research SuperCluster (RSC) storage to exabyte-level capacity.

Taken literally at 1EB that gives Pure an 875PB opportunity. Rakers views it differently: “We would estimate Pure’s potential capacity opportunity in the ~600PB range for going forward, which compares to the company estimated to have shipped ~1.86EB of total storage capacity in 2021, according to Gartner. Put simply, we think Pure has a much broader opportunity from the Meta AI RSC deployment looking forward.”

Pure also won a deal with Azure and Equinix back in November for FlashBlade unified file+object storage placed in Equinix co-location centers with a high-speed, low-latency  connection to Azure datacenters for EDA, HPC and other highly parallel workloads. That’s two hyperscaler customers for Pure.

Rakers thinks “there is an increasing potential that the company could see additional hyperscale opportunities.” He quotes Pure CEO Charles Giancarlo answering analyst questions in the Q4  2022 earnings call: “It’s a little bit difficult to predict the exact timing, but we [Pure] do have conversations ongoing. It’s certainly my expectation, but I can’t say that it’s near-term. But 12 months is a long-time … I hope to have an update there; it’s definitely of strategic importance to the company overall.”

After talking to various people Rakers thinks “Pure is engaged in discussions with a couple of the other large hyperscale cloud companies on potential future projects involving all-Flash storage deployments.” That’s two more.

Rakers suggests that hyperscaler customers could be finding that the scale-out requirements of AI infrastructure and high-performance computing workloads mean they have to use all-flash arrays instead of hybrid or all-disk arrays for large capacity primary data storage.

The other all-flash array suppliers will be wondering why exactly Pure was chosen instead of, for example, industry leaders Dell EMC and NetApp. Particularly NetApp, which has a good public cloud presence with Azure NetApp Files.

We can’t realistically expect Pure to scoop up every hyperscaler’s all-flash array opportunity. But if these prospects in front of Pure turn out to be real and are of a similar size to Meta, then Pure’s hyperscale opportunity is, well, literally hyperscale in scope.

CXL-led big memory taking over from age of SAN

Samsung CXL

CXL 2.0 could create external memory arrays in much the same way that Fibre Channel paved the way for external SAN arrays back in the mid-1990s.

Charles Fan

The ability to dynamically compose servers with 10TB-plus memory pools will enable many more applications to run in memory and avoid external storage IO. Storage-class memory becomes the primary active data storage tier, with NAND and HDD being used for warm and inactive data, and tape for cold data.

That’s the view of Charles Fan, MemVerge CEO and co-founder, who spoke to Blocks & Files about how the CXL market is developing a year on from his initial briefing on the topic.

He says: “For people like us in the field, this is a major architectural shift. Maybe the biggest one for the last 10 years in this area. This could bring about a new industry, a new market of a memory fabric that can be shared across multiple servers.”

CXL is Compute Express Link, an extension of the PCIe bus beyond a server’s chassis based on the PCIe 5.0 standard. CXL v1, released in March 2019, enables server CPUs to access shared memory on accelerator devices with a cache-coherent protocol.

MemVerge software combines DRAM and Optane DIMM persistent memory into a single clustered storage pool for use by server applications with no code changes. In other words, the software already combines fast and slow memory.

B&F diagram of MemVerge big memory scheme

CXL v1.1, which sets out how interoperability testing between the host processor and an attached CXL device can be performed, is supported by Intel’s Sapphire Rapids and AMD’s Genoa processors. CXL v2.0 adds support for CXL switching through which multiple CXL 2.0-connected host processors can use distributed shared memory and persistent (storage-class) memory.

A CXL 2.0 host will have its own directly connected DRAM and the ability to access external DRAM across the CXL 2.0 link. Such external DRAM access will be slower, by nanoseconds, than local DRAM access, and system software will be needed to bridge this gap. (System software which MemVerge supplies, incidentally.) Fan thinks CXL 2.0 switches and external memory boxes could appear as early as 2024, with prototypes arriving much earlier.

Samsung CXL memory expansion box

MemVerge is partnering with composable systems supplier Liqid so that MemVerge-created DRAM and Optane memory pools can be dynamically assigned, in whole or in part, to servers across today’s PCIe 3 and 4 buses. CXL 2.0 should bring external memory pooling and make it dynamically available to servers, which is what composability software does.

Fan says: “With CXL, memory can become composable as well. And I think that’s highly synergistic to the cloud servicing model. And so they will have it and I think they will be among the first adopters of this technology.” 

Blocks & Files’ thinking is that the hyperscalers, including public cloud suppliers, are utterly dependent on CXL for memory pooling. And they’ve not got pre-existing technology that they can use to supply external pooled memory resources. So they either build it themselves, or look for suitable suppliers, of which there are very, very few. And here is MemVerge with what looks like ready-to-use software. 

For Fan, CXL 2.0 “is the best development in the macro industry for us in our short life of five years.”

His company will be helped by the rise of a CXL 2.0 ecosystem of CXL switch, expander, memory card, and device suppliers. MemVerge’s software can already run in the public cloud. SeekGene, a biotech research firm focusing on single-cell technology, has significantly reduced processing time and cost by using MemVerge Memory Machine running on AliCloud i4p compute instances.

Fan says: “AliCloud was the first cloud service provider to deliver an Optane-enabled instance to their customers, and then our joint service lays on top of that, to allow encapsulation of the application, and use of our snapshot technology to allow rollback recovery.”

MemVerge will make its basic big memory software available in open source form to widen its adoption, and supply paid-for extensions such as snapshot and, possibly, checkpoint services.

External memory pooling example

Imagine a rack of 20 servers today, each with 2TB of memory. That’s 20 x 2TB memory chunks, 40TB, with any application limited to 2TB of memory. MemVerge’s software could be used to bulk up the memory address space in any one server to 3TB or so but each server’s DRAM slots are limited in number and once they are used up no more are available. CXL 2.0 removes that limitation.

Let’s now reimagine our rack of 20 servers, with each of them having, for example, 512GB of memory, and the rack housing a CXL 2.0-connected memory expander chassis with 30TB of DRAM. We still have roughly the same total amount of DRAM as before, around 40TB, but it is now distributed differently: 20 x 512GB chunks, one for each server, and a 30TB shareable pool.

An in-memory application could consume up to 30.5TB of DRAM, 10 times more than before, radically increasing the working set of data it can address and reducing its storage IO. We could have three in-memory applications, each taking 10TB of the 30TB memory pool. Such applications should be able to execute significantly faster.
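
As a back-of-the-envelope check of the rack example above, the figures are the article's and the sums are simple arithmetic: the total DRAM spend stays roughly flat while the per-application ceiling jumps by an order of magnitude.

```python
# Rack-level arithmetic for the example above (all figures in TB).
SERVERS = 20

# Today: every server carries its own 2TB, so no application can exceed 2TB.
today_total = SERVERS * 2.0                       # 40 TB total, 2 TB per-app ceiling

# With CXL 2.0: slimmer local DRAM plus a shared pool in the rack.
local_per_server = 0.512                          # 512 GB local DRAM
pool = 30.0                                       # 30 TB CXL-attached pool
cxl_total = SERVERS * local_per_server + pool     # ~40.2 TB, roughly the same amount

max_app_memory = local_per_server + pool          # ~30.5 TB for a single application
print(today_total, cxl_total, max_app_memory)
```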

Fan says: “It lifts the upper limit, the ceiling, that you have to the application in terms of how much memory you can use, and you can dynamically provision it on demand. So that’s what I think is transformative.”

And it’s not just servers that could use this. In Fan’s view: “GPUs could also use a more scalable tier of memory.”

MemVerge memory-storage tiering ideas

Freshly created DRAM content will still have to be made persistent, and writing 30TB of data to NAND will take appreciable time, but Optane or similar storage-class memory, such as ReRAM, could be used instead with much faster IO. The most active data will then be stored in SCM devices, with less active data going first to NAND, then disk, and finally tape as it ages and its activity profile gets lower and lower.

Such CXL-connected SCM could be in the same or a separate chassis and be dynamically composable as well. We could envisage hyperscalers using such systems of tiered external DRAM and Optane to get their services running faster and capable of supporting more users with higher utilization.

Application design could change as well. Fan adds: “The application general logic is to use as much memory as you have. And storage is only used if you don’t have enough memory. With other data-intensive application, it will be moving the same way, including the database. I think the memory database is a general trend.

“For many of the ISPs I think having the infrastructure delivering a more limitless memory will impact their application design – for it to be more memory-centric. Which in turn reduces their reliance on storage.”

CXL 2.0, hyperscalers, and the public cloud

The public cloud suppliers could set up additional compute instance types with significantly higher memory capacity, and SCM capacity as well. Their high customer counts and scale would enable them to amortize the costs of buying the DRAM and SCM more effectively than ordinary enterprises, and get more utilization from their servers.

Fan thinks that current block-level storage device suppliers may start producing external memory and SCM devices, and so too, B&F thinks, could server manufacturers. After all, they already ship DRAM and SCM in their current server boxes. Converged Infrastructure systems could start having CXL-memory shelves and software added to them.

Fan is convinced that we are entering a big memory computing era, and that the impact of CXL 2.0 will be as profound as that of Fibre Channel in the mid-1990s. In the SAN era, Fan says: “Storage can be managed and scaled independently to compute.”

Now the same could be true of memory. We are moving from the age of the SAN to an era of big memory and things may never be the same again.

Resilvering

Resilvering – In the ZFS file system, the process of moving data from one disk drive device in a ZFS storage pool (RAIDz pool) to another device is known as resilvering. It is a disk data copy rebuild operation, but ZFS calls it resilvering. It can be monitored using the zpool status command. When a device is replaced, for example due to failure, a resilvering operation is initiated to move the data from the good data copies on other devices to the new device.

The concept of resilvering refers to antique glass mirrors which were made with a layer of silver light reflecting material coating the back of the glass. When this decayed the mirror would appear streaky and tarnished and not work so well. Resilvering, replacing the silvered coating, restored the original clarity to the mirror.

RAIDz is the ZFS version of RAID. It is tightly bound to ZFS in that it does not have a fixed block size and works with ZFS’ copy-on-write technology.

Resilvering can take a lot of time. The Open-E website explains that, in a traditional RAID, where all blocks are regular, you take block 0 from each of the old drives, compute the correct data for block 0 on the missing drive, and write the data onto a new device. This process is then repeated for all blocks, even for blocks that hold no data, because a traditional RAID does not know which blocks are in use and which are not. If the array is otherwise idle, serving no user requests during the rebuild, the process is done sequentially from start to end, which is the fastest way to access rotational hard drives.

ZFS uses variable-sized blocks. Therefore, for each recordsize worth of data, which can be anywhere from 4KB to 1MB, ZFS needs to consult the block pointer tree to see how data is laid out on disks. Because block pointer trees are often fragmented and files are often fragmented, there is quite a lot of head movement involved. Rotational hard drives perform much more slowly with a lot of head movement, so the megabyte-per-second speed of the rebuild is slower than that of a traditional RAID. However, ZFS only rebuilds the part of the array which is in use and does not rebuild free space. Therefore, on lightly used pools it may actually complete faster than a traditional RAID, although this advantage disappears as the pool fills up.
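
The trade-off described above can be captured in a toy model: a traditional RAID reconstructs every block whether used or not, while a resilver only walks the blocks referenced by the block pointer tree (at the cost of more random head movement). This sketch is illustrative only and ignores the head-movement penalty.

```python
# Toy comparison of blocks touched: traditional RAID rebuild vs ZFS resilver.

def raid_rebuild_blocks(total_blocks: int) -> int:
    """Traditional RAID: every block is reconstructed, used or not."""
    return total_blocks

def zfs_resilver_blocks(allocated_blocks: set[int]) -> int:
    """ZFS: only blocks referenced by the block pointer tree are copied."""
    return len(allocated_blocks)

total = 1_000_000
allocated = set(range(200_000))        # pool is 20 percent full

print(raid_rebuild_blocks(total))      # 1,000,000 blocks touched
print(zfs_resilver_blocks(allocated))  # 200,000 blocks touched
```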

HeLC

HeLC – 7-bit hepta-level cell (HeLC) NAND. See MLC.

HLC

HLC –  6-bit hexa-level cell flash. See QLC.

If you really want to transform your business, get AI to transform your infrastructure first

Sponsored Feature

AI isn’t magic. But applied correctly it can make IT infrastructure disappear. 

Not literally of course. But Ronak Chokshi, who leads product marketing for InfoSight at HPE, argues that when considering how to better manage their infrastructure, tech leaders need to consider what services like Uber or Google Maps have achieved.

The IT infrastructure behind the delivery of these services is immaterial to the rest of the world – except perhaps for frazzled tech leaders in other sectors who wonder how they could achieve similarly seamless operations.

“The consumers don’t really care how it works, as long as the service is available when needed, and it’s easy to manage,” he says.

Pushing infrastructure behind the scenes is the raison d’etre of the HPE InfoSight AIOps platform. Or, to put it another way, says Chokshi, InfoSight worries about the infrastructure so tech teams can be more application-centric.

“We want the IT teams to be a partner to the business, to the line of business stakeholders and application developers, in executing their digital transformation initiatives,” he explains.

That’s a stark contrast to the all too common picture of admins fretting over whether a given host is being overburdened with VMs, or crippled by too many read-write cycles.

It’s not that this information is unimportant. Rather it’s a question of how it’s gathered, and who – or what – is responsible for collating the data and gaining insight from it. And, most of all, taking positive action as a result.

From the customer’s point of view, explains Chokshi, “InfoSight becomes your single pane of glass for all insights, for any issues that come up, any metrics, any attributes, or any activity that you need to track in terms of IOs, read write, throughput, latencies, from storage all the way up to applications.” This includes servers, networking, and the virtualization layer.

It all starts with telemetry

More importantly though, the underlying system predicts problems as they arise, or even before, and takes appropriate action to prevent them.

The starting point for InfoSight is telemetry, which is pulled from every layer of the technology and application stack. Chokshi emphasizes that this refers to performance data from HPE’s devices, not production or customer data. “That’s IO read writes, throughput latencies, wait times, things of that nature.”

Telemetry itself potentially presents an IO and performance challenge. Badly implemented real time telemetry could impact performance. Spooling off data intermittently when systems are running quiet means the chance for real-time insight and remediation is lost.

“We actually instrument our systems very intelligently to send us specific kinds of telemetry data without performance degradation,” says Chokshi. This extends right down to the way HPE structures its storage operating system.

HPE InfoSight aggregates the telemetry data from across HPE’s global install base, together with information from HPE’s own (human-based) support operation.

“When there is an issue and our support personnel get a call from a customer, they troubleshoot it, and fix it… but when the fix is implemented, we don’t just stop there. That is where the real work begins. We actually create a signature pattern. It’s essentially a fingerprint for that issue, and we push it to our cloud.”

This provides a vast data pool against which InfoSight can apply AI and machine learning, which then powers support case automation.

As telemetry data from other devices across the installed base continues to stream into HPE, Chokshi continues, “we create signature patterns for issues that might come up from those individual systems.”

When the data coming from a customer matches an established signature pattern within a specific environment, InfoSight will push out a “wellness” alert that appears on the customer’s dashboard. At the same time, a support case is opened.
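
Conceptually, that workflow is a lookup of incoming telemetry against a library of known issue fingerprints. The sketch below is purely illustrative of the idea described in the interview; the data structures, signature keys, and names are hypothetical and not HPE's implementation.

```python
# Purely illustrative fingerprint matching: reduce telemetry to a signature
# and check it against known issue fingerprints so an alert and support case
# can be raised automatically.

KNOWN_SIGNATURES = {
    ("high_write_latency", "firmware_1.2.3"): "SIG-0042: latency regression",
    ("vm_datastore_full", "esxi_7.0"): "SIG-0107: datastore exhaustion",
}

def match_signature(telemetry: dict) -> str | None:
    """Return the matching issue fingerprint, if any."""
    key = (telemetry.get("symptom"), telemetry.get("environment"))
    return KNOWN_SIGNATURES.get(key)

sample = {"symptom": "high_write_latency", "environment": "firmware_1.2.3"}
hit = match_signature(sample)
if hit:
    print(f"Wellness alert: {hit} - opening support case")
```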

Along with alerting customers, InfoSight will also take proactive actions, tuned to customers’ individual environments. For example, if it detects that a storage OS update could result in a conflict or incompatibility with the VM platform a customer is running, it will halt or skip the upgrade.

Less time solving storage problems

The potential impact should be pretty obvious to anyone who’s had to troubleshoot an underperforming system, or a mysterious failure, which could be down to storage…but might not be.

Research by ESG shows that across HPE’s Nimble Storage installed base, HPE InfoSight lowered IT operational expenses by 79 percent, while staffers spent 85 percent less time resolving storage-related tickets. An IDC survey also showed that more than 90 percent of the problems resolved lay above the storage layer. So, just taking storage as a starting point, InfoSight can have a dramatic impact right up the infrastructure stack.

At the same time, InfoSight has been  extended to encompass the software layer, with the launch of App Insights last year. As Chokshi says, it’s often a little too easy for application administrators to shift problems to storage administrators, saying “hey, looks like your storage device is not behaving properly.”

App Insights creates a topology view of the entire stack and produces alerts and predictions of problems at every layer. So, when an app admin suggests that their app performance is being degraded by a storage problem, Chokshi explains, “The storage admin pretty much almost instantly would have a response to that question saying they can look up App Insights dashboard.”

So, the admin can identify, for example, whether a drive has failed, or alternatively that a host is running too many VMs, “and that’s slowing your applications down.”

For a mega-scale example of how InfoSight can render infrastructure invisible, look no further than HPE’s GreenLake edge-to-cloud platform, which combines on-prem infrastructure management and deployment with management and further services in the cloud.

For example, HPE has recently begun offering HPE GreenLake for Block Storage. Traditionally, deploying block storage for mission- or business-critical systems meant working out multiple parameters, says  Chokshi. “How much capacity? How much performance do you need from storage? How many applications do you plan to run, etc, etc..”

With the new block service, admins just need to set three or four parameters, including whether the app is mission-critical or business-critical and choosing an SLA.

“And you provision that, and that’s all done through the cloud. And it essentially makes the block storage available to you. Behind the scenes, HPE InfoSight powers that experience from enabling the cloud operation experience and ensuring that systems and apps don’t go down. It predicts failures, and prevents them from occurring.”

GreenLake expansion on the way

Over the course of this year, InfoSight will be extended to more and more HPE GreenLake services. This is a big deal because what was originally brought to market for storage, then servers, is now being integrated with nearly every HPE product that is provisioned through HPE GreenLake.

At the same time, HPE will extend the InfoSight-powered support automation it has long offered on its Nimble Storage, which sees customers bypassing level 1 and 2 technicians, and being put straight through to level 3 support. “Because by the time you call, we already know the basics of the issue and we already know your environment. We don’t have to ask you questions. We don’t have to ask for logs, we don’t have to ask for any sort of data. We actually already have it through the telemetry data.”

So is this as good as it gets? No, it will actually get better in the future, argues Chokshi, because as InfoSight is rolled out to more services and products, and to more customers, it will be accessing ever more telemetry and analyzing ever more customer contexts.

“To actually get the advantages of AIOps, you need large sets of relevant data,” he says. “And you need to let time go by because AI is not a one and done. It improves over time.”

Sponsored by HPE.

Boffins design NAND and persistent memory in single SSD reference architecture

Academic researchers have designed a reference architecture SSD, with both NAND and persistent memory inside, for use in database logging and replication.

Such logging is needed for database recoverability and high availability. It requires an update to a local log file and also to one at a remote site for every transaction. If the primary database site goes down, the secondary one can serve as a hot backup. Persistent memory, such as Optane or battery-backed DRAM, and RDMA transfers are used by modern databases to accomplish this logging as fast as possible. The eight authors of a paper published in SIGMOD ’22, “X-SSD: A storage system with native support for database logging and replication”, think the current method is not portable, has a complex data path, and offers low interoperability.

“To address these issues, this paper introduces the X-SSD, a new SSD architecture that mixes NAND Flash and PM memory classes. A X-SSD device can take transaction log writes on a fast, PM-backed data path and be responsible for propagating the operation to remote sites and eventually to NAND Flash storage.” In other words, the PM and fast transfer operations are sent down the stack to a specially designed SSD, called Villars, where they “offer a more straightforward and robust way to manage PM on behalf of the database and achieve equally fast results.”

The authors note: “Every DIMM slot used for PM is not used for DRAM. This forces the system designers to choose between DRAM or PM capacity. Optane and battery-backed DRAM require specific server support and cannot be ported across servers without certain characteristics. Optane, in particular, is not supported on AMD platforms.”

Their X-SSD design removes this limitation: “Moving PM into a X-SSD device frees DIMM slots for DRAM and restores the ability to deploy PM on vendor-independent server-class machines without special-purpose DIMM slots or battery-backing features. One can then use PM on an AMD server simply by plugging in this new NVMe device.” That’s a neat by-product of their ideas.

Because it is basically an NVMe device, two specific NVMe protocol features are used to craft the drive. Firstly a Controller Memory Buffer (CMB) exposes an internal memory area to applications via memory mapping. The X-SSD uses CMB to expose a second, byte-addressable data path, in addition to the conventional SSD block-based one.

A second NVMe feature, Persistent Memory Region (PMR), can expose another memory area and assumes the device persists the operations against that area. It has additional configuration options compared to the CMB memory area. The X-SSD also uses Non-Transparent Bridging (NTB) which supports interconnecting different hosts’ systems.

The X-SSD has a conventional side – using NAND – for normal block data transfer, and a fast side – using some form of persistent memory – for byte-level logging data transfers. Fast side writes can be acknowledged to the database application before being destaged to NAND.
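
The write flow described above can be modelled in a few lines: a log record lands in the device's persistent-memory region and is acknowledged immediately, then destaged to NAND and propagated to the remote replica in the background. This is a conceptual sketch under those assumptions, not code from the paper.

```python
# Conceptual model of the X-SSD's two-sided write path described in the text.

class XSSDModel:
    def __init__(self):
        self.pm_log = []        # fast, byte-addressable PM region (CMB/PMR-style)
        self.nand = []          # conventional block-based NAND side
        self.remote = []        # stand-in for the replica at the secondary site

    def append_log(self, record: bytes) -> bool:
        self.pm_log.append(record)   # record is persisted in PM
        return True                  # acknowledge to the database right away

    def background_destage(self):
        while self.pm_log:
            record = self.pm_log.pop(0)
            self.nand.append(record)     # eventual placement on NAND flash
            self.remote.append(record)   # propagation to the remote site

dev = XSSDModel()
dev.append_log(b"txn-1001 commit")   # fast-side write, acked before destaging
dev.background_destage()
```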

The conclusion of the 15-page paper is that the authors “showed that the Villars device can absorb transaction log workloads from a modern database with several advantages – simpler interface, comparable latency, and clearer crash behavior semantics – over having the database directly manipulate PM (Persistent Memory).”

X-SSD paper authors.

One of the paper’s authors is Yong Ho Song of Hanyang University & Samsung Electronics, which caused us to wonder if Samsung could actually produce an X-SSD. We think not, at least in the short term, as there would be no speed advantage and low-level database code would need rewriting. Both factors suggest that mass production of an X-SSD is unlikely.

It’s an interesting idea though, and other scientists/engineers may use it as a base to do interesting things with SSD design.

Quantum: LTO tape roadmap falls short

Quantum CEO and president Jamie Lerner has suggested that the Linear Tape Open (LTO) consortium was not ambitious enough for tape and its standards were insufficient for hyperscalers.

Jamie Lerner, Quantum

The Scalar brand of tape libraries is part of Quantum’s overall file lifecycle management product portfolio. It has enjoyed success in the hyperscaler market for archival storage, counting all of the top five hyperscalers and 17 of the top US intelligence agencies as customers, with more than 40EB of capacity deployed globally and 3 million-plus LTO tapes under management.

Lerner told B&F at an IT press tour meeting that “hyperscalers don’t care about standards” and that they treat tape as a “big fat disk drive… Tape is like a big slow hard drive.”

He said “the products we built for enterprise do not work in hyperscale clouds,” and products had to be redesigned for hyperscalers: “Basically we redesigned everything we know about tape for these hyperscalers.”

Quantum i6H hyperscaler-type tape library

What the hyperscalers want is online access to large-capacity tapes so that they have to fetch cartridges from a tape library’s shelves less often. That means they want higher-capacity tapes and tape library mechanical components that can cope with high-intensity use.

Getting higher-capacity tape drives is a problem. Lerner told us the LTO roadmap is falling behind disk capacities. The industry is at the LTO-9 level, with 18TB raw tapes, yet HDD supplier Western Digital is already shipping 20TB HDDs with 22TB and 26TB SMR drives announced. Seagate is sample-shipping 20TB+ drives and Toshiba has 26TB drives coming.

The three suppliers’ HDD roadmaps extend out to 30, 40, and 50TB drives, and beyond. WD is even suggesting it could create a 50TB archive disk drive. If it proceeds with this idea, the gen 2 and 3 versions would have even larger capacities.

Lerner told us that the hyperscalers are such large buyers that they can specify hyperscaler-specific designs which deliver the requisite speeds, feeds, capacities, reliability, and power consumption. Standards such as LTO don’t matter.

The LTO roadmap won’t catch up with disk until gen 11 at up to 76TB, if that arrives within four or five years. We note that LTO stopped its capacity doubling with LTO-9, which was originally expected to have a 24TB capacity but was scaled back in September 2020. This doesn’t create confidence in its ability to deliver on its future roadmap.

How could tape capacity be driven higher? One method suggested by Lerner is to increase the tape’s tension so that it can be wound tighter and take up less space inside its cartridge. This would enable the tape length to be increased and its capacity would jump in proportion.

Another option could be to move to disk drive head technology which reads and writes data in narrower tracks than tape drives. This could be accompanied by streaming the tape across a flat bed as it passes under the head so that its motion is steadier and smoother.

Quantum is constrained by there being a single tape drive supplier: IBM. It was a pity that, looking back, Quantum stopped making its own SDLT format drives. The lack of competition in tape drive manufacturing is a bad thing and is certainly not helping Quantum.

Veritas takes autonomous data management moonshot

Veritas is attempting to transform its core NetBackup product into an autonomous data management service operating across public clouds, on-premises datacenters, and edge sites. The company is spending a significant amount of engineering time and money doing this.

The key themes are autonomy and as-a-service. The whole project started two years ago, and the first public inkling of this came with the release of NetBackup 10 (NBU 10) and an IT press tour at Veritas’s Santa Clara HQ this month.

Veritas also promoted internal execs and recruited new ones to bring its revamped software to the market. For example, Lissa Hollinger was promoted to SVP and CMO in February 2022, from being a VP running product and solutions marketing.

The org recruited ex-Accenture consultant Lawrence Wong to be an SVP and its chief strategy officer in January. He said Veritas had looked around at other companies with potentially appropriate technology, but decided it was better to take its NetBackup product and customer base and re-engineer the product. Then the customers could use what they are familiar with and trust, but as-a-service and with wholly new functionality to autonomously protect and manage their data.

Wong works closely with Doug Matthews, SVP for product management. Matthews told us that the engineering organization had closed down many extraneous projects to concentrate resources on the core products and their transformation.

It was a conscious decision not to follow the Commvault route and set up a separate division like Metallic for the whole technology. It did buy HubStor in January 2021 to get a SaaS-based data protection development team – an “acqui-hire.”

The magnitude of the task can be seen by Veritas receiving additional engineering and market funding from private equity owner Carlyle. It bought Veritas for $7.4 billion in 2015 and now it has pumped in more money – an unrevealed amount. 

Wong told event attendees 87 percent of the Fortune 500 uses Veritas – 435 companies – and it has more than 80,000 customers in total. He said Veritas was number one in data protection, according to Gartner calculations, having a 15 percent enterprise backup market share. This differs from IDC calculations. Veeam says it is tied number one in data replication and protection revenue market share according to IDC, with $647.17 million in revenues (11.7 percent) alongside Dell with its $665.46 million (12 percent). Veritas is in the number three slot with $541.47 million (9.8 percent).

He said Veritas had a four-year no data loss record in the event of ransomware attacks, which was news to us and suggests an under-marketed product strength. The NBU roadmap includes integration with early warning systems and automated clean recovery at scale.

Veritas’s strategy comprises developing autonomy features, adding virtual air-gapped protection in the cloud, and AI-based ransomware resilience. It wants to deliver protection, availability, and compliance within and across hybrid, private, and public clouds with subscription and as-a-service business models.

Automation and autonomy

Veritas believes that automating data protection procedures is necessary, but not enough. The data protection product has to gain autonomy as well.  There’s a difference between automation and autonomy – a washing machine is automated, a driverless car is autonomous.

Veritas slide

Veritas declares it will eliminate the burden of human intervention from data management, but not oversight. Data management and protection should just happen, invisibly and autonomously, but without sacrificing overarching human control.

Autonomy combines automation with AI and machine learning so that the data protection system can adapt to changed circumstances and respond instantly. Autonomous Data Management will provide ransom-free recoveries – at any scale. It can actively defend against threats. 

Veritas slide

Matthews said: “We’re going to build a data lake of metadata to understand how people are protecting their data.” Veritas is building its own analytic routines for such data lakes, and will provide this as a service for MSPs to sell. 

Wong chipped in: “This autonomousness will spread outside data protection, to secondary data management and archiving.” The autonomy features will come in a future release of NetBackup, building on the foundation set by NBU 10.

Veritas declares that multicloud autonomous data management will independently find and protect data no matter where it lives. It will continuously determine where and how to store it in the most efficient and secure way possible – all without human involvement. 

Veritas slide

Veritas is aiming to achieve annual revenue growth of 8 to >10 percent by 2026. This is a big ask. It is building a dedicated cloud specialist enabling team to be embedded with CSPs, partner system integrators, and managed service providers, which will help it raise sales.

Carbon reduction

Veritas claims using NetBackup can lower a customer’s carbon footprint. It quotes a US Grid Emission Factor and US Data Center Energy Usage report saying that the energy used to store 1PB of unoptimized data in the cloud for one year can create 3.5 metric tonnes of CO2. NBU can reduce this with data reduction, leading to a lower network load and cloud footprint plus elastic cloud compute resource utilization.
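
Using the figure quoted above (3.5 tonnes of CO2 per petabyte-year of unoptimized cloud storage), the effect of data reduction is easy to estimate; the reduction ratio in this sketch is illustrative, not a Veritas claim.

```python
# Rough CO2 estimate using the article's 3.5 tCO2 per PB-year figure.
CO2_TONNES_PER_PB_YEAR = 3.5

def annual_co2_tonnes(petabytes: float, reduction_ratio: float = 1.0) -> float:
    """Estimated annual CO2 for the data after applying data reduction."""
    return (petabytes / reduction_ratio) * CO2_TONNES_PER_PB_YEAR

print(annual_co2_tonnes(10))          # 35.0 tonnes for 10PB, unoptimized
print(annual_co2_tonnes(10, 4.0))     # 8.75 tonnes with a 4:1 reduction ratio
```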

The data reduction process will deduplicate global metadata in the cloud across customers, but it will not dedupe data across customers; the customer data boundary is sovereign.

Comment

Whatever calculations Gartner and IDC make, Veritas is a major data protection player and it is competing against Veeam, Druva, HYCU, Cohesity, Commvault, and Rubrik. As the amount of unstructured data grows, and as new workloads come into being – such as Kubernetes-orchestrated and edge applications – Veritas must stay relevant and is convinced it can grow.

The ambitious retooling of NetBackup is a huge engineering effort – Veritas’s moonshot if you will. If it can produce the autonomy goods in a post-v10 NBU release, then its enterprise customers should be well pleased and see no need to try competing products.

Matthews said the autonomy vision was conceived two years ago. “We don’t think Rubrik and Cohesity have autonomy on their roadmaps.”

Wong is bullish about the competition. He suggested that data protection startups with unfortified balance sheets could face consolidation if economic times get hard.