
Back up a minute: Backblaze on SMR, storage tiers, and Web3

Blocks and Files talked to Gleb Budman, co-founder and CEO of cloud storage supplier Backblaze, well known for providing helpful reports detailing reliability statistics on the drives used in its infrastructure. We talked to Budman about cold storage tiers, region expansion, Wasabi competition, decentralized Web3 storage and shingled magnetic media drives.

His answers may surprise you, as the cloud storage supplier appears to be sticking firmly to its single-tier, non-SMR knitting.

Blocks & Files: Would Backblaze consider providing colder storage backup – for example, using tape media? If it can devise disk-based pods, then perhaps it could devise tape-based ones too.

Gleb Budman, Backblaze

Gleb Budman: [O]ur products are architected to offer a single tier of hot storage in B2 Cloud Storage, and automatic, unlimited backup through Backblaze Computer Backup. Adding tiers adds complexity, price, and forces customers to make decisions about what to keep readily accessible. At 1/5th the price of most hot cloud storage providers, we let customers keep and use everything they store at cold-storage pricing. Additional tiers of storage is not something we are considering at this time. Although publishing “Tape Stats” would be a nice addition to the blog…

Blocks & Files: How does Backblaze consider and respond to competition from Wasabi for cloud and backup storage?

Gleb Budman: Over 15 years, we’ve grown the Backblaze Storage Cloud to offer performant, reliable, trusted storage solutions to over 500,000 customers. We essentially bootstrapped our way to going public – we only took some $3 million in outside funding up until 2021 – which makes us extremely efficient, built for the long haul, and affordable for our customers. We celebrate everyone who is working to bring the benefits of cloud solutions to more businesses and individuals, but we don’t ever want to surprise customers with delete fees driven by storage duration requirements, rapid access fees, or other unpredictable expenses. 

We think our long track record of building zettabyte-scale infrastructure, serving half a million customers, and doing so with radical transparency all along the way is valuable to existing and future customers, and that is value that’s hard to replicate.

Blocks & Files: How does Backblaze consider the idea of expanding its regions to cover more of the world’s geography?

Gleb Budman: We recently announced that Backblaze expanded its storage cloud footprint with a new data region in the Eastern United States. Our new US East region provides more location choice overall, particularly for businesses that wish to easily store replicated datasets to two or more cloud locations. Backblaze also announced it provides free data egress for Cloud Replication across the Backblaze platform, making its cost-effective object storage even more compelling. We serve customers in more than 175 countries – rolling out datacenters and regions at exabyte scale provides them better value and better serves the use cases we enable.

Blocks & Files: What is your view of the distributed decentralized storage Web3 ideas? How would you describe the pros and cons of such storage vs Backblaze?

Gleb Budman: We actually considered building a peer-to-peer backup service as our original idea 15 years ago. Since our goal is to make it astonishingly easy to store, protect, and use data, we decided not to because ultimately it would be too complex for most users to adopt. There still doesn’t seem to be much customer demand to store data on decentralized platforms.

Blocks & Files: What is Backblaze’s view on the advantages and disadvantages of shingled magnetic media recording (SMR) disk drives? Could we see them having a role in Backblaze’s pods?

Gleb Budman: We do not use SMR drives. We experimented with them a couple of years ago and while they can be a little less expensive upfront, in practice they did not meet our operational, cost, and performance metrics. SMR drives are best suited for data archive purposes, while application storage, media and entertainment, and backup customers need to be able to use their data. Also, we think customers should be able to use their data as they want to, including deleting it without having to worry about the delete fees driven by the use of SMR drives.

Bootnote

Backblaze has just added a new data region in the Eastern United States, bringing increased access speed for customers based there and greater geographic separation for backup data stored by businesses operating in other regions. It also helps provide an additional destination for businesses seeking to store copies of their datasets to two or more cloud locations for compliance and continuity needs. A Cloud Replication service is provided with free data egress, enabling customers to copy data across the Backblaze platform at no expense.

Azure Storage Mover competes with partners

Microsoft has set up a service to move NFS file data into Azure, competing with its partners Data Dynamics and Komprise, and also Datadobi and WANdisco.

Update: Komprise, WANdisco and Datadobi comments added; 13 and 14 December.

Back in February, Microsoft announced a free inward migration service to bring file data to Azure through data migration partners Data Dynamics and Komprise. Data Dynamics uses its StorageX product, while Komprise supplies its Elastic Data Migration (EDM) software. This program complemented the existing Azure Migrate portfolio, which included tools such as AzCopy, rsync, and Azure Storage Explorer for customers with less than 50TB of data to migrate.

Fast forward 10 months and Jurgen Willis, Microsoft VP for Azure Optimized Workloads and Storage, writes: “Today, we are adding another choice for file migration with the preview launch of Azure Storage Mover, which is a fully managed, hybrid migration service that makes migrating files and folders into Azure a breeze.”

Yes, a breeze that could blow away its existing partners.

Microsoft Azure Storage Mover diagram

Azure Storage Mover focuses on the migration of an on-premises network file system (NFS) share to an Azure blob container. Microsoft says it will support many additional source and Azure target combinations over the coming months.

Admins can plan the migration, whether it be one-time or repetitive, then start and track it through the Azure portal, PowerShell or CLI. Agent VMs are deployed alongside the source NFS share or shares, and data is migrated directly to the target Azure region. Willis says this “is key for migrations that happen across geographically diverse branch offices that will likely target Azure Storage in their region.”

All of a customer’s data migration projects can be managed from a single dashboard.

The Storage Mover roadmap includes provisioning target storage automatically based on the customer’s migration plan, and running post-migration tasks such as data validation, enabling data protection, and completing migration of the rest of the workload. Another item on the roadmap is automatically loading possible sources into the service. Willis writes: “That’s more than just convenience; it enables large-scale migrations and reduces mistakes from manual input.”

Single Azure Storage Mover dashboard for all of a customer’s file data migration projects

Data Dynamics

Asked about Storage Mover, Data Dynamics CEO Piyush Mehta told us: “Microsoft is providing a ‘first party’ service with basic functionality to support customers looking to migrate to Azure. The ability to move smaller workloads that may not have complexity or require discovery of the source, understanding of the permissions nor any other attributes are valuable for SMB clients that simply want the physics of moving data from A to B.”

Piyush Mehta, Data Dynamics

But Data Dynamics focuses on larger customers, with Mehta saying: “Enterprise customers will require a more robust solution that addresses the scale and complexity of their environments and that’s where solutions such as ours would be brought in, meeting the needs of multiple sites with petabytes of data.”

In Mehta’s view: “When you look at the Data Dynamics Software Platform, enterprises are leveraging a single software [product] that enables storage optimization and lifecycle management, accelerates the journey to public hyperscaler, and meets data compliance, governance and security needs. This holistic approach goes beyond addressing a single use case of data migrations, rather providing an ability to leverage a single software [product] to address the needs of the CIO, CDO, CISO and lines of business.”

Komprise and WANdisco

Komprise co-founder, President and COO Krishna Subramanian tells us: “Cloud data migration is important and moves like this are further validation that the market is maturing. We partner closely with Microsoft to drive success of the Azure Migrate Program. We welcome best-of-breed point solutions and our approach is to incorporate these when appropriate to provide the best end-to-end, storage agnostic, standards-based data management and migration solution.”

“Just as we already leverage data transfer technologies like NetApp SnapMirror and provide a holistic data migration solution, we are looking at similarly leveraging Azure Storage Mover. … we will continue to partner closely with Microsoft to deliver the simplicity and efficiency of analytics-driven data migration and data management.”

A spokesperson from WANdisco said: “WANdisco’s capabilities and products are designed for significant, at-scale, ‘any cloud’ migrations and do not serve the same target customer base or customer need as ASM, which has been in preview since mid-September.”

Datadobi

Steve Leeper, VP of Product Marketing at Datadobi, told us: “It appears that Azure Storage Mover is a new managed file migration service that, in its current “preview” incarnation targets the migration of NFS datasets from on-premises storage into Azure Blob which can then be accessed by HPC (High Performance Computing) or other applications that depend on Azure Blob object storage. 

“[Datadobi] StorageMAP’s data migration capabilities are very mature and provide several benefits ranging from API integration with the systems, UI/graphical and/or REST-API based migration creation, detailed monitoring and error handling workflows, highly performant scan and copy operations, a dedicated cutover module, and more. Its underlying data mobility engine has millions of hours of run time in some of the largest and most demanding environments that can be found. Any technology introduction that can make a customer’s life easier is something Datadobi will always welcome but we are also quite sure of our capabilities in this space which is the result of many years of intense R&D combined with real-world experience.”

Backblaze predicts HDD prices will fall to 1¢/GB by mid-2025

Disk drive price falls in $/GB terms are continuing at a slower rate than before but are set to reach one US cent per GB, according to Backblaze.

This is shown in a blog by Andy Klein, an evangelist at Backblaze with the very modern title of Principal Storage Cloud Storyteller.

He says Backblaze was buying hard disk drives (HDDs) at more than $0.11 per GB in 2009 when it launched. By 2017 that had fallen to just under $0.03/GB with 8TB drives, and this year it is at $0.014/GB with 16TB drives. To date, Backblaze has bought around 265,300 HDDs. The vendor says it pays more than the lowest street price to get assured delivery times for the large numbers of drives required for its cloud storage business.

Klein has charted the declines by HDD capacity point since 2009:

Backblaze chart

The overall downward trend has been interrupted several times. His chart shows 2TB drives costing more than 1.5TB drives for several months in 2009-2010. The same happened with 4TB and 3TB drives in late 2011 and 2012-2013, and again with 6TB and 4TB drives in 2014-2015. Backblaze also paid more for 8TB drives in 2016 than for both 6TB and 4TB drives.

The 2011-2013 price discontinuity “was due primarily to the Thailand drive crisis which began in the second half of 2011 and continued to affect the market into 2013 before things got back to normal,” said Klein.

He says there has been an 87.4 percent $/GB decrease since 2009, from $0.114 to just over $0.014.

HDD prices for Backblaze decreased 9 percent per year from 2017 to November 2022, taking into account the four drive sizes it bought – 8TB, 12TB, 14TB and 16TB – and he has charted these specific declines:

Backblaze chart

The total $/GB decline over this period is 56.36 percent. This chart and the preceding one show that the steepness of the price decline curve is lessening.

Typically a new drive capacity point costs more per GB at the start of a purchase run than the preceding drive’s capacity point at that time. The chart shows this with the 12 to 14TB transition and the 14 to 16TB transition. Klein also noted: “The cost per gigabyte of a drive will fall on average about 0.5 percent per month over time, slowly at first, then accelerating for some period before bottoming out. In nearly every case, the cost per gigabyte of each new drive size introduced will eventually fall below that of its predecessor.”

He said he thinks “the next milestone we can see is $0.01 per gigabyte for a hard drive” as a stable street price, and suggested: “Let’s go out on a limb and say that we will reach that in mid-2025 with 22TB or 24TB drives. That would mean you could buy a 22TB drive at Costco or on Amazon for about $220, or a 24TB for $240.”
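
As a quick sanity check, here is a minimal Python sketch of the arithmetic behind the 87.4 percent decline and those $220/$240 retail figures. The dollar values are the ones quoted above; the calculation itself is ours and purely illustrative, not Backblaze's model.

    # Back-of-envelope check of the $/GB figures quoted above.
    price_2009 = 0.114    # $/GB Backblaze paid in 2009
    price_2022 = 0.0144   # "just over $0.014" per GB in 2022

    decline = (price_2009 - price_2022) / price_2009
    print(f"Decline since 2009: {decline:.1%}")            # ~87.4%

    # What the predicted $0.01/GB milestone would mean at retail,
    # using decimal (marketing) terabytes.
    for capacity_tb in (22, 24):
        print(f"{capacity_tb}TB at $0.01/GB is about ${capacity_tb * 1000 * 0.01:.0f}")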

Klein doesn’t know how low the $/GB number can go. If HDD manufacturers can continue adding capacity per drive, and if the technology needed does not cost more per GB than the technology it displaces, then price declines per GB could continue for several years more.

Catalog demos massively parallel DNA storage search

Storage pioneer Catalog Technologies says it has made a historic breakthrough by demonstrating massively parallel search of data stored in DNA.

Update: Catalog explained how computing capabilities can increase the efficiency and cost-effectiveness of reading data back from DNA by orders of magnitude. December 19, 2022.

Catalog is developing DNA storage technologies based on encoding data in sections of synthetically produced DNA molecule groups rather than the slower method of encoding it in DNA molecules directly. The writing and reading of data will potentially be done using lab-on-a-chip sequencing technology, and Catalog is partnering with Seagate to develop this capability.

Hyunjun Park, Catalog founder and CEO, said in a statement: “This historic and transformational achievement is based on years of work with partners and collaborators that helped make DNA-based computation a reality.” 

Catalog encoded approximately 17,000 words (from Shakespeare’s Hamlet) into DNA in a few minutes on its Shannon DNA writing system. No pre-processing or DNA-based indexing of this data was carried out. It then ran a keyword search on this stored data and retrieved all occurrences of the query word.

Catalog DNA storage technology

It says the number of steps required in this search of the DNA-stored data would be approximately the same if the dataset had 170,000 or 170 million words instead of 17,000. This is because the chemical processes involved (DNA storage sample rehydration and sequencing) are inherently massively parallel.

Catalog says it is on track to demonstrate this search scalability on data sets containing over 100 million words by mid-2023.

Park said: “With the advantages of DNA-based data storage and computation demonstrated, we now turn our attention to addressing more sophisticated applications from signal processing to machine learning over massive datasets. In parallel, we are working closely with partners and collaborators to reduce the size and complexity of our platform and to identify specific workloads to target commercial offerings.”

But has Catalog actually demonstrated DNA-based computation? We could say that calling this computation is like saying a semiconductor chip that can only add is a processor. Catalog has demonstrated one specific aspect of a computation use case: keyword search. It is an amazing scientific achievement, but do not start building end-of-life plans for tape archives or Ocient hyperscale data analysis systems just yet.

Catalog also says it has demonstrated how computing capabilities can increase the efficiency and cost-effectiveness of reading data back from DNA by orders of magnitude. Catalog explained how in a mailed message to us: “By computing chemically we were able to reduce the amount of data to be read by a sequencer to just the targeted search term. That is, the only DNA presented to the sequencer was, more or less, just the resultant DNA file from the chemical search. This netted a two orders of magnitude speed up in “reading” courtesy of avoiding having to read 99 percent of the DNA encoded data.” 

“This contrasts with using a sequencer to read all the data and then using conventional computing to decode and ascertain the content of the search.  Tests have shown that we would expect this result generally, without regard to the amount of data being searched.   Thus, any amount of data being searched will come down to a “cost” of about 1 percent of the total;  this is about  two orders of magnitude improvement in a data file of arbitrary size.”  

The data storage search area is receiving attention from AI-based semantic searchers like Nuclia. Catalog’s DNA storage and search technology relies, at the moment, on the massive potential capacity of DNA storage giving it physical space and cost advantages that tape, disk and SSD cannot match. If it can search exabytes, even zettabytes, of data in a massively parallel fashion then the technology has legs. 

Bootnote

This is quite slow write IO in IT storage terms: writing 17,000 five-letter words in three minutes equates to a write rate of about 472 bytes/sec.
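
The arithmetic is straightforward, assuming one byte per letter and ignoring spaces and encoding overhead:

    # Bootnote arithmetic: 17,000 five-letter words in three minutes.
    words, letters_per_word = 17_000, 5   # one byte per letter assumed
    seconds = 3 * 60
    print(words * letters_per_word / seconds)   # ~472 bytes/sec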

Storage news ticker – 9 December

An Arcserve survey found that 40% of UK-based respondents are not re-evaluating and updating disaster recovery plans and ransomware defences as the workforce moves to remote working and more mobile devices appear on the data perimeter. As cyber criminals focus their attacks more on remote workers, it is a business imperative that organisations regularly assess changes in IT and work environments, bolster ransomware defences, and update disaster recovery plans accordingly.

Data protector and manager Cohesity has appointed James Blake as Field Chief Information Security Officer (CISO) for EMEA. His role involves consulting on how organisations can bring the worlds of data security and management together and build a cyber resilience and recovery strategy orchestrated by a single data management platform.

GPU-powered SupremeRAID HW supplier GRAID and composable disaggregated infrastructure (CDI) systems vendor Liqid are linking up to focus on providing data protection along with I/O bottleneck elimination for customers deploying Liqid Honey Badger NVMe flash devices and CDI solutions. Sumit Puri, CEO & Cofounder of Liqid, said: “While the Honey Badger is already the world’s fastest SSD at 4M IOPS, when paired with SupremeRAID SR-1010 it delivers more than 5M IOPS with solid RAID 6 protection. SupremeRAID works brilliantly with the Honey Badger in both composed and non-composed deployments.”

High-end array supplier Infinidat has grown its business delivering enterprise storage systems to cloud service providers (CSPs), managed service providers (MSPs) and managed hosting providers (MHPs) to 31% of its customer base. Infinidat says it’s enabling CSPs, MSPs and MHPs to compete effectively in a highly competitive marketplace that includes the world’s largest cloud providers and telcos.

Phison is heading for the moon with NASA. It said its 8TB M.2 2280 SSD solution has completed flight qualification tests required for Lonestar Data Holdings’ historic first lunar data center mission, gaining NASA Technology Readiness Level 6 (TRL-6) certification. The SSD has been selected by Lonestar’s contractor and Phison’s partner, space logistics company Skycorp. NASA’s lunar mission is scheduled for the second half of 2023. Lonestar, which is launching a series of data centers to the lunar surface to provide off-site archival and edge processing services, is sending the first data center to the Moon as a payload on Intuitive Machines’ NOVA-C lander under NASA’s Commercial Lunar Payload Services (CLPS) program.

Open-source in-memory database supplier Redis has changed its CEO. Original CEO and co-founder Ofer Bengal has moved to board chairman and incomer Rowan Trollope is the new CEO. This is part of a long-planned succession process and will be effective February 1, 2023. Trollope joins Redis from Five9, where he served as CEO for more than four years. Five9 provides cloud contact center software for the enterprise. Before that he was SVP and General Manager of Cisco’s Applications Group, and previously at Symantec. Trollope is also a mountaineer.

Object and file storage supplier Scality’s CMO, Paul Speciale, has issued predictions for 2023. He says:

  1. Security will dominate IT buying criteria, including for data storage.
  2. Unstructured search gets smart with multi-cloud capabilities and rich feature sets.
  3. Malicious software supply chain attacks will slow open-source adoption.
  4. As recession concerns loom, green storage innovation will rise in importance.
  5. Tighter integration of managed cloud services and object storage will emerge.

We think the green issues will affect suppliers’ development plans in 2023. Read Speciale’s blog here.

The Storage Networking Industry Association (SNIA) has announced its 2022-2023 Board of Directors and Technical Council members as it celebrates 25 years in the storage industry focused on advancing storage and information technology. SNIA now has over 180 industry organizations, 2,500+ active members, and more than 50,000 IT end users and storage professionals around the world. The top-level members are:

Board of Directors Executive Committee:

  • Chair: Dr. J Metz, AMD
  • Vice Chair: Richelle Ahlvers, Intel Corporation
  • Secretary: Chris Lionetti, Hewlett Packard Enterprise
  • Treasurer: Sue Amarin
  • Member: Scott Shadley, Solidigm Technology
  • Chair Emeritus: Wayne Adams, Industry Consultant

Board Members:

  • Peter Corbett, Dell Technologies
  • John Geldman, KIOXIA Corporation
  • Roger Hathorn, IBM
  • Jonathan Hinkle, Micron
  • Dave Landsman, Western Digital, Inc.
  • Chris Lueth, NetApp
  • David McIntyre, Samsung Corporation

Technical Council

  • Co-Chair: Mark Carlson, KIOXIA Corporation
  • Co-Chair: Bill Martin, Samsung Corporation

Technical Council Members:

  • Curtis Ballard, Hewlett Packard Enterprise
  • Stephen Bates, Eideticom
  • Alan Bumgarner, Solidigm Technology
  • Anthony Constantine, Intel Corporation
  • Shyam Iyer, Dell Technologies
  • Glenn Jaquette, IBM
  • Fred Knight, NetApp
  • Dave Peterson, Broadcom, Inc.
  • Leah Schoeb, AMD

In 2023, SNIA will focus its technical work on the following areas:

  • The Smart Data Accelerator Interface (SDXI)
  • Computational storage, with the Computational Storage API and the Computational Storage Architecture and Programming Model moving toward new released versions
  • DNA data storage, through the DNA Data Storage Alliance Technology Affiliate
  • Continued development of SNIA Swordfish enhancements
  • Continued work on the SNIA Emerald program and Storage Device Level Power Efficiency Measurement (SDLPEM) activities

Cloud storage provider Wasabi has had a funding top-up. The $250 million Series D round announced in September has had its $125 million equity component increased by $15 million with contributions from three parties: Azura, a fund controlled by SIS International Holdings, and existing investors Prosperity7 Ventures and Aramco Ventures. Along with its existing debt facility, the company has now raised over $500 million to date, most recently at a $1.1 billion valuation. Wasabi has 40,000+ customers, 250+ global employees, 13 storage regions across North America, Europe, and Asia Pacific, and 14,000 partners, including backup, disaster recovery, and surveillance companies. It said it is preparing for years of investment in EMEA.

VAST Data is being used by Plan B, a New Zealand-based business which manages data centers for medium to large commercial customers. Plan B is moving all of its customer data to VAST’s platform. The VAST kit is also being used for backups and for fast restores in the event of an attack or outage. VAST claims Plan B’s customers can retrieve archived data in seconds, as opposed to days with legacy platforms.

Spanish startup Nuclia reveals language search models


AI Search-as-a-Service company Nuclia claims it can index text from virtually any source and language and then search it for words or phrases in multiple languages.

It’s a cliche that 80 percent of companies’ data is unstructured and keeps on growing. Looking for text and speech needles in this data haystack is complicated by the myriad file formats involved and by the multiple languages used in international businesses’ subsidiary operations. Most international companies simply cannot catalog all the unstructured data they hold – they don’t know what they have and have no practical way of finding out. Nuclia reckons it can fix that.

The two founders, CEO Eudald Camprubí and CTO Ramon Navarro, told an IT Press Tour that their software can help companies sift through complex data, because its language model technique normalizes multi-language unstructured data and makes it searchable.

Nuclia founders

The key technology is natural language processing (NLP): Nuclia uses language modelling to build vectors of language statements and stores them in its open-source NucliaDB database. NucliaDB is available on GitHub, is cloud-native, and stores unstructured data along with vector, text, paragraph and relation indexing information. A vector in this sense is a set of numeric values in arbitrary dimensions – tens, hundreds or even more – that describes a complex data object such as a word, phrase, paragraph, object or image.

The vectors are generated from files and objects (S3 is supported) containing text that is printed, viewed or spoken. Sources could be Word documents, PowerPoint slides, images that can be scanned with OCR, and audio files that can be transcribed. Text can be indexed from virtually any language that is not pictogram-based, pictograms being characters such as those used in Japanese and Chinese.

Nuclia architecture slide

The index is paragraph-sensitive and search results come in multi-paragraph form, not the unparagraphed masses of text which the Otter.ai transcription service can produce.

Nuclia search is faster and broader than searching through a file:folder document structure using keywords, the company claims, as Nuclia looks into content irrespective of its format and language. A keyword search is language-sensitive; a search for “Vehicle” will not return results containing the French word “voiture”, the German word “Fahrzeug“ or the Italian word “veicolo.” A search of a Nuclia-indexed set of PDFs, Word docs, audio files and images containing text in multiple languages will return these results.

That’s because Nuclia search does not look for keywords but for the language-model representations that underlie different languages and hence are common to them. When you input a search string, Nuclia turns it into a vector using its language models and then searches for similar vectors in the NucliaDB repository. Searches can use sentences or paragraphs, or keywords of course.
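
Nuclia has not published its internals, but the flow it describes – embed the query, then rank stored paragraph vectors by similarity – can be sketched in a few lines. The embeddings and the three-dimensional vectors below are invented stand-ins, not Nuclia's API or models.

    import numpy as np

    # Toy index: each paragraph stored with a pre-computed embedding.
    # Real vectors would have hundreds of dimensions from a multilingual model.
    index = {
        "The vehicle was parked outside.": np.array([0.9, 0.1, 0.0]),
        "La voiture est rouge.":           np.array([0.8, 0.2, 0.1]),  # French
        "Das Essen war gut.":              np.array([0.1, 0.9, 0.3]),  # unrelated
    }

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    def search(query_vec, top_k=2):
        # Rank stored paragraphs by similarity to the query embedding.
        ranked = sorted(index, key=lambda text: cosine(query_vec, index[text]), reverse=True)
        return ranked[:top_k]

    # A query such as "car" would be embedded by the same model; we fake the vector here.
    print(search(np.array([0.85, 0.15, 0.05])))
    # -> the English and French "vehicle" paragraphs rank first, despite sharing no keyword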

From Nuclia’s point of view, files and folders are poor ways of putting a structure in place to contain and access data. It’s better to have a searchable central database indexing everything that uses text or speech. Search results point to specific text paragraphs or specific timed parts of podcasts, other audio files, or videos containing speech.

Nuclia’s software runs on the Google Cloud Platform. Applications can use it as an AI-powered search plug-in through API integration.  

Nuclia YouTube video showing how to include its AI-powered search engine in applications in minutes

The company was founded in Barcelona, Spain in 2019, and picked up €5.4 million in seed funding in April from two European funds: Crane Venture Partners in the UK and Elaia in France. There are some 20 staff, mostly developers. The company earned about $100,000 in revenue this year and has commitments for next year that total $400,000 to $500,000.

Pricing is still being worked out and is influenced by the amount of source data to be indexed and the number of queries per period.

Nuclia has 20 or so customers in the healthcare, pharmaceutical, education, public administration and customer service areas. One of its customers inputs about 100,000 PDF documents a month for indexing. This is computationally intensive; indexing a thousand of them takes about a week.

Individual users could connect their desktop drives and Dropbox folders to Nuclia and have it index them. Use cases start with basic unstructured data search and include multi-language semantic search, video and audio search, data anonymisation, GDPR compliance, data training, insight detection and customer service.

It says competing unstructured data search services include ElasticSearch and Algolia, which has a media content search and discovery feature used by customers as a recommendation engine.

Text summarization is an issue, said Camprubí: “It is not that useful today. It is difficult to define what a summary is. Do you generate text or cut-and-paste from the file?”

Nuclia’s roadmap includes a possible addition of translation services.

Resilience? We’ve heard of it. Forrester data durability study shows Big 3 are ahead

Tsunami by Hokusai

A Forrester Data Resiliency Wave study shows that Commvault is rated ahead of Cohesity and Rubrik, and the three lead all other suppliers. It also gives a measure of the included suppliers’ revenue streams from those products, which most take care not to reveal.

This “Forrester Wave: Data Resilience Solution Suites, Q4 2022” report looks at just nine suppliers, not representing the entire market. It defines data resilience as the capability to back up and secure data in hybrid on-premises/public cloud environments supporting SaaS and container environments. Products or services should secure the backup infrastructure from cyberthreats, recover from failure, navigate shared-responsibility models for hosted services, and address data privacy and sovereignty concerns. 

Commvault’s Ranga Rajagopalan, SVP for Products, said in a statement: “We believe our position as a Leader in the Forrester Wave for Data Resilience Solutions Suite validates the completeness of our current offerings and our vision in future-proofing the data protection strategy for our customers.”

The highlighted comments about suppliers include:

  • Commvault leads with breadth of backup support and attention to enterprise needs.
  • Rubrik encapsulates a well-featured backup system with a security focus.
  • Cohesity bridges the divide between backup and data management.
  • Druva simplifies backup operations but mainly focuses on modern workloads. 
  • Veeam Software is powerful and flexible, but software-only delivery is double-edged.
  • Dell Technologies’ customer relationships bolster its position in the market.
  • Veritas has strong backup capability, but its market growth has been tepid. 
  • Zerto is known for supporting aggressive RPOs but has limited backup functionality.
  • IBM’s Spectrum Protect suite provides core functionality with piecemeal integration.

Here’s the data resiliency suppliers’ Wave diagram:

This 4-box variation diagram positions suppliers in a square space defined by a vertical weaker-to-stronger current offering axis and a weaker-strategy-to-stronger-strategy horizontal axis. There are three concentric quarter circles – waves – for Contenders, Strong Performers and Leaders with the remaining space for Challengers. 

Each vendor’s position on the vertical axis of the Forrester Wave graphic indicates what it sees as the strength of its current product set. Placement on the horizontal axis indicates the strength of the vendors’ strategies. Market presence is represented by the size of the markers on the graphic.

Vendors were selected on the basis of having generally available comprehensive backup-and-restore functionality for both legacy and modern workloads, ransomware recovery capabilities, comprehensive reporting resilience and consolidated management interface, significant product revenue (at least $100 million in last fiscal year), and Forrester client interest.

The product revenue barrier obviously limits the supplier choices for the Forrester analysts. It also tells us that Druva had revenues of at least $100 million in its latest financial year, as did HPE’s acquired Zerto business.

The basis for the Wave rankings is an evaluation table of the suppliers’ offerings: 

Forrester’s analysts rate suppliers’ offerings in 17 categories grouped into three sets – current offering, strategy and market presence – with averages per supplier for each group. The scores are also given weightings and overall rank scores are then calculated, though these are not published in the report.
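
Forrester does not publish the weights or the overall scores, but the mechanics amount to a weighted average. A minimal sketch, with invented scores and weights rather than Forrester's, shows how a strong showing in one lightly weighted set (market presence, say) can be damped in the overall rank:

    # Invented example scores (0-5) and weights -- not Forrester's figures.
    scores  = {"current offering": 4.2, "strategy": 3.8, "market presence": 5.0}
    weights = {"current offering": 0.5, "strategy": 0.4, "market presence": 0.1}

    overall = sum(scores[k] * weights[k] for k in scores)
    print(f"Weighted overall score: {overall:.2f}")   # 2.10 + 1.52 + 0.50 = 4.12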

If we chart the three set averages per supplier we get a quick and dirty look at their rankings:

It’s immediately obvious that Veeam scores high on market presence, but the Forrester weighting algorithm downplays this strength and so relegates Veeam from a Leader to a Strong Performer. Druva also scores low on the market presence factor, almost as low as Zerto.

The market presence score is formed from three components: revenue, number of customers and average deal size. Each of these is rated from 0 to 5. A minimum requirement for inclusion in this Wave report is $100 million in revenue in the latest financial year. In that case, what do the 0, 1, 2, 3, 4 and 5 ratings represent for the revenue number? Presumably a rating of 0 is the minimum amount, $100 million, and 1 is a higher amount, 2 a higher amount still, and so on.

Dell and Veeam received 5s on this revenue rating. IDC said Veeam’s first half 2021 revenues were $647.17 million, giving it a $1.29 billion annual run rate. CTO Danny Allen told us in July this year that Veeam revenues were growing 20 percent year-on-year. That gives us a $1.55 billion run rate for Veeam’s fiscal 2022, and suggests that a Forrester Wave revenue rating of 5 represents numbers in that arena.
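
The run-rate arithmetic behind that estimate uses only the IDC half-year figure and the 20 percent growth rate quoted above:

    # Veeam run-rate estimate, in $ millions (inputs as quoted above).
    h1_2021 = 647.17
    fy2021_run_rate = h1_2021 * 2              # ~1,294 -> the $1.29bn annual run rate
    fy2022_run_rate = fy2021_run_rate * 1.20   # +20% y/y -> ~1,553, i.e. ~$1.55bn
    print(round(fy2021_run_rate), round(fy2022_run_rate))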

We don’t know if the revenue rating scale is linear from the starting $100 million point, i.e., adding $200 million per step, but we can now chart the suppliers’ revenue rankings, knowing roughly what the high-end point value is:

This is comparative data we simply have not seen before, and it indicates that Cohesity and Rubrik revenues in their latest fiscal years could be above $100 million but less than Commvault’s $769.6 million in its fiscal 2022. The Forrester “3” revenue rating has Commvault and IBM at the same level, indicating IBM pulled in around $750 million for its Spectrum Protect revenues in its last fiscal year. As IBM no longer releases storage software revenues in its financial reports, this is informative.

Similarly, private equity-owned Veritas doesn’t release its financial numbers, but we now know its revenues in its latest fiscal year were greater than Commvault’s $769.6 million and less than Veeam’s revenues and Dell’s PowerProtect-based revenues.

You can download a copy of the Forrester Wave report courtesy of Cohesity here.

Pure talks up all-flash in sustainability report as datacenter world awaits regulation

Pure Storage has released a sustainability report claiming its products can help datacenters use less electricity, pressing the pressure points of customers that want to hit carbon emissions targets and are worried about regulators homing in on their use of energy and water.

The paper, titled Drivers of Change: Pure Storage IT Sustainability Impact Survey 2022, is accompanied by an Energy Savings Visualizer, which enables businesses to see Pure’s estimated electricity cost savings based on performance- or capacity-optimized block, file, or object storage.

Ajay Singh, Pure’s chief product officer, said: “This inaugural report on the central role that IT can play in overall sustainability can be an important tool to help IT leaders improve their data storage strategies and decrease their organization’s carbon footprint as they advance their digital transformation.”

Datacenters account for 1 percent of global electricity consumption today, according to the paper. Pure says that by building and adopting sustainable technology infrastructure, IT teams have the potential to make a significant and immediate impact by using less electricity for power and cooling and creating less e-waste.

The commissioned report, carried out by Wakefield Research, surveyed more than 1,000 sustainability managers in the US, UK, France and Germany. It looked at organizations’ sustainability efforts, with sustainability mainly focused on electricity consumption and the underlying carbon emissions. The highlights include:

  • Sustainability initiatives are a priority, but half of sustainability managers are behind on their sustainability goals.
  • 86 percent of sustainability program managers think companies can’t achieve sustainability goals without significantly reducing technology infrastructure energy usage. 
  • Just over half say the sustainability of vendors’ technology is likely to be overlooked during the selection process.
  • Around half say the IT team is considering sustainability when making decisions about technology purchases.

78 percent say their company’s leadership is treating sustainability initiatives as a priority, but only about half (51 percent) are on track with their goals. Just under a third of the surveyed companies (32 percent) are focused on becoming carbon neutral.

The report declares that 81 percent said the impact of technology infrastructure on a company’s carbon footprint will increase in the next 12 months. Pure says it is lowering the carbon emissions of its products moving forwards. Last year it said it wanted to achieve a 66 percent reduction per PB in sold product Scope 3 emissions.

It believes there should be industry-wide standards for reporting power efficiency so that customers can make informed and reliable choices as sustainability becomes a bigger factor in technology purchasing decisions.

Analyst view

William Blair analyst Jason Ader said he believed Pure has an “opportunity to displace near-line hard disk (due to the declining cost of NAND and its energy efficiency).” He added: “Pure’s proprietary ‘Direct Flash’ hardware architecture, in which flash management software is applied against a pool of raw NAND flash (as opposed to off-the-shelf SSDs, which virtually every other storage vendor, including the CSPs, leverages), is the company’s key technical differentiator in the storage market and leads to industry-leading density, performance, and data reduction efficiency.” 

In other words, at least theoretically, you need less flash capacity in Pure’s all-flash arrays than competing vendors’ AFAs and so less electricity is needed during their operation for power and cooling. The electricity savings against all-disk and hybrid flash/disk arrays are even greater.

Ader notes: ”Pure sees NAND price declines as a catalyst for greater displacement of near-line hard disk drives, which are still the prevalent form factor for capacity-based enterprise storage systems.”

Comment

Pure’s belief that there should be industry-wide standards for reporting power efficiency has a ready-made framework: the SNIA’s Emerald program and its Storage Device Level Power Efficiency Measurement (SDLPEM) activities.

The scope of the “industry-wide” concept needs defining in a storage industry sense. Suppliers whose products depend on moving components – hard disk drives (HDDs) and tape drives, and the systems that use them, such as disk arrays, tape autoloaders and tape libraries – will use far more electricity than solid-state drives on the one hand and storage software on the other.

But tape drives only require power when in use and tape cartridges spend more time offline than online. Tape is likely inherently greener than disk. Perhaps disk power-down, think Copan, will return from its technology cemetery.

The emissions created during HDD and NAND manufacturing will also need to be considered.

Developing an industry-wide reporting standard will require all suppliers to feel they have been treated equally and not put at a disadvantage because their kit, such as HDDs and tape drives, inherently causes more carbon emissions. The SNIA Emerald working group has a quite difficult job on its hands.

Pure hints in its report that sustainability issues should be present earlier in the IT technology procurement process than at present. This might mean that they would become a more important factor and that would potentially place HDD vendors at a considerable disadvantage versus potential replacement technologies, such as the QLC flash gear supplied by Pure Storage and VAST Data. It’s likely that more flash array suppliers will discover the joys of QLC.

SK hynix boosts DDR5 DRAM speed with parallel reads

SK hynix has boosted DDR5 memory speed with a buffer between the CPU and DRAM that can accept parallel reads from two ranks.

This DDR5 Multiplexer Combined Ranks (MCR) Dual In-line Memory Module (DIMM) was developed with a buffer from Renesas and Intel’s MCR technology. SK Hynix says DDR5 DIMMs can output 4.8Gbps and its MCR DIMM is at least 80 percent faster than that.

Sungsoo Ryu, SK hynix’s head of DRAM Product Planning, claimed: “SK hynix delivered another technological evolution for DDR5 by developing the world’s fastest MCR DIMM.”

A DIMM has several memory chips on its board. DDR5 (Double Data Rate 5) has double the bandwidth and capacity of the previous DDR4 memory standard. SK Hynix shipped its first DDR5 modules in November 2018.

The more DRAM speed and capacity the better, as server and PC CPUs can get through more work with less waiting for memory contents to be read or written. DDR5 memory is organized into ranks – groups of DRAM chips on the module that are accessed together – with a typical access transferring 64 bytes of data to the CPU as a bundle. The MCR DIMM is a module product with multiple DRAM chips attached to the board, and it improves speed by operating two ranks simultaneously.

SK hynix MCR DIMM diagram

Through this dual rank operation SK hynix’s MCR DIMM enables the transmission of 128 bytes of data to the CPU in one operation instead of 64. This supports a data transfer rate of 8Gbps, not quite twice the 4.8Gbps rate of unbuffered DDR5 DIMMs.
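
A simplified view of where the gain comes from: each operation hands the CPU a double-width bundle, and the module data rate rises accordingly. The figures are the ones quoted above; real DDR5 timing is considerably more involved.

    # Simplified MCR DIMM arithmetic using the quoted figures.
    bytes_per_op_ddr5 = 64
    bytes_per_op_mcr  = 2 * bytes_per_op_ddr5   # two ranks served at once -> 128 bytes

    rate_ddr5_gbps, rate_mcr_gbps = 4.8, 8.0
    print(bytes_per_op_mcr)                                   # 128
    print(f"{rate_mcr_gbps / rate_ddr5_gbps:.2f}x data rate") # ~1.67x, "not quite twice"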

The MCR DIMM development effort by SK hynix, Renesas, and Intel has been underway since 2019. SK hynix said a significant market for MCR DIMMs could be high-performance computing. It is planning to bring the product to mass production, but no date was signalled.

Dr Dimitrios Ziakas, VP of Memory and IO Technologies at Intel, said: “We look forward to bringing this technology to future Intel Xeon processors and supporting standardization and multigenerational development efforts across the industry.” There is a hint there, a slight hint, that non-Intel processors could support this in the future.

UUID

UUID – Every disk drive has a UUID, a Universal Unique Identifier: a 128-bit number, conventionally written in hexadecimal, generated by a standard method and used to identify objects or entities on the Internet. UUIDs should stay unique until at least 2030, after which they may lose their uniqueness. UUIDs are standardized by the Open Software Foundation (OSF) as part of the Distributed Computing Environment (DCE).
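
For illustration, Python’s standard library can generate and inspect one in a couple of lines (a random version-4 UUID here; DCE-style time-based UUIDs come from uuid.uuid1()):

    import uuid

    u = uuid.uuid4()          # a random UUID; a new value on every run
    print(u)                  # 32 hex digits in five hyphen-separated groups
    print(len(u.bytes) * 8)   # 128 bits
    print(u.version)          # 4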

Qumulo extends scaleout clusters, hardware support

Qumulo has updated its scaleout filer software to support 265 nodes in a cluster, added support for new third-party hybrid NVMe platforms, and will add SMB protocol enhancements.

The company supplies scaleout Core filer software that runs on its Qumulo-branded hardware (P, C, K series) and also third-party file storage systems. These can be hybrid NVMe systems, including disk drives and NVMe SSDs. All data on Qumulo is written initially to SSD storage, delivering flash-level write performance. As data ages, Qumulo’s software monitors how often the data is re-accessed, moving it to the HDD layer as it “cools.” 
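
Qumulo has not published the exact heuristics, but the behaviour described – land every write on flash, then demote data to disk once it stops being re-accessed – can be sketched roughly as below. The week-long threshold and the data structures are placeholders for illustration, not Qumulo parameters.

    import time

    COLD_AFTER = 7 * 24 * 3600   # made-up threshold: a week without access

    class Block:
        """A stored block with a tier and a last-access timestamp."""
        def __init__(self, data):
            self.data = data
            self.tier = "ssd"               # every write lands on flash first
            self.last_access = time.time()

        def read(self):
            self.last_access = time.time()  # re-access keeps the block hot
            return self.data

    def demote_cold_blocks(blocks):
        now = time.time()
        for blk in blocks:
            if blk.tier == "ssd" and now - blk.last_access > COLD_AFTER:
                blk.tier = "hdd"            # cooled-off data moves to the disk layer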

Kiran Bhageshpur

Kiran Bhageshpur, Qumulo CTO, said in a statement: “Customers often feel stuck using a vendor’s exclusive hardware and are subject to price hikes, technological limitations, and even supply chain risk. We enable our customers to use their platform(s) of choice while making data management simple.”

The 265-node limit for a Qumulo cluster is a 165 percent uplift from the previous 100-node limit. It means a cluster can scale to higher capacity. However, Qumulo’s web documentation, “Supported Configurations and Known Limits for Qumulo Core,” says the on-premises cluster size limit is 275 nodes while the cloud cluster size limit is 100 nodes. Qumulo said this is a documentation error, the upper limit should be 265, not 275, and it is being corrected.

Dell PowerScale F900 clusters can be expanded to a maximum of 252 nodes.

A source close to Qumulo told B&F: “There was never any real hard limits for nodes. … For a time [the product management VP] wouldn’t allow anything larger than 50 nodes. Then a customer did it on their own – so they limited it to 100 nodes, and the same thing happened. They put a limit in code and the uplift to 265 is simply based on a customer wanting to grow their nodes beyond the current limitations.”

The uplifted cluster node limit means the maximum cluster capacity has expanded from 34.2PB to 100.7PB, an almost 3x increase. Cluster capacity can vary with the Qumulo model and hardware vendor.
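
The per-node arithmetic implied by those figures makes a quick check (the division is ours; Qumulo quotes only the totals):

    # Figures from the article; the per-node division is ours.
    old_nodes, old_pb = 100, 34.2
    new_nodes, new_pb = 265, 100.7

    print(f"Node uplift:     {new_nodes / old_nodes - 1:.0%}")     # 165%
    print(f"Capacity uplift: {new_pb / old_pb:.2f}x")              # ~2.94x
    print(f"Per node before: {old_pb * 1000 / old_nodes:.0f} TB")  # ~342 TB
    print(f"Per node after:  {new_pb * 1000 / new_nodes:.0f} TB")  # ~380 TB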

Qumulo says it has added certifications for three new hybrid NVMe platforms offering 35-110 percent greater performance vs existing hybrids. The newly supported systems are:

  1. HPE Apollo 4200 Gen10+ (as compared to Gen10)
  2. SuperMicro A+ ASG-1014S-ACR12N4H (no previous hybrid on SuperMicro)
  3. A whitebox solution from Arrow Electronics, Quiver 1U Gen2 Hybrid (replaces Arrow’s 1U SATA-based hybrid offering)

A Qumulo pitch is that its architecture allows customers to consolidate many workflows onto one system. It says five new workloads will soon be enabled by SMB protocol enhancements and S3 support. These new workloads are:

  • 8K Native Video Editorial in the AWS Cloud (SMB MC)
  • Philips (formerly Algotec) PACS (SMB MC)
  • Media Ingest / Watch Folders (SMB CN)
  • Data Analytics with Hadoop/SPARK (S3)
  • Hybrid Cloud/Remote Access (S3)

Qumulo CEO Bill Richter said: “Our fundamental vision is to enable massive scale, simple operation, deployable anywhere a customer chooses. Qumulo ensures customers have the freedom to pick where their data resides… Customers can run in the cloud, on-premises in their datacenter, or at the edge.”

Atempo foresees structured data explosion

The post-2023 world is going to see an explosion in structured data alongside the flood of unstructured data, at least according to French data manager and protector Atempo.

Louis-Frédéric Laszlo

It is a truism that 80 percent of the world’s data is unstructured (file and object) and growing. Louis-Frédéric Laszlo, Atempo VP of product management, told an IT Press Tour briefing about Atempo’s view of what’s happening in 2023, detailing incremental developments for its Miria, Tina, Lina and Continuity software, but then surprised his audience.

He said he “sees the structured and unstructured worlds converging with a need for unified storage.” That means a single piece of storage software will be able to handle blocks, files and objects, as Ceph does today.

For Atempo, although no details were revealed, it surely means its data managing, moving and protection software products will have to operate in a unified block+file+object world.

Atempo product positioning.

Laszlo also said: “For File, VM, Emails – think object first.” This is another turnaround.

Data management and data protection have to recognize these changes and Atempo wants to work towards offering a unified data management platform. 

We have not heard similar views from its competitors, such as Data Dynamics and Komprise, but, unless Atempo is talking to different analysts, they may be coming around to the same point of view.

The short term roadmap, for next year, has four pillars: end-to-end data immutability, data integrity, cost reductions for unstructured data storage, and sustainability, meaning carbon emission reductions, which should equate to savings in customer energy use with its products.

The end-to-end immutability pillar puts immutability on each storage tier apart from tier-one production data. It encompasses both protection data and production data on the secondary tier, such as nearline disk storage. File systems can have an immutability flag and, when it is set, a new file version becomes a fresh file or object. Naturally, S3 object lock will be supported.

Data integrity will involve background scanning to combat ransomware, sanity checks before VM restarts and data retrieval, and testing to validate automatic restarts.

Atempo’s data analytics functionality will be extended to provide automatic tiering based on storage costs. Self-service data movement will be supported as well. This use of analytics to drive cost reduction will also be used to drive storage energy reduction, with a revival of tape storage as the low-cost, low-energy-use storage medium. We think it will be possible to see storage energy usage levels via a dashboard report or graphical display, and to arrange data placement to lower energy use.

It would not be a surprise to see Data Dynamics, Komprise and other vendors introduce similar functionality.