
Storage news ticker – December 1

AWS has announced four new storage offerings:

  • S3 Glacier Instant Retrieval is a storage class that provides retrieval access in milliseconds for archive data — now available as an access tier in Amazon S3 Intelligent-Tiering. 
  • FSx for OpenZFS is a managed file storage service that makes it easy to move on-premises data residing in commodity file servers to AWS without changing application code or how the data is managed. 
  • Amazon Elastic Block Store (Amazon EBS) Snapshots Archive is a storage tier for Amazon EBS Snapshots that reduces the cost of archiving snapshots by up to 75 per cent. 
  • AWS Backup supports centralised data protection and automated compliance reporting for Amazon S3, as well as for VMware workloads running on AWS and on premises. 

Cloudian has announced planned support for the new AWS Outposts servers, enabling customers to expand their Outposts use cases with Cloudian’s HyperStore S3-compatible object storage. Cloudian’s storage software and appliances provide limitless on-premises capacity for workloads that require local data residency and low-latency data access. This follows HyperStore achieving AWS Service Ready designation for the Outposts rack in June and reflects the two companies’ increased collaboration.

Dell Technologies and AWS announced that they are bringing Dell’s cyber recovery vault to the AWS Marketplace with the launch of Dell EMC PowerProtect Cyber Recovery for AWS. It features a public cloud vault, operational air gap and enhanced security.

HYCU, which supplies on-premises and public cloud data backup and recovery as a service, announced a preview of cloud-native HYCU Protégé for Amazon Web Services (AWS). This will give customers a tightly integrated and application-aware system to protect, manage, and recover data for workloads on AWS. Protégé supports on-premises applications and virtual machines (VMs) as well as those running in public clouds. General availability of Protégé for AWS is anticipated in the first quarter of 2022.

Druva was named as the third-placed leader in the Forrester New Wave SaaS Application Data Protection, Q4 2021 report. AvePoint was number one followed by Keepit.

Commvault, Veritas and Spanning Cloud Apps were Strong Performers, with Acronis on the Contender/Strong Performer boundary, and Asigra classed as a Challenger.

Peraton, a system integrator and enterprise IT provider, has selected CTERA’s file platform to support a $497 million contract to provide infrastructure-as-a-managed service (IaaMS) for US Department of Veterans Affairs (VA) storage and computing infrastructure facilities across the US and globally. CTERA will deliver file services for mission-critical workloads, connecting up to 300 distributed sites to the VA Enterprise Cloud powered by AWS GovCloud (US).

The lawsuit initiated by VMware against Nutanix concerning the latter’s hire of VMware exec Rajiv Ramaswami as its CEO has been resolved. Nutanix stated: “VMware’s lawsuit was misguided and inappropriate, as there was no wrongdoing on Mr Ramaswami’s part. VMware has agreed to dismiss the lawsuit and we are very pleased that the matter has been favourably resolved.” Lawyers made money. Nutanix spent some to prove it was right. VMware spent some too but ends up just looking peevish.

HPC storage supplier Panasas has joined the Thales Accelerate Partner Network. This collaboration will safeguard HPC storage systems and user data with Panasas “hardware-based encryption at rest” and Thales’ enterprise integrated storage security and key management solution. Panasas’s PanFS 9 adds layers of security through file labelling support for Security-Enhanced Linux (SELinux) and hardware-based encryption at rest with zero performance degradation. Panasas and Thales have extended Thales’ CipherTrust Manager’s support to Panasas PanFS 9 for centralized, compliance-ready enterprise encryption key management.

Sunlight.io announced a “Hyper-Convergence Innovation of the Year” win at the SDC Awards 2021, beating competition from Nutanix and Hewlett Packard Enterprise. Its entry featured its NexVisor software stack and an OEM partnership with Altos Computing — a subsidiary of Acer — to deliver what it says is the industry’s fastest and most efficient hyperconverged infrastructure (HCI) appliances. Winners were announced at a gala evening event on 24 November 2021 at the Leonardo Royal St Paul’s Hotel, London.

Komprise explains the mess: files, objects, silo sprawl and abstraction layers

As if in a dream, I thought: suppose we didn’t have file and object storage? Would we invent them? Now that we do have them, should they be united somehow? In a super silo or via a software abstraction layer wrapper? Where  should functions like protection and life cycle management be placed?

Who could I ask for answers? Komprise seemed like a good bet. So I sent them a set of questions and COO and president Krishna Subramanian kindly provided her answers.

Krishna Subramanian

Blocks & Files: If we were inventing unstructured data storage now, would we have both files and object formats or just a single format? And why?

Krishna Subramanian: The issue is one of cost versus richness. File hierarchies have rich context and are performant, but more expensive because of all the overhead. Object storage is more cost-efficient with a flat design, but slower. The need for different price/performance options is only increasing as data continues to grow, so both formats will be needed. However, objects are growing much faster than file formats because of their usefulness for cost-effective, long-term storage.

There is a use case for access for both file and object and it reflects the life cycle of a file/object. Typically, when a file is first created it is accessed and updated frequently: the traditional file workload. After that initial period of creation and collaboration the file still has value, but updates are unlikely and the workload pattern shifts to that of object. The organic workflow we see is that unstructured data is created by file-based apps, whereas for long-term retention and secondary use cases such as analytics, object is the ideal format. 

Object’s abilities to handle massive scale, provide metadata search, deliver durability, and achieve lower costs are advantageous. An additional benefit of object storage is offloading data from the file system, allowing it to perform better by removing colder data. File storage systems start to bog down as space utilization approaches 90 per cent.

Are files and object storage two separate silo classes? Why?

Today, file and object storage are separate silo classes, because not only do they present and store data in different formats, but also the underlying architectures are different. For instance, object storage has multiple copies built-in whereas file storage does not, which impacts how you protect files in each environment. The way apps use file versus object is also different. Object data sets are often so large you search and access objects via metadata. With file you provide semi-structure with directories.

Other differences include:

  • File is more administratively intensive in regard to space management and data protection.
  • With objects these tasks are largely handled by the cloud provider. Space is only constrained by your budget and data protection is based on multiple copies, erasure coding and multi-region replication versus daily backup or snapshots with file.

What are the pros and cons of storing unstructured data in multiple silos?

The value of data and its performance requirements vary throughout its lifecycle. When data is hot, it needs high-performance; when data is cold, it can be on a less performant but more cost-efficient medium. Silos have existed throughout the life of storage because different storage architectures deliver different price/performance. 

However, there are still logical reasons for silos as they allow IT to manage data distinctly based on unique requirements for performance, cost, and/or security. The disadvantages of silos are lack of visibility, potentially poor utilization, and future lock-in. Data management solutions that employ real-time analytics will play an ever-stronger role in giving IT organizations the flexibility and agility of maintaining silos without incurring waste or unnecessary risk.

Is it better to have a single silo, a universal store or multiple silos presented as one via an abstraction layer virtually combining silos?

Silos are here to stay — just look at the cloud. AWS alone has over 16 classes of file and object storage, not including third-party options. Silos are valuable for delivering specific price/performance, data locality and security features for different workloads, but they are a pain to manage. The future as we see it is this: the winning model will be silos with direct access to data on each silo and an abstraction layer to get visibility and manage data across silos. 

Less ideal is an abstraction layer such as a global namespace or file system that sits in front of all the silos, because you have now created yet another silo and it has performance and lock-in implications. This is why you do not want a global namespace.

Rather you want a global file index based on open standards to provide visibility and search across silos without fronting every data access. Data management which lives outside the hot data path at the file/object level gives you the best of both worlds: the best price/performance for data, direct access to data without lock-in and unified management and visibility.

Should this actual or virtual universal store cover both the on-premises and public cloud environments (silos)?

A virtual universal data management solution should cover on-premises, public cloud and edge silos. Data at the edge is still nascent, but we will see an explosion of this trend. 

Modern data management technologies will be instrumental in bridging across edge, datacenters and clouds. Customers want data mobility through a storage-agnostic and cross-cloud data management plane that does not front all data access and limit flexibility.

Should the abstraction layer cover both files and objects, and how might this be done?

Yes. The management layer should abstract and provide duality of files and objects to give customers the ultimate flexibility in how they access and use data. Komprise does this, for example, by preserving the rich file metadata when converting a file to an object but keeping the object in native form so customers can directly access objects in the object store without going through Komprise. They can also view the object as a file from the original NAS or from Komprise because the metadata is preserved. 

Should the abstraction layer provide file sharing and collaboration and how might this be done? Would it use local agent software?

Data management can expose data for file sharing and collaboration, but it is better for data management to be a data- and storage-agnostic platform that enables various applications including file sharing, collaboration, data analytics and others. 

By abstracting data across silos, app developers can focus on their file sharing or other apps without worrying about how to bridge silos. Our industry has been moving away from agents because they are difficult to deploy, brittle and error prone.

Could you explain why the concept of object sharing and collaboration, like file sharing and collaboration, makes sense or not?

Object sharing is less about editing the same object and more about making objects available to a new application, such as data analytics. This speaks to the lifecycle of data and data types. File data that requires collaboration — such as documents and engineering diagrams — will be accessed and updated frequently and are best served by file access, while longer-term access is best served by object. 

For example, an active university research project may create and collect data via file. Once the project is complete the research director can provide read-only access to the object using a pre-signed URL, which wouldn’t be possible with a file alone.
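(As an aside, the read-only link Subramanian describes maps neatly onto object storage pre-signed URLs. The sketch below is a minimal illustration using boto3 against S3, with an invented bucket and key name; it shows the general mechanism, not Komprise's own implementation.)

```python
# Illustrative sketch: minting a time-limited, read-only pre-signed URL
# for an S3 object with boto3. Bucket and key names are made up.
import boto3

s3 = boto3.client("s3")

url = s3.generate_presigned_url(
    ClientMethod="get_object",
    Params={"Bucket": "research-project-archive", "Key": "results/run-042.csv"},
    ExpiresIn=7 * 24 * 3600,  # link stays valid for one week
)
print(url)  # share this link to grant read-only download access
```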

If it doesn’t make sense does that mean file and object storage are irrevocably separate silos?

They are separate because the use cases are different, the performance implications are different, and the underlying architectures are different. I would draw a distinction between the data and how it’s accessed and used versus silos. There is value today in providing object access to file data, but perhaps no value in providing file access to data natively created in object. 

An engineering firm creates plans for a building using file-based apps, and after that project is complete the files should not be altered. Therefore, access by object makes sense. On the other hand, image data collected by drones via object API should be immutable through its entire life cycle. Providing access via file would provide limited benefit and be extremely complex with very high object counts, etc.

Should the abstraction layer provide file and object lifecycle management facilities?

Yes. Data management should provide visibility across the silos and move the right data to the right place at the right time systematically. Lifecycle management is critical. Many of the pain points of file data such as space, file count limits and data protection are growing beyond what can be managed effectively by humans. 

Old school storage management largely consisted of monitoring capacity utilization and backups. This was largely reactive: “I am running out of space. Buy more storage.” 

Proactive, policy-based data management can alleviate many of these issues. Object lifecycle management is often about managing budgets: your cloud provider is perfectly happy to keep all your data at the highest performance and cost tier.
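To make that concrete, cloud object lifecycle management typically boils down to a policy like the hypothetical one sketched below, which uses boto3 to tier objects under an invented prefix to a colder S3 storage class after 90 days and expire them after roughly seven years. It is an illustrative sketch of the mechanism only, not a Komprise feature.

```python
# Hypothetical S3 lifecycle policy: transition cold objects to Glacier after
# 90 days, expire them after ~7 years. Bucket, prefix and periods are invented.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="example-project-data",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-then-expire",
                "Filter": {"Prefix": "archive/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
                "Expiration": {"Days": 2555},  # roughly seven years
            }
        ]
    },
)
```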

Does the responsibility for data protection, including ransomware protection, lie with the abstraction layer or elsewhere?

Enterprise customers already have existing data protection mechanisms in place so the data management layer can provide additional protection, but it must work in concert with the capabilities of the underlying storage and existing backup and security tools. If you require customers to switch from existing technologies to a new data management layer, then it’s disruptive, and again creates a new silo.

Data protection features such as snapshots and backup policy for file or immutability and multi-zone protection are characteristics of storage systems. Effective data management is putting the right data on the right tier at the right time. A great example is moving or replicating data to immutable storage like AWS S3 object lock for ransomware protection; data management puts the data in the bucket and object lock provides protection.
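A minimal sketch of that last pattern, assuming boto3, an invented vault bucket and a 90-day compliance retention window, looks like this; it shows the Object Lock mechanism only, not any particular vendor's workflow.

```python
# Minimal sketch: writing an immutable copy to S3 with Object Lock (boto3).
# Bucket, key, local file and retention period are invented for illustration.
from datetime import datetime, timedelta, timezone
import boto3

s3 = boto3.client("s3")

# Object Lock must be enabled when the bucket is created.
s3.create_bucket(
    Bucket="ransomware-vault-example",
    ObjectLockEnabledForBucket=True,
)

# Write a copy that cannot be deleted or overwritten before the retain date.
s3.put_object(
    Bucket="ransomware-vault-example",
    Key="backups/finance-2021-12-01.tar.gz",
    Body=open("finance-2021-12-01.tar.gz", "rb"),  # hypothetical local backup
    ObjectLockMode="COMPLIANCE",
    ObjectLockRetainUntilDate=datetime.now(timezone.utc) + timedelta(days=90),
)
```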

How should — and how could — this abstraction layer provide disaster recovery?

Data management solutions can replicate data and provide disaster recovery often for less than half the cost of traditional approaches because they can leverage the right mix of file and object storage with intelligent tiering across both. Ideally, hot data is readily available in the event of a disaster, but costs are lower.

What’s important to note here is the data is not locked in a proprietary backup format. Data is available for analytics in the cloud and its availability can be verified. A big challenge for traditional/legacy approaches to disaster recovery was the need to “exercise” the plan to make sure data could be restored and understand how long that would take. Since data is available natively in the cloud when using storage-agnostic data management solutions, disaster recovery can be verified easily and data can be used at all times.

How can unstructured data be merged into the world of structured and semi-structured analytics tools and practices?

The need to analyze unstructured data is growing with the rise of machine learning, which requires more and more unstructured data. For instance, analyzing caller sentiment for call center automation involves analyzing audio files, which are unstructured. But unstructured data lacks a specific schema, so it’s hard for analytics systems to process it. In addition, unstructured data is hard to find and ingest, as it can easily comprise billions of files and objects strewn across multiple buckets and file stores. 

To enable data warehouses and data lakes to process unstructured data, we need data management solutions that can globally index, search and manage tags across silos of file and object stores to find data based on criteria and then ingest it into data lakes with metadata tables that give it semi-structure. Essentially, unstructured data must be optimized for data analytics through curation and pre-processing by data management solutions.
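(The "global index" idea can be pictured with a toy sketch: harvest basic metadata from a file silo and an object silo into one searchable table. The Python below assumes an invented NAS mount point and S3 bucket, and uses SQLite as the table; real data management platforms obviously do far more than this.)

```python
# Toy illustration of a global file/object index: collect basic metadata from
# a local file share and an S3 bucket into one searchable SQLite table.
import os
import sqlite3
import boto3

db = sqlite3.connect("unstructured_index.db")
db.execute("CREATE TABLE IF NOT EXISTS items (source TEXT, path TEXT, bytes INTEGER, modified TEXT)")

# Index a file silo: walk a (hypothetical) NAS mount point.
for root, _, files in os.walk("/mnt/nas_share"):
    for name in files:
        path = os.path.join(root, name)
        st = os.stat(path)
        db.execute("INSERT INTO items VALUES (?, ?, ?, ?)",
                   ("nas", path, st.st_size, str(st.st_mtime)))

# Index an object silo: list a (hypothetical) S3 bucket.
s3 = boto3.client("s3")
for page in s3.get_paginator("list_objects_v2").paginate(Bucket="example-data-lake-bucket"):
    for obj in page.get("Contents", []):
        db.execute("INSERT INTO items VALUES (?, ?, ?, ?)",
                   ("s3", obj["Key"], obj["Size"], obj["LastModified"].isoformat()))

db.commit()
# Example cross-silo query: everything over 1GiB, regardless of where it lives.
print(db.execute("SELECT source, path FROM items WHERE bytes > ?", (2**30,)).fetchall())
```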

HPE’s revenue growth anaemia versus profits on steroids

HPE grew revenues in its fourth fiscal 2021 quarter at an anaemic two per cent rate — compared to Dell’s 21 per cent rise in its most recent quarter — but HPE’s profits rocketed up.

Overall HPE revenues in the quarter were $7.4 billion, with profits of $2.55 billion. That is hugely more than the year-ago $157 million. Full year revenues were $27.8 billion — up three per cent on the year — with profits of $3.43 billion. Compare that to a loss of $322 million in the prior year and again it’s a majorly impressive turnaround.

President and CEO Antonio Neri’s announcement statement said: “HPE ended fiscal year 2021 with record demand for our edge-to-cloud portfolio, and we are well positioned to capitalise on the significant opportunity in front of us.” EVP and CFO Tarek Robbiati added his two cents: “HPE executed with discipline and exceeded all of our key financial targets in FY21. The demand environment has been incredibly strong and accelerated in the second half of the year, which gives us important momentum headed into next year.” 

Just look at the Q4 FY2021 profits bar! It’s the highest it’s been for years.

HPE divides its business into segments and we’ve put the segment revenues into a table to show what’s happened over the last five quarters: 

Compute and storage are regarded as core businesses by HPE, with the Intelligent Edge and HPC-plus AI regarded as growth segments. However the segment annual revenue growth percentages were not that much different:

  • Compute — $3.2 billion and 1.1 per cent
  • Storage — $1.26 billion and 3.5 per cent
  • Intelligent Edge — $815 million and 3.7 per cent
  • HPC + AI — $1 billion and 0.8 per cent

We would expect to see a lot more growth in the growth segments. That’s why they are called growth segments.

Storage revenue growth of 3.5 per cent in the fourth fiscal 2021 quarter — a second successive growth quarter — was better than HPE’s overall and fairly puny revenue increase of two per cent. But it is anaemic, comparing poorly to competitors like NetApp growing revenues 11 per cent and Pure Storage 37 per cent in the same period.

HPE storage revenues show a second consecutive growth quarter but it looks a tad early to declare a turnaround from four quarters of decline.

Within the storage segment the all-flash array (AFA) business increased seven per cent on the year, led by Primera arrays which grew revenues by “strong double-digits” suggesting 16–18 per cent. This was HPE’s sixth successive quarter of AFA revenue growth. However AFA revenues grew more than 30 per cent in the prior quarter.

NetApp’s AFA annualised run-rate revenues just grew 22 per cent, while Pure Storage’s 37 per cent revenue growth is entirely AFA-based, so HPE’s success here is, again, somewhat limited to say the least.

Nimble array sales rose four per cent on the year and the dHCI section of that rose more strongly with double-digit growth.

Analyst view

Wells Fargo senior analyst Aaron Rakers summed HPE results up like this: “HPE’s F4Q21 results reflect continued solid execution with regard to the company’s as-a-service pivot. HPE as-a-Service orders grew 114 per cent year-on-year, which includes a large multi-million network-as-a-service win in F4Q21.”

There was momentum in the GreenLake/as-a-Service area as: “HPE exited F4Q21 with a total ARR at $796 million, up 36 per cent year-on-year and compared to +33 per cent year-on-year in the prior quarter.” 

Overall Rakers thought HPE’s results were net-neutral. He said: “HPE reported that F4Q21 orders accelerated to +28 per cent year-on-year (vs +11% year to date through F3Q21) vs revenue growth at only +2 per cent year-on-year.”

It would appear that HPE’s financial discipline has generated terrific growth in profitability, and its order growth bodes well for the next quarter. What appears to be missing is overall revenue growth acceleration, partly because of, as Rakers notes, “the revenue headwind associated with HPE’s continued as-a-service pivot.”

We think Neri and his execs are looking to HPE’s customer base adopting GreenLake and everything-as-a-service in droves to drive up both revenues and profitability. 

Street-beater NetApp’s fifth growth quarter in a row

NetApp had a knock-out second fiscal 2022 quarter, selling more flash arrays and object storage, and with a good pipeline for further sales. Its public cloud business is growing strongly and it’s uprated its guidance for the full year, despite looming supply chain issues.

Revenues were $1.57 billion — 10.9 per cent more than a year ago — with a profit of $224 million, up 63.5 per cent. Profit as a percentage of revenue was 14.27 per cent versus 9.68 per cent a year ago. NetApp really is doing well. (Although Pure Storage is growing faster, it is still not making a profit.)

CEO George Kurian’s statement said: “We delivered another strong quarter, with results all at the high end or above our guidance. Our performance reflects a strong demand environment, a clear vision, and exceptional execution by the NetApp team and gives the confidence to raise our full year guidance for revenue, EPS and Public Cloud ARR. We are gaining share in the key markets of all-flash and object storage, while rapidly scaling our public cloud business.”

NetApp generally seems locked into a $1.2B to $1.7B revenue/quarter area. Perhaps growing public cloud revenues can take it towards $2B/quarter revenues in the years ahead.

Hybrid cloud (meaning not public cloud) revenues increased eight per cent, to $1.48 billion, but that was the bulk of NetApp’s revenues. This growth was driven by StorageGRID object storage and all-flash arrays, predominantly ONTAP. The all-flash array annualised net revenue run rate increased 22 per cent year-over-year to $3.1 billion — a record — and product revenue overall grew nine per cent year-over-year to $814 million. Thirty per cent of NetApp’s installed base use its all-flash arrays, leaving lots of adoption ahead still.

Kurian said: “We once again gained share in the enterprise storage and all-flash array markets,” without identifying competitors who lost share. 

Public cloud-based revenues were $87 million — up 85 per cent annually — meaning an 80 per cent increase in annualised revenue run rate (ARR) to $388 million. Kurian said: “We remain confident in our ability to achieve our goal of reaching $1 billion ARR in FY25” — meaning $250 million/quarter in public cloud revenues. At roughly a third of that quarterly level today, the public cloud business has a lot of growth ahead of it to reach this heady target.

He also pointed out: “Our public cloud services not only allow us to participate in the rapidly growing cloud market, but they also make us a more strategic datacentre partner to our enterprise customers, driving share gains in our hybrid cloud business.” 

Kurian was especially pleased about NetApp’s relationships with AWS, Azure and GCP. “These partnerships create a new and massive go-to-market growth engine, as three of the largest and most innovative companies in the world are reselling our technology. … We are now the first and only storage environment that is natively integrated into each of the major public cloud providers.” 

He also said NetApp was coping with the supply chain challenges — but more challenges are coming.

Billings increased seven per cent year-over-year to $1.55 billion. There was a 14 per cent increase in software product revenues.

Financial summary:

  • Gross margin — 68 per cent;
  • Public cloud gross margin — 71 per cent;
  • Operating cash flow — $298 million;
  • Free cash flow — $252 million;
  • EPS — $0.98 compared to $0.61 a year ago;
  • Cash, cash equivalents and investments — $4.55 billion at quarter-end.

This was NetApp’s fifth growth quarter in a row. 

But it still has a way to go to get back to fiscal 2018 revenue heights, and an even bigger hill to climb to claw back ground from fiscal years 2012 and 2013 — the glory years, in retrospect. 

In the earnings call NetApp said that it had a strong deal pipeline relating to on-premises datacentre modernisation. William Blair analyst Jason Ader argues: “This is mainly motivated by lower costs for storing large, production data sets (where cloud storage costs can be prohibitive), but can also be due to improved control (for compliance and regulatory reasons) and security.”

The outlook for the next quarter is for revenues between $1.525 billion and $1.675 billion — $1.6 billion at the mid-point — which would be an 8.8 per cent increase year-on-year. Full FY2022 revenues are guided to be 9 to 10 per cent higher than last year — $6.29 billion at the mid-point. This revenue increase is depressed by anticipated supply chain issues in the second half of FY2022.

DNA storage and science: archiving unreality

DNA storage researchers have devised a neat way to record events inside cells down to a one-minute granularity, but are wildly off-beam when predicting this could become useful as a technology to store general archival data.

Identifiable segments of DNA can be sequentially added to a base strand and represent ones or zeros. These indicate a particular change in the environment inside a living cell and so function as in-vivo ‘ticker tape’ data recorders. But the rate of ingest is so slow – three eighths of a byte per hour – that suggestions it could scale up for general archive use are little short of ridiculous.

Researchers led by Associate Professor of Chemical and Biological Engineering Keith Tyo at the McCormick School of Engineering, Northwestern University, Illinois synthesized DNA using a method involving enzymes.

A Northwestern University announcement declares: “Existing methods to record intracellular molecular and digital data to DNA rely on multipart processes that add new data to existing sequences of DNA. To produce an accurate recording, researchers must stimulate and repress expression of specific proteins, which can take over 10 hours to complete.”

It is faster to add DNA nucleotide bases to the end of a single DNA strand. These bases — adenine (A), thymine (T), guanine (G) and cytosine (C) — can be grouped together in different combinations. The researchers’ Time-sensitive Untemplated Recording using TdT for Local Environmental Signals (TURTLES) method uses a DNA polymerase called TdT, standing for Terminal deoxynucleotidyl Transferase, to add the nucleotide bases. 

Their composition can be affected by the cell’s internal environment. According to a paper published in the Journal of the American Chemical Society, the researchers “show that TdT can encode various physiologically relevant signals such as Co2+ (cobalt), Ca2+ (calcium), and Zn2+ (zinc) ion concentrations and temperature changes in vitro” (that is, in glass: the test tube). So the presence of cobalt could cause more A and fewer G bases to be added, and its absence the reverse, giving us a binary scheme: more A bases equals 1 and more G bases equals 0.
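To illustrate the encoding idea only (the researchers' actual analysis pipeline is far more sophisticated), a recorded strand could in principle be read back by splitting it into fixed windows and comparing A and G counts in each, as in this hypothetical sketch:

```python
# Hypothetical sketch of reading a base-composition 'ticker tape': split the
# recorded strand into fixed windows and call a 1 where A outnumbers G,
# a 0 where G outnumbers A. Purely illustrative, not the paper's method.
def decode_ticker_tape(strand: str, window: int = 20) -> list[int]:
    bits = []
    for i in range(0, len(strand) - window + 1, window):
        chunk = strand[i:i + window]
        bits.append(1 if chunk.count("A") > chunk.count("G") else 0)
    return bits

# A made-up strand: an A-rich window (signal present) then a G-rich one (absent).
print(decode_ticker_tape("AATAACAAGTAAACATAAGA" + "GGTGGCGAGTGGGCATGGGA"))  # [1, 0]
```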

They were able to record sequential changes down to the minute level. “Further, by considering the average rate of nucleotide incorporation, we show that the resulting ssDNA functions as a molecular ticker tape. With this method we accurately encode a temporal record of fluctuations in Co2+ concentration to within 1 min over a 60 min period.”

They were also able “to develop a two-polymerase system capable of recording a single-step change in the Ca2+ signal to within 1 min over a 60 min period”. This means they can look at the timing of changes inside the cell as well as the individual step changes.

The researchers say their method is more than 10x faster than other intracellular solutions, transferring information to DNA in seconds instead of hours. So far so very good.

Brain research

This discovery could, they say, change the way scientists study and record neurons inside the brain — by implanting such DNA digital molecular data recorders into brain cells and looking at changes within and between millions of brain cells. The university’s announcement says that “By placing recorders inside all the cells in the brain, scientists could map responses to stimuli with single-cell resolution across many (million) neurons.”

Scientific team member and paper co-author Alec Callisto said: “If you look at how current technology scales over time, it could be decades before we can even record an entire cockroach brain simultaneously with existing technologies — let alone the tens of billions of neurons in human brains. So that’s something we’d really like to accelerate.”

But they would have to extract the DNA and ‘read’ it — and we are talking about a massive operation overall.

Archiving application

The university says this about the TURTLES method: “It’s particularly good for long-term archival data applications such as storing closed-circuit security footage, which the team refers to as data that you “write once and read never,” but need to have accessible in the event an incident occurs. With technology developed by engineers, hard drives and disk drives that hold years of beloved camera memories also could be replaced by bits of DNA.”

But almost certainly not by the TURTLES technology, because its speed is devastatingly slow. Talking to Genomics Research, Tyo said the researchers’ process wrote data at a rate of three bits an hour!

We read this four or five times to make certain it was that slow. The exact phrasing Tyo used is “up to 3/8 of a byte of information in one hour.”

Tyo speed phrasing in Genomics Research article.

Tyo also said that running millions of these processes in parallel would enable more data to be stored and written faster.

That’s true, but how much faster? Let’s try some simple math and assume we could accelerate the rate using five million parallel instances. That would be 15 million bits/hour, or 1.875 million bytes/hour, which is just 1.875MB/hour. That works out to roughly 31.25KB/minute, or about 0.52KB/sec.

This is a paralysingly slow write rate compared to modern archiving technology. A Western Digital 18TB Purple disk drive transfers data at up to 512MB/sec, which is close to a million times faster than our five-million-strand system. Put another way, a single TURTLES strand would need to be accelerated by a factor of nearly five trillion to achieve this HDD write speed. It seems an unrealistic, ludicrous idea.
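For anyone who wants to check the arithmetic, a few lines of Python reproduce it; the five-million-strand parallelisation is, as above, our own assumption:

```python
# Back-of-envelope check of the TURTLES write-rate arithmetic.
BITS_PER_HOUR_PER_STRAND = 3            # Tyo: 3/8 of a byte per hour = 3 bits/hour
PARALLEL_STRANDS = 5_000_000            # our assumed parallelisation factor
HDD_BYTES_PER_SEC = 512e6               # WD 18TB Purple, up to 512MB/sec

strand_bytes_per_sec = BITS_PER_HOUR_PER_STRAND / 8 / 3600
system_bytes_per_sec = strand_bytes_per_sec * PARALLEL_STRANDS

print(f"one strand: {strand_bytes_per_sec:.6f} bytes/sec")
print(f"5M strands: {system_bytes_per_sec / 1e3:.2f} KB/sec")              # ~0.52KB/sec
print(f"HDD is {HDD_BYTES_PER_SEC / system_bytes_per_sec:,.0f}x faster")   # ~983,000x
print(f"vs one strand: {HDD_BYTES_PER_SEC / strand_bytes_per_sec:,.0f}x")  # ~4.9 trillion
```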

Comparing TURTLES to tape makes for even more depressing reading. LTO-8 tape transfers compressed data at 900MB/sec — faster than the disk drive. LTO-9 operates at up to 1GB/sec with compressed data. We didn’t bother working out the TURTLES parallelisation factor needed to achieve these speeds.

Using TURTLES DNA storage for general archival data storage use would appear to be unrealistic. On the other hand, using it to record event streams inside cells is an exciting prospect indeed. 

Paper details

Bhan N, Callisto A, Strutz J, et al. Recording temporal signals with minutes resolution using enzymatic DNA synthesis. J Am Chem Soc. 2021;143(40):16630-16640. doi:10.1021/jacs.1c07331.


Storage news ticker – November 30

AWS has updated its Compute Optimizer tool to better recommend which AWS resources are matched to a particular workload, using resource efficiency metrics and analysing historical workload patterns. There are now metrics for AWS’s ECS, Lambda, and EBS, with a dashboard showing estimated monthly savings ($ and %) if its recommendations are followed. 

AWS Compute Optimiser dashboard.

AWS started previewing its Data Exchange for Amazon Redshift last month. It helps customers find third-party data for Redshift, subscribe to that data, and then query it without having to run extract, transform and load (ETL) procedures or set up data pipelines. Data Exchange for Amazon Redshift should become generally available in the next few weeks and months.

AWS announced its IoT FleetWise service which enables automakers to collect, transform, and transfer vehicle data to the cloud in near-real time. They can collect and organise data in any format present in their vehicles (regardless of make, model, or options) and standardise the data format for subsequent data analysis in the cloud. 

AWS’s Marketplace for Containers is now generally available. Customers can discover containers, subscribe to them, and deploy them into Kubernetes environments such as EKS Anywhere, Red Hat OpenShift, VMware Tanzu, Rancher or their own self-managed one. Veeam’s Kasten K10 is available inside this marketplace.  

Data protector Bacula has announced its eponymous Bacula 14.0 product, with enhanced security, additional functionality and claimed lower costs. It features:

  • Proxmox module, featuring incremental backup;
  • Nutanix filer module, with HFC technology;
  • M365 module updated to encompass Teams; 
  • Better security and ransomware protection.

Data migrator and growing data manager Datadobi has validated Google Cloud Storage as an endpoint for its data replication and data migration offerings. DobiProtect software suite users can now replicate data held on any S3-compatible object storage to Google Cloud Storage or vice versa. DobiProtect already supports Azure Blob storage as a migration and replication endpoint, which provides a multi-cloud strategy. Datadobi envisages customers starting with an initial migration and then setting up a subsequent replication stream to run until formal cutover.

NetApp’s Spot organisation announced that Ocean for Apache Spark is now available in preview for select AWS customers. NetApp bought Data Mechanics in June, gaining its API to run Apache Spark analytics jobs in the three main public clouds. Ocean is NetApp’s Kubernetes-orchestrated container app deployment service supporting AWS ECS and EKS instances, and the Google Kubernetes Engine. Ocean for Apache Spark is a managed cloud-native Spark service, deployed on a Kubernetes cluster inside a customer’s cloud account and giving data teams a serverless experience.

Paolo Juvara.

Pure Storage has appointed Paolo Juvara as its chief digital transformation officer. He comes from a near-three-year stint as Google Cloud’s CIO and six-plus years as a Group VP at Oracle Applications Labs. He is responsible for leading Pure’s IT organisation and defining the strategy for the systems, technology and tools running Pure’s business operation.

Telecom services provider Vocus New Zealand is using SoftIron HyperDrive Ceph-based storage systems to build out an object-based Storage-as-a-Service platform inside its DataHub storage infrastructure. The HyperDrive family of products will serve as the backbone for the DataHub infrastructure, located in the Vocus Albany datacentre in Auckland. Vocus has a 4,200km-plus fibre-in-the-ground network with an on-net presence throughout Australia, from where it provides carrier-grade services across Australia, New Zealand, and into Asia Pacific and the Western United States.

Managed storage and compute as a service supplier Zadara is partnering with AbbaDox, a Software-as-a-Service health technology company, to offer PACS-as-a-Service with unlimited scalable platform-edge compute power, low-latency network access, and storage. The aim is to give imaging centres and radiology providers cost-effective PACS services. It should provide rapid deployment of enterprise PACS services for use-cases such as teleradiology, distributed radiology, main PACS, and disaster recovery PACS. 

HPE’s Zerto unit announced the availability of Zerto In-Cloud for Amazon Web Services (AWS), a cloud-native software offering of the Zerto platform that delivers disaster recovery (DR) for Amazon Elastic Compute Cloud (EC2), scaling to protect 1,000+ instances across regions, availability zones, and accounts. It does not require an agent and its API-centric management enables it to be integrated into a user’s automation system to streamline workload protection.

It offers:

  • Automated non-disruptive failover testing of all instances in an AWS region or availability zone to validate recovery plans;
  • Protection and recovery of large complex applications like SAP or Oracle;
  • Protection groups across multiple instances in AWS;
  • Management components can be run from any EC2 region and be protected region to region to be resilient against any regional outage.

IBM invests in SingleStore to get faster AI and analytics on distributed data

IBM has invested in SingleStore — the re-branded MemSQL that’s building a distributed, relational SQL database for both transaction and analytic workloads and runs on-premises and in the three main public clouds.

SingleStore raised $80 million in an F-round of funding in September. In 2020 it pulled in $50 million in debt finance and $80 million in an E-round. The company was founded in 2011 and total funding to date is $264 million, ignoring the $50 million of debt financing. It now transpires that IBM Ventures took part in the F-round.

Raj Verma.

The announcement of IBM’s participation was accompanied by a statement from Raj Verma, SingleStore CEO: “Our expanded relationship with IBM is another proof point that SingleStore’s modern cloud database for data-intensive applications is the right solution at the right time. We look forward to working closely with IBM.”

Daniel Hernandez, General Manager of IBM Data and AI, said: “For many enterprises, data has largely remained siloed and dispersed, keeping it from being readily accessible and usable for them to apply AI and make informed business decisions. … With IBM’s investment in SingleStore, we continue our focus to make sure clients can break down silos and put their data to work.”

IBM’s investment — we don’t know the actual amount — builds on existing SingleStore-IBM relationships, with SingleStore available in the IBM Cloud Pak for Data offering, its certification on Red Hat OpenShift, and availability in the Red Hat Marketplace.

SingleStore and IBM say they will collaborate to help customers better govern diverse distributed data, accelerate data access, and simplify data management. A cited example would be delivering a data fabric architecture to access and manage distributed data, making it available across an enterprise to all users, and providing unity and governance.

SingleStore graphic.

We think SingleStore availability in the IBM public cloud will soon become a reality. This move is emblematic of ongoing general data silo-bridging and abstracting moves in the database and file/object storage space. Witness Alluxio, Hammerspace, Komprise, StrongBox and many others.

Quest Software flipped from one private equity owner to another

Private equity-owned Quest Software, which supplies data management and protection software, has been bought by Clearlake Capital for an undisclosed amount, reported to be $5.4 billion.

Quest was started up in 1987 and grew to the point where, with 100,000 customers, it was bought by Dell, who combined it with information security business SonicWall. Dell, needing money to help it buy EMC, sold it all off to private equity business Francisco Partners and Elliott Management in mid-2016 in a deal thought to be worth $2 billion.

Buyer Behdad Eghbali, Co-founder and managing partner at Clearlake, issued an acquisition quote, saying: “We have long admired Quest as a leading identity-centric cybersecurity, data intelligence, and IT operations management software platform and the Company’s software solutions that help secure enterprise IT environments.”

Seller Dipanjan “DJ” Deb, Co-founder and CEO of Francisco Partners, said: “We are proud of the tremendous progress Quest has made since re-launching as an independent company, and I want to recognize Patrick Nichols and the management team for strong execution. We have a long and successful track record executing divisional carve-out transactions and are grateful to have had the opportunity to work with the Quest team to create value for the company, its customers, and its partners. We wish the Quest organization well in its new partnership with Clearlake.”

Quest CEO Patrick Nichols, who was appointed in May 2020 and stays on as CEO, said: “Quest has evolved to become a market leader in identity-centric cybersecurity, data intelligence, and IT operations management and I want to thank Francisco Partners for helping Quest realise this vision. Our new partnership with Clearlake will accelerate Quest’s momentum as a leader and innovator as we increase our investment pace in our core product roadmaps, cloud/SaaS offerings, and global presence.”

More to the point Prashant Mehrotra, partner, and Paul Huber, principal at Clearlake, said: “Now with significant scale and completely independent, Quest is strategically differentiated in the market as a buy-and-build platform and industry consolidator, and we’re thrilled to partner with Patrick, Carolyn and the management team to help Quest accelerate growth organically and through M&A.”

There are going to be acquisitions, with Clearlake providing cash. Francisco Partners has, in effect, sold at a profit — the Wall Street Journal reports the sale was for $5.4 billion, including debt. Francisco Partners will now let Clearlake take Quest and run with it while it presumably looks for another “divisional carve-out transaction.” 

Some Quest history

The SonicWall information security part of this Dell division was spun out in November 2016 by Francisco Partners, and Quest then underwent layoffs and a reorganisation.

Since then Francisco Partners helped grow the Quest business, acquiring Balabit Corp for the One Identity unit in 2018, Binary Tree in September last year and erwin in January this year. Quest has four operating segments:

  • One Identity and OneLogin identity-centric cybersecurity software covering all aspects of a unified identity security and management approach to combat identity sprawl and identity-based attacks.
  • Platform Management for Microsoft offering IT operations resilience and flexibility, securing and managing Active Directory.
  • Information Management and erwin by Quest; data operations and intelligence software to optimise performance and deliver apps faster, with offerings including Toad for Oracle, erwin Data Modeler, erwin Data Intelligence, Foglight, ApexSQL and SharePlex.
  • Data protection and endpoint management software to control data growth and optimise system availability with NetVault, QoreStor, and KACE offerings.

The current team of exec managers at Quest stays in place and the deal should close in early 2022.

Storage news ticker – November 29

Amazon Web Services’ massive virtual and in-person re:Invent is now taking place with lots of announcements expected over the next few days. New CEO Adam Selipsky will be taking the stage. Register for the virtual event here and expect the usual Amazon overkill in terms of options and choices and recommendations.

From December 1, according to an AWS blog, data transfer from Amazon CloudFront is now free for up to 1TB of data per month (up from 50GB), and is no longer limited to the first 12 months after signup. The company is also raising the number of free HTTP and HTTPS requests from 2,000,000 to 10,000,000, and removing the 12-month limit on the 2,000,000 free CloudFront Function invocations per month. The expansion does not apply to data transfer from CloudFront PoPs in China. AWS free tier customers will also be able to transfer 100GB per month of data out from a region to the internet, up from 1GB per month currently. This includes Amazon EC2, Amazon S3, Elastic Load Balancing, and so forth. The expansion does not apply to the AWS GovCloud or AWS China Regions.

Quest Software announced the general availability of a new turnkey system using data backup, protection, and management technologies — Lenovo ThinkSystem hardware, Veeam Backup software and Quest QoreStor software — for channel partners and end-users.

DDN business unit and enterprise storage provider Tintri has a ransomware recovery event in its GeekOut! Technology demo series scheduled for December 1 at 10am PST (18:00 UTC). Register here to listen to Tintri CTO Brock Mowry opining on Tintri and ransomware.

Live data replicator WANdisco has achieved Amazon Web Services (AWS) Migration and Modernization Competency status for AWS Partners. WANdisco’s LiveData Migrator allows production applications to continue to operate on-premises while their data is migrated to Amazon Simple Storage Service (Amazon S3). Production data becomes available in the cloud immediately and continues to be updated with changes throughout the migration. AWS launched the AWS Migration and Modernization Competency to allow customers to engage specialised AWS Partners that activate data and modernise applications. 

Veeam announced that Yamaha Brazil, a division of the Fortune 500 company Yamaha Motor, has used Veeam Availability Suite to replace several legacy backup systems for its hybrid cloud infrastructure. The Veeam backed-up data can be analysed for customer information relevant to digital marketing. The Veeam software supports regulatory compliance with Brazil’s General Data Protection Law (LGPD).

A Veeam blog by Rick Vanover says that a Veeam Universal Storage API plug-in is now available for the Dell EMC PowerStore product. This provides backup capabilities, a new recovery option, and test capabilities with DataLabs from Dell EMC PowerStore snapshots. Vanover writes: “The heart of the plug-in with Dell EMC PowerStore is the ability to back up from the storage snapshot versus the VMware snapshot alone.” Faster PowerStore snapshots can be used to directly drive file, application and complete VMware VM recovery.

Sapphire Rapids, extra performance and IO heft, and a SKU nightmare

Intel revealed more Sapphire Rapids memory information at Supercomputing ’21 with capacity, bandwidth and tiering data. Datacentre servers will become much faster and capable of running more containers and virtual machines — but at the expense of SKU complexity.

Update. Diagram corrected to show Optane PMem 300 series connectivity doesn’t use PCIe bus. 2 Dec 2021.

Sapphire Rapids is the fourth generation of Intel’s Xeon Scalable Processor brand datacentre server CPU, following on from Ice Lake. According to Intel it will offer the largest leap in datacentre CPU capabilities for a decade or more.

It has a 4-tile design, using an EMIB (Embedded Multi-die Interconnect Bridge) link to unify them into what appears to be a monolithic die. The processor, with up to 56 cores — 14 per tile — supports:

  • DDR5 DRAM — up to eight channels/controllers;
  • HBM2e — up to 64GB of in-package memory with up to 410GB/sec bandwidth per stack, 1.64TB/sec in total;
  • Optane Persistent Memory 300 series;
  • PCIe 4.0 and 5.0;
  • CXL v1.1.

The EMIB also functions as an interposer, linking each CPU tile to its HBM2e stack. There is a three-level memory hierarchy, with HBM2e at the top, then socket-linked DRAM and the Optane PMem 300 series. HBM2e will bring “orders of magnitude more memory bandwidth”, according to Jeff McVeigh, VP and GM for the Super Compute Group at Intel. He said: “The performance implications for memory-bound applications [are] truly astonishing.”

Intel Sapphire Rapids 4-tile diagram.

The processor has a coherent shared memory space, across CPU cores and DSA and QAT acceleration engines. DSA accelerates input data streaming while Quick Assist Technology (QAT) speeds encryption/decryption and compression/decompression.

The coherent memory space applies to processes, containers and virtual machines.

There can be up to 64GB of HBM2e memory, divided into four units of 16GB, one per tile, each built from 8x 16Gbit dies.
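Those in-package memory numbers hang together, as a quick back-of-envelope check shows (die, stack and per-stack bandwidth figures are the ones quoted above):

```python
# Quick check of the Sapphire Rapids in-package HBM2e figures quoted above.
dies_per_stack = 8          # 8x 16Gbit dies per stack
gbit_per_die = 16
stacks = 4                  # one HBM2e stack per tile

gb_per_stack = dies_per_stack * gbit_per_die / 8     # 16GB per stack
total_capacity_gb = gb_per_stack * stacks            # 64GB in total
total_bandwidth_tb = 410 * stacks / 1000             # 1.64TB/sec aggregate

print(f"{total_capacity_gb:.0f}GB of HBM2e, {total_bandwidth_tb:.2f}TB/sec aggregate bandwidth")
```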

The Sapphire Rapids CPU can operate in four NUMA (Non-Uniform Memory Access) modes with a Sub-NUMA Clustering feature.

Sapphire Rapids options include HBM2e with no DRAM, or DRAM with HBM2e — in which case we have flat mode and caching mode. Flat mode has a flat memory design embracing both DDR5 DRAM and HBM2e, but with the DRAM and HBM2e each having its own NUMA mode. 

In caching mode the HBM2e memory is used as direct-mapped cache from DRAM and is invisible to application software. With the alternative flat mode application software has to use specific code to access the HBM2e memory space.

As Optane also has its different modes, such as application direct (DAX), the use of non-persistent (HBM2e and DRAM) and persistent (Optane PMem 300) memory by applications will become more complicated. Intervening system software, such as MemVerge’s Big Memory Computing offering, will become more useful because it promises to hide this complexity.

Blocks & Files diagram.

It’s obvious that server configuration variations will become more complex as well, with servers coming with Sapphire Rapids CPU variations (how many cores did you want?), HBM2e-only and no DRAM, HBM2e and DRAM, PCIe 4.0 and/or 5.0, CXL1.1 or not, Optane PMem 300 and no Optane. This could be a SKU nightmare.

Intel’s Ponte Vecchio GPU will also support HBM2e and PCIe 5, and Intel considers Sapphire Rapids and Ponte Vecchio as natural partners — the one for general workloads and the other for GPU-intensive work. Will there be a GPUDirect-type link between Sapphire Rapids servers and Ponte Vecchio GPUs, echoing Nvidia’s software to bypass server CPUs and get stored data into its GPUs faster than otherwise? We don’t know but the idea seems to make sense.

22dot6 adds extra cloudiness to its Valence software

All-singing, all-dancing TASS (Transcendent Abstracted Storage System) supplier 22dot6 has updated its Valence software — making it easier to set up and operate private, hybrid, and public cloud storage.

22dot6 is not your usual storage supplier. It was founded in 2015, has just five staff listed on LinkedIn, and no known external funding. The Hammerspace-like Valence was first announced in May and, as of now, we don’t know how many customers 22dot6 has for the software.

Diamond Lauffin.

But its founder, Diamond Lauffin, has a long-term storage industry track record — co-founding, for example, Nexsan and being an EVP sales at Qualstar from 1993 to 2000. In other words, take it seriously.

Valence VSR storage controller.

A Lauffin announcement statement said: “Most enterprise storage managers are getting pressure from upstairs to shift to the cloud, but often times it is difficult for executives not on the front line to understand what’s actually involved in this process, and how complicated it can be. A TASS architecture is the answer, and from sunrise to sunset the Valence Cloud Suite combines the features and optimal practices required for enterprise level data management in the cloud.” 

This Valence Cloud Suite release adds:

  • Point-and-click private cloud setup;
  • Unification of private and public clouds into a single pool;
  • Cloud-to-cloud data migrations and reverse migrations with no pointers, links or stub files;
  • Transparent, multi-protocol, cross-platform support for all security and permissions with a single point-and-click;
  • Metadata-level analytics, lookups and data reports of public cloud data;
  • Individual, file-level data integrity audits that run transparently in the background and provide a red light/green light analysis of all data, protecting against file deletion, modification, corruption, or disappearance;
  • Geographic control over where data subject to regulation is located and file level data immutability.

Comment

We know of no independent analysis of the TASS software, and no evidence of 22dot6 engagement with analysts like Gartner, Forrester, ESG or the Evaluator Group. Contact 22dot6 to find out more.

This could be great storage software and have its use grow quickly, or it could be a storage curio — terrific in its own right but not a mass-market product. Keep your eyes on it just in case.

Storage news ticker – November 26

An Arcserve announcement appears to have had an embargo breakdown or similar event, as SecurityBrief Asia has put out a story — “Arcserve partners with Google Cloud to deliver cloud business continuity solution” — which is dated 11 November 2021. It describes the availability of Arcserve Cloud Services (DRaaS) on the Google Cloud Platform.

A DoKC (Data on Kubernetes Community) report entitled “Data on Kubernetes 2021” surveyed 500 Kubernetes-using respondents and found half of them are running half or more of their production workloads on it. Some 90 per cent believe it is ready for stateful workloads, and 70 per cent are running them in production. DoKC is a Kubernetes supplier trade group with levels of membership. Top level (platinum) members are DataStax, EDB, MayaData (bought by DataCore) and Pure-owned Portworx.

Taiwan-based contract semiconductor manufacturer United Microelectronics Corp (UMC) is paying an undisclosed sum to Micron in return for the legal action between the two being dropped. UMC pled guilty to theft of Micron DRAM IP in October last year and was fined $60 million by the US government. The stolen IP was passed to China’s Fujian Jinhua, which wanted it for DRAM manufacturing.

OWC Mercury Elite Pro mini.

OWC has updated its OWC Mercury Elite Pro mini portable flash drive with USB-C support, delivering up to 542MB/sec real-world performance. Drive capacities are 480GB, 1TB, 2TB and 4TB — or you can buy the enclosure without a drive and fit one yourself. The 480GB model is priced at $94. 

China-based SmartX has announced version 5.0 of its SMTX ZBS distributed block storage product which can deliver 770,000 4K random read IOPS from a 3-node cluster. V5.0 introduces support for RoCE v2, NVMe-oF (alongside existing iSCSI support) and Optane Persistent Memory. It can provide 8GB/sec bandwidth for 256K sequential reads, saying this is close to the physical limit with a 25GbitE network card.

Richard Henderson, technical director at TigerGraph, believes that, in 2022, “digital twins” will appear everywhere, and be based on real-time analytic graph databases. A digital twin is a real-time model of a business and its environment. It provides, he says, a complete and current view of the physical business situation, using the business events and data that are probably already available in individual operational silos and data marts. This, combined with graph analytics, can deliver a detailed and immediate digital scenario, showing the impact and risk associated with any delays or failures within that network. Graph analytics will combine these individual events and the map of the network to produce a “zoomable” big picture view of the entire operation.
