
Update adds four ways to make WekaIO slicker, smoother and safer

Weka video.

Superfast scale-out parallel file system supplier WekaIO has updated its software to better integrate flash, S3 object storage, cloud and on-premises file accesses in a single, slicker framework with four neat additions.

The latest tweak-and-tune version of WekaFS, v3.13, adds thinly-provisioned auto-scaling in AWS, backup to remote object stores, updated VMware virtual network device support and resource quotas on the CSI plug-in.

Liran Zvibel, co-founder and CEO of WekaIO, said in a statement: “The latest features are an extension of our capabilities to better help users unlock the full capabilities of their datacentres.” That’s it — there is no increase in core functionality, but the product works even better in on-premises, cloud and virtual environments.

WekaFS supports AWS Auto Scaling groups, allowing the cluster to scale up on the fly for peak demand periods and scale back down when the extra capacity is no longer needed. In v3.13 a filesystem can be marked as thin-provisioned so that, as the number of EC2 instances is increased or decreased, its NVMe capacity is automatically added or released to match. This simplifies capacity management and works with AWS Budgets, enabling users to track and act on the cost of dynamic AWS usage.
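
For illustration, here is a minimal sketch of how a cluster operator might drive that scaling from the AWS side with boto3. The Auto Scaling group name is hypothetical, and WekaFS itself (not shown) would detect the instance change and resize any thin-provisioned filesystems.

```python
# Minimal sketch: scale the EC2 Auto Scaling group backing a Weka cluster.
# "weka-backend-asg" is a hypothetical group name; per the article, WekaFS then
# auto-grows or shrinks thin-provisioned filesystems as NVMe capacity arrives
# or leaves with the instances.
import boto3

asg = boto3.client("autoscaling", region_name="us-east-1")

def scale_weka_backends(desired: int, group: str = "weka-backend-asg") -> None:
    """Set the desired instance count for the backing Auto Scaling group."""
    asg.set_desired_capacity(
        AutoScalingGroupName=group,
        DesiredCapacity=desired,
        HonorCooldown=False,   # apply immediately for a planned peak period
    )

def backend_count(group: str = "weka-backend-asg") -> int:
    """Report how many backend instances are currently in the group."""
    resp = asg.describe_auto_scaling_groups(AutoScalingGroupNames=[group])
    return len(resp["AutoScalingGroups"][0]["Instances"])

if __name__ == "__main__":
    scale_weka_backends(desired=12)   # scale up ahead of a peak period
    print(backend_count())
```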

Remote Backup is now generally available in v3.13, supporting the use of multiple S3 object stores. WekaFS’s namespace expansion can attach to a second object store for snap-to-object data protection and cloud bursting purposes. A daily snapshot could be directed to a local object store, while a monthly snapshot is uploaded to a second, remote object store. WekaFS uploads just the incremental changes between each remote snapshot, reducing both network traffic and the capacity needed in the target object store.
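
The incremental behaviour is conceptually the same as any block-level snapshot diff. The toy sketch below (not Weka's implementation, just an illustration of the idea) compares two hypothetical snapshot manifests mapping block IDs to content hashes and pushes only the changed blocks to a remote S3 store with boto3.

```python
# Toy illustration of incremental snap-to-object: diff two snapshot manifests
# (block id -> content hash) and upload only the blocks that changed. This is
# not WekaFS internals; the bucket and manifest structures are made up.
import boto3

s3 = boto3.client("s3")

def changed_blocks(prev: dict, curr: dict) -> list:
    """Return block IDs that are new or whose content hash has changed."""
    return [block_id for block_id, digest in curr.items()
            if prev.get(block_id) != digest]

def upload_incremental(bucket: str, snap_name: str,
                       prev: dict, curr: dict, read_block) -> int:
    """Upload only the changed blocks for this snapshot; return bytes sent."""
    sent = 0
    for block_id in changed_blocks(prev, curr):
        data = read_block(block_id)          # caller supplies the block bytes
        s3.put_object(Bucket=bucket, Key=f"{snap_name}/{block_id}", Body=data)
        sent += len(data)
    return sent
```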

WekaFS v3.13 introduces support for VMXNET Generation 3 (VMXNET3) — the most recent virtual network device from VMware. This allows for high-speed WekaFS operation with vMotion (vSphere) in a typical virtual environment. Users need to load the Weka client onto the guest VM and ensure a VMXNET3 network is attached.

V3.13 also introduces resource quota management for the Kubernetes CSI plug-in, setting volume sizes and enforcing limits via policies. These controls are integrated into the plug-in, preventing unchecked storage consumption.
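
Kubernetes has a generic construct for this kind of limit, shown in the sketch below: a namespace-level ResourceQuota capping PVC count and requested capacity, created with the official Python client. Whether the Weka plug-in uses exactly this mechanism is an assumption on our part, and the storage-class name is made up.

```python
# Minimal sketch: cap how much persistent storage a namespace can claim, using
# a standard Kubernetes ResourceQuota via the official Python client. This is
# generic Kubernetes, offered as an assumption about how CSI quota policies can
# be expressed, not a description of the Weka plug-in's internals. The
# "weka-fs" storage class name is hypothetical.
from kubernetes import client, config

config.load_kube_config()                   # or load_incluster_config()
core = client.CoreV1Api()

quota = client.V1ResourceQuota(
    metadata=client.V1ObjectMeta(name="team-a-storage-quota"),
    spec=client.V1ResourceQuotaSpec(
        hard={
            "persistentvolumeclaims": "20",       # max number of PVCs
            "requests.storage": "2Ti",            # total requested capacity
            "weka-fs.storageclass.storage.k8s.io/requests.storage": "1Ti",
        }
    ),
)

core.create_namespaced_resource_quota(namespace="team-a", body=quota)
```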

WekaFS 3.13 is currently available for new and existing customers.

Western Digital: Oh HAMR, how I do love thee

Western Digital is focusing more on HAMR as the breakthrough technology to take it into the 30TB-and-beyond disk drive area, and sees no flash-disk cost crossover for at least ten years.

David Goeckeler.

This became clear during CEO David Goeckeler’s session at the virtual Wells Fargo TMT Summit 2021, which took place on November 30.

He said the HDD market looks to be in a period of sustained high demand. “Post-COVID … people [are] using the cloud more and more all the time … [and] the cloud [is] powering ever more intelligent devices connected by high-speed networks. We play in two very, very important parts of that ecosystem and that platform — that is storage in the cloud and storage on the device.” This is “driving good demand across the endpoints and across the cloud.”

Goeckeler said that the incredible expansion in unstructured data storage by businesses and other organisations was a long-term tailwind for disk drive manufacturers. “We’re entering a stage where the cloud is just continuing to grow at a phenomenal pace. We’re all using it more and more.”

WD needs to keep innovating to lower the $/bit cost of its storage. Goeckeler prognosticated, “if we … continue to drive storage innovation and lower the cost of storage per bit, I think that we have a very, very strong demand driver in multiple dimensions; increased use, increased demand and also, there’s a big part of the iceberg below the water of data that’s not stored that could be stored with a different — if we can continue to drive the economics in the right direction.”

He said the bulk of the business was selling to enterprises, not clients, and that meant a change in internal investment emphasis to focus on “thinking about, well, how much do I need to invest so I can meet demand two or three years out?” So “we’re thinking about investment and being able to fuel that growth.“

HAMR

The discussion took a look at Heat-Assisted Magnetic Recording (HAMR) technology, which uses laser-produced heat to write bits smaller than conventional perpendicular magnetic recording (PMR) can manage, yet stable at room temperature, unlike similar-sized PMR bits. WD has been emphasising the use of microwaves in its Microwave-Assisted Magnetic Recording (MAMR) as an intermediate step to HAMR, and made an initial advance in that direction with its enhanced PMR (ePMR), now called energy-assisted magnetic recording (EAMR).

HAMR graphic.

Goeckeler said: “HAMR is a very important technology. There’s no doubt about that. We’re heavily invested in HAMR. I think you know we have over 400 patents in HAMR. Any time you’re a supplier of hard drives in an industry this big, you’re going to be invested in a number of different technologies that you think is going to fuel your road map. So we’re a big believer in HAMR.

“I think the industry now is coming more around to the realisation that HAMR is going to be real, it’s going to be in the future. It’s going to be very important. It’s going to extend the life of hard drives for a very, very long time.

“We’re focused on HAMR, but we’re also focused on the steps to get there.”

The steps so far have been energy-assist and OptiNAND (the flash-enhanced controller), about which he said it’s “very, very important technology, higher reliability, better areal density. It allows us to … deliver several generations of technology. We’re able to deliver our 20-terabyte on nine platters, we can add the tenth, and we get another 2.2 terabytes of storage.”

With OptiNAND, “we really have that staircase to take you to 30 terabytes and then you get on the HAMR curve and you go for quite a bit longer. So I think it’s a really good story for — a really good road map for the hard drive industry.”

“Several generations” implies to us at least three generations of OptiNAND drives. If each adds 2TB then we are looking at 20TB (nine platters), 22TB (ten platters) and then 24TB, 26TB and 28TB, all on ten platters. Add a fourth OptiNAND generation and we are at Goeckeler’s 30TB doorway to HAMR.

We can derive a rough timescale from this, by assuming a 12-month cadence:

  • 2022 — 24TB
  • 2023 — 26TB
  • 2024 — 28TB
  • 2025 — 30TB
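
A trivial script reproduces that back-of-the-envelope roadmap; the 2TB-per-generation step and the 12-month cadence are our assumptions, not Western Digital guidance.

```python
# Back-of-the-envelope OptiNAND roadmap: start at 22TB (ten platters, 2021) and
# add 2TB per generation on a 12-month cadence. Both the step size and the
# cadence are our assumptions drawn from Goeckeler's comments, not a WD roadmap.
capacity_tb, year = 22, 2021

while capacity_tb < 30:
    capacity_tb += 2
    year += 1
    print(f"{year} -- {capacity_tb}TB")
# Prints: 2022 -- 24TB, 2023 -- 26TB, 2024 -- 28TB, 2025 -- 30TB
```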

Comment

WD competitor Seagate is currently shipping first-generation 20TB HAMR drives and has a 30TB HAMR drive in development, with a cloud supplier market in prospect, not the traditional enterprise market. That will be served by conventional PMR (Perpendicular Magnetic Recording) drives.

Colin Presley, a Senior Director in Seagate’s CTO organisation, told us in September that HAMR “is really really hard technology” and “The industry across the board recognises HAMR is the road to high capacity.”

If Seagate produces a 30TB HAMR drive by, say, 2025 then that would be when WD is about to produce its first HAMR drive. By then Seagate will have shipped many millions of its HAMR drives and be vastly more experienced in manufacturing the drives, understanding their reliability, and creating controllers and firmware for using them.

We could see a situation emerging where WD is at a competitive disadvantage because it is years behind Seagate in HAMR manufacturing and controller technology.

Shingling and flash-disk crossover

Goeckeler said OptiNAND can also be applied to Shingled Magnetic Recording (SMR) disks, where it adds additional capacity, say in the 15–20 per cent area. But “clearly, something like SMR requires changes on the client side. That’s not something anybody takes lightly.”

SSD and HDD $/TB prices are going down. Says Goeckeler: “They’re both declining,” and “We’re very comfortable with our [flash] road map and driving 15 per cent cost down there.”

He said: “But on the other side, you’ve got hard drives continuing to go deeper into the well of data that’s out there, and we see that — we see those costs continuing to go down. Like we said, we talked about a road map here that is many, many steps into the future with a major technology transition like HAMR in our future, it’s several years out, but it’s in our future. And so it provides a lot of runway into that drive value proposition.”

Ultimately he is not worried about flash becoming cheaper than disk for the next decade. “Storage in the cloud is very important. The vast majority of that data is stored on hard drives and will be for a very, very, very long time. In fact, if you look at the economics, we don’t see crossover for beyond a decade, which is … beyond the planning horizon of any useful technology business.”

Two into one: Quantum deduplicates virtual DXi product line with free version

Quantum has a DXi line of deduplicating backup appliances and its two virtual products have been replaced by a single new one: the V5000, with free and paid-for versions.

There are physical DXi appliances using Quantum-supplied hardware and software, and V-Series virtual appliances — software downloaded by customers to run as virtual machines on their own server hardware. The previous V2000 and V4000 software appliances have been replaced by the new V5000, with a V5000 Community Edition as a no-charge starter product.

Bruno Hald.

Bruno Hald, Quantum’s VP and GM for Secondary Storage, issued an announcement quote: “Our customers are looking for simpler ways to protect their data, especially protection at remote sites and at the edge. … By introducing the Community Edition, we now can get this technology into the hands of as many users as possible so they can more efficiently backup and protect their critical data.”

Both V5000s are targeted at remote and branch offices and other internet edge sites needing a backup target system. V5000 backup data can be replicated to a central site for longer-term storage and to protect against edge site data loss.

The V5000 features:

  • Support for VMware, Hyper-V, or KVM environments, with support for major public cloud platforms coming in the first half of 2022;
  • Scalability from 5 to 256TB of usable capacity, with dynamic virtual RAM allocation;
  • Quantum DXi variable-length deduplication to reduce backup data sets by 20:1 or more;
  • Data replication between sites, with WAN traffic reduced by 20:1 or more since only deduplicated data is transferred;
  • Support for multiple protocols, including NAS, Veritas OST, and Veeam Data Mover Service (VDMS);
  • DXi Secure Snapshots for protection and immediate recovery from ransomware attacks.

It promises to be easy to install and manage with a simple, intuitive user interface.

The Community Edition has a 5TB usable capacity limit, before deduplication, and can be upgraded to the full version by buying a subscription license. It is delivered as a virtual machine that can be installed in minutes, and Quantum says users can begin to back up data almost immediately. The subscription license upgrade is available through the V5000 Community Edition GUI. A single-click “buy now” option is on Quantum’s roadmap.

Get a DXi product line datasheet here. The DXi V5000 is available now with the Community Edition downloadable here.

The need for speed: Seagate’s 20TB drives take on WD with a performance edge

Following Western Digital’s September launch of two 20TB drives, Seagate has introduced two 20TB drives of its own: an Exos X20 and an IronWolf Pro, both transferring data 16MB/sec faster than the WD drives.

Seagate says these conventionally-recorded drives — meaning not HAMR — have enhanced caching and employ either 6Gbit/sec SATA (IronWolf Pro, Exos X20) or 12Gbit/sec SAS (Exos X20) interfaces. They have sustained data transfer rates of 285MB/sec compared to the WD pair’s 269MB/sec. That’s only a 5.9 per cent difference, but in an hour of continuous operation it would mean the Seagate drives move 57.6GB more than the WD drives, and 1.38TB more in a day. Small differences can mount up.
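
A few lines of arithmetic confirm the hourly and daily figures from the quoted sustained transfer rates.

```python
# Sustained-transfer arithmetic from the quoted spec-sheet figures.
seagate_mb_s, wd_mb_s = 285, 269

delta_mb_s = seagate_mb_s - wd_mb_s                  # 16 MB/sec
pct = 100 * delta_mb_s / wd_mb_s                     # ~5.9 per cent
extra_gb_per_hour = delta_mb_s * 3600 / 1000         # 57.6 GB in an hour
extra_tb_per_day = extra_gb_per_hour * 24 / 1000     # ~1.38 TB in a day

print(f"{delta_mb_s} MB/sec ({pct:.1f}%), "
      f"{extra_gb_per_hour:.1f} GB/hour, {extra_tb_per_day:.2f} TB/day")
```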

Yet the Seagate drives only have a 256MB cache capacity, while WD’s 20TB pair are fitted with 512MB ones. It must be down to better caching software.

Like the WD drives, Seagate’s have a 4.16ms latency rating and, we understand, they are 9-platter drives — 2.222TB per platter — and helium-filled. Seagate says they have vibration control and security functions and do not employ shingling.

The Exos X20 is a 24×7 datacentre drive with a 550TB/year workload rating, while the IronWolf Pro is a NAS drive with a user workload limit of 300TB/year. They differ in their mean time between failures (MTBF) ratings as well: the Exos is rated at 2.5 million hours while the IronWolf Pro makes do with 1.1 million hours.

Internal view of a 20TB Seagate drive.

The WD Ultrastar DC HC560 20TB drive also has a workload rating of 550TB/year, as does, we understand,  the 20TB Gold drive. WD hasn’t yet announced a 20TB NAS drive — that would be a Red Pro-branded product. Both of its 20TB drives — the Gold and the DC HC560 — are general datacentre drives, like Seagate’s Exos X20.

The 20TB Exos costs $670 while the 20TB IronWolf Pro will set you back $650, which includes three years of Rescue Data recovery. For comparison, WD’s 20TB Gold costs approximately $680 and the Ultrastar HC560 $700. These are not the prices volume buyers would pay, but on the face of it the Seagate drives deliver more performance for less money.

We can expect both Seagate and WD to spread their 20TB technology around their product portfolios with surveillance drives as an obvious possibility and, for WD, a NAS product. We might also expect Toshiba to announce its own non-shingled 20TB drive in the next few months.

We have no specific availability information for these two new Seagate drives.

Kioxia adds sophisticated admin tools and wider support to KumoScale

Kioxia has made its KumoScale box-of-flash administration easier and added support for the latest version of OpenStack in the latest software release, as it continues to build out its virtualised, disaggregated flash box software.

V3.19 of Kioxia’s KumoScale — software to virtualise and manage boxes of block-access, disaggregated NVMe SSDs — adds operator-driven device management to add or remove SSDs from the pool, migrate data non-disruptively between SSDs, indicate that an SSD is ready for removal by blinking its lights, and upgrade SSD firmware.

The software uses so-called Operator functions said to be declarative in nature. Admin staff specify a desired end-state — such as an empty SSD — and the software carries out the necessary steps to achieve it. Kioxia says this mode of admin management contrasts with imperative commands and was pioneered in the Kubernetes container orchestration framework.
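
In code terms the declarative pattern looks something like the reconcile loop below: the administrator records a desired end-state and a controller repeatedly nudges the observed state toward it. This is a generic sketch of the Kubernetes-style pattern the release invokes, not KumoScale's actual API.

```python
# Generic declarative reconcile loop, Kubernetes-operator style: the admin
# declares a desired end-state (here, an SSD drained and ready for removal)
# and the controller works out the imperative steps. This illustrates the
# pattern described above, not Kioxia's implementation.
import time

desired = {"ssd-07": "empty"}        # declared end-state
observed = {"ssd-07": "in-pool"}     # current system state

def reconcile(ssd: str, want: str, have: str) -> str:
    """Take one corrective step and return the new observed state."""
    if want == have:
        return have                                   # already converged
    if want == "empty" and have == "in-pool":
        print(f"{ssd}: migrating data off non-disruptively")
        return "draining"
    if want == "empty" and have == "draining":
        print(f"{ssd}: drained; blinking locator LED for removal")
        return "empty"
    return have

while desired != observed:
    for ssd, want in desired.items():
        observed[ssd] = reconcile(ssd, want, observed[ssd])
    time.sleep(1)                                     # controllers poll/watch
print("converged:", observed)
```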

This latest KumoScale supports the latest OpenStack “Xena” release and Kioxia contributed a couple of functions:

  • The Xena release os-brick nvme.py connector broadens support to include many commonly used Linux distributions; 
  • The Xena release nvme.py connector has been enhanced to support volume snapshots. 

The v3.19 KumoScale software also supports the Ubuntu distribution of Linux, the latest Kubernetes CSI version, and adds CSI and Ansible support of snapshot and clone functionality. 

Kioxia describes KumoScale as offering flash-as-a-service in Kubernetes, software-defined storage or bare metal environments. Target customers for KumoScale, which is sold through system partners, are cloud service providers, Software-as-a-Service providers and enterprise private cloud suppliers. “Kumo” means cloud in Japanese, by the way.

Comment

We still have no idea how popular KumoScale is. Kioxia hasn’t revealed any customer or deployment numbers. The system competes with other NVMe-only SAN array products such as Excelero. Our feeling is that it is less enterprise-capable than the Excelero product but this is just an impression. KumoScale may well have picked up one or two CSPs or MSPs as customers. We just don’t know.

Kioxia keeps steadily plugging away at developing the software with twice-yearly updates since its launch in March 2018. That demonstrates persistence if nothing else.

Access KumoScale v3.19 documentation here.

Storage news ticker – December 2

AWS-palooza

AWS Lake Formation is a service to set up a secure data lake and has three new capabilities:

  • Lake Formation Governed Tables on Amazon S3 simplify building resilient data pipelines with multi-table transaction support. As data is added or changed, Lake Formation automatically manages conflicts and errors to ensure that all users see a consistent view of the data. 
  • Governed Tables monitor and automatically optimize how data is stored so query times are consistent and fast. 
  • Row and cell-level permissions make it easier to restrict access to sensitive information by granting users access to only the portions of the data they are allowed to see.

Governed Tables, row and cell-level permissions are supported through Amazon Athena, Redshift Spectrum, AWS Glue, and Amazon QuickSight.

There is a new Amazon DynamoDB Standard-IA table class that reduces DynamoDB costs by up to 60 per cent for tables storing infrequently accessed data, compared with the cost of standard DynamoDB tables. Standard DynamoDB tables offer up to 20 per cent lower throughput costs than Standard-IA. Customers can switch between the two classes with no performance impact and no application code changes.
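
The class is a per-table setting. A minimal boto3 sketch might look like the following; the table and attribute names are hypothetical, and `TableClass` is the parameter AWS exposed alongside the launch.

```python
# Minimal sketch: create a table in the Standard-IA class, then flip it back.
# Table and attribute names are hypothetical; TableClass is the parameter AWS
# exposed alongside the Standard-IA launch (verify against your boto3 version).
import boto3

ddb = boto3.client("dynamodb", region_name="us-east-1")

ddb.create_table(
    TableName="audit-log-archive",
    AttributeDefinitions=[{"AttributeName": "event_id", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "event_id", "KeyType": "HASH"}],
    BillingMode="PAY_PER_REQUEST",
    TableClass="STANDARD_INFREQUENT_ACCESS",   # the lower-storage-cost class
)

# Switching an existing table between classes needs no downtime or code change:
ddb.update_table(TableName="audit-log-archive", TableClass="STANDARD")
```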

AWS is offering three new Outposts systems — not racks but 1U and 2U servers — which run its public cloud software on-premises, and pretty skinny affairs they are too. The 1U server is the STBKRBE. It has a Graviton2 Arm-based CPU with up to 64 vCPUs in various configurations. There is 128GiB of DRAM and 2x 1.9TB NVMe SSDs. The LMAXAD41 is a 2U system powered by an “Ice Lake” Xeon with up to 64 vCPUs and 2x 1.9TB NVMe SSDs and 128GiB DRAM. The more capable KOSKFSF has 128 vCPUs, 256 GiB of DRAM and 4x 1.9TB NVMe SSDs. The systems will be deliverable in the first quarter of 2022, with AWS support looking after them. Amazon’s marketeers need to up their server naming game, by the way.

NetApp has been named the 2021 AWS Independent Software Vendor (ISV) Design Partner of the Year in the US for its work on the jointly engineered Amazon FSx for NetApp ONTAP software introduced earlier this year.

The Rest

Druva is running a competitive win webinar highlighting how its customer Vertrax changed from Veeam to the Druva Cloud platform after ransomware corrupted its OneDrive and on-premises backup files. “We couldn’t recover most of the files,” said Rob Ljunggren, director of IT at Vertrax. But, when another ransomware attack happened, they were prepared. Register for the webinar here.

Taiwan-based Infortrend has introduced the EonStor DS 4000U — an all-flash SAN system to boost IOPS and reduce latency for applications like database and virtualisation. It supports U.2 format NVMe SSDs to deliver 1,000,000 IOPS and 11GB/sec throughput. This is a 90 per cent performance increase compared to the previous model. 

The system has high-performance and high-capacity tiers and an auto-tiering function to move data to these tiers. The system is backed by supercapacitors, has multiple levels of RAID protection available along with snapshots and remote replication. There is also SSD optimisation technology to extend service life, improve data protection, and simplify management. EonStor DS is certified as VMware Ready.

NetApp announced Spot Ocean Continuous Delivery (CD) is available for private preview with AWS customers in mid-December. Spot Ocean CD extends Spot by NetApp’s core technologies enabling delivery of cloud-native applications on Kubernetes. Spot Security is available in private preview, starting with AWS customers. Spot Security enables customers to detect, prioritise and help mitigate the most serious security threats and risks within cloud infrastructure.


Jonathan Martin, WekaIO’s president, just posted this on LinkedIn: “Weka just closed out a record-breaking quarter! … [and] tripled our revenue in Q4 year over year with more deals in Q4 than all of 2020 [and] It was also a record-breaking year.”

VMware has to cover the Nutanix CEO’s costs in its abandoned lawsuit. An extract from the lawsuit closure court order makes this clear: “NOW, THEREFORE, it is hereby ORDERED, that: 1. Plaintiff’s [Rajiv] Motion for Judgment on the Pleadings is GRANTED and Defendant’s [VMware] Cross-Motion for Judgment on the Pleadings is DENIED. 2. The Company [VMware] shall advance to Plaintiff [Rajiv] the reasonable attorneys’ fees and expenses he has incurred and will incur in connection with defending against the California Action and shall indemnify Plaintiff’s reasonable Fees-on-Fees as set forth below.” Ho ho ho. That’s gotta sting.

Infinidat: we’re boosting our channel, even as others retreat

Infinidat, the memory-cached high-end array shipper, says it has been — and is — building up its channel partners and program while competitors are reducing their channel commitment.

It claims competitors, without naming them, have recently cut the number of partner account managers, slashed on-target earnings, and reduced the ranks of veteran channel managers. Infinidat says it has invested heavily in building up its channel partner program, increasing partner revenue, margins, co-op/marketing development funding (MDF), and joint events.

Eric Herzog, Infinidat’s CMO, said in a statement: “Infinidat is doing all the right things to catapult partners forward. The outstanding results of Infinidat’s engagement with solution providers speak for themselves.”

 So far this year Infinidat says it has:

  • Increased channel revenue by more than 20 per cent;
  • Increased the number of channel transactions by 25 per cent;
  • Doubled its channel-focused headcount;
  • Run three times as many joint channel/Infinidat demand generation events as last year;
  • Increased the number of channel partners by more than 25 per cent year-over-year, from <400 to >500.

The headcount increase includes “very experienced” channel-dedicated sales, SE, support personnel, and marketing, with resources around the globe. Kevin Rhone, practice director for partner acceleration at ESG, said Infinidat “has expanded its channel programs by increasing joint end user demand generation events, high value rewards for new customer acquisition, and rich competitive refresh incentives.”

Kelly Nuckolls, vice president of marketing & alliances at InfoSystems, said: “With Infinidat’s dramatic expansion of their channel focus and white-glove partner support, we are very excited to expand our GTM strategy in 2022.”

Infinidat says it aims to co-sell with channel partners as often as possible — almost 90 per cent of its revenues come through channel partners. Partner recruitment is ongoing.

Comment

Since Infinidat competes with other high-end block array suppliers, such as Dell EMC, Hitachi Vantara, HPE, IBM, and others, we must infer from Infinidat’s statement that two or more of them are cutting back on their channel programs.

Storage news ticker – December 1

AWS has announced four new storage offerings:

  • S3 Glacier Instant Retrieval is a storage class that provides retrieval access in milliseconds for archive data — now available as an access tier in Amazon S3 Intelligent-Tiering (a short usage sketch follows this list).
  • FSx for OpenZFS is a managed file storage service that makes it easy to move on-premises data residing in commodity file servers to AWS without changing application code or how the data is managed. 
  • Amazon Elastic Block Store (Amazon EBS) Snapshots Archive is a storage tier for Amazon EBS Snapshots that reduces the cost of archiving snapshots by up to 75 per cent. 
  • AWS Backup supports centralised data protection and automated compliance reporting for Amazon S3, as well as for VMware workloads running on AWS and on premises. 
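
As promised above, here is a minimal boto3 sketch of writing straight into the new class; the bucket and key names are hypothetical, and "GLACIER_IR" is the storage-class value introduced with the launch.

```python
# Minimal sketch: put an object straight into S3 Glacier Instant Retrieval.
# Bucket and key names are hypothetical; "GLACIER_IR" is the storage-class
# value introduced with the launch (verify against your boto3/SDK version).
import boto3

s3 = boto3.client("s3")

with open("front-door-2021-12-01.mp4", "rb") as footage:
    s3.put_object(
        Bucket="example-archive-bucket",
        Key="cctv/2021/12/01/front-door.mp4",
        Body=footage,
        StorageClass="GLACIER_IR",   # archive pricing, millisecond retrieval
    )

# Unlike the older Glacier classes, no restore request is needed before a
# normal GetObject read; the object is retrievable immediately.
head = s3.head_object(Bucket="example-archive-bucket",
                      Key="cctv/2021/12/01/front-door.mp4")
print(head.get("StorageClass"))
```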

Cloudian has announced planned support for the new AWS Outposts servers, enabling customers to expand their Outposts use cases with Cloudian’s HyperStore S3-compatible object storage. Cloudian’s storage software and appliances provide limitless on-premises capacity for workloads that require local data residency and low-latency data access. This follows HyperStore achieving AWS Service Ready designation for the Outposts rack in June and reflects the two companies’ increased collaboration.

Dell Technologies and AWS announced that they are bringing Dell’s cyber recovery vault to the AWS Marketplace with the launch of Dell EMC PowerProtect Cyber Recovery for AWS. It features a public cloud vault, operational air gap and enhanced security.

HYCU, which supplies on-premises and public cloud data backup and recovery as a service, announced a preview of cloud-native HYCU Protégé for Amazon Web Services (AWS). This will give customers a tightly integrated and application-aware system to protect, manage, and recover data for workloads on AWS. Protégé supports both on-premises applications and virtual machines (VMs) running on public clouds. General availability of Protégé for AWS is anticipated in the first quarter of 2022.

Druva was named as the third-placed leader in the Forrester New Wave SaaS Application Data Protection, Q4 2021 report. AvePoint was number one followed by Keepit.

Commvault, Veritas and Spanning Cloud Apps were Strong Performers, with Acronis on the Contender/Strong Performer boundary, and Asigra classed as a Challenger.

Peraton, a system integrator and enterprise IT provider, has selected CTERA’s file platform to support a $497 million contract to provide infrastructure-as-a-managed service (IaaMS) for US Department of Veterans Affairs (VA) storage and computing infrastructure facilities across the US and globally. CTERA will deliver file services for mission-critical workloads, connecting up to 300 distributed sites to the VA Enterprise Cloud powered by AWS GovCloud (US).

The lawsuit initiated by VMware against Nutanix concerning the latter’s hire of VMware exec Rajiv Ramaswami as its CEO has been resolved. Nutanix stated: “VMware’s lawsuit was misguided and inappropriate, as there was no wrongdoing on Mr Ramaswami’s part. VMware has agreed to dismiss the lawsuit and we are very pleased that the matter has been favourably resolved.” Lawyers made money. Nutanix spent some to prove it was right. VMware spent some too but ends up just looking peevish.

HPC storage supplier Panasas has joined the Thales Accelerate Partner Network. This collaboration will safeguard HPC storage systems and user data with Panasas “hardware-based encryption at rest” and Thales’ enterprise integrated storage security and key management solution. Panasas’s PanFS 9 adds layers of security through file labelling support for Security-Enhanced Linux (SELinux) and hardware-based encryption at rest with zero performance degradation. Panasas and Thales have extended Thales’ CipherTrust Manager’s support to Panasas PanFS 9 for centralized, compliance-ready enterprise encryption key management.

Sunlight.io announced a “Hyper-Convergence Innovation of the Year” win at the SDC Awards 2021, beating competition from Nutanix and Hewlett Packard Enterprise. Its entry featured its NexVisor software stack and an OEM partnership with Altos Computing — a subsidiary of Acer — to deliver what it says is the industry’s fastest and most efficient hyperconverged infrastructure (HCI) appliances. Winners were announced at a gala evening event on 24 November 2021 at the Leonardo Royal St Paul’s Hotel, London.

Komprise explains the mess: files, objects, silo sprawl and abstraction layers

As if in a dream, I thought: suppose we didn’t have file and object storage? Would we invent them? Now that we do have them, should they be united somehow? In a super silo or via a software abstraction layer wrapper? Where should functions like protection and life cycle management be placed?

Who could I ask for answers? Komprise seemed like a good bet. So I sent them a set of questions and COO and president Krishna Subramanian kindly provided her answers.

Krishna Subramanian

Blocks & Files: If we were inventing unstructured data storage now, would we have both files and object formats or just a single format? And why?

Krishna Subramanian: The issue is one of cost versus richness. File hierarchies have rich context and are performant, but more expensive because of all the overhead. Object storage is more cost-efficient with a flat design, but slower. The need for different price/performance options is only increasing as data continues to grow, so both formats will be needed. However, objects are growing much faster than file formats because of their usefulness for cost-effective, long-term storage.

There is a use case for access for both file and object and it reflects the life cycle of a file/object. Typically, when a file is first created it is accessed and updated frequently: the traditional file workload. After that initial period of creation and collaboration the file still has value, but updates are unlikely and the workload pattern shifts to that of object. The organic workflow we see is that unstructured data is created by file-based apps, whereas for long-term retention and secondary use cases such as analytics, object is the ideal format. 

Object’s abilities to handle massive scale, provide metadata search, deliver durability, and achieve lower costs are advantageous. An additional benefit of object storage is offloading data from the file system, allowing it to perform better by removing colder data. File storage systems start to bog down as space utilization approaches 90 per cent.

Are files and object storage two separate silo classes? Why?

Today, file and object storage are separate silo classes, because not only do they present and store data in different formats, but also the underlying architectures are different. For instance, object storage has multiple copies built-in whereas file storage does not, which impacts how you protect files in each environment. The way apps use file versus object is also different. Object data sets are often so large you search and access objects via metadata. With file you provide semi-structure with directories.

Other differences include:

  • File is more administratively intensive in regard to space management and data protection.
  • With objects these tasks are largely handled by the cloud provider. Space is only constrained by your budget and data protection is based on multiple copies, erasure coding and multi-region replication versus daily backup or snapshots with file.

What are the pros and cons of storing unstructured data in multiple silos?

The value of data and its performance requirements vary throughout its lifecycle. When data is hot, it needs high-performance; when data is cold, it can be on a less performant but more cost-efficient medium. Silos have existed throughout the life of storage because different storage architectures deliver different price/performance. 

However, there are still logical reasons for silos as they allow IT to manage data distinctly based on unique requirements for performance, cost, and or security. The disadvantages of silos are lack of visibility, potentially poor utilization, and future lock-in. Data management solutions that employ real-time analytics will play an ever-stronger role in giving IT organizations the flexibility and agility of maintaining silos without incurring waste or unnecessary risk.

Is it better to have a single silo, a universal store or multiple silos presented as one via an abstraction layer virtually combining silos?

Silos are here to stay — just look at the cloud. AWS alone has over 16 classes of file and object storage, not including third-party options. Silos are valuable for delivering specific price/performance, data locality, security features for different workloads but they are a pain to manage. The future as we see it is this: the winning model will be silos with direct access to data on each silo and an abstraction layer to get visibility and manage data across silos. 

Less ideal is an abstraction layer such as a global namespace or file system that sits in front of all the silos, because you have now created yet another silo and it has performance and lock-in implications. This is why you do not want a global namespace.

Rather you want a global file index based on open standards to provide visibility and search across silos without fronting every data access. Data management which lives outside the hot data path at the file/object level gives you the best of both worlds: the best price/performance for data, direct access to data without lock-in and unified management and visibility.

Should this actual or virtual universal store cover both the on-premises and public cloud environments (silos)?

A virtual universal data management solution should cover on-premises, public cloud and edge silos. Data at the edge is still nascent, but we will see an explosion of this trend. 

Modern data management technologies will be instrumental in bridging across edge, datacenters and clouds. Customers want data mobility through a storage-agnostic and cross-cloud data management plane that does not front all data access and limit flexibility.

Should the abstraction layer cover both files and objects, and how might this be done?

Yes. The management layer should abstract and provide duality of files and objects to give customers the ultimate flexibility in how they access and use data. Komprise does this for example, by preserving the rich file metadata when converting a file to an object but keeping the object in native form so customers can directly access objects in the object store without going through Komprise. They can also view the object as a file from the original NAS or from Komprise because the metadata is preserved. 

Should the abstraction layer provide file sharing and collaboration and how might this be done? Would it use local agent software?

Data management can expose data for file sharing and collaboration, but it is better for data management to be a data- and storage-agnostic platform that enables various applications including file sharing, collaboration, data analytics and others. 

By abstracting data across silos, app developers can focus on their file sharing or other apps without worrying about how to bridge silos. Our industry has been moving away from agents because they are difficult to deploy, brittle and error prone.

Could you explain why the concept of object sharing and collaboration, like file sharing and collaboration, makes sense or not?

Object sharing is less about editing the same object, but more about making objects available to a new application like data analytics. This speaks to the lifecycle of data and data types. File data that requires collaboration — such as documents and engineering diagrams — will be accessed and updated frequently and are best served by file access, while longer-term access is best served by object. 

For example, an active university research project may create and collect data via file. Once the project is complete the research director can provide read-only access to the object using a pre-signed URL, which wouldn’t be possible with a file alone.
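
In S3 terms that read-only handout is a time-limited pre-signed URL; a minimal boto3 sketch, with hypothetical bucket and key names, looks like this.

```python
# Minimal sketch: grant time-limited, read-only access to a single object
# without handing out any credentials. Bucket and key names are hypothetical.
import boto3

s3 = boto3.client("s3")

url = s3.generate_presigned_url(
    ClientMethod="get_object",
    Params={"Bucket": "research-archive", "Key": "project-42/results.csv"},
    ExpiresIn=7 * 24 * 3600,        # link is valid for one week
)
print(url)   # share with collaborators; access expires automatically
```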

If it doesn’t make sense does that mean file and object storage are irrevocably separate silos?

They are separate because the use cases are different, the performance implications are different, and the underlying architectures are different. I would draw a distinction between the data and how it’s accessed and used versus silos. There is value today in providing object access to file data, but perhaps no value in providing file access to data natively created in object. 

An engineering firm creates plans for a building using file-based apps, and after that project is complete the files should not be altered. Therefore, access by object makes sense. On the other hand, image data collected by drones via object API should be immutable through its entire life cycle. Providing access via file would provide limited benefit and be extremely complex with very high object counts, etc.

Should the abstraction layer provide file and object lifecycle management facilities?

Yes. Data management should provide visibility across the silos and move the right data to the right place at the right time systematically. Lifecycle management is critical. Many of the pain points of file data such as space, file count limits and data protection are growing beyond what can be managed effectively by humans. 

Old school storage management largely consisted of monitoring capacity utilization and backups. This was largely reactive: “I am running out of space. Buy more storage.” 

Proactive, policy-based data management can alleviate many of these issues. Object lifecycle management is often about managing budgets: your cloud provider is perfectly happy to keep all your data at the highest performance and cost tier.

Does the responsibility for data protection, including ransomware protection, lie with the abstraction layer or elsewhere?

Enterprise customers already have existing data protection mechanisms in place so the data management layer can provide additional protection, but it must work in concert with the capabilities of the underlying storage and existing backup and security tools. If you require customers to switch from existing technologies to a new data management layer, then it’s disruptive, and again creates a new silo.

Data protection features such as snapshots and backup policy for file or immutability and multi-zone protection are characteristics of storage systems. Effective data management is putting the right data on the right tier at the right time. A great example is moving or replicating data to immutable storage like AWS S3 object lock for ransomware protection; data management puts the data in the bucket and object lock provides protection.
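
That division of labour is straightforward to sketch: whatever tool does the tiering simply lands copies in a bucket created with Object Lock, and the bucket enforces immutability. The boto3 example below uses hypothetical names and stands in for the data-management layer.

```python
# Minimal sketch: tier cold data into an S3 bucket with Object Lock so the
# copies are immutable for a retention period. Names are hypothetical and the
# data-management product doing the tiering is reduced to a simple upload.
from datetime import datetime, timedelta, timezone
import boto3

s3 = boto3.client("s3")

# Object Lock has to be enabled when the bucket is created.
s3.create_bucket(Bucket="ransomware-vault-example",
                 ObjectLockEnabledForBucket=True)

retain_until = datetime.now(timezone.utc) + timedelta(days=90)

s3.put_object(
    Bucket="ransomware-vault-example",
    Key="tiered/finance/2021-q4.tar",
    Body=b"...cold file data moved off the NAS...",
    ObjectLockMode="COMPLIANCE",             # cannot be shortened or removed
    ObjectLockRetainUntilDate=retain_until,
)
```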

How should — and how could — this abstraction layer provide disaster recovery?

Data management solutions can replicate data and provide disaster recovery often for less than half the cost of traditional approaches because they can leverage the right mix of file and object storage with intelligent tiering across both. Ideally, hot data is readily available in the event of a disaster, but costs are lower.

What’s important to note here is the data is not locked in a proprietary backup format. Data is available for analytics in the cloud and its availability can be verified. A big challenge for traditional/legacy approaches to disaster recovery was the need to “exercise” the plan to make sure data could be restored and understand how long that would take. Since data is available natively in the cloud when using storage-agnostic data management solutions, disaster recovery can be verified easily and data can be used at all times.

How can unstructured data be merged into the world of structured and semi-structured analytics tools and practices?

The need to analyze unstructured data is growing with the rise of machine learning, which requires more and more unstructured data. For instance, analyzing caller sentiments for call center automation involves analyzing audio files which are unstructured. But unstructured data lacks a specific schema, so it’s hard for analytics systems to process it. As well, unstructured data is hard to find and ingest as it can easily be billions of files and objects strewn across multiple buckets and file stores. 

To enable data warehouses and data lakes to process unstructured data, we need data management solutions that can globally index, search and manage tags across silos of file and object stores to find data based on criteria and then ingest it into data lakes with metadata tables that give it semi-structure. Essentially, unstructured data must be optimized for data analytics through curation and pre-processing by data management solutions.
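
A toy version of such a global index, with made-up paths and bucket names and nothing to do with Komprise's actual product, might walk a file share and list an S3 bucket into a single metadata table that can then be searched, tagged or fed to a data lake.

```python
# Toy global index across a file silo and an object silo: gather name, size
# and modification time into one searchable list. Illustrative only; the mount
# point and bucket name are made up and this is not Komprise's implementation.
import os
import boto3

def index_file_share(root: str) -> list:
    """Index every file under an NFS/SMB mount point."""
    entries = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            entries.append({"silo": "file", "key": path,
                            "size": st.st_size, "mtime": st.st_mtime})
    return entries

def index_bucket(bucket: str) -> list:
    """Index every object in an S3 bucket using the list paginator."""
    s3 = boto3.client("s3")
    entries = []
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            entries.append({"silo": "object", "key": obj["Key"],
                            "size": obj["Size"],
                            "mtime": obj["LastModified"].timestamp()})
    return entries

index = index_file_share("/mnt/research-share") + index_bucket("lab-archive")
large = [e for e in index if e["size"] > 1_000_000_000]   # example query
print(f"{len(index)} items indexed, {len(large)} over 1GB")
```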

HPE’s revenue growth anaemia versus profits on steroids

HPE grew revenues in its fourth fiscal 2021 quarter at an anaemic two per cent rate — compared to Dell’s 21 per cent rise in its most recent quarter — but HPE’s profits rocketed up.

Overall HPE revenues in the quarter were $7.4 billion, with profits of $2.55 billion. That is hugely more than the year-ago $157 million. Full year revenues were $27.8 billion — up three per cent on the year — with profits of $3.43 billion. Compare that to a loss of $322 million in the prior year and again it’s a majorly impressive turnaround.

President and CEO Antonio Neri’s announcement statement said: “HPE ended fiscal year 2021 with record demand for our edge-to-cloud portfolio, and we are well positioned to capitalise on the significant opportunity in front of us.” EVP and CFO Tarek Robbiati added his two cents: “HPE executed with discipline and exceeded all of our key financial targets in FY21. The demand environment has been incredibly strong and accelerated in the second half of the year, which gives us important momentum headed into next year.” 

Just look at the Q4 FY2021 profits bar! It’s the highest it’s been for years.

HPE divides its business into segments and we’ve put the segment revenues into a table to show what’s happened over the last five quarters: 

Compute and storage are regarded as core businesses by HPE, with the Intelligent Edge and HPC-plus AI regarded as growth segments. However the segment annual revenue growth percentages were not that much different:

  • Compute — $3.2 billion and 1.1 per cent
  • Storage — $1.26 billion and 3.5 per cent
  • Intelligent Edge — $815 million and 3.7 per cent
  • HPC + AI — $1 billion and 0.8 per cent

We would expect to see a lot more growth in the growth segments. That’s why they are called growth segments.

Storage revenue growth of 3.5 per cent in the fourth fiscal 2021 quarter — a second successive growth quarter — was better than HPE’s overall and fairly puny revenue increase of two per cent. But it is anaemic, comparing poorly to competitors like NetApp growing revenues 11 per cent and Pure Storage 37 per cent in the same period.

HPE storage revenues show a second consecutive growth quarter but it looks a tad early to declare a turnaround from four quarters of decline.

Within the storage segment the all-flash array (AFA) business increased seven per cent on the year, led by Primera arrays which grew revenues by “strong double-digits” suggesting 16–18 per cent. This was HPE’s sixth successive quarter of AFA revenue growth. However AFA revenues grew more than 30 per cent in the prior quarter.

NetApp’s AFA annualised run-rate revenues grew 22 per cent, while Pure’s 37 per cent revenue growth is entirely AFA-based, so HPE’s success here is, again, somewhat limited to say the least.

Nimble array sales rose four per cent on the year and the dHCI section of that rose more strongly with double-digit growth.

Analyst view

Wells Fargo senior analyst Aaron Rakers summed HPE results up like this: “HPE’s F4Q21 results reflect continued solid execution with regard to the company’s as-a-service pivot. HPE as-a-Service orders grew 114 per cent year-on-year, which includes a large multi-million network-as-a-service win in F4Q21.”

There was momentum in the GreenLake/as-a-Service area as: “HPE exited F4Q21 with a total ARR at $796 million, up 36 per cent year-on-year and compared to +33 per cent year-on-year in the prior quarter.” 

Overall Rakers thought HPE’s results were net-neutral. He said: “HPE reported that F4Q21 orders accelerated to +28 per cent year-on-year (vs +11% year to date through F3Q21) vs revenue growth at only +2 per cent year-on-year.”

It would appear that HPE’s financial discipline has generated terrific growth in profitability, and its order growth bodes well for the next quarter. What appears to be missing is overall revenue growth acceleration, partly due, as Rakers notes, to “the revenue headwind associated with HPE’s continued as-a-service pivot.”

We think Neri and his execs are looking to HPE’s customer base adopting GreenLake and everything-as-a-service in droves to drive up both revenues and profitability. 

Street-beater NetApp’s fifth growth quarter in a row

NetApp had a knock-out second fiscal 2022 quarter, selling more flash arrays and object storage, and with a good pipeline for further sales. Its public cloud business is growing strongly and it’s uprated its guidance for the full year, despite looming supply chain issues.

Revenues were $1.57 billion — 10.9 per cent more than a year ago — with a profit of $224 million, up 63.5 per cent. Profit as a percentage of revenue was 14.27 per cent versus 9.68 per cent a year ago. NetApp really is doing well. (Although Pure Storage is growing faster, it is still not making a profit.)

CEO George Kurian’s statement said: “We delivered another strong quarter, with results all at the high end or above our guidance. Our performance reflects a strong demand environment, a clear vision, and exceptional execution by the NetApp team and gives the confidence to raise our full year guidance for revenue, EPS and Public Cloud ARR. We are gaining share in the key markets of all-flash and object storage, while rapidly scaling our public cloud business.”

NetApp generally seems locked into a $1.2B to $1.7B revenue/quarter area. Perhaps growing public cloud revenues can take it towards $2B/quarter revenues in the years ahead.

Hybrid cloud (meaning not public cloud) revenues increased eight per cent, to $1.48 billion, but that was the bulk of NetApp’s revenues. This growth was driven by StorageGRID object storage and all-flash arrays, predominantly ONTAP. The all-flash array annualised net revenue run rate increased 22 per cent year-over-year to $3.1 billion — a record — and product revenue overall grew nine per cent year-over-year to $814 million. Thirty per cent of NetApp’s installed base use its all-flash arrays, leaving lots of adoption ahead still.

Kurian said: “We once again gained share in the enterprise storage and all-flash array markets,” without identifying competitors who lost share. 

Public cloud-based revenues were $87 million — up 85 per cent annually — meaning an 80 per cent increase in annualised revenue run rate (ARR) to $388 million. The public cloud business, 17 per cent of that, has a lot of growth ahead of it to reach this heady level. Kurian said: “We remain confident in our ability to achieve our goal of reaching $1 billion ARR in FY25” — meaning $250 million/quarter in public cloud revenues.

He also pointed out: “Our public cloud services not only allow us to participate in the rapidly growing cloud market, but they also make us a more strategic datacentre partner to our enterprise customers, driving share gains in our hybrid cloud business.”

Kurian was especially pleased about NetApp’s relationships with AWS, Azure and GCP. “These partnerships create a new and massive go-to-market growth engine, as three of the largest and most innovative companies in the world are reselling our technology. … We are now the first and only storage environment that is natively integrated into each of the major public cloud providers.” 

He also said NetApp was coping with the supply chain challenges — but more challenges are coming.

Billings increased seven per cent year-over-year to $1.55 billion. There was a 14 per cent increase in software product revenues.

Financial summary:

  • Gross margin — 68 per cent;
  • Public cloud gross margin — 71 per cent;
  • Operating cash flow — $298 million;
  • Free cash flow — $252 million;
  • EPS — $0.98 compared to $0.61 a year ago;
  • Cash, cash equivalents and investments — $4.55 billion at quarter-end.

This was NetApp’s fifth growth quarter in a row. 

But it still has a way to go to get back to fiscal 2018 revenue heights, and an even bigger hill to climb to claw back ground from fiscal years 2012 and 2013 — the glory years, in retrospect.

In the earnings call NetApp said that it had a strong deal pipeline relating to on-premises datacentre modernisation. William Blair analyst Jason Ader argues: “This is mainly motivated by lower costs for storing large, production data sets (where cloud storage costs can be prohibitive), but can also be due to improved control (for compliance and regulatory reasons) and security.”

The outlook for the next quarter is for revenues between $1.525 billion and $1.675 billion — $1.6 billion at the mid-point — which would be an 8.8 per cent increase year-on-year. Full FY2022 revenues are guided to be 9 to 10 per cent higher than last year — $6.29 billion at the mid-point. This revenue increase is depressed by anticipated supply chain issues in the second half of FY2022.

DNA storage and science: archiving unreality

DNA storage researchers have devised a neat way to record events inside cells down to a one-minute granularity, but are wildly off-beam when predicting this could become useful as a technology to store general archival data.

Identifiable segments of DNA can be sequentially added to a base strand and represent ones or zeros. These indicate a particular change in the environment inside a living cell and so function as in-vivo ‘ticker tape’ data recorders. But the rate of ingest is so slow – three eighths of a byte per hour – that suggestions it could scale up for general archive use are little short of ridiculous.

Researchers led by Associate Professor of Chemical and Biological Engineering Keith Tyo at the McCormick School of Engineering, Northwestern University, Illinois synthesized DNA using a method involving enzymes.

A Northwestern University announcement declares: “Existing methods to record intracellular molecular and digital data to DNA rely on multipart processes that add new data to existing sequences of DNA. To produce an accurate recording, researchers must stimulate and repress expression of specific proteins, which can take over 10 hours to complete.”

It is faster to add DNA nucleotide bases to the end of a single DNA strand. These bases — adenine (A), thymine (T), guanine (G) and cytosine (C) — can be grouped together in different combinations. The researchers’ Time-sensitive Untemplated Recording using TdT for Local Environmental Signals (TURTLES) method uses a DNA polymerase called TdT, standing for Terminal deoxynucleotidyl Transferase, to add the nucleotide bases. 

Their composition can be affected by the cell’s internal environment. According to a paper published in the Journal of the American Chemical Society, the researchers “show that TdT can encode various physiologically relevant signals such as Co2+ (cobalt), Ca2+ (calcium), and Zn2+ (zinc) ion concentrations and temperature changes in vitro” (that is, in glass — the test tube). So cobalt presence could cause more A and fewer G bases, and cobalt absence could cause the reverse, giving us a binary situation: more A bases equals 1 and more G bases equals 0.
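
As a toy illustration of that binary read-out (our simplification, not the paper's actual base-calling pipeline), one could window across the recorded strand and call a 1 wherever A outnumbers G.

```python
# Toy read-out of the A-vs-G scheme described above: slide a window along the
# recorded ssDNA and call 1 where A bases outnumber G bases, else 0. This is
# our simplification for illustration, not the paper's base-calling method.
def decode_ticker_tape(strand: str, window: int = 8) -> list:
    """Return one bit per window of the recorded strand."""
    bits = []
    for i in range(0, len(strand) - window + 1, window):
        chunk = strand[i:i + window]
        bits.append(1 if chunk.count("A") > chunk.count("G") else 0)
    return bits

# Example: cobalt present (A-rich) for two windows, then absent (G-rich).
recording = "AATCAGTA" + "ATACATCA" + "GGTCGGTG"
print(decode_ticker_tape(recording))   # -> [1, 1, 0]
```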

They were able to record sequential changes down to the minute level. “Further, by considering the average rate of nucleotide incorporation, we show that the resulting ssDNA functions as a molecular ticker tape. With this method we accurately encode a temporal record of fluctuations in Co2+ concentration to within 1 min over a 60 min period.”

They were also able “to develop a two-polymerase system capable of recording a single-step change in the Ca2+ signal to within 1 min over a 60 min period”. This means they can look at the timing of changes inside the cell as well as the individual step changes.

The researchers say their method is more than 10x faster than other intracellular solutions, transferring information to DNA in seconds instead of hours. So far so very good.

Brain research

This discovery could, they say, change the way scientists study and record neurons inside the brain — by implanting such DNA digital molecular data recorders into brain cells and looking at changes within and between millions of brain cells. The university’s announcement says that “By placing recorders inside all the cells in the brain, scientists could map responses to stimuli with single-cell resolution across many (million) neurons.”

Scientific team member and paper co-author Alec Callisto said: “If you look at how current technology scales over time, it could be decades before we can even record an entire cockroach brain simultaneously with existing technologies — let alone the tens of billions of neurons in human brains. So that’s something we’d really like to accelerate.”

But they would have to extract the DNA and ‘read’ it — and we are talking about a massive operation overall.

Archiving application

The university says this about the TURTLES method: “It’s particularly good for long-term archival data applications such as storing closed-circuit security footage, which the team refers to as data that you “write once and read never,” but need to have accessible in the event an incident occurs. With technology developed by engineers, hard drives and disk drives that hold years of beloved camera memories also could be replaced by bits of DNA.”

But almost certainly not by the TURTLES technology, because its speed is devastatingly slow. Talking to Genomics Research, Tyo said the researchers’ process wrote data at a rate of three bits an hour!

We read this four or five times to make certain it was that slow. The exact phrasing Tyo used is “up to 3/8 of a byte of information in one hour.”

Tyo speed phrasing in Genomics Research article.

Tyo also said that running millions of these processes in parallel would enable more data to be stored and written faster.

That’s true, but how much faster? Let’s try some simple math and assume we could accelerate the rate using five million parallel instances. That would be 15 million bits/hour, meaning 1.875 million bytes/hour or 1.875MB/hour. That works out to 31.25KB/minute and thus about 0.52KB/sec.

This is a paralysingly slow write rate compared to modern archiving technology. A Western Digital 18TB Purple disk drive transfers data at up to 512MB/sec — nearly a million times faster. We would need to accelerate the single-strand TURTLES speed by a factor of roughly 4.9 trillion to achieve this HDD write speed. It seems an unrealistic, ludicrous idea.
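
Here is the scaling arithmetic, for anyone checking our working (the five-million-strand parallelism is our assumption; the per-strand rate is Tyo's figure and the 512MB/sec is WD's quoted transfer rate):

```python
# Rough scaling arithmetic for the TURTLES write rate. The 5,000,000-strand
# parallelism is our assumption; 3 bits/hour per strand is Tyo's figure and
# 512 MB/sec is the quoted WD 18TB Purple transfer rate.
BITS_PER_HOUR_PER_STRAND = 3
STRANDS = 5_000_000
HDD_MB_PER_SEC = 512

bytes_per_hour = BITS_PER_HOUR_PER_STRAND * STRANDS / 8   # 1,875,000 B/hour
mb_per_hour = bytes_per_hour / 1_000_000                  # 1.875 MB/hour
kb_per_sec = bytes_per_hour / 3600 / 1000                 # ~0.52 KB/sec

hdd_advantage = HDD_MB_PER_SEC * 1000 / kb_per_sec        # ~983,000x faster
single_strand_speedup = hdd_advantage * STRANDS           # ~4.9 trillion

print(f"{mb_per_hour:.3f} MB/hour, {kb_per_sec:.2f} KB/sec")
print(f"HDD is ~{hdd_advantage:,.0f}x faster; single-strand speed-up "
      f"needed: ~{single_strand_speedup:,.0f}x")
```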

Comparing TURTLES to tape makes for even more depressing reading. LTO-8 tape transfers compressed data at 900MB/sec — faster than the disk drive. LTO-9 operates at up to 1GB/sec with compressed data. We didn’t bother working out the TURTLES parallelisation factor needed to achieve these speeds.

Using TURTLES DNA storage for general archival data storage use would appear to be unrealistic. On the other hand, using it to record event streams inside cells is an exciting prospect indeed. 

Paper details

Bhan N, Callisto A, Strutz J, et al. Recording temporal signals with minutes resolution using enzymatic DNA synthesis. J Am Chem Soc. 2021;143(40):16630-16640. doi:10.1021/jacs.1c07331.

Get full content here.