
AuriStor breathes life into Andrew File System

Andrew File System developer AuriStor updated attendees at an IT Press Tour briefing on its work on the file system, which has an HPC and large enterprise customer base dating back 16 or more years.

AuriStorFS (a modern, licensed version of AFS) is a networked file system providing local access to files in a global namespace. AuriStor claims higher performance, security and data integrity than public cloud-based file-sharing offerings such as Nasuni and Panzura.

AuriStor is a small and distributed organisation dedicated to expanding the popularity and cross-platform use of AuriStorFS. 

AFS background

The Andrew File System (AFS) began life as the Andrew Project at Carnegie Mellon University, which was formed by the 1967 merger of the Carnegie Institute of Technology (founded 1900) and the Mellon Institute of Industrial Research. The founders of these two institutes were Andrew Carnegie (steel industry magnate) and Andrew Mellon (banking magnate) — hence the eponymous Andrew Project.

AFS is a scalable client-server distributed file system, like NFS and SMB, with a global namespace, location independence, client-side caching, callback notification to clients of file system content changes, and replicated access to read-only data. It looks like a local file system on an AFS client. An AFS cell entity is one or more AFS servers and their clients forming an administrative domain.
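The callback mechanism mentioned above is what lets AFS clients serve reads from local cache safely. A minimal sketch of the idea (illustrative only, not AuriStor or OpenAFS code): the server promises to notify a client when a cached file changes, and a write "breaks" those outstanding promises.

```python
# Sketch of the AFS callback idea: a server tracks which clients hold a
# cached copy of each file, and notifies them on change so the cache can
# be invalidated. Class and method names are illustrative assumptions.

class Server:
    def __init__(self):
        self.files = {}          # path -> content
        self.callbacks = {}      # path -> set of clients holding a callback

    def fetch(self, client, path):
        # Hand the client the data plus a callback promise.
        self.callbacks.setdefault(path, set()).add(client)
        return self.files.get(path)

    def store(self, path, content):
        # A write breaks outstanding callbacks on that file.
        self.files[path] = content
        for client in self.callbacks.pop(path, set()):
            client.invalidate(path)

class Client:
    def __init__(self, server):
        self.server = server
        self.cache = {}

    def read(self, path):
        if path not in self.cache:               # cache miss: go to the server
            self.cache[path] = self.server.fetch(self, path)
        return self.cache[path]                  # cache hit: no network traffic

    def invalidate(self, path):
        self.cache.pop(path, None)               # callback broken: drop the copy

server = Server()
server.files["/afs/example.org/doc.txt"] = "v1"
c = Client(server)
print(c.read("/afs/example.org/doc.txt"))        # fetched from server: v1
server.store("/afs/example.org/doc.txt", "v2")   # breaks c's callback
print(c.read("/afs/example.org/doc.txt"))        # re-fetched: v2
```

Until a callback is broken, every read is satisfied locally — which is how AFS achieves its high client:server ratios over WANs.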

OpenAFS is an open source implementation of AFS based on code made available by Pittsburgh-based, IBM-owned Transarc in 2000. Transarc was started up in 1989 by several Andrew Project members and IBM was an initial investor.

AuriStor

Jeffrey Altman.

AuriStor was founded in 2007 as Your File System, Inc., by CEO Jeffrey Altman. It is a small — you might even say minute — company, with just five employees listed on LinkedIn. However it says it has a distributed team with members in its offices in New York City, Cambridge MA, Edinburgh, Scotland, and Nova Scotia, Canada.

A briefing presentation slide showed just eight contributing developers since January 2019.

AuriStor’s aim was to accelerate AFS development by selling a licensed version of AFS, and thus fund its own engineering and support effort. Its AuriStorFS is claimed to be a better cross-platform offering than Microsoft’s Windows DFS, more reliable than Gluster, as performant as GPFS, and more cost effective than Panasas for general storage needs. AuriStor wants to sell AuriStorFS to large, medium and small enterprises, and even individuals with smartphone client software.

AuriStorFS is backwards compatible with AFS and OpenAFS. It has been certified for Red Hat Enterprise Linux 8.4, making it the only AFS-family software certified for use on any Enterprise Linux distribution. It is validated for Debian, Ubuntu, CentOS, and Oracle and AWS public cloud use. Client support includes Linux, macOS and Windows.

The AuriStorFS source code is available to licensed organisations wanting to participate in its development as part of a private community. 

Competition

Here is a competitive matrix from the AuriStor website:

It appears to be somewhat dated. It does not include, for example, Panasas, Spectrum Scale, DAOS or WekaIO.

Scalability and performance

An AuriStor cell can store up to 2^259 file streams of up to 16 exabytes — the maximum allowable distributed database size. OpenAFS tops out at 2GB — a tiny fraction of that. 

The number of Volume IDs per cell is 2^64 — much more than the 2^31 limit of OpenAFS. There can be 2^90 objects (directories or files) per volume compared to OpenAFS’s 2^26.

AuriStor’s timestamp granularity is 100ns, which compares to OpenAFS’s 1 second.

There’s no need to go on. AuriStorFS is ridiculously more scalable than OpenAFS.
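To put those exponents in perspective, a quick back-of-envelope calculation (the limits are as claimed above; the arithmetic is ours):

```python
# Claimed per-cell and per-volume limits, written as powers of two.
auristor = {"volume_ids": 2**64, "objects_per_volume": 2**90}
openafs  = {"volume_ids": 2**31, "objects_per_volume": 2**26}

for key in auristor:
    ratio = auristor[key] // openafs[key]
    print(f"{key}: AuriStorFS allows {ratio:,}x more than OpenAFS")

# Timestamp granularity: 100ns vs 1 second is a ten-million-fold difference.
print(1_000_000_000 // 100)   # nanoseconds per second / 100ns granularity
```

Even the smallest of these gaps — Volume IDs — is a factor of over eight billion.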

AuriStor says AuriStorFS is faster than OpenAFS and performance-competitive with Lustre, GPFS (Spectrum Scale), Panasas and NFS v4.

Customers

AFS, through IBM’s support, gained HPC and large enterprise users, such as Morgan Stanley, which in 2004 had more than 25,000 hosts in 50+ sites on six continents. Morgan Stanley said AFS could provide WAN file sharing better than NFS as it had a better client:server ratio of hundreds to one compared to NFS’s then 25:1.

At that time AFS was seen as dated, but Morgan Stanley’s investment in it made it impossible for the firm to migrate away.

Morgan Stanley view of AFS in 2004.

Goldman Sachs is another AFS user, and replaced its OpenAFS client with the AuriStorFS client in 2017, deploying it to 25,000 hosts — a number which has since grown. Its experience is discussed in a 2020 YouTube video by VP Core Engineering Tracy Di Marco.

Slide from Goldman Sachs’ VP Tracy Di Marco’s presentation.

Di Marco said AuriStorFS “allows us to provide hundreds of thousands of hosts that perform business functions to the firm, at close to local disk speed, without repeated, possibly expensive, unnecessary network usage, and with a standard small disk allocation, rather than more expensive larger disks.”

Slide from Goldman Sachs’s VP Tracy Di Marco’s presentation.

Other AuriStor customers include CERN, Intel, the IRS, Qualcomm, United and KLM airlines. We couldn’t find any HPC customers and think their number is low. WekaIO, for example, has never encountered AFS, OpenAFS or AuriStor in the HPC market.

Comment

This is a niche business. AuriStor is a small business exclusively focussed on its own products and supporting a relatively small number of extremely large customers. It is funded by license sales and not by venture capitalists, making it markedly different from competing file system suppliers such as WekaIO.

The main competition facing AuriStor in our view comes from the public clouds — AWS, Azure and Google — and applications such as CTERA, Egnyte, Nasuni and Panzura. Hammerspace, with its global filesystem namespace, is another competitor.

A main focus of AuriStor marketing is to encourage its adoption by OpenAFS users. A secondary focus, in our view, is to prevent defection by its user base to the public cloud. 

If it succeeds in moving down market from enterprise users with 20,000-plus servers then it will increasingly come into competition with CTERA, Egnyte, Nasuni and Panzura and may suffer from appearing more complex to install and manage than these suppliers’ products. At the same time it has more bells, whistles, levers and hooks for admin staff to use, secure and optimise it.

Pricing

The starting price for an AuriStor installation is $21,000/year for a perpetual use license with four database service and four file service instances. Additional file or database service instances within a cell start at $2,500 each and decrease to $1,000 each based upon quantity. Single server cells are licensed at a base price of $6,500. 
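A rough cost model from the published figures. Note the volume-discount curve between $2,500 and $1,000 per additional instance is not published, so the breakpoint below is an illustrative assumption, not AuriStor's price list:

```python
# Back-of-envelope AuriStorFS cost sketch. BASE_CELL covers four database
# and four file service instances; the tiering breakpoint (10 extras) is a
# made-up assumption for illustration only.
BASE_CELL = 21_000
SINGLE_SERVER_CELL = 6_500

def extra_instance_price(nth_extra):
    # Hypothetical tiering: $2,500 each up to ten extras, $1,000 beyond.
    return 2_500 if nth_extra <= 10 else 1_000

def cell_cost(extra_instances=0):
    cost = BASE_CELL
    for i in range(1, extra_instances + 1):
        cost += extra_instance_price(i)
    return cost

print(cell_cost())      # base configuration: 21000
print(cell_cost(4))     # base plus four extra service instances
```

Under these assumed breakpoints, a cell with twelve extra instances would cost $48,000 — the real quantity discounts may differ.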

Fact Sheet

Get a 24-page downloadable AuriStor fact sheet as a PDF from here.

See Tracy Di Marco’s “Leveraging AFS Storage Systems to Ease Global Software Deployment” presentation here and the slides here.

Storage news ticker – November 23

CDS, which supplies datacentre component monitoring services, announced its Raytrix MVS Insight offering with monitoring of a broad array of server, network, and storage environments and integration with leading ITSM providers. Raytrix MVS Insight supports thousands of devices and systems from all the major OEMs including HPE, Dell EMC, Hitachi, IBM, Fujitsu, Juniper, Cisco and Brocade. The product provides comprehensive monitoring at enterprise scale, from single devices to thousands of systems. It features a holistic overview of monitored environments across systems and locations, a broad array of notification methods from SNMP to Slack, alert correlation and predictive utilisation normalisation, and flexible alert filtering and scheduling to avoid network congestion.

Startup HighTouch, which is developing software to enable companies to get their data out of data warehouses and into an application — a reverse to the normal ETL process — has raised $40 million in a B-round of funding. HighTouch software copies data from the data warehouse tables into a SaaS application’s tables, such as Salesforce. Data analysis is done inside the SaaS app and not the data warehouse. HighTouch integrates with BigQuery, Snowflake, and Databricks, and enterprise applications such as Anaplan, HubSpot, Mixpanel, Salesforce, Stripe, and Jira. Policies can be set up to define sync frequencies to keep the destination app up to date with data changes in the source warehouses. 
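The reverse-ETL pattern HighTouch sells can be sketched in a few lines: rows flow from warehouse tables into a SaaS app's records rather than the other way round. Names and the sync policy below are illustrative; this is not HighTouch's API.

```python
# Minimal reverse-ETL sketch: upsert warehouse rows into a SaaS app's
# record store, keyed on a business identifier. All names are made up.
warehouse = {
    "customers": [
        {"email": "a@example.com", "ltv": 1200},
        {"email": "b@example.com", "ltv": 450},
    ]
}
saas_app = {}   # stands in for, say, Salesforce contact records

def sync(table, key):
    """Copy warehouse rows into the SaaS app, merging on `key`."""
    for row in warehouse[table]:
        saas_app.setdefault(row[key], {}).update(row)

sync("customers", "email")
print(saas_app["a@example.com"]["ltv"])   # 1200
```

Running `sync` on a schedule — as the article describes — keeps the destination app current with changes in the source warehouse.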

SaaS data protector HYCU introduced a Global PACE (Partners Accelerating Cloud Environments) Program. It eliminates tiers of engagement making it easier for partners to sell HYCU services. There are separate tracks for Managed Service Providers (MSPs), Cloud Service Providers (CSPs) and Managed Security Service Providers (MSSPs) plus enhancements for reseller and distributor partners. The program features a zero conflict promise, margin assurance, minimum advertised price, and seed and premier tiers.

JetStream Software has announced general availability of JetStream disaster recovery for Azure VMware Solution (AVS), offered through the Microsoft Azure Marketplace and sold by Microsoft sales teams. Microsoft and Jetstream say it’s the first cloud-native, disaster recovery as a service (DRaaS) for VMware-based clouds and on-premises VMware environments. An integration with NetApp enables Azure NetApp Files (ANF) to support storage expansion of larger data sets independent of compute, enabling a much faster recovery time and near-zero RTO — at a lower cost. The offering combines Azure Blob Storage, JetStream DR (VMware-certified), Azure VMware Solution, and ANF storage expansion. Microsoft and JetStream say it will radically transform the economics of disaster recovery for enterprises.

Andy Langsam.

Cloud storage provider Wasabi has opened a London, UK, region to expand the availability and speed of services throughout the UK. It is located in an Equinix datacentre and is Wasabi’s second European region. The first opened in Amsterdam in 2019. The London centre has high-speed network access from multiple carriers, making it easy for UK customers to connect, plus power and space expansion capabilities. Wasabi has plans to open additional storage regions in 2022. 

Veeam appointed Andy Langsam as GM Kasten Kubernetes Business Unit and SVP of Public Cloud for Veeam in August. He was Veeam’s SVP Public Cloud & Enterprise. He is an ex-SVP Sales at SolarWinds and COO of N2SW, acquired by Veeam in January 2018. As GM of the Kasten business unit he took over from prior GM, Niraj Tolia, the CEO and co-founder of Kasten, which was acquired by Veeam in September 2020. Tolia is now SVP for strategy at the Kasten Kubernetes BU.

Storage news ticker – November 22

Datto management believes it is currently the largest pure-play backup software supplier to the MSP market and growing faster than competitors ConnectWise, Kaseya, and N-able. (Thanks to Jason Ader of William Blair.) It thinks it is well positioned in the long term to capitalise on secular trends toward outsourced IT via the roughly $130 billion managed service provider (MSP) channel. Datto counts more than 18,200 MSP customers (out of an estimated 125,000 MSP providers worldwide), which creates abundant opportunity for deeper penetration. Management is looking to substantially expand its security capabilities over time to fulfil its stated mission of securing all digital assets for its MSP/SMB customers.

Quest Software announced GA of SharePlex v0.1.2 which can replicate Oracle databases in real time to MySQL and PostgreSQL. This can be useful when creating mobile or API-based applications, supported by PostgreSQL or MySQL databases, that require data from a legacy Oracle system. SharePlex for PostgreSQL and MySQL supports AWS (RDS and Aurora) and Azure (Azure Database for PostgreSQL and Azure Database for MySQL). It also supports replication to Oracle data warehouses, SQL Server data warehouses, PostgreSQL data warehouses, Kafka, which can then feed other systems, and Event Hubs, which can then feed other systems in the Azure ecosystem.

A new release of Raidix’s Era software RAID, version 3.4, provides improved automatic error correcting in case of write hole (inconsistent checksum or data), file drive error monitoring, and support for multi-path NVMe drives.

Semiconductor technology developer Rambus has posted a blog about CXL v2.0. It states: “Server architecture — which has remained largely unaltered for decades — is now taking a revolutionary step forward to address the yottabytes of data generated by AI/ML applications. Specifically, the datacentre is shifting from a model where each server has dedicated processing and memory — as well as networking devices and accelerators — to a disaggregated ‘pooling’ paradigm that intelligently matches resources and workloads.” CXL 2.0 supports memory pooling using persistent memory and internal-to-the-host DRAM. 

“By moving to a CXL 2.0 direct-connect architecture, datacentres can achieve the performance benefits of main memory expansion — and the efficiency and total cost of ownership (TCO) benefits of pooled memory.”

SK hynix’s desire to upgrade its DRAM manufacturing operation in Wuxi, China, by using Extreme Ultraviolet (EUV) lithography equipment to draw finer lines on wafers and thus make denser chips, has had doubt raised over it, according to Reuters. The export of EUV gear to China could fall foul of US State Department rules preventing high tech exports to China.

The Taipei Times reports that China’s Alibaba group will lead a consortium offering ¥50 billion ($7.8 billion) to take over the bankrupt Tsinghua Unigroup which makes semiconductor products. There are several Chinese state-backed bids to take over Tsinghua Unigroup, which is seen as important to China’s desire to be self-sufficient in making semiconductor products. It owns, for example, Yangtze Memory Technology Corp.

Yangtze Memory Technology Corp. 128-layer 3D NAND Chips.

DigiTimes reported Chinese NAND fabber YMTC has improved the yield rate for its 128-layer 3D NAND. Output could reach 100,000 wafers per month in the first half of 2022. That could/would mean lower NAND chip sales in China for Kioxia, Micron, Samsung, SK hynix and Western Digital.

Peter Pans – teenage startups with no filed IPO plans

Peter Pan (Wikipedia, public domain).

What is a startup? What do you call a startup that refuses to grow up and have an IPO — a Peter Pan?

Is Nebulon, a four-year-old company that has raised $14.8 million in a single VC round, a startup? Obviously, yes.

Is VAST Data, a six-year-old company that has raised $263 million in four funding events, a startup? We’ll say yes.

Is Fungible, a seven-year-old company that has raised $310.9 million in three rounds, a startup? We’ll say yes, again.

Is Databricks, a nine-year-old VC-funded company that has raised $3.6 billion in eight funding events, a startup? The VCs clearly expect a great exit and don’t see this company as a never-to-grow-up Peter Pan.

What of Cloudian, an eleven-year-old company that has raised $173 million in six funding events? Still a startup? Well, it’s not so clear, and we’d probably say not.

And how about Delphix, a fourteen-year-old company that has raised $124 million in four funding events? Do we still call it a startup? Fourteen years is a long time to be in startup mode.

Are fifteen-year-old Egnyte and Scale Computing, with $137.5 million and $104 million raised respectively, startups? The founders are still in control. Both say they are growing. Both are VC-funded. And yet to call them startups just seems wrong — their starting-up phase is surely over.

The classic startup in the storage space is a venture capital-funded enterprise growing at a faster-than-market rate through technology development and demo, prototype product build, first product ship, to general availability and market acceptance, followed by an IPO or an acquisition.

Startup-to-IPO journey

Let’s look at three classic startup-to-IPO stories.

All-flash array supplier Pure Storage raised $470 million over six rounds and took seven years from founding to IPO. Six years later it is still going for growth over profitability and making losses.

Hyperconverged Infrastructure (HCI) vendor Nutanix took eight years and five funding events, raising $312.6 million, to reach its IPO. It’s still loss-making five years after IPO as it keeps on prioritising growth over profitability.

It took Snowflake nine years from founding to run its IPO. That effort needed seven funding rounds and $1.4 billion in raised capital.

The overall picture is eight years from startup to IPO. This is startup success. This, or being acquired — especially being acquired in a bidding war, as Data Domain was. Acquisitions like this are successful but other acquisitions can be made at a discount to raised capital because the acquiree had not reached escape velocity and was a semi-successful (Actifio) or semi-failing (Datrium) company.

Escape Velocity

We can view venture capital as the rocket fuel needed to enable a startup to reach escape velocity and become a self-sustaining — meaning ultimately profitable — business.

Some startups clearly fail to reach escape velocity, such as Coho Data, which closed down six years after being founded and raising $65 million. Ditto Data Gravity, which closed down in 2018 six years after startup with $92 million raised.

A semi-success or semi-failure was Datrium, which was bought by VMware nine years after its founding with $165 million raised. Tegile was another similar company, bought by Western Digital with $178 million raised nine years after being founded.

Other startups fail or semi-fail and get bought by private equity, which aims to extract value from the ruins and then sell it off. Thus PE-owned StorCentric bought seven-year-old Vexata, into which VC backers had ploughed $54 million.

Veeam was bought for $5 billion by private equity, but this was not a failure — it was a signal of its success, and an outlier.

And what of the rest? Older, VC-funded startups, such as Cloudian, Databricks and Delphix, that have not reached escape velocity?

The in-betweeners

Such companies are no longer operating in startup mode, surely? You can’t sustain that sense of developing completely new technology to change the world for nine, eleven or fourteen years. The backing VCs might still hope for a profitable exit — they surely do with Databricks. Yet these post-puberty companies, if you will forgive the expression, these near-teen and teenage companies, are in-between the startup phase and whatever their exit phase from VC funding will be.

HPEer Neil Fleming, tweeting personally, wrote: “Can we talk about what defines a startup? Databricks is eight years old, has 2K employees and turnover of $425M (according to Wikipedia). What’s the threshold of when something stops being a startup and starts being a business?”

Storage consultant Chris Evans identifies Peter Pan companies as the ones that never grow up. We listed pre-IPO, pre-acquisition, post-startup storage companies that could be classed as “in-betweeners” in a table, showing their age, number of funding events, and total raised:

This is far from an exhaustive list.

They are all nine or more years old, with Everspin topping this characteristic at 19 years of age. The in-between stage can clearly be a long march. 

Pivots

Some of them are marked as having pivoted — meaning that they changed the marketing direction of their technology. Thus Kaminario started out as an all-flash array hardware vendor, pivoted to being an all-flash array software vendor and then pivoted again, with a rebrand to Silk, and focussed its software technology on accelerating databases in the public cloud. It is fourteen years old and still has its founding CEO in charge. He’s managed to raise $313 million in nine rounds and the VCs are still, it appears, hoping for a successful exit. Similarly, GridStore became HyperGrid, which became CloudSphere. Reduxio has become Ionir, and Primary Data evolved into Hammerspace. These companies may prefer to say that their new enterprises are fresh startups, but we can see a technology trail and a founding team link between the old and new companies.

The oldest in-betweener we have in the table is Everspin. ExaGrid is 18 years old and still hoping for an IPO. HCI vendor Pivot3 was bought by Quantum 19 years after it started up, passing through eleven funding events which raised $247 million. That was not a successful exit.

Nantero, developing carbon nanotube memory, is 20 years old, with $120 million funding accumulated through eight funding events.

One of the longest-lived in-betweeners is HPC storage vendor Panasas, at 21 years, and $155 million raised in, we think, eight rounds. It may well be profitable. We should note that DataCore, although older, at 23 years old, is an outlier. It had not been a VC-backed startup for many years until it took in $30 million in 2008. The company has been profitable for ten years and very recently bought MayaData.

It is ridiculous, we would assert, to view DataCore, Everspin, Nantero and Panasas as startups. 

We think a better term is in-betweener, until a better one comes along at any rate. We won’t refer to any company more than nine years old – the Snowflake startup-to-IPO period – as a startup any more.

Panzura update brings array of power-user features

The second generation of Panzura’s CloudFS SaaS filesystem software, also known as Data Services, introduces two-tier licensing with complementary basic and per-user tiers, functional licensing, high-availability nodes and search across multi-vendor file storage systems.

Panzura says its global cloud file system acts like an enterprise NAS, and looks like a standard Windows file share to applications and users. The software has automated, distributed file locking and real-time data consistency for every location in a global cloud network. It is not using capacity-based licensing for its CloudFS software and has made it more enterprise-capable by adding support for highly available nodes, bringing in file scanning on non-Panzura systems, audit data retention improvements, plus a preview of automated CloudFS node configuration and management.

Edward Peters.

Panzura’s chief innovation officer, Edward Peters, said: “Now [users] can go beyond an exclusively storage-based approach and separately license access for value-added functions such as enhanced audit and search capabilities.”

The configuration and management services include faster connection status detection of CloudFS nodes, new inventory management tables, a Dataset View of a customer’s data sources for a CloudFS deployment, and a Node View feature to access an inventory of all CloudFS nodes.

Basic tier

The Basic tier allows file access and provides:

  • Inventory of CloudFS components;
  • Summary of component operational status;
  • Monitoring and reports on metrics for the CloudFS network;
  • Configurable alerts if storage, system and cloud thresholds are exceeded — quotas can be set — and may require attention;
  • List of plug-ins for nodes registered on Data Services.

Disaster recovery capabilities enable the instantaneous rollback of files to a previous, granular state prior to accidental deletion or file corruption by ransomware or other malware exploits, with no data loss. 

Licensed tier and functions

This is a tier of licensable services with separate licensing for Search, Audit and Audit Retention.

The Search license provides up-to-the-second results and insight into the entire global file system including recovery services, quota services and data analytics.

This global function enables search and data analysis across all connected file systems, including Panzura and other compatible SMB and NFS file shares, for files, directories and snapshots. Data Services ingests the metadata of files and directories from connected file shares, including CloudFS, NetApp, and Dell EMC Isilon according to a periodic schedule, and indexes them into a searchable database.
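The ingest-and-index pattern described above can be sketched simply: pull file metadata from each connected share on a schedule, tokenise it, and build an inverted index to answer searches. Field names and the index shape below are assumptions, not Panzura's schema.

```python
# Toy inverted index over file paths ingested from connected shares.
# The share names and paths are fictional; this illustrates the pattern,
# not Panzura Data Services itself.
from collections import defaultdict

index = defaultdict(set)     # token -> set of file paths

def ingest(listing):
    """Index file metadata pulled from one connected share."""
    for path in listing:
        for token in path.lower().replace("/", " ").split():
            index[token].add(path)

def search(term):
    return sorted(index.get(term.lower(), set()))

# One scheduled pass over two (fictional) connected shares:
ingest(["/cloudfs/reports/q3.xlsx"])
ingest(["/isilon/media/q3_promo.mov"])
print(search("reports"))    # ['/cloudfs/reports/q3.xlsx']
```

Because only metadata is ingested, the index stays small relative to the file data itself, which is what makes periodic re-scans of large multi-vendor estates practical.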

The Audit license allows users to track and audit file opens, changes and access, find renamed or deleted files and recover them with a single click. 

Audit functionality is provided by ingesting file logs in near real-time, providing a one-click view into user actions taken on files such as read, write, delete, move or update permissions.

The Audit Retention license provides for 180 days of data-log retention — double the prior amount. Multiple licenses can be purchased to cover longer data-retention periods.

Comment

In September, Panzura said it had been officially refounded, almost a year and a half since it was bought by private equity and given a new CEO. Refounding means, we think, that a founders-type mindset has been reimposed on the company to get it growing again and to get fresh employee commitment. The website was given a new look and the company gave itself a fresh new logo as well to mark a break from the past. Behold:

New Panzura logo.

Let’s see if Panzura gets growing again and becomes a file collaboration and sharing company to be reckoned with. Download Panzura white papers from here to get more details of its software’s capabilities.

Storage news ticker – November 19

At SC’21 Dell EMC announced new and updated Dell Technologies Validated Designs available for Siemens Simcenter Star CCM+. They provide multi-physics CFD simulation on the latest Dell EMC PowerEdge servers with PowerSwitch and InfiniBand networking, and options for HPC storage. A new Dell EMC PowerSwitch S5448F-ON top-of-rack switch has higher density with 100/400Gbit/sec speeds, and simplifies complex network design, deployment and maintenance when running the Dell EMC SmartFabric OS10 with integrated SmartFabric Services, or CloudIQ for cloud-based monitoring, machine learning and predictive analytics.

MemVerge (in-memory computing with virtualised memory) has a SpotOn Early Access Program. It asks: “Do you have non-fault-tolerant and/or long-running workloads, and would like to run them on much lower cost AWS Spot Instances? Then join our SpotOn early access program to mitigate the risk of terminations which can save you a ton of heartache and money.”

Micron has shipped the first batch of samples of LPDDR5X memory built on its first-to-market 1α (1-alpha) node. It’s designed for high-end and flagship smartphones, powered by artificial intelligence (AI) and for 5G.  Micron has validated samples supporting data rates up to 7.5Gbit/sec, with samples supporting data rates up to 8.533Gbit/sec to follow. Peak LPDDR5X speeds of 8.533Gbit/sec deliver up to 33 per cent faster performance than previous-generation LPDDR5. 

AKQUINET, MT-C and the Fenix consortium have teamed-up to enhance the multifunctional Data Mover NODEUM to provide a data mover service within the Fenix infrastructure. This service will allow users to seamlessly manage data stored in high-performance parallel file systems as well as on-premise cloud data repositories. There are different Fenix locations, including the European supercomputing centres JSC (Germany), BSC (Spain), CINECA (Italy), CEA (France) and CSCS (Switzerland). Federated cloud object stores are used at these locations, which enable researchers to exchange their data. With the Data Mover, researchers can copy their data locally to the existing parallel file systems in order to process them on the fast HPC systems. They can also copy data to cloud object stores to make them accessible to other researchers globally.

… 

OWC announced the OWC miniStack STX, the world’s first Thunderbolt 4-certified storage and hub expansion box that stacks with the Mac mini. It is also a Plug and Play expansion companion for Thunderbolt or USB-equipped Macs, PCs, iPads, Chromebooks, and Android tablets. It has three Thunderbolt 4 (USB-C compatible) ports, a universal HDD/SSD bay and an NVMe M.2 SSD slot to provide storage capacity expansion and can be combined in a RAID 1 configuration. There is up to 770MB/sec of storage performance, and the OWC miniStack STX is good for bandwidth-intensive video editing, photography, audio, virtual machines, and everyday data backup and access tasks.

OWC miniStack STX.

NAND Flash controller maker Phison announced its consolidated revenue in October 2021 ($216M) was the highest single month in the company’s history, and cumulative revenue for the first ten months of 2021 was also a record high for any ten-month period ($1.848B). It has increased its headcount to more than 3000 globally, including an expanded Enterprise SSD Engineering Lab in Colorado. Phison recently signed a ten-year green power purchase agreement to minimize its environmental impact.

PoINT Data Replicator 2.0 has been announced with S3 Bucket Notification. Version 2.0 is based on GUI independent services. Several replication jobs can be run in parallel and independently. Jobs can be triggered manually; alternatively, execution is controlled by a newly integrated scheduler. With the automated S3 bucket notification PoINT Data Replicator can be used to execute backups of object storage systems. With this function, only new objects are identified and replicated, so that the replication process is much more efficient. The regular jobs then perform a full data synchronization for control purposes. 
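The two modes described — notification-driven incremental copies plus a scheduled full synchronisation as a safety net — can be sketched as follows. This is an illustration of the pattern, not PoINT's implementation, and the object keys are made up.

```python
# Sketch of notification-driven incremental replication with a periodic
# full-sync reconciliation pass. Buckets are modelled as plain dicts.
source, target = {}, {}

def on_notification(key):
    """S3-style bucket notification: replicate just the new object."""
    if key not in target:
        target[key] = source[key]

def full_sync():
    """Scheduled job: reconcile anything the notifications missed."""
    missing = [k for k in source if k not in target]
    for k in missing:
        target[k] = source[k]
    return missing

source["backups/obj1"] = b"data1"
on_notification("backups/obj1")      # incremental: one object copied
source["backups/obj2"] = b"data2"    # suppose this notification was lost
print(full_sync())                   # the full pass catches backups/obj2
```

The efficiency win is in the common case: notifications copy only new objects, while the full pass runs less often and exists for control purposes.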

… 

Fog busters! Data and storage category clarification

My mental data and storage classification landscape has been getting confused and briefings with suppliers made it obvious a refresh was needed.

Going back to the basic block-file-object data types I had a landscape clear-out and then tidied everything up — notably in the mission-critical-means-more-than-block area. See if you agree with all this.

Update. Suppliers and storage categories diagram updated to remove VAST Data from block category and add FlashArray//C. 25 Oct 2020.

Block storage is for structured — meaning relational or traditional database — data, and is regarded as tier 1, primary data, and also as mission-critical data. If the ERP database stops then the business stops. Typically such data will be stored in all-flash arrays attached to mainframes — big iron — or powerful x86 servers.

Illustrating this diagrammatically started out being simple, on the left, but then became more complex as we moved rightwards:

File and object data is regarded as unstructured data, also often as secondary data — which it may be, but that is not the whole story. Because file data certainly, and object data also, can be mission-critical. It can be primary data and will be put in all-flash arrays as well.

It’s different from block-based primary data where the applications are all database-centric.

Primary file and object data storage tends to be used in complex application workflows where applications work in sequence to process the data towards a desired end result. Think entertainment and media with a sequence of steps from video capture through special effects stages with layers of effects built up, through editing to production. The same set of files is being used, added to, and extended in a concurrent, real-time fashion needing low latency and high throughput.

The same is true in a hospital medical imaging system, though with fewer stages. There is image capture which can involve many and large scan files, analysis and then comparison with potentially millions of other files to find matches, and follow treatment and outcomes trails to find the best remedial procedure.

This needs to be done in real time, while the consultant doctor is with the patient and reviewing the captured image scans.

These both seem similar to a high-performance computing (HPC) application involving potentially millions of files with low-latency and parallel, high-throughput access by dozens of compute cores running complex software to simulate and calculate results based on the data.

We might say that primary file and object storage is needed by mission-critical applications (at the user level certainly and organisation level possibly) which have enterprise HPC characteristics. A separate use case is for fast restore of backup data.

Secondary and tertiary data

Secondary data is different. It is unstructured, file and object-based, not block, and it is much less performance-centric, both in a latency and throughput sense. The need is more for capacity than performance. This does not, therefore, need the speed of all-flash storage and we see hybrid flash/disk or all-disk arrays being used.

Tertiary data is the realm of tape, of cold storage, of low access rate archives. These can be either on-premises in tape libraries or in the public cloud, hidden behind service abstractions such as Glacier and Glacier Deep Archive.

Suppliers

Having laid out this data and storage landscape we can match some suppliers’ products to it and see how valid the ideas are.

Let’s start with a straightforward matching exercise: Pure Storage. Its FlashArrays are for storing and accessing primary block data. Its FlashBlade products are for storing and accessing primary unstructured data and for the fast restore of backup data. 

Block area extended to overlap File and Object area to reflect Pure’s FlashArray//C product.

Infinidat is another straightforward case. Its arrays store block data, hence primary data and, although predominantly disk-based, have effective DRAM caching and caching software to provide faster-than-all-flash performance. Recently it has launched an all-flash system with the same DRAM caching.

VAST Data supplies a one-trick hardware pony — this is meant in a positive way — an all-flash system with persistent memory used to hold metadata, and compressed and deduplicated data stored in QLC flash. This so-called Universal Storage is presented as a single store for all use-cases except primary block data.

NetApp supplies all-flash FAS for primary data, both file and block, hybrid FAS systems for secondary unstructured file data, and StorageGRID systems for object data, both primary (all-flash StorageGRID) and secondary. 

A look at Dell’s products suggests that PowerMax is for primary block data, PowerStore for primary block and file data and secondary file data, and PowerScale for primary and secondary unstructured file data. ECS is for object data.

We can position Qumulo too. It’s for primary and secondary file data, with WekaIO targeting primary file data with its high-performance file software.

The two main take-aways from this exercise are that, first, unstructured data refers to both primary and secondary data storage, and second, that primary unstructured data is being used in enterprise HPC applications across a variety of vertical markets.

Analyse this: storage startup funding in 2021

$5.74 billion was invested in 36 storage startups this year, with the vast majority of it – $4.55 billion – going into data lake/warehouse and analytics companies. 

We charted the storage-related startup funding we covered during the year and saw that Databricks easily topped the rankings with its extraordinary $2.6 billion raised – nearly half of the total – distantly followed by Fivetran with $565 million, just under ten per cent of the total. Here’s the chart:

Blocks and Files data.

We subdivided the recipients into categories: analytics and warehouses, data protection, security, cloud storage, hardware-led, storage software and Kubernetes-related startups. Recognising that some startups were in two of these categories, we then totted up the amount raised in each category:

  • Analytics and warehouse – $4.55 billion (79.3 per cent);
  • Data protection – $1.0 billion (17.4 per cent);
  • Security – $272.4 million (4.7 per cent);
  • Cloud storage – $418.75 million (7.3 per cent);
  • Hardware-led – $418.75 million (7.3 per cent);
  • Storage software – $137.0 million (2.4 per cent);
  • Kubernetes-related – $48.0 million (0.8 per cent).

The security total is heavily affected by Acronis’s $250 million raise. It was a mere $72.4 million without that. Pure-play storage security startups are not popular. Hardware-led storage startups raised $418.75 million – the same as the cloud storage startups. 

The non-Kubernetes storage software startups raised $137 million, while the two Kubernetes-related startups raised $48 million – less than one per cent of the total. Storage software as a whole accounted for $185 million. Software is not yet eating the storage hardware startup market.
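The category shares quoted above can be reproduced with a few lines of arithmetic against the $5.74 billion total. Note that the shares sum past 100 per cent because, as mentioned, some startups sit in two categories:

```python
# Category totals in $ millions, as reported in the article.
totals = {
    "Analytics and warehouse": 4550.0,
    "Data protection": 1000.0,
    "Security": 272.4,
    "Cloud storage": 418.75,
    "Hardware-led": 418.75,
    "Storage software": 137.0,
    "Kubernetes-related": 48.0,
}
overall = 5740.0  # $5.74 billion invested across 36 startups

for name, amount in totals.items():
    share = round(amount / overall * 100, 1)
    print(f"{name}: ${amount}M ({share}%)")
```

Running this reproduces the percentages in the bullet list, e.g. 79.3 per cent for analytics and warehouse and 0.8 per cent for the Kubernetes-related pair.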

Storage startup funding categories.

Next, we counted the startups in each category to see how popular they were numerically:

  • Analytics and warehouse – 16 
  • Data protection – 5
  • Security – 2
  • Cloud storage – 6
  • Hardware-led – 9
  • Storage software – 5
  • Kubernetes-related – 2

Hardware-led storage startups often have software involvement too – especially array companies like VAST Data and OpenDrives. We were surprised that there were more hardware startup funding rounds than data protection rounds, although the data protectors raised more than twice as much money.

So what does it all mean?

Our conclusion is the obvious one: a massive amount of VC funding is flowing into the analytics/data lake/data warehouse space, with data protection the next most popular category. The fact that cloud storage is getting more funding than other (on-premises) storage software companies is not unexpected – if we take out the VAST Data and OpenDrives hardware-led storage software players we are left with a derisory $34 million.

The low level of funding – $48 million – going into just two Kubernetes-storage related startups was surprising. We think that the mainstream suppliers have adopted Kubernetes quickly and effectively, and that open source development is cheap. In effect, we could say that the Kubernetes storage startup wave is over. Finished.

Essentially the bulk of storage startup funding this year has gone up-stack, away from the hardware and software boxes and into data-handling software. That is, software to protect the data, and – the big bucks golden goose area – software to store, access and manage large amounts of data for analytics.

Storage news ticker – November 18

Storage news

Acronis announced that 2021 was the best year in its history. Ronan McCurtin, VP sales Europe, Israel and Turkey, said: “This has been a record-setting year in revenue growth and partner recruitment.” Over 20,000 service providers are using Acronis Cyber Protect to protect more than 750,000 customers. As of the end of the third quarter in September, Acronis in Europe grew by 118 per cent in the number of workloads protected, and by 35 per cent in the number of customers protected. To date, Acronis has added 408 new employees and expects to close the year at over 450 new positions globally. It has 38 datacentres across the globe, of which 18 are in Europe. Acronis has a roadmap with more products and updates to come in 2022.

Startup Alluxio has pulled in a $50 million C-round of funding from a16z, Seven Seas Partners, and Volcanics Venture. It will spend the cash on opening an office in Beijing, product development, go-to-market, and engineering operations. Alluxio’s open source software virtualizes data stores and enables big data analytics apps to access multiple and disparate data stores in a single way instead of using unique access routes to every storage silo.

Druva has rebranded its product set, calling it a Data Resiliency Cloud, a multi-tenant, cloud-native SaaS offering based on AWS, with on-demand scaling — not that its announcement admitted it was a rebrand. It claims this offers support for all data and applications with globally distributed, cross-cloud recovery capabilities. Druva says it has automatic ransomware protection, scalable orchestrated recovery, and complete operationalisation with pre-built integrations with security information and event management (SIEM) and security orchestration, automation and response (SOAR) tools. The Data Resiliency Cloud can be used for multiple use cases including backup, disaster recovery, archiving, e-discovery, data cloning, and analytics.

FileShadow announced that it now archives email and attachments from multiple email accounts into its cloud file vault. In addition to connecting Box, Dropbox, Google Drive, local hard drives, and more, FileShadow collects email messages and attachments from Microsoft Exchange, Office 365, Gmail, iCloud Mail, Yahoo! Mail, and generic IMAP servers into its cloud file archive. Once in the FileShadow vault, a machine learning process analyses the content to create metadata tags. Messages and attachments can be organised, searched and published, along with other files from disparate storage locations.

HYCU said analyst house DCIG has named HYCU for VMware a Top Five Midsize Enterprise VMware vSphere Backup Solution. HYCU boasted that DCIG noted its ability to discover, identify and set up protection for the applications each VM hosts, its focus on fast backup implementation, and its elimination of the need and cost of a physical backup server, among other attributes. Thirty suppliers were considered by DCIG and the top five were Acronis Cyber Protect, Arcserve Unified Data Protection (UDP), HYCU for VMware, Quest NetVault Plus, and Unitrends Unified BDCDR.

Analytics database supplier Imply has appointed DevOps expert Eric Tschetter as its field chief technology officer (CTO). This reunites the four original creators of the open source, real-time analytics database Apache Druid: Tschetter, CEO and co-founder Fangjin ‘FJ’ Yang, co-founder and CPO Vadim Ogievetsky, and co-founder and CTO Gian Merlino. Tschetter will be based in Imply’s Tokyo office, report to Yang, and be instrumental in driving Imply’s newly announced “Project Shapeshift” initiative forward.

Intevac, a supplier of thin-film processing systems, today announced the receipt of $10 million in new HDD orders. These orders consist of one 200 Lean system as well as technology upgrades, which will add to the company’s backlog at year-end 2021. The technology upgrades are expected to be installed through mid-2022 and the 200 Lean system is expected to be delivered in the second half of 2022.

Kioxia America has joined the STAC Benchmark Council. 

NAKIVO has announced IT Monitoring for VMware vSphere and has improved backup protection against ransomware in a v10.5 Backup & Replication release. IT admins can monitor the memory, CPU and disk usage of their VMware vSphere infrastructure from the NAKIVO Backup & Replication unified web interface. This is useful for real-time diagnostics and strategic planning. They can project future changes in resource consumption to avoid bottlenecks, from the same interface they use to manage their backups.

Nutanix announced that Anja Hamilton will join as chief people officer, effective January 4, 2022. She will help continue, we’re told, to shape a growth-minded, aligned workforce with a focus on employee wellness and diversity and inclusion. She will be responsible for the company’s global people strategy and operations. 

Privacera announced it has joined Snowflake’s Data Governance Accelerated Program to help customers enable secure data sharing and collaboration. Joint customers will be able to manage their cloud data and enforce data access policies with the help of Privacera’s security platform and unified data access governance offerings.

Scality announced its ARTESCA object storage solution is qualified as Veeam Ready Object with Immutability. With this qualification, both the ARTESCA and RING products are now ransomware protection-enabled.

Boom! DataCore arrives in Kubernetes storage space by buying MayaData

Chichen Itza El Castillo - by Daniel Schwen

Block, file and object storage software supplier DataCore has bought MayaData and its Kubernetes storage business for an undisclosed sum. DataCore’s Caringo object storage software has notched up two seven-figure deals with a prominent west coast US technology supplier and a US government department.

MayaData develops and sells MayaStor, its OpenEBS-based, hyper-converged, container-attached block storage software, into the Kubernetes market. It comes with a so-called Maya orchestrator. MayaData also offers Kubera, a Kubernetes management software layer that includes OpenEBS and physical storage layer management.

Dave Zabrowski.

Dave Zabrowski, CEO of DataCore, said: “With this acquisition, DataCore is proud to remain the independent software-defined storage vendor with the broadest product offering spanning block, file, object, HCI, and now container-native storage — and the deepest IP portfolio.”

He wants us to know that: “We are committed to investing in OpenEBS as an open source technology, and expanding the community of users, developers, and contributors around it, while providing a streamlined path to leveraging container storage fast, easily, and affordably. You’ll be hearing from us soon with additional solutions for this space.”

MayaData

MayaData was founded as CloudByte in 2011, relaunched as MayaData in 2018 when Evan Powell became its CEO, and has taken in $32 million in funding. In March this year Intel said: “OpenEBS MayaStor is the fastest open source storage for Kubernetes.” A CNCF 2020 Survey cited OpenEBS as the number one cloud-native storage software used in production. 

MayaData’s entire San Jose, CA-based team becomes part of DataCore, which will support and invest in OpenEBS and the community around it, and accelerate product development and go-to-market activities. DataCore’s channel can now resell the MayaData software.

The MayaData acquisition deepens DataCore’s involvement with Kubernetes — its SANsymphony product already has a CSI plug-in. 

Nick Connolly, technical lead for the CNCF Technical Advisory Group on Storage and DataCore’s chief scientist, said: “MayaData inside DataCore will have the technical resources to reach a wider community of enterprises, faster, while meeting the strict technical requirements of enterprise applications.”

Kiran Mova, architect and maintainer of OpenEBS, also co-founder and chief architect at MayaData, said: “OpenEBS has become the de-facto standard for enterprise workloads on Kubernetes. OpenEBS is seeing millions of pulls every month, and every survey shows it has a good lead on other storage technologies, which reflects the technical superiority of its architecture.”

Timeline

  • April 2008 — Venture Capital firm Insight Venture Partners and Updata Partners invest $30 million in DataCore.
  • April 2018 — Dave Zabrowski becomes DataCore CEO as prior CEO George Teixeira becomes chairman.
  • February 2020 — $26 million invested in MayaData by AME Cloud Ventures, DataCore and Insight Partners. DataCore gets minority share of MayaData, access to its technology and becomes a MayaData reseller.
  • January 2021 — DataCore buys Caringo and its Swarm object storage technology.
  • September 2021 — Don Williams becomes MayaData CEO with ex-CEO Evan Powell staying on the board.
  • November 2021 — DataCore buys MayaData.

In a briefing, Zabrowski told Blocks & Files that Pure Storage’s Portworx was less open than the community-focussed OpenEBS and that Pure Storage was hardware-led, unlike DataCore.

Dave Zabrowski

DataCore CEO Dave Zabrowski was the co-founder and CEO of Cloud Cruiser from 2010 to 2017. Before that he’d been a CEO at Neterion and a general manager at HP. Neterion was acquired by Exar in 2010. Cloud Cruiser developed SaaS application software that provided analytical insights into hybrid cloud consumption for enterprises, SIs and service providers. Its software was licensed by HPE, Cloud Cruiser’s largest customer, as part of its Flexible Capacity offering. This enabled HPE to meter and bill for usage of on-premises IT infrastructure in a pay-as-you-go, cloud economics model.

Cloud Cruiser, which had raised $20.7 million in funding, was bought by HPE in January 2017 for an unrevealed amount. Cloud Cruiser became the HPE Consumption Analytics Portal, part of GreenLake Flex Capacity. Zabrowski stayed with HPE for a few months and then joined DataCore.

DataCore and Caringo deals

DataCore tells us that it has made two major, seven-figure deals for its Caringo object storage software. Neither customer was identified beyond one being a significant US technology supplier and the other a US government organisation. DataCore does not have FedRAMP (Federal Risk and Authorization Management Program) certification but some of its US reselling system house partners do.

This means DataCore potentially beat Cloudian, Dell EMC ECS, Minio, NetApp StorageGRID, Scality and other competition. That causes us to uprate our idea of Caringo’s technology capabilities and see DataCore as an enterprise object storage supplier.

Zabrowski said he wants DataCore to become the largest privately owned storage software supplier. Yes, an IPO could lie ahead. However, both of Zabrowski’s previous companies — Neterion and Cloud Cruiser — were acquired. 

Comment

DataCore has expanded into the Kubernetes storage space and its storage protocol coverage for applications looks like this:

There are possibilities for MayaData to use DataCore’s file and object technology portfolio. We were surprised to learn that Caringo has won seven-figure enterprise storage deals, having thought it more of a small and medium enterprise supplier — and a niche one at that. 

An examination of Zabrowski’s background, together with the Caringo deals, suggests that DataCore could well emerge to be a significant enterprise storage player — one to be considered alongside the mainstream suppliers. It has not had a relationship with Gartner that gets it entry into Magic Quadrants and Critical Capabilities reports and such like. Zabrowski said that could well change as DataCore becomes more active in its enterprise marketing.

DDN turbo-charges SFA arrays with massive bandwidth expansion

DDN has accelerated its SFA200 and SFA400 NVX arrays by up to 156 per cent by adding more and faster PCIe lanes and upgrading the controllers to Gen-3 Xeon Scalable Processors.

These two block-access arrays share a common chassis (two rack units x 24 slots) which accepts 2.5-inch NVMe SSDs. Client access links have been upgraded as well, with faster base bandwidth. These systems are the foundation of DDN’s storage portfolio and are available as EXAScaler ES400NVX2 and ES200NVX2 scale-out, parallel file storage systems, as well as the recently announced AI400X2 appliances.

SFA400VX2 front panel.

DDN says the new systems are designed to handle the modern data needs of AI applications such as natural language processing, financial analytics and manufacturing automation, as well as of production systems. In its view the new systems are needed to handle the ever-expanding amount of data these applications require, and to deliver the data fast enough for real-time processing and insight.

We have tabulated the old NVX2E and new VX2 systems using DDN datasheets to bring out the main improvements:

 

Raw IOPS performance has not improved at all but bandwidth has risen significantly, particularly on sequential writes: the old SFA400 put out 25GB/sec while its replacement rockets along at 64GB/sec. There is vastly more internal PCIe bandwidth in the new systems — up to 288 lanes of PCIe 4, versus 192 lanes of half-as-fast PCIe 3 in the replaced systems.
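The headline uplift and the internal fabric gain can be sanity-checked with simple arithmetic. The per-lane PCIe figures below are approximate (roughly 1GB/sec per PCIe 3 lane at 8GT/s and 2GB/sec per PCIe 4 lane at 16GT/s, per direction), not numbers from DDN's datasheets:

```python
# Headline uplift: sequential-write bandwidth went from 25 GB/s to 64 GB/s.
old_bw, new_bw = 25, 64  # GB/s
uplift = (new_bw - old_bw) / old_bw * 100
print(f"Bandwidth uplift: {uplift:.0f}%")  # matches the quoted "up to 156 per cent"

# Internal fabric: 288 lanes of PCIe 4 vs 192 lanes of PCIe 3.
# Approximate per-lane throughput: ~1 GB/s (PCIe 3) and ~2 GB/s (PCIe 4).
old_fabric = 192 * 1   # GB/s aggregate, old systems
new_fabric = 288 * 2   # GB/s aggregate, new systems
print(f"Aggregate PCIe bandwidth: {old_fabric} -> {new_fabric} GB/s "
      f"({new_fabric / old_fabric:.0f}x)")
```

So the internal PCIe fabric roughly triples in aggregate bandwidth, comfortably ahead of the 2.56x gain in delivered sequential-write bandwidth.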

The new SFA400VX2 can be deployed in a hybrid (flash+disk) configuration and have 6.4PB of capacity in half a rack using DDN’s 90-bay drive enclosures. This uses 16 lanes x 4 ports (64 lanes) of SAS connectivity. The new systems can also have up to 64 PCIe lanes providing client connectivity.

The EXAScaler systems have so-called Hot Pools technology which optimises for both performance and low cost by transparently migrating data between flash and disk tiers. 

Get a datasheet for the new systems from this DDN webpage.

StorONE jumps on ransomware protection train with fast-recovering backup system

Storage startup StorONE has developed a fast-recovering, ransomware-protecting, flash-to-disk auto-tiering version of its software called S1:Backup.

S1:Backup is a backup target for Commvault, HYCU, Rubrik and Veeam’s data protection software to provide fast ingest to ransomware-proof storage based on a 4–8 SSD flash tier with automated movement of older backups to high-capacity RAID-protected disk — more than 15PB of it. StorONE also has the ability for the clustered S1 array to be a standby production system, thus offering instant recovery.

Gal Naor, StorONE’s founder and CEO, said: “When we examined the points of exposure, we discovered a gap between backup software innovations and storage hardware capabilities, which led to vulnerabilities and higher cost. Legacy backup storage is the cause of that gap.”

In his view, “S1:Backup fills it and enables companies to complete their ransomware recovery strategy and extract the full value from modern backup software by eliminating ransomware concerns and elevating backup to deliver high-availability.”

StorONE claims S1:Backup delivers 100 per cent ransomware resiliency. In general, existing ransomware-protecting systems “force customers to use S3 for immutability. S1:Backup is the industry’s only solution that delivers immutability across all its protocols, including NFS, SMB, S3, iSCSI, Fibre Channel, and NVMe-oF.”

The company claims that high-capacity disk drives can be used in its S1:Backup because its software rebuilds a failed 18TB disk drive in three hours while other suppliers’ arrays can take days or even more than a week. And that situation only gets worse as capacities rise to 30TB and beyond. Newer, higher-capacity drives can be mixed with older, lower-capacity drives in the StorONE system as well, making scale-up more straightforward.
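The three-hour rebuild claim implies a striking sustained data rate, which is worth working out. The single-drive write figure below is an assumed ballpark, not a StorONE number:

```python
# Implied rebuild rate for the claim: an 18TB drive rebuilt in 3 hours.
capacity_tb = 18
hours = 3
rate_gbps = capacity_tb * 1e12 / (hours * 3600) / 1e9  # GB/s
print(f"Sustained rebuild rate: {rate_gbps:.2f} GB/s")  # ~1.67 GB/s

# A single high-capacity HDD sustains very roughly 0.25 GB/s on writes
# (assumed figure), so a rebuild at this rate must spread the work across
# many drives in parallel rather than funnelling it onto one spare.
single_drive_gbps = 0.25
print(f"Equivalent parallel drive writes: ~{rate_gbps / single_drive_gbps:.0f}")
```

That 1.67GB/sec rate is several times what one disk can absorb, which suggests a declustered, many-drives-at-once rebuild scheme rather than a traditional one-spare RAID rebuild.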

StorONE Ransomware Protection graphic.

S1:Backup captures every backup job in an immutable state, handling hundreds of incoming incremental backup streams with 30-second snapshot intervals, and retains immutable copies indefinitely. Its software can operate at normal speed when its disks are 90 per cent full — there is no performance drop-off past the 50 per cent utilisation level.

This high utilisation suits long-term storage. The software detects and corrects so-called bit-rot errors which can occur in long-term disk storage; Naor claims this is a feature unique to StorONE. S1:Backup can also consolidate block-level incremental backups and create virtual fulls in minutes by using its 100TB+ flash tier.

StorONE also claims a pricing advantage — its pricing is openly available on its website so that customers can make comparisons with competing suppliers’ systems.

When the S1:Backup becomes a standby production system, backed-up VMs can operate in the array and use block, file and object protocols. The S1:Backup system can expand and be used for other use cases — such as archive, NAS, virtual machine storage, databases and even HPC and AI. 

Comment

StorONE has close to 100 customers, Naor says. “We are growing fast from several aspects,” he told Blocks and Files, and mentioned repeat orders from existing customers as one example. “Repeat orders give us a lot of confidence.”

The alliances with Commvault, HYCU, Rubrik and Veeam, together with its S1:Backup features — such as ransomware protection, fast ingest and restore, stand-by production capability, and high disk utilisation and fast RAID rebuild capabilities — should make it an attractive alternative to other backup storage targets.

Naor thinks that the market is not yet ready for all-flash backup targets like FlashBlade because they are currently too expensive, and S1:Backup can provide similar or equivalent performance with cheaper archiving.