Western Digital was late to market with 16TB Ultrastar nearline disk drives, gifting Seagate an opportunity for its 16TB Exos drives. The arch-rivals are now in a race to see who can ramp 18TB drive manufacturing faster and ship the most drives to customers this year.
Enterprise nearline drives with 10TB or more capacity are the biggest disk drive category by revenue. They are used by businesses to store the ever-growing volume of unstructured data from their operations.
In response to growing data volumes, Seagate, WD and their smaller rival Toshiba are constantly developing higher-capacity drives to retain and gain market share.
In June 2019 Seagate announced it was shipping a 16TB Exos drive. And in November the company boasted of the fastest product ramp in its history for this type of drive.
WD announced the Ultrastar HC550 16TB and 18TB nearline drives in September 2019, using its ePMR technology, but units became generally available only last month – some 11 months after Seagate’s 16TB Exos. The delay meant WD lost significant shipment capacity market share.
In May 2020, WD’s first quarter 2020 results prompted Wells Fargo analyst Aaron Rakers to suggest to CEO David Goeckeler: “It looks like you definitely kind of underperformed some of your peers on nearline” – i.e., Seagate and Toshiba. Goeckeler did not deny it.
This month, Rakers estimated Seagate shipped 79.5 EB of nearline capacity drives in the second 2020 calendar quarter, ahead of WD’s 76 EB, and maintaining its 16TB ship share advantage.
18TB race
In an effort to catch up, WD said this month it will boost 18TB drive production to one million units in the fourth 2020 quarter, amounting to 18EB of capacity.
But Seagate started shipping 18TB Exos drives to selected customers in June, with a manufacturing ramp beginning by year-end. It has not yet released any 18TB drive ship numbers at the time of writing.
A 20TB Seagate drive using its HAMR (heat-assisted magnetic recording) technology should also start shipping before the end of the year. Blocks & Files expects WD to announce ship dates for its 20TB MAMR drive by then. The nearline capacity race goes on.
Seagate began shipping 16TB Exos nearline drives in June last year, before Western Digital had even announced its own 16TB drive tech. It took another eleven months for WD to ship Ultrastar HC550 16TB and 18TB drives. And the key to getting the drives out of the door was a hitherto unexplained technology called ePMR.
WD has now revealed some ePMR details in a short document, which was flagged up to us by analyst Tom Coughlin in a subscription mailing. We’ll explain why ePMR was necessary for WD and then take a closer look at the technology.
Today’s perpendicular magnetic recording (PMR) technology is reaching the end of the technology road. Its ability to make bits smaller is compromised by bit value stability becoming unreliable as the areal density approaches 1Tb/in².
For over a year WD has talked up its forthcoming microwave-assisted magnetic recording (MAMR) technology as a means to overcome this obstacle. MAMR uses microwaves to write data to a reformulated and more stable magnetic material that can pack smaller bits more closely together. The technology provides a path to 40TB drives, according to WD.
However, WD did not use MAMR in the Ultrastar HC550s, plumping instead for ePMR to increase the areal density of the drives. We infer that the company chose this stopgap because of MAMR issues, but we have no insider knowledge here.
Whatever the reason, ePMR enabled the 18TB Ultrastar HC550 (1,022Gb/in²) to achieve a 13 per cent areal density increase over WD’s 14TB HC530 drive (904Gb/in²). WD is now in a race with Seagate to ramp up 18TB nearline disk drive production.
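As a sanity check, the 13 per cent figure falls straight out of the two quoted densities; a minimal sketch, nothing more than the arithmetic:

```python
# Back-of-the-envelope check of the quoted areal density gain.
hc530_density = 904    # Gb/in², 14TB Ultrastar HC530
hc550_density = 1022   # Gb/in², 18TB Ultrastar HC550

gain = (hc550_density - hc530_density) / hc530_density
print(f"Areal density gain: {gain:.1%}")   # ~13.1%
```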
Bias current
This ePMR tech is briefly explained in a WD technical brief, Continuous Innovation for Highest Capacities and Lower TCO, which states the HC550 and HC650 “introduce the industry’s first Energy-Assisted Magnetic Recording (EAMR) technology.”
This is ePMR, which “applies an electrical current to the main pole of the write head throughout the write operation. This current generates an additional magnetic field which creates a preferred path for the magnetisation flip of media bits. This, in turn, produces a more consistent write signal, significantly reducing jitter.”
The bias current is needed because disk recording heads can apply an inconsistent magnetic field to bits when their write currents are distorted – so-called “jitter”. This effect makes bit value signal recognition more difficult and it worsens as bits shrink and are placed closer together.
A WD chart shows the effect of the bias current being applied, with the write process becoming more consistent:
These HC550 and HC650 disk drives also use triple-stage actuators to enable finer control of the read/write head’s placement on the disk platter.
Blocks & Files expects WD to announce full-scale MAMR drives with 20TB capacities and higher by the end of the year.
This week’s storage news roundup features Weka going to Mars, Scality going to jail, and a host of supporting stories.
WekaIO goes to Mars
NASA is using WekaIO to feed file data to four Nvidia GPUs for Mars lander descent simulations.
The lander arrives at Mars travelling at 12,000mph and slows its descent using retro-propulsion. This takes seven minutes from arrival to touch down. The Martian atmosphere is too thin for a parachute-slowed descent. NASA’s simulation examines how the rocket exhaust interacts with the Martian atmosphere in real time.
A NASA video tells us this involved 150TB of data and took a one-week run on Summit at Oak Ridge National Laboratory, using more than 27,000 Nvidia GPUs to produce the simulation dataset. This volumetric data contains about 1 billion data points, each with seven attributes.
The simulation can be played back in real time with help from WekaFS. The system stores the dataset files on NVMe SSDs, with data fed in parallel at 160GB/sec to four GPUs via Nvidia’s GPU Direct, bypassing the host server’s CPU and DRAM. Without GPU Direct the bandwidth drops to just over 30GB/sec.
Scality goes to Irish jail
The Irish Prison Service (IPS) is using Scality RING storage to store videos from 5,000 video cameras spread across 12 jails and around 4,000 inmates. To conform with Irish data protection rules, the footage has to be stored for a minimum of four years and one day or until an incident is resolved.
Scality RING replaced an unnamed storage array which came under capacity pressure after the IPS installed new high-definition cameras that generated larger video files. Amazingly, footage on the old array had never been deleted, and the IPS lacked the visibility and a technical process for systematically deleting video.
There are three suppliers involved in the IPS video surveillance IT system: HPE, CTERA and Scality. HPE provides local storage using CTERA running on HPE DL380 servers to capture video 24/7 at each prison facility (5TB/server). Long-term offsite storage is on HPE Apollo 4000 servers and Scality RING.
Two Scality RING object storage clouds, with replication between them, provide 300TB of storage that can scale to multiple petabytes and support automated, systematic deletion.
News shorts
Alluxio, an open source cloud data orchestration system developer, said it closed out 1H 2020 with sales growth of more than 650 per cent over 1H 2019. Recent notable customer additions include Alibaba, Aunalytics, Datasapiens, EA, Nielsen, Playtika, Roblox, Ryte, Tencent, VIPShop, Walmart, Walkme and WeRide.
The Fibre Channel Industry Association has published the FC-NVMe-2 standard. Enhancements include Sequence Level Error Recovery (SLER), which significantly increases the speed at which bit errors are detected and recovered during data transmission.
HYCU says it is the first Nutanix Strategic Technology Partner to have a data protection product that has been tested and supports Nutanix Clusters on AWS.
IDC market watchers have forecast the size of the on-premises enterprise installed storage base from 2020 to 2024 in a “Worldwide Enterprise Storage Systems Installed Base Forecast, 2020-2024” report. They have estimated the installed base by deployment location, product category and storage class. They discuss the market context and drivers, looking at all-flash arrays, HCI, and the effect of Covid-19.
InfiniteIO has released v2.5 of its file metadata acceleration software. This goes faster with enhanced memory, processor and 40GbitE network connectivity to reduce metadata latency from 77µs to as low as 40µs in a single node. The new release adds NAS migration via hybrid cloud tiering, data tiering, and enhanced analytics.
N2WS’s v3.1 Backup & Recovery provides replication between AWS S3 buckets across regions and accounts, Amazon EBS snapshot copy to S3 for long-term archiving, custom tag integration, recovery drill scheduling and cross-region DR for Amazon Elastic File System (EFS).
Pivot3 says it is the first HCI vendor to be certified for the Bosch video management system (BVMS) from Bosch’s Building Technologies division.
Using a SQream Connector, Tableau customers can rapidly ingest data, enabling exploration, analysis, correlation and visualisation of significantly more data with more dimensions. The connector is available in the Tableau Extension Gallery.
Storage array vendor StorOne has produced a white paper advising storage admins and CIOs on how to cope with the next pandemic.
The ICM Brain and Spine Institute has selected Western Digital’s OpenFlex open composable infrastructure, with F3100 NVMe drives, to speed up work on cures and treatments.
Intel has teased out some more details of the upcoming Barlow Pass PMEM 200 Optane DIMMs and Alder Stream Optane SSDs. These are the second-generation versions of the company’s 3D XPoint memory products.
The company also showed a slide at the Intel Architecture Day 2020 PowerPoint fest yesterday indicating that third and fourth gen Optane products are in the works.
Alder Stream
Alder Stream is the first Optane SSD to use four-layer 3D XPoint technology and will use PCIe 4.0. This combo will deliver “multiple millions of IOPS” – i.e. much faster performance than the gen 1 DC P4800X SSD which uses PCIe 3.0. The four-layer scheme increases bandwidth via parallelisation opportunities for the Optane controller, with the PCIe 4 bus providing an access pipe that is twice as fast as today’s PCIe 3.
Intel presented this Alder Stream performance chart at IAD 20.
The gen 1 P4800X Optane SSD, with its dual ports (blue line in the chart above), is faster than the P4610 NAND SSD (green dashes). Alder Stream (orange dots) has a 10-microsecond latency and delivers upwards of 800,000 mixed read/write IOPS. Intel showed a similar chart at a September 2019 event in Seoul, which had Alder Stream surpassing 700,000 IOPS. It now goes faster.
Barlow Pass
Barlow Pass is about 25 per cent faster in memory bandwidth than the gen 1 Optane DC Persistent DIMM and will come in 128GB, 256GB and 512GB capacities. The 256GB Barlow Pass PMEM 200 series DIMM has a 497PBW rating while the gen 1 256GB capacity DIMM has a 360PBW rating. Both generations provide up to 4.5TB of memory per socket.
Optane roadmap
A slide at IAD 20 shows the Optane roadmap stretches to four generations.
That’s interesting as far as it goes but the company is not saying anything yet about the technology underlying gen 3 and gen 4 Optane. Gen 2 Optane doubles the 2-layer gen 1 technology to 4 layers (or ‘decks’ in Intel terminology). A continuation of this doubling trend would mean gen 3 has 8 layers and gen 4 will have 16 layers.
However, a caveat is necessary: neither Intel nor XPoint manufacturer Micron has said that gen 3 will be a double-decker compared to gen 2, or that gen 4 will double-deck gen 3. Layer count increases for 3D NAND have generally followed a 32 – 48 – 64 – 96 layer scheme rather than doubling, and XPoint could grow by simply adding an extra two decks each generation. Analysts are wary of predicting XPoint layer count progress as neither Micron nor Intel provides any hints.
Intel has not confirmed availability dates for Barlow Pass or Alder Stream, though both are expected this year.
Dell is adding GPUs, FPGAs and NVMe storage to the MX7000 composable system via a deal with Liqid. This makes the MX7000 systems better suited for data-intensive applications such as AI, machine learning and low-latency analytics.
Dell indirectly announced the hookup with Liqid, a software-defined composable infrastructure vendor, via a reference architecture document published on August 7.
We think all composable systems suppliers will need to support NVMe fabrics and PCIe, in order to bring Nvidia’s GPU Direct into their systems.
NVMe is on its way to becoming the dominant composable systems fabric and SSD storage access protocol, in Ethernet and PCIe fabric incarnations. Today, most composable systems suppliers favour Ethernet as their control plane fabric. However, Nvidia GPU Direct bypasses the CPU to load data direct from NVMe storage into the GPUs across a PCIe bus.
MX7000
Composable systems dynamically build servers from pools of disaggregated compute, storage and networking elements using control plane software, with some having dedicated chassis to house the components. When the composed server is no longer needed its components are returned to the resource pools for re-use.
The MX7000 organises component resources across an Ethernet fabric and via Fibre Channel, but GPUs typically need a PCIe bus connection with NVMe storage to be fed with data.
Dell has worked with Liqid to add a PCIe expansion chassis to the MX7000. This houses up to 20 full-height, full-length GPUs or devices such as FPGAs and NVMe storage.
PCIe gen 3 4-lane adapters link the MX7000 compute sled to the Liqid PCIe expansion chassis. The Dell document states: “Once PCIe devices are connected to the MX7000, Liqid Command Center software enables the dynamic allocation of GPUs to MX compute sleds at the bare metal (GPU hot-plug supported). Any amount of resources can be added to the compute sleds, via Liqid Command Center (GUI) or RESTful API, in any ratio to meet the end user workload requirement.”
Dell MX7000 and Liqid PCIe expansion chassis
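To make the composition workflow concrete, here is a purely illustrative sketch of what a RESTful composition call can look like. The endpoint path, port and payload fields are assumptions for illustration only, not Liqid’s documented Command Center API:

```python
# Purely illustrative sketch of driving a composability API over REST.
# The endpoint path, port and payload fields below are hypothetical --
# they are NOT Liqid's documented Command Center API, just an example
# of the kind of call a RESTful composition workflow involves.
import requests

COMMAND_CENTER = "https://command-center.example.local:8080"  # placeholder host

def add_gpus_to_sled(sled_id: str, gpu_count: int) -> dict:
    """Request that a number of GPUs be composed into an MX compute sled."""
    payload = {"target": sled_id, "device_type": "gpu", "count": gpu_count}
    resp = requests.post(f"{COMMAND_CENTER}/api/compose", json=payload, timeout=30)
    resp.raise_for_status()
    return resp.json()

# Example: compose two GPUs into compute sled "mx-sled-01"
# print(add_gpus_to_sled("mx-sled-01", 2))
```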
The MX7000 is managed by Dell’s OpenManage Enterprise Modular Edition (OME-M) software, which means there are two control planes.
Composable systems and NVMe
HPE Synergy is probably the best-known composable system. The ProLiant servers used in Synergy can support direct-attached NVMe SSDs. However, the Synergy storage modules use SAS connectors, not NVMe, with SAS and SATA HDDs and SSDs supported.
Liqid has a partnership with Western Digital, which has its own OpenFlex composable systems line using chassis filled with disk drives or NVMe SSDs. Liqid can harness OpenFlex NVMe resources in systems it composes.
DriveScale, another composable systems supplier, uses Ethernet NVMe-over Fabrics via RoCE, TCP or iSCSI.
Fungible is set to join the composable systems supplier ranks later this year.
Western Digital has announced the F3200, the latest iteration of its flash fabric shared storage system for composable systems. Key takeaways? The NVMe-oF device is faster than its predecessors, there are software improvements and capacities stay the same. Also it’s not cheap.
The F3200 tucks into WD’s OpenFlex composable systems storage line and is “open, fast and composable,” according to the company blog announcing the product. OpenFlex disaggregates server CPUs, memory and storage. It has an openly available API and is supported by DriveScale and Silk (the rebranded Kaminario). Its architecture allows for disaggregating GPUs and FPGAs.
The F3200 looks the same as the F3100.
F3000, F3100 and F3200
Two years ago WD introduced OpenFlex F3000, a ruler format flash drive using 64-layer 3D NAND, with 61TB capacity, NVMe-oF RoCE connectivity, and housed in an E3000 Fabric enclosure which could take ten F3000s.
The F3100 launched in August 2019, using 96-layer 3D NAND in a ruler format with 61.4TB capacity. It had up to 11.7GB/sec bandwidth, more or less the same as the F3000’s 12GB/sec, and delivered up to 2.1m IOPS with a latency of less than 48μs.
Now WD has announced the F3200, with the same 61.4TB maximum capacity and faster write performance: up to a 48 per cent random and 22 per cent sequential write speed gain, plus up to a 4 per cent mixed read/write performance increase.
WD says latency is under 40μs 99.99 per cent of the time but a speeds table puts a different slant on this:
This shows the 61.4TB model has 47.2μs 99.99 per cent random write latency while the 51.2TB variant has a 46.7μs number. Only the lower capacity models slip under the 40μs figure.
The F3200 has 2 x 50GbitE ports and is available in 15.3, 30.7 and 61.4TB capacity points. The endurance is 0.8 drive writes per day, but can be increased to 2 DWPD by formatting capacity down to 12.8TB, 25.6TB or 51.2TB, respectively.
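For context, DWPD multiplied by usable capacity gives the drive’s rated daily write budget; a minimal sketch applying the quoted figures (the endurance uplift from formatting down reflects the extra over-provisioned flash):

```python
# DWPD (drive writes per day) x usable capacity = rated daily write budget.
# Capacity points and DWPD figures are those quoted for the F3200; the uplift
# when formatting down comes from the extra over-provisioned flash.
full_format = [(15.3, 0.8), (30.7, 0.8), (61.4, 0.8)]   # (TB, DWPD) as shipped
down_format = [(12.8, 2.0), (25.6, 2.0), (51.2, 2.0)]   # (TB, DWPD) formatted down

for (cap_a, dwpd_a), (cap_b, dwpd_b) in zip(full_format, down_format):
    print(f"{cap_a}TB @ {dwpd_a} DWPD = {cap_a * dwpd_a:.1f} TB/day | "
          f"{cap_b}TB @ {dwpd_b} DWPD = {cap_b * dwpd_b:.1f} TB/day")
```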
New software includes VLAN tagging, secure erase, expanded NVMe namespace and non-disruptive firmware updates. The F3200 also incorporates Open Composable APIs to enable filterable email alerts and telemetry data.
A 25.6TB F3200 costs $19,721.04 at NCDS, a Canadian retailer, while the 12.8TB model is priced at $13,829.71. The 61.4TB model will set you back $31,988.64.
WekaIO has devised a “production-ready” framework to help artificial intelligence installations to speed up their storage data transfers. The basic deal is that WekaIO supports Nvidia’s GPUDirect storage with its NVMe file storage. Weka says its solution can deliver 73 GB/sec of bandwidth to a single GPU client.
The Weka AI framework comprises customisable reference architectures and software development kits, centred on Nvidia GPUs, Mellanox networking, Supermicro servers (other server and storage hardware vendors are also supported) and Weka Matrix parallel file system software.
Paresh Kharya, director of product management for accelerated computing at Nvidia, provided a quote: “End-to-end application performance for AI requires feeding high-performance Nvidia GPUs with a high-throughput data pipeline. Weka AI leverages GPUDirect storage to provide a direct path between storage and GPUs, eliminating I/O bottlenecks for data intensive AI applications.”
Nvidia and Weka say AI data pipelines have a sequence of stages with distinct storage IO requirements: massive bandwidth for ingest and training; mixed read/write handling for extract, transform, and load (ETL); and low latency for inference. They say a single namespace is needed for entire data pipeline visibility.
Weka and Nvidia
Weka AI bundles are immediately available from Weka’s channel. You can check out a Weka AI white paper (registration required).
Pure Storage is reselling Cohesity software with its FlashBlade storage array to provide a single flash-to-flash-cloud, data protection and secondary data management system.
Mohit Aron
Mohit Aron, Cohesity CEO and founder, said in a statement: “We are thrilled to partner with Pure in bringing to market a solution that integrates exceptional all-flash capabilities and cutting-edge data protection offerings that together unleash new opportunities for customers.”
Called Pure FlashRecover, the team effort combines the FlashBlade array with a white box server that runs Cohesity’s DataPlatform software. This is not an appliance in the sense of a dedicated, purpose-built product, but it can be used in an appliance-like manner. Pure FlashRecover is a jointly engineered system, with disaggregated and independently scalable compute and storage resources. The FlashBlade array can perform functions beyond providing storage for Cohesity software.
Flash-to-flash-to-cloud. FlashRecover is the deduping flash store in the diagram
FlashRecover can function as a general data protection facility for Pure Storage and other suppliers’ physical, virtual, and cloud-native environments, with faster-than-disk restore and throughput from the all-flash FlashBlade array. Most functionality of the hyperconverged, scale-out Cohesity DataPlatform is also available to customers. Features include tiering data off to a public cloud, ransomware protection, copy data management, and data supply for analytics and test and dev.
Integration
Pure has become a Cohesity Technology Partner and the two companies have integrated their environments. Cohesity Helios management software auto-discovers FlashBlade systems and Cohesity uses FlashBlade snapshots.
Cohesity spreads the data across available space on FlashBlade to maximise restore performance and enhance efficiency. The software is optimised to provide performance even when the storage for the data is from disaggregated FlashBlades.
FlashRecover will be sold by Pure’s channel and supported by Pure. Cohesity and Pure are looking forward to further joint technology developments from this point.
ObjectEngine replacement
Last month, Pure canned the FlashBlade-based ObjectEngine backup appliance. The company told us it was “working with select data protection partners, which we see as a more cohesive path to enhancing those solutions with native high performance and cloud-connected fast file and object storage to satisfy the needs in the market.” Now we see that Cohesity replaces the ObjectEngine software and FlashRecover replaces the ObjectEngine appliance.
Pure FlashRecover, Powered by Cohesity, is being tested by joint customers today and will be generally available in the United States in the fourth quarter, and elsewhere at unspecified later dates. Proofs of concept are available now for select customers.
Spin Memory has designed a ‘Universal Selector’ transistor that improves DRAM array density by 20 to 35 per cent, according to the company. It says the technology increases MRAM density by up to five times and accelerates MRAM operations.
Spin Memory derives its name from the field it works in. The company is a Spin Transfer Torque MRAM (STT-MRAM) developer, and positions MRAM as a replacement for the static RAM used in CPU caches. It claims its STT-MRAM could also function as storage-class memory, like Intel’s Optane.
Tom Sparkman
Tom Sparkman, CEO of Spin Memory, provided an announcement quote: “Our latest breakthrough innovation allows for exciting new advancements and capabilities for [advanced memory] technologies – in addition to pushing MRAM into the mainstream market.”
Spin Memory’s Universal Selector is, according to the announcement, a selective, vertical epitaxial cell transistor with a channel that is electrically isolated from the silicon substrate. Epitaxial refers to the growth of a crystal in a particular orientation on top of another crystal whose orientation determines the growth direction of the upper crystal. A vertical epitaxial cell is grown vertically above the underlying crystal. Epitaxy is used in semiconductor manufacturing to form layers and wells.
Spin Memory thinks this technology can be applied outside the MRAM field to DRAM, ReRAM and PCRAM, for example, hence the ‘universal’ attribute.
The electrical isolation prevents DRAM row hammer attacks, Spin Memory says. These are attacks in which repetitive (hammering) memory access patterns are applied to DDR3 and DDR4 cells, causing leakage that alters cell contents in adjacent rows.
Charlie Slayman, IRPS 2020 technical program chair, was quoted in Spin Memory’s announcement: “Spin Memory’s Universal Selector offers a novel way to design vertical cell transistors and has been presented to the JEDEC task group evaluating solutions to the row hammering problem.”
Spin Memory claims Universal Selector gives any developer of non-Flash memories the means to “drastically improve density for almost every memory technology on the market without requiring an investment in specialised hardware or resources.”
Jim Handy of Object Analysis, an analyst firm, said: “This is a very clever use of the vertical transistors that are the basis of 3D NAND flash. If a vertical transistor can make NAND smaller, then why not harness it for other technologies?”
“Selectors have always been a vexing issue for emerging memories. A number of developers focus all of their effort on the bit element but none on the selector. Both are significant challenges and both need to be solved or you have nothing.”
Ed Walsh left his job as head of IBM Storage last week to take on the CEO role at ChaosSearch, a log data analytics startup. Co-founder Les Yetton has stepped aside to make way for Walsh.
ChaosSearch CTO Thomas Hazel told us in a briefing this week that Walsh “has the experience and vision to enable us to very rapidly scale to meet customer demand we are seeing”.
Hazel and Yetton set up the company in 2016 to devise a faster, more efficient way of searching the incoming flood of log and similar unstructured data heading for data lakes. They saw that analysis was becoming IO-bound because the Moore’s Law progression of compute power scaling was slowing. Their answer was to build software to compress and accelerate the analytics IO pipeline.
Thomas Hazel
Walsh said in a press statement today: “The decision to leave IBM was extremely difficult for me, but the decision to join ChaosSearch was very easy. Once I saw the unique and innovative approach they take to enable ‘insights at scale’ within the client’s own Cloud Data Lake and slash costs, I knew I wanted to be part of this company.”
He added: “Unlike the myriad log analytic services or roll-your-own solutions based upon the Elastic/ELK stack, ChaosSearch has a ground-breaking, patent-pending data lake engine for scalable log analytics that provides breakthroughs in scale, cost and management overhead, which directly address the limits of traditional ELK stack deployments but fully supports the Elastic API.”
The ELK stack refers to Elasticsearch, Logstash and Kibana: three open-source products maintained by Elastic and used in data analysis.
Background
ChaosSearch saw that data in data lakes needed compressing for space reasons, and also extracting, transforming and loading (ETL) into searchable subsets for Elasticsearch and data warehouses such as Snowflake. This involved indexing the data to make it searchable. What the founders invented was a way of compressing such data and indexing it in a single step.
Hazel told us Gzip compression takes a 1GB CSV file and reduces it to 500MB. Parquet would turn it into an 800MB file, while Lucene would enlarge it three to five times by adding various indices. ChaosSearch reduces the 1GB CSV file to 300MB, even with indexing performed.
Bump the 1GB CSV file to multiple TB or PB levels and the storage capacity savings become quite significant. Overall analytics time is also greatly reduced because the two separate compression and ETL phases are condensed into one.
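The gzip baseline Hazel cites is easy to reproduce; a minimal sketch (the roughly 2:1 ratio depends entirely on the data, and this shows only plain compression, not ChaosSearch’s combined compress-and-index step):

```python
# Rough illustration of the gzip baseline: compress a CSV and compare sizes.
# Actual ratios depend entirely on the data; this demonstrates only plain
# compression, not ChaosSearch's single-step compress-and-index format.
import gzip
import os

def gzip_ratio(csv_path: str) -> float:
    gz_path = csv_path + ".gz"
    with open(csv_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        dst.writelines(src)
    return os.path.getsize(gz_path) / os.path.getsize(csv_path)

# Example: a 1GB CSV compressing to ~50% of its size matches the quoted figure.
# print(f"compressed to {gzip_ratio('logs.csv'):.0%} of original size")
```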
The compression means that more data can be stored in the analysis system, and this improves analytics results.
Ed Walsh
ChaosSearch thinks a cloud data lake can be stored in cloud object storage (S3) and function as a single silo for business intelligence and log analytics. The company is initially focusing on the log analytics market and says its technology can run search and analytics directly on cloud object storage. In our briefing, Walsh claimed ChaosSearch technology is up to 80 per cent less expensive than alternatives.
The ChaosSearch scale-out software will be available on Amazon’s cloud initially and be extended to support Azure and GCP. Upstream analysis tools such as Elasticsearch access its data through standard APIs.
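Because the Elastic API is supported, a client queries ChaosSearch the same way it would query Elasticsearch. A minimal sketch, with the endpoint host and index name as placeholders:

```python
# A standard Elasticsearch _search request is all a client needs against an
# Elastic-compatible API. The host and index name below are placeholders,
# not real ChaosSearch endpoints.
import json
import requests

ENDPOINT = "https://chaossearch.example.com"   # placeholder API endpoint
INDEX = "app-logs"                             # placeholder index/view name

query = {
    "query": {"match": {"level": "ERROR"}},    # find error-level log lines
    "size": 10,
}

resp = requests.get(f"{ENDPOINT}/{INDEX}/_search",
                    headers={"Content-Type": "application/json"},
                    data=json.dumps(query), timeout=30)
print(resp.json())
```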
Check out a white paper for a detailed look inside ChaosSearch’s technology.
Profile: Quobyte produces unified block, file and object storage for high-performance computing (HPC) and machine learning.
Its software runs on-premises and in the Google cloud and is used for data analytics, machine-learning, high-frequency trading, life sciences and similar petabyte-scale, fast-response applications. The company also has a TensorFlow plug-in.
Quobyte was established by CEO Björn Kolbeck and CTO Felix Hupfeld in Berlin in 2013. The pair studied computer science and did distributed storage systems research at Berlin’s Zuse Institute, where they developed the open source XtreemFS distributed file system as part of their PhD. They then worked for Google and this gave them the idea that they could build a Google-like scalable, distributed, parallel access and multi-protocol storage system for HPC customers.
Kolbeck and Hupfeld thought that existing HPC storage was placed in restrictive silos such as file or object, was hardware-dependent, and needed a lot of administration. Their response was to develop high-performance, multi-protocol storage software that was independent of its hardware supplier and manageable by a small team.
Björn Kolbeck (left) and Felix Hupfeld
The software uses commodity server hardware and is intended to replace several separate storage vaults while delivering good random and small-file IOPS and large-file throughput.
Competition
Quobyte is a small company. To give an indication of size, the company lists a dozen customers on its website and has 19 employees, according to LinkedIn profiles. Alstin Capital, an investor, valued Quobyte at $35m in 2018 and describes it as an early growth company. The funding history is a little opaque but the last capital raise was in 2016. According to one source, the company is exploring a fresh round.
As an HPC storage software supplier, Quobyte competes with Ceph, DDN, Intel DAOS, Lustre, Panasas, Qumulo, IBM’s Spectrum Scale and WekaIO, amongst others. Compared to most of these, Quobyte is a minnow without substantial backing.
Open source Ceph, which also supplies file, block and object storage, has an underlying object store, with file and block access layered above it. Kolbeck and Hupfeld thought they would get better performance from a file system base with block and object access layered on top.
IBM’s Spectrum Scale has no block storage and needs fairly complex tuning and administration. Panasas PanFS software has recently become independent of its hardware and offers high performance, but only for file access. DDN HPC storage requires DDN hardware.
Quobyte technology
Quobyte has developed a Data Centre File System or Unified Storage Plane – two names for the same thing – which supports POSIX, NFS, S3, SMB and Hadoop file and block access methods.
The software has a distributed, parallel, fault-tolerant, highly scalable and POSIX-compliant file system at its heart. This links Quobyte storage nodes (servers) with Quobyte clients running on the accessing servers, which make remote procedure calls (RPCs) to the Quobyte servers.
The servers are nodes in a Quobyte storage cluster. Each node’s local file system handles block mapping duties using SSDs and HDDs. The system uses a key:value store designed for file system use and metadata operations are not part of the IO path.
Quobyte software can host any number of file system volumes, which can be assigned to tenants. Various system policies control data placement, tiering, workload isolation, and the use of partitions. The software has built-in accounting, analytics and monitoring tools plus built-in hardware management. It also supports cloud-native workloads with Docker, Mesos and Kubernetes, as well as OpenStack, and offers a Google TensorFlow plug-in.
Quobyte’s software operates in Linux user mode and not privileged kernel mode. This avoids context switches into kernel mode and means the software is safer as it can’t execute potentially damaging kernel mode instructions.
Three basic services
The software has three basic services: a Registry Service to store cluster configuration details, a Metadata Service for file access metadata, and a Data Service for actual file IO. One Data Service operates per node and can handle hundreds of drives. There can be hundreds or more Data Service nodes.
The metadata service is deployed on at least four nodes and can scale to hundreds of nodes. Placing metadata on NVMe SSDs provides the lowest latency accesses.
A client can issue parallel reads and writes to the servers, and all clients can access the same file simultaneously. Byte-range or file-level locking is supported across all protocols that support locks, to prevent corruption from multiple writes to the same data.
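Since the file system is POSIX-compliant, byte-range locks behave like standard POSIX advisory locks. A minimal Linux-only sketch of the semantics, with the mount path as a placeholder (this is generic POSIX locking, not a Quobyte-specific API):

```python
# Byte-range locking on a POSIX-compliant file system, such as a Quobyte mount.
# Linux-only sketch using advisory locks via fcntl; the path is a placeholder.
import fcntl
import os

path = "/quobyte/volume1/shared.dat"   # placeholder mount point and file

fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
try:
    # Exclusively lock bytes 0..4095 so concurrent writers to the same region
    # block until the lock is released; other regions stay writable.
    fcntl.lockf(fd, fcntl.LOCK_EX, 4096, 0, os.SEEK_SET)
    os.pwrite(fd, b"updated header block".ljust(4096), 0)
finally:
    fcntl.lockf(fd, fcntl.LOCK_UN, 4096, 0, os.SEEK_SET)
    os.close(fd)
```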
Nutanix is ready to announce Nutanix Clusters. This brings the on-premises Nutanix experience to AWS and opens another front in the company’s battle with VMware.
Sources close to the company say Nutanix Clusters in AWS (NCA) has been in an early-access test phase for many months and is now robust and ready to move into general availability.
NCA runs on bare-metal all-flash servers in AWS and uses AWS networking. Customers spin up servers using their AWS account and deploy Nutanix software on them. This process uses AWS CloudFormation, Amazon’s facility to provision and model third-party applications in AWS. On-premises Nutanix licenses can be moved to AWS to instantiate NCA there.
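For illustration, the underlying step is ordinary EC2 bare-metal provisioning in the customer’s account; a hypothetical boto3 sketch (the AMI, subnet and instance type are placeholders, and real deployments go through Nutanix’s CloudFormation-based workflow rather than direct API calls like this):

```python
# Hypothetical sketch of the underlying step: launching an EC2 bare-metal
# instance in your own AWS account. Real Nutanix Clusters deployments are
# driven by Nutanix tooling and CloudFormation, not manual calls like this;
# the AMI ID, subnet and instance type are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",      # placeholder AMI
    InstanceType="i3.metal",              # an all-flash bare-metal instance type
    MinCount=1,
    MaxCount=1,
    SubnetId="subnet-0123456789abcdef0",  # placeholder subnet
    TagSpecifications=[{
        "ResourceType": "instance",
        "Tags": [{"Key": "purpose", "Value": "nutanix-clusters-node"}],
    }],
)
print(response["Instances"][0]["InstanceId"])
```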
VMware uses its ESX hypervisor as an overlay atop AWS networking and this can sap resources and become a performance bottleneck, according to Nutanix sources.
NCA supports four types of AWS instance, including Large CPU, Large Memory and Large Storage. The Nutanix Prism management console can be used to manage NCA.
NCA bursts on-premises Nutanix deployments to AWS to cope with spikes – for example, an immediate requirement to add 1,000 virtual desktops. It also has disaster recovery capabilities.
Customers can use NCA to migrate on-premises applications running on Nutanix to AWS. There is no need to re-engineer applications as the software runs the Nutanix environment transparently across the public cloud and on-premises worlds.
When no longer needed, a Nutanix cluster in AWS can be spun down with its data stored in S3, incurring S3 charges only, until spun up again. The spun-up NCA is then rehydrated from the S3 store. We understand this go-to-sleep facility will follow the main NCA release in a few months.
A blog by Nutanix CTO Binny Gill provides more background on NCA, and there is more info in an early-access solution brief.