
The next game changer? Amazon takes on the SAN vendors

Amazon has re-engineered the AWS EBS stack to enable on-premises levels of SAN performance in the cloud. Make no mistake, the cloud giant is training its big guns on the traditional on-premises storage area networking vendors.

The company revealed yesterday at re:Invent 2020 that it has separated the Elastic Block Store compute and storage stacks at the hardware level so they can scale at their own pace. AWS has also rewritten the networking stack to use its high-performance Scalable Reliable Datagram (SRD) protocol, thereby lowering latency.

The immediate fruits of this architecture overhaul include EBS Block Express, the “first SAN built for the cloud”. AWS said the service is “designed for the largest, most I/O intensive mission-critical deployments of Oracle, SAP HANA, Microsoft SQL Server, and SAS Analytics that benefit from high-volume IOPS, high throughput, high durability, high storage capacity, and low latency.”

Pure conjecture from us, but Amazon could hit the SAN storage suppliers squarely in their own backyards by introducing EBS Block Express to the AWS Outposts on-premises appliance.

Mai-Lan Tomsen Bukovec, VP Storage, at AWS, said in a statement: “Today’s announcements reinvent storage by building a new SAN for the cloud, automatically tiering customers’ vast troves of data so they can save money on what’s not being accessed often, and making it simple to replicate data and move it around the world as needed to enable customers to manage this new normal more effectively.”

Mai-Lan Tomsen Bukovec

AWS noted that many customers had previously striped multiple EBS io2 volumes together to achieve higher IOPS, throughput or capacity. But this is sub-optimal. The alternative – on-premises SANs – are “expensive due to high upfront acquisition costs, require complex forecasting to ensure sufficient capacity, are complicated and hard to manage, and consume valuable data center space and networking capacity.”

Now EBS io2 Block Express volumes can support up to 256,000 IOPS, 4,000 MB/second of throughput, and 64TB of capacity. This is a fourfold increase over existing io2 volumes across all parameters. The new volumes have sub-millisecond latency and users can stripe multiple io2 Block Express volumes together to get better performance.

Decoupled compute and storage

AWS yesterday said decoupling compute and storage in the EBS service has enabled it to introduce a new class of Gp (general purpose) volume for workloads such as relational and non-relational databases. With the existing Gp2 volumes, capacity grows in lockstep with performance (IOPS and throughput), which means customers can end up paying for storage that they don’t need.

AWS has addressed this with Gp3 volumes, which deliver a claimed 4x performance increase over Gp2 volumes – without incurring a storage tax. As well as independent scaling, Gp3 volumes are priced 20 per cent lower than Gp2. Migration from Gp2 to Gp3 is seamless, AWS says, and handled via Elastic Volumes.
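As a hedged illustration of that independent scaling, the sketch below uses the boto3 SDK to switch an existing volume to Gp3 in place and dial up IOPS and throughput without resizing it. The volume ID and figures are hypothetical.

```python
import boto3

ec2 = boto3.client("ec2")

# Convert an existing volume to gp3 in place via Elastic Volumes, provisioning
# IOPS and throughput independently of the volume's capacity.
ec2.modify_volume(
    VolumeId="vol-0123456789abcdef0",  # hypothetical volume ID
    VolumeType="gp3",
    Iops=6000,       # provisioned IOPS, no longer tied to volume size
    Throughput=500,  # MB/s, also provisioned independently
)
```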

Tiering and replication

The Archive Access (S3 Glacier) and Deep Archive Access (S3 Glacier Deep Archive) tiers, announced in November with S3’s Intelligent-Tiering, are now generally available. Customers can lower storage costs by putting cold data into progressively deeper and lower-cost AWS archives.
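For illustration, here is a hedged boto3 sketch that opts a bucket into the new archive tiers via an Intelligent-Tiering configuration. The bucket and configuration names are hypothetical, and it assumes the objects are already stored in the S3 Intelligent-Tiering storage class.

```python
import boto3

s3 = boto3.client("s3")

# Move objects that have not been accessed into the archive tiers,
# for example after 90 and 180 days without access.
s3.put_bucket_intelligent_tiering_configuration(
    Bucket="my-analytics-bucket",   # hypothetical bucket
    Id="archive-cold-data",
    IntelligentTieringConfiguration={
        "Id": "archive-cold-data",
        "Status": "Enabled",
        "Tierings": [
            {"Days": 90, "AccessTier": "ARCHIVE_ACCESS"},
            {"Days": 180, "AccessTier": "DEEP_ARCHIVE_ACCESS"},
        ],
    },
)
```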

S3 Replication enables the creation of a replica copy of customer data within the same AWS Region or across different AWS Regions. This is now extended to replicate data to multiple buckets within the same AWS Region, across multiple AWS Regions, or a combination of both.
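A hedged boto3 sketch of replicating one source bucket to two destination buckets follows; the bucket names, their Region placement and the IAM role ARN are all hypothetical.

```python
import boto3

s3 = boto3.client("s3")

# Replicate new objects from one source bucket to two destination buckets,
# which can sit in the same Region, different Regions, or a mix of both.
s3.put_bucket_replication(
    Bucket="my-source-bucket",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/my-replication-role",
        "Rules": [
            {
                "ID": "to-us-west-2",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {},
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": "arn:aws:s3:::my-replica-us-west-2"},
            },
            {
                "ID": "to-eu-west-1",
                "Status": "Enabled",
                "Priority": 2,
                "Filter": {},
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": "arn:aws:s3:::my-replica-eu-west-1"},
            },
        ],
    },
)
```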

AWS io2 Block Express volumes are available in limited preview.

Pliops strengthens board with appointment of Mellanox founder

Eyal Waldman

Mellanox founder Eyal Waldman has joined the board of Pliops, an Israeli data storage-tech startup. His role will be to help guide Pliops’ growth and scale its technology to new use cases. He will also advise on financial decisions, personnel and overall strategy, and meet key customers and partners.

“Pliops is one of those companies that is poised to make a huge impact. This is a pivotal time in the data centre and I’m looking forward to working with the Pliops team as they roll out their technology,” said Waldman, the former CEO of Mellanox, which was acquired earlier this year by Nvidia for $7bn. He left Nvidia in November.

According to Waldman, “Pliops is tackling the most challenging issues that are vexing to data centre architects – namely, the colliding trends of explosive data growth stored on fast flash media, ultimately limited by constrained compute resources.”

Eyal Waldman.

To date, Pliops has raised $40m to fund the development of a storage processing unit (SPU), which we consider to be a sub-category of the new class of data processing units (DPUs). The Pliops card hooks up to a server across a PCIe link, offloading and accelerating storage work that would otherwise run on the server’s x86 CPUs. The company had originally targeted a mid-2019 launch but is now sampling its storage processors with select customers and expects general availability in Q1 2021.

Waldman’s Mellanox experience, connections and know-how should help the company in a competitive environment that is heating up.

Pliops must contend with VMware and Nvidia’s Project Monterey DPU vision. Nvidia also told us this week of its plans to add storage controller functions to the BlueField SmartNIC.

Pliops’ SPU is also similar in concept to that of another startup, Nebulon, whose SPU has a cloud-managed, software-defined architecture. Nebulon said it has bagged HPE and Supermicro as OEMs.

Commvault gives VMware workloads some more loving in latest DR software release

Commvault has updated its new DR software with recovery automation for VMware workloads.

The upgrade also sees Commvault Disaster Recovery gain orchestration to, from and between on-premises and Azure and AWS environments. The orchestration can be within zones or across regions, and features simple cross-cloud migration support. It seems reasonable that Commvault will in due course add Google Cloud support.

ESG senior analyst Christophe Bertrand gave his thumbs up to the upgrade: “Commvault Disaster Recovery’s multiple cloud targets and speedy cross-cloud conversions make it extremely compelling. With everything going on in the world today, a true disaster could be right around the corner for any company. It’s critical to have enterprise multi-cloud tools in place to mitigate data loss and automate recovery operations immediately.”

The competition between DR specialist Zerto, which recently moved into backup, and data protector Commvault, which recently moved into DR, is hotting up. Cohesity has also moved into automated DR with its SiteContinuity offering.

Commvault released Commvault Disaster Recovery in July. Its automated failover and failback provide verifiable recoverability and reporting for monitoring and compliance. The software combines continuous data replication with automated DR capabilities, delivering sub-minute Recovery Point Objectives (RPOs) and near-zero Recovery Time Objectives (RTOs).

Commvault cites additional benefits for the software such as cloud migration, integration with storage replication, ransomware protection, smart app validation in a sandbox, and instant mounts for DevOps with data masking. The latter feature moves it into the copy data management area, competing with Actifio, Catalogic, Cohesity, Delphix and others.

Google builds out Cloud with Actifio acquisition

Google is buying Actifio, the data management and DR vendor, to beef up its Google Cloud biz. Terms are undisclosed but maybe the price was on the cheap side.

Actifio has been through a torrid time this year. The one-time unicorn refinanced for an unspecified sum at a near-zero valuation in May. It then instituted a 100,000:1 reverse stock split for common stock, which crashed the value of employees’ and ex-employees’ stock options.

Financial problems aside, Google Cloud is getting a company with substantial data protection and copy data management IP and a large roster of enterprise customers.

Matt Eastwood, SVP of infrastructure research at IDC, provided a supporting statement: “The market for backup and DR services is large and growing, as enterprise customers focus more attention on protecting the value of their data as they accelerate their digital transformations. We think it is a positive move for Google Cloud to increase their focus in this area.”

Google said the acquisition will “help us to better serve enterprises as they deploy and manage business-critical workloads, including in hybrid scenarios.” It also expressed commitment to “supporting our backup and DR technology and channel partner ecosystem, providing customers with a variety of options so they can choose the solution that best fits their needs.”

This all suggests Actifio software will still be available for on-premises use.

Ash Ashutosh, Actifio CEO, said in a press statement: “We’re excited to join Google Cloud and build on the success we’ve had as partners over the past four years. Backup and recovery is essential to enterprise cloud adoption and, together with Google Cloud, we are well-positioned to serve the needs of data-driven customers across industries.”


Actifio was started by Ashutosh and David Chang in July 2009. The company took in $311.5m in total funding across A, B, C, D and F rounds; the last of these was a $100m round in 2018 at a $1.3bn valuation.

What Actifio brings to Google Cloud

Google Cloud says Actifio’s software:

  • Increases business availability by simplifying and accelerating backup and DR at scale, across cloud-native, and hybrid environments. 
  • Automatically backs up and protects a variety of workloads, including enterprise databases like SAP HANA, Oracle, Microsoft SQL Server, PostgreSQL, and MySQL, as well as virtual machines (VMs) in VMware, Hyper-V, physical servers, and Google Compute Engine.
  • Brings significant efficiencies to data storage, transfer, and recovery. 
  • Accelerates application development and reduces DevOps cycles with test data management tools.

All-flash arrays shine in anaemic quarter for HPE storage

HPE revenues have returned to pre-pandemic levels – more or less – but data storage lags behind the rest of the business, with revenues down three per cent Y/Y to $1.2bn.

However, all-flash arrays (AFAs) and hyperconverged infrastructure were bright spots. AFA revenue grew 19 per cent Q/Q, driven by increased adoption of the Primera AFA, which was up 43 per cent Q/Q, and the Nimble AFA, which was up 27 per cent Q/Q. We don’t have Y/Y numbers for these two products.

Antonio Neri, HPE CEO.

In the earnings call CEO Antonio Neri said: “In storage, we have been on a multiyear journey to create an intelligent data platform from edge-to-cloud and pivot to software-as-a-service data storage solutions, which enable higher level of operational services attach and margin expansion. And our strategy is getting traction.

“Our portfolio is well positioned in high-growth areas like all-flash array, which grew 29 per cent year over year; big data storage, which had its sixth consecutive quarter of growth, up 41 per cent Y/Y; and hyperconverged infrastructure where Nimble dHCI, our new hyperconverged solution, continued momentum and gained share, growing 280 per cent Y/Y. We also committed to doubling down in growth businesses and investing to fuel future growth.”

HPE emphasised Q/Q growth to show it is climbing out of a pandemic-caused drop in revenues. Big Data grew 27 per cent Q/Q thanks to increased customer demand for AI/ML capability. Overall, storage accounts for 16.7 per cent of HPE’s revenues. (A minor point – in HPE’s compute business the Synergy composable cloud business grew five per cent Q/Q.)

CFO Tarek Robbiati said: “Our core business of compute and storage is pointing to signs of stabilisation, and our as-a-service ARR (annual recurring revenue) continues to show strong momentum aligned to our outlook.”

For comparison, NetApp yesterday reported Q2 revenues up 15 per cent Y/Y, while Pure Storage last week reported revenues down four per cent Y/Y.

HPE’s outlook is for a mid-single digits revenue decline Y/Y next quarter.

NetApp’s high-end AFA sales lead it out of pandemic recession

NetApp has posted its second successive quarter of revenue growth, thanks to an unexpected boost in high-end all-flash storage array sales.

The company recorded $1.42bn in revenues for its second fiscal 2021 quarter, ended October 30, 2020, three per cent higher than a year ago and above guidance. Net income fell 43.6 per cent to $137m.

CEO George Kurian said in a press statement: “I am pleased with our continued progress in an uncertain market environment. The improvements we made to sales coverage in FY20 and our tight focus on execution against our biggest opportunities continue to pay off.”

Quarterly revenue by fiscal year chart shows NetApp climbing out of a revenue dip

Highlights in the quarter included a 200 per cent jump in public cloud services annual recurring revenue (ARR) to $216m, and all-flash array run rate increasing 15 per cent to $2.5bn. NetApp said 26 per cent of its installed systems are all-flash, which leaves plenty of room to convert more customers to AFA systems.

Hardware accounted for $332m of the $749m product revenue, down 18 per cent Y/Y, with software contributing $417m, up 14 per cent. Product revenue in total declined three per cent Y/Y.

On the earnings call, CFO Mike Berry said the company is “on track to deliver on our commitment of $250m to $300m in fiscal ’21 Cloud ARR and remain confident in our ability to eclipse $1bn in Cloud ARR in fiscal ’25.”

The outlook for NetApp’s Q3 is $1.42bn at the mid-point, one per cent up on the same time last year. NetApp hopes that Covid-19 vaccination programs will lead the overall economy to growth after that in calendar 2021.

NetApp gives PowerStore a kicking

Kurian’s prepared remarks included this sentiment: ”We are pleased with the mix of new cloud services customers and growth at existing customers. We saw continued success with our Run-to-NetApp competitive takeout program, an important component of our strategy to gain new customers and win new workloads at existing customers.”

That program targets competitors’ product transitions, such as Dell’s Unity to PowerStore transition. Dell bosses recently expressed impatience about PowerStore’s revenue growth rate in its quarterly results.

Kurian talked about market share gains in the earnings call: “If you look at the results of all of our major competitors, [indiscernible], Dell, and HP, there’s no question we have taken share. I think our product portfolio is the best in the market.” He called out high-end AFAs as doing well – which was unexpected according to Berry. This drove NetApp’s “outperformance in product revenue and product margin”.

Kurian gave PowerStore a kicking when replying to an analyst’s question: “I think as not only we have observed, but many of our competitors have also observed, the midrange from Dell has not met expectations. It is an incomplete product. It is hard to build a new midrange system. And so it’s going to be some time before they can mature that and make that a real system. And you bet we intend to take share from them during that transition… We’re going to pour it on.”

Riding the disk replacement wave

NetApp’s AFA revenue growth should continue, according to Kurian. “We think that there are more technologies coming online over the next 18 to 24 months that will move more and more of the disk-based market to the all-flash market. We don’t think that all of the disk-based market moves to all-flash. But as we said, a substantial percentage of the total storage market, meaning let’s say 70 to 80 per cent will be an all-flash array portfolio.”

He is thinking of QLC flash (4bits/cell) SSDs as they enable replacement of nearline and faster disk drives. Kurian said: “QLC makes the advantage of an all-flash array relative to a 10k performance drive even better. So today, there are customers buying all-flash arrays, when they are roughly three times the cost of a hard drive. With QLC, that number gets a lot closer to one and a half to two times.”

Also, the “economics of all-flash are benefited by using software-based data management”.

Micron shrugs off Huawei hit, raises Q1 financial guidance

Micron has upped revenue guidance for the first quarter ended December 3 from $5bn-$5.4bn to $5.7bn-$5.75bn.

The US chipmaker has also increased its gross margin and EPS guidance for the quarter. Investors are happy and the stock price rose 5.7 per cent in pre-market trading.

Micron said it switched production from Huawei to other customers more quickly than it had previously anticipated. Huawei, hitherto Micron’s biggest customer, is subject to a US trade ban.

In addition, Micron may have recorded stronger than expected DRAM sales, according to Wells Fargo analyst Aaron Rakers.

It will be interesting to see if this is Micron-specific news or whether Samsung and SK hynix are also benefiting from an end-of-year boost.

Nvidia plots invasion of the storage controllers

Nvidia is adding storage controller functions to its BlueField-2 SmartNIC card and is gunning for business from big external storage array vendors such as Dell, Pure and VAST Data.

Update: Fungible comment added; 3 December 2020.

Kevin Deierling, Nvidia SVP marketing for networking, outlined in a phone interview yesterday a scheme whereby external storage array suppliers, many of whom are already customers for Nvidia ConnectX NICs, migrate to SmartNICs in order to increase array performance, lower costs and increase security. The BlueField SmartNIC incorporates ConnectX NIC functionality, making its adoption simpler.

Deierling said Nvidia is already having conversations with array suppliers; “BlueField is a superb storage controller. … We’ve demo’d it as an NVMe-oF platform … This is very clearly a capability we are able to support, at a cost point that’s a fraction of the other guys.”

As BlueField technology and functionality progresses, array suppliers could consider recompiling their controller code to run on the BlueField Arm CPUs. At this point the dual Xeon controller setup changes to a BlueField DPU system – more than a SmartNIC, and the supplier says goodbye to Xeons, Deierling declared. “It’s software-defined and hardware-accelerated. Everything runs the same, only faster.”

SmartNICs and DPUs

SmartNICs are an example of DPU (Data Processing Unit) technology. Nvidia positions the DPU as one of three classes of processors, sitting alongside the CPU and the GPU. In this worldview, the CPU is a general-purpose processor. The GPU is designed to process graphics-type instructions much faster than a CPU. The DPU acts as a co-ordinator and traffic cop, routing data between the CPU and GPU.

Kevin Deierling

Deierling told B&F that growth in AI use by enterprises was helping to drive DPU adoption; “One server box can no longer run things. The data centre is the new unit of computing. East-west traffic now dominates north-south traffic. … The DPU will ultimately be in every data centre server.”

He thinks the DPU could appear in external arrays as well, replacing the traditional dual x86 controller scheme.

In essence, an external array’s controllers are embedded servers with NICs (Network Interface cards) that link to accessing servers and may also link to drives inside the array. Conceptually, it is easy to envisage their replacement by SmartNICs that offload and accelerate functions like compression from the array controller CPUs.

DPU discussion

Today, the DPU runs repetitive network and storage functions within the data centre. North-south traffic is characterised as network messages that flow into and out of a data centre from remote systems. East-west traffic refers to network messages flowing across or inside a data centre.

East-west traffic grows as dataset sizes increase. It therefore makes increasing sense to offload repetitive functions from the CPU, and accelerate them at the same time, by using specialised processors. This is what the DPU is designed to do.

The DPU can run server controlling software, such as a hypervisor. For example, VMware’s Project Monterey ports vSphere’s ESXi hypervisor to the BlueField 2’s Arm CPU and uses it to manage aspects of storage, security and networking in this east-west traffic flow. VMware functions such as VSAN and NSX could then run on the DPU and use specific hardware engines to accelerate performance.

SoCs not chips

The DPUs will help feed the CPUs and GPUs with the data they need. Deierling sees them as multiple SoCs (system on chips) rather than a single chip. A controlling Arm CPU would use various accelerator engines to handle packet shaping, compression, encryption or deduplication, which could run in parallel.

Fungible, a composable systems startup, aggregates this work in a single chip approach, but this does not allow the engines to work in parallel, according to Deierling.

In rebuttal, Eleena Ong, VP of Marketing at Fungible, gave us her view: “The Fungible DPU has a large number of processor elements that work in parallel to run infrastructure computations highly efficiently, specifically the storage, network, security and virtualisation stack.

“The Fungible DPU architecture is unique in providing a fully programmable data path, with no limit on the number of different workloads that can be simultaneously supported. The high flexibility and parallelism occur inside the DPU SoC as well as across multiple DPU SoCs. Depending on your form factor, you would integrate one or multiple DPUs.”  

SoC it to Nebulon

Nebulon, another startup, is trying to do something similar to Nvidia with its ‘storage processing unit’ (SPU). This consists of a PCIe card carrying dual Arm processors and various offload engines that perform SAN controller functions at a lower hardware cost than an external array with dual Xeon controllers. That’s a match with Deierling’s definition of a DPU.

Nebulon SPU

To infinitesimal and beyond! Micron extends DRAM roadmap

Micron has updated its DRAM roadmap from three to four cell shrink stages, enabling more DRAM capacity per wafer and lowering costs per GB.

The US chipmaker intends to shrink the cell or process node size progressively through the following steps, moving from the no-longer-mainstream 20nm-class node to the 10nm class (19nm-10nm range):

  • 1Xnm – (c19-17nm) older DRAM technology process node size
  • 1Ynm – (c16-14nm) mainstream DRAM bit production technology today
  • 1Znm – (c13-11nm) 15 per cent of Micron DRAM bit production in 3Q20
  • 1αnm process node – 1 alpha – volume production in first half of 2021
  • 1βnm process node – 1 beta – in early development
  • 1ɣnm process node – 1 gamma – early process integration 
  • 1δnm process node – 1 delta – pathfinding and may need EUV technology

The 1 delta node size is a new entry on Micron’s DRAM roadmap. We do not have indicative process node sizes below 1Znm.

Micron 1 alpha nm DRAM technology

Bit density growth rate slowed with the transitions from 1Xnm to 1Ynm and 1Znm, Micron said. However, the company has accelerated the growth rate with a 40 per cent increase from 1Znm to the 1αnm process node size.

Wells Fargo analyst Aaron Rakers informs subscribers Micron has a strong position in 1Znm DRAM production. Citing the research firm DRAMeXchange, he estimates Micron’s 1Znm output at 15 per cent of its DRAM bit production in 3Q20 versus Samsung and SK hynix at six per cent and zero. 

Other things being equal, 1Znm DRAM costs less to manufacture than the preceding 1Ynm node.

In Violet

Micron uses Deep Ultra Violet (DUV) multi-patterning lithography to lay out DRAM cell die details on a wafer. As the process node size shrinks below the 10nm cell size level, the wavelength of the light beam becomes a constraint.

ASML, the dominant supplier of lithography machines for the chip industry, has developed EUV (Extreme Ultra Violet) scanners that emit shorter-wavelength light. This technology etches narrower lines on the wafer and so enables smaller process sizes – i.e. more DRAM dies per wafer, and consequently higher capacity per wafer and lower cost per GB. But the capital outlay is significant: ASML currently makes only 30 EUV lithography machines a year; they weigh 180 tons and cost $120m each.

Samsung uses EUV in its 1Znm process node, and SK hynix plans to use EUV technology for volume production of 1αnm and 1βnm DRAM. Micron thinks EUV will not be cost-competitive until 2023 or later, meaning the 1δnm process node.

APU maker claims 100x speedup vs. Xeon for big data similarity search

Similarity search, a key concept in data science, enables researchers to analyse huge volumes of unstructured data that is not accessible by conventional query search engines.

The technique entails examining the bit-level differences between millions or billions of database records to find content items that are similar to each other. Use cases include face recognition, DNA sequencing, researching drug candidate molecules, SHA1 algorithm hashes and natural language processing (NLP). Facebook’s FAISS library is a prominent example of similarity search in action.
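As a concrete illustration, a minimal FAISS sketch looks like the following, with randomly generated vectors standing in for real embeddings and the dimensions chosen purely for illustration.

```python
import numpy as np
import faiss  # Facebook's similarity search library mentioned above

d = 128                                                       # vector dimension
database = np.random.random((100_000, d)).astype("float32")   # stand-in records
queries = np.random.random((5, d)).astype("float32")          # stand-in queries

index = faiss.IndexFlatL2(d)   # exact, brute-force L2 similarity search
index.add(database)            # load the database vectors into the index
distances, ids = index.search(queries, 10)  # 10 nearest neighbours per query
```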

Typically, Xeon CPUs and GPUs are pressed into service to process similarity searches but neither technology is optimised for this work. When handling very large datasets there is a memory-to-CPU-core bottleneck.

A Xeon CPU can search one record at a time per core. When a Xeon CPU runs a similarity search, looking for occurrences of a search term in a target dataset, the dataset (or portions of it) is read into memory and the Xeon core, or cores, compare each entry with the search term. If the dataset is an image recognition database containing one billion records, that search can take a long time. It also has implications for power consumption.
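A hedged sketch of that linear scan over bit vectors (the sizes are illustrative, not a benchmark) shows why every record has to travel from memory to the cores.

```python
import numpy as np

# Illustrative linear scan: each record is streamed from memory to the CPU
# cores and compared with the query - the bottleneck described above.
records = np.random.randint(0, 2, size=(1_000_000, 768), dtype=np.uint8)  # bit vectors
query = np.random.randint(0, 2, size=768, dtype=np.uint8)

hamming = (records != query).sum(axis=1)  # per-record distance to the query
best_match = int(np.argmin(hamming))      # index of the most similar record
```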

A Nvidia GPU can throw more cores at the problem but even that takes too long when a billion-record database is involved.

So says GSI Technology, a Silicon Valley speciality memory maker, which has designed a parallel processing system dedicated to the single similarity search job. The company claims its Gemini ‘associative processing unit’ (APU) conducts similarity searches on ‘certain big data workloads’ 100 times faster than standard Xeon processors, while reducing power by 70 per cent.

Difference between a Xeon server and GSI Technology’s Gemini APU.

The APU locates compute units directly in a memory array so they can process data in parallel. This avoids moving the data from memory to a server’s Xeon CPU core, traversing Level 3, 2, and 1 caches on the way.

GSI’s Gemini APU accelerator is deployed as a server offload card – consider it an example of a data processing unit (DPU). It performs a specific function much faster than the server’s x86 processors and offloads that work from them, freeing them to run applications in virtual machines and containers.

According to the company, a GSI card with four onboard Gemini APUs found the matches for a scanned face in a billion-record database in 1.25 milliseconds. The records were quite large, with each containing a set of 768-bit vectors hashed from 96 facial image features. A Xeon server takes up to 125 milliseconds to do the same search, according to GSI.

The company said a 1U server with 16 Gemini chips performed 5.4 million hashes per second when running the 256-bit SHA1 algorithm. This is greater throughput than a 4U server holding eight Nvidia V100 cards and uses half the electrical power, according to GSI.

We have no price for a Gemini APU but single quantity pricing for GSI’s Leda-branded PCIe card with four Geminis mounted on it is $15,000.

GSI’s APU

The Gemini APU combines SRAM and two million bit-processors for in-memory computing functions. SRAM – short for Static Random Access Memory – is faster and more expensive than DRAM.

GSI interweaves 1-bit processing units with the read-modify-write lines of the SRAM in its Gemini chip. All these processors work in parallel.

GSI Gemini APU architecture.

In a Gemini chip, data flows directly from the memory cells into the adjacent bit-processors, and the search term is loaded onto each processor. The search term is then compared with the string loaded from the SRAM cells, with the two million-plus cores operating in parallel to compute Hamming* distance – greatly outperforming a Xeon core, or even 28 Xeon cores, doing the same work.

A Gemini chip can perform two million x 1-bit operations per 400MHz clock cycle with a 26 TB/sec memory bandwidth, whereas a Xeon 8280 can do 28 x 2 x 512 bits at 2.7GHz with a 1TB/sec memory bandwidth.

APU comparison table from GSI-sponsored white paper

For comparison, a Nvidia A100 GPU server can complete 104 x 4,096 bits per 1.4GHz clock cycle, providing a 7TB/sec memory bandwidth. The Gemini chip’s memory bandwidth leaves the A100 trailing in the dust, and the Xeon CPU is even further behind.
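Turning those per-clock figures into rough bit-operations per second gives a back-of-the-envelope comparison (a sketch only; the memory bandwidth numbers quoted above are a separate measure).

```python
# Rough bit-operations per second implied by the per-clock figures above.
gemini    = 2_000_000 * 1 * 400e6   # 2M 1-bit ops at 400MHz  -> ~8.0e14 bit-ops/s
xeon_8280 = 28 * 2 * 512 * 2.7e9    # 28 x 2 x 512 bits at 2.7GHz -> ~7.7e13
a100      = 104 * 4096 * 1.4e9      # 104 x 4,096 bits at 1.4GHz  -> ~6.0e14

print(f"Gemini vs Xeon 8280: {gemini / xeon_8280:.0f}x")  # roughly 10x
print(f"Gemini vs A100:      {gemini / a100:.1f}x")       # roughly 1.3x
```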

*Hamming distance: When a computer runs a search it deals with a search term, represented as a binary string, and looks for equivalent or similar strings. The similarity can be expressed as the difference between the search term string and a target string – the number of bit positions in which the two strings differ.

It works like this: envisage two strings of equal length, 1101 1001 and 1001 1101. XOR them together – 11011001 ⊕ 10011101 = 01000100. The result contains two 1s, so the Hamming distance is 2. Other things being equal, strings with smaller Hamming distances are more likely to represent things that are similar than strings with greater Hamming distances. The ‘things’ can be facial recognition images, genomes, drug candidate molecules, SHA1 algorithm hashes and so forth.
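In code, that calculation is a one-liner – a minimal sketch using Python integers:

```python
def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two equal-length bit strings differ."""
    return bin(a ^ b).count("1")  # XOR marks the differing bits, then count them

print(hamming_distance(0b11011001, 0b10011101))  # -> 2, as in the example above
```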

Portworx and Mayastor fly highest in Kubernetes block storage


A study of Kubernetes storage supplier performance has revealed that the efficiency of a supplier’s storage code is exposed in cloud-native environments, with a knock-on effect on performance.

In a recently updated review, Jakub Pavlík of Volterra determined that Portworx and MayaData’s OpenEBS Mayastor performed best in delivering block storage IO to containers.

It is not immediately obvious why the efficiency of different Kubernetes storage providers should differ. A cloud-native server has a hypervisor/operating system core running containers, and containers should be efficient users of hardware resources. There is little or no OS duplication – unlike a virtual server, which has a hypervisor running guest virtual machines, each with its own operating system as well as application code.

Pavlik’s evaluation of block storage for containers also included Ceph Octopus, Rancher Labs Longhorn, Gluster FS and Azure PVC. Ceph was third fastest in container storage speed for the Azure Kubernetes Service (AKS).

Portworx was ahead of every other supplier with random reads. 

Supplier comparison for random reads and random writes

Mayastor was ahead with mixed read and writes.

Supplier comparison with 50/50 mixed read/writes.

Mayastor benchmarks

MayaData’s Mayastor is based on OpenEBS, an open source CNCF project that it created. OpenEBS is a foundational storage layer that enables Mayastor and others to abstract storage in a way that Kubernetes abstracts compute. 

The company earlier this month published some storage performance benchmarks of OpenEBS Mayastor working in tandem with NVMe and Intel Optane SSDs.

MayaData established baseline Optane performance using the Fio flexible IO tester from GitHub, obtaining 585K, 516K and 476K IOPS respectively for random read, random write and 50/50 mixed read/write workloads from an Optane SSD with an NVMe interface.

Then Mayastor provided storage to containers, reading from and writing to the Optane drive using NVMe-oF as the data transport across a network. MayaData measured the delivered IOPS and found little difference from the baseline (1 – 5.6 per cent).

Mayastor delivered IOPS across NVMe-oF from Optane SSD

Snowflake embraces data programming (and Slootman writes another book)

Cloud data warehouser Snowflake has updated its product with access to unstructured data, data service providers, expanded data ingress and row access policies. Also, CEO Frank Slootman has written another book.

Let’s deal with the book first. With co-author Steve Hamm, Slootman has penned “Rise of The Data Cloud.” You can read a sample chapter to see if you want it as a Christmas present ($16.84 hardback on Amazon). Slootman clearly likes the writing – or ghost-writing – lark. In 2009 he wrote “TAPE SUCKS: Inside Data Domain, A Silicon Valley Growth Story”. Now back to Snowflake.

The company, which enjoyed the biggest ever software IPO in September, is broadening its data ingest pipeline and increasing services for processing customer data in its warehouse.

Benoit Dageville

“Many of today’s organisations still struggle to mobilise all of their data in service of their enterprise,” Snowflake co-founder and president of products Benoit Dageville said in a statement.

“The Data Cloud contains a massive amount of data from Snowflake customers and commercial data providers, creating a powerful global data network effect for mobilising data to drive innovation and create new revenue streams.”

Data in the Snowpark

Snowpark is a new data ingress portal in which data engineers, data scientists and developers can write code in their languages of choice, using familiar programming concepts, and then execute workloads such as ETL/ELT, data preparation and feature engineering on Snowflake. It brings more data pipelines into Snowflake’s core data platform and is currently available in testing environments.
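A minimal sketch of the Snowpark DataFrame style is shown below; the connection parameters and the ORDERS table are hypothetical, and Snowpark launched with Scala support first – the Python API shown here follows the same DataFrame pattern.

```python
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col

# Hypothetical connection parameters.
session = Session.builder.configs({
    "account": "my_account",
    "user": "my_user",
    "password": "my_password",
    "warehouse": "my_wh",
    "database": "my_db",
    "schema": "public",
}).create()

# The filter and aggregation are pushed down and executed inside Snowflake,
# rather than pulling raw rows out to the client.
orders = session.table("ORDERS")
big_orders = orders.filter(col("AMOUNT") > 1000).group_by("REGION").count()
big_orders.write.save_as_table("BIG_ORDERS_BY_REGION", mode="overwrite")
```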

The company has added more than 100 data service providers to the Snowflake Data Marketplace, which enables customers to discover and access live, ready-to-query, third-party data sets, without needing to copy files or move the data. Services such as running a risk assessment, behavioural scoring, predictive and prescriptive data analysis can be outsourced to a data service provider.

Snowflake has announced private preview support for unstructured data such as audio, video, PDFs, imaging data and more, which will provide the ability to orchestrate pipeline executions of that data. This looks like support for specific types of unstructured data rather than general file storage.

Upcoming row access policies will allow Snowflake customers to create policies that restrict the result sets returned when queries are executed. By creating an umbrella policy to restrict access to row data in a database, users no longer need to ensure that each query contains all the right constraints. This feature is slated for private preview before the end of the year.
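A hedged sketch of how such a policy might be created via Snowflake’s Python connector follows; the account, table, column and policy names are hypothetical, and since the feature is still pre-preview the exact policy syntax may differ at release.

```python
import snowflake.connector  # Snowflake's Python connector

# Hypothetical credentials and object names.
conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="my_password",
    warehouse="my_wh", database="my_db", schema="public",
)
cur = conn.cursor()

# Umbrella policy: only the SALES_ADMIN role sees all rows; everyone else
# is restricted to the EMEA region, without changing their queries.
cur.execute("""
    CREATE OR REPLACE ROW ACCESS POLICY region_policy AS (region STRING)
    RETURNS BOOLEAN ->
        CURRENT_ROLE() IN ('SALES_ADMIN') OR region = 'EMEA'
""")
cur.execute("ALTER TABLE orders ADD ROW ACCESS POLICY region_policy ON (region)")
```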