
HPE ushers in Nutanix Era: Fancy a swim in our everything-as-a-service GreenLake?

HPE GreenLake

HPE has made a pitch to database admin staffers by bundling Nutanix’s Era multi-database management tool with ProLiant servers as a fully managed cloud service through its GreenLake subscription offering.

This builds on the existing GreenLake Nutanix deal announced in October 2019. The pair said the combination would enable customers to deploy applications and databases in minutes, with the cloud’s elasticity and pay-per-use pricing, along with the governance, visibility and compliance of an on-premises environment.

Keith White

Keith White, SVP and GM, HPE GreenLake Cloud Services, provided the statutory statement: “By building on our successful collaboration with Nutanix, together the HPE GreenLake and the Nutanix Era database operations and management software solution will increase agility, simplify operations and cut costs by delivering a fully managed cloud offering.”

The two claim Era customers can increase database provisioning speed by “97 per cent”, reduce unplanned downtime, lower storage requirements for copies and backups, and lower database administrators’ overtime work. To be clear, that’s DB admins’ overtime work, not their standard work. It seems overtime for DB admins is axiomatic.

Supported databases include Oracle, Microsoft SQL Server, MySQL, PostgreSQL, and MariaDB.

HPE and Nutanix said they had seen an approximate 80 per cent year-on-year increase in ACV (annual contract value) bookings for Nutanix hypervisor-based systems during the first calendar quarter of 2021, with wins for ProLiant DX servers and GreenLake with Nutanix. The idea is that adding Era to the mix keeps this going.

Roll-your-own server crew at Liqid add vCenter composable server plug-in

VMware vCenter users can now compose server systems using Liqid’s Matrix fabric software via a plug-in.

Liqid uses its x86-based Matrix software, controlling a PCIe switch and fabric, to dynamically compose servers from separate pools of base server (CPU+DRAM) nodes, GPUs, NVMe storage, Optane SSDs and network interface cards. These composed servers are presented to operating systems and applications as exactly equivalent to static bare-metal servers, with no change to any upstream software.

The idea is that server component utilisation increases because components can be dynamically allocated to composed servers designed to run specific workloads, with no stranded resources. Until now, vCenter users have had to use Liqid’s own UI to compose the hardware infrastructure on which vSphere VMs will run. Now they will be able to do it within the vCenter environment.

Liqid co-founder and CEO Sumit Puri told an IT Press Tour briefing that: “Static infrastructure is dead. The lights-out, dynamic data centre is the future.”

A demo showed a vCenter admin moving server components, such as GPUs, from one virtual machine to another. Puri said vSphere will be Liqid’s first supported hypervisor and others will follow, such as Nutanix’s AHV, KVM and Hyper-V.

Liqid vSphere client screenshot

Blocks & Files asked if Liqid will support composing disk drive storage for servers now that the NVMe 2.0 specification adds rotating media to the NVMe support list. “Absolutely. 100 per cent yes” was the reply.

Partnerships

Liqid has an OEM agreement with Dell and supports its MX7000 box of composable blade servers. It is working with Inspur, the world’s number three server manufacturer, and also an unnamed Japanese server vendor.

This relationship will help it enter the Japanese market as part of an Asia-Pacific geographic expansion, and it’s also hoping to have a presence in Australia. The company recently set up an EMEA office.

It is not partnering with HPE, which has its own Synergy composable infrastructure. Puri said Synergy was a $2bn business for HPE and differed from Liqid’s Matrix in that it disaggregated server components inside a converged infrastructure system. Puri showed a slide discussing legacy (Synergy) and future (Matrix) composability attributes in response to a question about his views on Synergy:

Phone capture of Puri slide

Matrix composes servers from separate pools of disaggregated components – different chassis full of NICs, GPUs, Optane drives, NAND SSDs, and so forth. Synergy, by contrast, supports SATA and SAS drives and is not PCIe fabric-based.

 Blocks & Files thinks that, for Liqid, Synergy is gen 1 composability, Matrix is gen 2 composability and, when the CXL bus arrives, that will signal gen 3 composability.

Liqid directions

Puri said Liqid is developing policy-based composability, such that composed systems can be automatically adjusted to meet conditions. For example, if GPU utilisation goes above 80 per cent then add another GPU to the composed configuration. This requires telemetry from the composed servers with software recognising changing conditions and applying policies.
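As a rough illustration of what such a policy might look like in practice, here is a minimal sketch of a telemetry-driven policy loop. The threshold, polling interval, and the telemetry and compose functions are all hypothetical stand-ins, not Liqid’s actual API.

```python
import random
import time

# Hypothetical values; Liqid's real policy engine and fabric API are not
# described here, so the telemetry and compose calls are stubbed out.
GPU_UTIL_THRESHOLD = 80.0   # per cent, per the example policy above
POLL_INTERVAL_SECS = 60

def read_gpu_utilisation(server_id: str) -> float:
    """Stub telemetry source; a real system would query the composed server."""
    return random.uniform(0.0, 100.0)

def attach_gpu_from_pool(server_id: str) -> None:
    """Stub compose operation against the fabric's free GPU pool."""
    print(f"policy triggered: attaching a spare GPU to {server_id}")

def policy_loop(server_id: str, cycles: int = 5) -> None:
    """Poll telemetry and apply the policy a fixed number of times."""
    for _ in range(cycles):
        if read_gpu_utilisation(server_id) > GPU_UTIL_THRESHOLD:
            attach_gpu_from_pool(server_id)
        time.sleep(POLL_INTERVAL_SECS)

if __name__ == "__main__":
    policy_loop("composed-server-01")
```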

A next stage would be to have machine learning models developed to automatically compose servers for applications; to have machines defining machines.

Specific road map items include adding additional hypervisor support and working with Nvidia, “closely” Puri said, on a GPU-over-Fabrics concept, similar to NVMe-oF and its use of RDMA over Ethernet. 

Liqid is quite lightly funded, having raised around $60m since being founded in 2015. It last raised $25m in a B-round in 2019. Puri said Liqid may have another fund raising round as it grows: “There’s $150 billion of total addressable market out there. … We’ll aim for an IPO.”

Voracious Snowflake partners again to get even more data into its hands and have AI apps analyse it

Yesterday, cloud data warehouser Snowflake announced partnerships with BigID, Hammerspace and Talend, covering sensitive information masking, file ingestion, and data integrity respectively. Today we have two more: C3 AI, for enterprise AI app development that can operate on Snowflake-stored data, and Informatica, for extra data loading and extract, transform and load (ETL) capabilities.

Snowflake customers can use C3 AI app development facilities to run C3 AI-based analyses on their Snowflake-stored data, and that data mountain can be heaped even higher using Informatica ETL processes.

C3 AI

Snowflake customers will be provided with access to the C3 AI Suite and pre-built C3 AI applications for AI-based CRM, predictive maintenance, supply network optimisation, and fraud detection. These apply across a range of industries and enterprise AI use cases.

C3 AI software helps with Snowflake customer AI deployments by unifying Snowflake’s data, platform, and machine learning services through a model-driven, reusable, and extensible object system. It also has data virtualisation capabilities which obviate the need to replicate datasets.

The software includes pre-built, cross-industry object models with capabilities that include energy management, equipment and process reliability, inventory optimisation, and yield optimisation. There are such pre-built object models for industry verticals including manufacturing, financial services, oil and gas, utilities, telecommunications, and aerospace and defence.

C3 AI president and Chief Product Officer Houman Behzadi said “Ultimately, this partnership will create significant time and operational efficiencies for Snowflake’s customers and solidify Snowflake as the operational data platform of choice for enterprise AI applications.”

For more information on C3 AI and enterprise AI solutions, visit https://c3.ai/what-is-enterprise-ai/.

Informatica

Informatica provides an Intelligent Data Management Cloud product which can function as an ETL data onramp into Snowflake and integrates natively with Snowflake’s Java-based User Defined Functions (UDF).

More specifically, it has a cloud-native, wizard-driven mass ingestion service to rapidly load and synchronise application data into Snowflake. Source apps include SAP, Salesforce, NetSuite, Zendesk, Microsoft Dynamics 365, Workday, Marketo, ServiceNow and Google Analytics.

Customers can transform, cleanse and govern this application data, moving it into Snowflake without any hand-coding or having to assemble their own end-to-end process workflow from disparate systems.  
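For contrast, a hand-coded load into Snowflake via the Python connector might look like the sketch below; the stage, tables and connection details are hypothetical, and this is exactly the kind of plumbing Informatica’s wizard-driven service is pitched as replacing.

```python
import snowflake.connector

# Hypothetical connection parameters and object names, for illustration only.
conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",
    warehouse="LOAD_WH", database="SALES", schema="RAW",
)
cur = conn.cursor()

# Load previously staged CSV files into a raw table ...
cur.execute(
    "COPY INTO raw_orders FROM @orders_stage "
    "FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)"
)

# ... then apply a simple cleanse/transform step into a curated table.
cur.execute("""
    INSERT INTO clean_orders
    SELECT order_id, TRIM(customer_name), TRY_TO_DATE(order_date)
    FROM raw_orders
""")
conn.close()
```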

Tarik Dwiek, Snowflake’s Head of Technology Alliances, said “With the enterprise-grade mass-ingestion capabilities and the new support for Snowflake’s Java UDFs that Informatica is bringing to the Snowflake Data Cloud, our customers will have the ability to load data from nearly anywhere.”

Snowflake customers can sign up via Snowflake Partner Connect to start with Informatica at no cost to process up to one billion rows of data per month.

Comment

Snowflake is moving at high speed to set up even broader data ingest facilities than it already has (Hammerspace, Informatica), and to provide its customers with more ways to validate the data’s integrity (Talend), mask out sensitive personal information (BigID), and develop AI-based analytical apps (C3 AI).

It aims, we think, to have so many hooks into — and partnerships with — enterprise data-handling suppliers that it will become easier to use Snowflake than not use it. Snowflake will become so deeply embedded in the grain of the enterprise that users will be at a disadvantage if they don’t become Snowflake customers. Where will Snowflake be in the enterprise data-handling space? Everywhere.

Snowflake builds partnerships for data ingestion, masking, and integrity

Cloud data warehouse supremo Snowflake is setting up partnerships with Hammerspace, Talend, and BigID to broaden its ecosystem and spread its data ingesting and analysing tentacles further across the IT world.

The aim is to enable customers to be able to move file data into Snowflake (Hammerspace), check its accuracy and reliability (Talend), and ensure sensitive and private information is identified and protected (BigID).

BigID

This Israel-based startup has developed a Data Access App for Snowflake to natively enforce data access and masking within Snowflake’s Data Cloud, based on data discovery and policy controls inside BigID.

Customers can discover, classify, and automate protection of sensitive and regulated data using Snowflake’s native security and governance policies. The app can:

  • Automatically identify and classify sensitive PII, PHI, NPI, IP, and other hard to find sensitive and regulated data stored in Snowflake
  • Define access rules in BigID enforced by Snowflake for who gets access to the data
  • Dynamically mask sensitive data without a proxy
  • Automate and enforce policies in a unified metadata catalog
  • Define, apply, and enforce access policies based on role, sensitivity, and category
  • Reduce risk and address data privacy and protection regulations such as GDPR, CCPA, NIST, and others.
BigID dynamic masking in Snowflake (screenshot).
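Under the covers, BigID’s app drives Snowflake’s native dynamic data masking. As a generic illustration of that underlying mechanism (the policy, role, table and connection details below are hypothetical, and this is not BigID’s code), a masking policy can be created and attached like so:

```python
import snowflake.connector

# Hypothetical connection and object names; shows Snowflake's native dynamic
# data masking, which BigID automates via its discovery and policy controls.
conn = snowflake.connector.connect(account="my_account", user="my_user", password="...")
cur = conn.cursor()

# Reveal email addresses only to a privileged role; everyone else sees a mask.
cur.execute("""
    CREATE MASKING POLICY email_mask AS (val STRING) RETURNS STRING ->
      CASE WHEN CURRENT_ROLE() IN ('PII_ADMIN') THEN val ELSE '*** MASKED ***' END
""")
cur.execute("ALTER TABLE customers MODIFY COLUMN email SET MASKING POLICY email_mask")
conn.close()
```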

Jon Mayer, BigID’s VP for Business Development & Partnerships, said in a canned quote, “This gives regulated organisations and companies in geographies with data sovereignty, privacy and security obligations a more streamlined experience for managing data in Snowflake.”

Snowflake Head of Product Management — Security, Vikas Jain, said “The integration runs on autopilot, allowing customers to spend more time using the data and less time implementing data protection.”

A BigID blog has more information.

Hammerspace

Hammerspace has joined the Snowflake Partner Network, and Tony Asaro, SVP of Business Development for Hammerspace, said “The combination of Snowflake and Hammerspace creates structure around unstructured data, providing greater control and utility than ever before. It’s a real game-changer.”

Hammerspace’s global file system replicates file data from any source, via NFS and SMB protocols, to be viewable, accessible, and sharable through the Snowflake Data Cloud. Customers can then include file data for any of Snowflake’s core workloads: data warehousing, data lakes, data engineering, data science, data sharing, or data applications. Additionally, Hammerspace imports customisable file metadata so users can run queries based on file names, MIME types, attributes, labels, or keywords.

Hammerspace is promoting the concept of “storage-less storage”. It claims that its customers, by replicating file data into Snowflake, can reduce storage maintenance to near-zero. 

Snowflake customers deploying Hammerspace can simplify their hybrid cloud NAS storage experience and reduce the overall cost of their file data.

Tarik Dwiek, Director of Technology Alliances for Snowflake, issued a statement saying “Our customers increasingly want Snowflake to be the single source of truth for all of their data for governance, research, and analytics. While still in preview, the Hammerspace solution provides an intelligent, scalable, and easy way to get unstructured data into Snowflake that we believe will provide tremendous value for our customers.”

Talend

Talend, with 6500 customers, supplies a corporate Data Fabric product to provide data integration and governance with a trust score indicating a dataset’s reliability. The partnership with Snowflake means Talend’s data profiling algorithms can check the integrity, accuracy and reliability of data inside the Snowflake data warehouse. 

It can then generate a Talend Trust Score for Snowflake, using Snowflake’s Snowpark development environment and Java UDFs (User-Defined Functions). Snowflake customers will be able to run quality checks on entire data sets without the use of external applications or moving sample sets. 

Talend Snowflake diagram.
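As a rough illustration of this check-the-data-where-it-lives pattern (not Talend’s actual scoring algorithm; the table, columns and connection details are hypothetical), a simple completeness profile can be computed entirely inside Snowflake:

```python
import snowflake.connector

# Hypothetical table and columns; COUNT(col) counts non-null values, so each
# ratio is a crude completeness score computed without extracting any rows.
conn = snowflake.connector.connect(account="my_account", user="my_user", password="...")
cur = conn.cursor()
cur.execute("""
    SELECT
        COUNT(*)                               AS total_rows,
        COUNT(email)    / NULLIF(COUNT(*), 0)  AS email_completeness,
        COUNT(postcode) / NULLIF(COUNT(*), 0)  AS postcode_completeness
    FROM customers
""")
total_rows, email_score, postcode_score = cur.fetchone()
print(total_rows, email_score, postcode_score)
conn.close()
```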

A statement from Rolf Heimes, Talend’s Global Head of Business Development, said “Data has become a black box that leaves us open to inefficiencies, missed opportunities, and even catastrophic failure. … Talend has developed Trust Score for Snowflake to give Snowflake users an easy way to keep their business safer and bring confidence to every decision.”

Snowflake and Talend say their partnership will also make it easier for organisations to address compliance rules and regulations.

A Talend blog provides more information.

It’s got your back: Cohesity launches DataProtect-as-a-Service in Europe

Cohesity has expanded its Backup-as-a-Service (BaaS) geographical coverage from the USA to Europe, using AWS data centres in the region, with a Disaster Recovery service coming along in a few months or so.

It launched its DataProtect-as-a-Service offering in the USA in October last year, saying it was part of an overall DMaaS (Data Management-as-a-Service) strategy for its product offerings.

Cohesity’s Richard Gadd.

Richard Gadd, Cohesity’s VP for EMEA sales, alerted Cohesity’s partners, saying “Not only does the expansion of our SaaS offering to Europe empower customers to further simplify data management, but also gives our European partners the opportunity to add their unique value and resell the solution through our distribution channels or via AWS Marketplace.”

Cohesity DataProtect-as-a-Service is managed through the Helios cloud-based console and supports data sources such as VMware, Network Attached Storage (NAS), Microsoft 365 SaaS applications, Amazon Relational Database Service, Elastic Compute Cloud instances and compute infrastructure.

Cohesity DMaaS. (Screenshot)

The SiteContinuity DRaaS offering will enable automated disaster recovery of applications and data to the AWS cloud, using on-demand cloud resources instead of a secondary data centre.

Cohesity will still sell its on-premises cluster products and they can be used with its SaaS offerings in a hybrid system and managed through the Helios UI. We can expect more examples of Cohesity’s product offerings moving to the SaaS model, using its preferred cloud provider Amazon, in the next several quarters. We think DataProtect-as-a-Service and SiteContinuity-as-a-Service can also be expected to pop up in other regions of the globe as well, such as Asia-Pacific.

StorONE’s SSD supply crisis survival guide

StorONE has announced there is a current and growing SSD supply crisis, caused by the pandemic and cryptocurrency mining and farming. It has a user survival strategy: use its storage.

It says SSD pricing has increased by 30 per cent, and buyers are seeing eight-week delays so far in 2021, due in part to worldwide manufacturing facility shutdowns caused by the COVID-19 pandemic. Limited production has resumed, but current supply is being stockpiled by the largest technology companies. 

StorONE CEO Gal Naor.

Gal Naor, CEO and co-founder of StorONE, said in a canned quote “We expect the all-flash array segment to dramatically change in the near future due to shortages and price increases, and users need to be prepared for an ongoing crisis. AFAs (all-flash arrays) and other enterprise flash solutions will be very costly and customers need to manage their purchasing carefully.”

Gloomy StorONE says depleted inventories, long waits and production challenges are likely to cause much worse and far more prolonged shortages. It says drive vendors expect high $/TB flash pricing to persist through next year, especially because of bitcoin mining and Chia farming.

What can users do? Make better use of their SSDs by only storing primary data on them — auto-tiering older data off to disk or the cloud. That means using hybrid flash/disk arrays where possible instead of all-flash arrays. They should also look for storage software that can minimise wasted SSD capacity by supporting 90 per cent or higher drive utilisation rates.

That means you should be able to buy less capacity now, bulking up in the future when prices hopefully fall.

Naor says “With such a high price per terabyte, there’s no way to justify purchasing total capacity in advance. Buy it when prices go back to normal, and in the meantime, use tiering to conserve the flash capacity you have.” 
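A toy sketch of that kind of age-based tiering rule is below; the 30-day threshold and the flash mount point are made-up examples, and real arrays apply such policies internally rather than via a script.

```python
import os
import time

# Hypothetical policy: files not read for 30 days are candidates to leave flash.
COLD_AFTER_SECS = 30 * 24 * 3600
FLASH_TIER_PATH = "/mnt/flash_tier"   # made-up mount point

def cold_files(directory: str):
    """Yield file paths whose last access time is older than the threshold."""
    now = time.time()
    for entry in os.scandir(directory):
        if entry.is_file() and now - entry.stat().st_atime > COLD_AFTER_SECS:
            yield entry.path

if __name__ == "__main__":
    for path in cold_files(FLASH_TIER_PATH):
        print(f"candidate for the disk or cloud tier: {path}")
```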

Use storage that supports multiple access protocols (NVMe, FC, iSCSI, NFS, SMB and S3) instead of having drives allocated to specific protocols. That should lead to fewer SSDs overall.

Naor’s view is that “Companies can survive the current crisis by using a storage platform that delivers optimal performance from as few SSDs as possible, rather than relying on all-flash arrays. Staying open and flexible, implementing new storage management tools, and being able to use any commodity flash that’s available will help everyone weather the storm.” 

What storage platform would be suitable? StorONE suggests its own of course, the S1 Enterprise Storage Platform, which can support both AFA and hybrid configurations.

Weka beats DDN, Supermicro Optane and Dell EMC PowerScale in stacks of STAC benchmarks

STAC benchmarks show WekaIO filesystem software running on Amazon cloud EC2 instances can deliver record-breaking tick analytics performance for the financial services industry, beating on-premises DDN EXAScaler, Dell EMC PowerScale and Supermicro Optane systems.

The STAC benchmarks are designed and run specifically for the financial services industry, with results audited by the STAC Benchmark Council. The council consists of over 400 financial institutions and 50 vendor organisations. STAC benchmarks are rigorous, with extensive documentation and none of the potential fudges used in some other storage performance benchmarks.

Shailesh Manjrekar, Head of AI and Strategic Alliances at WekaIO, said “Capital markets around the world need to analyse more data in less time at the best economics. Public cloud environments were not considered suitable for these workloads due to stringent latency and performance requirements.”

The latest STAC benchmarks show that AWS can be used instead of or augmenting on-premises servers. A supportive quote from Eric Burgener, Research VP, Infrastructure Systems, Platforms and Technologies Group, IDC, said “WekaIO’s latest benchmark results using STAC-M3 show that the vendor can deliver record-breaking performance not only on in-house infrastructure but also in the public cloud using Amazon EC2 NVMe instances.”

Weka STAC history

Weka produced good STAC benchmark results in June 2019, using the M3 baseline Antuco and scaling Kanaga suites for high-speed analytics on time series data, running on on-premises Penguin servers. It produced more STAC wins in June last year using a setup with 32 HPE ProLiant servers.

Now it has repeated its benchmark-topping performance, with the M3 Antuco and Kanaga suites, using AWS cloud compute instances instead of on-premises servers.

Test details

This testing was performed on Amazon EC2 Non-Volatile Memory Express (NVMe) instances using a kdb+ 4.0 database by KX Systems. There were 15 database server nodes and 40 storage nodes.

Weka says its software:

  • Outperformed all publicly disclosed results in three of the five throughput benchmarks in the STAC-M3 Kanaga suite (STAC-M3.β1.1T.{3,4,5}YRHIBID.BPS)
  • Outperformed all publicly disclosed results in three of 24 mean-response-time benchmarks in the STAC-M3 Kanaga suite
  • Was faster in 16 of 24 Kanaga and nine of 17 Antuco benchmarks versus a kdb+ 4.0 solution running on a ten-node cluster with 60TB of Optane persistent memory (KDB200603),
  • Was faster in 20 of 24 Kanaga benchmarks and four of 17 Antuco benchmarks versus a kdb+ 3.6 solution on a DDN EXAScaler parallel file system with 15 database servers accessing all-flash storage appliances (KDB200915),
  • Was faster in 15 of 17 Antuco benchmarks versus a kdb+ 3.6 solution involving nine Dell EMC PowerScale F200 database servers accessing networked flash storage (KDB200914).

Weka says these results show Quants and FSI professionals can run algorithmic trading, quantitative analytics, and back testing use cases in the AWS cloud, getting the low latency they need while taking advantage of AWS elasticity and scalability.

Interested readers can download or view the full KDB210507 STAC report here.

Chia space crypto-mining puts Seagate disk drive revenues in a spin

Disk drive maker Seagate has upped its revenue forecast for the current quarter, with its CFO telling investors there has been “an increase in the crypto farming demand.”

Chia is the basis for the bitcoin-like chiacoin cryptocurrency. It relies on a linked blockchain of mathematical constructs based on proof of having disk (or SSD) space over a time period. This is said to be less demanding in electricity terms than bitcoin, which relies on proof-of-work, requiring large amounts of CPU processing.

Yesterday Seagate increased its Q4 revenue and EPS expectations via an SEC filing. Having previously expected revenues of $2.85bn (give or take $150m), it now expects them to be $2.95bn (+/- $150m), a $100m uplift. Expected EPS will also jump, from $1.60 +/- $0.15 to $1.85 +/- $0.15 – a $0.25 increase.

Gianluca Romano

Seagate CFO Gianluca Romano told a Bank of America Merrill Lynch 2021 Global Technology Conference that Seagate was having a good current (fourth) quarter: “We were expecting a very strong near line market demand, both on the cloud and enterprise OEM side, and we see that happening; so really good.” Oddly, “On the legacy side, I would say mission critical consumer are also performing well.”

The chia-related demand is a bonus on top of this. Romano also said the increased demand led to better pricing and better manufacturing capacity utilisation: “This is helping us in utilising some of the capacity now that we’re discussing in the last six to nine months, but was not fully utilised. We had — the industry added too much capacity for a few years.”

Research firm Context said this week that the launch of the cryptocurrency has caused demand for hard disk drives in the European market to blow up, with figures for April showing just under 200,000 enterprise-grade nearline storage drives of 10TB capacity and above sold to end users across the region, a 240 per cent growth compared with the same month in 2020.

Meanwhile, NAS consumer-grade HDDs saw around 250,000 units sold, a year-on-year increase of 167 per cent.

Romano doesn’t know how long the chia demand will last but is happy Seagate can use its manufacturing capacity for longer than expected. Romano also said the chia demand should positively impact all HDD suppliers: “I think is good news, an improvement for the full industry.” That means Western Digital and Toshiba should benefit as well.

While that’s all very well for the firms, it also means higher prices for ordinary business and consumer HDD buyers, ones not involved in chia farming, and ones without long-term supply agreements.

If the chia demand remains strong for another two or three quarters, then Seagate could look to increase its manufacturing capacity.

Seagate shares were priced at $84.05 on May 12 and are now at $100.82, having peaked at $105.2 on May 17.

Dell goes on Epyc Azure Stack HCI excursion

Dell has added second-generation AMD Epyc processors to its Azure Stack HCI systems and made them easier to deploy and manage.

Azure Stack is Microsoft’s Azure public cloud Hyper-V and HCI stack software running in Microsoft partners’ hardware.

Puneet Dhawan.

The news was announced by Dell’s Director of HCI Product Management Puneet Dhawan. Dell has put second-gen Epyc processors in its PowerEdge server-based AX 6515 and AX 7525 nodes. Maybe they’ll get gen 3 Epycs in a future release?

Dhawan blogged: “The high-density, high performance CPUs allow you to handle challenging workloads within a small infrastructure footprint, whether you are running resource intensive workloads with high performance storage needs or running lightweight applications at the edge.”

Dell now factory-installs the Azure Stack HCI OS on the hardware and has added a call-home feature to the software. The Azure Stack systems are managed through a Windows Admin Center extension into which Dell has integrated its Dell EMC OpenManage software. This now provides cluster-aware updating of the OS, BIOS, firmware and drivers.

Dell EMC Integrated System for Microsoft Azure Stack HCI
Dell Azure Stack HCI node.

There are a raft of security features including AMD Infinity Guard, secure erase and secure boot, and the ability to lock the server configuration and firmware.

If customers can withstand the siren calls of VMware vSAN and Dell’s VxRail HCI systems then Azure Stack HCI has just become more attractive.

NetApp goes all-in on hybrid cloud

NetApp appears to be making hybrid cloud a core focus, having recently made a slew of product announcements covering core storage products, FlexPod converged infrastructure, Keystone subscription pricing and a number of cloud management services.

It briefed Blocks & Files that its customers want to embrace the hybrid cloud by extending their IT capabilities to include the public clouds, and/or migrating workloads and data to them over time. It said users want a unified operational scheme, cloud-style flexible financial arrangements, a choice of where to deploy applications, and help from incumbent suppliers.

Adam Fore, NetApp’s senior director for portfolio marketing, told us: “Every customer has at least one workload in the public cloud. … Most customers expect to stay in the hybrid environment for the foreseeable future.”

NetApp talked up four groups of products:

  • Core hardware and software; ONTAP v9.9, StorageGRID v11.5, FlexPod with Cisco Intersight,
  • Subscription pricing: Keystone with Equinix,
  • Cloud management with Astra, Backup, Data Sense, Manager, Insights and Tiering,
  • Professional services and ActiveIQ.

Core hardware and software

ONTAP is the operating system for NetApp’s FAS (hybrid flash-disk) and AFF (all-flash) arrays.

V9.9 adds:

  • Automatic backup and tiering of on-premises data to StorageGRID and public clouds,
  • Better multilevel file security and remote access management,
  • Continuous data availability for 2x larger MetroCluster configurations,
  • More replication options for backup and DR for large data containers for NAS workloads,
  • Up to 4x performance increase for single LUN applications such as VMware datastores.

How was the 4x performance gain achieved? A NetApp spokesperson said: “We converted the single-threaded LUN access in distributed SCSI architecture into a multi-threaded stack per LUN. This applies to single-LUN and low-LUN count situations where ONTAP will multi-thread reads/writes per LUN on AFF systems, AFF All SAN Arrays, and FAS systems. 

“When the LUN count is greater than the number of active SCSI threads, then ONTAP returns to single thread per LUN. For the AFF A800, we measured random read performance of up to +400 per cent gain for a single LUN running on ONTAP 9.9 compared to ONTAP 9.8.”
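As a purely conceptual toy, the difference between the old and new behaviour resembles fanning reads for a single device across a small worker pool instead of servicing them one at a time; the sketch below illustrates that general idea only and has nothing to do with ONTAP’s actual implementation.

```python
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4096

def read_block(lun_id: str, offset: int) -> bytes:
    """Stand-in for a back-end read of one block from a LUN."""
    return b"\x00" * BLOCK_SIZE

offsets = range(0, 64 * BLOCK_SIZE, BLOCK_SIZE)

# Old model: a single thread services all I/O for the LUN, one read at a time.
serial_results = [read_block("lun0", off) for off in offsets]

# New model: several worker threads service the same LUN concurrently.
with ThreadPoolExecutor(max_workers=8) as pool:
    parallel_results = list(pool.map(lambda off: read_block("lun0", off), offsets))

print(len(serial_results), len(parallel_results))
```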

An ONTAP blog has more detail.

StorageGRID 11.5 delivers various incremental improvements: S3 Object Lock, support for KMIP encryption of data, usability improvements to ILM, a redesigned Tenant Manager user interface, support for decommissioning a StorageGRID site, and an appliance node clone procedure.

There is a new FlexPod (converged NetApp/Cisco infrastructure reference architecture) generation coming this summer, using the Cisco Intersight (SaaS cloud operations management platform) integration announced in May. This provides full stack monitoring, including ONTAP. There will be new capabilities rolled out over several months, such as intelligent application placement across on-premises and cloud, automated hybrid cloud data workflows, and the ability to consume FlexPod as a fully managed, cloud-like service.

The system will provide automated firmware upgrades and configuration monitoring, workload profiling and guidance, and backup to the public cloud. Supported public clouds are AWS, Azure and GCP.

A FlexPod blog provides more information. 

Keystone Equinix

NetApp is placing its arrays in Equinix co-location data centres, and so providing storage services with fast connectivity to public cloud regional data centres, like Pure Storage and Seagate.

The idea is that customers can enjoy having their normal ONTAP storage environment with public cloud compute instances accessing data on the NetApp arrays as if it were local. NetApp claims an approximate 1ms data access latency, with the data not having to be moved into the public clouds.  

The kit can be paid for under NetApp’s Keystone Flex Subscription scheme, which provides a single contract, a single invoice, and support for storage and colocation services through NetApp. The supported clouds are AWS, Azure and GCP, with NetApp claiming customers can centralise hybrid cloud data management – protection, tiering, visibility and optimisation – across these clouds.

This NetApp Equinix facility is available in 21 Equinix IBX data centres located in 11 countries. Another NetApp blog provides background details.

Cloud management

NetApp is providing a central Cloud Manager console through which six hybrid cloud services can be accessed:

  • Cloud Volumes – ONTAP Cloud Volumes in AWS, Azure and GCP,
  • Cloud Backup as a service for on-premises and in-cloud ONTAP data, with StorageGRID supported both as a source and a target,
  • Cloud Data Sense for data discovery, classification and governance in NetApp’s hybrid cloud,
  • Cloud Insights to visualise and optimise hybrid cloud deployments,
  • Cloud Tiering to move cold data to lower-cost storage, including on-premises StorageGRID
  • Astra, which now supports on-premises Kubernetes-orchestrated container workloads as well as the original in-cloud ones.

Astra has expanded support for ONTAP Cloud Volumes, and NetApp says a blog discussing the Astra enhancements will be available.

Comment

NetApp is here presenting itself as a storage-related specialist in hybrid and public clouds.

The company has always produced external, shared data storage – file at first, then block, and later object. It has not historically been a compute supplier, albeit with one diversion into disaggregated HCI via Supermicro servers.

NetApp is effectively betting that its customers will continue to treat compute and storage separately, and is building a hybrid cloud, cloud-operating model and Kubernetes-handling infrastructure on top of its core storage HW/SW systems base. It is building hybrid cloud control and data planes, which feed data to on-premises and in-cloud application compute instances.

These planes also optimise the data’s placement, protection, security and cost, and represent a bet-the-company strategy. It appears that NetApp thinks it cannot grow solely as an on-premises external storage system vendor. It sees its future in the hybrid cloud and has to treat the major public cloud players as partners, even though each would prefer customers to use its own in-cloud storage facilities.

NetApp’s advantages over AWS, Azure, GCP and the IBM cloud include its multi-cloud, honest broker approach, and its years of storage software experience. It will need to ensure customers see it as a safer data storage bet than its public cloud partners.

I object: Cloudian hooks up HyperStore to AWS Outposts

Cloudian’s HyperStore system has been validated by Amazon to run with its on-premises AWS Outposts systems, providing fast and local object storage.

Outposts is Amazon’s fully managed, on-premises version of its AWS cloud, deployed in a converged, rack-level system. HyperStore has achieved an AWS Outposts Ready designation, meaning AWS has fully tested its product running with Outposts.

Joshua Burgin, AWS’ general manager for AWS Outposts, said: “With Cloudian’s HyperStore solution … we’re expanding the workloads AWS Outposts serves by enabling on-prem, S3-compatible storage that meets data residency and latency requirements.”

It’s worth noting that Outposts itself already includes local S3 object storage support; it was added in October 2020, on top of the original support for S3 in the Amazon cloud. Why is Cloudian’s box needed?

Outposts’ local S3 capacity is size limited: users can only add 48TB or 96TB of S3 storage capacity to each rack and create up to 100 buckets. It’s effectively a local cache for S3 in the AWS cloud with the AWS DataSync service moving data to and from AWS cloud regions.

Cloudian HyperStore and Outposts

Cloudian’s HyperStore has a near-unlimited capacity, up beyond petabytes to exabytes. Applications that run in Outposts and use S3 storage execute exactly as before, using the same S3 semantics. They now access S3 objects in the HyperStore namespace and have local access latency, as before, and can meet country or region-specific data sovereignty rules.
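Because HyperStore presents the S3 API, the application-side change is typically just the endpoint the S3 client targets. A minimal boto3 sketch, with a made-up endpoint URL, bucket name and credentials:

```python
import boto3

# Placeholder endpoint and credentials; the same S3 client code simply targets
# a local HyperStore endpoint instead of an AWS regional endpoint.
s3 = boto3.client(
    "s3",
    endpoint_url="https://hyperstore.example.internal",
    aws_access_key_id="LOCAL_ACCESS_KEY",
    aws_secret_access_key="LOCAL_SECRET_KEY",
)

s3.put_object(Bucket="patient-scans", Key="scan-0001.dcm", Body=b"...")
obj = s3.get_object(Bucket="patient-scans", Key="scan-0001.dcm")
print(obj["ContentLength"])
```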

Cloudian CMO Jon Toor told B&F in a briefing: “AWS can do more with Outposts by using HyperStore on-premises. … There is no need to move petabytes of data to the cloud. … Now AWS outposts can be used for on-premises use cases they couldn’t satisfy before.”

Toor thinks this is good for financial services, mentioning credit card transactions, and also healthcare, because of data locality rules. Other use cases exist in the media and entertainment, telecommunications and government areas. 

He said: “Customers can now deploy a whole slew of services on-premises and have an easy on-ramp to the cloud.” In effect, “Cloud and on-premises are two sides of the same coin.”

Who needs Snowflake? Dremio’s direct data lake analytics get faster and more powerful

Unicorn cloud data analytics startup Dremio aims to supply analytic processes running directly on data lakes, and says its latest open source-based software release, adding speed and more powerful capabilities, is a step forward in obsoleting data warehouses and eliminating the data warehouse tax.

Its so-called Dart initiative makes Dremio’s in-memory software, powered by Apache Arrow, run faster and do more to save customers time and money. The pitch is that having SQL-based analytics run directly on data stored in Amazon S3 and Azure Data Lake means there is no need to pass the data through an Extract, Transform and Load (ETL) process into a data warehouse before running analytics. Data warehouses such as Snowflake and Yellowbrick Data offer a great deal of functionality and built-in speed; Dremio has to provide both so that customers see no need for ETL-prepped data warehouses as a necessary part of running their preferred analytics and getting fast query responses.

Dremio founder Tomer Shiran.

Tomer Shiran, founder and chief product officer at Dremio, said in a provided quote: “Enabling truly interactive query performance on cloud data lakes has been our mission from day one, but we’re always looking to push the boundaries and help our customers move faster … Not only are we dramatically increasing speed and creating efficiencies, we’re also reducing costs for companies by eliminating the data warehouse tax without trade-offs between cost and performance.”

Dremio says information held in S3 and Azure Data Lake can be stored and managed in open-source file and table formats such as Apache Parquet and Apache Iceberg, and accessed by decoupled and elastic compute engines such as Apache Spark (for batch processing), Dremio (SQL), and Apache Kafka (streaming).
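Because the table data sits in open formats on object storage, any Arrow-aware engine can read it directly. A minimal sketch with pyarrow (bucket, path and region are hypothetical):

```python
import pyarrow.dataset as ds
import pyarrow.fs as pafs

# Hypothetical bucket and prefix; reads Parquet files straight from S3 without
# any intermediate load into a warehouse.
s3 = pafs.S3FileSystem(region="us-east-1")
trades = ds.dataset("my-lake-bucket/trades/", filesystem=s3, format="parquet")
table = trades.to_table()
print(table.num_rows, table.schema)
```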

Apache Iceberg provides data warehouse functionality such as transactional consistency, rollbacks, and time travel. It also enables multiple applications to work together on the same data in a transactionally consistent manner.

Dremio supports Project Nessie, which provides a Git-like experience for a data lake and builds on table formats like Iceberg and Delta Lake to let users take advantage of branches to experiment on or prepare data without affecting the live view of the data. Nessie enables a single transaction to span operations from multiple users and engines, including Spark, Dremio, Kafka, and Hive. It makes it possible to query data from consistent points in time as well as across different points in time.

Feature list

Dremio veep Thirumalesh Reddy.

Thirumalesh Reddy, VP of Engineering and Security at Dremio, added his two cents: “There are two major dimensions you can optimise to maximise query performance: processing data faster, and processing less data.” Dremio’s latest software release does both. Its features are said to include:

  • Better query planning: Dremio gathers deep statistics about the underlying data to help its query optimiser choose an optimal execution path for any given query.
  • Query plan caching: Useful for when many users simultaneously fire similar queries against the SQL engine as they navigate through dashboards.
  • Improved, higher-performance compiler that enables larger and more complex SQL statements with reduced resource requirements.
  • Broader SQL coverage including additional window and aggregate functions, grouping sets, intersect, except/minus, and more.
  • Faster Arrow-based query engine: Arrow component Gandiva is an LLVM-based toolkit that enables vectorized execution directly on in-memory Arrow buffers by generating code to evaluate SQL expressions that uses the pipelining and SIMD capabilities of modern CPUs. Gandiva has been extended to cover nearly all SQL functions, operators, and casts.
  • Less data-read IO: Dremio reduces the amount of data read from cloud object storage through enhancements in scan filter pushdown (now supporting multi-column pushdown into source reads, the ability to push filters across joins, and more).
  • Unlimited table sizes with an unlimited number of partitions and files, and near-instantaneous availability of new data and datasets as they persist on the lake.
  • Automated management of transparent query acceleration data structures (known as Data Reflections).

These features help Dremio’s software process less data and process it faster than before, it is said. Check out the Dremio Architecture Guide here.

Comment

Blocks & Files notes Dremio says it wants to enable data democratisation without the vendor lock-in of cloud data warehouses. In other words, greenfield users who are not running a data warehouse, or users not locked into one, can use Dremio’s software to get data warehouse functionality at less cost. They can avoid what Dremio calls the data warehouse tax.

Whether Dremio can actually obsolete data warehouses is another matter, but it’s a nice and clean marketing pitch.