
Dell builds its own partner-based data lakehouse

Dell has devised a reference architecture-type design for a combined data lake/data warehouse using third-party partner software and its own server, storage, and networking hardware and software.

Like Databricks, Dremio, SingleStore, and Snowflake, Dell envisages a single data lakehouse construct. The concept is that you have a single, universal store with no need to run extract, transform and load (ETL) processes to get raw data selected and put into the proper form for use in a data warehouse. It is as if there is a virtual data warehouse inside the data lake.

Chhandomay Mandal, Dell’s director of ISG solution marketing, has written a blog about this, saying: “Traditional data management systems, like data warehouses, have been used for decades to store structured data and make it available for analytics. However, data warehouses aren’t set up to handle the increasing variety of data — text, images, video, Internet of Things (IoT) — nor can they support artificial intelligence (AI) and Machine Learning (ML) algorithms that require direct access to data.”

Data lakes can, he says. “Today, many organizations use a data lake in tandem with a data warehouse – storing data in the lake and then copying it to the warehouse to make it more accessible – but this adds to the complexity and cost of the analytics landscape.”

What you need is one platform to do it all, and Dell’s Validated Design for Analytics – Data Lakehouse provides it, supporting business intelligence (BI), analytics, real‑time data applications, data science, and machine learning. It is based on PowerEdge servers, PowerScale scale-out file storage, ECS object storage, and PowerSwitch networking. The system can be housed on-premises or in a colocation facility.

Dell Data Lake
Blocks & Files diagram

The component software technologies include the Robin Cloud Native Platform, Apache Spark (open-source analytics engine), and Kafka (open-source distributed event streaming platform) with Delta Lake technologies. Delta Lake, the open-source software originally developed by Databricks, is built on top of Apache Spark, and Dell is using it in its own data lakehouse.
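For readers unfamiliar with the lakehouse layering, here is a minimal, hypothetical PySpark sketch of Delta Lake running on Spark; the path, schema, and session settings are illustrative assumptions and are not taken from Dell's validated design:

```python
# Minimal sketch: writing and querying a Delta Lake table on Apache Spark.
# Assumes the open-source delta-spark package is installed (pip install delta-spark);
# the path and schema below are illustrative, not part of Dell's validated design.
from delta import configure_spark_with_delta_pip
from pyspark.sql import SparkSession

builder = (
    SparkSession.builder.appName("lakehouse-sketch")
    # Register Delta Lake's SQL extension and catalog with Spark.
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()

# Raw events land directly in the lake; Delta adds ACID transactions and schema
# enforcement on top of plain file/object storage, which is the core lakehouse idea.
events = spark.createDataFrame(
    [(1, "sensor-a", 21.5), (2, "sensor-b", 19.8)],
    ["event_id", "device", "temperature"],
)
events.write.format("delta").mode("append").save("/tmp/lakehouse/events")

# Query the same data with warehouse-style SQL, with no separate ETL copy into a warehouse.
spark.read.format("delta").load("/tmp/lakehouse/events").createOrReplaceTempView("events")
spark.sql("SELECT device, avg(temperature) AS avg_temp FROM events GROUP BY device").show()
```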

Dell is also partnering with Rakuten-acquired Robin.io and its Kubernetes-based Robin Cloud Native Platform.

Dell recently announced an external table access deal with Snowflake and says this data lakehouse validated design concept complements that. Presumably Snowflake external tables could reference the Dell data lakehouse.

Dell validated design

With the above Dell graphic, things start to look complicated. A Dell Solution Brief contains more information, along with this table: 

Dell specs

Clearly this is not an off-the-shelf system and needs a good deal of careful investigation and component selection and sizing before you cut a deal with Dell.

Interestingly, HPE has a somewhat similar product, Ezmeral Unified Analytics. This also uses Databricks’ Delta Lake technology, Apache Spark, and Kubernetes. HPE is running its Discover event this week, with many announcements expected. Perhaps the timing of Dell’s announcement is no accident.

NetApp bucks trend for all-flash array revenues

All-flash array supplier revenues turned down in the first quarter of 2022, except at NetApp, whose share has grown for six straight quarters.

Such is the picture painted by Gartner in its external storage market report for the quarter. Wells Fargo analyst Aaron Rakers told his subscribers: “The all-flash array market (solid-state arrays) grew 13 percent y/y vs. +14 percent y/y in 4Q21 (62 percent of primary storage rev. and 21 percent of secondary storage), while hard-disk drives and hybrid array revenue declined 6 percent y/y.”

In general the primary storage market eked out growth, up just 2 percent year-on-year (y/y), with the secondary storage market growing four times faster, at 8 percent, while the backup and recovery market went down 2 percent. We envisage that public cloud backup and recovery revenues grew.

Total external storage capacity shipped rose 5 percent year-over-year with the all-flash part of that shooting up 30 percent, accounting for 16.8 percent of all capacity shipped, an all-time high.

A Gartner graph shows how the main vendors fared:

NetApp all-flash revenues

On a year-on-year basis Dell remained top of the external storage heap with a 31 percent share, down from 34 percent a year ago. Its all-flash market share went down from 32.5 percent a year ago to 27.8 percent: $677 million.

NetApp had the second largest all-flash array share, at 22 percent and $536 million, up from the year-ago 20.3 percent. Its total storage revenues were up 6 percent year-over-year and its total external storage market share increased to 14.4 percent from the year-ago 14 percent.

Pure Storage was in third place with a 17.7 percent share of the all-flash market, $432 million, up from 13.2 percent a year ago. Rakers points out that Pure’s flash capacity shipped grew 77 percent while NetApp’s flash capacity shipped was up 12 percent over the same period.

NetApp all-flash market share

HPE had a poor quarter. Its overall storage revenue went down 8 percent year-on-year to $184 million, and its share declined to 7-8 percent from the year-ago 10.3 percent. From the AFA point of view, its market share of 6.5 percent was down from the year-ago 7.6 percent. 

The Gartner vendor market share trends chart shows that IBM lost share while Huawei grew a little and Hitachi was flat. Huawei’s quarter-on-quarter downturn was large but it’s a seasonal pattern and has been seen for five years in a row – indicating it will most probably bounce back.

Interestingly, Pure also exhibits a seasonal Q4 to Q1 downturn pattern and it will likely grow strongly in Q2, like Huawei. Can Pure grow its revenues fast enough to catch up with NetApp?

Nutanix Files 4.1 anti-ransomware: Through the Data Lens

We were very interested in understanding more about how the Nutanix Files v4.1 ransomware protection works, so we sent over a few questions and Lee Caswell, SVP of product and solutions marketing at Nutanix, sent his answers back.

Blocks & Files: How does the ransomware detection work?  

Lee Caswell: Data Lens is a cloud-based data governance service that helps customers proactively assess and mitigate security risks. Data Lens applies real-time analytics and anomaly detection algorithms to unstructured data stored on the Nutanix Unified Storage platform. The service presents customers with a global risk view of their unstructured data, with actionable insights into unusual access patterns, data age, and other contextual information, helping them gain a time advantage in responding to ransomware attacks and insider threats.

Real-time auditing software forwards Nutanix Unified Storage data events, including file reads, writes, renames, and deletes, to the Data Lens service over the Nutanix Pulse framework. Captured events are used to create audit trails, detect user-defined anomalies, and detect ransomware signature access patterns. Locally scanned metadata, including file extension, file size, file mime type, and other attributes, is also forwarded to Data Lens.

Data Lens maintains a list of known ransomware file signatures (file names and file extensions). It can automatically apply this list to the file blocking capabilities of Nutanix Files to prevent file create and rename events using those patterns. For customers interested in demonstrating compliance, Data Lens addresses auditing requirements and forensic analysis.

Blocks & Files: Is it a third-party scanner or one built by Nutanix?  

Lee Caswell: Data Lens is developed by Nutanix and the Nutanix scanner runs natively in the Unified Storage File server. 

Blocks & Files: How is it updated?  

Lee Caswell: Data Lens is a Software-as-a-Service offering that is updated transparently for customers. Updated ransomware file signatures are validated and applied to the file server’s blocking list in Data Lens releases, which are managed by Nutanix.

Blocks & Files: What ransomware attack patterns does it use?  

Lee Caswell: Ransomware threat detection capabilities are based on file signature blocking and audit pattern detection. Signature blocking applies a list of nearly 5,000 known signatures to the file blocking capabilities of Nutanix Files. Any file create or rename operation attempted by a client using one of those patterns is automatically blocked in real time.

Pattern detection is for unknown variants. Data Lens looks at incoming audit events and applies algorithms to see if any event pattern looks suspicious. Ransomware attacks typically follow a common pattern, for example read, overwrite (encrypt), and rename. Another example is read, write (encrypted new file), and delete. Data Lens looks for these various patterns and, once one is detected, then inspects the mime type of the file. If the mime type reflects a potentially encrypted file, Data Lens will mark the pattern as a ransomware attack.
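The two mechanisms Caswell describes, signature blocking and audit-pattern detection, can be sketched roughly as follows. This is hypothetical code, not Nutanix's implementation; the signature patterns, event format, and mime-type heuristic are illustrative assumptions.

```python
# Rough, hypothetical sketch of signature blocking plus audit-pattern detection.
import fnmatch
from collections import defaultdict

# 1. Signature blocking: reject file create/rename operations whose target name matches
#    a known ransomware pattern (the real blocklist holds nearly 5,000 entries; these
#    few are illustrative only).
KNOWN_RANSOMWARE_PATTERNS = ["*.locky", "*.wncry", "*-decrypt-instructions.txt"]

def should_block(filename):
    return any(fnmatch.fnmatch(filename.lower(), p) for p in KNOWN_RANSOMWARE_PATTERNS)

# 2. Pattern detection for unknown variants: look for suspicious per-file operation
#    sequences in the audit stream, then confirm with the resulting file's mime type.
SUSPICIOUS_SEQUENCES = [
    ("read", "overwrite", "rename"),   # read, encrypt in place, rename
    ("read", "write", "delete"),       # read, write encrypted copy, delete original
]
ENCRYPTED_MIME_TYPES = {"application/octet-stream"}  # assumed marker of an encrypted blob

def detect_ransomware(events):
    """events: iterable of dicts with 'user', 'file', 'op' and 'mime_type' keys."""
    ops = defaultdict(list)
    last_mime = {}
    for ev in events:
        key = (ev["user"], ev["file"])
        ops[key].append(ev["op"])
        last_mime[key] = ev.get("mime_type")

    flagged = set()
    for key, seq in ops.items():
        for pattern in SUSPICIOUS_SEQUENCES:
            # Check whether the suspicious sequence occurs as a contiguous run of operations.
            windows = [tuple(seq[i:i + len(pattern)]) for i in range(len(seq) - len(pattern) + 1)]
            if pattern in windows and last_mime[key] in ENCRYPTED_MIME_TYPES:
                flagged.add(key)
    return flagged

audit_trail = [
    {"user": "alice", "file": "q2.xlsx", "op": "read",      "mime_type": "application/vnd.ms-excel"},
    {"user": "alice", "file": "q2.xlsx", "op": "overwrite", "mime_type": "application/octet-stream"},
    {"user": "alice", "file": "q2.xlsx", "op": "rename",    "mime_type": "application/octet-stream"},
]
print(should_block("budget.xlsx.wncry"))  # True: a create/rename using this name would be blocked
print(detect_ransomware(audit_trail))     # {('alice', 'q2.xlsx')}: suspicious sequence plus encrypted mime type
```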

Blocks & Files: Is machine learning involved? 

Lee Caswell: Data Lens uses algorithms to detect ransomware patterns; however, these patterns are not adaptive or based on machine learning.

Blocks & Files: If an attack is detected are the affected files identified? 

Lee Caswell: Yes, all files impacted by the attack are listed. The impacted file list can be downloaded in .csv or .json format for reporting or as a list for remediation.

Blocks & Files: Can they be rolled back to previous known good versions?  

Lee Caswell: Yes, Data Lens checks the file shares to ensure snapshots are enabled (Windows Previous Versions for SMB and .snapshot directories for NFS). Administrators are alerted if snapshots are not enabled for file shares. Impacted files can be rolled back to known good versions using these snapshots. Administrators, or end users via self-service, can perform the remediation. We are evaluating how Data Lens could provide automated remediation, although this is not currently available.

Blocks & Files: Are the affected files isolated? 

Lee Caswell: No, the files are not isolated by Data Lens today. Nutanix Files integrates with antivirus vendors for scanning offload.  AV vendors can request files to be quarantined as they are scanned, which would isolate the files.  

Blocks & Files: If an attack is detected are further activities from that source user or account halted until an all-clear is somehow generated? 

Lee Caswell: Yes, the user account and/or client associated with the attack can be automatically blocked from accessing the file shares. Blocking is configured by the user as part of policy definition and an administrator can later choose to unblock the user or client.  

Blocks & Files: If an attack is detected are alerts sent out? To whom and how? 

Lee Caswell: Yes, alerts are sent via email to a user-defined list of email addresses. Emails are sent on suspected ransomware events, notification of blocked users or clients, and on detection of user-defined anomaly rules. 

iXsystems adds enterprise features to scale-out NAS

iXsystems’ TrueNAS, the popular storage software, is getting high availability and SMB clustering for the enterprise.

iXsystems says TrueNAS has more than 1 million installations and over 10EB of capacity under management. The scale-out version, TrueNAS SCALE, is based on Linux and supports Docker containers, Kubernetes, KVM, and scale-out OpenZFS.

The updated software supports SMB, Gluster, and NFS file sharing, iSCSI block storage, S3 object API integration, and Cloud Sync for interoperability with public cloud storage.

Three editions of one piece of software. There is a migration path from TrueNAS Core and Enterprise to TrueNAS SCALE.

TrueNAS SCALE supports up to 2EB of capacity and 1,000GB/s bandwidth. The scale is indicated by its support of 100 racks, 400 CPUs, and 1,200 drives with clustered M-Series hardware.

Release 2 of TrueNAS SCALE, codenamed Angelfish, adds high availability on dual-controller TrueNAS M-Series systems for scale-out storage, VM, and Container workloads. If one storage controller fails or is taken down for maintenance, the other controller takes over and provides all services.

The upgradable single or dual-controller M-Series starts with the M30 (24 x 3.5-inch drive bays) at under $13,000 and runs through the M40 and M50 to the top-end M60, which supports up to 20PB and 20GB/sec on a single node; 100 clustered nodes support the 2EB of capacity.

SMB clustering is the second main addition. With it, clients can connect to any cluster node for access to the cluster’s total capacity and bandwidth: potentially hundreds of petabytes of capacity and terabits per second of bandwidth.

iXsystems TrueNAS
Note the >15 million downloads of FreeNAS and ZFS

Fleets of TrueNAS Open Source and Enterprise systems are managed using on-premises or cloud-delivered TrueCommand software. TrueCommand v2.2 has been enhanced to use the additional TrueNAS SCALE APIs and clustered SMB with wizards for creating SMB clusters. 

It also has a reworked and faster statistics engine, improved certificate handling for added simplicity, and updated middleware for better error handling and testing frameworks.

Angelfish has been tested by more than 20,000 users in the four months since its initial availability, and is now ready for mainstream production use. 

ixSystems TrueNAS SCALE
TrueNAS SCALE screenshot

In general, iXsystems is a small and medium enterprise supplier that also counts larger enterprises among its customers. It may surprise you to learn that more than 40 percent of the Fortune 500 are TrueNAS customers. iXsystems is owned by its employees and has no VC funding. Features such as high availability and SMB clustering will strengthen its enterprise appeal, as will the sheer size of its customer base. And there is a roadmap for future functionality additions.

This company is growing its revenues, growing its product, and growing its user base. It’s a force to be reckoned with in open-source storage.

TrueNAS is available as Open Storage software that is downloadable at no cost, or as TrueNAS appliances for a turnkey experience with enterprise-grade features and support.

ASI

ASI – Artificial Super Intelligence. A term meaning an artificial intelligence that is more intelligent than an AGI. There is no precise definition, but it generally refers to greater-than-human intelligence applying across the fields to which human intelligence is applied. No such computer systems exist today, although possible precursors exist in specific fields where they can outperform humans, such as playing chess.

AGI

AGI – Artificial General Intelligence. This is a term for a proposed extension of current machine learning/generative AI large language models (LLMs) such that they routinely pass Turing Tests and are as smart as an average human being with no hallucinations.

Odaseva points out API bottlenecks in SaaS backup 

There is an API bottleneck issue in the SaaS application backup area, according to a company that provides data protection for Salesforce users.

As with backing up in-cloud applications and data, where users need to go above and beyond the cloud service providers’ basic provision, SaaS application users need to protect their data too. They can do it themselves by developing software that makes API calls to the SaaS app to copy their data. But there is a set of problems with backing up SaaS applications due to limitations on the various types of API calls. Francois Lopitaux, chief product officer of Odaseva, tells us about them.

Francois Lopitaux, Odaseva
Francois Lopitaux

Odaseva is a France-based data protection services vendor for Salesforce customers. Lopitaux joined Odaseva in May 2020 and previously spent 11 years at Salesforce. There he served in a number of product management positions, including VP of product management for the Einstein Analytics product.

Blocks & Files: What’s the situation?

Francois Lopitaux: Enterprise IT has crossed a threshold in which most organizations now realize that they need to protect their SaaS data. In the 2021 Evolution of Data Protection Cloud Strategies report from analyst firm ESG, 64 percent of IT decision-makers surveyed said that they are partially or fully responsible for backing up the data they have in SaaS applications. Of course, that leaves more than one-third (35 percent) who say they depend solely on their SaaS vendor to protect their organization’s data, which is far too high, because while SaaS vendors do take great care to protect their infrastructure, they don’t typically back up customer data. So if data is accidentally or maliciously deleted, the customer is likely out of luck.

Plus, substantial SaaS outages do happen, even with well-known services. Just recently, Atlassian suffered an outage in April that wasn’t fully resolved for two weeks. Without backups, that data is completely inaccessible.

Blocks & Files: And what about the SaaS backup API problem?

Francois Lopitaux: Backing up SaaS data is an altogether different beast than traditional, on-premises data protection, in particular because SaaS backup depends on a limited resource: APIs. Obviously, IT does not have full control over the application or the data, because the data is in an offsite service on equipment that an IT team doesn’t manage. So, to get that data, any SaaS backup system will need to access data via APIs. The problem is, API calls are capped, which means IT teams must ensure they’re choosing the right API for the job, and to complicate matters, APIs change over time. This complexity means that enterprise IT teams must carefully consider API use and strategies for managing them.

Blocks & Files: Are there hard API caps?

Francois Lopitaux: Most SaaS apps operate on a multi-tenant basis, so multiple customers share the same resources, and that includes the APIs. As a result, it’s common for SaaS providers to put a limit on how many API calls any single customer can make in a 24-hour period to ensure that adequate resources are available for everyone, and that individual customers don’t consume a disproportionate amount. The upshot is that IT has a hard cap on how much they can use each API for backup. That’s an important consideration, because these APIs aren’t exclusive to backup – they’re also used for sharing information with other apps that IT has integrated with the SaaS application. And especially in the case of a core SaaS application, such as Salesforce or an ERP, there’s going to be a ton of other applications and services that depend on those APIs. 

Blocks & Files: Do you need to use particular APIs for speed reasons?

Francois Lopitaux: It’s critical to select APIs for the data being backed up or restored to optimize for speed. For example, in Salesforce, the REST API can move 1 million records per hour, whereas the BULK API can do 10 million records per hour. And if IT makes parallel API calls to multiplex data out of Salesforce, REST can achieve a maximum of 10 million records per hour, while BULK can hit a maximum of 300 million, depending on the complexity and size of the object. The takeaway is that the choice of API can make an enormous difference to an organization’s recovery point objectives (RPOs). The faster you can perform backups, the more backups you can do in one day.

But that doesn’t mean an organization can depend on BULK APIs alone. In order to pull out and restore all of an enterprise’s data in a timely manner, backup systems must take full advantage of all of the APIs available, such as REST, BULK, BULK V2 and SOAP. After all, some kinds of objects cannot be accessed by BULK APIs, such as share objects, for which IT will need to use the REST API. Plus, BULK APIs are precious to IT organizations, because they’ll almost certainly have other systems that use BULK APIs, and it can be easy to run up against that hard cap. So, IT will need to balance its use of APIs and manage how much they use each. 
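To make the arithmetic concrete, here is a rough, back-of-the-envelope Python sketch using the throughput figures Lopitaux quotes; the 500-million-record workload is an assumption for illustration only, not an Odaseva or Salesforce figure.

```python
# Back-of-the-envelope backup-window estimate using the throughput figures quoted above.
# The workload size is an illustrative assumption.
RECORDS_PER_HOUR = {
    ("REST", False): 1_000_000,      # single-threaded REST
    ("BULK", False): 10_000_000,     # single-threaded BULK
    ("REST", True): 10_000_000,      # parallel (multiplexed) REST, maximum
    ("BULK", True): 300_000_000,     # parallel BULK, maximum; varies with object size and complexity
}

def backup_hours(total_records, api, parallel):
    """Rough time to copy total_records out of the SaaS application with this API strategy."""
    return total_records / RECORDS_PER_HOUR[(api, parallel)]

records = 500_000_000  # hypothetical org with 500 million records to protect
for api, parallel in RECORDS_PER_HOUR:
    label = f"{api} {'(parallel)' if parallel else '(single-threaded)'}"
    print(f"{label:24s} ~{backup_hours(records, api, parallel):7.1f} hours")
# Single-threaded REST needs ~500 hours (roughly three weeks of continuous extraction),
# while parallel BULK finishes in under two hours, which is why API choice drives RPO.
```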

Blocks & Files:  Are there other API problems?

Francois Lopitaux: Just because an API can read data, that doesn’t mean it can write that data, which has enormous implications for restore. It may be that certain objects can be restored with another type of API, but unfortunately, there may very well be some data that cannot be restored due to architectural limitations within the SaaS application itself. 

Just because there are limits to how much IT can use each API, however, doesn’t mean that they are set in stone. The actual API limits are based on the customer’s license agreement with the provider. So, it’s important to model out how much IT expects it will need to use each API. And don’t forget about API needs for restoring large amounts of data.  Restoration can really eat up API resources, and no one wants to hit an API cap in the midst of a critical restore with business managers chomping at the bit to get back to working in their apps. 

Finally, APIs change. SaaS backup systems must adapt to these changes, which adds additional complexity to an already complicated schema for managing and optimizing APIs for data protection.  

While organizations are definitely coming to understand the need to back up their data in SaaS services, they may not be aware of how complex it is to build a SaaS backup system that protects all data in a way that meets its RTOs. Carefully considering API use and developing strategies for managing them has to be a critical part of building any SaaS backup solution. 

Storage news ticker – June 23

Data protection and security business Acronis has sponsored another sporting team: Scotland’s Hibernian FC. Acronis will become Hibernian FC’s Principal Cyber Protection Partner. It will be supported by the expertise of Dunedin IT, which will deliver Acronis cyber protection solutions to improve data storage and access. Dunedin IT will have its logo printed on the lower back of the men’s first team home, away, and third kits for the start of the 2022/2023 season. The Acronis #TeamUp initiative is open to all service providers looking for innovative ways to grow their business with Acronis and tap into “the exciting world of sports marketing.”

Data catalog and intelligence startup Alation, founded in 2012, has received a strategic investment from HPE’s venture capital arm, Hewlett Packard Pathfinder. The amount is secret. Alation enables enterprises to operationalize their data assets, which leads to improved data understanding and operational efficiency. The partnership between Alation and HPE will enable enterprise CIOs to drive comprehensive governance of their assets and manage, catalog, and utilize data to deliver increased value to customers. Read a blog to find out more. Alation will exhibit at HPE’s Discover conference from June 28–30 in Las Vegas.

Amazon Web Services (AWS) and Axiom Space announced the successful deployment of an AWS Snowcone SSD device in space in Axiom’s Ax-1 mission. This small, rugged edge computing device helped astronauts analyze data from research experiments onboard the International Space Station (ISS). This technology demonstration marked the first time that AWS has remotely operated a Snowcone on the ISS, and it showed it is possible to analyze data at a new edge location: Space. The AWS Snowcone SSD weighs less than five pounds and is smaller than a standard-sized tissue box. The device proved durable enough to provide AWS compute, storage, and networking capabilities on board the ISS, fully disconnected from any ground facilities.

AWS Snowcone device in the ISS.

Long term, AWS anticipates that future space missions will incorporate advanced cloud-based technology to support on-orbit research needs. Ax-1 is one of several private space missions Axiom Space has planned to the ISS as it builds the world’s first commercial space station. See The Register‘s coverage here.

Tecton acts as a central source of truth for machine learning (ML) features, and automatically orchestrates, manages and maintains the data pipelines that generate ML features. It has announced a partnership with data lakehouse supplier Databricks to help organizations build and automate their ML feature pipelines from prototype to production. Tecton is integrated with the Databricks Lakehouse Platform so data teams can use Tecton to build production-ready ML features on Databricks in minutes. A blog has more information.

Hitachi America is using SaaS-based Druva’s Data Resiliency Cloud to recover from PC breakdowns. If a PC breaks down, Druva enables Hitachi America to migrate data to a new device, and gives it the ability to wipe data remotely on lost or stolen devices and recover entire devices. Hitachi America’s legal custodians can also store data by date range and file type, preserve data across endpoints, and identify legal hold data across users and devices. A case study has more details.

High-end enterprise array supplier Infinidat will roll out a global version of its partner portal in July, rebuilt from the ground up to train and equip solution providers worldwide to sell Infinidat’s platforms. It also announced it has integrated its STaaS offering into Arrow Electronics’s ArrowSphere in North America. Eric Herzog, Infinidat CMO, said: “We’re streamlining and simplifying the partner experience to boost channel participation and success.”

lakeFS, an open source, git-like version control interface that delivers resilience and manageability to object-storage based data lakes, is launching lakeFS Cloud – a fully managed SaaS version of its open source technology. It enables the fast creation of predefined workflows essential for managing surging amounts of data – including assuring data reliability – within every enterprise. lakeFS is for users who prefer a fully managed service instead of managing the lakeFS infrastructure themselves on their data lakes and is available in the AWS marketplace. A blog says more.

Micron is sampling the world’s highest-capacity microSD card – the i400 – to customers at an unprecedented density of 1.5 terabytes, designed with the world’s first 176-layer 3D NAND. The Register covered the story.

Micron’s mis-labelled 1.5TB i400 microSD card.

Micron did not initially say whether the i400 uses TLC or QLC NAND. We asked – it uses QLC (4 bits/cell) NAND.

128GB 176-layer 3D NAND die – 69.96mm² in size. Micron stacks 12 of these into the i400 1.5TB microSD card.

Data protection supplier N-able has added a Standby Image feature as part of the company’s recently launched Cove Data Protection and anti-ransomware offering. It says this makes it easy to create, manage, and report on virtual server images in the partner’s location of choice, ready for fast and flexible disaster recovery, without an expensive proprietary appliance. With the addition of Standby Image, Cove creates a safe replica of each backup, ready for fast on-demand recovery in a secondary location, thus combining a cost-effective, cloud-first architecture with comprehensive DRaaS.

Nvidia is co-founding a Linux Foundation project to democratize DPU innovation, becoming a founding member of the Linux Foundation’s Open Programmable Infrastructure (OPI) project. Nvidia also announced it’s opening the APIs to the Nvidia DOCA libraries in support of open networking solutions. An Nvidia blog says the OPI project aims to create a community-driven, standards-based, open ecosystem for accelerating networking and other datacenter infrastructure tasks using DPUs. Developers will be able to create a common programming layer to support DOCA open drivers and libraries with DPU acceleration.

Nvidia OPI graphic.

Founding members of OPI include Dell Technologies, F5, Intel, Keysight Technologies, Marvell, Nvidia and Red Hat. OPI will help establish and nurture an open and creative software ecosystem for DPU and IPU-based infrastructures. The OPI Project seeks to help define the architecture and frameworks for the DPU and IPU software stacks that can be applied to any vendor’s hardware offerings. The OPI Project also aims to foster a rich open source application ecosystem, leveraging existing open source projects such as DPDK, SPDK, OvS, P4, etc., as appropriate. For more info, see The Register‘s coverage.

Alex Jones.

Kubernetes cloud-native persistent storage supplier Ondat has added Alex Jones to its advisory board. He serves as Kubernetes engineering director at Canonical, contributes to the CNCF TAG App Delivery as Tech Lead, and has more than a decade in engineering leadership roles at Microsoft, JPMorgan, American Express and British Sky Broadcasting. Alex Chircop, founder and CEO of Ondat, said: “Alex brings us an extraordinary combination of technical depth and market savvy.” Jones joins recent additions to the advisory board Lisa-Marie Namphy and Cheryl Hung.

Multi-cloud data manager and file sharer Panzura appointed two new members to its board of directors: Julie Parrish, chief marketing officer at Corelight, and Stephen Singh, global vice president of M&A, ITO Strategy, Planning and Implementation at Zscaler. Panzura, led by a female CEO, now has three female and four male directors, furthering its proactive gender equality initiatives.

Edge hyper-converged infrastructure supplier Scale Computing is partnering with cloud storage supplier Wasabi, whose low-cost cloud storage will serve as a data retention and archival target for Scale Computing’s HCI platform.

Studio Network Solutions (SNS) – a supplier of media production workflows, off-prem and on-prem storage servers, and remote cloud solutions for professional media teams – announced a corporate relationship with Major League Soccer’s Portland Timbers and the National Women’s Soccer League’s Portland Thorns FC. The agreement leverages the high-performance EVO shared storage server and its included EVO Suite of software tools to accelerate video production workflow for Portland’s professional soccer teams. EVO enables multi-user collaboration for professional video production. Every EVO shared storage solution comes with the included EVO Suite – a collection of media asset management, automations, remote editing, and cloud workflow tools.

The Toshiba board is analyzing eight takeover bids, says Reuters, valuing the company at up to $22 billion, with selected potential investors given due diligence opportunities to examine Toshiba’s books after its June 28 shareholders meeting. Bidders could include KKR & Co, Blackstone, Bain Capital, Baring Private Equity Asia, Brookfield Asset Management, MBK Partners, Apollo Global Management and CVC Capital Partners plus Japan Investment Corp., Japan Industrial Partners and Polaris Capital Group.

Enterprise and heritage data protection supplier Veritas is partnering with Kyndryl, which styles itself the world’s largest IT infrastructure services provider. Kyndryl has chosen Veritas’s data management portfolio to extend its framework of protection and cyber resilience to its enterprise customers as a fully managed service – “Protection and Cyber Resiliency, Powered by Veritas.” This includes:

  • Security assurance services – security, strategy & risk management, offensive security testing, and compliance management; 
  • Zero trust services – identity & access management, endpoint security, network security, application & workload security, data protection & privacy, and analytics, automation & orchestration;
  • Security operations and response – advanced threat detection, incident response and forensics;
  • Incident recovery services – cyber incident recovery, managed backup services, hybrid platform recovery and datacenter design & facilities.

More information here.

DNA data storage wants to standardize

The DNA Data Storage Alliance has joined the Storage Networking Industry Association (SNIA) as a Technology Affiliate to begin technical work and standards development, and speed the development of an interoperable ecosystem, for DNA data storage.

The DNA Data Storage Alliance group was formed in 2020 by Illumina, Microsoft, Twist Bioscience Corporation and Western Digital. These four suggest DNA storage can deliver low-cost, high-density archival storage, with 10 full-length digital movies fitting into a volume the size of a single grain of salt. Another example: the volume of space inside an LTO tape cartridge is estimated to hold 100,000 times the number of DNA-bits as an LTO-9 tape in that same cartridge.

Steffen Hellmold, senior veep for Business Development, DNA Data Storage, at Twist Bioscience, issued a quote: “Joining SNIA as a Technology Affiliate is taking DNA to the big stage, signaling a maturity of DNA data storage technology and the fact that it is now ready for the next phase of the market evolution. Now is the time to build on the growing momentum and scale up the ecosystem to service the projected data storage demand within this decade.”

Catalog DNA storage machine prototype.

SNIA executive director Michael Oros sang from the same hymn sheet: “Industry work on standards development is a pivotal moment for new technologies. It is the key indicator for multi-vendor investment and support for the technology. This critical step assures adopters of the technology to have broad choice of innovative and interoperable multi-party solutions.”

DNA data storage encodes binary data (base 2 numbering scheme) into a 4-element coding scheme using the four DNA nucleic acid bases: adenine (A), guanine (G), cytosine (C) and thymine (T). For example, 00 = A, 01 = C, 10 = G and 11 = T. This transformed data is encoded into short DNA fragments and packed inside some kind of container, such as a glass bead, for preservation. One gram of DNA can theoretically store almost a zettabyte of digital data, or one trillion gigabytes. Such fragments can be read in a DNA sequencing operation. They are tiny and can theoretically last for thousands of years or more.
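As a toy illustration of that 2-bits-per-base mapping, here is a minimal Python sketch. It is hypothetical and deliberately simplified: real DNA storage systems add error-correcting codes, synthesis constraints, and addressing that this ignores.

```python
# Toy illustration of the 2-bits-per-base mapping described above (00=A, 01=C, 10=G, 11=T).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}

def encode(data: bytes) -> str:
    """Turn bytes into a DNA base string, two bits per nucleotide."""
    bitstring = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bitstring[i:i + 2]] for i in range(0, len(bitstring), 2))

def decode(dna: str) -> bytes:
    """Reverse the mapping: bases back to bits, bits back to bytes."""
    bitstring = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bitstring[i:i + 8], 2) for i in range(0, len(bitstring), 8))

message = b"DNA"
strand = encode(message)
print(strand)                  # CACACATGCAAC: four bases per byte
assert decode(strand) == message
```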

DNA chemical structure

The DNA Data Storage Alliance Technology Affiliate will aim to standardize interfaces, methods, and best practices to create a global interoperable ecosystem from which both solution providers and clients will benefit. A technical group will develop physical standards, logical and software standards, and quality standards to ensure customers get the best systems.

Check out the DNA Data Storage Alliance website for more information.

Nutanix ups ransomware shields with Files 4.1

Nutanix says its Files v4.1 release provides a large spectrum of ransomware protection capabilities for Unified Storage customers, claiming it’s a turnkey anti-ransomware product that will roll out to its current Nutanix Files customers.

Ransomware, it says, will cost businesses around $265 billion annually by 2031, when Cybersecurity Ventures expects a new attack every two seconds. Nutanix Files has integrated ransomware protection and Files 4.1 has ransomware detection and WORM support. A so-called full spectrum of ransomware protection is delivered by Files in conjunction with Nutanix’s Data Lens, a SaaS-based data analytics management plane offering.

We contacted sources close to Nutanix and found out that ransomware detection is based on Nutanix’s own scanner looking through a file system for known ransomware signatures, almost 5,000 of them, obtained from a third-party, crowd-sourced database. If a ransomware pattern is found, alerts can be sent out. Recovery options range from the full file system, through one or more file shares, down to individual files.

Two other things come into play here. One, there have to be earlier clean versions of files to recover from and, two, there has to be a way of finding the scope of an attack, finding affected files, and deciding what to do. The earlier clean versions of files can be created in three ways:

  • SSR (Self-Service Restore) snapshots enabling end-users to access previous read-only versions of their files
  • Backups by third parties such as Commvault, HYCU and, soon, Veeam
  • Replicas – share-level replication of snapshots between file server instances for disaster recovery

The Data Lens service can look at what individual users are doing with individual files. A ransomware attack will often penetrate a customer’s systems through an email attack and so compromise a particular user and their machine, a laptop or desktop. It will then start encrypting files in that user’s share. This pattern of activity, different from the user’s normal activity level, can be detected through Data Lens by setting up policies that alert admin staff if anomalous behavior occurs, such as exceeding a threshold level of file system write activity in a period of time.

If anomalous patterns are detected, a user can be restricted to read-only file access or even locked out. Alerts can be sent to whomever needs contacting and Data Lens facilities used to size and scope the attack.
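As a rough illustration of the kind of threshold policy described above, here is a hypothetical sketch; it is not Data Lens code, and the user names, limits, and actions are made up.

```python
# Hypothetical sliding-window write-rate policy: alert when a user exceeds a configured
# number of file writes within a time window.
from collections import deque
import time

class WriteRatePolicy:
    """Alert when a user exceeds max_writes file-write events within a sliding window."""
    def __init__(self, max_writes, window_seconds):
        self.max_writes = max_writes
        self.window = window_seconds
        self.events = {}  # user -> deque of write timestamps

    def record_write(self, user, now=None):
        """Record one write event; return True if the user has breached the policy."""
        now = time.time() if now is None else now
        q = self.events.setdefault(user, deque())
        q.append(now)
        # Drop events that have fallen out of the sliding window.
        while q and now - q[0] > self.window:
            q.popleft()
        return len(q) > self.max_writes

# A user suddenly writing 150 files in 15 seconds trips a 100-writes-per-minute policy.
policy = WriteRatePolicy(max_writes=100, window_seconds=60)
breached = False
for i in range(150):
    breached = policy.record_write("bob", now=1000.0 + i * 0.1) or breached
if breached:
    print("Anomaly: flag 'bob', restrict the account to read-only, and email the admin list")
```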

The obvious risk here is that you get false positives, but judicious policy-setting can help reduce these. 

Nutanix and its customers are relying on skilled admin staff setting up policies and procedures that reflect an organization’s structure, culture, and risk level. Nothing is 100 percent foolproof; risk reduction is what’s being offered here.

There isn’t a dedicated anti-ransomware structure within Data Lens – no anti-ransomware module, as it were. Instead, its features can be used, in conjunction with file system scanning, to provide a system that can detect ransomware attacks and help the user recover from them by restoring the last known good versions of files.

Data Lens looks at all the storage options in Nutanix‘s Unified Storage and we can expect that the features in Files 4.1 will be rolled out as appropriate to Nutanix Volumes (block) and Nutanix Objects.

Switchless network supplier Rockport switches CEOs

Co-founder Doug Carwardine has left the CEO spot at Rockport Networks, along with co-CEO Marc Sultzbaugh; new hire Phil Harris has been drafted in from Intel’s datacenter group and given a growth remit.

Update: Rockport statement added – 23 June 2022.

Rockport has developed a switchless and flat network for interconnecting servers, storage and clients, with each endpoint effectively being its own switch. It announced product availability in October last year when it came out of stealth mode operations.

In December that year it gained $8 million in funding, led by Northern Private Capital, and adopted a co-CEO structure, with board member Sultzbaugh joining Carwardine in the CEO office.

Andrew Lapham, co-founder and CEO, Northern Private Capital, issued the announcement quote: “With his industry experience, strategic vision, and commercial success, Phil is ideally suited to lead Rockport as the company enters its next stage of growth. Disruptive technologies demand passionate leadership making his energy, creativity and focus a perfect match. On behalf of the entire investment group, we thank both Doug Carwardine and Marc Sultzbaugh for their past leadership guiding the company to its current level.”

Phil Harris

Phil Harris’ CV shows that he was most recently VP and GM of Intel’s Data Center Systems and Solutions Group, with full P&L responsibility, taking that role in May 2018 and leaving it in February this year. Sandra Rivera is now EVP and GM of Intel’s Datacenter and AI Group.

Before Intel, Harris was in VP and SVP roles at Riverbed, BMC Software, and Cisco, with stints at VCE and Cisco again before that. His Cisco time speaks to a networking skillset, and the Intel datacenter experience is directly relevant to Rockport’s datacenter market ambitions.

A Harris quote said: “We’re entering a new age of high-performance networking where the industrialization of HPC, the shifting sands of AI and demands of composable infrastructure are changing the way we approach our greatest compute, storage, and environmental challenges. Our approach to tackling the fabric of the future is a complete gamechanger.”

The relationship between Rockport’s switchless fabric and the developing east-west traffic focused DPUs and SmartNICs will be an interesting area to watch, not least because Harris will have a good knowledge of Intel’s IPU product in that space. From the composability point of view Rockport may need to play nice with PCIe-based technologies (heading towards CXL) from suppliers such as Liqid.

It looks as if the co-CEO structure was a temporary thing set up while Rockport’s board looked for a single, permanent CEO. Presumably Carwardine will revert to a full-time R&D role which was a part-time responsibility when he was co-CEO.

A Rockport spokesperson said: “As with all startups, for every season there’s a CEO. At this next stage of growth and commercialization, Phil’s deep data center experience was a great match. When the opportunity presents itself, an organization must be open to change. … we took advantage of the moment. Both Doug and Marc fully supported this decision and have worked closely with Phil during the transition.”

Panasas has become an MLCommons collaborator to develop a machine learning storage benchmark

HPC storage supplier Panasas is working with MLCommons on how best to measure machine learning (ML) storage performance, develop an ML storage benchmark, and help develop a next generation of storage systems for ML.

MLCommons is an open engineering consortium which set up the MLPerf benchmark in 2018. This is a set of industry metrics to measure machine learning performance and promote transparency of machine learning techniques. There are more than 50 founding partners – including global technology providers, academics and researchers. MLCommons says it’s focused on collaborative engineering work that builds tools for the entire machine learning industry through benchmarks and metrics, public datasets and best practices. It promotes widespread ML adoption and democratization through benchmarks, large-scale public datasets, and best practices. 

David Kanter, founder and executive director of MLCommons, provided a supportive quote: “The end goal of the MLPerf Storage working group is to create a storage benchmark for the full ML pipeline which is compatible with diverse software frameworks and hardware accelerators.”

Panasas said it approached MLCommons to discuss the storage challenge in the ETL (extract, transform, and load) process and its impact on the overall performance of the ML pipeline. At that point MLCommons was in the early stages of forming an MLPerf Storage working group to develop a storage benchmark that evaluates performance for ML workloads including data ingestion, training, and inference phases.

Curtis Anderson.

MLCommons invited Panasas to attend the foundational meetings, after which Curtis Anderson, a Panasas software architect, was named co-chair of the MLPerf storage working group. That was actually back in March – the announcement took a while to come out. He will be working with the group to define standards for evaluating the performance of storage subsystems that feed AI/ML environments and develop a storage benchmark that evaluates performance for ML workloads including data ingestion, training, and inference phases.

The group’s deliverables are:

  1. Storage access traces for representative ML applications, from the applications’ perspective – initial targets are Vision, NLP, and Recommenders (short-term goal);
  2. Storage benchmark rules for:
    • Data ingestion phase (medium-term goal);
    • Training phase (short-term goal);
    • Inference phase (long-term goal);
    • Full ML pipeline (long-term goal);
  3. Flexible generator of datasets:
    • Synthetic workload generator based on analysis of I/O in real ML traces, which is aware of compute think-time (short-term goal);
    • Trace replayer that scales the workload size (long-term goal);
  4. User-friendly testing harness that is easy to deploy with different storage systems (medium-term goal).

Kanter said: “I’d like to thank Panasas for contributing their extensive storage knowledge, and Curtis specifically for the leadership he is providing as a co-chair of this working group.”

There are two other co-chairs: Oana Balmau, assistant professor in the School of Computer Science at McGill University, and Johnu George, a staff engineer at Nutanix. We don’t have access to the working group membership list, but we have requested it. Having three co-chairs suggests it is quite large. Any storage supplier looking to feed data to machine learning applications could well be interested in joining it. For example, Dell EMC, DDN, HPE, IBM, Infinidat, Intel (DAOS), MinIO, NetApp, Pure Storage, StorONE, VAST Data and Weka, plus the main public cloud providers.