
Storage news roundup – 19 June

Open-source data integrator Airbyte has launched checkpointing, column selection, and schema propagation features, now available on both Airbyte Cloud and Airbyte Open Source. By creating a saved state of a data stream at regular intervals, checkpointing allows the system to resume from the most recent checkpoint in the event of a failure, avoiding data loss or redundant processing – essential for large-scale data integration tasks.

The column selection feature enables users to select specific columns from a source for replication, rather than being required to integrate the entire object or table. The schema propagation feature automatically applies changes from the source schema to the destination schema, saving users from manually making these changes.
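To illustrate the checkpointing idea in the abstract – this is not Airbyte’s code, and the state file, interval, and helper callables are invented for the example – a sync that persists a cursor at regular intervals can resume from that cursor rather than restarting the whole stream:

```python
# Illustrative sketch of stream checkpointing (not Airbyte's implementation).
# A saved "state" is persisted at intervals so a failed sync can resume
# from the last checkpoint instead of re-reading the whole stream.
import json
from pathlib import Path

STATE_FILE = Path("sync_state.json")  # hypothetical state store
CHECKPOINT_EVERY = 1000               # records between checkpoints

def load_state() -> dict:
    return json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {"cursor": 0}

def save_state(state: dict) -> None:
    STATE_FILE.write_text(json.dumps(state))

def sync(read_records, write_record) -> None:
    """Resume from the last saved cursor, checkpointing as records flow through.
    `read_records` and `write_record` are hypothetical source/destination callables."""
    state = load_state()
    for i, record in enumerate(read_records(start=state["cursor"]), start=1):
        write_record(record)
        state["cursor"] += 1
        if i % CHECKPOINT_EVERY == 0:
            save_state(state)          # durable checkpoint
    save_state(state)                  # final checkpoint at end of stream
```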

Data protector Asigra has signed up with Direct 2 Channel (D2C), a North American distributor of technology for MSPs and other solution providers. Asigra is making its Tigris Data Protection cloud backup software, used by thousands of MSPs globally, available to the hundreds of channel partners in the D2C network.

CloudFabrix, the inventor of Robotic Data Automation Fabric (RDAF) and the “Data-centric AIOps” leader, announced the launch of Observability Data Modernization (ODM) and Composable Dashboards services for Cisco’s FSO (Full Stack Observability) platform. The FSO platform ingests, stores, and provides a unified query engine for OpenTelemetry signals in a relationship-based entity model. CloudFabrix showcases the extensibility of the platform with its patent-pending Robotic Data Automation Fabric (RDAF).

SaaS data protector Cobalt Iron has received a patent on its adaptive, policy-driven data cyber inspection technology. US patent 11663362, granted on May 30, introduces new policy-based approaches for effectively validating data integrity using multiple cyber inspection tools. The technology will be available as part of the company’s Compass enterprise SaaS backup platform.

Cloud database supplier Couchbase announced its first quarter fiscal 2024 financial results. Total revenue for the quarter was $41.0 million, an increase of 18 percent year-over-year. Subscription revenue for the quarter was $38.5 million, an increase of 21 percent year-over-year. Total ARR was $172.2 million, an increase of 23 percent year-over-year as reported and on a constant currency basis.

Cloud filesystem service supplier CTERA has launched a new partner program for resellers, MSP and system integrators to become trusted advisors in secure multi-cloud data management. There are three tiers: Elite, Premier and Select. More info here.

Ascend said it had deepened the integration of its Data Pipeline Automation platform with Databricks to further enable transparency, collaboration, and productivity. Joint customers can now take full advantage of Databricks as a cloud data compute platform in Ascend’s new second-generation remote data plane architecture, which allows users to leverage Ascend’s DataAware control plane within their Databricks lakehouse. Ascend has released a new full-featured developer tier for Databricks so that Ascend customers can take advantage of the latest Databricks capabilities.

Delta Lake lakehouse supplier Databricks is buying Rubicon, a startup working on building storage systems for AI. The team behind Rubicon, including founders Akhil Gupta and Sergei Tsarev, is joining Databricks. Gupta started at Google almost two decades ago, where he was responsible for the ads infrastructure, which is one of the largest scale use cases of AI on the planet to date. Tsarev cofounded Clustrix, one of the first scale-out OLTP database systems for high value, high-transaction applications. The two met at Dropbox, where they helped build, among other things, one of the world’s largest, most efficient, and secure storage systems which served trillions of files and exabytes of data.

Privately held Databricks announced it has surpassed the $1 billion annual recurring revenue milestone, and its sales rose by more than 60 percent in the fiscal year ended in January. Databricks’ data warehouse offering, Databricks SQL, generated $100 million in annualized recurring revenue in April, says Bloomberg. Databricks thinks it is the fastest-growing software company in the world. An IPO lies some time ahead, but there is no rush.

DataStax, the real-time AI company, announced a GPT-based schema translator in its Astra Streaming cloud service that uses generative AI to transfer data between systems with different data structures within an enterprise. Astra Streaming is an advanced, fully-managed messaging and event streaming service built on Apache Pulsar.  It enables companies to stream real-time data at scale, delivering applications with massive throughput, low latency, and elastic scalability from any cloud, anywhere in the world. Translator automatically generates “schema mappings” when building and maintaining event streaming pipelines.

The new GPT Translator is available immediately, for no additional cost, as part of DataStax Astra. Initially, this capability connects event data in Astra Streaming to Astra DB to simplify streaming pipelines and data integration processes, with further connections to come. Learn more about how the DataStax GPT Schema Translator helps to build and maintain streaming pipelines here.

DDN business unit Tintri announced that BlueHat Cyber, an infrastructure and managed security services provider, has implemented Tintri VMstore as the backbone of its Infrastructure as a Service (IaaS) and Disaster Recovery as a Service (DRaaS) business. Tintri technology allows BlueHat Cyber to offload administrative storage tasks and replicate across datacenters so it can focus on providing its clients with better service.

HPE has been busy on the ESG front and released its annual Living Progress Report for fy2022. It claims that in just two years, HPE’s operational (Scope 1 & 2) emissions decreased 21 percent from its 2020 baseline, toward a target of 70 percent by 2030. To achieve this, it surpassed its 2025 target of sourcing 50 percent renewable electricity three years ahead of schedule. Overall Scope 3 emissions remained constant year-over-year despite a 2.6 percent increase in net revenue, and HPE continues to focus on decoupling the growth of its business from emissions.

An IBM blog says Storage Virtualize has Inline Data Corruption Detection. This uses Shannon entropy to calculate an entropy level of the data streams it’s receiving – in Shannon’s terms, deriving an ‘information level’ in the data streams. We’re told the information content, also called the surprisal or self-information, of an event E is a function which increases as the probability p(E) of the event decreases. When p(E) is close to 1, the surprisal of the event is low, but if p(E) is close to 0, the surprisal of the event is high.

Storage Virtualize samples the destage streams from the cache to determine the overall Shannon entropy of that stream, which may be a signal that a ransomware or other data corruption event is taking place. The information is streamed to IBM Storage Insights, where ML and end reporting/alerting can be performed. The idea is that over time it builds a data lake of signals and uses ML to understand them, training models that can act in near real time to alert you, the end user, or more likely your SIEM.
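As a rough illustration of the entropy signal – a minimal sketch, not IBM’s implementation – encrypted or compressed data looks close to random, so its Shannon entropy approaches 8 bits per byte, while ordinary plaintext scores far lower. The sample inputs below are invented for the example:

```python
# Sketch of per-block Shannon entropy as a data corruption / ransomware signal.
# H = -sum over byte values b of p(b) * log2 p(b), measured in bits per byte.
import math
import os
from collections import Counter

def shannon_entropy(block: bytes) -> float:
    """Entropy in bits per byte of a data block (0 = constant, 8 = random)."""
    if not block:
        return 0.0
    counts = Counter(block)
    total = len(block)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Plain, repetitive data scores low; random (encryption-like) data scores near 8.
print(shannon_entropy(b"aaaaaaaabbbbcc"))      # low entropy
print(shannon_entropy(os.urandom(4096)))       # close to 8 bits/byte
```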

IBM says that, with Storage Scale 5.1.8.0, Storage Scale Data Management Edition and Storage Scale Erasure Code Edition entitle clients to use Storage Fusion Data Cataloging Services. These provide unified metadata management and insights for heterogeneous unstructured data, on-premises and in the cloud. Storage Scale also has file system-level replication using AFM to S3 cloud object storage. The FAQ has been updated.

Intelligent data management cloud supplier Informatica has bought UK-based data access control supplier Privitar. The acquisition price was not revealed. Privitar has raised over $150 million in funding since founding in 2014.

France-based Kalray announced tape-out of Coolidge2, a new version of its 3rd generation Coolidge MPPA (Massively Parallel Processor Array) DPU processor.  The Coolidge2 processor has been designed to deliver higher performance in the processing of artificial intelligence (AI) and data in general. It has one of the best ratios for performance/energy consumption/price in the inference and edge computing markets, with up to 10x the performance of its predecessor. Kalray is already working on its 4th generation DPU processor, Dolomites. 

Kinetica, which supplies a real-time, vectorized, analytic database for time-series and spatial workloads, has expanded its partner ecosystem in the United States, Europe and Asia. It added strategic global systems integrators to its partner roster, including Expero, Riskfuel and Semantic AI. This comes after it announced record business momentum of 90 percent Annual Recurring Revenue (ARR) growth, Net Dollar Retention Rate (NDRR) of 163 percent and doubling of its customer base.

Kioxia America announced that its NVMe/PCIe and SAS SSDs have been successfully tested for compatibility and interoperability with Microchip Technology’s Adaptec HBA 1200 Series, SmartHBA 2200 Series host bus adapters (HBAs) and SmartRAID 3200 Series RAID adapters.

Data orchestrator and manager Komprise says its software supports  Pure’s new FlashBlade//E and Purity 4.1, complementing existing support for FlashBlade//S and FlashArray Files. It says Pure Storage already resells Komprise Elastic Data Migration to deliver unstructured data migrations into Pure FlashBlade and FlashArray environments. Komprise is used by Pure Storage Professional Services teams for data migrations. Customers can use Komprise to transparently tier from FlashBlade//S to FlashBlade//E and the cloud to tackle unstructured data growth while cutting cold data costs. Learn more here.

According to Reuters, Micron, being harassed by the Chinese state cyberspace regulator for potentially putting China’s security at risk and prevented from selling product in China, has said it’s committed to China and will invest 4.3 billion yuan ($603 million) over the next few years in its chip packaging facility in the Chinese city of Xian. Perhaps this will sweeten the Chinese regulator. Around half of its sales to China-headquartered clients may be affected by the cybersecurity probe, affecting 10-15 percent of its global revenue.

NetApp released its annual 2023 State of CloudOps report, which found that only 33 percent of executives are “very confident” in their ability to operate in a public cloud environment, an increase from 2022 when only 21 percent reported feeling very confident. Some 64 percent of IT decision makers continue to see security and compliance as the top cloud operations challenge, followed by cost management, which was cited as the top challenge by 60 percent of respondents. The biggest areas of focus for improving cloud operations continue to be cost management and security, according to 66 percent of technology executives. Spot by NetApp has offerings to help.

Data management supplier Nodeum announced a distribution agreement with VAD Climb Channel Solutions. Nodeum’s software automates data movement and abstracts different hybrid storage tiers: NAS, Cloud, Object & Tape Storage.

Oracle reported good fyQ4 results, ahead of estimates, with total fy revenue of $50.0 billion, an all-time high, up 18 percent, and Q4 revenue of $13.8 billion, up 17 percent. Q4 profit was $3.32 billion and fy23 profit was $8.5 billion. Cloud revenue (IaaS plus SaaS) was $4.4 billion, up 54 percent. Cloud license and on-premise license revenues were down 15 percent. “Oracle’s Gen2 Cloud has quickly become the number 1 choice for running Generative AI workloads,” said Oracle chairman and CTO Larry Ellison. “Why? Because Oracle has the highest performance, lowest cost GPU cluster technology in the world. NVIDIA themselves are using our clusters, including one with more than 4,000 GPUs, for their AI infrastructure.”

The PCI-SIG standards group announced at its yearly DevCon that it will make available the initial specification of the latest update: PCIe 7.0. Every version of PCIe is designed to be backwards compatible. Gen 7 has a 2x transfer rate increase over PCIe 6.0, reaching 128 gigatransfers per second, and uses the same PAM4 signaling and FLIT mode encoding as PCIe 6.0.

Andres Botero

Rubrik has appointed Andres Botero as its chief marketing officer. He comes from BlackLine, a Nasdaq-listed SaaS modern accounting supplier, where he was responsible for driving BlackLine’s strategy and global marketing. During his tenure, BlackLine more than doubled its ARR. Prior to BlackLine, Andres was the CMO of CallidusCloud (acquired by SAP) and Aria Systems. Previously, Botero held leadership roles at SAP and Siebel Systems (acquired by Oracle). He also sits on the Board of Advisors at Sendoso.  

Seagate has released fancy gaming disk drives: the Starfield Special Edition Game Drive and Game Hub for Xbox. The 2TB or 5TB Game Drive and 8TB Game Hub display a design that feels pulled directly from the Settled Systems of Starfield, with customizable RGB LED lighting. They use USB 3.2 Gen 1. The Starfield Special Edition Game Drive for Xbox is available in capacities of 2TB ($109.99) and 5TB ($169.99) and the Seagate Starfield Special Edition Game Hub for Xbox is available in an 8TB ($239.99) capacity.

Seagate Starfield drives

SSD controller supplier Silicon Motion Technology is an NXP Registered Partner. The two combine Silicon Motion’s Ferri embedded SSD tech and SM768 graphics display SoC with NXP’s offerings in automotive electronics to meet the requirements of the automotive and industrial markets.

Application high availability (HA) and disaster recovery (DR) supplier SIOS Technology has partnered with Zepto Consulting, which will resell SIOS DataKeeper and SIOS LifeKeeper products across Southeast Asia to help customers achieve high availability for applications, databases, and file storage both in on-premises data centers and in the cloud.

SpectraLogic has partnered with Titan, a specialist distributor offering end-to-end data management solutions and cybersecurity services. The relationship spans the EMEA region. Titan will distribute the entire Spectra product portfolio, including object storage and NAS with the BlackPearl Platform, StorCycle enterprise software for digital archive, and the company’s full range of tape libraries to its resellers, targeting horizontal and vertical markets such as media and entertainment, life sciences, and HPC.

Europe-based backup supplier Storware is partnering with DataCore so its Backup and Recovery product can write to DataCore Swarm object storage to provide faster backup and recovery times. Storware Backup and Recovery provides a unified console for managing backup jobs, while DataCore Swarm provides a single view of data across multiple locations and storage types.

Storware has joined the OpenInfra Foundation as a Silver Member. It wants to play a pivotal role in advancing the success of OpenStack, the flagship open source cloud computing platform supported and sponsored by the OpenInfra Foundation.

Taiwan-based memory provider TEAMGROUP launched the PRO+ MicroSDXC UHS-I U3 A2 V30 Memory Card. It meets the Application Performance Class A2, UHS Speed Class 3, and Video Speed Class V30 standards and has read and write speeds of up to 160 MB/s and 110 MB/s, respectively. It’s compatible with a wide range of mobile devices and cameras, and allows consumers to experience high-definition 4K videos and photos.

Datadobi charts course for AI with StorageMAP

Datadobi used to concentrate on one-off projects such as finding files for migration and moving them to a different filer. Finding files has evolved into regular scanning, and migration targets now include cloud object storage for AI processing.

The company’s StorageMAP technology scans and lists a customer’s entire file and object storage estates, enabling customers to find out what they have in their distributed unstructured data silos. This helps them to manage their data more effectively, moving old data to archives, in the cloud for example, or deleting unwanted data.

We have seen a blurring of the boundaries between file and object storage, with file systems getting S3 wrappers, export capabilities and even object underpinnings, like with Nasuni. Object storage is getting file access layered on top. Conversion between the two formats is not something Datadobi sees in a general way, though.

Michael Jack.

Even the cloud providers can be seen as downplaying object in favor of file. Datadobi co-founder and chief revenue officer Michael Jack said AWS “is doubling down on replacing datacenter file processing.” It has a large storage team and has been recruiting from the major datacenter file storage suppliers. “If you want to take out large datacenters it has to be file.”

Jack said in a briefing that “NAS to object is not really happening.” He asked whether there is real value in this conversion and said: “What is valuable about object storage apart from cost? We don’t see cost analysis being done in any significant way.”

CTO Carl D’Halluin said an intrinsic value of object storage was scale since it is not burdened with the file:folder directory structure of file storage. At large scale – billions of files – object storage is simpler to access because of this. He said: “We see NAS-to-object copy for further processing. For example, analysis with cloud AI tools.” He expects generative AI to drive this trend higher.

Jack chimed in that copy to the cloud “is not about cost. They need it in the cloud where cloud-native apps can run against it… It’s a workflow thing.” Such apps and tools are typically cloud-native and built to access object storage, not file. Object on flash storage provides the IO speed such apps need. He said a Datadobi AWS partnership is focused on generative AI use cases.

He suggested that a customer could send 100PB or more of file data to cloud object storage for such processing. Datadobi’s customers need to identify which data to send in this situation, and for that they need to know what data they have and where it is. Enter StorageMAP.

Carl D'Halluin, Datadobi
Carl D’Halluin

D’Halluin said that if you use apps like Varonis to crack open files and detect personally identifiable information (PII), it is resource-intensive and can only run on a small subset of files. You can’t run them against 100PB datasets. But StorageMAP can, he told us. It doesn’t actually crack open the files; instead it classifies files, such as HR ones, and then this subset can be given to Varonis to crack open.

Due to this broad scanning capability, StorageMAP is becoming a standalone utility, finding data for analysis, replication, PII tracking and analysis, for example. It’s multi-vendor and hybrid cloud capable. “Every customer is multi-vendor and hybrid cloud,” Jack said.

Datadobi licenses StorageMAP on a subscription basis and so earns recurring revenue. DobiMigrate is pay-per-use, which fits its typical one-off project usage.

In Australia, new regulations mandate that critical data must be secured against ransomware. Backup apps such as Rubrik can back up the data to an immutable vault, but they can’t scan the customer’s unstructured data estate to find the critical data. Jack said this is a job that StorageMAP can do, and so it works with data protection apps like Rubrik.

He added, alluding to the company’s rivals: “We’ll stay out of the data path because customers don’t want another lock in.”

Jack commented: “Migration has always been about defeating lock-in. We move files from vendor A to vendor B.”

Pure Storage sneaking up on NetApp in AFA Olympics

Analysts have ranked Pure Storage third in revenue share in the all-flash external storage market in a placement that shows its sales just under those of storage giant NetApp.

The revenue share numbers were calculated by Gartner and Wells Fargo Securities and presented to subscribers by analyst Aaron Rakers. The total all-flash array (AFA) market was sized at $11.1 billion and a pie chart showed how the various players ranked:

AFA revenue share

Dell EMC was first with a quarter share, followed by NetApp with a 17 percent share. Pure’s 16 percent puts it close behind NetApp, with fourth-placed Huawei having 13 percent. Then we have, in declining order, IBM, HPE and Hitachi with Others making up 8 percent.

This group will include relative newcomer VAST Data, which said in March this year it “went from $1 million in annual recurring revenue (ARR) to $100 million in ARR within three years of selling.” 

Rakers says that Pure’s revenue share in all-flash primary storage stood at ~15.8 percent in 2022, according to Gartner estimates. This is a good amount of growth when you compare to Pure’s revenue showing in 2020 and 2021: ~12 percent and ~14 percent respectively.

Pure’s fiscal year starts in early February so its fiscal 2023 revenues, $2.75 billion, approximate calendar 2022 revenues. Gartner and Wells Fargo say $1.775 billion of that was all-flash storage revenues, with about $1 billion coming from elsewhere, subscription services for example.

We can compare NetApp and Pure’s quarterly revenue histories to compare their growth rates:

NetApp and Pure revenues

The chart is normalized to NetApp’s fiscal quarters. We can see that Pure Storage’s revenues have been growing quite consistently while NetApp’s have been trending flat, allowing Pure Storage to outgrow NetApp. On this basis Pure’s all-flash revenues may overtake NetApp’s equivalent number by 2025, putting it in second place in the AFA market by revenue share, behind leader Dell EMC.

But, as NetApp has recently upgraded its all-flash array products, its current lead may well be maintained.

VAST Data’s Elemental Pixarification for Disney

VAST Data’s single tier flash storage has been used by Pixar to feed data to the 150,000 cores needed to render the Elemental movie – six times more cores than Pixar needed for its previous film, Soul.

Pixar is an animation studio owned by Disney. Elemental, viewable in US theatres from June 16 onwards, is a romantic story about cartoon characters who represent different elements such as earth, air, fire and water, who live in Element City, and whose intrinsic nature means they cannot mix. But hey – this is a movie and anything is possible.

Eric Bermender, head of data center & IT infrastructure at Pixar, said: “Elemental is the most technically complex film that Pixar has ever made.” It was developed with “new techniques that no one had ever considered before because we didn’t have the technology in place to support it.”

Fire and water characters in a scene from Elemental

Pixar had to develop new visual effects algorithms to render the characters in the movie and, Bermender said, “VAST’s technology has allowed us to change the way we store and access data while also opening the door to new potential visual pipelines.” 

Pixar uses its own Renderman software to compute the color of every pixel in every frame of its 3D animated movies.

Typically, an object’s surface is made out of triangles, with the three defining points being in the same plane. Computer special effects characters, such as the clone army fighters in Star Wars: Episode II – Attack of the Clones, are generally solid in that they have non-transparent surfaces and clear boundaries between them and their background. Their surfaces are represented by thousands – maybe hundreds of thousands – of connected triangles which are given pixel values.

Clone soldiers in Star Wars: Episode II – Attack of the Clones

They are movable objects which have shape and texture and color, and on which light shines and reflects. The video effects software, when rendering these movable objects for a frame in the movie, needs to model the effects of light on hundreds of thousands of such surface items for each object (character).

Volumetric rendering is different. In general, volumetric rendering is used to display in 2D a set of individual 3D samples from a 3D volume – such as the output of a CAT scanner. The process needs to define the opacity and color of every data point on the 3D grid formed by the samples – voxels in the trade. Such voxels could be given RGBA (red, green, blue and alpha) values and then projected onto a pixel in a frame buffer.
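For a concrete sense of that projection step, here is a minimal front-to-back compositing sketch for a single viewing ray – generic volume rendering arithmetic, not Pixar’s RenderMan pipeline, and the sample values are made up:

```python
# Minimal sketch of front-to-back compositing of RGBA voxel samples along
# one viewing ray (generic volume rendering, not Pixar's production code).
def composite_ray(samples):
    """samples: list of (r, g, b, a) voxels ordered front to back, values in 0..1."""
    out_r = out_g = out_b = 0.0
    transmittance = 1.0                      # how much light still passes through
    for r, g, b, a in samples:
        out_r += transmittance * a * r
        out_g += transmittance * a * g
        out_b += transmittance * a * b
        transmittance *= (1.0 - a)           # each semi-transparent voxel dims what sits behind it
        if transmittance < 1e-4:             # early termination when nearly opaque
            break
    return out_r, out_g, out_b

# A watery, translucent voxel column: every sample contributes to the final pixel.
print(composite_ray([(0.2, 0.4, 0.9, 0.1)] * 20))
```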

Getting back to Elemental, the characters, made of fire and water, are translucent and don’t have clearly defined surfaces. Light rays that hit the characters pass through them – they are only partly reflected. As a Pixar document explains: “The boundary behavior of the volume containers is not relevant to the render: rays will not bend.”

Fire and water characters in Elemental.

This means that, if we were to imagine a succession of vertical 2D slices through them, all the pixels on the slices would have color and be visible, to an extent, behind the slices in front of them from the camera’s viewpoint. That makes the computation of the overall object’s pixels on a 2D screen much more complex. 

Each of the surface triangles involved in the previous type of rendering would have to have their pixel values affected by the succession of interior pixel values behind them, as it were – increasing the number of individual pixel values involved in the computations enormously.

Pixar said that, unlike the geometric surfaces and materials used in earlier Pixar projects such as Soul, the volumetric animation methods used in Elemental created six times the data footprint and computational demands for data. There were, we’re told, 150,000 volumetric frames in Elemental.

The 150,000 compute cores used to render the movie needed access to an overall 7.3PB of data in a cluster of VAST Data storage nodes using a single namespace. The Soul movie needed a mere 24,000 cores in comparison.

At peak rendering usage time, the VAST cluster had to provide direct access to 2PB of data – whereas previous Pixar movies needed access to just 300 to 500TB of capacity. And the VAST storage kit was also being used by artists at Pixar working on other movies at the same time.

Jeff Denworth, co-founder of VAST Data, said: “For Elemental and future films, we’re delivering a data platform that powers the animation and rendering workflows for their most data-intensive and computationally heavy projects, while enabling its AI and ML pipeline for the future in order to further the ambitions of Pixar artists and the stories they’re able to tell.”

Soul was not as successful as either Disney or Pixar hoped. Elemental will, hopefully, restore Pixar’s reputation for producing great animated movies.

A Pixar report discusses the volume rendering in Elemental in more detail if you want to find out more. It discusses how Pixar updated and developed earlier volume rendering approaches to work on Elemental.

Dremio lines up hat-trick of AI enhancements

Dremio is embracing generative AI with a three-step process adding Text-to-SQL, Autonomous Semantic Layer, and Vector Lakehouse functionality to its product.

The company supplies open-source Dremio Cloud lakehouse technology, with data persisted in Apache Iceberg format tables using Apache Parquet’s columnar data format. The lakehouse combines the capabilities of a more structured data warehouse and less structured data lake with self-service SQL analytics. Dremio’s view is that data warehouses rely on extract, transform and load (ETL) procedures to get data from different sources into the warehouse for subsequent analysis. With a lakehouse, data from multiple sources can be amalgamated into a data lake with no need for the ETL procedures to precede analytical processing.

Tomer Shiran

Tomer Shiran, co-founder and Dremio’s chief product officer, said: “Generative AI will transform data engineering, data science and analytics over the coming years, and we are excited to provide our users with the industry’s most powerful tools to uncover the true potential of their data.”

These include an intuitive Text-to-SQL experience, allowing users to have their natural language queries converted into SQL within the user interface. This is based on a semantic understanding of metadata and data, which ensures more accurate SQL generation. Dremio says automatic correction of SQL queries is coming soon as well.
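As a generic illustration of how text-to-SQL features of this kind typically work – explicitly not Dremio’s API or implementation – the common pattern is to hand a language model the table metadata so the SQL it generates references real columns. The schema and the `llm_complete` call below are hypothetical:

```python
# Generic text-to-SQL illustration (not Dremio's implementation or API).
# The schema is passed as context so generated SQL uses real tables/columns.
SCHEMA = """
table orders(order_id INT, customer_id INT, order_date DATE, total DECIMAL)
table customers(customer_id INT, name VARCHAR, region VARCHAR)
"""

def text_to_sql(question: str, llm_complete) -> str:
    """`llm_complete` is a hypothetical callable that sends a prompt to an LLM."""
    prompt = (
        "You translate questions into ANSI SQL.\n"
        f"Schema:\n{SCHEMA}\n"
        f"Question: {question}\n"
        "SQL:"
    )
    return llm_complete(prompt)

# e.g. text_to_sql("total sales by region last quarter", my_model)
```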

The Autonomous Semantic Layer – we’re told – is software that automatically learns the intricate details of users’ data then produces descriptions of datasets, columns and relationships by using generative AI. There will be no need for manual cataloging. The software will autonomously learn workloads and create “reflections” to accelerate data processing, the company says, providing users with an AI-powered semantic layer for better data insights.

Dremio also supports vector embeddings, the data schema needed for generative AI processing, in its lakehouse. It says this will enable users to store and search vector embeddings directly and provide a foundation for users to build machine learning applications such as semantic search, recommendation systems and anomaly detection within the Dremio platform.

All analytics software will soon regard vector embedding schema support and a ChatGPT-like interface as table stakes – witness SingleStore and Zilliz. Dremio wants its AI adoption to go deeper than being just a GUI layer.

Dremio’s Text-to-SQL is available today while Autonomous Semantic layer and Vector Lakehouse capabilities are coming soon.

Western Digital HAMR tech ’18 months away’

Seagate HAMR technology

Western Digital could have HAMR disk drives in volume production in 12 to 18 months, according to the company’s CFO. He also denied that SSDs will make HDDs obsolete.

This is unexpected as Western Digital has been prioritizing its MAMR (microwave-assisted magnetic recording) technology over HAMR (heat-assisted magnetic recording), saying HAMR will likely follow MAMR. Now it’s suggesting MAMR is a relatively short-term stopgap technology. Competitor Seagate has a 30TB HAMR drive coming this quarter and will be volume shipping HAMR drives in three to four quarters.

Wissam Jabre, Western Digital
Wissam Jabre

Wissam Jabre told a Bank of America analysts meeting: “On the HAMR side, we’re probably a year to 1.5 years plus before we get sort of volume production.” That’s a fairly wide time span and could actually be 12 to 24 months.

Western Digital is currently shipping 22TB conventional disk drives and 26TB shingled drives. A 30TB Seagate HAMR drive gives it an 8TB advantage over the 22TB WD product. Western Digital must be bringing out a >22TB conventional drive using its MAMR/OptiNAND tech but is unlikely, in our estimation, to be able to manage a leap to 30TB. It has to move to HAMR, it appears, because MAMR has a limited technology runway in capacity terms.

Jabre also answered questions about SSDs killing hard disk drives, prompted by Pure Storage’s marketing points around its QLC (4 bits/cell) flash arrays. He said: “I think hard drives will still be shipping in five years plus. Look, there’s this whole discussion around where the flash prices are today, [whether] there’s cannibalization of hard drive or not. We don’t see it. And we do sell both.”

“In fact, over the last quarter or two, probably the enterprise SSD business was a little bit more impacted than the hard drive business,” meaning sales were lowered more.

What about HDD and SSD cost declines over the next three to five years? He said: “I think the cost declines on the HDD side would still be there. They won’t be as high as the cost declines we would experience necessarily on the flash side as an industry, but they’ll still be there. And they will still be there in a meaningful way to allow for that TCO to continue decline and make it economic for our customers to adopt them.”

GigaOm gurus say three Cs lead large enterprise hybrid cloud backup

GigaOm has published findings showing Cohesity, Commvault and lesser-known Cobalt Iron are “outperforming” other suppliers when backing up data in large enterprise hybrid clouds.

The GigaOm Radar for Hybrid Cloud Data Protection for Large Enterprises report looks at suppliers of hybrid (on-premises + public cloud) data protection specifically for large enterprises, including cloud and managed service providers,  which will have the gamut of SaaS apps, virtual desktops and NAS filers. They need services on top of basic data protection such as cyber resiliency, data management and governance, analytics and disaster recovery orchestration.

Analysts Max Mortillaro and Arjan Timmerman, who wrote the report, said: “The intensity and impact of ransomware attacks is now so high that cyber resiliency capabilities in data protection solutions are crucial and shouldn’t be considered optional. Data protection solutions are often the last line of defense against a ransomware attack, and enterprises are looking at ransomware protection capabilities with increased scrutiny.”

Their report contains a Radar diagram with most of the 18 vendors clustered as innovative platform players (lower right quadrant). A second and smaller cluster contains mature and less innovative suppliers in the upper right quadrant. Barracuda and Clumio are outlying players in the lower left area. Here’s the diagram:

Cohesity and Commvault are the clear leaders, with Cobalt Iron in third place. All three are classed as outperformers. The other leaders are Veritas, Dell, HYCU, and Druva.

Rubrik is a Challenger that’s almost in the Leader category, as is Veeam. Both are poised to enter the Leaders ring. HYCU is the most innovative supplier.

You can get a copy of the report from Cobalt Iron here.

Bootnote

GigaOm’s Radar diagram places suppliers across a series of concentric rings, with those set closer to the center judged to be of higher overall value. The chart characterizes each vendor on two axes – balancing Maturity versus Innovation and Feature Play versus Platform Play – while providing an arrow that projects each solution’s evolution over the coming 12 to 18 months.

The rings, starting from the outside, represent new entrants, then challengers and lastly leaders. The inner circle is always left blank.

Pure intros fresh FlashArray features for enterprise

Pure Storage announced improved FlashArray hardware and software, an unstructured data storage FlashArray system, and a ransomware recovery guarantee at its Las Vegas Pure Accelerate event.

FlashArray is Pure’s line of all-flash unified block+file storage arrays for applications needing fast data IO with high-performance //X models and tier 2 workload FlashArray//C products. These now get a capacity-optimized //E model, similar in design to the FlashBlade//E.

Pure chairman and CEO Charles Giancarlo said: “We are now delivering the industry’s most consistent, modern, and reliable portfolio that can address all enterprises’ storage needs. As we enter a new age of AI, the superior economics, and operational and environmental efficiencies of Pure’s product portfolio over both hard disk and SSD-based, all-flash competitive offerings will be more critical to our customers than ever.” 

Pure Storage FlashArray
FlashArray//E (top), //X (middle) and //C (bottom)

Field CTO Alex McMullan told us: “This is building for colossal scale part two. It is the sequel to the previous, hopefully blockbuster news that we had on this particular topic,” referring to the FlashBlade//E news and coming 300TB Direct Flash Modules.

The existing FlashArray//X and //C models are upgraded to release 4 with the //X10 entry-level product being retired. Pure said: “For customers on Evergreen//Forever, support will transition to the FA//X20 models over time as they become eligible for their ever modern upgrades.”

R4 FlashArray models have Intel’s Sapphire Rapids Xeon SP processors, DDR5 memory, and PCIe gen 4 to provide an up to 40 percent performance boost, over 80 percent increased memory speeds, and a 30 percent inline compression boost to stretch storage capacity. FlashArray//X uses 36TB TLC (3 bits/cell) Direct Flash Modules (DFMs), Pure’s flash drives with built-in NVRAM. FlashArray//C will get 75TB QLC (4 bits/cell) DFMs, delivering 1.5PB per rack unit, a 106 percent increase in density/RU. For comparison, the largest off-the-shelf SSDs will be Solidigm’s 60TB drives due later this year.

These QLC DFMs are scheduled to double in capacity to 150TB in 2025 and then double again to 300TB in 2026. The TLC DFMs have no such doubling and redoubling in their public roadmap.

There is a new FlashArray//C90 product, topping the existing //C50 and //C70 models. The //C90’s maximum effective capacity, assuming a 5:1 data reduction ratio, is 8.9PB from a 6RU chassis. The //C70 tops out at 4.8PB.

FlashArray//E

Pure has also added a FlashArray//E using the same chassis and 75TB QLC DFMs to bring capacity-optimized, unstructured data storage capability to FlashArray. Its raw capacity ranges from 1 to 4PB, and a slide showed its maximum effective capacity, assuming a 2:1 data reduction ratio, as 6PB – less than the //C90’s 8.9PB.

Pure Storage FlashArray//E

Pure told us that the whole FlashArray portfolio uses the same Purity operating system software. Because the models share the same dedupe and compression engines, similar workloads on FlashArray//E should achieve the same average data reduction as FlashArray//C and FlashArray//X over time.

Pure said that FlashArray//E will enable customers to benefit from an 80 percent reduction in power and space, 60 percent lower operational costs, and 85 percent less e-waste compared to disk arrays. It costs less than $0.20 per GB with three years of support.

Chief product officer Ajay Singh said: “Pure Storage, with the expansion of the Pure//E family of products, is eliminating the last remnants of disk in the enterprise.”

Customers can deploy FlashArray//E and FlashBlade//E through either a new //UDR service tier of Pure Storage Evergreen//One Storage-as-a-Service (STaaS) subscription or through the Evergreen//Flex asset utilization payments for customer-owned arrays.

Ransomware recovery SLA

The Evergreen//One ransomware recovery SLA is a purchased add-on service guarantee that, in the event of a ransomware attack on a Pure array, a clean replacement array will be shipped to the customer.

The thinking is that, after an attack, storage arrays are often locked down for forensic investigation by cyber insurance or law enforcement, leaving organizations unable to recover data to infected arrays. By guaranteeing clean arrays, Pure enables customers to recover faster.

Evergreen//One guarantees a next business day window to ship clean storage arrays, 48 hours to finalize a recovery plan started at any time, a data transfer rate of 8TiB/hour, and a professional services engineer onsite through RMA (Return Merchandise Authorization).

Evergreen//One also gets new AIOps features, including enhanced data structure change anomaly detection, data protection assessment and multi-factor authentication.

Pure said that its FlashBlade hardware is GPU Direct Storage (GDS) ready, with software enhancements delivering complete GDS support to be available in the near term.

Zilliz offers access to database to combat AI hallucinations

Startup Zilliz has made its open source Milvus vector database available to use – on various pricing tiers starting with a freemium offering – as it attempts to grow amid the generative AI frenzy.

Generative AI processing accesses myriad quantitative measures of audio, image and other data formats, measuring weird and wonderful digital aspects of them. These are collectively called vector embeddings and need a special database schema. Zilliz has developed its vector-specific Milvus database while other suppliers, such as SingleStore, are adding vector capabilities to their multi-format databases. The company claims Milvus is the fastest vector database on the planet, speeding large language model (LLM) development.

Charles Xie, Zilliz
Charles Xie

Founder and CEO Charles Xie said: “Generative AI is going to change everything. But first we have to trust it. Developers of all sizes will need Zilliz Cloud to power their generative AI applications. And with our new pricing they can.”

Zilliz says the Achilles’ heel of LLMs is their tendency to hallucinate and make things up. It says this problem can be minimized by using an external Zilliz Cloud database of domain-specific data. By providing the LLM with correct information stored in the Zilliz Cloud, it can be made to deliver answers dependable enough for business use.
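The pattern Zilliz is describing is retrieval-augmented generation. Here is a vendor-neutral sketch of it; `embed`, `vector_search` and `llm_complete` are hypothetical stand-ins for an embedding model, a vector database query (Zilliz Cloud in this case) and an LLM call, not actual Zilliz Cloud APIs:

```python
# Vendor-neutral sketch of retrieval-augmented generation: ground the LLM's
# answer in documents fetched from a domain-specific vector database.
def answer_with_context(question: str, embed, vector_search, llm_complete, k: int = 5) -> str:
    query_vector = embed(question)                  # question -> embedding vector
    hits = vector_search(query_vector, top_k=k)     # nearest domain documents
    context = "\n".join(hit["text"] for hit in hits)
    prompt = (
        "Answer using only the context below. Say 'unknown' if it is not there.\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return llm_complete(prompt)                     # grounded answer, fewer hallucinations
```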

There are four pricing tiers:

  • Starter and free service tier – no charge
  • Price-optimized plan for apps not having low latency or large volume needs
  • Standard plan from $65/month for teams with <5 engineers
  • Enterprise plan from $99/month for at-scale apps and organizations

The new software release also features:

  • API support for Python and JavaScript, as well as RESTful APIs
  • Organizations and Roles so users can manage access and permissions for their team
  • Dynamic schema support so schemas can be customized with specific fields or attributes
  • Open-source Benchmarking tool to measure Zilliz Milvus performance against other offerings

You can read more about the Zilliz Cloud release on its blog.

DDN revenues ride high on generative AI wave

DDN says it shipped more AI storage in the first quarter of this year than in all of 2022 and claims it expects to boost revenues further.

The company has a long history of supplying storage arrays for supercomputing and high-performance computing. AI-type applications need GPU processing, which means enterprises have started using HPC-class storage. DDN has ridden this wave and has been working with Nvidia and its GPU systems, such as the DGX, for four years or more.

Kurt Kuckein, DDN
Kurt Kuckein

Kurt Kuckein, DDN’s VP for Marketing, told an IT Press Tour: “We’re behind over 5,000 DGX systems today.”

B&F has previously learned that DDN has about 48 AI400X2 arrays supporting Nvidia’s largest SuperPODs. DDN has reference architecture papers for Nvidia’s POD and SuperPOD coming out in the next couple of weeks.

Generative AI is, Kuckein said, driving DDN’s expansion.

DDN has recently launched an all-flash EXAScaler array, the AI400X2, using SSDs with QLC (4 bits/cell) flash and hopes its combination of flash speed and capacity – 1.45PB in a 2RU chassis – will prove popular for generative AI use cases. The AI400X2 provides up to 8PB of TLC (3 bits/cell) and QLC NAND in 12RU, with a head chassis containing TLC drives and two QLC drive expansion chassis.

DDN and Nvidia SuperPOD

There is linear scaling to 900GBps reads from 24RU of storage – 12 x AI400X2 appliances with 90GBps/appliance. These 12 appliances also provide 780GBps writes. DDN says that, with large language model work, compute overlaps data transfer and faster transfer means compute completes faster.

Startup VAST Data has been showing off its SuperPOD credentials by announcing that Nvidia has certified its QLC flash array as a SuperPOD data store. It claims that its Universal Storage system is the first enterprise network-attached storage (NAS) system approved to support the Nvidia DGX SuperPOD.

DDN’s customers, however, also include data-intensive organizations and enterprises.

Kuckein told his audience that “DDN gives you 39x more writes per RU than a prominent newcomer to Nvidia data supply,” without naming the newcomer. He added that DDN provides “50x more IOPS/RU, 10x more writes/rack and 6x more capacity/watt.”

DDN says its EXAScaler can provide up to 8PB in 12RU of space. A VAST Data Ceres-based system would need 10RU for 10 x 807.8TB 1RU storage enclosures and, say, 4U more for 4 x CNodes (controller nodes), making 14RU in total. When the Ceres box gets 60TB SSDs later in the year, that rackspace total will drop to 5 x 1.6PB 1RU enclosures plus 4RU for the controllers again – 9RU in total, three fewer than DDN’s 12RU.

A VAST source suggested that arguing about who can fit the most flash in a rack unit doesn’t make any more sense than arguing over how many angels can dance on the head of a pin, because customers don’t care: they run out of power and cooling at each rack long before all the rack slots are filled. “If I have 12kW per rack, do I care if that’s 24U or 32U of kit in a 42U rack?”

DDN’s density comes at the price of having a lot of SSDs behind a small number of controllers. In 12U it probably has only six controllers in three HA pairs. That’s going to bottleneck the 130 x 60TB SSDs needed to get to 8PB.

A VAST system is going to need fewer SSDs because it claims to have more efficient erasure codes and better data reduction, and VAST customers can vary the number of CNodes to match the cluster’s performance to their applications. If that flexibility means VAST uses a few more rack units, that seems a good trade to VAST.

DDN says it will provide new search and tagging in the AI fabric in the second half of 2023. This software is designed to understand how data is represented in the different modalities used by AI applications.

VAST wants to knock DDN off its Nvidia DGX perch and reckons its Universal Storage has the edge over the Lustre-powered EXAScaler. DDN, claiming to be the world’s largest privately owned storage supplier, does not want to let that happen. Pure Storage also has skin in this QLC-based storage for AI game, as does NetApp. We’re entering a multi-way fight for AI storage customers. Expect claim and counter-claim as supplier marketing departments become hyperactive.

AI inferencing feels the need – the need for speed

Commissioned: Speed and performance often play an outsized role in determining outcomes of many competitions. The famed Schneider Trophy races of the early 20th century offer a classic example, as multiple nations pushed the boundaries of speed in feats of aerial supremacy.

Italy and the United States produced strong showings, but it was Britain’s Supermarine S.6B seaplane that secured victory in the final race, setting a then world speed record of over 400 miles per hour. Quaint by today’s standards with jet fighters topping Mach 3, but a marvel for its time.

Like the famed Schneider Trophy races, the scramble for AI supremacy is also a competition where high speed and performance are critical.

This is particularly salient for generative AI, the emerging class of technologies that use large language models to process anything from text to audio and images. Also, like its AI predecessors, generative AI relies on high-quality training data and its next phase, known as inferencing.

Why inference matters for predictions

AI Inferencing works like this: After a machine learning model is trained to recognize the patterns and relationships in a large amount of labeled data, the model takes new data as input and applies the learned knowledge from the training phase to generate predictions or perform other tasks. Depending on the model (or models), the input data could include text, images or even numerical values.

As input data flows through the model’s computational network, the model applies mathematical operations. The final output of the model represents the inference, or prediction, based on the input.
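A toy example of what “applies mathematical operations” means in practice: a tiny two-layer network whose weights (randomly initialized here) stand in for learned parameters, turning a new input vector into class probabilities. It is purely illustrative and not any particular production model:

```python
# Toy forward pass: inference is the trained weights applied to new input.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)    # "learned" parameters (random for illustration)
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)

def infer(x: np.ndarray) -> np.ndarray:
    """Apply the learned transformations to a new input and return class probabilities."""
    h = np.maximum(x @ W1 + b1, 0.0)             # hidden layer with ReLU
    logits = h @ W2 + b2
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()                       # probabilities = the prediction

print(infer(np.array([0.2, -1.0, 0.5, 3.0])))
```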

Ultimately, it takes a combination of the trained model and new inputs working in near real-time to make quick decisions or predictions for such critical tasks as natural language processing, image recognition or recommendation engines.

Consider recommendation engines. As people consume content on ecommerce or streaming platforms, the AI models track the interactions, “learning” what people prefer to purchase or watch. The engines use this information to recommend content based on the preference history.

Using generative AI models, businesses can analyze purchase history, browsing behavior and other signals to personalize messages, offers and promotions to individual customers. Nearly a third of outbound marketing messages enterprises send will be fueled by AI, according to Gartner.

To ensure that these engines serve up relevant recommendations, processing speed is essential. Accordingly, organizations leverage various optimizations and hardware acceleration to facilitate the inference process.

Why generative AI needs speedy hardware

Generative AI is a computation-hungry beast. As it trains on massive data sets to learn patterns, it requires significant processing firepower and storage, as well as validated design blueprints to help right-size configurations and deployments.

Emerging classes of servers come equipped with multiple processors or GPUs to accommodate modern parallel processing techniques, in which workloads are split across multiple cores or devices to speed up training and inference tasks.

And as organizations add more parameters – think millions or possibly billions of configuration variables – they often must add more systems to process the input data and crunch calculations. To accommodate these larger data sets, organizations often interconnect multiple servers, creating scalable infrastructure. This helps ensure that AI training and inferencing can maintain performance while handling growing requirements.

Ultimately, powerful servers and reliable storage are critical as they facilitate faster and more accurate training, as well as real-time or near-real-time inferencing. Such solutions can help organizations tap into the potential of generative AI for various applications.

Speedy algorithms will win the day

There’s no question the Schneider Trophy aerial races of last century left their mark on the history of aviation. And just as those multinational races underscore how competition can fuel surprising advancements in speed and engineering, the AI arms race highlights the importance of technological innovation driving today’s businesses.

Organizations that ride this new wave of AI will realize a competitive advantage as they empower developers with the tools to build smarter applications that deliver material business outcomes.

As an IT leader you should arm your department with the best performing inferencing models along with the hardware to fuel them. May the best generative AI algorithm(s) – and models – win.

Learn more about how Dell Technologies APEX is fueling this new era of AI inferencing.

Commissioned by Dell Technologies

VergeIO launches IOfortify ransomware defense

VergeIO has added an IOfortify feature, claiming it provides attack detection and recovery in seconds.

Enterprise or MSP tenants of a VergeIO datacenter run VMs in software-defined hyperconverged infrastructure (HCI) combining compute, storage, and networking. VergeIO calls this “ultraconverged” infrastructure. The VergeOS controlling software presents these nodes to tenants and protects the VMs, meaning systems, applications, and files. Immutable clones are made, globally deduplicated and stored, then used as the basis for recovery in the event of ransomware attacks.

Greg Campbell, VergeIO
Greg Campbell

Greg Campbell, VergeIO founder and CTO, said: “100 percent of VergeIO customers who were affected by ransomware have successfully restored their entire system to a secure state within a matter of minutes. Those looking for unbeatable ransomware protection should embrace the future of fortified data integrity by building their infrastructure on VergeOS with IOfortify.” 

The IOfortify software detects attacks by monitoring the deduplication process. When ransomware encrypts files, it causes a significant and detectable rise in unique data writes, thanks to VergeIO’s global deduplication technology. VergeIO makes no mention of machine learning being used in this detection, nor does it provide any estimated time between attack start and detection.
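For illustration only – this is not VergeIO’s code and the threshold is invented – the underlying idea can be sketched like this: encrypted writes do not deduplicate, so a VM whose recent writes are suddenly almost all unique blocks is worth flagging:

```python
# Simplified sketch of dedupe-based ransomware detection (not VergeIO's implementation).
UNIQUE_WRITE_ALERT_THRESHOLD = 0.9   # illustrative threshold, not a product default

def check_vm_writes(vm_name: str, blocks_written: int, blocks_unique: int, alert) -> None:
    """Flag a VM whose recent writes barely deduplicate, a possible encryption signal."""
    if blocks_written == 0:
        return
    unique_ratio = blocks_unique / blocks_written
    if unique_ratio > UNIQUE_WRITE_ALERT_THRESHOLD:
        alert(f"{vm_name}: {unique_ratio:.0%} of recent writes are unique blocks - "
              "possible encryption in progress; consider blocking the VM and "
              "restoring from the latest clean clone")

# e.g. check_vm_writes("tenant-db-01", blocks_written=10_000, blocks_unique=9_800, alert=print)
```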

VergeIO ransomware protection

When IOfortify detects an attack, it issues an immediate alert giving the customer an opportunity “to act fast to prevent it and activate our rapid restoration services.” VMs issuing the file encryption IOs can be blocked. VergeIO’s software indicates which clone is the best candidate for rapid recovery with minimal data loss.

VergeIO ransomware protection

VergeIO says clones, “IOclones” in its parlance, are similar to snapshots in that they include metadata, since global inline deduplication is part of the metadata in VergeOS. It can create space-efficient clones of VMs, volumes, or entire virtual datacenters in milliseconds regardless of capacity, we’re told. They can be used to restore VMs or complete virtual datacenters.

It claims that IOfortify enables customers “to get back on track in a matter of moments” and “ensures data remains safe and ensures complete recovery of any infected virtual machines.”

IOfortify is integrated into VergeOS and is available now at no additional charge to VergeOS customers.