Panasas has halved the size of its ActiveStor Ultra parallel file system node to make a lower-cost edge datacenter box.
The ActiveStor range runs scale-out PanFS, with NFS and SMB/CIFS protocol support, and optimized performance to match workflow needs. The OS protects against drive, node, and site failures. PanView and PanMove services provide data visibility and movement capabilities.
Tom Shea, Panasas CEO, said: “Our customers require storage infrastructures that can dynamically adapt to a changing mix of both AI/ML and traditional HPC workflows, but also have the capacity to support workflows across core data centers, remote satellite locations, and into the cloud.”
ActiveStor Ultra Edge 100 enclosure with 12 drive slots
“The reduced footprint of ActiveStor Ultra Edge delivers the required performance, reliability, and ease of management that Panasas is known for, making it ideal for smaller and remote HPC datacenters.”
The Ultra Edge 100 system comes in a 2RU box and our table shows how it stacks up against the other ActiveStor systems:
There is one physical enclosure that contains four nodes and two redundant 2200W titanium-level power supplies. There are two chassis versions. A Smart enclosure has one director node and three storage nodes. An Expansion enclosure has four storage nodes. The director node processes file system metadata, coordinates the activities of the storage nodes and Direct-Flow software drivers for file access, manages membership and status within the storage cluster, and controls all failure recovery and data reliability operations. Both node types contain servers with one M.2 NVMe SSD, and run the PanFS file system. The storage nodes use 8TB or 16TB disk drives.
The system then comes in two standard configs, Minimum and Base, varying in storage capacity:
Panasas claims the system is simple to operate, needing part-time attention from a single person, and costs less than its other products. View it as an HPC storage appliance for smaller-scale workloads in the genomics, cryo-EM, and decentralized instrumentation areas.
We view Panasas, at least in the enterprise AI/ML market, as being positioned with smaller systems than DDN’s Lustre-using ExaScaler systems, although there is a fair degree of overlap. Panasas has indicated it did not favor QLC flash in the past, preferring disk. DDN has recently adopted QLC flash drives and Panasas may be evaluating them too. Generative AI workloads may be so valuable to customers that they’ll pay for QLC flash over disk.
You can access an Ultra Edge 100 datasheet here and a solution brief here.
Analysis: A deeper look into Dell’s APEX announcements in Vegas this week reveals a deal with Databricks for Dell storage to feed its lakehouse, the use of a stopgap block service, and both PowerScale file and PowerProtect being migrated to AWS.
Making Dell’s file and block storage available in the public cloud is the goal of Project Alpine, announced a year ago. This was part of its APEX Multi-Cloud Data Services initiative. Dell wants to provide a consistent public cloud-to-on-premises experience with PowerStore (file and block), PowerFlex (HCI block), PowerScale and/or ObjectScale (file and object).
At Dell Technologies World in May last year, Dell put its PowerProtect software into the AWS and Azure clouds in the form of PowerProtect Cyber Recovery, and a preview showed Dell storage feeding data into Snowflake’s data warehouse. A demo showed PowerScale’s Isilon software operating in AWS.
Later, in November that year, Dell said its PowerFlex hyperconverged infrastructure system was available on the AWS marketplace as part of the APEX program. PowerFlex is like VxRail but doesn’t use the vSphere hypervisor. It scales out to thousands of nodes and originated as a ScaleIO upgrade and rebrand. PowerFlex provided block storage in AWS using Elastic Block Store (EBS) or the EC2 Instance Store.
Now, a year later, Dell has announced that APEX Storage for Public Cloud, with block and file storage, is available in AWS and will debut in Azure later this year.
The newly announced Databricks partnership follows on from the earlier Snowflake deal, and APEX Protection Storage for Public Cloud exists as well.
APEX File Storage for AWS is based on PowerScale’s OneFS, the Isilon scale-out filesystem software, but APEX Block Storage for AWS and Azure is not based on block storage software from its PowerStore or PowerMax arrays. PowerStore’s operating system provides unified file and block storage and, like the PowerMax OS, is not a scale-out system.
Instead the scale-out PowerFlex system software is used to provide block storage. The public cloud scales out by adding instances and PowerFlex matches that capability. This means that PowerFlex customers will see a consistent environment across the on-premises, AWS and Azure public clouds, as will PowerScale customers across their on-premises and AWS cloud, but PowerStore block and file storage customers will not, nor will PowerMax customers. PowerFlex is being used as a kind of stopgap. We will have to wait longer for Dell’s mainstream PowerStore software to be available in the public cloud. NetApp still has its Data Fabric lead in this respect. Pure Storage also has its Cloud Block Store. HPE with its GreenLake storage has yet to match these moves.
Cloudy object, PowerProtect and Databricks
APEX Protection Storage for Public Cloud is PowerProtect DDVE (DataDomain Virtual Edition), building on what Dell showed a year ago. This has been ported to four public clouds, supporting AWS, Azure, Google Cloud and Alibaba Cloud. Dell says more than 17 exabytes of data is protected by its software in public clouds to date.
Will there be an APEX Object Storage for public cloud? A Dell spokesperson told us: “Yes. We plan to offer object storage for the public cloud in the future. Stay tuned.” We may then see Dell object storage SW in AWS and Azure built on top of S3 and Blobs respectively.
With the Databricks deal, a Databricks Lakehouse in the public cloud can use Dell ECS object storage, on-premises or in a co-lo, to analyze data in-place, store results and share it with third parties using Databricks’ Delta Sharing capabilities. Dell says it is collaborating with Databricks to jointly engineer additional integrations that will deliver a seamless experience for Dell object storage within the Databricks Lakehouse Platform. We dare say ChatGPT-like large language models, like Databricks’ Dolly chatbot, may be considered as part of this.
On the second day of Dell Technologies World, the Texan tech titan announced projects looking at chatbots, Zero Trust security, edge kit and more.
Dell wants enterprise customers to extend their use of its hardware, software, and services as they adopt AI chatbots, move deeper into zero trust security, and develop their edge site facilities at scale.
Project Helix
Project Helix is a joint offering with Nvidia that involves validated blueprints to build on-premises generative AI systems. These will use Dell and Nvidia hardware and software with the aim of improving enterprise search, market intelligence, customer services, and other activities with chatbot front ends.
Dell COO and vice chairman Jeff Clarke said: “Project Helix gives enterprises purpose-built AI models to more quickly and securely gain value from the immense amounts of data underused today.” Enterprise customers can, he said, use them to reinvent their industries whilst maintaining data privacy.
Dell and Nvidia product components include the PowerEdge XE9680 and R760xa servers, Nvidia H100 Tensor Core GPUs, networking and AI Enterprise software, Dell PowerScale arrays and ECS Enterprise Object Storage plus CloudIQ observability.
AI Enterprise comprises around 100 frameworks, pre-trained models and development tools like the NeMo large language model framework and NeMo Guardrails software for building secure chatbots.
Project Helix will support a generative AI project lifecycle, we’re told, from infrastructure provisioning, modeling, training, fine-tuning, application development and deployment, to deploying inference and streamlining results. The on-premises presence, Dell claims, reduces inherent risk and helps companies meet regulatory requirements.
The validated designs developed through Project Helix will be available in July through traditional channels and APEX consumption options.
Project Fort Zero
Rubrik and others have promoted Zero Trust tech for over a year. Project Fort Zero is Dell’s equivalent, and the company plans to deliver a validated offering within the next 12 months, building on its existing Zero Trust Center of Excellence.
Herb Kelsey, Federal CTO at Dell, said: “Zero Trust is designed for decentralized environments, but integrating it across hundreds of point products from dozens of vendors is complex – making it out of reach for most organizations. We’re helping global organizations solve today’s security challenges by easing integration and accelerating adoption of Zero Trust.”
Dell will use a private cloud approach and handle the technology integration and orchestration that it says typically falls to customers and involves several vendors. Dell expects Project Fort Zero to be used to secure datacenters, remote and branch office sites, mobile edge sites with transitory connectivity, road vehicles, planes and more.
A US government assessment team will evaluate the Project Fort Zero end-to-end offering for accreditation and certify compliance against the Department of Defense Zero Trust reference architecture. Customers can then build their own Zero Trust defenses using a validated Dell blueprint.
Upping the edge ante
Dell has announced five edge site offerings: a generic NativeEdge software framework plus specific products covering retail warehouse automation, private wireless, networking, and deployment services.
COO Clarke claimed: “As our customers look to fuel new workloads and AI at the edge, they are turning to Dell to find simpler and more effective ways to manage and secure their ecosystem of edge technologies and applications.”
NativeEdge is described as an edge operations software platform helping customers deploy and manage edge site hardware and software at scale, with zero touch deployment. The software is said to cover any enterprise edge use case and includes Dell products plus third-party ones as well in an open systems approach.
Dell claims a 25-site manufacturing organization could get a 130 percent ROI with a three-year NativeEdge deployment. It says customers across various industries could get similar returns.
The other four offerings are:
Dell Validated Design for Retail Edge with inVia Robotics intelligent automation. This design uses software and automation to help retail employees with last-mile picking, packing, shipping and delivery by converting existing warehouse and retail space into micro-fulfillment centers.
Dell Private Wireless with Airspan and Druid is a validated private wireless setup providing wireless connectivity for thousands of remote location edge technologies, like devices and sensors.
Enterprise SONiC Distribution by Dell Technologies 4.1 is a scalable, open source-based networking operating system on Dell switches for edge deployments, including User Container Support (UCS) and streaming telemetry.
Dell ProDeploy Flex is an edge-focused modular deployment service.
NativeEdge software will be available to customers, OEMs, and partners in 50 countries beginning August 2023. Dell Validated Design for Retail Edge will be available globally in August 2023. Dell Private Wireless with Airspan and Druid will be available globally in June 2023. Enterprise SONiC Distribution by Dell Technologies 4.1 is available globally from today. Dell ProDeploy Flex will be available globally in August 2023.
Don’t forget services
A Product Success Accelerator (PSX) for Backup service comes after the recent PSX for Cyber Recovery, which implements and operationalizes an isolated cyber recovery vault. Customers can use the PSX for Backup service to help them protect and recover data in the event of disruption.
It has three components:
Ready – includes planning workshops, configuration of a validated backup or vault environment, a success plan, a runbook and outcome-based skills training.
Optimize – adds quarterly assessments, improvement recommendations and assisted restore test simulations.
Operate – adds ongoing operational assistance to meet the solution’s performance objectives. Dell experts monitor and investigate alerts, initiate corrective actions and help with restore tasks at the customer’s direction.
Availability
Project Fort Zero’s offering will be available in the next 12 months. Dell Product Success Accelerator for Backup is now available in locations across North America, Europe, and Asia Pacific. Availability of Dell Product Success Accelerator for Cyber Recovery has expanded to locations in Europe, Latin America, and Asia Pacific in addition to North America.
SpectraLogic and Arcitecta are partnering to enter the high-performance scale-out NAS market with the former’s all-flash BlackPearl storage system and the latter’s Mediaflux software.
BlackPearl is SpectraLogic’s hybrid flash/disk, front-end cache and gateway that can be used to store files as objects on back-end tape devices. It also comes in all-flash BP-X form for faster response to file and object I/O requests. Spectra and Arcitecta have developed their partnership and produced two products: Arcitecta Mediaflux + Spectra BlackPearl NAS, and Arcitecta Mediaflux + Spectra BlackPearl Object Storage system.
SpectraLogic CTO Matt Starr said: “The combined solutions offer unprecedented high performance, scalability, security, efficiency and significant cost savings … Customers can cost-effectively manage massive data volumes in powerful new ways – a game-changer in the data storage industry.”
Jason Lohrey, Arcitecta CEO and founder, said: “We developed our joint solutions’ NFS and SMB protocols in-house to achieve the level of computing performance organizations require to process enormous amounts of data. Mediaflux plus BlackPearl NAS can support hundreds of billions of files in a single, easily managed global namespace. Multiple storage nodes work in parallel, allowing the system to handle vast I/O request volumes, increase throughput and reduce latency.”
The multiple nodes provide high availability with node failover. A distributed file system, with data stored across multiple nodes, allows data to be accessed from any node in the cluster, increasing data availability.
For the archiving config, replace the NFS and SMB tags with S3 ones
The combined offerings provide automatic tiering with data lifecycle management, easy administration, and on-premises S3 object storage with infinite scalability and easy search and access control. By storing data on-premises, the joint product reduces the latency associated with accessing data stored in the cloud. Infrequently accessed data can be stored on lower-cost media such as tape or object storage – while leaving active data on more expensive, high-performance flash storage. That should lower cost.
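To make the lifecycle idea concrete, here is a minimal sketch of an age-based tiering decision in Python. The tier names and the 90-day threshold are illustrative assumptions, not Mediaflux or BlackPearl policy settings.

```python
# Age-based tiering decision, illustrating the lifecycle idea above.
# Tier names and the 90-day threshold are illustrative assumptions,
# not Mediaflux or BlackPearl policy settings.
from datetime import datetime, timedelta
from typing import Optional

def choose_tier(last_accessed: datetime,
                now: Optional[datetime] = None,
                cold_after_days: int = 90) -> str:
    """Return 'flash' for active data, 'tape_or_object' for cold data."""
    now = now or datetime.utcnow()
    age = now - last_accessed
    return "flash" if age < timedelta(days=cold_after_days) else "tape_or_object"

print(choose_tier(datetime.utcnow() - timedelta(days=10)))    # flash
print(choose_tier(datetime.utcnow() - timedelta(days=400)))   # tape_or_object
```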
The joint products have ransomware resiliency and security with multiple layers of protection, such as intrusion detection and prevention, and real-time monitoring and response capabilities to detect and prevent unauthorized access or activity.
A SpectraLogic spokesperson tells us that the Arcitecta/BlackPearl combo is an “order of magnitude cheaper than Isilon (PowerScale)” and mentioned “about $30,000/PB.”
They added: “For BlackPearl a small scale NAS disk unit will be about $120–150 per terabyte with five years of service, or about $0.004 per gig per month. At scale, 30-plus PB, with a large portion going off to tape archive, [it will be] under $50 a terabyte with five years of service or $0.00071 per gig per month.”
The archiving is claimed to be more affordable than Amazon Glacier.
Comment
This will be a natural extension for existing Spectra and Arcitecta customers. How it will fare in a straight fight against Dell PowerScale and Qumulo remains to be seen. They have substantial scale-out NAS market presence and their existing customers are unlikely to look at an alternative unless there is a compelling reason – such as a substantially lower price or need for specific Arcitecta and/or Spectra features.
VAST Data also has skin in the game, but is probably aiming at a higher-scale customer. HPE may not be so picky, though. It’s going to be an interesting contest.
Arcitecta Mediaflux + Spectra BlackPearl NAS is available from channel partners as a turnkey offering. Get a NAS backgrounder for the combo here and an archive (object) backgrounder here.
DRAM virtualizer MemVerge has teamed up with SK hynix on what they’re calling Project Endless Memory to run applications needing more memory than is actually attached to the server by using external memory resources accessed by CXL.
This is the Compute Express Link, the extension of the PCIe bus outside a server’s chassis, based on the PCIe 5.0 standard. CXL v2.0 provides memory pooling with cache coherency between a CXL memory device and several accessing server or other host systems. SK hynix has built CXL-connected DRAM products and MemVerge provides a software bridge to use them in the so-called Endless Memory project.
Charles Fan, CEO and co-founder of MemVerge, said in a statement: “The ability to download RAM automatically and intelligently will lead to improved productivity for users of data-intensive applications.”
Endless Memory combines MemVerge elastic memory service software and CXL-based Niagara Pooled Memory System hardware from SK hynix enabling host servers to dynamically allocate memory as needed to avoid running out of DRAM. Basically, if an app needs more DRAM than is physically present, the server can use MemVerge’s software to request more to be made available across CXL.
Hoshik Kim, VP/Fellow for Memory Forest R&D at SK hynix, said: “Testing shows that just 20 percent extra CXL memory from our Niagara Pooled Memory System can improve application performance by 3x compared with the existing swap memory approach.” The swap memory approach is basically paging between physical memory and some kind of storage.
SK hynix’s Jonghoon Oh explains Memory Forest
Memory Forest is an SK hynix concept. “Mem4EST” means memory for environment, society, and technology. Jonghoon Oh, EVP and CMO at SK hynix, described it like this: “Memory Forest represents how different layers of memory form a computing architecture, just like a forest, where everything from the bottom like the ground, grass to dense trees, fulfill their respective roles to form a huge ecosystem. Forests are generally, from bottom to top, divided into forest floor, understory, canopy layer, and emergent layer. These layers are naturally comparable to those of the computing architecture such as SSD (Solid State Drive), SCM (Storage-Class Memory), high-capacity memory (Capacity Memory), main memory and IPM (In Package Memory).”
Basically, SK hynix’s CXL expander memory hardware is included in this MemVerge initiative.
Project Gismo
Gismo stands for Global IO-free Shared Memory Objects and is a CXL-based multi-server shared memory architecture for data access and collaboration in distributed environments. It enables real-time data sharing, using memory, across multiple servers, eliminating the need for network IO and so reducing data transfer delays.
B&F diagram
Fan said: “Network and storage IO are performance bottlenecks for distributed, data-intensive applications. Project Gismo leverages the emerging CXL platform and will effectively remove this bottleneck known as the IO Wall. We envision a memory-centric future where memory is the network and memory is the storage.”
He suggests example use cases could be AI/ML applications, next-generation databases, and financial trading platforms.
One of the early adopters of Project Gismo is Timeplus, which is developing a real-time streaming database. Timeplus CEO Ting Wang said: “With Gismo’s revolutionary CXL-based multi-server shared memory architecture, we have experienced a remarkable improvement in the fault-tolerance of our database system. The speed of streaming query fail-over has been accelerated by an impressive 20x, allowing us to provide unparalleled user experience for the continuation of data streaming processing.”
MemVerge did not provide a timescale for Gismo completion.
Demonstrations of Endless Memory are being presented at the International Supercomputing Conference (ISC) in MemVerge’s booth, D401.
Cohesity has unveiled a boosted Google Cloud partnership for large language model (LLM) access, added to its own AI technologies to improve LLM execution, and detailed an expanded Data Security Alliance.
The three-pronged announcement came at Cohesity’s three-day Catalyst virtual conference. The Google Cloud Platform (GCP) element is focused on using generative AI to investigate an organization’s entire data contents better by integrating Cohesity’s Data Cloud offering with GCP’s Vertex AI.
Sanjay Poonen
Cohesity CEO and president Sanjay Poonen said: “To apply generative AI transformatively, businesses need to be able to easily get rapid insights from their data utilizing cutting-edge and leading AI/ML models.”
Google, which announced its PaLM 2 LLM at Google I/O earlier this month, can help supply them.
Google Cloud CEO Thomas Kurian said: “Vertex AI is one of the best platforms for building, deploying, managing and scaling ML models – and we’re excited that Cohesity is joining our growing open ecosystem to help more customers get value from their data via AI. Cohesity’s excellent data security and management capabilities, combined with Google Cloud’s powerful generative AI and analytics capabilities, will help customers get exceptional insights into their backup and archived data.”
GCP’s Vertex AI Workbench is a set of services for creating AI/ML models. It’s a platform or framework for data scientists to deploy machine learning models, ingest data, and analyze the results through a dashboard. Users with different AI/ML experience levels have access to a common toolset across data analytics, data science, and machine learning.
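For readers unfamiliar with Vertex AI, the following is a minimal sketch of calling a Vertex AI text model from Google’s Python SDK. The project ID, region, model name, and prompt are placeholders, and this is not the Cohesity integration itself.

```python
# Minimal Vertex AI text generation call (sketch only).
# Project, region, model name and prompt are placeholders.
import vertexai
from vertexai.language_models import TextGenerationModel

vertexai.init(project="my-gcp-project", location="us-central1")

model = TextGenerationModel.from_pretrained("text-bison@001")
response = model.predict(
    "Summarize last night's backup job failures in plain English.",
    temperature=0.2,
    max_output_tokens=256,
)
print(response.text)
```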
Cohesity and Google reckon that their joint customers will be able to get “human readable” insights into the data they’re securing and managing on Cohesity’s platform. Poonen claimed Cohesity “provides phenomenal search via our built-in indexing capabilities,” and has robust security protocols to keep the data private. Chatbots will provide a simpler way to search through Cohesity content.
Cohesity Turing and RAG
Turing is Cohesity’s own set of non-generative AI/ML capabilities and technologies that are integrated into its Data Cloud. They include:
Ransomware anomaly detection: Uses modeling and data entropy detection to “see” anomalies in data ingested, which can provide early warnings of a hidden threat (a generic entropy sketch follows this list).
Threat intelligence: Provides curated and managed threat feeds used in conjunction with machine learning models to detect threats.
Data classification: Helps ensure that organizations can identify their most sensitive data and its location.
Predictive capacity planning: Forecasts capacity utilization based on historical usage and includes a what-if simulator.
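As a generic illustration of the entropy idea behind the ransomware anomaly detection above (and not Cohesity’s implementation): encrypted data is close to statistically random, so a sharp rise in the Shannon entropy of ingested blocks is a warning sign.

```python
# Generic entropy-based check for encrypted-looking data (illustrative only).
import math
import os
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte, from 0.0 (uniform) to 8.0 (random)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in Counter(data).values())

def looks_encrypted(block: bytes, threshold: float = 7.5) -> bool:
    # Encrypted or compressed data approaches 8 bits/byte; most documents,
    # databases and VM images sit well below that.
    return shannon_entropy(block) >= threshold

print(looks_encrypted(b"quarterly sales figures " * 1000))  # False - low entropy
print(looks_encrypted(os.urandom(16_384)))                  # True - near-random
```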
Cohesity is adding Retrieval Augmented Generation (RAG) AI model workflows to Turing, which it says can help customers get deeper insights and discovery from data or find content in petabytes of data faster. It has filed a patent application in this area. A Cohesity blog says RAG enables LLMs to generate more knowledgeable, diverse, and relevant responses and is a more efficient approach to fine-tuning such models.
Cohesity is not developing its own LLM. Instead it wants to make LLMs more efficient when looking into Cohesity datasets. Think faster, better, super-charged search. Poonen has talked to Microsoft about this. A video helps set the RAG scene:
The blog says: “The retrieval-augmented response generation platform under development by Cohesity accepts a user or machine driven input – such as a question, or a query. That input is then tokenized with some keywords extracted that are used to filter the petabytes of an enterprise’s backup data to filter down to a smaller subset of data. It then selects representations from within those documents or objects that are most relevant to the user or machine query. That result is packaged, along with the original query, to the Language Model (such as GPT4) to provide a context-aware answer. This innovative approach ensures that the generated responses are not only knowledgeable but also diverse and relevant to the enterprise’s domain-specific content.”
Specifically: “By using RAG on top of an enterprise’s own dataset, a customer will not need to perform costly fine-tuning or initial training to teach the Language Models ‘what’ to say. … Leveraging RAG always provides the most recent and relevant context to any query.”
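As a rough illustration of the retrieval-augmented flow described above, here is a minimal sketch. The keyword extraction, scoring and corpus are toy stand-ins, and the packaged prompt would then be sent to a language model such as GPT-4; none of this is Cohesity’s code.

```python
# Toy retrieval-augmented generation flow (illustrative, not Cohesity code).
from dataclasses import dataclass
from typing import List, Set

@dataclass
class Document:
    doc_id: str
    text: str

def extract_keywords(query: str) -> Set[str]:
    # Toy tokenizer; real systems use stemming, embeddings or both.
    return {word.lower().strip("?.,") for word in query.split() if len(word) > 3}

def retrieve(query: str, corpus: List[Document], top_k: int = 3) -> List[Document]:
    keywords = extract_keywords(query)
    scored = [(sum(k in doc.text.lower() for k in keywords), doc) for doc in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]

def build_prompt(query: str, passages: List[Document]) -> str:
    context = "\n".join(f"[{doc.doc_id}] {doc.text}" for doc in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}\nAnswer:"

corpus = [
    Document("backup-policy", "Backups of the finance VMs run nightly at 01:00 UTC."),
    Document("retention", "Snapshots are retained for 35 days before expiry."),
]
query = "How long are snapshots retained?"
prompt = build_prompt(query, retrieve(query, corpus))
print(prompt)  # This packaged prompt would then go to an LLM such as GPT-4.
```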
Cohesity RAG should be available in the near future, but no specific date was given.
ServiceNow: ServiceNow Security Operations provides closed-loop detection and response for ransomware attacks via SOAR (Security Orchestration, Automation, and Response) integration to create workflows through their IT service management (ITSM) offering. Customers can assess and address threats based on their potential impact to the business.
Tenable: The updated Tenable integration provides improved scalability so that snapshots can be scanned rapidly and improved vulnerability scanning used proactively, as part of cyber resilience best practices. Tenable powers Cohesity’s CyberScan capability.
Kioxia has replaced its gumstick client SSD with a new model with twice the capacity and double the random write speed of its predecessor.
The BG5 with its 1TB maximum capacity and PCIe gen 4 interface has been replaced by the BG6 with the same interface but a 2TB capacity. Like the BG5 it is an M.2 format drive in either 2230 or single-sided 2280 format and uses TLC (3 bits per cell) flash with a host-managed DRAM buffer. The BG5 used 112-layer 3D NAND, Kioxia’s BiCS 5 generation, but the new BG6 uses 162-layer BiCS 6 flash.
Neville Ichhaporia, Senior Veep and GM of Kioxia America’s SSD business unit, said: “Our new Kioxia BG6 SSDs deliver increased performance and density in a small footprint, making them well-suited to today’s ‘work and play from anywhere’ lifestyle.”
The BG6 has across-the-board performance improvements over the BG5, both random and sequential:
The random write IOPS have doubled from the BG5’s 450,000 to the BG6’s 900,000. Other speeds have increased as well but not by so much. The 2TB version goes faster than the 1TB version and we can expect the 256GB and 512GB drives to be slower again when they become available.
Kioxia chart
Kioxia has not supplied any endurance numbers for the BG6. We asked for them and a spokesperson said Kioxia “will release the full specification in July”. It says the new drive has:
Support for the latest TCG Pyrite and Opal standards.
Power Loss Notification signal support to protect data against forced shutdowns.
Sideband signal (PERST#, CLKREQ# and PLN#) support for both 1.8V and 3.3V.
Support for the platform firmware (FW) recovery feature.
Support for the NVMe technology 1.4c feature set and basic management command over System Management Bus (SMBus), enabling tighter thermal management.
Having the drive use a DRAM buffer in the host makes it cheaper to manufacture. The BG6 does not have 256GB and 512GB capacities available yet, as they are still under development. All in all, the BG6 looks a straightforward improvement on the BG5, thanks to the extra layers in the NAND and, no doubt, updated firmware in the controller.
Kioxia hasn’t announced pricing information yet but will have the BG6 series hardware on display at Dell Technologies World in Las Vegas this week. The series starts sampling in the second half of 2023, when OEM customers can give it the once-over.
Bootnote. This writer’s MacBook Air with its 256GB flash drive looks positively anaemic compared to a 2TB M.2 SSD. Having 2TB available would be great. And if Kioxia populated the other side of its M.2 2280 variant we could envisage a 4TB drive. That’s serious storage for a notebook.
Data protector Acronis announced general availability of Acronis Advanced Security + Endpoint Detection & Response (EDR) for Acronis Cyber Protect Cloud, with new capabilities such as AI-based attack analysis. EDR is designed for MSPs.
…
Open source data integration platform supplier Airbyte has launched a no-code connector builder that makes it possible to create new connectors for data integrations. The builder enables non-engineers such as data analysts to create an extract, load, transform (ELT) connector within five minutes – a process that traditionally could take more than a week.
…
Microsoft is publicly previewing Azure Container Storage. It provides a consistent experience across different types of storage offerings, including Managed options (backed by Azure Elastic SAN), Azure Disks, and ephemeral disk on container services. You can create and manage block storage volumes for production-scale stateful container applications and run them on Kubernetes. It’s optimized to enhance the performance of stateful workloads on Azure Kubernetes Service (AKS) clusters by accelerating the deployment of stateful containers with persistent volumes and improving quality with reduced pod failover time through fast attach/detach. Details in a blog.
…
CData announced that CData Sync is available on the SAP Store, allowing CData to deliver data integration across the data sources and SAP databases that organizations use.
Brantley Coile
…
Brantley Coile, the founder and CEO of Etherdrive SAN company Coraid, has shaved off his beard. He says he’s reinventing Coraid, now The Brantley Coile Company, as well.
…
External data integrator Crux has announced its Crux External Data Platform (“EDP”) SaaS offering to automate the onboarding of any external dataset directly from vendors into a customer’s store. The self-service cloud platform allows data teams to onboard and transform external data for analytics use up to 10 times faster than traditional manual methods. External data from governments, non-profits, and commercial data vendors is a critical business resource in many sectors such as finance, supply chain, retail, healthcare, and insurance. Crux has partnerships with over 265 leading data providers including MSCI, Moody’s, S&P, SIX, FactSet, and Morningstar.
…
File migrator and manager Datadobi has a blog about not forgetting stale data in WORM storage. It will need deleting eventually. Read the blog here.
…
data.world has announced the introduction of the data.world Data Catalog Platform with generative AI-powered capabilities for improving data discovery. data.world is the industry’s most popular data catalog with more than two million users, including enterprise customers with tens of thousands of active users.
…
Flipside has said its Flipside Shares offering is available on Snowflake Marketplace and provides joint customers with access to modeled and curated blockchain data sets, without the hassle of managing nodes, complex data pipelines, or costly data storage. Flipside provides access to the greatest number of blockchains and protocols in Web3, including Ethereum, Solana, Flow, Near, Axelar, and more than a dozen others.
…
IBM has acquired Israeli company Polar Security whose agentless product allows customers to discover, continuously monitor and secure cloud and software-as-a-service (SaaS) application data, and addresses a shadow data problem. Polar Security is a pioneer of data security posture management (DSPM) – an emerging cybersecurity segment that reveals where sensitive data is stored, who has access to it, how it’s used, and identifies vulnerabilities with the underlying security posture, including with policies, configurations, or data usage.
…
GPU-powered RAID card startup GRAID has signed up Trenton Systems as a partner. US-based Trenton Systems is a designer and manufacturer of ruggedized, cybersecure, made-in-USA computer systems for defense, aerospace, test and measurement, industrial automation, and other major industries.
…
Data manager and lifecycle organizer Komprise has new governance and self-service capabilities that simplify departmental use of its Deep Analytics – a query-based way to find and tag file and object data across hybrid cloud storage silos. It’s providing share-based access for groups, a new directory explorer, and exclusion query filters in file index search. Komprise says its latest release makes it dramatically easier for teams to find and manage their own data, while simplifying governance for IT.
…
Micron plans to install a 1gamma DRAM manufacturing line at its fab in Hiroshima, Japan, according to a Nikkei Asia report. This is part of an up to $3.6 billion (500 billion yen) investment program in Japan and will involve extreme ultraviolet (EUV) lithography, which will also be used in its Taiwan DRAM fab.
…
N-able CEO John Pagliuca has signed the CEO Action for Diversity & Inclusion Pledge, reinforcing company support and commitment towards its Diversity, Equality, and Belonging philosophy. CEO Action for Diversity & Inclusion was founded in 2017 and is the largest CEO-driven business commitment to advance diversity and inclusion in the workplace with more than 2,400 CEOs having pledged to create more inclusive cultures.
…
Pure Storage says Virgin Media O2, one of the UK’s largest entertainment and telecommunications operators, is a customer, using its portfolio – including FlashArray//X and Evergreen//Forever – to support its 47 million connections.
…
Security-focused data protector Rubrik has added user intelligence capabilities that utilize time series data recorded over consistent intervals in Rubrik Security Cloud to proactively mitigate cyber risks before they can be exploited. Customers will have visibility of the types of sensitive data they have, which users have access to the data, how that access has changed over time, and whether that access may pose any risk to their business.
…
Samsung has announced development of the industry’s first 128-gigabyte DRAM to support Compute Express Link (CXL) 2.0. It worked with Intel and its Xeon CPU to do so. The new CXL DRAM supports the PCIe 5.0 interface (x8 lanes) and provides bandwidth of up to 35GBps. CXL 2.0 supports memory pooling – a memory management technique that binds multiple CXL memory blocks on a server platform to form a pool, and enables hosts to dynamically allocate memory from the pool as needed. Samsung will mass-produce the product later this year.
…
SingleStore has launched SingleStore Kai for MongoDB, an API that turbocharges (100-1,000x) real-time analytics on JSON and vector-based similarity searches for MongoDB-based AI applications, without the need for any query changes or data transformations. The new API is MongoDB wire protocol compatible, and enables developers to power interactive applications with analytics with SingleStoreDB using the same MongoDB commands. It is available at no extra cost and is open for public preview as part of the SingleStoreDB Cloud offering. SingleStore is also introducing replication (in private preview) that can replicate MongoDB collections into SingleStoreDB.
…
The Storage Management Initiative’s SNIA Swordfish v1.2.5 is out for public review. This new bundle provides a unified approach to manage storage and servers in hyperscale and cloud infrastructure environments, making it easier for IT administrators to integrate scalable solutions into their datacenters. This new version provides:
Expanded support for profile and mapping in the Swordfish NVMe Model Overview and Mapping Guide
New use cases and section to the Swordfish Scalable Storage Management API User’s Guide
Functionality enhancements supporting both traditional and NVMe/NVMe-oF storage
…
The Information reports that Snowflake wants to acquire search engine startup Neeva to help its customers search documents stored in Snowflake’s data warehouse. Neeva has added a large language model front end and has a vector database. Neeva seems to have shopped itself to Databricks as well.
…
StorMagic says German manufacturers Witholz GmbH and WST Präzisionstechnik have deployed StorMagic SvSAN to simplify their IT environments and reduce hardware requirements, resulting in maximum operational efficiency, high availability of data and 100 percent uptime at a lower cost.
…
Veritas has updated its Veritas Partner Force program for FY 2024. It is incentivized with improved rewards for cloud-based deals, a simplified transaction process, and new training and accreditation programs. It will also support Veritas in delivering growth on the Veritas Alta secure cloud data management platform, continuing to modernize routes to market by improving available resources to two-tier channel and managed service providers.
…
Veza says its Veza Authorization Platform is available on the Snowflake Data Cloud. Joint customers can manage access permissions and secure their sensitive data at scale. Veza’s Authorization Platform provides companies with visibility into access permissions across all enterprise systems, enabling customers to achieve least privilege for all identities, human and non-human, including service accounts.
…
Scale-out, parallel filesystem supplier WEKA has unveiled v4.2 of its Data Platform with advanced data reduction and a container storage interface (CSI) plug-in for stateful containerized workloads that can lower storage and operational costs. It also offers significant performance improvements in the cloud (6x over alternatives in Azure), providing the scale and application data protection needed to support thousands of containers for cloud-native artificial intelligence (AI) and machine learning (ML). The advanced block-variable differential compression combined with cluster-wide data deduplication delivers data reduction at scale for an estimated cost saving of up to 6x for AI/ML training models, 3-8x for exploratory data analysis, and up to 2x for bioinformatic or large-scale media and entertainment workloads like visual effects.
…
HPE’s Zerto business unit has announced a real-time encryption detection mechanism and air-gapped recovery vault features as part of Zerto 10, which includes monitoring for encryption-based anomalies. This capability monitors and reports on encryption as data streams in and can detect anomalous activity within minutes to alert users of suspicious activity. It can provide early warning of a potential ransomware attack – unlike backups, which can be up to a day old – and help pinpoint when an attack was initiated, so data can be recovered to a point seconds before it began. The Cyber Resilience Vault provides the ultimate layer of protection allowing for clean copy recovery from an air-gapped setup if a replication target is also breached.
…
Zerto also announced the launch of Zerto 10 for Azure, delivering disaster recovery and mobility. It delivers a new replication architecture for scale-out efficiency and native protection of Azure Virtual Machines with support for multi-disk consistency for VMs in Azure. It’s available in the Azure Marketplace.
DDN has unveiled an upgraded AI400 X2 ExaScaler array for AI and machine learning storage workloads that uses QLC SSDs and adds a compression facility for higher capacity.
ExaScaler is DDN’s Lustre-based scale-out parallel file system software. QLC SSDs have a 4bits/cell format enabling the die to hold more data than a TLC (3bits/cell) arrangement at the cost of slower IO speed and shorter endurance. DDN says its new compression feature has been optimised for HPC and AI workloads.
Kurt Kuckein, DDN’s marketing VP, told us: “Over the last four or five years, we’ve seen this uptake in enterprise customer interest around DDN systems, specifically driven by these AI algorithms. And that has really taken off this year with the broad interest in generative AI, ChatGPT and others have really driven interest in our AI solutions, especially in conjunction with the Nvidia SuperPOD systems.”
Senior Veep of Products James Coomer tells us that DDN has about 48 AI400X2 arrays supporting Nvidia’s largest SuperPODs: “All the other SuperPODs in the world, the vast majority of them, are running just multiples of the same unit.”
Adding QLC flash and compression to the AI400 X2 array “delivers both the best performance, as well as really good flash capacity for customers.” This system can provide 10x more data than competing systems and use a fraction of their electrical energy, he claimed. It uses 60TB QLC drives, enabling 1.45PB capacity in a 2RU x 24-slot chassis, doubling capacity per watt compared to the 30TB SSDs available from other suppliers.
The AI400X2 QLC uses a standard AI400X2 controller (storage compute node), in a 2RU chassis. It has 732TB of TLC SSD storage and a multi-core, real-time RAID engine and controller combo that can pump out 3.5 million IOPS and 95GBps. This can have two, four or five SP2420 QLC SSD expansion trays added to it, linked across NVMe/oF and Ethernet. Each tray holds up to 2.9PB of raw QLC capacity. That’s 2.3PB usable which, after compression, becomes 4.7PB effective.
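Putting rough numbers on those figures, this sketch derives the overhead and compression ratios implied by the capacities quoted above; the ratios are inferred from the article’s figures, not DDN specifications.

```python
# Ratios implied by the figures quoted above (approximate, not DDN specs).
TB = 1.0
PB = 1000 * TB

chassis_raw = 24 * 60 * TB      # 24 slots of 60TB QLC ~= the quoted 1.45PB
tray_raw = 2.9 * PB             # per SP2420 expansion tray
tray_usable = 2.3 * PB          # after RAID/parity and spare overhead
tray_effective = 4.7 * PB       # after compression

print(f"chassis raw capacity: {chassis_raw / PB:.2f} PB")
print(f"usable-to-raw ratio:  {tray_usable / tray_raw:.0%}")        # ~79%
print(f"compression ratio:    {tray_effective / tray_usable:.1f}x") # ~2x
```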
The maximum effective QLC capacity is 11.7PB in a fully configured system. DDN claims the new array provides an up to 80 percent cost saving versus a comparable capacity TLC array, and enables apps to run faster than an NFS array. The added client-side compression uses array CPU cycles but, because the resulting dataset is smaller, the overall read and write performance is about the same.
DDN says its QLC version of the AI400X2 has a better price per flash TB than its existing TLC version which delivers better IOPS, up to 70 million, and throughput per rack. A hybrid TLC flash/disk system offers an even lower price per TB. It says it can meet datacenter AI storage budgets at three levels: either optimized for sheer performance, for price/performance, or for lower costs.
It is generally thought that the larger the model dataset used for training machine learning models, the better the result. That would encourage use of DDN’s AI400X2 QLC array. DDN also sees possibilities for it in other application areas, such as realistic 3D and immersive universes in gaming, protein and molecule creations for drug discovery, and autonomous driving.
DDN says its AI400X2 QLC system design does not need the internal switches and networks used by scale-out NAS systems, which are based around a controller chassis talking to flash JBOFs through a switched network. That helps lower its rackspace occupancy, cost and management complexity.
Coomer said: “Today’s QLC scale-out NAS systems offer low cost and high capacity, but they are extremely inefficient with IOPS, throughput and latencies, making them unusable for high-performance environments such as AI, machine learning, and real-time applications.”
Given that VAST Data has just announced SuperPOD certification for its scale-out NAS system, and says that parallel file systems are complex compared to NAS, we are going to see the two competing for the same customers. We could see customers new to AI model training, who currently use NAS rather than a parallel file system, go with VAST in preference to DDN. Existing parallel file system users could possibly find that the AI400X2 QLC slides more easily into their workflows than a NAS-based system.
The 60TB drives make a huge capacity increase possible over 30TB drives and we know Solidigm has 60TB QLC SSDs coming. Kioxia and other NAND fabricators/SSD suppliers are bound to follow suit but maybe not Micron – it’s plugging away at higher capacity TLC drives built with 232-layer technology.
DDN will ship its AI400X2 QLC systems in the June-August period.
There is quite a lot of tech terminology that’s specific to the storage industry, and if you’re like us, you might even have jogged your memory with prior articles on the Blocks & Files site to discover what terms like B-Tree, LSM-Tree or yobibyte mean when you come across them.
But that’s not the easiest way to get to the bottom of the problem in a hurry when all you want is a quick and concise description of what a tech term means. It’s not a new problem. Businesses like Gartner, HPE, IBM and Kioxia each have glossary mini-sites to fix the same problem – generally using an index page with links to individual entries which explain what a term means.
And who better to create a storage news glossary mini-site than Blocks & Files? We’re obsessed with the tech. So we’ve cooked up an index listing all the storage tech terms we could initially think of, just over 300 so far. Here’s a glimpse of what that looks like:
Each term listed is URL-linked to an explainer definition and, where we think it’s needed, a mini-article. There is a link to the glossary mini-site on the Blocks & Files homepage to provide a single point of entry:
We tried to balance the economy of a mere definition with a bit of an explanation where it seemed to be a good idea. The SR-IOV entry, for example, says SR-IOV stands for Single Root I/O Virtualization and then provides a short explanation of what this means as well.
Getting the balance right here is difficult. Our glossary is not an encyclopedia, nor does it try to emulate Wikipedia, nor be as comprehensive as HPE. It’s intended to be a reasonably quick mini-reference. Do contact us if you think something can be improved, is wrong or missing. There’s a contact webpage you can use or you can message me on Twitter at @Chris_Mellor or drop me a line on LinkedIn where I can be found as Chris Mellor.
So what is a yobibyte? It’s 1,024 zebibytes with a link to a Decimal and Binary Prefix entry to explain what that is. Exabytes, exbibytes and other binary and decimal prefixes are all covered there.
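For the curious, the prefix ladder that glossary entry describes can be laid out in a few lines of Python: decimal (SI) units step in powers of 1,000 and binary (IEC) units in powers of 1,024.

```python
# Decimal (SI) prefixes step in powers of 1,000; binary (IEC) prefixes in powers of 1,024.
DECIMAL = ["kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]
BINARY  = ["KiB", "MiB", "GiB", "TiB", "PiB", "EiB", "ZiB", "YiB"]

for i, (dec, binp) in enumerate(zip(DECIMAL, BINARY), start=1):
    print(f"1 {dec} = 10^{3 * i} bytes   |   1 {binp} = 2^{10 * i} bytes")

# A yobibyte is 1,024 zebibytes:
print(2 ** 80 == 1024 * 2 ** 70)  # True
```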
Open source Velero has recorded more than 100 million Docker pulls – making it one of the most popular Kubernetes app backup tools, says CloudCasa, which supports it.
CloudCasa is the Kubernetes backup business of Catalogic that’s heading for a spin-out. Its cloud-native product integrates with Kubernetes engines on AWS, Azure, and GCP, and can see all the K8s clusters running through these engines. Velero provides snapshot-based backup for Kubernetes stateful containers and can run in a cloud provider or on-premises. CloudCasa supports Velero and provides a paid-for support package.
COO Sathya Sankaran told us the Velero stats imply “at least a million clusters are downloading.”
He says he had a conversation with a VMware product manager at KubeCon who told him that “they estimate about one-third of all Kubernetes clusters have been touched by Velero and at some point have had Velero installed and running … it’s a very substantial market presence.”
Sankaran added: “This is already a community ecosystem, driven very strongly by what the rest of the community thinks is good or bad.”
We have asked business rivals Veeam, Pure, and Trilio what they think.
Sankaran says CloudCasa for Velero is the only Kubernetes Backup-as-a-Service offering with integration across multiple public clouds and portability between them. It offers a swathe of extra features over the base Velero provision (think Red Hat and Linux).
Sankaran says: “Velero wants to become the Kubernetes backup standard. The commercial backup products are pre-packaged… Velero wants to be a plug-in engine, useable by new storage products as well as the historic incumbents.”
The exec’s hope is that CloudCasa can overtake rivals Kasten, Portworx, and Trilio by riding what he sees as a wave of Velero adoption, particularly in the enterprise, by offering them a multi-cluster and anti-lock-in offering. K8s app protection is different from traditional backup, says Sankaran, who claims layering it on to legacy backup is the wrong approach.
Whether it’s wrong will be decided by the market, by whether enterprises agree they need special (Velero-based) protection for their K8s apps or such protection provided by their incumbent data protection supplier.
VAST Data says its all-QLC flash file storage has been certified as an Nvidia SuperPOD data store.
Nvidia’s SuperPOD houses 20 to 140 DGX A100 AI-focused GPU servers and uses its InfiniBand HDR (200Gbps) network connect. The DGX A100 features eight A100 Tensor Core GPUs, 640GB of GPU memory and dual AMD Rome 7742 CPUs in a 6RU box. It also supports BlueField-2 DPUs to accelerate IO. The box provides up to 5 petaFLOPS of AI performance, meaning 100 petaFLOPS in a SuperPOD with 20 of them.
18-rack SuperPOD
VAST CEO and co-founder Renen Hallak said: “VAST’s alliance and growing momentum with Nvidia to help customers solve their greatest AI challenges takes another big step forward today … The VAST data platform brings to market a turnkey AI datacenter solution that is enabling the future of AI.”
The VAST pitch is that its Universal Storage system brings to market the first enterprise network attached storage (NAS) system approved to support the Nvidia DGX SuperPOD.
VAST Data co-founder and CMO Jeff Denworth told us: “For years customers have not had an enterprise option for these large systems, since the AI system vendors need to adhere to a very limited set of offerings. Many were burned by other NFS platforms in the past.”
A VAST statement said: “AI and HPC workloads are no longer just for academia and research, but these are permeating every industry and the enterprise players that own and manage their own proprietary AI technologies are going to be differentiated going forward. Historically, customers building out their supercomputing infrastructure have had to make a choice around performance, capabilities, scale and simplicity.”
The company reckons its storage system provides all four attributes, and says: “We have already sold multiple SuperPODs with more in the pipeline so the market is validating/recognizing this as well.”
The Nvidia-VAST relationship dates back to 2016, VAST says, with original development of its disaggregated, shared-everything (DASE) architecture. VAST supports Nvidia’s GPUDirect storage access protocol and also its BlueField DPUs. VAST’s Ceres data enclosure includes four BlueField DPUs.
DDN has a combined SuperPOD and Lustre-based A3I storage system. Previously, NetApp certified its E-Series hardware running ThinkParQ’s BeeGFS parallel file system with Nvidia’s SuperPOD. Neither of these is an enterprise NAS system.
Back in 2020, NetApp provided a reference architecture for twinning ONTAP AI with Nvidia’s DGX A100 systems for AI and machine learning workloads. ONTAP is an enterprise NAS operating system as well as a block and object access system. Surely it must be possible to get an all-flash ONTAP system certified as a SuperPOD data store, unless ONTAP’s scalability limit of 24 clustered NAS nodes (12 HA pairs), meaning a 702.7PB maximum effective capacity with the high-end A900, proves to be a blocking restriction.