
Opinion: Online disk archives are just wrong

A question: what’s the difference between nearline disk storage and an active archive system only using disk drives? The answer is none.

The Cambridge Dictionary defines the word archive thus: “A computer file used to store electronic information or documents that you no longer need to use regularly.”

By that definition, archive data no longer needs to be stored on disk drives offering continuous access.

Active Archive Alliance

The Active Archive Alliance (AAA) organization definition of an active archive says: “Active archives enable reliable, online and cost-effective access to data throughout its life and are compatible with flash, disk, tape, or cloud as well as file, block or object storage systems. They help move data to the appropriate storage tiers to minimize cost while maintaining ease of user accessibility… Creating an active archive is a way to offload Tier 1 storage and free up valuable space on expensive primary storage and still store all of an organization’s data online.”

In other words, an active archive covers non-primary data, meaning secondary (nearline) and tertiary (offline) data with no mention of online media being restricted to caching. The AAA is saying you can have an online archive.

Its version of the four-tier storage model omits the media types from all the tiers, and contains a deep archive sub-class:

Active Archive Alliance 4-tier storage model as shown in the Storage Newsletter

This opens the door to online archives, such as products from disk drive maker and AAA member/sponsor Seagate.

Seagate Enterprise Archive Systems

Seagate describes enterprise data archives as “storage systems or platforms for storing organizational data that are rarely used or accessed, but are nevertheless important. This may include financial records, internal communications, blueprints, designs, memos, meeting notes, customer information, and other files that the organization may need later.”

Seagate adds that “early enterprise data archives were mostly paper records kept in designated storage units… More recently, organizations are moving their data archives to cloud-based solutions. Cloud-based solutions make data archives more accessible and reduce the associated costs.”

Cloud-based solutions include on-premises object storage disk-based systems using Cloudian, Scality or other object storage software with Seagate Exos disk drive enclosures, or its Lyve Cloud managed disk array service.

There is no concept here of the disks caching data in front of a library of offline tape or optical disk cartridges. Analyst Fred Moore of Horison Information Strategies has a different view.

Horison view of archive

He explains what an archive means in a “Building the Archive of the Future” paper sponsored by Quantum. Unlike a backup, which is a copy of data kept so it can be restored if the original is lost or damaged, an archive is a version of the original data from which items can be retrieved, not restored.

This definition, with the restore vs retrieval keystone, is the one used by W Curtis Preston in his Modern Data Protection book published in 2021.

Modern Data Protection by W Curtis Preston talks about archive storage

Moving data to archival storage frees up capacity on the primary storage location and takes advantage of cheaper and higher-capacity long-term storage with slower access times, such as tape or optical disk. Moore says there are two kinds of archive: an active archive composed of offline tape and online disk drives, and a longer-term or deep archive composed solely of offline storage.

An archive is also defined by its use of specific software: object storage software that scales out and geo-spreads unstructured and object data to manage and protect archival storage. It includes smart data movers, data classification and metadata capabilities.

Moore says: “A commonly stated objective for many data center managers today is that ‘if data isn’t used, it shouldn’t consume energy.‘” This clearly places tape as the greenest storage solution available. He suggests: “Between 60 and 80 percent of all data is archival and much of it is stored in the wrong place, on HDDs and totals 4.5-6ZB of stored archival data by 2025 making archive the largest classification category.” Note that thought: “Stored in the wrong place… on HDDs.”

Fred Moore’s four-tier storage diagram

His point is clear: disk storage is the wrong medium for an archive. What role then does disk play in the active archive tier? Moore says: “An active archive implementation provides faster access to archival data by using HDDs or SSDs as a cache front-end for a robotic tape library. The larger the archive becomes, the more benefit an active archive provides.”

In Moore’s view, online media, disk or NAND, is a cache in front of a tape library, not a storage archive tier in its own right. That’s quite different from the Active Archive Alliance viewpoint.

Online archives and nearline storage

The AAA’s active archive definition is confusing as it includes both online and offline media. For Moore, an archive is inherently offline.

An archive in the traditional sense should not include storage systems using constantly moving media, such as disk or tape; it uses too much electricity and archive data access needs generally don’t require continuously available access. An archive should be based on offline media only, with a front-end online cache for active archives.

To my mind there needs to be a strong distinction between offline and online archive media because the energy consumption and access characteristics are so different. Letting online disk into the same category of system as offline media is like letting a carbon-emitting fox into an environmentally green hen house. Calling a disk-based storage system an active archive is a misnomer. Such systems should be regarded as nearline object storage systems.

Some Active Archive Alliance members appear to agree. In an August 2022 blog, IBM’s Shawn Brume, Tape Evangelist and Strategist, said: “In a study conducted by IBM in 2022 that utilized publicly available data, a comparison of large-scale digital data storage deployments demonstrated that a large scale 10 petabyte Open Compute Project (OCP) Bryce Canyon HDD storage had 5.1 times greater CO2e impact than a comparable enterprise tape storage solution.”

Brume blog graphic. Tape is far more environmentally friendly than disk

“This was based on a ten-year data retention lifecycle using modern storage methodologies. The energy consumption of HDD over the life cycle along with the need to refresh the entire environment at Year 5 drives a significant portion of CO2 emissions. While the embedded carbon footprint is 93 percent lower with tape infrastructure compared to the HDD infrastructure.”

Brume goes on to include the AAA’s four-class tiered storage diagram in his blog, which distinguishes between active archives and archives, the latter containing the deep archive sub-class.

Seagate and spin-down

You could theoretically have a disk-based archive system if it used spin-down disks. This was tried by Copan with its MAID (Massive Array of Idle Disks) design back in the 2002-2009 period, and revisited by SGI in 2010. It’s not been successful, though.

Disk drive manufacturer Seagate actually produces spin-down disk systems. Its Lyve Mobile array is a “portable, rackable solution [that] easily integrates into any data management workflow. Get versatile, high-capacity and high-performance data transfers. With industry-standard AES 256-bit hardware encryption and key management in a rugged, lockable transport case.” The disk drives are not spinning when the transport case is being transported.

In theory, then, it could develop a spin-down Exos or Corvault disk enclosure, and its attempts to present itself as lowering the lifetime carbon emissions of its products would then have stronger substance.

Linbit pushes DRBD-based software as SAN replacement

Savvy Linux users can provide SAN services to physical, virtual and containerized applications without buying a SAN product, by using LINBIT open source software based on DRBD, which is included in the Linux kernel.

LINBIT provides support for DRBD in a similar way to how Red Hat supports its Linux distro, and has four downloadable products on its website: a DRBD Linux kernel driver, a DRBD Windows driver, the LINSTOR cluster-wide volume manager and, in tech preview form, LINBIT VSAN for VMware. The core product is DRBD, a Distributed Replicated Block Device for Linux.

Philipp Reisner

Unassuming LINBIT founder and CEO Philipp Reisner describes the Vienna-based company, with its €4.5 million annual revenues, as “a small money machine.” 

He was talking to an IT Press Tour in Lisbon, describing his products and their place in the Linux world.

The company is privately owned and not VC-funded. Reisner started up the company in the early Linux open-source days. The company has 35 to 40 full time employees in Europe, mostly in Vienna, with 30 other employees in the US. It works with SIOS in the Japanese market.

Its software is based on Linux kernel technology which Reisner devised as part of his 2001 diploma thesis at Vienna’s Technical University. That became DRBD, which has been included in the Linux kernel since version 2.6.33 (2010), is deployed on all major Linux distributions, and is hardware- and software-agnostic.

The basic idea seems simple now, looking back 21 years later. Everything happens in the kernel data path, reducing the number of context switches and minimizing block IO latency.

A primary compute node with local storage issues a block write to that storage. A copy of the write is sent (replicated) to a connected secondary node, written there as well, and an ack sent back to the primary; data safely replicated, job done.

Linbit supports DRBD primary and secondary consistency groups with replication between them. There can be up to 32 replicas and each may be asynchronous or synchronous. All participating machines – there can be from 3 to 1,000s of participating nodes – have full replicas.
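For readers who have not set up DRBD, a minimal two-node resource definition looks something like the sketch below. The hostnames, devices and IP addresses are placeholders rather than anything taken from LINBIT’s documentation; protocol C gives the synchronous write-acknowledge behavior described above, while protocol A makes replication asynchronous.

    resource r0 {
      net {
        protocol C;          # C = synchronous: a write completes only once the
      }                      # secondary has it on disk; A = asynchronous
      device    /dev/drbd0;  # replicated block device presented to applications
      disk      /dev/sdb1;   # local backing disk on each node (placeholder)
      meta-disk internal;
      on node-a {            # placeholder hostnames and addresses
        address 10.0.0.1:7789;
      }
      on node-b {
        address 10.0.0.2:7789;
      }
    }

Once the resource is brought up on both nodes with drbdadm up r0 and one node is promoted to primary, applications simply write to /dev/drbd0 as if it were a local disk and DRBD handles the replication underneath.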

Linbit DRBD diagram

LINBIT has developed DRBD into software-defined storage (SDS), including persistent volumes for containers, and says DRBD can be used for transaction processing with Oracle, PostgreSQL and MariaDB, virtualization with OpenStack, OpenNebula and others, and for analytics processing in data warehouses and other read-intensive workloads. LINBIT SDS customers include Intel, Porsche, Cisco, Barkman, IBM, Sphinx, Kapsch and BDO Bank.

It recently added support for persistent memory and non-volatile DIMM metadata, and improved fine-grain locking for parallel workloads. More performance optimizations are on its roadmap, as is a production release of WinDRBD – a port of the Linux DRBD code for highly available replicated disk drives to Windows. The V1 non-production release took place in the first quarter of 2021.

Other roadmap items are support for the public cloud with storage drivers for AWS EBS, Azure disks and Google Persistent Disks.

The world of LINBIT is a universe away from the aggressive code-wrangling characteristic of Silicon Valley startups. It is a tribute to single-minded concentration on a core Linux technology, and the complete opposite of VC-funded, money-focused growth aggression.

Object storage: Coldago moves Quantum up and Cloudian down in the rankings

Research firm Coldago has released its 2022 object storage map, moving Quantum into the Leaders’ area from last year’s Challengers’ ranking, and giving Cloudian a lower Execution and Capabilities score than in 2021.

The Coldago Research Map 2022 for Object Storage lists and ranks 15 vendors playing in the object storage market segment. They are, in alphabetic order: Cloudian, Cohesity, Commvault, DataCore, Dell, Hitachi Vantara, IBM, MinIO, NetApp, Pure Storage, Quantum, Scality, Spectra Logic, VAST Data and WEKA. There are seven leaders: Cloudian, Hitachi Vantara, IBM, MinIO, NetApp, Pure Storage and Quantum, in alphabetic order. 

DDN, Huawei and Fujitsu, which all appeared in Coldago’s 2021 object storage map, disappeared from the 2022 supplier rankings while WEKA is a new entrant.

We asked Coldago analyst Philippe Nicolas questions about this year’s map.

Blocks & Files:  Could you describe the basic Map concept please?

Philippe Nicolas

Philippe Nicolas: A Coldago Map has essentially two dimensions: one – Vision and Strategy – and two – Execution and Capabilities. A leader is recognized by its position to the right but also towards the top. Coldago doesn’t only compare products … but analyzes companies playing in a domain. This includes company profile, business and strategy, products, solutions and technologies. I recognize that the product part occupies a large piece of this, and our features matrix uses more than 50 criteria or attributes.

Blocks & Files: What are your data sources?

Philippe Nicolas: We have collected lots of data during this period from end-user, partner and vendor meetings, watching company and product announcements, case studies, integrations and partnerships. This information could be positive – like the number of deployments, large configurations or application integrations – or negative, like product un-installations, data integrity issues and sales behaviors. Of course, negative info could have serious impacts. All this data is then digested, segmented, grouped and normalized to make comparison and ranking easier.

Blocks & Files: Talk us through some of the vendor highlights.

Philippe Nicolas: Pure Storage and VAST Data had an excellent year in many aspects: product releases and features, deployments, revenues and market penetration. MinIO and Cloudian are leaders like last year, even if their positions changed a little based on their last 12 months’ operations.

Scality is still a serious challenger, and this reflects its last 12 months’ activity. Relatively speaking, Scality has a better vertical position than Cloudian, but it belongs in the challenger partition.

For MinIO, the company continues to confirm its ubiquitous presence with tons of integrations and wide adoption, and it has moved up vertically. Again, Cloudian is a leader, which is not the case for DataCore and Cohesity. DataCore did a better job than Caringo – which it acquired in January 2021.

Cohesity clearly had some positive aspects on the execution side, with strong channels and a comprehensive product with multiple deployment flavors. WEKA is added this year, entering for the first time, as it provides an S3 interface like all the others, but it belongs in the specialists’ section. We’ll carefully monitor WEKA’s S3 activity in 2023.

Blocks & Files: Any comments about the other suppliers?

Philippe Nicolas: Many of the historical players missed some opportunities and clearly lost some visibility, with the exceptions of Hitachi Vantara and MinIO. Also, Quantum has made an interesting move with a product established for years and several developments; it now belongs in the leaders’ section, thanks to improved activity in many aspects during the past 12 months. NetApp and IBM, Red Hat included, also occupy leaders’ positions.

Blocks & Files: What has changed in the object storage market since last year?

Philippe Nicolas: What is important to realize is the shift in the market over the past few years. Being internally an object store, with the few characteristics we know like a flat address space… is no longer so important. In other words, the Map is not about object storage purists but more about the market reality. On the users’ side, they care about an S3 interface with a high degree of compatibility with the Amazon API, but less about other aspects. They’re looking for ease of deployment, maintenance and support, scalability, capacity and a few other core features of course, especially in the data protection area.

Storage news ticker – 3 January 2023

Storage news

Acronis‘ latest cyberthreats and trends report for the second half of 2022 has found that phishing and the use of MFA (Multi-Factor Authentication) fatigue attacks are on the rise. The report provides an in-depth analysis of the security landscape including ransomware threats (number 1), phishing, malicious websites, software vulnerabilities and a security forecast for 2023. Threats from phishing and malicious emails have increased by 60%, and the average cost of a data breach is expected to reach $5 million in 2023. Download a copy of the full Acronis End-of-Year Cyberthreats Report 2022 here.

AWS has announced Amazon S3 Block Public Access that works both at the account and bucket level, and enables an AWS customer to block public access to existing and future buckets. This feature can be accessed from the S3 Console, the CLI, S3 APIs, and from within CloudFormation templates. More info is here.
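As a hedged illustration of the bucket-level form of the feature, the boto3 sketch below blocks public access on a single, hypothetically named bucket; the account-level setting is applied through the separate S3 Control API rather than this call.

    import boto3

    s3 = boto3.client("s3")
    bucket = "example-bucket"  # placeholder bucket name

    # Block every form of public access for this bucket.
    s3.put_public_access_block(
        Bucket=bucket,
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True,
            "IgnorePublicAcls": True,
            "BlockPublicPolicy": True,
            "RestrictPublicBuckets": True,
        },
    )

    # Read the configuration back to confirm it was applied.
    print(s3.get_public_access_block(Bucket=bucket))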

Druva has replaced Code42 at Alector, a San Francisco startup aiming to slow the progression of neurodegenerative diseases like Alzheimer’s and Parkinson’s. Druva’s case story claims Code42 wasn’t meeting Alector’s demands and plagued the IT team with tickets. Alector also suffered from data sprawl with data residing across a variety of environments, including the data center and Microsoft 365. This made it difficult to track the status and location of workloads and their security, we’re told. Here is the case study.

Molly Presley, Hammerspace’s SVP of Marketing, predicts 2023 will be the year distributed organizations can realize the value and insights of unstructured data faster and more efficiently. Unstructured data will – she reckons – be able to have the same unified access, management, utilization, and rationalization as structured data thanks to unified data management with a unifying metadata control plane and automated data orchestration. Trends for the year include:

  • IT supply chain challenges will compel new approaches to data management.
  • To access sufficient compute resources, organizations need the ability to automate Burst to the Cloud.
  • Access to software engineering talent must be possible from anywhere in the world.
  • Edge will no longer be used only for data capture but also for data use. 
  • The use of software-defined and open-source technologies will intensify.
  • Metadata will be recognized as the holy grail of data orchestration, utilization, and management.
  • A shift away from hardware-centric infrastructures toward data-driven architectures.
  • Data Architects will be the upcoming King of the IT Jungle. 
  • True storage performance that spans across all storage tiers.

… 

IBM has produced a Spectrum Scale Container Native Version 5.1.6.0. Spectrum Scale in containers allows the deployment of the cluster file system in a Red Hat OpenShift cluster. Using a remote mount attached file system, it provides a persistent data store to be accessed by the applications via a CSI driver using Persistent Volumes (PVs). This project contains a golang-based operator to run and manage the deployment of an IBM Spectrum Scale container native cluster.

Containerised Spectrum Scale.

High-end array supplier Infinidat has hired Dave Nicholson as a new Americas Field CTO to replace the retiring Ken Steinhardt. Nicholson’s experience includes being a member of the Wikibon/Silicon Angle/theCube storage and IT analyst firm, GM for Cloud Business Development at Virustream, VP & CTO of the Cloud Business Group at Oracle and Chief Strategist for the Emerging Technology Products Division at EMC. He has roughly 25 years of experience in enterprise storage.

Unified data platform supplier MarkLogic’s MarkLogic 11 product version includes:

  • Geospatial analysis – a more flexible model for indexing and querying geospatial data, scalable export of large geospatial result sets, interoperability with GIS tools; includes support for OpenGIS and GeoSPARQL
  • Queries at scale – Improved support for large analytic, reporting, and/or export queries with external sort and joins
  • Unified Optic API for reads and writes – write and update documents with the Optic API without having to write server-side code
  • BI analysis – use GraphQL to easily expose multi-model data to BI tooling using an industry standard query language
  • Docker and Kubernetes support – deploy MarkLogic clusters in cloud-neutral, containerized environments that use best practices to ensure your success

Effective January 17, Micron has promoted Mark Montierth to corporate VP and GM of its Mobile Business Unit. He is currently VP and GM of high-bandwidth and graphics memory product lines in Micron’s Compute and Networking Business Unit. Raj Talluri is still listed on LinkedIn as SVP and GM of Micron’s Mobile BU but he is leaving to pursue another opportunity, we’re told.

Amazon FSx for NetApp ONTAP now has FedRAMP Moderate authorization in US East (N. Virginia), US East (Ohio), US West (N. California), and US West (Oregon), and FedRAMP High authorization in AWS GovCloud (US) Regions. Additionally, Amazon FSx for NetApp ONTAP is now authorized for Department of Defense Cloud Computing Security Requirements Guide Impact Levels 2, 4, and 5 (DoD SRG IL2, IL4, and IL5) in the AWS GovCloud (US) Regions. NetApp said that, with this announcement, agencies at all levels of government can move data workloads to the AWS Cloud.

Germany’s Diakonie in Südwestfalen GmbH, which stores medical data from surgical robots, X-ray PACS and mammography screening, has countered rising storage HW (all-disk) costs by introducing automated archiving on tape with PoINT Storage Manager. This has a two-tier storage architecture with primary (disk) and archive (tape) storage tiers and automated transfer from the former to the latter. German-language case study here.

Diakonie in Südwestfalen PoINT system diagram.

Digital insurance provider Allianz Direct is using Rockset’s cloud-native Kafka-based technology to deliver real-time pricing. An algorithm incorporates over 800 factors and adapting these rating factors to pricing models would previously have taken weeks. Rockset’s schema-less ingest and fast SQL queries allow Allianz Direct to introduce new risk factors into its models to increase pricing accuracy in 1-2 days, we’re told. Rockset’s native connector for Confluent Cloud enables Allianz Direct to index any new streaming data with an end-to-end data latency of two seconds, Rockset said. Allianz Direct also uses Rockset to power real-time analytics for customer views and fraud management.

Taiwan’s Digitimes reports that Samsung has raised its NAND prices by as much as 10%. This follows Apple terminating a NAND purchase deal with YMTC and YMTC getting placed on the US Entity list.

Samsung announced the development of its 16-Gbit DDR5 DRAM built using the industry’s first 12nm-class process technology, and the completion of product evaluation for compatibility with AMD. The new DRAM features the industry’s highest die density, Samsung said, which enables a 20% gain in wafer productivity. Its speed is up to 7.2Gbit/sec and it consumes up to 23% less power than the previous Samsung DRAM. Mass production is set to begin in 2023. The new DRAM features a high-κ material that increases cell capacitance, and proprietary design technology that improves critical circuit characteristics. It is built using multi-layer extreme ultraviolet (EUV) lithography.

Samsung 12nm DDR5 DRAM chips.

SK hynix is showcasing a new PS1010 SSD at CES in January. The product was first unveiled at the October 2022 OCP summit. It is an E3.S format product, uses 176-layer 3D NAND and has a PCIe gen 5 interface, making it, SK hynix claimed, 130% faster at reading, 49% faster at writing and 75% better in performance/watt than the previous generation. It’s also showing CXL memory, GDDR6-AiM memory and HBM3 memory.

The 2020-era PE8010 uses 96-layer TLC NAND with a PCIe gen 4 interface and a 1TB to 8TB capacity range. It delivered random read of 1,100,000 IOPS, random write of 320,000 IOPS, sequential read of 6,500MB/sec and sequential write of 3,700MB/sec.

Research house Trendfocus has produced a native tape capacity ship table from 2017 to 2021 with a forecast out to 2027. We charted the numbers, reported by the Storage Newsletter, to show the annual exabyte shipments and year-on-year percent changes:

The 2021 percent change peak was due to the late arrival of the 18TB (raw) LTO-9 format, which had originally been expected in 2020. The 2022 to 2027 capacity ship CAGR is said to be 21 percent. Trendfocus sees an economic recovery in 2024 lifting the capacity ship growth rate. Over 80 percent of the shipments will be LTO-format tapes, with IBM 3592 format following. Tape still reigns supreme for archive data storage.

TrendForce further projects that the Client SSD attach rate for notebook computers will reach 92% in 2022 and around 96% in 2023. A demand surge related to the pandemic is subsiding, and the recent headwinds in the global economy have caused slower sales in the wider consumer electronics market. As such, client SSDs are going to experience a significant demand slowdown, which, in turn, will constrain demand bit growth. TrendForce projects that for the period from 2022 to 2025, the YoY growth rate of NAND Flash demand bits will remain below 30%. Eventually, enterprise SSDs will take over from client SSDs as a major driver of demand bit growth in the global NAND Flash market.

DataOps observability platform startup Unravel Data has confirmed that David Blayney has joined as Regional VP, Europe, the Middle East, and Africa (EMEA). Unravel raised a $50 million Series D round of funding in September led by Third Point Ventures, with participation from Bridge Bank and existing investors that include Menlo Ventures, Point72 Ventures, GGV Capital, and Harmony Capital, bringing the total amount of funding raised to $107 million.

WANdisco has announced a string of contract wins. It has signed an initial agreement worth $12.7m with a European-based global automotive manufacturer for IoT data in the client’s data centre to be migrated to the cloud. This is a one-off migration. WANdisco has also signed a commit-to-consume agreement worth $31m with a second tier-1 global telco and IoT app supplier. Half of the $31m will be paid in advance following the commencement of the project. WANdisco now expects that FY22 revenues will be significantly ahead of market expectations and no less than $19m. Bookings for FY22 are expected to be in excess of $116m.

DataOS company TMDC moves into view, but some details are missing

A startup called The Modern Data Company has appeared with DataOS, a data operating system aimed at unifying data silos, old and new, and giving users a self-service programmable data OS for analytics, AI, ML and applications.

These are bold claims. Do they stack up? TMDC was co-founded by CEO Srujan Akula and CTO Animesh Kumar in 2018. According to his LinkedIn profile, Akula had been a product consultant and advisor to multiple companies since leaving a VP Product position at Apsalar (now Singular) in 2017. Kumar left a DMP architect position at Apsalar in 2018 and was co-founder and CTO at automated billing company 47Billion until 2018.

An Akula statement said: “DataOS makes your existing legacy infrastructure work like a modern data stack without rip-and-replacing anything. It costs significantly less, gives you complete control of your data, and makes creating new data-driven applications and services simple for developers and business users alike.”

TMDC says it has developed a data operating system called DataOS to remove complexity and future proof a user’s data ecosystem by intelligently unifying data management under one roof. It is a “complete data infrastructure product” and TMDC claims in a video that “DataOS delivers an operationalisation layer on top of an existing legacy or modern data stack.” 

DataOS video

It has 121 employees and is headquartered in Palo Alto. There are no details available about its funding.

We’re told DataOS enables users to treat their data as software, using declarative primitives, in-place automation, and flexible APIs. These enable users to “easily discover, understand and transform data.” It is, TMDC claims, the world’s first multi-cloud, programmable data operating system, simplifying data access and management by decoupling it from tools, pipelines, and platforms.

TMDC is making a massive claim. DataOS connects, it says, with all systems within the data stack without the need for integrations. Taken literally this means mainframes, IBM Power systems running the i operating system, all flavors of Unix and Linux and Windows, all filesystems and all SAN arrays and all object storage systems.

We have asked TMDC if this is true and what connectors it uses, what data access protocols are available to users, how DataOS is programmable and other questions.

TMDC says DataOS enables access across multiple clouds and on-premises systems in a governed fashion, abstracting away data infrastructure complexity and allowing users to manage and access data across any format and any cloud through a single pane of control. It asserts that DataOS can connect to any system and can see everything that is happening to the data, so customers get a near real-time view.

DataOS allows data developers and users to access the data through a knowledge layer using an open standards approach. Developers can work with tools of their choice with respect to programming languages, query engines, visualization tools, and AI/ML platforms. 

Data is delivered in appropriate formats to deliver advanced analytics, power AI/ML, enable rapid experimentation and build data-driven applications. DataOS supports secure data exchange/data sharing with teams, and heterogeneous formats, such as SQL, Excel files, and more. It can extract data and see metadata. 

TMDC declares that “DataOS is a modern, open and composable data management platform-as-a-service (PaaS) that provides total data visibility and turns data into insights that drive actionable intelligence.”

Q and A session

Blocks & Files: TMDC says DataOS connects with all systems within the data stack without the need for integrations. Taken literally, I think this means mainframes, IBM Power systems running the i operating system, all flavors of Unix and Linux and Windows, all filesystems and all SAN arrays, and all object storage systems, whether on-premises or in the public clouds. Is this true?

TMDC: DataOS provides a consistent way to access, discover, and govern data that is often managed in a highly fragmented and siloed manner. Through the data depot contract, DataOS delivers this consistency across access, discovery, and governance. 

DataOS does not connect to mainframes or IBM Power systems directly, but does connect into the DB2s of the world where the data resides. It can also connect to data pipeline tools, data governance, cataloging, quality tools, etc. to build a knowledge layer that is refreshed in near real time.

Blocks & Files: What connectors does DataOS use to link to SAN, NAS and object systems on-premises and in the cloud?

TMDC: See above.

Blocks & Files: Can the DataOS abstraction layer scale, and by how much?

TMDC: DataOS is not a data virtualization play. While DataOS provides business teams with the ability to create logical data models, it does so by intelligently moving data that needs to be moved to meet the SLAs of the use cases that rely on that data. DataOS comes with a storage layer that supports multiple data formats to facilitate intelligent data movement. DataOS was architected to scale both horizontally and vertically, and we have auto-scaled up and down to process over a billion data events, from a normal load of 50 to over 100 events per day – without any intervention needed.

Blocks & Files: Can it work across a user’s distributed sites, both on-premises and in the public clouds? Which public clouds?

TMDC: DataOS delivers a consistent way to work with data that sits across multiple clouds and data centers.

Blocks & Files: What protocols can users have available to access data?

TMDC: Users and applications can access data with DataOS using our standard JDBC, ODBC and OData connections. They can also leverage REST APIs and GraphQL interfaces that are available on top of all data products within DataOS. [I, and Reverse ETL.]

Blocks & Files: In what sense does DataOS make data programmable?

TMDC: DataOS makes data “programmable” because business teams can define domain-level data lenses that can be composed to create higher-order capabilities using object-oriented programming constructs, enabling businesses to take those building blocks (data products and data lenses) and power many types of use cases without needing data engineering support.

The DataOS architecture starts with core primitives that are the building blocks to realizing any data architecture design pattern (e.g., data fabric, data mesh, etc.). The composable nature of the architecture allows our customers to take the building blocks and compose data experiences instead of building/integrating them.

Blocks & Files: How is TMDC funded?

TMDC: We cannot reveal this information at this time.

These answers reveal that at least some of TMDC’s claims require scrutiny, such as its claim to connect all systems within the data stack. We also note that TMDC does not reveal which public clouds it supports or the protocols (connectors) it uses to connect to storage repositories. Finally, it will say nothing about its funding, which we find quite odd.

Absent more detailed information about TMDC’s funding, technology, customer progress and engineering credentials, Cohesity, CTERA, Hammerspace, Komprise, LucidLink, Nasuni, Panzura and others needn’t start worrying just yet.

‘It’s ugly’: Micron DRAM and NAND revenues down, layoffs coming

Micron has moved deeper into a NAND market revenue trough with no sign the bottom has yet been reached. It’s making production cuts, lowering exec pay and laying off workers – with 10 percent of roles expected to be either cut or left empty in 2023.

Revenues were down 47 percent year-on-year to $4.1 billion in its first fiscal 2023 quarter, which ended December 1. Micron made a $195 million loss – its first loss in five and a half years.

CEO and president Sanjay Mehrotra said: “The industry is experiencing the most severe imbalance between supply and demand in both DRAM and NAND in the last 13 years.”

Wells Fargo analyst Aaron Rakers told subscribers: “It’s ugly, but finding that down-cycle bottom (F2Q23) is what really matters.”

Customers are using up their inventories and buying fewer DRAM and NAND chips, leaving Micron with its own inventory which needs using up. It’s lowering production to stop adding so many chips to that stash.

Financial Summary

  • Gross margin: 22.9 percent compared to 40 percent in prior quarter and 47 percent a year ago
  • Gross margin forecast: 8.5 percent plus/minus 2.5 percent next quarter
  • Cash from operations: $943 million
  • Free cash flow: -$1.5 billion
  • Liquidity: $14.6 billion at quarter end
  • Diluted EPS: -$0.04 compared to prior quarter’s $1.45 and year-ago $2.16

A quarterly revenue and profit history chart makes Micron’s revenue and profit drops starkly visible:

Micron revenues

A second chart of revenues by quarter by fiscal year shows the steep sequential fall in revenues this quarter (blue line) following a less deep fall (red line) in the previous quarter:

Micron revenues

If we look at Micron’s DRAM and NAND revenues separately, we see that DRAM revenues declined 49 percent year-on-year to $2.83 billion, with NAND falling less sharply, dropping 41 percent to $1.13 billion:

Micron revenues

Mehrotra said: “Across nearly all of our end markets, revenues declined sequentially in fiscal Q1 due to weaker demand and steep decline in pricing. Shipment volumes were impacted by our customers’ inventory adjustments, the trajectory of their end demand, and macroeconomic uncertainty. We believe that aggregate customer inventory, while still high, is coming down in absolute volume, as end market consumption outpaces ship-in.”  

Micron revenues

The storage and embedded segments saw less steep revenue declines: 41 percent to $680 million for storage and a mere 18 percent descent to $1 billion for the embedded market. The need for NAND chips in automotive driver assistance and entertainment systems is still quite strong and likely to grow.

Long story short? Compute (server and PC) and networking DRAM revenues fell off a cliff, NAND not so much, mobile revenues dipped markedly while embedded and storage revenues fared less badly.

Car infotainment rocks

The automobile part of the embedded market is a bright spot, apparently: “In fiscal Q1, auto revenues grew approximately 30 percent year over year, just slightly below our quarterly record in Q4 FY22. The automotive industry is showing early signs of supply chain improvement, and auto unit production continues to increase. The macroenvironment does create some uncertainty for the auto market, but we see robust growth in auto memory demand in fiscal 2023.”

And that should last at least until 2027, the company hopes: “Over the next five years, we expect the bit growth compound annual growth rate (CAGR) for DRAM and NAND in autos to be at approximately twice the rate of the overall DRAM and NAND markets.”

Mehrotra and his team are not forecasting when the revenue trough will bottom out as demand is being seen as continuing at a weak level: “In datacenter, we expect cloud demand for memory in 2023 to grow well below the historical trend due to the significant impact of inventory reductions at key customers.”

And when will the down cycle bottom be reached? Some time in 2023 possibly. Mobile phone demand may improve in 2023: ”We forecast calendar 2023 smartphone unit volume to be flattish to slightly up year over year, driven by improvements in China following the reopening of its economy.”

But Micron’s overall DRAM and NAND outlook is weak: “For calendar 2023, we expect industry demand growth of approximately 10 percent in DRAM and around 20 percent in NAND.”

It’s due to  “reductions in end demand in most markets, high inventories at customers, the impact of the macroeconomic environment, and the regional factors in Europe and China.“ And that means production cuts.

Production and expense cuts

Micron and the DRAM/NAND foundry industry will supply fewer DRAM and NAND bits than previously planned with “an approximately 40 percent reduction year on year, and we expect fiscal 2023 wafer fab equipment (WFE) capex to be down more than 50 percent year on year. We expect fiscal 2024 WFE to fall from fiscal 2023 levels, even as construction spending increases year on year.” 

Technology transitions to denser DRAM and NAND chips are being delayed: “Given our decision to slow the 1β (1-beta) DRAM production ramp, we expect that our 1γ (1-gamma) introduction will now be in 2025. Similarly, our next NAND node beyond 232-layer will be delayed to align to the new demand outlook and required supply growth.”

There will be short-term pain measures: “Reductions in external spending, productivity programs across the business, suspension of a 2023 bonus company-wide, select product program reductions and lower discretionary spend.”

Then it gets personal: “Executive salaries are also being cut for the remainder of fiscal 2023, and over the course of calendar year 2023, we are reducing our headcount by approximately 10 percent through a combination of voluntary attrition and personnel reductions.“

It has some 48,000 employees so this means 4,800 of them will hit the streets. 

And that may not be all: “We are prepared to make further changes and remain flexible to exercise all levers to control our supply and manage our cost structure.”

Outlook

Mehrotra added: “While the environment remains challenging, we currently expect second-half fiscal 2023 revenue to improve from the first half. We are confident that the broad advantages enabled by data-centric technologies will create long-term growth for our industry, and we expect the total available market to reach approximately $300 billion by 2030.”

The outlook for next quarter is revenues of $3.8 billion plus or minus $200 million. The $3.8 billion mid-point would be roughly a 50 percent drop from a year ago. It might be the bottom of the down cycle.

The NAND foundry canary may be tweeting in distress, but in their view, what’s going down will rise up again. Just not in a flash (ahem.)

Coldago supplier rankings for cloud, enterprise and high-performance file storage

The Coldago research firm has ranked file storage suppliers in the Cloud, Enterprise and High-Performance markets, making for interesting reading alongside rankings from GigaOm.

Coldago looks at file storage overall and then divides it into three categories which allow it to evaluate 35 suppliers in total and compare them in a three-way split. This seems to us to be a more appropriate way of doing it than comparing them all in a single unstructured data category, for example.

Coldago conceives of the market as a map diagram: its take on the Gartner Magic Quadrant approach, placing suppliers in a two-dimensional space to indicate where they are in the market based on their vision and strategy on the one hand and their execution and capabilities on the other. Suppliers are ranked in a low (bottom left corner) to high (top right hand corner) fashion with closeness to the diagonal line representing a balance between the two dimensions.

The analysis then splits the suppliers into four categories – Niche players, Specialists, Challengers and Leaders – moving from left to right across the diagram. The Cloud File Storage map sees 17 suppliers evaluated, with Nasuni and CTERA leading the group in the Leaders’ category, together with third-placed Panzura:

Coldago map

The other suppliers in this group, in alphabetical order, are AWS, Buurst, Cohesity, Egnyte, Hammerspace, JuiceData, LucidLink, Microsoft, Morro Data, NetApp, ObjectiveFS, Peer Software, Tiger Technology and XenData.

Coldago map

The Enterprise file storage supplier ranking places Microsoft at the top, followed by NetApp and Pure Storage. IBM, Dell and VAST Data complete the Leaders’ area list. There are no Niche or Specialist suppliers at all in this file storage map, and the Challengers include iXsystems, SUSE, Qumulo and Veritas.

Several of these vendors appear in the High Performance File Storage map: 

Coldago map

There are no Niche players, but we see NEC, Fujitsu, Quobyte and Huawei in the Specialists category. ThinkParQ, HPE, Panasas and Quantum are listed as Challengers. Qumulo just enters the Leaders’ section, with a closely positioned sextet of vendors ranked higher: IBM, WEKA, VAST Data, Dell, DDN and, surprise, Pure Storage, which leads everyone else with the best balance of execution and capabilities versus vision and strategy. That makes seven Leaders overall.

Intel’s DAOS is not included in this look at high-performance file storage suppliers. We understand Intel did not respond to Coldago’s inquiries.

More details of the evaluation criteria and measurement scores are available (for a price) in Coldago’s report looking at the three categories of file storage.

Ahana: Lakehouses need community-governed SQL

It’s not just any old SQL that lakehouses need, but SQL built on community-governed open source, according to Ahana CEO and co-founder Steven Mih, a member of the governing board of the Presto Foundation. Ahana was started in 2020 with $4.8 million seed funding. The company raised $20 million in VC funding last year and is developing software for data lake analytics, using the Presto distributed SQL query engine.

Update 1. Revised Trino history added; 22 Dec 2022.

Update 2. Linux Foundation and Presto Foundation comments added; 23 Dec 2022.

Steven Mih, Ahana

Presto is an open source project created by Facebook and used at Uber, Intel and many more businesses. It’s said to be the de facto standard for fast SQL processing of data lakes.

Mih is a past CEO of Alluxio and Aviatrix Systems. He was worldwide sales VP at NoSQL database supplier Couchbase before that. Alluxio produced open source data orchestration system software for analytics and machine learning in the cloud. He became a Governing Board Member of the Presto Foundation in late 2019. We asked Mih some questions to find out more.

Blocks & Files: Set the data analytics scene for us.

Steven Mih: Enterprises naturally want to amass as much data as possible. By combining the data diversity of a data lake – able to store structured, semi-structured and unstructured data that is streamed, loaded, or transformed for interactive, batch, or in-app workloads – with the management capabilities of a data warehouse, the data lakehouse provides the best way to amass the data.

Blocks & Files: But you see a problem?

Steven Mih: The diversity and volume of data in lakehouses have complicated the world of data analytics. Separate components now handle storage, compute, table types, metadata catalog, and security and permissions. Enterprises can now choose among different fit-for-purpose analytics engines for these components. However, working with various engines requires higher levels of technical expertise and mastery of different query and programming languages. Most organizations have few experts qualified to run sophisticated analytics; therefore, the whole process can become error-prone, slow, and confusing.

Blocks & Files: How do we fix this?

Steven Mih: There’s a need for innovation to make analytics capabilities more streamlined and accessible throughout an organization. One way to accomplish this is to use SQL, which is familiar to many professionals who work with data, as the common query language throughout an organization’s analytics ecosystem.
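To make the point concrete, the sketch below runs an ordinary SQL aggregation against a lake table through a Presto coordinator, assuming the presto-python-client package and made-up host, catalog and table names:

    import prestodb  # from the presto-python-client package

    conn = prestodb.dbapi.connect(
        host="presto.example.internal",  # hypothetical coordinator address
        port=8080,
        user="analyst",
        catalog="hive",      # lake tables exposed through the Hive connector
        schema="sales",
    )
    cur = conn.cursor()
    # Plain SQL is all an analyst needs, whatever engine sits underneath.
    cur.execute("""
        SELECT region, count(*) AS orders
        FROM orders_parquet
        GROUP BY region
        ORDER BY orders DESC
        LIMIT 10
    """)
    for region, orders in cur.fetchall():
        print(region, orders)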

Blocks & Files: So is it a solved problem or not?

Steven Mih: No, it’s not a solved problem. We need to use open source technologies for the lakehouse components. Unlike proprietary offerings, open source technologies provide the flexibility to adapt and start using tools and technologies without getting locked into a particular vendor or technology platform.

But one more consideration is important: who’s in charge of the technology? When open source technology is managed by a single corporate entity – as with MongoDB/Mongo, Redis/Redis Labs, or Trino/Starburst – the interests of the corporate entity tend to supersede the interests of the users.

Blocks & Files: What is the alternative that you favor?

Steven Mih: Open source technology can be managed by an openly governed community, as with MySQL, Kubernetes, or Linux Foundation Presto. By giving all users and participants “skin in the game,” open communities tend to be more flexible, deliver technical enhancements provided by and specifically beneficial to the whole community, and adopt governance practices that reflect the needs of all its members.

Blocks & Files: Could you describe the features of your ideal query engine?

Steven Mih: To democratize and enhance data analytics for enterprises with data lakehouses, I believe the ideal query engine would have the following characteristics:

  • Offer high price-performance and scalability, to handle the lakehouse’s ever-increasing analytics workloads
  • Use standard SQL as the common query language, to make analytics capabilities more widely accessible throughout an organization’s technical staff
  • Be based on open technology, to provide the flexibility to choose the best tools and technologies for specific needs, now and in the future
  • Be governed by a truly open community, not beholden to any single vendor’s whims

Comment

The “governed by a truly open community” requirement is the key aspect here, in our view. It’s possibly more a matter of which open source philosophy you prefer than of product features. Certainly the Presto Foundation has good open source credentials.

Ahana and Starburst are competitors. A Trino spokesperson tells us: “In 2012, Dain Sundstrom, Martin Traverso, David Phillips and Eric Hwang created Presto. They worked on the project for six years at Facebook until 2018, when Facebook made it clear that they wanted tighter control of the project.”

“Maintaining the open source integrity of the project was very important to the original Presto creators, so as Facebook took ownership of the Presto brand as PrestoDB, the Presto creators continued their work on the original Presto, just under the name PrestoSQL (then rebranded to Trino in 2020).”

“So all that is to say that Trino, which is formerly PrestoSQL, is actually the original Presto, and PrestoDB is what Facebook took over and sold to the Linux Foundation.” 

The Trino execs founded Starburst to sell Trino connectors and support.

Linux Foundation Comment

The Linux Foundation took issue with the Trino spokesperson’s comments. Mike Dolan, Linux Foundation SVP & GM of Projects, and Girish Baliga, Presto Foundation Chair, told Blocks & Files: “First, Facebook contributed the Presto project to the Linux Foundation. It was not “sold”. This is an open source project that was transitioned into our nonprofit foundation to open the project’s governance to the community under a neutral entity.”

“The foundations also want to clarify some of the points:

  • Presto Foundation has (every year for years now) invited the creators of the Trino fork to join the project. Trino has not yet accepted the invitation. We understand this is because Trino prefers to retain control versus using an open governance decision-making model employed by the Presto community. That’s their decision and an option that may make sense for Trino.
  • The Presto Foundation (not Facebook) is the entity that governs the project. Facebook continues to participate, but the governance of Presto was transferred into the hands of the project’s community contributors. We refer to this as a “do-ocracy” model whereby the people doing the work in the project make decisions for the project.

“The Linux Foundation ensures that the Presto Foundation remains a community-controlled initiative governed by open and neutral contribution guidelines and policies. Everybody is invited to join the Presto Foundation. Anyone can contribute to the Presto project, regardless of whether their organization is an official member. I hope this helps add clarity to the situation.”

StorONE reshuffles executive team

Multi-purpose storage system supplier StorONE has made some exec changes.

Chief revenue officer Chris Noordyke has been promoted to general manager, and Jenn Null, head of demand generation, is now head of marketing. George Crump, previously chief marketing officer and then chief product strategist, has left the company.

Gal Naor, StorONE co-founder and CEO, said: “Jenn is the right person at the right time, for today and for where the company will be soon. I was impressed by how much impact she had in a short time bringing a new marketing direction based on her deep experience.”

Jenn Null, StorONE

Null said: “As a marketer, when you’re given the chance to work with a company whose product will change the market, you take it. I am thrilled to be a part of StorONE at this pivotal time in the company.”

Null will lead product marketing, channel marketing, and digital initiatives. The company said she will plan and implement “scalable marketing programs that will be instrumental in continuing to facilitate StorONE’s rapid growth.”

There are strong indications here that this self-funding company is growing despite the general economic situation. 

StorONE, founded in 2011, came out of the same Israeli storage software reinvention crucible that produced Infinidat (founded in 2010), Reduxio (2012) and VAST Data (2016). They all looked at existing SAN and NAS systems and thought they could do better by starting again. 

Infinidat concentrated on the high-end SAN array market with its predictive memory caching. VAST Data rewrote a file-based storage OS with a Disaggregated Shared Everything  (DASE) hardware-software design. Reduxio, now Ionir, rethought storage principles and devised its HX/TimeOS hybrid array, which stores data objects and restores them to any point in time. It offered near-SSD performance from a hybrid flash-disk array but ran into problems. All three took the venture capital dollar in multiple funding rounds and went for growth.

StorONE took its startup funding and $30 million A-round in 2012 and entered a five-year development purdah in which it said little publicly and developed its TRU (Total Resource Utilization) storage OS that provides file, block and object storage from a single array supporting both disks and SSDs. The system was launched in 2017.

Since then it has plugged away, not taking in any more funding and selling systems supporting virtually any storage use case you could think of – Optane-enhanced IOPS, backup, all-flash, containers, whatever. Without a VC-funded dash for growth and eschewing a single file or block or object or other customer market, its array is an all-in-one Swiss Army knife of a system with no single market identity. Over the last five years it has built a channel, emphasizing its combination of performance and general applicability, and grown organically. It’s one to watch.

Having an efficiency standard like SNIA’s is good. But what about public cloud?

Comment: The Emerald energy initiative from the SNIA holds great promise to better measure storage efficiency and lower carbon emissions, but it’s missing a public cloud storage element, and that can only come with the cooperation of the big three hyperscalers.

Energy efficiency is getting more and more important as a storage buying criterion. As yet there is no standardized and transparent way for buyers to compare and contrast potential storage purchases on an energy efficiency basis. A standard comparison metric will be needed if businesses are to meaningfully understand the carbon emission load of their storage systems.

Who better than a storage industry body to produce such a measurement process?

The Storage Networking Industry Association (SNIA) has stepped forward and has its Emerald program looking at storage device energy efficiency and measurement.

The downloadable Emerald Power Efficiency Measurement Specification is a standardized method to assess the energy efficiency of commercial storage products in both active and idle states of operation. v4.0 was published in July 2020, having been developed, released, and maintained by the Green Storage Technical Working Group under the guidance of the SNIA’s Green Storage Initiative (GSI).

The United States Environmental Protection Agency (EPA) Energy Star Program for Data Center Storage references the SNIA Emerald Specification, and publishes storage vendor system test reports for public use to aid IT procurement, planning, and operations.

Since the energy use of storage systems varies widely based on the media type, the Emerald spec divides them into three groups: disk, removable media library (RVML), and non-volatile storage systems (NVSS). 

This is to make product comparisons viable and useful. It is pointless comparing a tape library and disk storage array in energy use terms as they are intended for completely different purposes and energy use is not a practical way of choosing between them.

Each set is further divided into categories for the same reason. The disk category covers online and nearline disk drives. The RVML group covers tape and optical disk systems and Virtual Media Library, which includes disk drive and SSD systems. The third category, NVSS, includes solid state memory access systems and disk access systems, meaning SSD and optional disk drive media types.

An SNIA table details these, together with their access patterns and paradigms: 

SNIA table

The categories are sub-divided again by product type classifications to distinguish between products aimed at different market sectors:

  • Consumer/component – 1
  • JBOD – 1.5
  • Low-end – 2
  • Mid-range small – 3
  • Mid-range large – 4
  • High-end – 5
  • Mainframe – 6 

The spec details rules for measuring the power efficiency of each category, including test sequence, test configuration, instrumentation, benchmark driver, IO profiles, measurement interval, and metric stability assessment.

The test results for devices are either primary metrics, such as IOPS/W, MiB/sec per watt and GB/W for block access, or secondary metrics such as capacity optimization for snapshots, compression and data deduplication. These have yes-no values.
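As a rough illustration of what those primary ratios mean – not of the Emerald test procedure itself, which prescribes specific workloads, measurement intervals and stability checks – the figures simply divide measured work or capacity by measured power:

    # Illustrative arithmetic only; figures are invented, not Emerald test data.

    def active_efficiency(avg_iops: float, avg_watts: float) -> float:
        """Active primary metric: IO operations per second per watt (IOPS/W)."""
        return avg_iops / avg_watts

    def capacity_efficiency(raw_capacity_gb: float, idle_watts: float) -> float:
        """Idle primary metric: raw capacity served per watt (GB/W)."""
        return raw_capacity_gb / idle_watts

    # A system sustaining 100,000 IOPS while drawing 1,250W scores 80 IOPS/W,
    # roughly the figure reported for the PowerStore array below.
    print(active_efficiency(100_000, 1_250))    # 80.0 IOPS/W
    print(capacity_efficiency(500_000, 4_000))  # 125.0 GB/W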

The test results are listed on an Energy Star webpage with the default sort order by brand name. Here is a test result for a Dell EMC PowerStore 1000T array:

SNIA Dell efficiency

You can see that the Product Type is NVSS Disk Set Online 4. That means it is a non-volatile storage system (solid state with optional disk drives) in the large mid-range category. It has been tested for block IO with a RAID configuration. The single metric is the Trans Optimal Point Hot Band Workload Test (IOPS/W) with a value of 80.73. We think this test name is clumsy and needs shortening. Even the acronym – TOPHBWT – is unwieldy.

A test result for an HPE Primera C650 array shows a radically different TOPHBWT value of 7.9 IOPS/W, but then this is a disk array with no solid state drives, although it is, like the PowerStore 1000T, a large mid-range category product tested for block IO:

SNIA HPE efficiency

All in all there are three pages of results filed for Dell, HPE, IBM, Lenovo, NetApp, Seagate, Veritas and Viking Enterprise Solutions (Sanmina).

Comment

It occurs to Blocks & Files that cloud storage services are not covered, and that they introduce a whole new set of complexities. The Emerald program is targeted at on-premises customers buying storage boxes, not storage services from public cloud suppliers such as AWS, Azure, and GCP. It surely needs extending to cover cloud-supplied storage services such as FSx for ONTAP, Glacier Infrequent Access, Azure Blob, Pure Cloud Block Store, and so forth.

That cannot be done unless the public cloud storage service suppliers cooperate. We intend to address this aspect of storage energy efficiency and ask the SNIA, as well as the big three hyperscalers, for their cloud storage energy efficiency views.

At present the SNIA’s Emerald program gives buyers of on-premises storage systems the ability to specify Emerald Energy Star test levels as requirements for suppliers – for example, a small mid-range NVSS product for block access with a TOPHBWT minimum of 50 IOPS/W. For this to be useful, IT storage buyers have to specify such Energy Star test values in their invitations to tender. So including public cloud-delivered storage services in SNIA’s Emerald program, if at all possible, makes a great deal of sense.
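
As a sketch of how a buyer might encode such a requirement in procurement tooling, the hypothetical Python below filters candidate products against a minimum TOPHBWT figure. The product records and field names are invented for illustration; the real test reports sit on the Energy Star webpage.

```python
# A hypothetical sketch of how a buyer might encode an Emerald/Energy Star
# requirement in procurement tooling. Product records and field names are
# invented for illustration; the real test reports live on the Energy Star site.

REQUIREMENT = {
    "group": "NVSS",
    "product_type": 3,        # small mid-range
    "access": "Block",
    "min_iops_per_watt": 50,  # minimum acceptable TOPHBWT result
}

candidates = [
    {"name": "Vendor A array", "group": "NVSS", "product_type": 3, "access": "Block", "iops_per_watt": 62.4},
    {"name": "Vendor B array", "group": "NVSS", "product_type": 3, "access": "Block", "iops_per_watt": 41.0},
]


def meets_requirement(product: dict, req: dict) -> bool:
    """True if the product matches the required category and clears the efficiency bar."""
    return (product["group"] == req["group"]
            and product["product_type"] == req["product_type"]
            and product["access"] == req["access"]
            and product["iops_per_watt"] >= req["min_iops_per_watt"])


shortlist = [p["name"] for p in candidates if meets_requirement(p, REQUIREMENT)]
print(shortlist)  # ['Vendor A array']
```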

Aston Uni developing 5nm surface channel storage technology

Aston University claims new polymer surface tech will increase storage capacity and head off the need to build more mega datacenters.

The UK’s Aston University says its scientists are starting a two-year research program to tackle the global shortage of digital data storage. The university says that in the next three years the total amount of data in the world – the global datasphere – is predicted to increase by 300 percent. As datacenters account for around 1.5 percent of the world’s annual electricity usage, it has been recognized that building more huge warehouses is not sustainable.

Dr Matt Derry, Aston University

Lead scientist chemistry lecturer Dr Matt Derry said: “Simply building new datacenters without improving data storage technologies is not a viable solution. Increasingly we face the risk of a so-called data storage crunch and improved data storage solutions are imperative to keep up with the demands of the modern world.”

Good to know. The university, no doubt recognizing the vital importance of the problem, has awarded the research project £204,031 ($247,960) in funding. Derry works in the university’s College of Engineering and Physical Sciences, and is collaborating with Specialist Computer Centres (SCC), the science facility Diamond Light Source, and Babeș-Bolyai University in Romania.

He will be accompanied by Dr Amit Kumar Sarkar, a researcher in materials chemistry, who is being funded by the Engineering and Physical Sciences Research Council.

Diamond Light Source is the UK’s national synchrotron science facility. Its 3 GeV machine produces 32 beams of intense light, 10,000 times brighter than the Sun, at wavelengths from X-ray to far infrared. These can be used to study an object’s structure, such as asteroid grains or protein molecules, and, we suppose, features of a surface.

Diamond Light Source synchrotron building

Sarkar said: “I’m delighted to be joining Aston University to develop more efficient data storage technologies. We will be exploiting advanced polymer chemistry as a pathway to increase the amount of data that can be housed on storage media.”

The technology being researched involves creating new surface channel features around 10,000 times smaller than the width of a human hair, which will be used to increase data capacity. This, the university’s announcement says, “will enable increased capacity in data storage devices to cope with the mind-blowing amount of data produced around the world each day.”

A human hair is about 70 microns (micrometers) thick, or 70,000 nanometers, so a surface channel 10,000 times smaller would measure 7 nanometers. The university says its scientists will go further and develop channels less than 5 nanometers in width.
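
For what it’s worth, the arithmetic is simple enough to check:

```python
# The arithmetic behind the comparison above.
hair_width_nm = 70_000               # a human hair is about 70 microns thick
channel_nm = hair_width_nm / 10_000  # a feature 10,000 times smaller
print(channel_nm)                    # 7.0 nm; Aston is aiming for under 5 nm
```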

We don’t know the shape of such a channel or how data would be recorded in it – using magnetism, electrical resistance or light, for example. Nor do we have any idea of the inter-feature gap needed, nor latency, IO rates and endurance. We have asked the university about these points and when it replies we’ll update this story.

Sarkar thinks that “increasing the efficiency of existing technologies will significantly reduce the need for costly, environmentally damaging construction of new ‘mega datacenters’.”

The university says this research project has the potential to impact other technologies where performance relies on creating regular patterns on the nanometer scale, such as organic electronics for solar energy.

Bootnote

According to the ENGIE Group, the datacenter industry accounts for around 4 percent of global electricity consumption and 1 percent of global greenhouse gas emissions.

NVMe-oC

NVMe-oC is NVMe over CXL. This uses CXL requests to tell an MS-SSD to put data into its controller’s DRAM, which forms part of a CXL memory pool. This enables a host to read data from the SSD’s DRAM without waiting for that data to be fetched from the SSD’s NAND, which takes longer. Jim Handy of Objective Analysis writes: “The basic idea is that many SSD reads are for smaller chunks of data than the standard 4KB delivered by an SSD access. Why move all that data over CXL-io or NVMe over PCIe if the processor only needs one 64-byte cache line of it? NVMe-oC is expected to both reduce the I/O traffic and the effort spent by the host to move the data itself.”

This enables the SSD to pre-fetch data for the host and place it in the controller’s DRAM ready to be read. NVMe-oC uses CXL.io to access the SSD and CXL.mem to access the memory. Handy writes: “Special commands tell the SSD to write data into that memory or to write from the memory into the SSD, without any interaction from the host, to reduce host-device data movement.” The host needs an NVMe-oC driver.
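
To make the flow concrete, here is a conceptual Python sketch of the sequence Handy describes: the host asks the SSD over CXL.io to stage a 4KB block in its controller DRAM, then reads just the 64-byte cache line it needs over CXL.mem. Every class and method name here is invented for illustration; this is not a real NVMe-oC driver API.

```python
# A conceptual sketch only, not a real driver API. It models the host-side
# flow described above: ask the SSD (over CXL.io) to stage a block in its
# controller DRAM, then read just the needed 64-byte cache line over CXL.mem.
# All class and method names are invented for illustration.

CACHE_LINE = 64
BLOCK_SIZE = 4096


class MsSsdModel:
    """Toy model of a memory-semantic SSD with controller DRAM."""

    def __init__(self):
        self.nand = {}   # lba -> 4KB block held in NAND
        self.dram = {}   # dram_offset -> 4KB block staged for CXL.mem reads

    def stage_to_dram(self, lba: int, dram_offset: int) -> None:
        # Models the NVMe-oC command sent over CXL.io: copy a NAND block into
        # the controller DRAM exposed through the CXL memory pool.
        self.dram[dram_offset] = self.nand[lba]

    def cxl_mem_load(self, dram_offset: int, line: int) -> bytes:
        # Models a CXL.mem load: the host pulls one cache line, not 4KB.
        block = self.dram[dram_offset]
        return block[line * CACHE_LINE:(line + 1) * CACHE_LINE]


ssd = MsSsdModel()
ssd.nand[42] = bytes(range(256)) * 16           # a 4KB block at LBA 42
ssd.stage_to_dram(lba=42, dram_offset=0)        # CXL.io / NVMe-oC request
line = ssd.cxl_mem_load(dram_offset=0, line=1)  # host reads 64 bytes via CXL.mem
print(len(line))                                # 64
```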