
Astronomer gets $213 million for Airbnb-sourced DataOps development

An outfit called Astronomer has landed $213 million in funding to continue developing its data operations engineering software so users get cleaner data assembled and organised faster for analytic runs. It has also bought data lineage company Datakin.

The software is based on Apache Airflow – code originally developed by Airbnb to automate what it called its data engineering pipelines. These fed data gathered from the operation of its global host accommodation booking business to analytics routines, so Airbnb could track its business performance. The data came from various sources in various formats and needed collecting, filtering, assembling, cleaning, and organizing into sets that could be sent for analysis according to schedules. In effect, this is an Extract, Transform and Load (ETL) process with multiple data pipelines which needed creating, developing, maintaining and scheduling.

Astronomer CEO Joe Otto blogged: “Astronomer delivers a modern data orchestration platform, powered by Apache Airflow, that empowers data teams to build, run, and observe data pipelines. … Airflow’s comprehensive orchestration capabilities and flexible, Python-based pipelines-as-code model [have] rapidly made it the most popular open-source orchestrator available.”

Airbnb used Airflow for:

  • Data warehousing – cleansing, organizing, quality checking, and publishing data into its data warehouse;
  • Growth analytics – computing metrics around guest and host engagement and growth accounting;
  • Email targeting – applying rules to target users via email campaigns;
  • Sessionization – computing clickstream and time spent datasets;
  • Search – computing search ranking-related metrics;
  • Data infrastructure maintenance – folder cleanup, applying data retention policies, etc.

Analytics needs are evolving all the time, and many companies have multiple systems that need co-ordinating and integrating.

Getting raw data from its storage on disk or flash drives requires custom ETL procedures, aka data pipelines, which are crafted by data engineers using directed acyclic graphs or DAGs to visualize what’s going on. These link procedures or processing steps in a sequence, with interdependencies between the steps.

Example directed acyclic graph (DAG)

The diagram shows an example DAG, with B or C procedures dependent on the type of data in A and the desired end-point analytic run. If the D step is left out of the sequence then the overall procedure will fail. DAGs enable pipelines to be planned and instantiated. Airbnb created Airflow, using Python (procedure-as-code), to automate its own data pipeline engineering, and donated it to the Apache Foundation.
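For flavor, here is a minimal sketch of what an Airflow pipeline-as-code DAG looks like; the task names, schedule, and logic are illustrative rather than drawn from Airbnb’s actual pipelines.

```python
# Minimal, illustrative Airflow DAG: extract -> transform -> load, expressed as
# Python pipelines-as-code. Task names, schedule, and logic are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pull raw booking events from source systems")

def transform():
    print("clean, filter, and organise the raw data")

def load():
    print("publish the prepared dataset for analytics")

with DAG(
    dag_id="example_etl_pipeline",
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",   # Airflow runs the pipeline on this schedule
    catchup=False,
) as dag:
    a = PythonOperator(task_id="extract", python_callable=extract)
    b = PythonOperator(task_id="transform", python_callable=transform)
    c = PythonOperator(task_id="load", python_callable=load)

    a >> b >> c   # the dependencies form the directed acyclic graph
```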

There are more than eight million Airflow downloads a month and hundreds of thousands of data engineering teams across a whole swath of businesses use it – such as Credit Suisse, Condé Nast, Electronic Arts, and Rappi. It’s almost the industry-standard way of doing DataOps.

Astronomer effectively took over its development and maintains it as an open source product while also shipping its own Astro product – a data engineering orchestration platform that enables data engineers, data scientists, and data analysts to build, run, and observe pipelines-as-code.

There is a need to discover, observe and verify the lineage – the sequence – of the data involved and the Datakin acquisition enables Astronomer to have an end-to-end method to achieve it, using lineage metadata. Datakin’s website promises: “With lineage, organizations can observe and contextualize disparate pipelines, stitching everything together into a single navigable map.”

Astronomer has now gained $283 million in total funding. The latest round was led by Insight Ventures with Meritech Capital, Salesforce Ventures, J.P. Morgan, K5 Global, Sutter Hill Ventures, Venrock and Sierra Ventures. 

Nasuni scores more cash to fight file services war

Nasuni has scored $60 million in a ninth round of funding with software development, more international expansion, and potential strategic acquisitions in mind.

The company was started in 2008 as a sync’n’share, collaboration, and file services gateway supplier and has grown into a cloud-based file services platform provider supporting AWS, Azure, and GCP. Responsiveness to users is helped by its use of edge caching devices. CEO Paul Flanagan told us in February: “The new customer growth rate was 82 per cent in 2021 and the year ended with around 600 enterprise customers.”

In the funding announcement, he said: “Nasuni pioneered the cloud-based file data services category almost a decade ago and has led the way in defining the architecture, business model, and user experience. This investment and our continued, rapid growth are validation of our vision.”

Nasuni pointed out that IDC predicts cloud file services adoption will reach a five-year compound annual growth rate (CAGR) of 40 per cent and be a $14 billion market by 2025. The growth rate for file data is higher than both the block and object storage markets. 

The latest cash injection came from new investor Sixth Street Growth, which has bagged itself a Nasuni board seat. Nasuni’s total funding to date stands at $169 million, it said today, and the company has $100 million in cash on its balance sheet. The business did not reveal estimates of its latest valuation but claimed it is 250 per cent higher than 18 months ago, meaning it’s possibly a unicorn with a greater than $1 billion valuation.

Nasuni says that data growth is exponential and that traditional hardware-based data storage and protection is no longer scalable, secure, or cost-effective. Businesses, it argues, need a new approach that can be deployed from anywhere, scale without limit, protect data across the enterprise, and deliver it securely as a cloud service. Only solutions built natively in the cloud, it claims, will be able to deliver on all of this.

Marketing war looms

The message here is that on-premises filers are yesterday’s tech. We see on-premises filers moving into the cloud too. NetApp’s Data Fabric means its file services are available in AWS, Azure, and GCP. Qumulo supports them as well. The on-premises filer suppliers have the equivalent of Nasuni’s edge caching devices – their existing filer boxes – but they don’t have Nasuni’s cloud center base. 

Nasuni’s File Data Services system can be viewed as a monstrous filer in the cloud which talks to edge devices in customers’ datacenters. Many rivals remain focused on-premises, with each customer’s datacenter the center of its filesystem universe and the cloud instantiations viewable as quasi-satellite locations.

An on-premises system may be multiple petabytes in size and support thousands of users, while Nasuni’s cloud-focused platform could be approaching exabytes in size and have the scale to service millions of users. This, we think, is what Nasuni is hinting at when it claims only systems built natively in the cloud will be able to successfully deliver file services at scale.

There is a marketing war looming between the on-premises filer suppliers and the cloud-centric file services group. Nasuni firmly believes this is approaching and needs the cash to build up its operation and withstand an onslaught from competitors as they switch – as Nasuni thinks they must – to a cloud-centric product strategy.

Western Digital wins ethics award months after settling lawsuit over SMR scandal

Western Digital is being recognized as one of the “World’s Most Ethical Companies” by Ethisphere for the fourth year in a row, just three months after settling a class action lawsuit for $2.7 million over adding lower rewrite performance SMR disk drives to its Red NAS product line in 2020 without telling customers.

The company admitted to the substitution and promised clearer product technology communications in April that year.

Tiffany Scurry, SVP and Chief Compliance Officer at Western Digital, said today of the award: “We are proud to once again receive this recognition from Ethisphere, further underscoring the ethical responsibilities we have with our stakeholders, including employees, shareholders and the broader market.”

Ethisphere is a for-profit company based in Scottsdale, Arizona, led by CEO Tim Erblich. Apart from providing annual “World’s Most Ethical Companies” awards, it runs a Business Ethics Leadership Alliance (BELA) with members paying a subscription to participate, and various business ethics-focused events and benchmarking services.

Companies such as WD that want to apply for the recognition take Ethisphere’s Ethics Quotient survey – a proprietary rating system that collects and objectively scores self-reported data in five weighted categories: governance, leadership and reputation, ethics and compliance program, ethical culture, and environmental and societal impact.

Other storage-related companies on the Ethisphere list include Dell Technologies, HPE, IBM, Intel, Micron, Microsoft, and Teradata.

Ethisphere comms VP Anne Walker told us: “We strive to honor organizations based on their body of work and not just one particular policy or program that is groundbreaking or admirable. Likewise, we don’t believe that one particular controversy, settlement, fine, regulatory action, or lawsuit necessarily disqualifies an organization from being honored.”

Walker added: “That being said, these outcomes are taken into consideration during the evaluation time frame. How organizations respond to unethical or illegal actions once uncovered, and their general culture and policies are also considered in our evaluation.”

She told us more about Ethisphere’s rating methods: “In short, we evaluate an organization’s (i) Ethics and Compliance Program, (ii) Culture of Ethics, (iii) Corporate Citizenship and Responsibility, (iv) Governance, and (v) Leadership and Reputation. Each category is evaluated through a combination of answers to our Ethics Quotient (EQ) questionnaire, submitted supplemental documentation, and where necessary, independent research and follow up with a candidate. Evaluation of the Leadership and Reputation category also includes a review of publicly available information with a bearing on an organization’s reputation for acting ethically.”

Qumulo opens the door to Kubernetes with CSI

Scale-out file system supplier Qumulo has made its file services available to Kubernetes-orchestrated containers via a CSI driver plug-in.

CSI, the Container Storage Interface, enables stateless containers orchestrated by Kubernetes to request and consume storage services such as volume provisioning, data read and writes, and protection from external storage. They effectively support statefulness. Qumulo’s Core product provides scale-out file services and runs in its own on-premises appliances, third-party servers in its Server Q form, and also, in its Cloud Q incarnation, in the public clouds – AWS, Azure, and GCP.

Sean Whalen, Qumulo senior cloud product marketing manager, wrote in a blog: “Now, customers innovating using Kubernetes don’t have to set up a storage interface each time a cluster is set up or knocked down – the process is automatic and provides the containerized application maximum exposure to the Qumulo analytics so that customers can easily understand what’s happening across their stored data.”

The CSI driver is currently available as Qumulo production preview software.

Kubernetes operates a cluster of machines, starting and stopping containers on behalf of its users. CSI allows the Kubernetes orchestrator and individual containers to connect to external (persistent) storage. Qumulo storage will automatically deploy inside a new container and supports the movement of storage from container to container and machine to machine.
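As a rough illustration of how an application running under Kubernetes consumes CSI-provisioned storage, here is a hedged sketch using the official Kubernetes Python client to request a persistent volume claim; the storage class name is a placeholder, not Qumulo’s documented value.

```python
# Illustrative only: request a persistent volume from a CSI-backed storage class
# using the official Kubernetes Python client. The storage class name below is
# a placeholder, not Qumulo's documented value.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod
core = client.CoreV1Api()

pvc = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="shared-data"),
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=["ReadWriteMany"],           # many pods can mount the same share
        storage_class_name="example-file-csi",    # hypothetical CSI storage class
        resources=client.V1ResourceRequirements(requests={"storage": "100Gi"}),
    ),
)
core.create_namespaced_persistent_volume_claim(namespace="default", body=pvc)
# Pods then mount the claim by name; the CSI driver provisions and attaches
# the backing volume automatically.
```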

Ben Gitenstein, VP of product at Qumulo, said: “Qumulo’s new CSI driver enables customers to store unstructured data once but serve it to an infinite number of both native applications and container-based microservices – all without moving data, copying it to disparate systems, or changing their workloads. Customers who store their data on Qumulo can now focus their time on building modern applications, not on moving or managing their data.”

Qumulo is not alone here. CSI driver support is table stakes for external storage suppliers. Dell’s PowerScale/Isilon already supports CSI as do HPE’s Primera and Alletra products, IBM’s FlashSystem, NetApp’s ONTAP software, Pure Storage, and Weka with its scale-out, parallel filesystem software.

Storage containers

Beyond CSI, external storage software can be made into a container itself. Examples are Pure’s Portworx, MayaData’s OpenEBS Mayastor product, Ondat (rebranded StorageOS), and Robin.io’s cloud-native storage. These storage containers execute inside a server’s environment and link to the server’s own physical storage or to external storage. 

StorageOS, for example, aggregates the local disk storage in a cluster of servers (nodes) into one or more virtual block storage pools. Storage in the pool is carved out into virtual volumes and app containers in the nodes mount and access these virtual volumes via the storage container.

When executing in the public clouds, they would use the CSP’s storage services. Whether on-premises or in the public clouds, Kubernetes is used to orchestrate and manage the storage containers as well as the application containers for DevOps users.

A storage container runs like any other app container with no dependencies on proprietary kernels, hardware, storage protocols or other layered services – customers are freed from lock-in to these things. In theory, a storage container should respond more quickly to app container requests for storage services as the link is direct rather than hopping across network links to an external storage system. The storage container should also scale out beyond the limit of, for example, a dual-controller array.

Storage consultant Chris Evans has said: “I doubt any storage array could cope with creating and destroying hundreds of volumes per hour (or more), whereas on (Ondat) StorageOS, those constructs are mainly in software on the same node as the application, so can be created/destroyed in milliseconds.”

It seems possible that there will be a phase 2 in Qumulo’s support of containerization, with its Core software eventually becoming cloud-native itself.

Storage news ticker – March 24


Arcserve’s StorageCraft DRaaS service degradation (outage) continues. A March 23 status message read: “Our engineers continue to actively work on the resolution for this issue.” The issue has been ongoing since March 9 or 10 and currently affects DRaaS availability in Australia, Canada, GCP-Canada, Ireland, US-UT, and US West.

ChaosSearch has been included as a Representative Vendor in the 2022 Gartner Market Guide for Analytics Query Accelerators. The report offers guidance on vendors that provide structured query language (SQL) or SQL-like query support on a broad range of data sources to deliver business intelligence (BI) dashboards, interactive query capabilities, and support for data modeling. ChaosSearch enables users to perform both log analytics and SQL queries concurrently and in situ on their cloud object storage, without the data pipelining, transformation, or movement that most other data lakes, data warehouses, and data lakehouses require today.

Germany-based RNT Rausch, an IT supplier, has announced two Yowie-brand storage appliances running Cloudian’s HyperStore object storage software with S3 object lock immutability: the Yowie 1100 (2TB start level) and 1200 (8TB start level). They are single-node appliances with six hot-swap disk drives and provide enterprise-grade ransomware protection to organizations with data capacities of 2–100TB. The appliances prevent encryption or deletion of objects for the duration of a user-defined retention period. Immutable S3 objects are protected by configuring WORM and retention attributes at the object or bucket level.
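For context, S3 object lock immutability of this kind is set by writing objects with a retention mode and retain-until date (or a bucket default). A hedged boto3 sketch follows, with an assumed endpoint, bucket name, and retention period.

```python
# Illustrative sketch: write an object with S3 Object Lock (WORM) retention.
# Endpoint, bucket, and retention period are placeholders; the bucket must have
# been created with object lock enabled.
from datetime import datetime, timedelta, timezone
import boto3

s3 = boto3.client("s3", endpoint_url="https://s3.example.internal")  # hypothetical endpoint

s3.put_object(
    Bucket="backups",                       # object-lock-enabled bucket
    Key="db-dump-2022-03-24.tar.gz",
    Body=b"backup payload bytes",
    ObjectLockMode="COMPLIANCE",            # retention cannot be shortened or removed
    ObjectLockRetainUntilDate=datetime.now(timezone.utc) + timedelta(days=90),
)
```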

DataOps prepper Delphix has appointed Robert Stevenson as VP of its Japan operations, reporting to Steven Chung, President, Worldwide Field Operations at Delphix. Stevenson has served various leadership positions within the Japanese market at BEA, EMC-Dell, Lenovo, and Avaya, and spearheaded the growth of startups such as Documentum, Tanium, and Sumo Logic. Delphix says it’s only scratching the surface in Japan and has a lot of potential there.

GoodData has announced new dashboard plugins to further customize the data visualization experience. They enable widespread data accessibility by providing more tools to integrate data/UI widgets into the low-code GoodData.UI and help produce a fully composable UI. The plugins allow customers to decide on project-specific features to add to each dashboard and search the GoodData library to select the correct plugin to customize with added text, images, charts, videos, or anything else they can think of.

MariaDB has released Xpand 6 – its fully distributed SQL database with a shared-nothing architecture and fast operational analytics. Xpand uses columnar indexes for real-time operational analytics directly on transactional data without losing consistency or missing the latest transactions. A new columnar feature with a cost-based optimizer enables companies to run ad hoc queries in Xpand up to 50x faster. Xpand 6 also has parallel replication for asynchronous replication of data across locations.

Object storage software supplier MinIO has announced two new advisors: Sanjay Poonen, former VMware COO and SAP president, and Tony Werner, former president of Comcast. Both will provide counsel on growth and go-to-market strategy. MinIO has promoted Kris Inapurapu from VP of corporate and business development to chief business officer. This role encompasses all things commercial from customer engagement to renewal and growth, as well as strategic partnerships.

Nvidia has unveiled Spectrum-4 – the next generation of its Ethernet platform and the first 400Gbit/sec end-to-end networking platform, providing 4x higher switching throughput than previous generations at 51.2Tbit/sec. It consists of the Spectrum-4 switch family, ConnectX-7 SmartNIC, BlueField-3 DPU, and the DOCA datacenter infrastructure software to help run cloud-native applications at scale.

Quantum has announced a Unified Surveillance Platform (USP) for recording and storing video surveillance data, and introduced a line of Smart Network Video Recording Servers (NVRs) which combine the Quantum Unified Surveillance Platform software with a purpose-built NVR server to create an integrated appliance. USP software runs on any standard server, simplifies video recording infrastructure, and lowers total cost of ownership by consolidating the server’s compute, storage, and networking resources into a single scalable system that hosts video management system (VMS) and other common security applications.

The Smart NVR can run multiple physical security applications on a single server, Quantum says, unlike other NVRs, to reduce costs and complexity for security integrators and their customers. The USP is based on software technology that Quantum acquired last year from EnCloudEn.

Red Hat’s OpenShift Kubernetes automation software is being ported to Nvidia’s BlueField DPUs. Nvidia says OpenShift and BlueField provide a consistent, cloud-native application platform to manage hybrid cloud, multi-cloud, and edge deployments with enhanced orchestration, automation, and security. The DPU offloads container management and networking, delivering more CPU power to run tenant workloads. Sign up here to learn more.

To build a machine learning application, you need to transform raw data into features, which is what Tecton’s software does. It has announced a partnership with public cloud data warehouser Snowflake to help put ML applications into operation. The two have collaborated to integrate Tecton and open source Feast with Snowflake’s Data Cloud. The combination provides a simple and fast path to building production-grade features to support a broad range of operational ML use cases, including fraud detection, product recommendations, and real-time pricing, the companies say.

Swissbit has launched the X-78m2 M.2 SATA SSD with endurance ratings of up to 80 DWPD (Drive Writes Per Day). It uses latest-generation 3D TLC NAND configured in pSLC mode and has a SATA 6Gbit/sec interface. The SSD achieves data rates of up to 560MB/sec for sequential reads and 490MB/sec for sequential writes, and exceeds 73,000 and 86,000 IOPS for reads and writes respectively.

The drive is meant for especially write-intensive applications in industrial PCs, point-of-sale systems, embedded and surveillance systems, and applications in the transportation, medical, networking, and communications sectors. It uses the M.2 2242 form factor and is immediately available with storage capacities ranging from 40 to 320GB. The X-78m2 is designed and specified for industrial use at operating temperatures ranging from -40 to 85°C, including stability against the “cross temperature effect.” The product line will also be available for a commercial temperature range of 0 to 70°C. 

An Amazon Kendra connector for FSx for Windows File Server enables secure and intelligent search of information scattered in unstructured content. The data is securely stored on file systems on FSx Windows File Server with ACLs and shared with users based on their Microsoft AD domain credentials. Users can now use the Amazon Kendra connector to index documents (HTML, PDF, MS Word, MS PowerPoint, and plain text) stored in their Windows file system on FSx for Windows File Server and search for information across this content using intelligent search in Amazon Kendra. Read more in this blog.
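A minimal sketch of what querying such an index looks like with boto3 follows; the index ID and query text are placeholders.

```python
# Illustrative sketch: query an Amazon Kendra index that has ingested documents
# from FSx for Windows File Server. The index ID and query text are placeholders.
import boto3

kendra = boto3.client("kendra", region_name="us-east-1")

response = kendra.query(
    IndexId="11111111-2222-3333-4444-555555555555",  # hypothetical index ID
    QueryText="quarterly revenue forecast",
)
for item in response["ResultItems"]:
    # Each result item carries a type (e.g. DOCUMENT, ANSWER) and document metadata
    print(item["Type"], item.get("DocumentTitle", {}).get("Text"))
```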

The Argus journal reports that Western Digital is partnering with Kioxia on the latter’s Fab 2 3D NAND fab construction project, thus extending the pair’s joint venture. The new fab will cost ¥1 trillion ($8.3 billion). Apparently Kioxia is contemplating an IPO again after major shareholder Toshiba requested one.

Storage news ticker – March 23


Storage powerhouse DDN is working on a reference architecture with Nvidia to have its A3I A1400X2 EXAScaler array, announced last November, integrated with Nvidia’s DGX A100 GPU server.

HPE has announced availability of HPE GreenLake for Microsoft Azure Stack HCI as an integrated system, pre-built and configured for faster deployment and easier integration. This is a cloud service that allows users to run Windows and Linux virtual machines in a hybrid cloud environment, on-premises and at the edge, and leverages Microsoft Azure tools and services with a pre-configured, pay-per-use service from HPE GreenLake.

Worldline, a European-based global payments provider, has selected the HPE GreenLake edge-to-cloud platform to meet the accelerated growth of online transactions. Through HPE GreenLake’s flexible as-a-service model and HPE Financial Services’ asset renewal program, funding approximately 25 per cent of the platform refresh, Worldline achieved this upgrade with no upfront investment.

HPE today announced that BMW Group, the manufacturer of automobiles and motorcycles, has selected the HPE GreenLake edge-to-cloud platform to streamline and unify data management across its global locations and the cloud. As part of the agreement, HPE will provide cloud services for big data, backup, recovery, and compliant archiving, enabling BMW Group to manage distributed data via a single platform with a globally consistent cloud experience.

GigaIO has started production on the CDI testbed for the Lonestar6 system at the University of Texas in Austin. Lonestar6 is a 600-node system utilizing AMD Milan-based servers from Dell Technologies and A100 GPUs from Nvidia, and is the first platform at the Texas Advanced Computing Center to incorporate GigaIO’s Composable Disaggregated Infrastructure (CDI) to benefit from decentralized server infrastructure. CDI pools compute and hardware accelerators over a software-defined, PCIe-based memory fabric, providing access to more processing power and storage when needed, and allowing for easy sharing of those resources, thereby increasing their utilization. 

ionir has hired two new executives: Tad Lebeck as SVP of product development, and Barak Azulay as VP of engineering. Prior to ionir, Lebeck founded and led Nuvoloso and served as chief technology officer at Huawei, Symantec Technologies and Legato Systems. 

Ceph storage software and hardware builder SoftIron has set up a wholly owned subsidiary and established an office in Singapore’s central business district. It will enable SoftIron to provide support for its regional customers and serve as an anchor point as the company explores plans to launch a dedicated supply chain tower facility to support its globally distributed manufacturing operations. SoftIron currently operates across the UK, US, Germany, the Czech Republic, Australia, and New Zealand.

WANdisco has announced a deal worth $1.5 million with a top ten global telco to use the company’s LiveData Migrator and LiveData Migrator for Azure to automate the migration of Hadoop data to AWS and Azure. This contract is structured as a Commit-to-Consume transaction. Currently the telco collects data from smart meters and loads it into an on-premises Hadoop cluster. WANdisco tech will be used to migrate 8PB of data from Hadoop to the cloud with the potential to extend this use case across architectures and geographies.

NetApp is increasing targets thanks to CloudOps biz


The Cloud Operations (CloudOps) business looks so good for NetApp that the company has increased its overall revenue targets.

This was revealed at a NetApp Investor Day conference on March 22. NetApp thinks its public cloud annual recurring revenue (ARR) will be $2 billion by its fiscal year 2026, up from $1 billion in 2025. NetApp should now reach its $1 billion public cloud target by 2024 – a year sooner than predicted.

Analysts Aaron Rakers and Jason Ader told their subscribers that NetApp’s total addressable market (TAM) has increased, up to $96 billion by 2025, and is growing at 8 per cent compound annual growth rate (CAGR). Back in September, NetApp calculated that it was looking at a $90 billion TAM by 2023, with a rough 7 per cent CAGR.

It has set itself new revenue targets as a result. Ader said NetApp’s hybrid cloud business TAM has a 4–6 per cent revenue CAGR with fiscal 2025 revenue targets of $6.6 to $7 billion. Public cloud revenues should be $1.3 billion to $1.4 billion with an expected 50 per cent public cloud revenue CAGR.

Add the two together and NetApp expects between $7.9 and $8.4 billion in revenue by fiscal 2025.

NetApp projected revenue

NetApp’s steady growth, which comes largely from selling into its installed base, has been boosted by its Cloud Operations (CloudOps) business.

NetApp is the only systems and storage company with a CloudOps portfolio geared toward helping customers move applications to the public cloud in an automated, manageable, and application-optimizing way. Public cloud usage can be incredibly complicated. There are, for example, 475 compute instance types in AWS, and customers have to cope with excessively complicated pricing across compute instances, storage instances, and Lambda functions.

This is a tall order to manage manually and there is a shortage of people skilled enough to do it. NetApp’s CloudOps tools are intended to help automate the process. The company is seeing a market in which virtually all businesses and organizations are moving some applications to the public cloud. It sees companies in general moving to become AI-driven, digital organizations, spanning hybrid and multi-cloud environments. In other words, its CloudOps portfolio has a horizontal, cross-market appeal.

NetApp said CloudOps creates an opportunity to find new buyers and expand NetApp’s overall customer base, enabling land-and-expand sales strategies, as well as cross-selling opportunities.

Its CloudOps portfolio has not been replicated by Dell, HPE, IBM, Pure or any other suppliers, and gives NetApp an opportunity to gain new customers and then cross-sell its other products and services to them.

Ader summarised NetApp’s CloudOps portfolio:

  • Cloud Insights: Monitoring and observability, noted to be deployed at ~750 of the top 1,000 companies;
  • Spot: Application cost optimization, public cloud integration. NetApp highlighted this as a key cornerstone of NetApp’s CloudOps portfolio;
  • StratCloud: Reservation optimization;
  • Cloud Jumper: Virtual desktop optimization;
  • Cloudhawk.io: Cloud configuration security management;
  • Data Mechanics: Apache Spark optimization;
  • CloudCheckr: Cloud cost and security management, now part of Spot to enable greater visibility. The company highlighted CloudCheckr alignment with MSPs and systems integrators (Accenture, Tech Data, Deloitte, Navisite, WWT, Eplexity, Ingram, and more);
  • Fylamynt: Most recent acquisition focused on automation and integration.

NetApp has also moved its core storage services to the main public clouds under its Data Fabric rubric, claiming it is the only storage vendor with fully managed, first-party services on AWS, Azure, and Google Cloud, with an integration in IBM Cloud as well. It can also sell its CloudOps services to its existing customers when they move applications to the cloud. Only 12 per cent of its installed base use its public cloud services currently, giving it a lot of growth headroom.

Inside NetApp’s base, all-flash array (AFA) penetration is 35 per cent and this should grow to somewhere between 40 and 45 per cent by fiscal 2025.

Rakers provided a chart comparing NetApp’s AFA revenues with Pure Storage’s:

It shows Pure Storage’s (flash-based) revenues catching up with NetApp’s AFA revenues, with Pure bringing in 88.4 per cent of NetApp’s total in the latest quarter. NetApp’s revenue in its latest quarter was $1.6 billion compared to Pure’s $709 million. It will be interesting to see if NetApp can widen the gap with Pure using its public cloud and CloudOps portfolio.

Kioxia announces PCIe 5 SSD and new fab plan


Kioxia has announced a faster datacenter SSD using PCIe 5 and revealed plans for a new fab so it can increase drive production capacity.

It is sampling the CD8 – an updated and faster version of the CD6, using denser 112-layer NAND and replacing the PCIe 4 interconnect with PCIe 5. PCIe 5 operates at 4GB/sec per lane, double PCIe 4’s 2GB/sec per lane, which, again, is double PCIe 3’s 1GB/sec per lane.

Greg Wong, founder and principal analyst at Forward Insights, commented in a statement for Kioxia: “Next-generation PCIe 5.0 SSDs provide twice the level of performance, and will continue to propel the PCIe/NVMe SSD market – which is expected to grow at a CAGR of over 20 per cent out to 2026.”

The CD series single-port datacenter drives use Kioxia flash, firmware, and controllers. The CD6 was announced in February 2020 and its PCIe 4 interface provided a big jump in speed over the prior PCIe 3 CD5 product. The drives come in read-intensive and mixed-use versions, with similar performance but different capacities and endurance. The read-intensive capacity points are 960GB and 1.92, 3.84, 7.68, and 15.36TB with a 1 drive write per day (DWPD) endurance rating. The mixed-use drives have a 3 DWPD rating and lower capacities due to over-provisioning: 800GB and 1.6, 3.2, 6.4, and 12.8TB.

A potted history of the CD series looks like this:

  • CD5: PCIe 3 – 64-layer TLC, 2.5-inch format
  • CD6: PCIe 4 – 96-layer TLC, 2.5-inch
  • CD7: PCIe 5 – 96-layer TLC, E3.S ruler
  • CD8: PCIe 5 – 112-layer TLC, 2.5-inch

The CD8 is a successor to the CD6, not the CD7, as the CD7 is a CD6-type (96-layer) drive in the E3.S ruler format. A table shows how the maximum IOPS and bandwidth have risen with each generation:

Kioxia CD series

The CD8 has 25 per cent more maximum random read IOPS than the CD6, and more than twice as many maximum random write IOPS in the read-intensive variant, but 20 per cent fewer in the mixed-use model. That suggests Kioxia has found it harder to get the 112-layer NAND to do random writes than with the 96-layer flash. The sequential performance improvement is more consistent. Overall, Kioxia says there is an up to 135 per cent performance increase over the CD6.

The CD8 is also generally faster than the CD7, apart from random write IOPS. Both use the PCIe 5 interface and the CD8’s controller does a better job in terms of speed than the CD7.

The CD8 is optimized for hyperscale datacenter and enterprise server-attached workloads such as high-performance computing, artificial intelligence, caching layer, content streaming, and financial trading and analysis. The drive supports the Open Compute Project (OCP) Datacenter NVMe SSD 2.0 standard.

The CD8 series is now available for customer evaluation.

Samsung also has an enterprise PCIe 5 SSD – the PM1743 – and this is a much faster dual-port device, with up to 2.5 million random read IOPS and 13GB/sec sequential read speed.

Fab expansion

The NAND market faces steady growth as demand for chips and SSDs increases in enterprises, hyperscaler customers, and the automotive market. In light of that, Kioxia is increasing its 3D NAND fab capacity at its Kitakami Plant in Iwate Prefecture, Japan. Construction of a second fab there will start next month and complete some time in 2023. Kioxia will fund the Fab 2 capital investment from its operating cash flow.

Fab 2 will have an earthquake-resistant design and use energy-saving manufacturing equipment and renewable energy sources, Kioxia said.

Kioxia said it will use AI-based manufacturing to increase the production capacity of the entire Kitakami Plant, and it also plans to hold discussions with Western Digital about expanding their flash joint venture to include this Fab 2 investment.

Apache Kafka

Apache Kafka – open source distributed event streaming software built for large-scale, real-time data feeds used in processing, messaging, and analytics. Data in Kafka is organized into topics. These are categories or channels, such as website clicks or sales orders. A source “producer” writes events to topics, and target “consumers” read from them.
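A minimal sketch of that producer/consumer model using the kafka-python client follows; the broker address and topic name are illustrative.

```python
# Illustrative sketch using the kafka-python client: a producer writes events to
# a topic and a consumer reads them back. Broker address and topic are placeholders.
import json
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("website-clicks", {"user": "u123", "page": "/pricing"})
producer.flush()

consumer = KafkaConsumer(
    "website-clicks",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    consumer_timeout_ms=5000,   # stop iterating when no new events arrive
)
for event in consumer:
    print(event.value)
```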

Apache Iceberg

Apache Iceberg – an open source table format for large-scale datasets in data lakes, sitting above file formats like Parquet, ORC, and Avro, and cloud object stores such as AWS S3, Azure Blob, and Google Cloud Storage. It brings database-like features to data lakes, such as ACID support, partitioning, time travel, and schema evolution. Iceberg organizes data into three layers:

  • Data Files: The actual data, stored in formats like Parquet or ORC
  • Metadata Files: Track which data files belong to a table, their schema, and partition info
  • Snapshot Metadata: Logs every change (commit) as a snapshot, enabling time travel and rollbacks

Iceberg format tables are used in big data and enable SQL querying. Query engines such as Spark, Trino, Flink, Presto, Hive, Impala, StarRocks, and others can work on the tables simultaneously. The tables are managed through metadata tracking and snapshotting of changes.
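To make the layering concrete, here is a hedged PySpark sketch using Iceberg’s Spark runtime; the catalog name, warehouse path, and table are assumptions, and the iceberg-spark-runtime package must be on the classpath.

```python
# Illustrative sketch: create and query an Iceberg table from PySpark. Assumes the
# iceberg-spark-runtime package is on the classpath; the catalog name, warehouse
# path, and table name are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.demo.type", "hadoop")
    .config("spark.sql.catalog.demo.warehouse", "/tmp/iceberg-warehouse")
    .getOrCreate()
)

spark.sql("CREATE TABLE IF NOT EXISTS demo.db.orders (id BIGINT, amount DOUBLE) USING iceberg")
spark.sql("INSERT INTO demo.db.orders VALUES (1, 9.99), (2, 24.50)")

# Every commit creates a snapshot; the snapshots metadata table is what makes
# time travel and rollback possible.
spark.sql("SELECT snapshot_id, committed_at FROM demo.db.orders.snapshots").show()
```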

Apache Spark

Apache Spark – an open-source, distributed computing framework intended for fast, large-scale data processing and analytics. It’s used for big data workloads involving batch processing, real-time streaming, machine learning, and SQL queries.
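A minimal PySpark sketch showing batch processing plus a SQL query follows; the input path and column names are illustrative.

```python
# Illustrative PySpark sketch: batch-process a CSV file and run a SQL query over it.
# The input path and column names are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("example").getOrCreate()

df = spark.read.option("header", True).csv("/data/sales.csv")
df.createOrReplaceTempView("sales")

spark.sql("""
    SELECT region, COUNT(*) AS orders
    FROM sales
    GROUP BY region
    ORDER BY orders DESC
""").show()
```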

Nyriad releases SAN arrays with CPU+GPU controllers


Newcomer Nyriad has launched two SAN array systems with combined CPU+GPU controllers to provide fast recovery from drive failures and a maximum sustained IO rate of 18GB/sec for writes and reads.

The company uses 18TB disk drives with erasure coding and the IO workload is spread across all the drives to achieve the high IO rate. Nyriad says the arrays are good for data-intensive use cases such as HPC, media and entertainment, and backup and archive. Users will get consistent high performance without the long rebuild times and associated performance degradation that are common with today’s RAID-based storage systems, it says.

Nyriad CEO Derek Dicker said: “RAID has been the de facto standard for storage for more than three decades, but the demands of today’s data-intensive applications now far exceed its capabilities. With the UltraIO system, Nyriad built a new foundation for storage that addresses current problems and creates new opportunities for the future – without requiring organizations to make disruptive changes to their storage infrastructure.”

There are two H series systems, both with dual active-passive controllers, the H1000 and H2000, with 99.999 per cent data availability. They both use dual Intel Xeon CPUs in each of their two controllers (head nodes) and Nyriad plans to move to a single-processor design in both head nodes. On the GPU front they use dual Nvidia RTX A6000 GPUs in each of their two head nodes, again with a plan to move to a single GPU in each node.

The dual controllers feature dual HBAs, which are attached to drive enclosures via SAS4 cabling. The H1000 supports up to 102 disk drives and the H2000 supports up to 204 disk drives. The detailed specs are below:

Nyriad specs

The H Series systems are classified as block storage systems and customers can non-disruptively connect UltraIO as a POSIX-compliant block target into their existing infrastructure. File and object protocol support will be added in the future – the table above mentions SMB support with H1000-EXT and H2000-EXT models.

Customers can specify a number of parity drives, up to 20. An example was given of an H2000 system with 200 18TB drives configured with 20 parity drives. It can continuously write at 18GB/sec and also continuously read at 18GB/sec. If 20 of the 200 drives were to fail concurrently, it would continue to operate with just 5 per cent performance degradation.
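As a back-of-envelope check on that example configuration (our arithmetic, not Nyriad’s published figures):

```python
# Back-of-envelope arithmetic for the example configuration above (ours, not
# Nyriad's published figures): 200 x 18TB drives with 20 parity drives.
drives, parity, drive_tb = 200, 20, 18

raw_tb = drives * drive_tb                 # 3,600 TB raw capacity
usable_tb = (drives - parity) * drive_tb   # 3,240 TB usable capacity
overhead = parity / drives                 # 10% of capacity spent on parity

print(f"raw {raw_tb} TB, usable {usable_tb} TB, parity overhead {overhead:.0%}")
# Up to 20 concurrent drive failures can be tolerated before data loss.
```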

When a drive fails, the system’s performance doesn’t decline noticeably and the failed drive doesn’t need urgent replacing, according to the company. Nyriad says the UltraIO OS utilizes all the drives for all the work all the time with intelligent data block placement mechanisms and use of a large persistent cache. It can utilize the entire drive array to provide write operations concurrently, placing data to optimize both writing and subsequent concurrent reading performance. It also does write deduplication when high-frequency block updates take place.

In general, though, the UltraIO systems do not do data reduction. A Nyriad spokesperson told us: “Our initial target market segments are HPC, media and entertainment, and backup, restore, and archive. These segments seem to value absolute performance first. In addition, their data tends not to benefit from data reduction techniques. If we discover that there is a demand for these techniques, the computational power we have will allow us to address it.” 

The OS stores an erasure coding integrity hash with every block written. It then integrity checks each block as it is read and ensures it is correct. If not, it recreates it. We think that GPU processing power is used to do the hashing and integrity checks, which would overburden a dual-x86 controller design.
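A simplified sketch of the store-hash-on-write, verify-on-read idea follows; this is a generic illustration, not Nyriad’s actual erasure-code-aware, GPU-accelerated implementation.

```python
# Simplified sketch of write-time hashing and read-time verification. This is a
# generic illustration, not Nyriad's actual (erasure-code-aware, GPU-accelerated)
# implementation.
import hashlib

block_store = {}   # block_id -> (data, integrity hash)

def write_block(block_id: int, data: bytes) -> None:
    block_store[block_id] = (data, hashlib.sha256(data).hexdigest())

def read_block(block_id: int) -> bytes:
    data, stored_hash = block_store[block_id]
    if hashlib.sha256(data).hexdigest() != stored_hash:
        # In a real system the block would be rebuilt from erasure-coded peers.
        raise IOError(f"block {block_id} failed integrity check; rebuild required")
    return data

write_block(1, b"example payload")
assert read_block(1) == b"example payload"
```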

H Series systems can scale up within configuration limits and then scale out by adding more H Series arrays. Nyriad said the UltraIO system is designed to support multiple media types, including SSDs. It will make decisions about which media types it will support, and when, based on market demands.

There is an administration GUI, a RESTful API, and a Secure Shell interface for operating storage and other services securely over a network.

Check out the H Series datasheet here. The system is sold exclusively through Nyriad’s channel and you can contact Nyriad to find out more.

StorONE

Nyriad competitor StorONE does fast RAID rebuilds. Its vRAID protection feature uses erasure coding and has data and parity metadata striped across drives. Its S1 software uses multiple disks for reading and writing data in parallel, and writes sequentially to the target drives. In a 48-drive system, if a drive fails, vRAID reads data from the surviving 47 drives simultaneously, calculating parity and then writing simultaneously to those remaining 47 drives. A failed 14TB disk was rebuilt in 1 hour and 45 minutes, and failed SSDs are rebuilt in single-digit minutes.