
HPE Alletra storage sales a bright spot

HPE storage contributed to strong first fiscal 2023 quarter results, with all business units growing and collectively driving revenues higher, leading the company to raise its outlook.

Revenues in the quarter ended January 31 were $7.8 billion, a 12 percent increase, beating guidance, with a $501 million profit, slightly less than last year’s $513 million. Compute revenues provided the bulk at $3.5 billion, up 14 percent, with storage bringing in $1.2 billion, 5 percent higher, and the Intelligent Edge (Aruba networking) contributing $1.13 billion, up 25 percent. HPC and AI provided $1.1 billion, a 34 percent rise, helped by a large contract associated with Frontier, the world’s first exascale system.

Antonio Neri, HPE

President and CEO Antonio Neri said: “HPE delivered exceptional results in Q1, posting our highest first quarter revenue since 2016 and best-ever non-GAAP operating profit margin. Powered by our market-leading hybrid cloud platform HPE GreenLake, we unlocked an impressive run rate of $1 billion in annualized revenue for the first time. These results, combined with a winning strategy and proven execution, position us well for FY23, and give us confidence to raise our financial outlook for the full year.”  

HPE revenues

EVP and CFO Tarek Robbiati said: “In Q1 we continued to out-execute our competition, despite uneven market demand, and produced more revenues in every one of our key segments, with our Edge business Aruba being a standout.” 

HPE storage revenue

Financial summary

  • Annual Recurring Revenue: >$1 billion, up 31 percent year-on-year
  • Diluted EPS: $0.38, down $0.01 from last year
  • Operating cash flow: -$800 million
  • Free cash flow: -$1.3 billion, reflecting seasonal working capital needs

Robbiati said in the earnings call: “We expect to generate significant free cash flow in the remainder of fiscal year ’23 and reiterate our guidance of $1.9 billion to $2.1 billion in free cash flow for the full year.”

HPE’s Alletra storage array revenue grew by triple digits from the prior year, helped by stabilizing supply. That means at least 100 percent growth: revenue at least doubled. HPE is altering its storage business to focus on higher margin, software-intensive as-a-service revenue. It is continuing to invest in R&D for its own IP.

Neri teased earnings call listeners by saying: “Alletra … is the fastest [growing] product in the history of the company. It has grown triple digits on a consistent basis and you will see more announcements about this platform going forward …  It was conceived to be a SaaS-led offer. And that’s why it’s fueling also the recurring revenue as we go forward.

“Our order book at the start of Q1 was larger than it was a year ago. And as we exit that quarter, it is more than twice the size of normalized historical levels.”

The macroeconomic background is having an effect: “Demand for our solutions continues though it is uneven across our portfolio. We also see more elongated sales cycles, specifically in Compute, than we have seen in recent quarters.”

Robbiati said: “Deal velocity for Compute has slowed as customers digest the investments of the past two years though demand for our Storage and HPC & AI solutions is holding and demand for our Edge solutions remains healthy.”

AI and megatrends

The AI area with Large Language Model (LLM) technology is seen by HPE as an inflexion point it can use to its advantage. Neri said HPE is “assessing what is the type of business model we can deploy as a part of our as-a-service model by offering what I call a cloud supercomputing IS layer with a platform as-a-service that ultimately developers can develop, train, and deploy these large models at scale.

“We will talk more about that in the subsequent quarters, but we are very well-positioned and we have a very large pipeline of customers.”

An analyst asked why HPE was doing well, mentioning “the contrast in your outlook on storage and compute versus some of your peers.”

Neri said this was due to HPE having a diversified portfolio unlike competitors who “don’t have the breadth and depth of our portfolio. Some of them are just playing compute and storage. Some of them play just in storage. Some of them only play in the networking … We have a unique portfolio, which is incredibly relevant in the megatrends we see in the market.”

The megatrends are around edge, cloud, and AI. They are reshaping the IT industry and HPE offers its portfolio through the GreenLake subscription business: “GreenLake is a winning strategy for us because it’s very hard to do.”

Also, unlike Dell, HPE now has no exposure to the downturned PC market.

The company’s board declared a regular cash dividend of $0.12 per share on its common stock, payable on April 14.

The outlook for the next quarter is for revenue to be in the range of $7.1 billion to $7.5 billion. At the $7.3 billion midpoint that is an 8.7 percent increase on the year-ago Q2. Robbiati commented: “While many tech companies are playing defense with layoffs, we see fiscal year ’23 as an opportunity to accelerate the execution of our strategy.”

Dell storage closes FY 2023 with record revenue

The fourth quarter is always a belter for Dell and its latest results show a record $5 billion in storage revenues.

Total revenues in the quarter ended February 3 were $25.04 billion, 11 per cent down annually due to lower PC sales, but beating Wall Street estimates and Dell’s own guidance. There was a profit of $606 million, an increase on the year-ago $1 million, which was far lower than usual due to VMware dividend and debt repayment issues. Full fiscal 2023 revenues were $102.3 billion, up just 1 percent, with a profit of $2.422 billion, down 56 percent.

Storage is in Dell’s ISG business unit, which recorded Q4 revenues 7 percent higher than last year at $9.9 billion, its eighth consecutive growth quarter. The PC-dominated CSG unit reported $13.4 billion in revenues, a 26 percent drop on a year ago, as the PC market slowed in June and then fell precipitously in the fourth quarter. Storage revenues, which rose 10 percent annually, pipped the $4.9 billion of server revenues in ISG, 5 percent up year-on-year.

Dell revenues

Co-COO Chuck Whitten said in prepared remarks: “We are pleased with our FY23 execution and financial results given the macroeconomic backdrop. FY23 was ultimately a tale of two halves with 12 percent growth in the first half and revenue down 9 percent in the second half as the demand environment weakened over the course of the year.” 

He added: “We delivered record FY23 revenue of $102.3B, up 1 percent on the back of 17 percent growth in FY22 … ISG in particular had a strong year with record revenue of $38.4B, including record revenue in both servers and networking and storage, and record operating income of over $5 billion … We expect to gain over a point of share in mainstream server and storage revenue when the IDC calendar results come out later this month. ”

Dell storage revenues
Storage revenues in Dell’s Q4 are consistently higher than in the other quarters

Q4 financial summary

  • Gross margin: 23 percent
  • Operating cash flow: $2.7 billion
  • Diluted EPS: $1.80
  • Remaining performance obligations: $40 billion
  • Recurring revenue: c$5.6 billion, up 12 percent year-over-year
  • Cash and investments: $10.2 billion

The company more than doubled the number of active APEX subscribers over the course of the year. 

Dell’s storage results, together with those of Pure Storage, up 14 percent, contrast markedly with NetApp, which has just reported a 5 percent decline in revenues.

Whitten said that in storage, Dell has “gained four points of share in the key midrange portion of the market over the last five years … [There was] demand growth in PowerFlex, VxRail, Data Protection and PowerStore. We are pleased with our momentum in storage – the investments we’ve made over the years strengthening our portfolio are paying off and have allowed us to drive growth and share gain in what was a resilient storage market in 2022.”  

The ISG performance looks impressive considering that Dell reacted to the PC market downturn in Q4 with cost controls, an external hiring pause, travel restrictions, lower external spend and layoffs.

Outlook

Whitten said: “Though Q4 was a very good storage demand quarter, we saw lengthening sales cycles and more cautious storage spending with strength in very large customers offset by declines in medium and small business. Given that backdrop, we expect at least the early part of FY24 to remain challenging.” In other words, PC sales aren’t going to rebound soon and businesses are cautious about buying IT gear.

CFO Tom Sweet added: “We expect Q1 revenue to be seasonally lower than average, down sequentially between 17 percent and 21 percent, 19 percent at the mid-point.” That would be $20.3 billion, a 27.5 percent drop on fiscal 2022’s Q1. He expects growth from that low point throughout the rest of FY24.

Full FY24 revenue is being guided down between 12 and 18 percent, 15 percent at the midpoint, implying $86.96 billion.

Reuters reports this tepid outlook sent Dell shares down 3 percent in trading after the results statement was issued.

CFO Tom Sweet is retiring at the end of Q2 fiscal 2024. Dell says Yvonne McGill, currently corporate controller, will be its new CFO effective the start of Q3 fiscal 2024.

Kubernetes storage provider Ondat acquired by Akamai

Content delivery player Akamai has agreed to buy Kubernetes storage startup Ondat.

Akamai Technologies has a massively distributed Connected Cloud offering for cloud computing, security, and content delivery. It provides services for developers to build, run, and secure high performance workloads close to where its business users connect online. Akamai is adding core and distributed sites on top of the underlying backbone that powers its existing edge network. This spans more than 4,100 locations across 134 countries. The network megalith aims to place compute, storage, database, and other services close to large population, industry, and IT centers.

Adam Karon, Akamai’s COO and cloud technology group GM, said: “Storage is a key component of cloud computing and Ondat’s technology will enhance Akamai’s storage capabilities, allowing us to offer a fundamentally different approach to cloud that integrates core and distributed computing sites with a massively scaled edge network.”

Ondat’s employees, including founder and CTO Alex Chircop, will join Akamai’s cloud computing business. No acquisition price information was provided.

Ondat recently partnered with CloudCasa, which provides containerized application backup. Sathya Sankaran, founder and GM of CloudCasa, said: “The acquisition of Ondat by Akamai is another indication that Kubernetes is entering the mainstream for enterprises deploying stateful business applications on Kubernetes environments in public clouds.”

In his view: “Ondat fills the distributed storage management gap in the Linode Kubernetes Environment (LKE) for the Akamai Connected Cloud. The best-of-breed CloudCasa and Ondat offering provides Akamai customers with a unified solution to run their stateful applications on Kubernetes without worrying about availability, performance, protection, or data management and recovery. Akamai Connected Cloud customers will now be able to migrate their Kubernetes applications and data from on-premises environments and alternative public clouds to LKE.”

Alex Chircop.

Ondat was founded as StorageOS in 2015 by original CEO Chris Brandon in New York, along with CTO and one-time CEO Alex Chircop and VP Engineering Simon Croome in the UK. It raised $2 million seed funding and then went through an $8 million A-round in 2018 and a $10 million B-round in 2021 – $20 million in total. Brandon resigned to join Amazon in 2019. 

The Ondat name was adopted in October 2021. Croome left in November 2021 and later  joined the Microsoft Azure storage team. Richard Olver, who joined Ondat as COO in May 2021, became the CEO in September last year, after Ondat went through a layoff exercise in July, under tough business conditions.

Robin.io, another Kubernetes storage startup, was acquired by Japan’s Rakuten Symphony in March 2022. Pure Storage acquired Portworx, a third Kubernetes storage startup, in September 2020.

IBM reboots storage portfolio

IBM reckons its customers face three pressing data challenges and is realigning its storage product portfolio to measure up to them.

Update. SVC and Storage Virtualize note added at end of story. 5 March 2023.

The three challenges are adoption of AI/ML and HPC workloads; hybrid cloud app transformation and data movement from edge-to-core-to-cloud; and ensuring data resiliency in the face of malware. The Spectrum brand prefix is being dropped as part of this, following an IBM Storage process involving market research, talking to industry heads, and working with customers.

Denis Kennelly, IBM

Ahead of publication, IBM gave us sight of a blog written by Denis Kennelly, GM of IBM’s storage division, in which he talks about the effect of this exercise: “The result was to simplify our offerings by reducing the number of products we promote to a more manageable number with a bias on solution-led conversations that is software-defined, open and consumable as a service. Also, it was important to make it plainly obvious that IBM is very much in the storage business – leading to a decision to drop ‘Spectrum’ in favor of what we do incredibly well: ‘Storage’.”

His blog includes a diagram depicting this realignment: 

The AI/ML/HPC (Data and AI) market will be presented with IBM Storage Scale (Spectrum Scale as was) and also Storage Ceph, the software that came with IBM’s Red Hat acquisition. Kennelly said: “We are investing in industry-leading open source based Ceph as the foundation for our software defined storage platform.” Ceph provides block, file and object access protocol support and will be used for general-purpose workloads in this market.

The storage hardware product associated with these two software products is the Storage Scale System, previously called the Elastic Storage System or ESS line.

IBM storage rejig

The hybrid cloud market will have data-orchestrating Storage Fusion software focused on Red Hat OpenShift environments. This enables customers to discover data across the organization, and store, protect, manage, govern, and mobilize it. The software was previously called Spectrum Fusion and combined Spectrum Scale functionality with Spectrum Protect Plus data protection software.

The Storage Fusion HCI System is purpose-built hyperconverged infrastructure for OpenShift with container-native data orchestration and storage services.

Storage Defender

Storage for data resilience relies on new Storage Defender software, a combination of IBM Storage Protect (Spectrum Protect as was), FlashSystem, Storage Fusion and Cohesity’s DataProtect product. This will run with IBM Storage’s DS8000 arrays, tape and networking products.

Storage Defender is designed to use AI and event monitoring across multiple storage platforms through a single pane of glass to help protect organizations’ data layers from risks like ransomware, human error, and sabotage. It has SaaS-based cyber vault and clean room features with automated recovery functions to help companies restore their most recent clean copy of data in, so IBM tells us, hours or minutes, down from what used to take days.

The FlashSystem product contributes its Safeguarded Copy for logical air gap facilities.

IBM says Storage Defender is its first offering to bring together multiple IBM and third-party products, unifying primary, secondary replication, and backup management. Cohesity provides, IBM says, world-class virtual machine protection managed in the hybrid cloud through a cloud-based control plane supporting a multi-vendor strategy optimized for data recovery.

Storage Defender will, according to IBM’s sales pitch, allow companies to take advantage of their existing IBM investments while significantly simplifying operations and reducing operating expenses. IBM says this is the first of such ecosystem integrations.

Kennelly writes: “When we looked at the overall market, we were impressed by the Cohesity platform and team’s differentiated focus on scalability, simplicity, and security. By integrating our leading software-defined technologies, I am excited to bring essential cyber resiliency capabilities to IBM clients. Cyberattacks are on the rise, but data can be protected and restored when you are prepared.”

IBM plans to make the Storage Defender offering available in the second quarter of calendar 2023 beginning with Storage Protect and Cohesity DataProtect. Storage Defender will be sold and supported by IBM as well as through authorized IBM Business Partners.

Comment

With this resell deal, Cohesity now has the advantage of IBM’s sales force and channel working in its favor.

The IBM Storage rebranding and realignment exercise apparently means some Spectrum-branded products have been left behind. Spectrum Connect, Spectrum Discover, Spectrum Virtualize (the old SAN Volume Controller or SVC) and Spectrum Virtualize for Public Cloud are examples. But not so. IBMer Barry Whyte, Principal Storage Technical Specialist and IBM Master Inventor, tells us: “SVC stays SVC, and Virtualize becomes Storage Virtualize but with more of a focus on the FlashSystem branding itself. SVC turns 20 this year and is still going strong at some major accounts.”

Pure Storage growth fails to impress Wall Street

Pure Storage grew Q4 revenues by 14 percent year-on-year, reported its biggest ever profit, and overtook NetApp for all-flash array run rates. Yet Wall Street analysts expected higher revenues and a stronger outlook, so the stock price sank more than 10 percent in trading after the results were posted.

Revenues in the quarter ended February 5 were $810.2 million and net profit was $74.5 million, up from $15 million a year ago. Subscription annual recurring revenue (ARR) of $1.1 billion was 30 percent higher.

Full year revenues totaled $2.75 billion, a 26 percent jump, with a profit of $73 million, Pure’s first ever recorded annual profit.

CEO Charlie Giancarlo said in his prepared remarks: “We were pleased with our Q4 year-over-year revenue growth of 14 percent and were very pleased with our annual revenue growth of 26 percent, and annual subscription ARR growth of 30 percent – especially considering the challenges of the steadily increasing global economic slowdown.”  

Pure Storage revenues
Note the great profits jump in Pure’s latest quarter

Pure’s revenues grew faster than the market and it also outpaced NetApp in all-flash array revenue run rate, with a 14 percent year-on-year increase to $3.24 billion versus NetApp’s $2.8 billion, down 12 percent annually:

NetApp vs Pure all-flash run rates

The Pure AFA run rate number includes service revenues, as does NetApp’s. Strict product revenues were $545 million, a $2.2 billion annual run rate.

Pure gained more than 490 new customers in the quarter, taking its total past 11,000. International market revenues grew 39 percent to $258 million while the US could only manage 6 percent growth to $552 million, reflecting customer caution. There was no new business from Meta, which is about halfway through a 1EB Pure system deployment.

Q4 financial summary

  • Gross margin: 69.3 percent
  • Operating cash flow: $233 million; up 85 percent
  • Free cash flow: $172.8 million
  • Total cash, equivalents and marketable securities: $1.6 billion
  • Headcount: 5,100 

The general economic situation is affecting Pure and Giancarlo said: “We are … well aware of the challenges of the current economic environment and the strains that it places on our customers.” 

Pure – like others – is seeing longer sales cycles and customers, particularly enterprise customers, being cautious about large purchases. These, described as near-term headwinds, have affected its growth outlook for the year, which also served to disappoint Wall Street.

To counter this, Giancarlo said: “We have already taken actions to reduce spending across the company and have reduced our spending and budgetary growth plans for FY 24 until we see improvements in the environment.” 

Pure is emphasizing electricity and operational savings in its sales messages and suggesting customers could save cost by moving from disk or hybrid arrays to Pure’s flash gear. Giancarlo referred to this: “We’re … changing the way our sales teams go about working with the customer on evaluating our products [with] a much greater focus on near-term operational costs as a justification for making the choice to proceed forward with a project versus maybe other projects that they have in their consideration.

“Pure’s Flash-optimized systems generally use between two and five times less power than competitive SSD-based systems, and between five to ten times less power than the hard disk systems we replace. Simple math then shows that replacing that 80 percent of hard disk storage in data centers with Pure’s flash-based storage can reduce total data center power utilization by approximately 20 percent. ” 
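As a sanity check on that “simple math”, the claim only holds if storage accounts for a sizeable slice of total data center power. The back-of-the-envelope sketch below is ours, not Pure’s; the 30 percent storage share of data center power is an assumption we have added, and the one-fifth power ratio is the favorable end of the quoted range.

```python
# Back-of-the-envelope check of the "simple math" behind the ~20 percent claim.
# Assumptions (ours, not from the article): storage draws ~30 percent of total
# data center power, and Pure flash uses ~1/5 the power of the disk it replaces.

storage_share_of_dc_power = 0.30   # assumed fraction of data center power used by storage
fraction_of_disk_replaced = 0.80   # "that 80 percent of hard disk storage"
flash_vs_disk_power_ratio = 1 / 5  # low end of the quoted "five to ten times less power"

saving = storage_share_of_dc_power * fraction_of_disk_replaced * (1 - flash_vs_disk_power_ratio)
print(f"Estimated cut in total data center power: {saving:.0%}")  # ~19%
```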

Step forward the new FlashBlade//E as Pure’s disk array-killing standard bearer, released this week. Giancarlo said: “While our development of FlashBlade//E was not done in anticipation of a recession, it couldn’t come at a better time. Its operating costs [are] well below the operating cost of a hard disk environment that it will be replacing … FlashBlade//E, I think, is going to be a barn burner.”

Outlook

CFO Kevan Krysler said: “We expect that Q1 revenue this year will be flat at $560 million when compared to Q1 of last year.” This bleak outlook surprised analysts and investors who had been expecting $681.4 million, according to Wells Fargo’s Aaron Rakers.

Giancarlo answered a question about this in the earnings call, saying there “was a slowing down of progression of the pipeline of the staged opportunities, meaning the progression that we had typically seen in earlier quarters of movement from early stage to later stage… That has slowed down since the beginning of the year. And we have to assume that, that will be true for at least a couple of quarters going forward. And so that’s changed the outlook … for the year as we go forward.”

Krysler expects things to improve later in the year and full FY 24 guidance is for mid to high single digit year-on-year percentage growth, with no Meta sales included. Wall Street analysts had expected 13-14 percent growth, hence the 10 percent stock price drop.

Pure claims 300TB flash drives coming 2026

Pure Storage says it will build a 300TB flash drive by 2026. The company manufactures its own SSDs, called Direct Flash Modules (DFMs), which are basically a collection of NAND chips with Pure’s FlashArray and FlashBlade operating system, Purity, providing system-wide flash controller functions. The FlashBlade//S and //E systems use either 24 or 48TB DFMs.

Pure CTO Alex McMullan briefed Blocks & Files and presented a chart showing Pure’s DFM capacity expansion roadmap out to a 300TB module:

Pure flash vs disk density

He said: “The plan for us over the next couple of years is to take our hard drive competitive posture into a whole new space. Today we’re shipping 24 and 48TB drives. You can expect … a number of announcements from us at our Accelerate conference around larger and larger drive sizes with our stated ambition here to have 300TB drive capabilities, by or before 2026.”

This far outstrips disk drive capacity roadmaps. For example, Toshiba sees its MAS-MAMR and HAMR technology taking it to 40TB in 2026:

Toshiba hard drive roadmap

Seagate has said its HAMR tech should enable a 50TB HAMR drive in its fiscal 2025 and a 100TB drive “at least by 2030.” Pure Storage will have a 5 to 6x capacity per drive advantage by then if its engineers can deliver the DFM goods.

McMullan also showed a chart depicting better and steadily declining TCO vs HDD systems between now and 2026:

Pure annualized TCO

A FlashBlade chassis can hold 10 blades. Each blade can be fitted with up to four DFMs. A 10-blade x 4 x 300TB DFM FlashBlade//E chassis would have 12PB of raw capacity, compared to today’s FlashBlade//E’s 1.92PB. McMullan said: “For customers … this opens up a whole new suite of capabilities. So we admire the persistence of hard drive vendors, but I don’t realistically think that they have a plan or a strategic goal [that matches this].”
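The chassis arithmetic is simple enough to restate as a short calculation. The sketch below just re-derives the 1.92PB and 12PB figures quoted above from the 10-blade, four-DFMs-per-blade chassis configuration described in this article.

```python
# Raw capacity arithmetic from the figures above: 10 blades per FlashBlade
# chassis, up to four DFMs per blade.

def chassis_raw_capacity_tb(dfm_tb, blades=10, dfms_per_blade=4):
    """Raw capacity of one fully populated FlashBlade chassis, in TB."""
    return blades * dfms_per_blade * dfm_tb

print(chassis_raw_capacity_tb(48))   # 1920 TB = 1.92PB with today's 48TB DFMs
print(chassis_raw_capacity_tb(300))  # 12000 TB = 12PB with a future 300TB DFM
```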

McMullan said there could be intermediate drive sizes in between today’s 48TB and 2026’s 300TB DFMs.

How will Pure get to a 300TB DFM? McMullan again: “All the chip fabs are shipping us somewhere between 112 and 160 layers. All the fab vendors have a plan and a path to get to 400-500 layers over the next five years. And the whole point of that will help us of course, on its own.”

Comment

That’s well and good but five years takes us to 2028, two years after McMullan’s 2026 and 300TB DFMs. That means 3D NAND layer count increases won’t get Pure to a 300TB DFM by 2026 on their own. 

A 300-layer 3D NAND chip, double today’s 150 or so layers, might make a 100TB DFM possible but in our thinking there needs to be some other capacity booster, such as increasing the physical size of the DFMs so they can hold more chips.

You could fit physically larger DFMs on a FlashBlade if the compute and DRAM components were removed, as is the case with FlashBlade//S storage-only EX blades:

Pure FlashBlade//S

A longer DFM with 2.5x more capacity would be 120TB using today’s QLC 150-160 layer chips. Double that to a 300-plus layer chip and we’d be looking at a 240TB DFM, still 60TB short.

PLC (5bits/cell) NAND would add another 20 percent – 288TB, 12TB short – but, at this point, we are in the right area. Our best estimate is that Pure is relying on increased NAND chip count in physically larger DFMs, with chips built from more layers, and using PLC formatting to get to the 300TB level. 
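Restating that stack of multipliers as a quick calculation shows how close the pieces get to 300TB. The factors below are our own assumptions from the reasoning above, not Pure roadmap figures.

```python
# Our capacity-stacking estimate restated step by step. The multipliers are
# editorial assumptions drawn from the reasoning above, not Pure figures.

capacity_tb = 48      # today's largest DFM
capacity_tb *= 2.5    # physically longer DFM holding ~2.5x more NAND chips -> 120TB
capacity_tb *= 2      # ~300-plus layer 3D NAND, double today's ~150 layers -> 240TB
capacity_tb *= 1.2    # PLC (5 bits/cell) instead of QLC (4 bits/cell), +20% -> 288TB

print(f"Estimated DFM capacity: {capacity_tb:.0f}TB")  # 288TB, close to the 300TB target
```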

Rajiev Rajavasireddy, Pure’s FlashBlade VP Product Management, said: “PLC is a work in progress. This is why we have R&D on our hardware platforms. Yes, we have more than one way to skin that cat.”

Pure swings FlashBlade//E at unstructured data

Pure Storage says it has a new FlashBlade//E product optimized for capacity, designed to replace mainstream disk-based file and object stores with an acquisition price comparable to disk-based systems, and lower long-term cost of ownership, costing less than 20 cents per GB with three years’ support.

Amy Fowler, Pure Storage

The FlashBlade//E has evolved from the existing FlashBlade//S and adds storage-only blades to the existing compute+storage blades, thus expanding capacity and lowering cost. Pure says it uses up to 5x less power than the disk arrays it is meant to replace and has 10-20x more reliability.

Amy Fowler, FlashBlade VP and GM, said: “With FlashBlade//E, we’re realizing our founders’ original vision of the all-flash datacenter. For workloads where flash was once price-prohibitive, we are thrilled to provide customers the major benefits Pure delivers at a TCO lower than disk.”

FlashBlade//E

FlashBlade//E starts with 4PB of capacity, using Pure’s 48TB Direct Flash Modules (DFM), and scales up in 2PB increments. For context, note that FlashBlade//S is for high-performance access to file+object data, with the S500 providing the fastest access and the S200 providing slower access. Both can scale out to 10 chassis, meaning a max capacity of 19.2PB (10 chassis x 10 blades x 4 x 48TB DFMs).

The system starts with a head node, a 5RU EC control chassis with 10 compute+storage blades. Each holds a CPU with DDR4 DRAM DIMMs plus two or four 48TB QLC (4bits/cell) DFMs accessed via NVMe across a PCIe gen 4 bus. The Purity//FB OS runs the system. There will be two 1RU external fabric modules (XFM) to network blade chassis together.

Pure Storage FlashBlade//E
FlashBlade//E with EC head node on top and EX storage node below

There can be one or more EX expansion chassis, each with 10 storage blades, again fitted with two or four DFMs. A cluster can have a maximum of 10 chassis, and a cluster that size will contain more than one head node.

Pure Storage FlashBlade//E storage chassis
FlashBlade//E storage chassis showing blades

Customers tell a sizing tool how much capacity they need and it returns a configuration with an appropriate head-node/expansion storage chassis ratio. Pure’s Rajiev Rajavasireddy, VP Product Management, told B&F: “We give them a configuration that says, for this capacity, here’s the combination [of CPU and storage] that you’re going to need … We determine what that ratio is going to be for optimal performance as well as the capacity.” 

Pure FlashBlade expansion differences
Pure FlashBlade expansion differences between //S and //E models

He added: “What we’re effectively doing with E is we are actually increasing the amount of the storage each compute node actually manages, relative to the FlashBlade that somebody had … But there are limits beyond which it becomes counterproductive. The architecture is such that we can actually add more compute nodes as needed.”

FlashBlade//E DFMs cannot be used in //S systems.

Here’s the power draw picture:

  • The Control Chassis with EC Blades consumes 2,300W
  • The Expansion Chassis  with EX Blades consumes 1,600W 
  • The two XFMs: 150W each

For a starting configuration of FlashBlade//E it’s about 1.06 W/TB, and for every expansion node with EX blades, 0.83 W/TB.
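Those per-terabyte figures can be roughly reproduced from the chassis power numbers above. The sketch below assumes a starting configuration of one EC control chassis, one EX expansion chassis and two XFMs, each chassis fully populated with 48TB DFMs; that bill of materials is our reading rather than a Pure-published one, and it lands close to the quoted 1.06 and 0.83 W/TB.

```python
# Rough check of the quoted watts-per-terabyte figures, using the power numbers
# above and assuming fully populated chassis of 10 blades x 4 x 48TB DFMs.

CHASSIS_TB = 10 * 4 * 48                 # 1,920TB raw per fully populated chassis
start_power_w = 2300 + 1600 + 2 * 150    # one EC + one EX + two XFMs = 4,200W
start_capacity_tb = 2 * CHASSIS_TB       # ~3.84PB raw, sold as the ~4PB starting config

print(f"Starting config: {start_power_w / start_capacity_tb:.2f} W/TB")  # ~1.09 W/TB
print(f"Each extra EX chassis: {1600 / CHASSIS_TB:.2f} W/TB")            # ~0.83 W/TB
```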

Performance

Rajavasireddy said: “Our performance is going to be just as good or even better than disk and disk hybrid system that we’re targeting. If you think about performance you have sequential throughput, you have random IO, you have large files, small files, and metadata.

“For sequential data, so for large files and sequential throughput, these disk-based systems are reasonably good because the heads are not seeking too much, they’re just running with the head in one position, for the most part. Our sequential throughput is going to be just as good or better.” 

He said FlashBlade//E will do better than disk-based arrays with small files, random IO and metadata.

IO performance

IO access to these systems will be across faster versions of Ethernet, with 100 gigE today moving to 200 and then 400 gigE. PCIe 4 can be expected to advance to PCIe 5. CTO Alex McMullan said: “We have no concerns on that side of things. We already have the engineering plan laid out in terms of network, in terms of PCIe links, to make sure that that is not a concern for us.”

These systems will have to be more reliable than today’s. McMullan said: “If you’re effectively collapsing five or 10 physical systems into one FlashBlade, it also has to drive up reliability in exactly the same way. And a big part of this for us has been about observability. It’s been about exporting a lot more metrics into things like [the open source monitoring system] Prometheus for customers to be able to see, but also for us to monitor.”

Green flash and gunning for disk

The //E is more power-efficient than //S FlashBlade systems, drawing close to 1W/TB. McMullan told us: “We’re holding to the same power budget for drives as they get bigger and bigger, which nobody else can claim at this point in time … We’ll be getting below one watt per terabyte very quickly in the next few months. And our plan is to get into a discussion where it’s milliwatts per terabyte over the next two years.” In general, Pure says it’s delivering green flash at the price of disk.

It’s also better in e-waste terms as organizations tend to shred and send disk drives to landfill when they are replaced. DFMs don’t produce such e-waste as they can be recycled, Pure says.

Rajavasireddy told us: “With this FlashBlade//E we are targeting that volume of data that is currently running on disk and disk hybrid systems … We feel we finally have something that can go straight at the disk-based high capacity storage market.” A positioning diagram shows it in a 2D space defined by performance and capacity axes relative to FlashBlade//S:

FlashBlade comparisons
Pure’s diagram deliberately shows no overlap between the FlashBlade//S200 and the FlashBlade//E systems

Pure sums this up by claiming that, compared to HDD alternative systems, FlashBlade//E needs a fifth of the power, a fifth of the space, is 10-20x more reliable, has 60 percent lower operational cost and generates 85 percent less e-waste.

It is basically saying it’s going to invade the hybrid disk array market for mainstream bulk unstructured data that requires reasonably fast access and high capacity for workloads like data lakes, image repositories, and video surveillance records – everyday file and object workloads. And it’s going to be doing it with drives that are larger than disk, require less energy than disk, and are more reliable than disk. And when you throw them away, you don’t need a landfill site to hold the disk drive shreds.

Pure really does believe it can blow disk drive unstructured data arrays out of the water with FlashBlade//E. The system will be generally available by the end of April 2023 and can be supplied as a new service tier in Pure’s Evergreen//One Storage-as-a-Service (STaaS) subscription.

Canada wants to phase out data copying

The Standards Council of Canada has approved an IT-related standard which would eliminate new data silos and copies when organizations adopt or build new applications, apparently preventing new applications from making any data copies. However, last-time integration could bring in legacy data via copies.

The CAN/CIOSC 100-9, Data governance – Part 9: Zero-Copy Integration standard can be downloaded here.

It states: “The organization shall avoid creation of application-specific data silos when adding new applications and application functionality. The organization shall adopt a shared data architecture enabling multiple applications to collaborate on a single shared copy of data. The organization should continue to support the use of application-specific data schema without the need to generate application-specific data copies.”
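To make the distinction concrete, here is a minimal, hypothetical sketch of what the standard is contrasting: a new app taking its own application-specific extract versus apps working through their own schemas or views over a single shared copy. The dataset and view names are invented for illustration.

```python
# Illustration only: copy-based integration versus a shared-data architecture.
# Dataset, field and view names are invented for this example.

shared_customer_table = [
    {"id": 1, "name": "Ada", "card_number": "4929...", "balance": 120.0},
]

# Copy-based integration (what the standard discourages): each new app takes its
# own application-specific extract, creating another silo to secure and govern.
marketing_copy = [{"id": r["id"], "name": r["name"]} for r in shared_customer_table]

# Zero-copy integration: each app works through its own schema/view over the
# single shared copy, so access can be granted or revoked in one place.
def marketing_view(rows):
    return [{"id": r["id"], "name": r["name"]} for r in rows]

def risk_view(rows):
    return [{"id": r["id"], "balance": r["balance"]} for r in rows]

print(marketing_view(shared_customer_table))
print(risk_view(shared_customer_table))
```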

This standard was created by two Canadian organizations – the nonprofit Digital Governance Council trade body (DGC, previously the CIO Strategy Council) and the nonprofit Data Collaboration Alliance.

Keith Jansa, DGC Canada

The DGC told us: “The CIO Strategy Council Technical Committee 1 on Data Governance comprises policy makers, regulators, business executives, academics, civil society representatives and experts in data management, privacy and related subjects from coast-to-coast-to-coast.”

Keith Jansa, Digital Governance Council CEO, issued a statement saying: “By eliminating silos and copies from new digital solutions, Zero-Copy Integration offers great potential in public health, social research, open banking, and sustainability. These are among the many areas in which essential collaboration has been constrained by the lack of meaningful control associated with traditional approaches to data sharing.”

The problems the standard is meant to fix center on data ownership and control, and compliance with data protection regulations such as California’s Consumer Privacy Act and the EU’s General Data Protection Regulation (GDPR). The creation of data copies, supporters of the standard say, transfers data control away from the original owners of the data.

Dan DeMers, CEO of dataware company Cinchy and Technical Committee member for the standard, said: “With Zero-Copy Integration, organizations can achieve a powerful combination of digital outcomes that have always been elusive: meaningful control for data owners, accelerated delivery times for developers, and simplified data compliance for organizations. And, of course, this is just the beginning – we believe this opens the door to far greater innovation in many other areas.”

DGC says viable projects for Zero-Copy Integration include the development of new applications, predictive analytics, digital twins, customer 360 views, AI/ML operationalization, and workflow automations as well as legacy system modernization and SaaS application enrichment. It is developing plans for the advancement of Zero-Copy Integration within international standards organizations.

Implications

We talked to several suppliers, whose products and services currently involve making data copies, about the implications for them.

WANdisco’s CTO Paul Scott Murphy told us: “At first glance, the standard may appear to push against data movement, but in fact, it supports our core technology architecture. Our data activation platform works to move data that would otherwise be siloed in distributed environments, like the edge, and aggregates it in the cloud to prevent application-specific copies from proliferating. 

“Our technology eliminates the need for application-specific data management and ensures it can be held as a single physical copy, regardless of scale.”

He added: “Notably, there’s a particularly important aspect of the new guidance on preventing data fragmentation by building access and collaboration into the underlying data architecture. Again, our technology supports this approach, dealing directly with data in a single, physical destination (typically a cloud storage service). Our technology does not rely on, require, or provide application-level interfaces. 

“In response to the standard, Canadian organizations will need to adopt solutions and architectures that do not require copies of data in distributed locations – even when datasets are massive and generated from dispersed sensor networks, mobile environments, or other complex systems.” 

A product and service such as Seagate’s Lyve Mobile depends upon making data copies and physically transporting the copied data to an AWS datacenter or customer’s central site. Both would be impacted for new apps if the Canadian Zero-Copy Initiative was adopted.

A Seagate spokesperson told us: “Seagate is monitoring the development and review of Zero-Copy Integration Standard for Canada by technical committee and does not speculate on potential adoption or outcome at this time.”

What does the standard mean for backup-as-a-service suppliers Clumio and Druva, which make backup data copies stored in a public cloud’s object store?

W Curtis Preston, Chief Technical Evangelist at Druva, told us: “This is the first I’ve heard of the standard.  However, I believe it’s focusing on a different part of IT, meaning the apps themselves. They’re saying if you’re developing a new app you should share the same data, rather than making another copy of the data. The more copies of personal data you have the harder it is to preserve privacy of personal data in that data set. I don’t have any problem with that idea as a concept/standard.

“I don’t see how anyone familiar with basic concepts of IT could object to creating a separate copy of data for backup purposes. That’s an entirely different concept.”

Poojan Kumar, co-founder and CEO of Clumio, told us: “This is an important development; companies are being encouraged to use a single source of truth – such as a data lake – to feed data into apps, analytics platforms, and machine learning models, rather than creating multiple copies and using bespoke copy-data platforms and warehouses.”

He added: “Backup and DR strategies will evolve to focus on protecting the shared data repository (the data lake), rather than individual apps and their copies of data. We have maintained that backups should be designed not as ‘application-specific’ copies, but in a way that powers the overall resilience of the business against ransomware and operational disruptions. This validates our position of backups being a layer of business resilience, and not simply copies.”

DGC view

We have asked the DGC how it would cope with examples above and what it would recommend to the organizations wanting to develop their IT infrastructure in these ways. Dan DeMers told us: “One of the most important concepts within the Zero-Copy Integration framework is the emphasis on data sharing via granting access for engagement on uncopied datasets (collaboration) rather than data sharing via the exchange of copies of those datasets (cooperation).

“But as you point out, many IT ecosystems are entirely reliant upon the exchange of copies, and that is why Zero-Copy Integration focuses on how organizations build and support new digital solutions.”

Legacy apps can contribute, he said. “One capability that is not defined in the standard (but we are seeing in new data management technologies such as dataware) is the ability to connect legacy data sources into a shared, zero-copy data architecture. These connections are bi-directional, enabling the new architecture to be fueled by legacy apps and systems on a ‘last time integration’ (final copy) basis.

“It’s like building the plane while it’s taking off in that sense – you’re using a Zero-Copy data architecture to build new solutions without silos or data integration, but it’s being supplied with some of its data from your existing data ecosystem. 

“It’s all about making a transition, not destroying what’s working today, so the scenarios you outlined would not be an issue in its adoption.”

 

Storage news roundup – March 1

Newspaper sellers on Brooklyn Bridge

Avery Design Systems has announced a validation suite supporting the Compute Express Link (CXL) industry-standard interconnect. It enables rapid and thorough system interoperability, validation and performance benchmarking of systems targeting the full range of versions of the CXL standard, including 1.1, 2.0 and 3.0. The suite covers both pre-silicon virtual and post-silicon system platforms.

Avery CXL Validation Stack.

Data lake supplier Dremio has rolled out new features in Dremio Cloud and Software. Expanded functionality with Apache Iceberg includes copying data into Apache Iceberg tables, optimizing Apache Iceberg tables and table roll back for Apache Iceberg. Customers can create data lakehouses by performantly loading data into Apache Iceberg tables, and query and federate across more data sources with Dremio Sonar. It is also accelerating its data as code management capabilities with Dremio Arctic –  a data lakehouse management service that features a lakehouse catalog and automatic data optimization features to make it easy to manage large volumes of structured and unstructured data on Amazon S3.

Sheila Rohra.

Hitachi Vantara has appointed Sheila Rohra as its Chief Business Strategy Officer, reporting to CEO Gajen Kandiah and sitting on the company’s exec committee. CBSO is a novel title. Kandiah said: “Sheila has repeatedly demonstrated her ability to identify what’s next and create and execute a transformative strategy with great success. With her industry expertise and technical understanding of the many elements of our business – from infrastructure to cloud, everything as a service (XaaS), and differentiated services offerings – I believe Sheila can help us design a unified corporate strategy that will address emerging customer needs and deliver high-impact outcomes in the future.” Rohra comes from being SVP and GM for HPE’s data infrastructure business focused on providing primary storage with cloud-native data infrastructure and hyperconverged infrastructure to Fortune 500 companies. 

Huawei has launched several storage products and capabilities at MWC Barcelona. They include a Blu-ray system for low-cost archiving; OceanDisk, “the industry’s first professional storage for diskless architecture with decoupled storage and compute and data reduction coding technologies, reducing space and energy consumption by 40 percent”; four-layer data protection policies with ransomware detection, data anti-tamper, an air-gap isolation zone through the air-gap technology, and end-to-end data breach prevention; and a multi-cloud storage solution, which supports intelligent cross-cloud data tiering and a unified cross-cloud data view. OceanDisk refers to two OceanStor Micro 1300 and 1500 2RU chassis holding 25 or 36 NVMe SSDs respectively, with NVMeoF access. We’ve asked for more information about the other items.

Data migrator Interlock says it is able to migrate data from any storage (file/block) to any storage, any destination and for any reason. Unlike competitors, Interlock can migrate data from disparate vendors as well as across protocols (NAS to S3). It is able to perform data transformation necessary to translate data formats and structures of one vendor/protocol to another. Interlock says it can extract data from an application if given access to storage. This allows Interlock to migrate data at the storage layer, which is faster than through the application.

Interlock migrates compliance data with auditability and can “migrate” previously applied retention settings. Typically, when migrating data across different storage systems, built-in data protections like snapshots are lost. But with Interlock, snapshots and labels, for example, may be migrated with the data. Migrations are complicated by lack of resources such as bandwidth and CPU/memory bottlenecks in the system. Interlock is able to track utilization (when the system is busy, for instance) and adjust the number of threads accordingly. This also helps reduce the required cutover time.

Nyriad, which supplies disk drive-based UltraIO storage arrays with GPU controllers, is partnering with SI DigitalGlue, whose creative.space platform aims to make enterprise storage simple to use and manage. Sean Busby, DigitalGlue’s President, said: “DigitalGlue’s creative.space software coupled with Nyriad’s UltraIO storage system offers high performance, unbeatable data protection, and unmatched value at scale.” Derek Dicker, CEO, Nyriad, said: “Performance rivals flash-based systems, the efficiency and resiliency are equal to or better than the top-tier storage platforms on the market – and the ease with which users can manage multiple petabytes of data is extraordinary.” The UltraIO system can reportedly withstand up to 20 drives failing simultaneously with no data loss while maintaining 95 percent of its maximum throughput.

Veeam SI Mirazon has selected Object First and its Ootbi object storage-based backup device as the only solution that met all its needs. Ootbi, built on immutable object storage tech, was racked, stacked, and powered in 15 minutes. Mirazon says that, with Object First, it can shield its customers’ data against ransomware attacks and malicious encryption while eliminating the unpredictable and variable costs of the cloud.

Data integrator and manager Talend has updated its Talend Data Fabric, adding more AI-powered automation to its Smart Services to simplify task scheduling and orchestration of cloud jobs. The new release brings certified connectors for SAP S/4HANA, and SAP Business Warehouse on HANA, enabling organizations to shift critical workloads to these modern SAP data platforms. The release supports ad platforms such as TikTok, Snapchat, and Twitter, and modern cloud databases, including Amazon Keyspaces (for Apache Cassandra), Azure SQL Database, Google Bigtable, and Neo4j Aura Cloud. The addition of data observability enables data professionals to automatically and proactively monitor the quality of their data over time and provide trusted data for self-service data access. More info here.

Veeam has launched an updated SaaS offering – Veeam Backup for Microsoft 365 v7 – enabling immutability, delivering advanced monitoring and analytics across the backup infrastructure environment, along with increased control for BaaS (backup as a service) through a deeper integration with Veeam Service Provider Console. It covers Exchange Online, SharePoint Online, OneDrive for Business and Microsoft Teams. Immutable copies can be stored on any object storage repository, including Microsoft Azure Blob/Archive, Amazon S3/Glacier and S3-compatible storage with support for S3 Object Lock. Tenants have more self-service backup, monitoring and restore options to address more day-to-day requirements. Veeam Backup for Microsoft 365 v7 is available now and may be added to the new Veeam Data Platform Advanced or Premium Editions as a platform extension or operate as a standalone offering.

AIOps supplier Virtana has announced a Capacity Planning offering as part of the infrastructure performance management (IPM) capabilities of the Virtana Platform. Companies get access to real-time data for highly accurate and reliable forecasts. Jon Cyr, VP of product at Virtana, said: “You’ll never be surprised by on-prem or cloud costs again.”

A Wasabi Vanson Bourne survey found 87 percent of EMEA respondents migrated storage from on-premises to public cloud in 2022, and 83 percent expect the amount of data they store in the cloud to increase in 2023. Some 52 percent of EMEA organizations surveyed reported going over budget on public cloud storage spending over the last year. Top reasons for EMEA orgs exceeding budget included: storage usage was higher than anticipated (39%); data operations fees were higher than forecast (37%); additional applications were migrated to the cloud (37%); storage list prices increased (37%); higher data retrieval (35%); API calls (31%); egress fees (26%) and more data deletion (26%) fees than expected. Overall, EMEA respondents indicate that 48 percent of their cloud storage bill is allocated to fees, and 51 percent allocated to storage capacity, on average.

WCKD RZR intros silo-busting data platform

UK startup WCKD RZR has unveiled Data Now software at Mobile World Congress in Barcelona, which it says catalogs disparate databases anywhere in a customer’s network and gives users instant access to all their databases’ content.

A book library has a catalog – a single source of truth about the books it stores. It has a few buildings, one type of thing to catalog, and a homogeneous user population. A multinational organization like a bank is light years away from that happy state. WCKD RZR wants to move it from silo proliferation and data ignorance to the comfort of data asset knowledge and governed access.

Chuck Teixeira, founder and CEO of WCKD RZR, explained in a statement: “In Data Now, we’ve created the universal ‘master key’ for data discovery. Our goal is to revolutionize the way businesses manage and access their data, and our solution does just that. It’s truly disruptive and can benefit every large organization on the planet. Whether it’s multinational banks, government institutions or regional retailers, Data Now acts as their supercharged data connector and access accelerator.”

Data Now provides a central location for businesses to see, search and access data across an entire organization. Once discovered, the software enables users to view and download data from multiple databases, in any environment around the world, seamlessly and in full compliance with all global data regulations.

Why this is a big deal

A bank like HSBC, Barclays or Citibank has hundreds of separate data silos holding data for specific applications the bank has developed in specific geographies, each with individual regulations and data access rules. It can be dealing with multinational customers whose myriad operations have a presence in some or many of these silos.

If we ask the question “What data does the bank hold on that customer?” the typical answer is: “It doesn’t know” – because it can’t find out. Each data silo is its own micro-universe with its own access rules, data element names, data formats, storage system and its own management.

John Farina and Chuck Teixeira, WCKD RZR

The WCKD RZR story began when HSBC’s UK and US operations entered a deferred prosecution agreement with the US Justice Department in 2012 because it had failed to maintain an effective anti-money laundering program and to conduct appropriate due diligence on its foreign account holders. 

It forfeited $1.256 billion, paid $665 million in civil penalties, and had to update its records so it could monitor potential money laundering attempts. CTO Jon Farina told us: “You want to make sure that you can monitor transactions, credit card payments and flows of cash across our internal systems to make sure that something nefarious is not being done.”

This involved collating some 10PB of data covering 1.6 million clients in 65 legal jurisdictions. Teixeira and Farina, who were working at HSBC at the time, had the job of combing through the many data silos involved and creating, as it were, a single and accessible source of truth.

It was as if they were standing on top of Nelson’s Column in London’s Trafalgar Square, surveying hundreds of different buildings, and saying they had to get into each and every one and find the data inside.

They built a Hadoop system on-premises at HSBC with tens of thousands of tables and machine learning software to detect transactions across different clients to spot potential financial crimes. This was an absolutely massive extract, transform and load (ETL) operation, and they wanted it automated. But there was no software to do that. They realized that it was, in fact, a general problem, not one unique to HSBC.

They also thought it could be automated if connectors were built to the underlying silos and their contents cataloged and indexed, as well as their access routes and restrictions discovered. All this metadata could be entered into a single database, an abstraction layer lens, through which the underlying data silos could be virtualized into a single entity without their data contents being moved or migrated anywhere.

This realization triggered Teixeira into starting WCKD RZR – named after his pet bulldog – in 2020, with Farina joining as CTO in May last year, when the company raised $1.2 million in a pre-seed round.

Farina briefed us on this background and on WCKD RZR’s software development. We constructed a diagram showing the basic structure of WCKD RZR’s Data Watchdog technology:

Diagram of WCKD RZR technology
Blocks & Files’ DataWatchdog/Data Now diagram

Clients access the catalog, search for data then request it. Their access status is checked and, if valid, DataWatchdog will fetch the data from the underlying sources and deliver it to them. 

There are three aspects to this: find, govern and access. Data Watchdog enables customers to find and govern their data in each country, in real time, and be fully compliant with relevant data sharing, privacy and governance rules. It spiders through the underlying data sources – in minutes it’s claimed – and adds them to the catalog, without touching, transforming or duplicating the original data sources. The Data Now software provides access to the data located by Data Watchdog and can mask sensitive information such as debit card numbers.
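To illustrate the find-govern-access flow described above, here is a minimal, hypothetical sketch. None of the class or field names come from WCKD RZR; it only shows the idea of a metadata catalog fronting untouched source databases, with access checks and masking applied on the way out.

```python
# Hypothetical sketch of the find/govern/access flow. Names are invented and do
# not come from WCKD RZR; the point is a metadata catalog fronting untouched
# source databases, with governance checks and masking applied on the way out.

from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    source: str               # where the underlying silo lives
    allowed_roles: set        # govern: who may access this dataset
    masked_fields: set = field(default_factory=set)  # e.g. debit card numbers

CATALOG = {
    "uk_retail_customers": CatalogEntry(
        "postgres://uk-dc/retail", {"aml_analyst"}, {"card_number"}
    ),
}

def fetch_from_source(source, query):
    # Placeholder: a real system would query the silo in place, without
    # copying or migrating its data anywhere.
    return [{"name": "A Customer", "card_number": "4929..."}]

def data_now_style_request(dataset, query, role):
    entry = CATALOG[dataset]                       # find: look the dataset up in the catalog
    if role not in entry.allowed_roles:            # govern: enforce access rules
        raise PermissionError(f"{role} may not read {dataset}")
    rows = fetch_from_source(entry.source, query)  # access: fetch from the untouched silo
    for row in rows:                               # mask sensitive fields before returning
        for sensitive in entry.masked_fields & row.keys():
            row[sensitive] = "****"
    return rows

print(data_now_style_request("uk_retail_customers", "SELECT *", "aml_analyst"))
```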

Farina said: “Data Now is a full-service data access accelerator. We are revolutionizing the way organizations can search multiple databases for information that they hold. Now they can search it, see it, find it, use it, monetize it. Mobile phones were transformed by the iPhone, video rentals were redefined by Netflix, and data access is now being revolutionized by Data Now.”

There is no need to migrate data from multiple sources into a single mega-database in a digital transformation project. There are aspects of this which are similar to the data orchestration provided by Hammerspace, but WCKD RZR is focused more on source databases rather than raw file or object data storage systems.

Pinecone: Long-term memory for AI

Pine cones (Wikipedia public domain image)

Startup Pinecone, which provides a vector database that acts as the long-term memory for AI applications, has hired former Couchbase CEO and executive chairman Bob Wiederhold as president and COO after 15 months as an advisor and board member.

A long-term memory for AI apps sounds significant, but is it? Why do such apps need a special storage technology? Pinecone’s vector database is used in AI and ML applications such as semantic search and chatbots, product search and recommendations, cybersecurity threat detection, and so forth. After one year of general availability Pinecone says it has 200 paying customers, thousands of developers, and millions of dollars in annual recurring revenue (ARR).

Edo Liberty, Pinecone founder and CEO, said in a statement: “To maintain and even accelerate our breakneck growth, we need to be just as ambitious and innovative with our business as we are with our technology. Over the past 15 months I’ve come to know Bob as one of the very few people in the world who can help us do that.”

The key Pinecone technology is indexing for a vector database.

A vector database has to be stored and indexed somewhere, with the index updated each time the data changes. The index needs to be searchable and able to retrieve items similar to the search term – a computationally intensive activity, particularly under real-time constraints. That implies the database needs to run on a distributed compute system. Finally, the entire system needs to be monitored and maintained.
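As a point of reference for what such an index does, here is a minimal sketch using FAISS, an open source similarity-search library (not Pinecone’s proprietary index): a flat, brute-force index over 10,000 random 128-dimensional vectors, queried for the five nearest neighbors. The data, dimensions, and the choice of FAISS are illustrative assumptions.

import numpy as np
import faiss  # open source similarity-search library, used here only for illustration

dim = 128
vectors = np.random.random((10_000, dim)).astype("float32")

index = faiss.IndexFlatL2(dim)   # exact, brute-force L2 index
index.add(vectors)               # must be re-populated/updated as the data changes

query = np.random.random((1, dim)).astype("float32")
distances, ids = index.search(query, 5)   # retrieve the 5 most similar vectors
print(ids[0], distances[0])

A flat index scans every vector, which is exactly what becomes too expensive at scale under real-time constraints; approximate nearest-neighbor indexes, sharding and replication are the usual answers, and they in turn require the distributed system and monitoring described above.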

Edo Liberty, Pinecone
Edo Liberty

Liberty wrote: “There are many solutions that do this for columnar, JSON, document, and other kinds of data, but not for the dense, high-dimensional vectors used in ML and especially in Deep Learning.” The vector database index – creating an indexing facility was the reason he founded Pinecone – needed to be built in a way that was generally applicable and enabled real-time search and retrieval.

When AI/ML apps deal with objects such as words, sentences, text documents, images, and video and audio sequences, they describe them with sets of numeric values capturing characteristics of a complex data object – color, physical size, surface light characteristics, audio spectrum at various frequencies and so on.

These object descriptions are called vector embeddings and are stored in a vector database, where they are indexed so that similar objects can be found through index searching. A search is not run on direct user input such as keywords or metadata classifications for the stored objects. Instead, we understand, the search term is processed into a vector using the same AI/ML system that created the object vector embeddings. The search can then look for identical and similar objects.
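A minimal sketch of that query flow, assuming a hypothetical embed() function standing in for whichever embedding model is used – the key point being that the stored objects and the query must pass through the same model:

import hashlib
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    # Hypothetical stand-in for a real embedding model: deterministic,
    # unit-length vectors, but with no semantic meaning.
    seed = int(hashlib.md5(text.encode()).hexdigest(), 16) % (2**32)
    v = np.random.default_rng(seed).standard_normal(dim)
    return v / np.linalg.norm(v)

corpus = ["red running shoes", "blue denim jacket", "wireless headphones"]
matrix = np.stack([embed(doc) for doc in corpus])   # the stored vector embeddings

query_vec = embed("sneakers in red")   # query goes through the same model
scores = matrix @ query_vec            # cosine similarity (vectors are unit length)
for i in np.argsort(scores)[::-1]:
    print(f"{scores[i]:+.3f}  {corpus[i]}")

With a real embedding model, semantically similar items would score highest; with millions of vectors, the brute-force matrix multiply is what an index like Pinecone’s is designed to replace.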

Pinecone was founded in 2019 by Liberty, an ex-AWS director of research and one-time head of its AI Labs, whose work led to the creation of Amazon SageMaker. He spent just over two and a half years at AWS after almost seven years at Yahoo! as a research scientist and senior research director. Pinecone raised $10 million in seed funding in 2021 and $28 million in an A-round in 2022.

In a 2019 blog, Liberty wrote: “Machine Learning (ML) represents everything as vectors, from documents, to videos, to user behaviors. This representation makes it possible to accurately search, retrieve, rank, and classify different items by similarity and relevance. This is useful in many applications such as product recommendations, semantic search, image search, anomaly detection, fraud detection, face recognition, and many more.”

Pinecone’s indexing uses a proprietary nearest-neighbor search algorithm that is claimed to be faster and more accurate than any open source library. The software’s design is said to provide consistent performance regardless of scale, with dynamic load balancing, replication, namespacing, sharding, and more.

Bob Wiederhold, Pinecone
Bob Wiederhold

Vector databases are attracting a lot of attention. Zilliz raised $60 million for its cloud vector database technology in August last year, and we wrote about Nuclia, the search-as-a-service company, in December. Wiederhold’s transition from advisor and board member to a full-on operational role indicates he shares that excitement.

He said: “There is incredibly rapid growth across all business metrics, from market awareness to developer adoption to paying customers using Pinecone in mission-critical applications. I am ecstatic to join such an elite company operating in such a critical and growing market.”

WekaIO’s stance on sustainable AI puts down roots

WekaIO wants us to be aware of datacenter carbon emissions caused by workloads using its software technology – AI, machine learning and HPC – and aims to counter those emissions with a sustainable AI initiative.

It says that although these technologies can power research, discoveries, and innovation, their use is also contributing to the acceleration of the world’s climate and energy crises. WekaIO wants to collaborate with leaders in the political, scientific, business, and technology communities worldwide to promote more efficient and sustainable use of AI. As a first practical step, it is partnering with the One Tree Planted organization to plant 20,000 trees in 2023, and is committing to plant ten trees for every petabyte of storage capacity it sells annually in future.

Weka president Jonathan Martin said: “Our planet is experiencing severe distress. If we do not quickly find ways to tame AI’s insatiable energy demands and rein in its rapidly expanding carbon footprint, it will only accelerate and intensify the very problems we hoped it would help us solve.”

Is WekaIO just greenwashing – putting an environmentally aware marketing coat around more or less unchanged carbon-emitting activities?

Weka’s software indirectly contributes to global warming through the carbon emissions of the servers it runs on, which consume electricity for power and cooling. How much carbon do they emit?

One estimate by goclimate.com says emissions from a Nordic on-premises or datacenter server are 975kg CO2-eq/year, assuming the servers don’t use green electricity from wind farms and solar power. How many trees are needed to absorb that?

The average tree absorbs around 10kg (22lb) of carbon dioxide per year for its first 20 years, according to One Tree Planted. So absorbing one server’s 975kg of annual emissions would take 97.5 trees. If a petabyte of Weka-managed storage runs across eight such servers, emitting 7,800kg of carbon per year, WekaIO would need to plant 780 trees per petabyte per year – far more than the ten trees per petabyte it has pledged.
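The back-of-envelope arithmetic, for anyone who wants to vary the assumptions (server emissions, servers per petabyte, and absorption per tree are all estimates):

KG_CO2_PER_SERVER_PER_YEAR = 975   # goclimate.com estimate for a Nordic server
KG_CO2_ABSORBED_PER_TREE = 10      # One Tree Planted figure, first 20 years
SERVERS_PER_PETABYTE = 8           # assumption used above

trees_per_server = KG_CO2_PER_SERVER_PER_YEAR / KG_CO2_ABSORBED_PER_TREE
trees_per_petabyte = trees_per_server * SERVERS_PER_PETABYTE

print(trees_per_server)                    # 97.5 trees per server per year
print(trees_per_petabyte)                  # 780.0 trees per petabyte per year
print(20_000 * KG_CO2_ABSORBED_PER_TREE)   # 200,000 kg absorbed by the 2023 pledge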

But WekaIO is planting 20,000 trees in 2023, which would absorb around 200,000kg of CO2 per year. That is a real contribution.

Object storage supplier Scality has previously got involved in reforestation. It seems a good idea, and perhaps there’s scope here for organized storage industry action.

Martin said: “We also recognize that reforestation is only one piece of the decarbonization puzzle. It would be tempting to stop there, but there is much more work to be done. The business, technology, and scientific communities must work together to find ways to make AI and the entire enterprise data stack more sustainable. Weka is committed to doing its part to help make that happen. Watch this space.”

Learn more about Weka’s thinking here and here.

Comment

The Storage Networking Industry Association has its Emerald initiative to reduce storage-caused datacenter carbon emissions. Pure Storage also has a strong emphasis on carbon emission reduction via flash drive use, although its all-flash product sales are obviously improved if storage customers buy fewer disk drives.

Weka’s green stance is not compromised by such concerns. Balancing storage supplier business interests and environmental concerns, without demonizing particular technologies, is going to be a hard nut to crack and getting a storage industry-wide consensus may be impossible. But a storage or IT industry-wide commitment to reforestation, via perhaps an agreed levy on revenues, might be feasible. Let’s watch this space, as Martin suggests.