
It’s not about money: MinIO vs WEKA – round 2

MinIO claims WEKA was wrong in its response to MinIO’s accusations of open-source license infringements.

Open-source object storage software producer MinIO originally licensed its software under the Apache v2 scheme and then moved to GNU’s AGPL v3 license. It has alleged that both Nutanix and WEKA use its software without correctly and publicly acknowledging that it is MinIO open source software. WEKA previously brushed off the accusation when CPO Nilesh Patel told us last month:

“Weka’s software only uses MinIO open source software licensed under Apache 2.0. All code forked by Weka was derived exclusively from Apache 2.0 licensed code – no MinIO code subject to the AGPL v3 license is being used in Weka’s software.”

Garima Kapoor

MinIO COO Garima Kapoor has now published a blog which states: “Weka made three assertions in their post:

  1. They were compliant with the Apache v2 notice and attribution requirements.
  2. We don’t have the right to revoke Weka’s Apache v2 license.
  3. They only use MinIO open source software licensed under Apache v2. Which is to say they don’t use any GNU AGPL v3 code. 

“They are wrong in all three of the assertions.”

Regarding Apache v2 compliance: “Weka claims that there was a disclosure file on their website. … We could not find the MinIO copyright notice and LICENSE file in their binary distribution and their website.” Kapoor then claims that, according to her searches on the Wayback Machine, the license acknowledgement appeared only after MinIO’s initial blog about infringement was published.

“It shows up AFTER our blog came out. Hidden disclosures are not in compliance with the FOSS license requirements and completely defeats its purpose. We could not find any LICENSE notices in their binary distribution either. The claim that they were compliant simply isn’t supported by the evidence.”

License revocation

A previous blog by Weka’s Patel states: “In the Grant of Copyright License (Section 2 in the license terms), each open-source contributor grants a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form.”

In his view: “The written terms of the Apache 2.0 license agreement are clear: MinIO cannot revoke WEKA’s license.”

Kapoor disagrees, claiming: “An open source license is a type of contract. By releasing code under the Apache v2 license, we offer a contract to anyone who agrees to our terms. The way they accept that contract is by complying with the terms of the license. However, if they don’t uphold their part of the bargain, including by not providing us attribution, they haven’t accepted our terms. That means they haven’t entered into the contract that allows them to use our code. This is established law, and is the reason why violating open source license conditions constitutes copyright infringement and contract breach.”

She does include a MinIO correction: “Regarding revocation, our language may have suggested that Apache v2 licensors can unilaterally and permanently withdraw a company’s right to use the code. We don’t believe they can. If, and when, a company complies with the license, it may use the code. We apologize if we conveyed a different message.”

However, Kapoor insists MinIO can revoke WEKA’s license to use its code: “We had terminated the license due to Weka’s years-long non-compliance. Weka may regain the ability to use our software if they come into compliance.” Customers are unaffected, though, as “termination only applies to Weka and not their customers.”

This area is legally complex, and lawyers will need to examine the relevant texts to determine who can and cannot revoke licenses, and what constitutes a contract breach that voids a license.

MinIO AGPL v3 code use

Patel’s post says: “We can confirm that all code forked by WEKA has been derived exclusively from Apache 2.0 licensed code – no MinIO code subject to the AGPL v3 license has ever been used in WEKA’s software. Therefore, only the terms of the Apache 2.0 license apply.”

Kapoor disagrees again, claiming Weka had included the “MinIO WARP performance benchmarking utility in their MinIO bundle (along with the Server and Client).” She adds that “WARP has ALWAYS been licensed under AGPL v3.” A screenshot in MinIO’s first blog post was offered as a supposed demonstration of WEKA’s WARP use; B&F cannot confirm its veracity.

Not about money

There has been comment that MinIO is coupling its AGPL v3 licensing enforcement with attempts to sell seven-figure support services or other deals to system software suppliers such as Nutanix and WEKA.

Garima Kapoor told B&F in a Zoom session that this was wrong: “There is absolutely no money involved from our side.”

She said MinIO wants to ensure that “whoever is using my product … to make sure that they respect our licences and that’s pretty much it… the compliance is what we are after.”

Comment

MinIO believes WEKA hasn’t fully complied with the terms of its Apache and AGPL v3 licenses and it seems as though the spat will run for some time yet, with neither side admitting defeat.

The company told WEKA it was using AGPL v3-licensed code, the WARP software, and yet WEKA’s Patel, its chief product officer, said it was not using any AGPL v3-licensed code from MinIO.

An issue with the AGPL v3 license is that it aims to ensure “modified source code becomes available to the community. It requires the operator of a network server to provide the source code of the modified version running there to the users of that server.” The intent is that modified open source code is made available to the open source community and does not become proprietary. 

What is “modified source code”? MinIO says: “Creating combined or derivative works of MinIO requires all such works to be released under the same license. If MinIO source code is included in the same executable file, they are definitely combined in one program. If modules are designed to run linked together in a shared address space, that almost surely means combining them into one program.”

But “pipes, sockets, RESTful APIs, and command-line arguments are communication mechanisms normally used between two separate programs. So when they are used for communication, the modules normally are separate programs. But if the semantics of the communication are intimate enough, exchanging complex internal data structures, that too could be a basis to consider the two parts as combined into a larger program.”

There is much scope for detailed legal analysis here.

A WEKA spokesperson told us: “MinIO has, once again, made public allegations about WEKA on its blog without first communicating with us or giving us an opportunity to address their claims in private. Their decision to force a dispute in the public domain without corroborating their statements as factual is reckless and, frankly, unethical.”  

“We reviewed their latest statement, and nothing cited changes WEKA’s original position. Therefore, we maintain that their allegations are patently false and unfounded. The only part of MinIO’s post we agree with is that WEKA’s customers and partners can continue to use the software:” 

“Weka’s customers are most certainly able to continue to use MinIO software as long as they comply with the Apache License v2 and GNU AGPL v3 licenses.” 

Equalum talks up its ETL, change data capture-fed, real-time data streamer

Israeli startup Equalum, an ETL company, talked about its tech at a recent IT Press Tour, explaining how it captures database and other changes in real time and pushes them out to data warehouses and lakes, both on-prem and in the cloud, so operational processes can work on the latest data.

Data integrator Equalum has built connectors to 32 data sources, such as Kafka, MySQL, Oracle, PostgreSQL and SQL Server for data ingestion, and to 27 targets such as AWS, Azure, GCP, Snowflake and Yellowbrick. It uses change data capture, binary log parsing, and replication technologies to get new data and stream it to target destinations at scale, after enriching and transforming it.

Eyal Perlson.

VP Marketing Eyal Perlson told B&F that Equalum’s technology is: “the only way to get changed data in volume in real time… Customers want to move operational on-prem database data to the public cloud data warehouse in real-time.”

Binary log parsing refers to Equalum’s agent software accessing the binary redo logs of change data in Oracle, SQL Server and other databases, gaining access to changed data faster than by using, for example, Oracle’s LogMiner. Oracle’s GoldenGate uses the same data source, log data from Oracle on-prem, and sends it to Oracle’s cloud, with Perlson claiming: “Oracle GoldenGate is incredibly expensive.”
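To illustrate the general pattern that connectors like Equalum’s automate, here is a minimal change data capture sketch that tails a MySQL binary log using the open source python-mysql-replication package. The connection details and the publish_to_target helper are hypothetical; this is not Equalum’s code, just the underlying idea of log-based CDC.

# Minimal log-based CDC sketch (not Equalum's implementation).
from pymysqlreplication import BinLogStreamReader
from pymysqlreplication.row_event import (
    WriteRowsEvent, UpdateRowsEvent, DeleteRowsEvent,
)

def publish_to_target(table, row):
    # Placeholder for the "load" step: push to Kafka, Snowflake, etc.
    print(table, row)

stream = BinLogStreamReader(
    connection_settings={"host": "127.0.0.1", "port": 3306,
                         "user": "repl", "passwd": "secret"},  # hypothetical
    server_id=100,        # must be unique among replication clients
    blocking=True,        # wait for new events as they are committed
    only_events=[WriteRowsEvent, UpdateRowsEvent, DeleteRowsEvent],
)

for event in stream:      # each event is a committed row-level change
    for row in event.rows:
        publish_to_target(event.table, row)

Reading the binary log directly, rather than querying the source tables, is what lets changed rows be streamed with low overhead on the production database.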

Equalum technology diagram.

He said: “We took years to work out how to do this binary log parsing better. … It’s customised unique code.” Not all sources are accessed using binary log parsing, as a presentation slide illustrated.

Overall, Equalum claims its edge is that it helps businesses avoid building a unique ETL (Extract, Transform and Load) data pipeline, with dedicated data transformation steps, for each data source and destination target they need. Customers are finding that they need to extract data from more and more sources and send it, after any transformations, to more targets. Equalum says customers can use its no-code technology to build ETL data pipelines in as little as 15 minutes using drag-and-drop technology with pre-configured source and target connectors.

This eliminates, it says, customized ETL coding and tool sprawl. Equalum sees part of its job as building more extractors and connectors.

Equalum dashboard screenshot

Perlson told us businesses are using Equalum’s tech to make real-time recommendations to customers during online purchases or browsing sessions.

He said: “We are going to be added to the Azure and GCP marketplace in next few weeks.” Asked about the AWS marketplace, he said there were no plans. We understand that an issue affecting entry into AWS’s marketplace is that AWS has tools of its own.

Yellowbrick is a strategic Equalum partner but there is no relationship with Databricks.

Asked about live data replicator WANdisco, Perlson claimed Equalum hadn’t encountered it in any sales deals.

Background

Equalum was founded in Tel Aviv in 2015 by Chief Product Officer Erez Alsheich, current board member and original CEO Nir Livneh, and CTO Ofir Manor. All three were former members of the Israeli Defence Forces. Manor left in January 2019 and joined Microsoft as a Senior Program Manager. Equalum’s CEO is Guy Eilon, who joined in 2021 when the company rebranded and refocused on capturing real-time data and sending it cloudwards.

Guy Eilon

There are around 50 employees, with half in Tel Aviv and half in the US; it has an office in Boston, and the CEO has an office in London. Equalum has dozens of customers, including Walmart, Siemens, T-Systems and GSK, but is not yet cash flow-positive. Browse its blog for more background information.

Equalum’s funding history:

  • 2015 – Seed Round
  • 2017 – Non-equity assistance
  • 2017 – A-Round – $5 million
  • 2019 – B-round – $18 million
  • 2022 – C-round – $14 million

The company has raised a total of $39 million over five funding events from Planven, United Ventures, Innovation Endeavours, Saints Capital, and SpringTide Ventures.

Impossible Cloud skips crypto in Web3 storage

German startup Impossible Cloud says it wants to bridge the traditional and Web3 storage worlds with its generally available, decentralized product – paid for in real money, not cryptocurrency – which uses spare capacity in enterprise datacenters instead of private computer owners.

Updates: Impossible Cloud changed its pricing, 20 April 2023. Egress policy and discount information added, 12 May 2023.

Web3 or decentralized storage stores data across myriad IT systems and not in a single datacenter. A distributed network of private or very small business IT shops contribute spare storage capacity on a peer-to-peer basis using a shared protocol that handles the storage I/O, verifies their existence and trustworthiness using blockchain transactions, and incentivizes them via cryptocurrency payments. Consumers buy capacity the same way unless a translation mechanism exists to convert their dollars, euros or pounds to cryptocurrency.

Impossible Cloud wants to make its decentralized storage enterprise class and consumable using traditional currency.

CEO and co-founder Kai Wawrzinek said: “Our ultimate goal is to create a decentralized, cost-effective, enterprise-grade cloud platform that will revolutionize the way businesses utilize cloud services, providing enhanced efficiency, elasticity, and security.”

Co-founder and COO Christian Kaul added: “Businesses of all sizes, including global corporations, have been largely ignored in the push to Web3. Our solution delivers all the benefits of Web3, but without the technical complexity that has held back mainstream business adoption. This solution is designed to unlock the B2B potential of Web3 and has never been more timely, as traditional cloud providers continue to raise pricing while underperforming in their delivery.”

Impossible Cloud founders from left: CEO Kai Wawrzinek, COO Christian Kaul, and CTO Daniel Baker

Impossible Cloud’s founders wanted to improve on AWS, Azure and GCP cloud storage services. Wawrzinek said: “Cloud services have quickly disrupted countless industries, but today’s systems have already become ‘legacy’ and are riddled with limitations. We’ve reimagined what cloud storage can do and the value it delivers, accomplishing what many considered impossible.”

The Impossible Cloud storage service is a version of object storage that is S3 and object lock-compliant. It provides capacity through deals with enterprise-grade datacenters that are fully compliant with standards such as ISO27000, SOC 2 Type II, and others. 

The company says its storage is designed especially for enterprises and SMBs that use centralized public cloud providers, such as AWS S3, as well as on-premises private clouds. Because it uses enterprise-class datacenters, it says it offers better storage than other Web3 capacity providers, with lower latency, millisecond-class responsiveness, and rapid uploads and downloads. The system is designed for 100 percent durability and 99.95 percent availability.

It supports identity and access management (IAM) features, versioning for files (stored as objects), encryption and immutable buckets – the object lock feature.

It has been developed for deployment within minutes with a single line of code and features 24/7 engineering support. 
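Since the service is S3 and object lock-compliant, a developer would consume it much like any other S3-compatible endpoint. The sketch below uses boto3; the endpoint URL and credentials are hypothetical placeholders, not Impossible Cloud’s published values.

import boto3

# Hypothetical endpoint and credentials for illustration only.
s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.impossible-cloud.example",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# In the S3 API, object lock (immutability) is enabled when the bucket is created.
s3.create_bucket(Bucket="backups", ObjectLockEnabledForBucket=True)

with open("report.pdf", "rb") as f:
    s3.put_object(Bucket="backups", Key="report.pdf", Body=f)

The point of S3 compatibility is exactly this: existing backup tools and scripts only need the endpoint and credentials changed to target the new service.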

Pricing

Prices according to its website’s pricing section:

Impossible Cloud storage prices

Update 1. Three days after we published this story Impossible Cloud altered its pricing, lowering its pay-per-use rate and hiding the business plan rates. It now says it costs $7.99/month for pay-per-use, with no egress price revealed. The business plans have been renamed reserved capacity plans, with no pricing numbers disclosed. Instead it claims to offer discounted pricing, compared to the pay-per-use scheme, with a 25TB-to-unlimited capacity range and one-to-five-year contract terms.

Impossible Cloud has published a comparative pricing chart:

This chart is based on a 10TB/month storage and 10TB/year egress rate, which is not listed in Impossible Cloud’s rate table. Storj also offers decentralized storage purchased with actual money.

Update 2. Impossible Cloud says it does not charge for egress. Only a fair-use policy applies, and should a client wish to use excessive egress, Impossible Cloud will contact them. Regarding business plan pricing: business clients committing to reserved capacity upfront for a number of months (12, 24, etc.) can negotiate exclusive discounts.

Background

Impossible Cloud was started in Hamburg, Germany, in late 2021 by CEO Dr Kai Wawrzinek, COO Dr Christian Kaul and CTO Daniel Baker. They raised $7 million in seed funding in March and said that took total funding to $10.9 million.

The seed funding round was co-led by HV Capital and 1kx, joined by Protocol Labs, TS Ventures, and very early Ventures. The oddly named very early Ventures is a Web3 investor based in Germany.

Impossible Cloud intends to incorporate in the United States, build an elastic network of enterprise-grade storage hubs, and expand the capabilities of its platform.

Wawrzinek, Kaul and Baker have a history of working at what are now publicly traded unicorn companies, including Goodgame Studios, Stillfront, Airbnb, and Iron Mountain.

Kaul said: “We are excited to be at the forefront of commoditizing Web3 technology for mainstream business adoption, and confident that our decentralized, enterprise-grade cloud platform will be a game-changer in the cloud industry. We believe Impossible Cloud will play a major role in the way the internet will look in 5 to 10 years and be a major force in the cloud services market.”

Rookout: our DevOps tools help squash bugs faster

Israeli startup Rookout supplies dynamic instrumentation for cloud-native apps that it says lets developers find and fix bugs faster, and is used by developers at storage companies Backblaze, NetApp and Seagate.

During an IT Press Tour briefing, B&F was told the whole idea is to help developers deliver correct code faster by enabling them to react better and more quickly when code fails. The main ways this bug hunting is done are live logging and dynamic instrumentation, both of which can be carried out in production environments. Live logging enables developers to increase the granularity of log data at a specific point in their code so they can zoom in on a problem area.

Dynamic instrumentation is made possible by bytecode (portable or pcode) manipulation. Bytecode is an intermediate type of software code between developer-written source and the machine code that runs on the hardware. It is compiled and executes on a virtual machine, such as a Java Virtual Machine, which acts as an interpreter to turn it into executable machine code.

Rookout graphic

Rookout uses bytecode manipulation to insert new instrumentation into cloud-native apps at run time. This means the developer is not limited to seeing instrumentation results only at preset breakpoints in the source code; they can see as deep into the code as they want, when they need to, we’re told.
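As a rough analogy for what run-time, non-breaking instrumentation looks like, the sketch below uses Python’s tracing hook to capture a function’s local variables without editing or redeploying the function. Rookout itself works via bytecode manipulation in the JVM and other runtimes, so treat this purely as a conceptual illustration; the price_order function is a made-up example.

import sys

def tracer(frame, event, arg):
    # Fire as price_order returns and capture its local state - roughly the
    # effect of a "non-breaking breakpoint" attached at run time.
    if event == "return" and frame.f_code.co_name == "price_order":
        print("snapshot:", dict(frame.f_locals))
    return tracer

def price_order(qty, unit_price):
    total = qty * unit_price
    discount = 0.1 if qty > 10 else 0.0
    return total * (1 - discount)

sys.settrace(tracer)     # attach instrumentation without touching the code above
price_order(12, 9.99)    # prints qty, unit_price, total and discount
sys.settrace(None)

The production-grade equivalent does this by rewriting bytecode in place, so the captured data can be streamed out without pausing the application.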

Rookout Smart Snapshots capture application memory contents when a cloud-native app fails. Log entries capture only a few hundred bytes, and app traces a few thousand, whereas Snapshots include everything that happened at a specific moment in time: stack traces, variable values, request context, and the global state of the application.

Liran Haimovitch, CTO and co-founder of Rookout, said: “Snapshots are the next level for those moments when something goes very wrong and metrics and logs don’t provide enough context for developers. If a picture is worth a thousand words, a Snapshot is worth a thousand log lines.”

A Dynamic Observability offering is designed to allow developers to collect any metric they want on-demand and for no additional cost. Developers can click on any line of code, visualize the data directly from production, and connect it to a business value.

Background

Rookout was founded in 2017 by CTO Liran Haimovitch and Or Weiss, who is on the board but is also the co-founder and CEO of Permit.io. Shahar Fogel is Rookout’s CEO and joined in February 2020. It has raised $28.4 million in total: a $4.2 million seed round in April 2018; an $8 million A-round in August 2019; and a $16.2 million B-round in August 2022. There are 45 full-time employees and R&D takes place in Tel Aviv with a business office in Palo Alto.

Rookout CEO and CTO

Rookout’s software is deployed as an agent and incurs a 2 percent or so execution overhead, we’re told. The company defines four pillars of observability:

  • Metrics – application response measurement on demand,
  • Traces – app event timeline,
  • Logs – app event information,
  • Smart Snapshots – app memory content capture.

Rookout told us it has hundreds of customers and more than 100,000 instances of its agent deployed. Customers include Adobe, Jobvite, and Informatica, it said.

Attackers try to shake down WD for eight figures, claim to have 10TB of data

Western Digital internal systems were broken into on April 3, with data exfiltrated and its My Cloud services going offline. Now some individuals claiming to be the attackers are reportedly demanding an eight-figure payment, threatening to publish sensitive information if WD doesn’t cough up.

An SEC filing at the beginning of the month by Western Digital confirmed: “An unauthorized third party gained access to a number of the Company’s systems.” Western Digital detected the event and started an investigation into what happened with external security and forensic experts. It is coordinating with law enforcement agencies and took certain services offline, such as My Cloud.

The company said it “is actively working to restore impacted infrastructure and services.” It “believes the unauthorized party obtained certain data from its systems and is working to understand the nature and scope of that data.”  Western Digital’s response to the incident “has caused and may continue to cause disruption to parts of the Company’s business operations.” For example, online product ordering is suspended in the UK.

Now TechCrunch reports that the supposed attackers told it they had extracted 10TB of data, including customer information, from Western Digital’s systems and want to extort a minimum eight-figure payment (at least $10,000,000) to stop them publishing it. Western Digital has apparently not been cooperating with the individuals.

The outlet said it was given samples of the stolen information that included what appeared to be Western Digital exec phone numbers, code-signing certificates, internal emails, a Western Digital Box account screenshot and other data. The miscreants claimed to have copied data from Western Digital’s SAP Backoffice eCommerce facility and from a PrivateArk instance. All this suggests the group potentially had access to Western Digital’s internal systems for quite some time and gained privileged access.

WD did not comment on this.

WD’s CISO is Phil Malatras although he only took up that position in March, replacing Geoffrey Aranoff. Malatras was previously Chief Security Officer for its Federal division and a senior director of Global Information Security. He’s certainly having a baptism of fire in his CISO role.

We have contacted WD and asked if it has any comment to make about its progress in recovering from the incident and helping to prevent similar attacks on other organizations in the future.

My Cloud services came back online on April 12.

Burlywood software aims to supercharge SSDs

SSD controller software startup Burlywood is claiming its FlashOS software outperforms Kioxia, Micron, Samsung, SK hynix and Solidigm drives in write speed and latency consistency, saying it achieves vastly more drive writes per day as well.

Tod Earhart

Burlywood co-founder and CTO Tod Earhart told B&F in a briefing that existing SSD benchmarks were based on hard disk concepts and do not predict real-world SSD performance. Neither do IOPS and bandwidth numbers based on 100 percent random or sequential read and write workloads. Real-life workloads are mixed – reads and writes, random and sequential IO.

A processor’s performance does not degrade over time but commercially available SSDs do. Drives from all the major manufacturers exhibit pronounced declines in their write bandwidth performance as they age and experience more drive fills. Earhart provided a chart he said showed this:

Burlywood write performance

The chart appears to indicate drives from Kioxia and others exhibit a 20 to 75 percent decline in bandwidth as they age, measured in drive fills. In contrast, Burlywood claims its own drive, the DC-2122 (light blue curve on the chart), has a higher, steadier, and more predictable write bandwidth.

Write amplification factor

These measurements are taken using an asymmetric hot/cold mixed workload. Hot/cold refers to a workload characteristic in which an SSD has so-called hot regions of cells that are written more frequently than cold regions. Internal SSD wear-leveling processes counteract this but use SSD controller time. Because of an SSD’s internal processes and its block-erase-write property, an SSD can need to write more data internally than is sent to the drive. As NAND cells wear out, this is not a good thing.

Wear leveling and garbage collection cause additional writes, such that 1MB of data sent to the drive can result in, for example, 1.1, 1.25 or 1.5MB of data actually being written – the amount of write data is amplified. The write amplification factor should be as close to 1 as possible, with 1 indicating that one unit of data sent to the drive causes one unit of data to be written inside the drive. That is rarely the case, and certainly not in real-life production workloads.
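A quick worked example of those figures, treating the write amplification factor as NAND bytes physically written divided by host bytes sent:

# Write amplification factor (WAF) = bytes written to NAND / bytes sent by the host.
def waf(nand_mb_written, host_mb_sent):
    return nand_mb_written / host_mb_sent

host_mb = 1.0                        # 1MB sent to the drive
for nand_mb in (1.1, 1.25, 1.5):     # example internal write volumes from the text
    print(f"{nand_mb}MB written internally -> WAF {waf(nand_mb, host_mb):.2f}")
# An ideal drive stays at WAF 1.0; real drives drift higher as they fill and age.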

Earhart measured the write amplification factor of different manufacturers’ drives using a simulated production workload and charted the differences:

Burlywood write amplification

This chart shows how the write amplification factor (WAF) changes as a drive ages through drive fills, a variation on the classic fresh-out-of-the-box performance phenomenon. It is pretty much the inverse of the previous chart, albeit with Kioxia missing. Earhart says Burlywood-powered drives have a write amplification factor advantage which results in a 1.5x to 4x longer drive life than the other manufacturers on the chart.

This is like having an electric vehicle with, say, a 250-mile range limit. Give it a controller software upgrade and the range increases. That’s what Burlywood claims its controller software achieves with SSDs – extends their endurance.

Earhart looked at how his controller software would lower the WAF and so increase the number of drive writes per day with different classes of NAND in Samsung drives:

  • Enterprise TLC – Samsung – 1.6 DWPD with Burlywood delivering 4.9 DWPD
  • Consumer TLC – Samsung – 0.5 DWPD with Burlywood delivering 1.6 DWPD
  • QLC – Samsung – 0.2 DWPD with Burlywood delivering 0.8 DWPD

He also measured Samsung and Burlywood-powered drives on how many TB/day of writing they could support during a five-year warranty period:

Burlywood TB written per day

The differences appeared to be significant. A 32TB Samsung drive built from enterprise-class TLC NAND supported 51TB/day for five years, but Burlywood’s software apparently increased this to 156TB/day, a more than 3x increase.
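A quick sanity check of those numbers, assuming TB written per day is simply drive capacity multiplied by the drive writes per day (DWPD) ratings quoted above:

capacity_tb = 32                 # the enterprise TLC drive in the example
print(capacity_tb * 1.6)         # 51.2 -> matches the ~51TB/day Samsung figure
print(capacity_tb * 4.9)         # 156.8 -> matches the ~156TB/day Burlywood figure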

That would mean a buyer needing a 45TB/day capability could save money by using consumer-grade TLC flash instead of costlier enterprise-grade flash. Or, with an 8TB/day need, could use even cheaper QLC flash rather than either eTLC or cTLC.

Latency

An SSD’s ability to respond quickly to read and write requests, its latency, can also be affected by its internal processes. The quicker they are carried out, the lower the SSD’s latency and the more consistent it is as a drive ages. Earhart measured latency time over a drive’s life with the same manufacturers and discovered a spiky picture:

Burlywood write latency

The Burlywood-controlled SSD provided consistently low latency past 800 drive fills. Solidigm was close, being very much better than SK hynix, Micron, Samsung and Kioxia, which were each progressively worse. A CSP/MSP or SaaS provider needing consistently low latency (fast responsiveness) to data access needs for its customer-facing applications would look at this chart and blanch.

Is this for real? Burlywood uses a simulated production workload after all. But if customers are experiencing lower and inconsistent performance over time and this continues for hundreds of drive fills then it is worth examining SSD latency with the real-world workload.

Customer characteristic

Customers would need to be deploying a hundred or more drives to make using Burlywood software worthwhile as they have to effectively customize drives by buying them and then loading the Burlywood controller software. They could buy SSDs from a Burlywood OEM such as Swissbit or they could have their own drives built by an intermediary, a contract manufacturer, for example, but that could require, say, a thousand-plus drives to make it worthwhile.

Commercially available SSDs are not presented with real-life performance data over the life of the drive. Presuming Burlywood’s data stands up, customers using its controller software with a sufficient number of drives would get longer endurance, consistently lower latency, and a much lower total cost of ownership.

Read more about Burlywood’s thinking here.

Storage news ticker – April 14

Managed infrastructure solutions provider 11:11 Systems announced GA of the fully-managed 11:11 Managed SteelDome in partnership with SteelDome Cyber. This provides secure, scalable and cost-efficient storage of customers’ unstructured, on-premises data and uses SteelDome’s InfiniVault storage gateway for on-premises data storage, protection and recovery. 11:11 Managed SteelDome is initially available in the United States and United Kingdom, anywhere 11:11 Object Storage 2.0 is offered. Learn more about 11:11 Managed SteelDome here.

Aiven has launched free plans for its open source database services, Aiven for PostgreSQL, Aiven for MySQL, and Aiven for Redis. They are available to anyone with technical support through a newly launched community forum. The free plans include dedicated instances, daily backups, Terraform integration and more. Once initiated, Aiven free plans can be upgraded to larger, paid plans as required.

eBook by data storage virtualizer Alluxio

Data storage virtualizer Alluxio has produced a 14-page eBook called “The Ultimate Guide to Saving Data Egress Costs in the Cloud”. Alluxio asks: “Did you know major cloud service providers encourage you to put data in the cloud and charge you to get it out? Discover everything you need to know about data egress costs and never be surprised by a bill again.” It suggests five ways to somewhat reduce egress costs; they won’t go away.

Alluxio cloud egress charges

Data storage virtualizer Arcitecta has launched its Point in Time Ransomware Rapid Recovery Solution for the media and entertainment industry. It provides studios with instant recovery from a ransomware attack, works across a studio’s existing production storage systems, is cost-effective, and is easy to deploy and use. Arcitecta will showcase the Point in Time Ransomware Rapid Recovery Solution at this year’s NAB Show, April 17-19, in Las Vegas.

Catalogic’s CloudCasa unit has launched CloudCasa for Velero, and says Velero is the most popular and battle-tested open source data protection software in the market. CloudCasa for Velero has a single console giving enterprises and service providers multi-cluster management for Velero backups, across all Kubernetes distributions and hybrid and public cloud environments. Velero users can subscribe to the CloudCasa service and catalog their existing setup for centralized management within minutes. They can centrally manage and monitor their current backups and configuration across multiple clusters and cloud providers, perform guided recoveries from both existing and new recovery points, perform cross-cloud recoveries, and address enterprise management, governance, and compliance requirements for their Kubernetes data protection environments.

Data protector Cobalt Iron has updated its Compass enterprise SaaS backup offering with data governance capabilities: policy-based controls and an approval framework for decommissioning systems and deleting data. It says these are unique to Compass and will deliver defensible data deletion and system retirement. They allow automated and auditable systems decommissioning and deletion of associated data – bringing simplicity, data governance, historical tracking of all operations, and an approval framework to the decommissioning process.

Data protector Commvault and Microsoft have commissioned an independent study from Enterprise Strategy Group showing real-world cost savings based on their joint software/service use. The two say that Commvault/Azure protects more than 1 exabyte of data. The study, titled “Analysing the Economic Benefits of Data Protection with Commvault on Microsoft Azure,” found that customers report:

  • 30 percent cost savings
  • A 38 percent reduction in data footprint
  • Thousands of hours of staff time saved 
  • Improved data security. 

DIGISTOR says that the National Security Agency has added its FIPS 140-2 L2 SSDs to the Commercial Solutions for Classified list. The DIGISTOR drives are the first and only commercial SSDs to achieve this listing, making them, it says, the only affordable, secure storage option for building data-at-rest (DAR) solutions that meet top-secret NSA-level requirements.

Hammerspace has qualified as a data orchestration system and global file system to create a global data environment for the Autodesk Flame finishing and visual effects (VFX) product family. Hammerspace is demoing this at NAB 2023 (booth N3034) and showing an Autodesk Flame artist running multi-site finishing workflows in the cloud across multiple AWS cloud instances on the Hammerspace parallel global file system with automated content orchestration. By unifying access to all content via a single global namespace, workflows can be rapidly provisioned with unprecedented flexibility to adapt to changing performance and cost requirements, even to span multiple cloud regions when needed.

Recovery Monkey blogger Dimitris Krekoukias has written his third Alletra MP blog, this one about its file storage capabilities. He says the VAST Data software architecture matched HPE’s architectural vision and HPE decided to OEM it. “Other offerings from the competition, like Dell EMC’s PowerScale (the artist formerly known as Isilon) or Pure’s FlashBlade, follow the architecture of having capacity per node, similar to how HCI works – and susceptible to similar challenges:

  • Lose the node, or even upgrade it – you lose the capacity on that node, and any resiliency tied to that capacity
  • No ability to have dissimilar type compute and/or capacity nodes
  • No ability to scale things independently
  • Poor data reduction due to the complexities of handling cluster-wide metadata
  • No ability to share hardware with other solutions (for example, a Pure FlashBlade can’t share hardware with a Pure FlashArray, and one can’t ever convert from one to another if there’s some large excess capacity available).”

Read the post for more insight.

High-end block array supplier Infinidat’s InfiniBox and InfiniGuard products have been integrated with Veeam’s Kasten K10 Kubernetes data backup software for container-based workloads. InfiniGuard is integrated with Veeam Backup & Replication v12 and is selectable as a deduplication storage appliance directly from the Veeam console.

NAND fabber Kioxia says it aims to achieve net-zero by its fiscal 2050 in terms of its Scope 1 greenhouse gas (GHG) emissions (direct emissions from its business sites) and its Scope 2 emissions resulting from its use of purchased energy. That gives it 27 years to achieve this. It has set a target of procuring 100 percent of its energy from renewable sources by fiscal 2040.

GPU-enhanced storage array shipper Nyriad says Trinity Broadcasting Network (TBN) has chosen Nyriad’s UltraIO storage and DigitalGlue creative.space for its editorial and media asset management. The system supports more than 100 users and 1.5Mbps data streaming speed.

Other World Computing (OWC) has announced Jellyfish XT, a shared and portable NAS using flash-based storage with up to 360TB (720TB with extension) usable storage (up to 300TB of all-flash storage in a single head unit, expandable to 1.5 petabytes of total flash storage) and both 100Gb and 10Gb Ethernet connectivity. It supports 4K/8K/12K, VR, or AR bitrates and is being shown at NAB 2023 in Las Vegas, April 15-19. New Innergize software is designed to monitor the health of Atlas memory cards, with tools to sanitize cards and field upgrade the firmware.

The Certified by SANBlaze test suite update, available via v10.5 and v10.5.1, supports all aspects of NVMe qualification with the addition of Open Compute Project (OCP) 1.0a and Low Power Sub-states validation. Capabilities also include PCIe Single Root I/O Virtualization (SR-IOV), ZNS, VDM, TCG, SRIS/SRNS clocking, OPAL, and T10/DIF. A variety of updates for APIs and Linux are included.

Cloud data warehouser Snowflake has launched a Manufacturing Data Cloud for companies in automotive, technology, energy, and industrial sectors. They can collaborate with partners, suppliers, and customers in a secure and scalable way. It offers a single, fully managed, secure platform for multi-cloud data consolidation with unified governance and elastic performance that supports virtually any scale of storage, compute, and users. It allows manufacturers to break down data silos by ingesting both IT and OT data and analyzing it alongside third-party partner data. It enables data sharing and collaboration with partners for downstream and upstream visibility across an organisation’s entire supply chain coupling its own data with data from third-party partners and data from the Snowflake Marketplace.

Research house TrendFocus estimates that there were ~10.2 million nearline disk drives shipped in the first quarter compared to 10.51 million in the prior quarter and ~18.8 million a year ago. That’s a heck of a drop. Nearline capacity shipped was ~157 exabytes, down 36% y/y and -1% q/q. Total HDD ships in the quarter were 33.5-34.9 million, down 35% at the mid-point. Other HDD market segments are shown in the table below:

Thanks to Wells Fargo analyst Aaron Rakers for the source numbers

Veritas has hired Oliver Norman as VP for Channels and Alliances in EMEAI (I for India). Norman was previously Area VP at BMC Software heading up EMEA channel sales.

UK ISP Glide has partnered with virtualDCS to offer its CloudCover 365 and Veeam Cloud Connect to customers. CloudCover 365 is a Veeam-powered self-service Microsoft 365 backup portal. It backs up all aspects of Microsoft 365, including email, contacts, public folders, Teams, documents in OneDrive, and the intranet service, SharePoint. Glide has spent two years using Veeam Cloud Connect for its own in-house backup.

RAID software supplier Xinnor (xiRAID) is partnering with Tuxera (Fusion File Share) and Cheetah (RAID Raptor hardware) to provide protected NAS for media and entertainment customers. The three provide high performance for sequential and multi-threaded workloads over the SMB Direct protocol, along with integrity of media content. Fusion File Share by Tuxera is a high-performance, scalable, and reliable alternative to Samba and other SMB server implementations. The Cheetah RAID Raptor 2U (below) is a high-performance server.

Cheetah RAID storage

xiRAID software was tested on the Cheetah platform through Tuxera’s SMB Direct implementation on relevant media loads:

  • Up to 10GBps writes and reads across the network using the SMB Direct share
  • Up to 21GBps writes and up to 33.5GBps reads on the local test to the filesystem

The test used Frametest 4 stream, 4K video and sequential read/write across the network and locally, using SMB shares on the client (xiRAID RAID 5 with 5 NVMe drives and an NVIDIA Mellanox ConnectX-5 100GbE adapter). See the setup at NAB 2023, Las Vegas, April 15-19.

Yole image of Samsung 176-layer NAND chip

Semiconductor analyst Yole has looked into Samsung’s 176-Layer 3D NAND memory and produced a report with detailed photos, precise measurements, materials analysis, physical comparison of Generations 6 and 7, manufacturing cost analysis, supply chain evaluation, cost estimation and cost comparison. Buy it for €7,990 ($8,840) here.

Seagate arrives at the 22TB disk capacity level

Seagate has announced its first 22TB hard disk drive, nine months after Western Digital’s 22TB drives started shipping.

The IronWolf Pro 22TB was revealed by Seagate along with a QNAP partnership involving Seagate’s IronWolf Pro drives, Exos E JBODs and Lyve Cloud offerings.

Seagate’s new 22TB spinner uses conventional, non-shingled, non-HAMR, magnetic recording and is targeted at NAS, direct-attach and RAID disk drive environments. The drive has 10 x 2.2TB platters encased in a helium-filled enclosure. That additional platter, compared to the previous nine-platter 20TB model, is reflected in the increased weight – 690g/1.512 lb compared to 680g/1.499 lb.

Power consumption has risen as well. Average operating power is now 7.9W compared to 7.7W for the prior 20TB drive. Idle power is 6W; it was 5W.

Seagate 22TB IronWolf Pro

The new drive has substantial advances in reliability and workload compared to the prior 20TB product: 2.5 million hours MTBF (Mean Time Between Failures) rather than 1.2 million hours, and a 550TB/year workload limit compared to 300TB/year. Upgraded 20TB IronWolf Pros share these new capabilities.

But the sustained transfer rate of up to 286MBps is the same as before, as are the 7,200rpm spin speed and 6Gbps SATA interface.

The new drive features dual-plane balancing and time-limited error recovery, both said to help in RAID array deployments, as will the rotational vibration sensors. They also come with three years of complimentary Seagate Rescue Data Recovery Services to help recover data if the drives fail.

This 2TB incremental capacity addition of the IronWolf Pro comes as we wait for Seagate’s big capacity jump with HAMR (Heat-Assisted Magnetic Recording) technology. We might see 30TB HAMR drives next quarter.

We expect Seagate to announce 22TB Exos and SkyHawk variants soon as 20TB versions of these were announced along with the 20TB IronWolf Pro in 2021.

Toshiba is now the only HDD manufacturer without a 22TB model; its range tops out with the 20TB MG10. We expect an MG11 model to be announced in a few weeks or months to fill that hole. WD has been sampling 26TB shingled drives (with overlapped write tracks for increased capacity) and it would be no surprise for Seagate to announce a 24TB version of the IronWolf Pro using shingled recording.

QNAP will use the IronWolf Pros in its NAS products. QNAP’s enterprise ZFS-based QuTS hero NAS systems will support select models of Seagate Exos E series JBOD systems. The QNAP NAS systems will use HybridMount and Hybrid Backup Sync to store backups in Seagate’s Lyve Cloud.

The integration of QNAP NAS and Seagate Exos E Series JBOD systems will be available soon. The integrated Seagate-QNAP cloud and NAS offerings are available now, as are IronWolf Pro 22TB disks plus other variants with 20, 18, 16, 14, 12, 10, 8, 6, 4 and 2TB capacities. 

Download an IronWolf Pro 22TB datasheet here.

Databricks wheels in Dolly chatbot

Lakehouse shipper Databricks has updated its open-source Dolly ChatGPT-like large language model to make its AI facilities available for business applications without needing massive GPU resources or costly API use.

ChatGPT is a chatbot launched by OpenAI which is based on a machine learning Large Language Model (LLM)  and generates sensible-seeming text in response to user requests. This has generated an extraordinary wave of interest despite the fact that it can generate wrong answers and make up sources for statements it wants to use. ChatGPT was trained using a large population of GPUs and input parameters; the GPT-3 (Generative Pretrained Transformer 3) model on which it is based used 175 billion parameters.

Databricks sidestepped these limitations to create its Dolly chatbot, a 12 billion parameter language model based on the EleutherAI pythia model family. Its creators write on a company blog: “This means that any organization can create, own, and customize powerful LLMs that can talk to people, without paying for API access or sharing data with third parties.”

The Databricks team did this in two stages. In late March they released Dolly v1.0, an LLM trained using a 6 billion parameter model from EleutherAI. This was modified “ever so slightly to elicit instruction following capabilities such as brainstorming and text generation not present in the original model, using data from Alpaca.”

They say Dolly v1.0 shows how “anyone can take a dated off-the-shelf open source large language model (LLM) and give it magical ChatGPT-like instruction following ability by training it in 30 minutes on one machine, using high-quality training data.”

They open sourced the code for Dolly and showed how it can be re-created on Databricks, saying: “We believe models like Dolly will help democratize LLMs, transforming them from something very few companies can afford into a commodity every company can own and customize to improve their products.”

Now Dolly 2.0 has a larger model of 12 billion parameters – “based on the EleutherAI pythia model family and fine-tuned exclusively on a new, high-quality human generated instruction following dataset, crowdsourced among Databricks employees.” Databricks is “open-sourcing the entirety of Dolly 2.0, including the training code, the dataset, and the model weights, all suitable for commercial use.”

The dataset, databricks-dolly-15k, contains 15,000 prompt/response pairs designed for LLM instruction tuning, “authored by more than 5,000 Databricks employees during March and April of 2023.”

The OpenAI API dataset has terms of service that prevent users creating a model that competes with OpenAI. Databricks crowdsourced its own 15,000 prompt/response pair dataset to get around this additional limitation. Its dataset has been “generated by professionals, is high quality, and contains long answers to most tasks.” In contrast: “Many of the instruction tuning datasets released in recent months contain synthesized data, which often contains hallucinations and factual errors.”

To download the Dolly 2.0 model weights, visit the Databricks Hugging Face page; to download the databricks-dolly-15k dataset, visit the Dolly repo on databricks-labs. And join a Databricks webinar to discover how you can harness LLMs for your own organization.
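For readers who want to try it, here is a minimal sketch of loading Dolly 2.0 with the Hugging Face transformers library. It assumes the databricks/dolly-v2-12b model ID and a GPU with enough memory, so treat it as illustrative rather than Databricks’ official instructions.

import torch
from transformers import pipeline

# Assumes the databricks/dolly-v2-12b weights on Hugging Face and a large GPU.
generate = pipeline(
    model="databricks/dolly-v2-12b",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,   # Dolly ships a custom instruction-following pipeline
    device_map="auto",
)

print(generate("Explain what a data lakehouse is in two sentences."))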

+ Comment

A capability of Dolly-like LLMs is that they can write code, specifically SQL code. That could lead to non-SQL specialists being able to set up and run queries on the Databricks lakehouse without knowing any SQL at all. 

This leads on to two thoughts: one, SQL devs could use it to become more productive and, two, you don’t need so many SQL devs. Dolly could reduce the need for SQL programmers in the Databricks world.

Extend that thought to Snowflake and all the other data warehouse environments and SQL skills could become a lot less valuable in future.

WANdisco wins two contract renewals

Crisis-hit WANdisco has announced two contract renewals for its live data replication software and has revealed customer names and revenue numbers.

WANdisco is investigating “massive” and “potentially fraudulent” sales reporting for 2022, with its shares suspended from AIM in March. The company said in a statement earlier this month it had seen a 90 percent shortfall in bookings of $127 million – they were actually $11.4 million. Revenue for 2022 was expected to be $24 million but was actually $9.7 million, the statement added.

Against the background of the AIM suspension, recent resignations of its CEO and CFO, and the forensic accounting investigation, the news of contract renewals will be welcome.

The two renewals cover usage of WANdisco SVN MultiSite, source code management software that delivers active-active replication and LAN-speed performance over Wide Area Networks for global collaboration.

The company said yesterday that BMW Group, a WANdisco customer since 2016, had agreed to a continued multi-year license deal covering usage of WANdisco source code management tools, Subversion MultiSite Plus and Git MultiSite.

Analog Devices subsidiary Maxim Integrated, a customer since 2018, has agreed a five-year subscription license renewal.

Taken together, these two contract renewals are expected to deliver $1.5 million in revenue to WANdisco over the next five years. Approximately $1.0 million will be recognized in the 2023 financial year, with around $0.2 million to be recognized in fiscal 2024, and about $0.1 million recognized in each of the remaining three financial years.

Set against a potential bookings shortfall of $100 million-plus, these are small-scale deals but encouraging nonetheless.

ExaGrid claims uptick in customers in Q1

Privately held ExaGrid claims it had a record-breaking first 2023 quarter, with a customer count that’s now risen past 3,800.

These customers are using ExaGrid’s Tiered Backup Storage, purpose-built and scale-out backup appliances, with backups stored in a fast ingest-and-restore landing zone before being deduplicated for longer-term retention. ExaGrid competes with Dell’s PowerProtect and similar systems from HPE (StoreOnce), Quantum (DXi) and Veritas (NetBackup Flex) and, latterly, all-flash backup systems such as Pure’s FlashBlade.

President and CEO Bill Andrews said: “ExaGrid prides itself on having a highly differentiated product that just works, does what we say it does, is sized properly, is well supported, and just gets the job done. We can back up these claims with our 95 percent net customer retention, NPS score of +81, and the fact that 92 percent of our customers have our Retention Time-Lock for Ransomware Recovery feature turned on, and 99.2 percent of our customers are on our yearly maintenance and support plan.” 

ExaGrid growth history
ExaGrid’s growth history

We don’t get to see actual revenue numbers but ExaGrid claimed it experienced:

  • A 74.5 percent competitive win rate for the quarter
  • 141 new customers taking the customer count past 3,800
  • Over 60 six-figure new customer deals and three seven-figure new customer deals
  • Sales and support teams in 30 countries and customer installations in over 80 countries
  • Company has been cash, EBITDA, and P&L positive over the last 10 quarters
  • 40 percent of revenue is ARR

Andrews said: “ExaGrid is continuing to expand its reach and now has sales teams in over 30 countries worldwide and has customer installations in over 80 countries. We have also added dedicated sales teams for the large enterprise and large IT Outsourcer organizations. Outside of the United States, our business in Canada, Latin America, Europe, the Middle East, Africa, and Asia Pacific is rapidly growing.”

ExaGrid customer count, large deals

Andrews sent out an internal memo claiming ExaGrid has “the fastest ingest for the fastest backups and the shortest backup window.” It features the “Veeam Data Mover, Landing Zone (no inline dedupe), job concurrency for parallel backup jobs, Veeam SOBR (Scale-Out Backup Repository) for front-end performance load balancing, encryption with self-encrypting drives versus encryption in software, optimized file system for large backup jobs, etc.”

This means backup and restore speed, with Andrews claiming it beat a rival “in side by side testing by 2X,” adding: “They are not optimized for large backup jobs, they don’t use the Veeam Data Mover, they don’t do job concurrency, etc. etc. etc. It is not about the storage, it is about the software before the storage.”

OpenAI operating on Cohesity data structures

Data protector Cohesity is going to provide its data structures to OpenAI so that generative AI can be applied to threat and anomaly detection, potentially combating the ransomware plague.

Microsoft has invested in OpenAI and this Cohesity involvement is part of a suite of data protection integration activities between Cohesity and Microsoft, aimed at strengthening the appeal of Azure cloud services and Cohesity’s SaaS activities to enterprise customers. Microsoft has said it will invest $1 billion in OpenAI and has gained exclusive rights to use OpenAI’s GPT-3 technology in its own products and services. Cohesity and Microsoft have also unveiled three product/service integrations and made two Cohesity services available on Azure.

Sanjay Poonen

Cohesity president and CEO Sanjay Poonen supplied a quote: “Cohesity is integrating with Microsoft’s broad platforms across security, cloud and AI – all in order to help joint customers secure and protect their data against escalating cyberattacks.

“This expanded collaboration will make it simple for thousands of Microsoft customers and ecosystem partners to access Cohesity’s award-winning platform.”

The three integrations are:

  • Cohesity’s DataProtect backup and recovery on-prem product and backup as a service offering both integrate with Microsoft Sentinel, a cloud-native security information and event management (SIEM) platform.
  • Cohesity integrates Azure Active Directory and multi-factor authentication (MFA) for managing access to Cohesity products, including Cohesity Data Cloud and Cohesity Cloud Services.
  • Cohesity data classification is powered by BigID and BigID has integrated with Microsoft Purview to provide data discovery, privacy, security and governance intelligence.

Cohesity’s Fort Knox SaaS cyber vaulting offering is now available on Azure in preview form, with general availability due in the coming months. The Cohesity DataProtect Backup-as-a-Service (BaaS) offering supports Microsoft 365, and customers’ M365 data can be backed up to an Azure target destination.

Poonen’s company says it is already using AI to help customers detect anomalies indicative of an operating cyberattack, and believes AI can be used to analyze vast volumes of data at speed. It could enable IT and security operations to detect and respond to a security breach faster, with improved accuracy, and in a more rounded way, the firm claims.

Having an AI tool operate within a Cohesity data structure to better detect and respond to ransomware and other malware attacks could be a large improvement on current approaches.

Phil Goodwin, an IDC research VP, said: “We think integrating with Microsoft Azure will help Cohesity and its customers to stay a step ahead of cyber criminals through more intelligent security now and with other interesting use cases to follow.” 

It will also help Cohesity make progress against competition from Rubrik and other cyber-security suppliers. A Cohesity-Microsoft webinar will provide more information about the two companies’ integration and joint AI activities.