
Open source data lakehouse Dremio has a new boss

Open source data lakehouse supplier Dremio has hired ex-Splunk chief cloud officer Sendur Sellakumar as CEO and president.

Dremio was founded in 2015 and has grown rapidly as the need for analytics has become widespread and data warehouses were found to be too restrictive. It has taken in $410 million in VC funding with three consecutive rounds in 2020 ($70 million), 2021 ($135 million) and 2022 ($160 million). But, as we’ll see below, Databricks has raised a whole lot more.

The large language model hype should help Dremio grow. The company had been led by Billy Bosworth since February 2020, but he quietly departed in February this year to become a managing director at Vista Equity Partners.

Tomer Shiran, Dremio’s co-founder and CPO (co-founder Jacques Nadeau left in 2021), said: “We are thrilled to welcome Sendur Sellakumar to Dremio as our new CEO. Sendur’s exceptional leadership skills and extensive background in the technology sector make him the perfect fit to lead Dremio into its next phase of growth. His strategic mindset and customer-centric approach align with our vision for the company.”

Sendur Sellakumar, Dremio
Sendur Sellakumar

Sellakumar said: “I am honored and excited to join Dremio as its CEO. Dremio’s innovative approach to enterprise analytics and its commitment to empowering organizations with fast, flexible, and reliable access to their data is truly impressive. I am looking forward to working closely with the talented Dremio team to further accelerate the company’s growth and deliver exceptional value to our customers. We are committed to helping enterprise customers realize the value of their data in driving business outcomes.”

He spent nine years at Splunk and before that was a VP for investment banking at Morgan Stanley. Between Splunk and the new Dremio gig came a 17-month stint at ServiceTitan as SVP Commercial and Construction. ServiceTitan sells software for home and commercial service and construction contractors.

Billy Bosworth, formerly of Dremio
Billy Bosworth

Dremio competes intensely with Databricks, which just bought generative AI startup MosaicML for $1.3 billion. Databricks has vastly more funding than Dremio – an astonishing $3.6 billion, with $2.6 billion of that raised in two rounds in 2021. In the 2020-2022 period Dremio pulled in $365 million, still nothing to sneeze at.

Dremio told us: “In February of 2023, due to personal reasons, Billy Bosworth transitioned from the CEO role back into an Executive Board Member role, which is where he started with Dremio several years ago. Between February 2023 and Sendur’s appointment, Edward Sharp, Dremio’s CFO, was Dremio’s acting CEO in the interim.”

After Bosworth left, Dremio’s chief revenue officer, Alan Gibson, went in March, turning up at data science and AI learning system supplier DataCamp as its CRO in May. Dremio CFO Edward Sharp is the interim head of sales. Sellakumar has a CRO recruitment process to complete as well as devising a strategy to compete with rivals.

WANdisco delays footwork to raise funds

WANdisco
WANdisco

WANdisco has had to delay completion of its fundraising share placement because it discovered its company registration in the British Crown Dependency of Jersey doesn’t let it issue as many shares as it needs to without further approval.

The company needs $30 million from the share placement, which is now delayed until the end of July if shareholders approve. All of this because its sales leadership failed to detect a rogue senior rep allegedly booking vast fantasy orders during 2022.

The company is registered with an authorized share capital of £10 million divided into 100 million ordinary shares of £0.10 each. The fundraising involved issuing 114,726,069 ordinary shares – 14,726,069 more than the authorized total. These extra shares would be invalid if the fundraising went ahead without an increase in the authorized share capital.

Jersey company law says such an increase needs approval at a formal general meeting of the shareholders. So WANdisco will hold a general meeting on July 24 at 10:30am to seek approval of a resolution raising the authorized share capital to £30 million, divided into 300 million ordinary shares of £0.10 each. It will then amend its formal Jersey company memorandum accordingly, after which the fundraising share placement can go ahead.
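
For clarity, the share arithmetic as stated works out like this – a minimal Python sketch using only the figures quoted above:

```python
# Back-of-the-envelope check of WANdisco's share capital arithmetic,
# using only the figures quoted in this piece.
authorized_shares = 100_000_000          # current cap: £10m at £0.10 per share
shares_to_issue = 114_726_069            # shares the placement needs to create

excess = shares_to_issue - authorized_shares
print(f"Shares over the current authorized limit: {excess:,}")        # 14,726,069

# Proposed new authorized capital: £30m at £0.10 per share
new_authorized_shares = int(30_000_000 / 0.10)
print(f"Proposed authorized share count: {new_authorized_shares:,}")  # 300,000,000
print("Placement fits under the new cap:", shares_to_issue <= new_authorized_shares)
```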

The timing for the lifting of its AIM stock market suspension moves back to July 25 at 7:30am, with market admission and share trading starting later that day.

The fundraising is conditional on publication of WANdisco’s 2022 report and accounts, the lifting of the AIM suspension, and the general meeting voting in favor of the share capital increase. 

That vote will surely be a formality. Hopefully nothing else will go wrong and WANdisco will get its $30 million to help the company move on.

Storage old guard needs to have its cache and eat it too

Comment: Legacy external hardware-based storage suppliers need to adapt to technology threats to their businesses from three incomers – Infinidat, Pure Storage and VAST Data.

Update. Dell PowerMax gen 2 uses SCM for metadata. 6 July 2023.

Dell, Hitachi Vantara, HPE, Huawei, IBM and NetApp have proved adept at adopting or colonizing threats to their products. The NVMe all-flash array threat was dealt with by adopting NVMe technology and not a single standalone NVMe array startup has survived.

Public cloud storage has not killed on-premises external storage either. In fact legacy suppliers are colonizing it with their own file and block offerings as well as absorbing its pay-as-you-go business model for their on-premises users. They are all either looking at the public cloud as another place to run their storage software or tiering data to it. We can think of NetApp as leading this cloud colonization charge with Dell set on the same idea of having a storage environment span the on-premises and public cloud worlds.

Dell (APEX), HPE (GreenLake) and NetApp (Keystone) are all adopting public cloud-style business models, cloud management facilities as well as porting their storage array software to the public cloud.

Object storage has not limited file storage’s market share. File storage suppliers have added object storage technology to their product set and object storage suppliers have brought file interface layers to their products and life goes on as before.

HCI (server SAN) has not taken over the storage world. It has built its niche – witness VMware vSAN and Nutanix – and there it stays, co-existing with the external storage its evangelizing originators intended to replace.

But now the six main legacy storage players face three competitors – Infinidat, Pure Storage and VAST Data – and are not responding to them in technology adoption terms at all, with a single exception. 

Infinidat uses memory caching and clever cache pre-fetch technology to produce its highly performant InfiniBox arrays, either disk or SSD-based, and has built a great high-end array business that no one is directly responding to. None of the six legacy players have embraced Infinidat’s memory caching or pre-fetch software and Infinidat is basically left alone to grow its business, apart from the normal tactical per-bid competitive moves. 

Similarly for Pure and its technology. A mainstay is its use of proprietary Direct Flash Module (DFM) flash drives whereas the legacy players, with one exception, use off-the-shelf SSDs in their all-flash arrays. Hitachi Vantara used to have its own flash drive technology but reverted to commercial SSDs.

IBM has its own proprietary flash drives as well, FlashCore Modules, but it is not using them the way Pure uses its DFMs to drive FlashSystem sales higher. We say that because IBM’s storage hardware market share is flat or falling and has been overtaken by Pure Storage.

Pure is aggressively growing its business with things like non-disruptive upgrades, the Evergreen business model, and QLC flash adoption. Suppliers in the legacy six are adopting elements of this, but the core differentiator, the DFMs, remains unmatched. And Pure promises to ramp up their density faster than off-the-shelf SSDs, thus strengthening its advantage.

Like Infinidat, Pure’s core technology does not face much competition. IBM has the hardware base, the FlashCore Modules, to provide strong competition but does not seem to be doing so.

VAST Data has sprung onto the external array stage in the last few years and is growing faster than Pure did at the same stage in its development. It relies on its DASE (DisAggregated Shared Everything) architecture, single-tier QLC SSDs, and use of storage-class memory (SCM) to store metadata and buffer incoming writes, and is making huge inroads into the market.  DASE and SCM use have not been adopted by the legacy six and so VAST, like Infinidat and Pure, is left alone with its technology advantage to win many more bids than it loses.

Except by HPE, which is now OEMing VAST technology. 

Interestingly, both Pure and VAST have a disadvantage porting their array software to the public cloud. None of AWS, Azure, or Google supports Pure’s Direct Flash Modules, so Pure’s software has to run on cloud instances using commodity hardware. Similarly, none of the cloud titans offers storage instances based on storage-class memory, so VAST software ported to the cloud could not use it.

The legacy players could adopt memory caching and pre-fetch algorithms for their existing arrays – it’s only software. But switching to proprietary flash drives would be a major change, virtually impossible in practice, so Pure will surely not find that part of its technology advantage adopted by the legacy players, apart from IBM, which has it already. The other legacy players could adopt host-level drive management, though. Again, it’s software and hence more feasible.

VAST is in a similarly defensible position as having existing filers adopt a DASE architecture involves wholesale redesign. More likely is that the legacy vendors will explore development of their own DASE/SCM technologies and, if successful, bring out a new product line.

Such things can be done. Look at Quantum which has only recently introduced its Myriad unified file and object storage software running atop commodity all-flash hardware. 

Intuitively, we would expect Dell and Huawei to be among the first to do this.

Bootnote

A source close to Dell told me: “The 2nd gen Dell PowerMax uses SCM memory for metadata. This is documented in the Product Guide. It is interesting given the announced death of Optane memory. I assume Powermax does this to avoid the penalty of increasing battery requirement to support the vaulting architecture as inherited from EMC Symmetrix and VMAX.”

Researchers devise even faster 3D DRAM

Tokyo Institute of Technology scientists have designed a 3D DRAM stack topped by a processor that they say can provide four times the bandwidth of High Bandwidth Memory (HBM) at one fifth of the bit access energy.

HBM sidesteps the socket-count limitation on CPU-attached DRAM capacity by connecting small DRAM stacks to the CPU via an interposer layer. In HBM, each DRAM die is connected to the ones above and below it by microbumps (connectors), with through-silicon vias (TSVs) passing through a die to link the microbumps. The Tokyo Tech team’s Bumpless Build Cube 3D (BBCube 3D) concept does away with those microbumps.

Professor Takayuki Ohba, the research team lead, said: “The BBCube 3D has the potential to achieve a bandwidth of 1.6 terabytes per second, which is 30 times higher than DDR5 and four times higher than HBM2E.”

The researchers thinned each DRAM die and did away with the microbumps in their BBCube 3D wafer-on-wafer (WOW) design. This enables a memory block running at higher speed and lower energy than either a DDR5 or an HBM2E (High Bandwidth Memory gen 2 Extended) design, as those run hotter and their bumps add resistance, capacitance, and delay.

HBM microbumps take up space and a die has to be stiff enough to withstand the pressure when the stack layers are bonded together. By eliminating them, each memory die can be made thinner and the TSVs shorter, which aids thermal cooling. There is no need for an interposer in the BBCube3D design as a processing unit, CPU or GPU, is bonded directly to a cache die which itself is bonded to the top of the DRAM stack.

BBCube 3D DRAM

The researchers say: “TSV interconnects with a short length provide the highest thermal dissipation from high-temperature devices such as CPUs and GPUs … high-density TSVs act as thermal pipes, and, hence, a low temperature, even in a 3D structure, can be expected.”

The BBCube “allows high bandwidth with low power consumption because of the short length of TSVs and high-density signal parallelism.”

Crosstalk in the layered DRAM was reduced by ensuring adjacent IO lines were out of phase with each other by adjusting their timings. This is called four-phase shielded inputs/outputs and means an IO line is never changing its value at the same time as its immediately neighboring lines.

Its speed and energy use were compared to those of DDR5 and HBM2E memory technologies. The chart shows a 32x bandwidth increase over DDR5 memory and a 4x speedup over HBM2E. At the same time the BBCube 3D design achieved a lower access energy rating than both DDR5 and HBM2E as well.

DRAM speeds

Ohba said: “Due to the BBCube’s low thermal resistance and low impedance, the thermal management and power supply issues typical of 3D integration can be relieved. As a result, the proposed technology could reach a remarkable bandwidth with a bit access energy that is 1/20th and 1/5th of DDR5 and HBM2E, respectively.”

This BBCube 3D is a university-level research project. A whole lot of detailed background information about the project can be found in an MDPI Electronics paper, “Review of Bumpless Build Cube (BBCube) Using Wafer-on-Wafer (WOW) and Chip-on-Wafer (COW) for Tera-Scale Three-Dimensional Integration (3DI).” It says: “BBCube allowed stacking of 4-times more dies than HBM. This allowed the memory capacity to reach 64GB using 16Gb DRAM dies.”

It also said “terabit-capacity 3D memory can be realized by stacking 40 layers” of DRAM.
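
A quick sanity check of those figures, as a minimal Python sketch – the eight-die HBM baseline and the DDR5-6400 bandwidth are our assumptions, not values taken from the paper:

```python
# Sanity check of the capacity and bandwidth claims above.
# Assumptions (ours, not the paper's): a typical HBM stack holds 8 DRAM dies,
# and a DDR5-6400 channel delivers roughly 51.2 GB/s.
hbm_dies = 8
bbcube_dies = 4 * hbm_dies            # "4-times more dies than HBM"
die_capacity_gb = 16 / 8              # a 16Gb die holds 2GB

print(f"BBCube stack capacity: {bbcube_dies * die_capacity_gb:.0f} GB")  # 64 GB

bbcube_bandwidth_gbps = 1600          # the 1.6 TB/s claim
ddr5_gbps = 51.2
print(f"Ratio vs DDR5: {bbcube_bandwidth_gbps / ddr5_gbps:.0f}x")        # ~31x
```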

The BBCube 3D concept is described in a paper entitled “Bumpless Build Cube (BBCube) 3D: Heterogeneous 3D Integration Using WoW and CoW to Provide TB/s Bandwidth with Lowest Bit Access Energy” presented at a June 2023 IEEE 2023 Symposium on VLSI Technology and Circuits.

WANdisco raises funds for new dancefloor turn

Ailing WANdisco has pulled off the key event necessary for its recovery – raising $30 million – and is applying to have its share trading suspension lifted.

The replication software company faced running out of cash later this month following a devastating deception by a senior sales exec who inflated their sales orders in 2022 such that reported $24 million revenues were actually $9.7 million. The discovery of this in March caused exec exits, share trading suspension, forensic accountancy, and the recruitment of CEO Stephen Kelly and CFO Ijoma Maluza.

Investors faced the company collapsing and losing their cash. Incoming chairman Ken Lever brought in Kelly, a major WANdisco investor himself, and Maluza, to set about discovering the true state of the business, stabilizing it, and working out how, and if, it could recover and move forward.

Stabilization included a 30 percent headcount cut, about 50 people, and other cost-saving measures to reduce its cost base from $41 million to around $25 million.

Top brass also had to persuade the existing investors that recapitalizing the company was the most practical way to not only restore the value of their shares but potentially increase it. Prior to suspension, shares traded at £13.10 ($16.68). Persuasion worked and a total of 21,566,527 new Ordinary shares, priced at 50p ($0.64) each, were placed, the company said today. The stock was not available to the general public as WANdisco is a Jersey-incorporated company and Jersey could not provide the necessary documents for a retail (public) cash raise to take place before WANdisco ran out of money.

Kelly himself bought 850,000 of the new shares. Chairman Ken Lever put in for 200,000 while CFO Maluza bought 3,000. Co-founder, ex-chairman and ex-CEO Dave Richards, who led the company until earlier this year, owned 2.7 percent of WANdisco’s ordinary shares, 1,836,867 of them, at the end of 2022. It is not known if he contributed to the fundraising.

WANdisco now has to pull itself up by its bootstraps. It will publish its formal 2022 accounts in a day or so and expects share trading to resume on July 7.

Kelly and Maluza are working on a turnaround plan “to reposition the business for the future and which aims to deliver organisational stability, credibility, customer satisfaction and revenue growth.” They say WANdisco has two differentiated product sets: Application Lifecycle Management (ALM) and Data Integration (live data replication) which includes migrating data to the public cloud.

“The key growth for the Group is expected to come from Data Migration with Tier 1 Cloud Partners,” the company said.

The Data Migration products have a claimed petabyte scale, zero latency, heterogeneous cloud environment capability and a fast time to value. They are supported by 91 patents (registered and pending).

Fiscal 2023 revenues will be impacted by the turnaround plan but “strong year on year growth [is] expected in FY24, and cash break even targeted to occur during FY24 to FY25.”

Kelly is working on renaming the company as the WANdisco brand is now tarnished by recent events. The results of this work should be revealed by the company’s next AGM.

Bootnote

WANdisco has never made a profit in the 18 years since its 2005 founding. It went public on the UK AIM market in June 2012 with shares priced at £1.99 ($2.53), rising to an incredible £15.20 ($19.34) in December 2012. The hype started dissipating with a price plunge starting in December 2014. Shares fell to £2.84 ($3.61) a year later, and £1.76 ($2.24) in October 2016. Those seem like heady days now.

Storage news ticker – July 4

Data protector Acronis has released its Mid-Year Cyberthreats Report, From Innovation to Risk: Managing the Implications of AI-driven Cyberattacks. It found that there were 809 publicly mentioned ransomware cases in Q1 2023, with a 62 percent spike in March over the monthly average of 270 cases. 30.3 percent of all received emails in the quarter were spam and 1.3 percent contained malware or phishing links. Phishing remained the most popular form of stealing credentials, making up 73 percent of all attacks. Business email compromises (BECs) were second at 15 percent.

Analytics supplier Amplitude has launched a Snowflake-native product that utilizes Snowpark Container Services (private preview) to enable analysis inside a customer’s Snowflake instance. Capabilities and integrations include:

  • Automated instrumentation and reporting: Instantly track user activity and baseline metrics like daily active users, average session length, and geolocation with Amplitude’s enhanced SDK.
  • Industry-specific reporting templates: Customers can discover product insights tailored to their business with reporting templates that include industry-specific key metrics.
  • AI-assisted data governance: Maintain and continually improve data quality with automation and intelligent suggestions.
  • Expanded partner integrations: Quickly send data to other parts of your stack with expanded integrations with Braze, Hubspot, Intercom, Marketo, and more.

Sign up for the Early Access Program for Snowflake-native Amplitude here.

Data protector Cobalt Iron has been awarded US Patent 11636207 which enables automated health remediation of various failures and conditions affecting storage devices and backup operations. The patent will enable the discovery of interdependencies between various components of a backup environment (such as storage devices at multiple locations including the cloud), monitoring of failures and threat conditions, impact analysis to interrelated components, and automated health remediation actions. It incorporates health issues specific to storage devices that will automatically trigger remediation actions, and will be implemented in Cobalt Iron’s Compass enterprise SaaS backup offering.

Data integrator Crux has expanded its partnership with Databricks by adding pre-integration services to its existing Partner Connect integration. It is connecting 75 financial and alternative data sources to the Databricks Marketplace, making it easier for consumers to access analytics-ready external datasets. Through Databricks Delta Sharing, data suppliers partnered with Crux are sharing traditional financial datasets (including credit risk, stock exchange/equity pricing, corporate actions reference, and index data) and alternative datasets (including sentiment and ESG) securely with customers.

Data lakehouse supplier Dremio has announced new features for querying, performance, and compatibility enhancements that include:

  • Iceberg table optimization with SQL commands such as OPTIMIZE, ROLLBACK and VACUUM to optimize performance and streamline data lake management.
  • 40 percent better data compression with native Zstandard (zstd) compression.
  • Tabular UDFs: Tabular User-Defined Functions enable users to extend the native capabilities of Dremio SQL and provide a layer of abstraction to simplify query construction. 
  • New mapping SQL functions: CARDINALITY returns the number of elements in a map or list and helps customers moving array workloads from Presto and Athena; ST_GEOHASH returns the corresponding geohash for given latitude and longitude coordinates; FROM_GEOHASH returns the latitude and longitude of the center of a given geohash. Both geohash functions help customers move workloads from Snowflake, Amazon Redshift, Databricks, and Vertica. Geohashing guarantees that the longer the shared prefix between two geohashes, the spatially closer they are (see the sketch after this list).
  • Dremio now supports multiple Delta Lake catalogs including Hive Metastore and AWS Glue, providing a unified data lake experience across the organization.
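
To illustrate the geohash prefix property mentioned in the mapping-functions item above, here is a minimal, generic geohash encoder – our own Python sketch, not Dremio code; the coordinates are arbitrary examples:

```python
# Minimal geohash encoder (the generic algorithm, not Dremio's implementation),
# illustrating that nearby points share a long geohash prefix while distant
# points diverge within the first character or two.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash(lat, lon, precision=9):
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    bits, use_lon = [], True                  # bits alternate, starting with longitude
    while len(bits) < precision * 5:
        rng, val = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        bits.append(1 if val > mid else 0)
        if val > mid:
            rng[0] = mid
        else:
            rng[1] = mid
        use_lon = not use_lon
    return "".join(BASE32[int("".join(map(str, bits[i:i + 5])), 2)]
                   for i in range(0, len(bits), 5))

def shared_prefix_len(a, b):
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

nearby_1 = geohash(57.64911, 10.40744)        # a point in Denmark
nearby_2 = geohash(57.65000, 10.41000)        # a short distance away
far_away = geohash(-33.86880, 151.20930)      # Sydney
print(nearby_1, nearby_2, far_away)
print("Nearby points share", shared_prefix_len(nearby_1, nearby_2), "leading characters")
print("Distant points share", shared_prefix_len(nearby_1, far_away), "leading characters")
```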

Struggling data protector FalconStor is to move from the OTCQB tier to the Pink Current tier, both operated by the OTC Market Group. The public compliance costs are too high compared to its revenues. FalconStor’s board has determined that the burdens associated with operating as a registered public company outweigh any advantages to the company and its stockholders. It will continue to provide sufficient information to its shareholders in order to continue enabling a trading market for its common stock within the OTC Pink Current trading market. This means FalconStor will no longer have to issue quarterly revenue reports.

Tom’s Hardware reports that Intel has extended the end-of-life period for its Optane PMem 100 drives from September 30 to December 29 this year. An Intel statement to customers said: “Customers are recommended to secure additional Optane units at the specified 0.44 percent annualized failure rate (AFR) for safety stock. Intel will make commercially reasonable efforts to support last time order quantities for Intel Optane Persistent Memory 100 Series.” 

SaaS cloud data protector Keepit, with a blockchain-verified system, has announced a backup and recovery service for Microsoft Azure DevOps. It protects Azure DevOps Boards (including work items, boards, backlogs, team sprints, queries, and delivery plans), plus Pipelines data and metadata. Keepit provides immutable storage of data in its ISO 27001-certified private cloud, giving a long-term immutable archive or escrow copy of sensitive ADO data.

Information management supplier M-Files says it’s been recognized by customers as a Strong Performer in the June 2023 Gartner Peer Insights ‘Voice of the Customer’: Content Services Platforms report. It’s one of four vendors in the report to receive the distinction. You can download a complimentary copy of the report here (registration required). 

The Protocol Labs/Filecoin Distributed Storage Alliance (DSA) has launched Golden Gate, an advanced software and reference configuration which it says reduces the costs associated with decentralized file storage by almost half (40 percent). The Filecoin Network relies on “sealing”; computations performed on the data being onboarded in order to ensure network security and enable cryptographic proving, verifiability, and data immutability. Open source Golden Gate directly addresses this issue and reduces the server costs associated with sealing by up to 90 percent. The only known DSA members are Protocol Labs, the Filecoin Foundation, AMD, Seagate, and consultancy EY.

Storage supplier Solidigm is talking up its coming 60TB QLC SSD in an E3.S form factor, saying Supermicro’s Petascale server platform will use it to provide a petabyte in a 1RU form factor. These Supermicro servers have 16 x E3.S slots, so an individual Solidigm E3.S QLC drive would need a 62.5TB capacity – or, more likely, 60TB, with the petabyte claim being a rounding-up calculation. DDN talked about coming 60TB SSDs in May. Solidigm has clearly been talking to its OEM partners. We expect it to formally announce its 60TB SSDs in a few weeks.

Swissbit has introduced e.MMCs and SD memory cards with capacities ranging from 4 to 8 GB using 3D-NAND in TLC format. It says they’re optimized for power failure protection, industrial reliability, and high endurance, and rival the performance of more expensive SLC models and outperform corresponding MLC variants. There are EM-30 and S-56(u) Series products.

Swissbit SD storage
Swissbit’s EM-30 and S-56(u) Series products – not to scale

The EM-30 product series (153 ball BGA) complies with e.MMC-5.1 specifications, and is designed for an extended temperature range of -40°C to 85°C and is suitable for use in harsh environmental conditions. The range offers sequential data rates of up to 280MBps read and 120MBps write, as well as 11k IOPS random read and 16k IOPS random write.

The S-56(u) SD and microSD memory cards attain sequential read and write data rates of 95MBps and 74MBps respectively. The cards are optimized for high-performance applications with random read IOPS of 2,200 and random write IOPS of 1,300. With advanced over-provisioning and pSLC technology, the S-56 series offers the highest endurance for write-intensive workloads. Combined with up to 100,000 PE cycles, its endurance for small data logging even surpasses that of an SLC card for the first time.  

Peter Lieberwirth, Toshiba
Peter Lieberwirth

Toshiba Electronics Europe GmbH has announced Peter Lieberwirth as president and CEO, effective July 1. He takes over from Tomoaki Kumagai, who has been promoted to a global role as Corporate Vice President, General Executive, Global Strategy & Business Development, Japan.

Weebit Nano and SkyWater have announced that Weebit’s ReRAM IP is now fully qualified in SkyWater’s 130nm CMOS (S130) process. The Weebit ReRAM IP was qualified for:

  • High endurance: 10K flash-equivalent cycles
  • Data retention: 10 years at industrial grade temperatures
  • Retention after cycling, exhibiting robust lifetime performance
  • 3x SMT solder reflow cycles

Weebit ReRAM technology in S130 is an ultra-low power, radiation tolerant and fast NVM that companies can use in developing highly integrated SoCs for applications including analog/mixed-signal, IoT, automotive, industrial, medical and more.

Cubbit storage is a hybrid of Web2 and Web3

Cubbit uses Swarm technology – a term also used by the DataCore-acquired Caringo – as the basis of its object storage.

We asked the startup more about this, as well as some questions about how it relates to Web3 decentralized storage providers such as Storj and Filecoin (Protocol Labs).

Web3 storage typically uses cryptocurrency and blockchain tech to present a file and object storage resource distributed across a network of individual providers, using their spare capacity, presented as a single storage repository. The stored data – fragments of a file or object – is validated with blockchains, and the providers are paid in cryptocurrency for storing data and making capacity available.

Alessandro Cillario, Cubbit
Alessandro Cillario

A Q&A with Cubbit co-CEO and co-founder Alessandro Cillario about its DS3 product follows.

Blocks & Files: Is Cubbit a Web3 distributed storage supplier?

Alessandro Cillario: Cubbit’s solution is not fully decentralized. There is a centralized component, the Coordinator, which manages metadata, S3 gateways, and Swarms (clusters of storage nodes). This gives the user better control and is easier to manage.

Blocks & Files: How does Cubbit distinguish its offering from that of Storj, which also uses existing datacenter resources for its distributed storage?

Alessandro Cillario: Some of the basic concepts are similar, but our solution is much more sophisticated in one critical aspect. 

Cubbit enables data sovereignty for users, through utilizing public or private Swarms. Public Swarms can be considered as ‘regions’, clusters of capacity nodes concentrated in a single ‘country’. Concurrently, we can build private Swarms. This empowers users with full control over their infrastructure and costs, while at the same time taking advantage of a cloud storage service.

There is much more than that, and more details will be revealed in October.

Blocks & Files: Does Cubbit use blockchain technology?

Alessandro Cillario: No, because for our product and architecture, blockchain wouldn’t really bring any major advantages.

Blocks & Files:  How does Cubbit differentiate itself from Web3 storage suppliers such as Filecoin?

Alessandro Cillario: While most Web3 providers have a good price point for raw storage, Web3 is lacking on the performance side. In addition, enterprise solutions require S3 to operate, which results in an even slower object store. The alternative is to use native APIs and risk vendor lock-in. No matter how its use is intended, the number of potential use cases is very limited. Today, enterprises want more flexibility and control.

Flexibility, edge, multi-cloud, end-to-end control over cost, infrastructure, and data are Cubbit’s differentiators, and this is true also when we compare our solutions against traditional on-premises and cloud object stores. 

Cubbit graphic

Blocks & Files:  What is Cubbit’s performance when storing files and when retrieving files?

Alessandro Cillario: Our customers report good performance from the public gateways Cubbit provides, in line with cloud storage services that boast speed of access as one of their primary capabilities.

Blocks & Files: How does Cubbit’s Swarm technology relate to DataCore’s Caringo-based Swarm software?

Alessandro Cillario: The name could create confusion, but the technology is entirely different. Cubbit is a decentralized storage solution, with nodes that can be dispersed over large distances. Traditional object storage solutions, like Datacore Swarm, are scale-out clusters that can work only when the nodes are very close to each other.  

In the end, our customers love Cubbit DS3 because we give them the flexibility of a cloud solution with all the control they need over data, infrastructure and costs.

There’s more on Cubbit and its customers here.

Cloudian sacrifices some snapshots for software update

Object storage supplier Cloudian has released an update to its HyperStore object storage software, version 7.5.1. The new release focuses on improving security, protection, and search capabilities while temporarily removing snapshots.

Cloudian introduced HyperFile functionality in 2017 after acquiring Infinity Storage, an Italian firm, to expand file access and cloud storage capabilities in HyperStore. Caterina Falchi, the CEO and founder of Infinity Storage, joined Cloudian as the VP of File Technologies.

Although there was no formal announcement of the release, Cloudian’s CMO, Jon Toor, shared details in a blog post. He highlighted that HyperStore 7.5.1 introduces HyperStore File Services – a streamlined approach to data management. This update seamlessly integrates file cache functionality into the Cloudian HyperStore object storage system, allowing users to access files quickly from either all-flash or HDD-based servers, creating a scalable central repository.

Cloudian HyperStore
Cloudian diagram of HyperStore v7.5.1 File Services

In 2018, HyperFile supported SMB (CIFS)/NFS, snapshot, WORM, non-disruptive failover, scale-out, POSIX compliance, and Active Directory integration. The File Services software has been re-written to be containerized and is being more closely integrated with HyperStore but, in the process, some functionality has temporarily gone away while new features are added. 

Version 7.5.1 introduces support for VSS (Volume Shadow Copy Services) snapshots for file versioning, but other snapshots are no longer available in this release. Snapshot functionality is planned for future updates, along with geographic distribution support, which will enable multiple sites to share a common file and object namespace with a global file cache.

Regarding the changes, Jon Toor told us that unified management is the most significant development. “Object and file are now managed via a single UI rather than separate UIs. As a unified platform, it is now also fully monitored via Hyper IQ, our observability and analytics platform. The underlying architecture is also now fully K8s based. This foundation will allow us to do further hardware integration down the road. Right now, object and file use separate hardware nodes, but with common management.” 

Cloudian plans to make a public announcement about the release in September, along with the introduction of additional features in version 8.0 of HyperStore. The new software features support for converged bimodal data access (file + S3 object) and cloud integration. HyperStore File Services software can run on a 1U HyperStore File Appliance 1100 system or a virtual machine, providing caching and storage with hot-swap SAS drives. Licensing is based on the capacity of the file caching node.

A Cloudian FAQ says: “HyperFile is a NAS controller that delivers SMB(CIFS) and NFS file services, employing Cloudian HyperStore object storage as its underlying storage layer. This is analogous to a traditional NAS system where a controller employs SATA drive shelves as its storage layer.”

We think this is a fairly stretched analogy as HyperStore File Services runs as a separate software entity that talks to a HyperStore object cluster with its own controllers (object storage nodes) talking to storage drives – two levels of controller software in other words.

The File Services software is available in basic and enterprise modes. The basic license includes SMB/NFS support, POSIX compliance, Active Directory/LDAP integration, write-file caching using NVMe SSD, read-file caching via NVMe SSD or HDD, VSS, and file access from S3 and vice versa. The enterprise license adds active:active high-availability, non-disruptive failover, and planned features such as snapshotting for single files or entire file systems and geo-distribution support.

Encryption, search and replication or erasure coding choice

V7.5.1 of HyperStore adds integration with third-party encryption key managers via KMIP (Key Management Interoperability Protocol) support. It has been tested with Thales CipherTrust, Fortanix Data Security Manager, and HashiCorp Vault, and the Cryptsoft KMIP client has been integrated into HyperStore.

Cloudian has used the open-source OpenSearch tool to provide search and analytics for HyperStore.

It has also added a dynamic storage feature that lets customers set an object size threshold: data up to that size is placed in buckets protected by replication, while objects above that size go into buckets protected by erasure coding.

Replication is computationally simple and read latency is lower than for erasure-coded objects. As small objects are typically kilobytes in size, network transmission time is so short that any added latency would be noticeable.

Erasure coding is computationally more intensive than replication but needs less bandwidth to stripe its data across multiple devices. Erasure-coded object read I/O has a longer latency than replication but the larger objects – terabytes – need a longer transmission time which masks the latency difference from replication. 

Replication is suited to small objects, with erasure coding better for large objects as it saves space over replication and latency matters less.
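
As a rough illustration of how such a size-threshold policy works – a hypothetical Python sketch, not Cloudian’s actual implementation, with the 1MB threshold and protection parameters chosen purely for illustration:

```python
# Hypothetical sketch of a size-threshold protection policy.
# The threshold and protection settings below are illustrative, not Cloudian defaults.
SIZE_THRESHOLD_BYTES = 1 * 1024 * 1024      # e.g. a 1MB cut-over point

def choose_protection(object_size_bytes: int) -> dict:
    """Small objects get replicated; large objects get erasure coded."""
    if object_size_bytes <= SIZE_THRESHOLD_BYTES:
        # Replication: simple, low read latency, but 3x capacity overhead.
        return {"scheme": "replication", "copies": 3, "capacity_overhead": 3.0}
    # Erasure coding, e.g. 4 data + 2 parity fragments: ~1.5x capacity overhead,
    # higher read latency, which the longer transfer time of big objects masks.
    return {"scheme": "erasure_coding", "data": 4, "parity": 2,
            "capacity_overhead": (4 + 2) / 4}

print(choose_protection(200 * 1024))      # small object -> replication
print(choose_protection(5 * 1024**3))     # 5GB object  -> erasure coding
```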

You can try out Cloudian’s new software with a free trial.

Generative AI has democratized AI – what does this mean for COEs?

Commissioned: To centralize or decentralize? That was once the salient question for many enterprises formulating a strategy for deploying artificial intelligence. Whether it was nobler to support a singular AI department or suffer the slings and arrows that accompany distributed AI projects.

Tectonic shifts in technology can render such debates moot. Is that happening now, as generative AI catalyzes creativity in businesses, enabling employees to create texts, images and even software code on the fly?

Some of these experiments are useful; others not so much. There will be more failures along the path to innovation, which is littered with the bones of fallen tech projects.

What is crystal clear: Generative AI has democratized AI consumption for enterprises in ways that previous AI applications could not. The genie, in its many forms and functions, has shot out of the bottle.

The way it was

To streamline and curate our AI competency or allow projects to roam unchecked and hope for the best? It’s a fair question, with mixed approaches.

Over the years, some organizations consolidated AI capabilities in one department, often established as an AI center of excellence (COE). The COE was often composed of database engineers, data scientists and other specialists trained in querying machine learning (ML) models.

The inverse of COEs was highly decentralized. In classic, do-it-yourself fashion, business leaders experimented with some tools in the market on AI projects that might eventually foster innovation. Naturally, these projects tended to be more rudimentary than those created by COE members.

Both approaches had their pros and cons.

Centralizing AI functions afforded organizations the ability to dictate strategy and policy and control costs, thereby reducing risks. But COEs’ dedication to rigorous processes had its drawbacks. Typically, the COE received specifications and built a deliverable over several months. Over a long enough timeline, the goal posts moved. As data grew stale, the output rarely resembled the desired outcome.

Conversely, distributed AI functions granted business experts the freedom to quickly experiment and explore so that data remained fresh and current. Projects may have led to insights that were harder to cultivate in an AI COE, which lacked the domain expertise of a business line.

However, ad-hoc efforts often resulted in projects with no demonstrable ROI for the business. And lacking the kind of guardrails present in a COE, these efforts were often risky to the business.

How organizations approached AI varied from business to business, based on leadership’s philosophy and appetite for risk, which were informed by internal capabilities and competencies.

Generative AI changed the paradigm

The arrival of generative AI clarifies the question of whether to centralize or distribute AI functions.

Today, average Joes and Janes interface directly with AI technologies using natural human language rather than special tools that query AI models.

Knowledge workers create cogent texts using Google Bard and ChatGPT. Graphic designers craft new image content with DALL·E and Midjourney. Software developers write basic apps with Copilot or Codeium.

Increasingly, employees layer these capabilities, creating mashups of text, graphic and code creation technologies to generate marketing content, analytics reports, or other dashboards – without the help of data experts who might spend months putting something more sophisticated together.

To be clear, generative AI cannot replace the expertise offered by AI COE specialists. It can’t teach somebody the intricacies of TensorFlow, the magic of Kafka, or other sophisticated tools used to query AI and ML models – yet.

Generative AI has democratized content creation as much as smartphones have facilitated access to information to anyone on the go – anywhere in the world.

Thinking through the implications

IT departments often hold the keys to many technologies, but generative AI is a different animal, requiring IT leaders to consider the impact of its use within the department and across the broader business.

As with technologies that are new to your business, you’ll huddle with C-suite peers on rules and guardrails to make sure the business and its employees are covered from a compliance, risk and security standpoint. And you’ll guard against potential lawsuits alleging content created by generative AI tools infringes on intellectual property rights and protections.

Yet this may be easier said than done for many organizations.

Fewer than half of U.S. executives surveyed by KPMG said they have the right technology, talent and governance to implement generative AI. Moreover, executives plan to spend the next 6 to 12 months increasing their understanding of how generative AI works and investing in tools. This is critical for the C-suite and board of directors, according to Atif Zaim, National Managing Principal, Advisory, KPMG.

“They have a responsibility to understand how generative AI and other emerging technologies will change their business and their workforce and to ensure they have sustainable and responsible innovation strategies that will provide a competitive advantage and maintain trust in their organization,” Zaim said.

To be sure, the democratization of generative AI means your rivals have ready access to these tools, too. Take care not to lose the name of action.

How will your organization use these emerging technologies to future proof your business and gain competitive advantages?

Learn more about our Dell APEX portfolio of as-a-service offerings and how Project Helix accelerates business transformation and unleashes productivity with trusted AI.

Brought to you by Dell Technologies.

Zesty aims to take on cloud cost control

Israeli startup Zesty claims its software can reduce AWS EC2 instance costs by up to 45 percent by automatically managing AWS discount commitments. Its Zesty Disk feature, it reckons, can save up to 70 percent on cloud storage costs by automatically shrinking and expanding block storage volumes as real-time application needs change. That’s because AWS users generally pay for provisioned capacity.

Zesty diagram
Zesty diagram

Zesty Disk can boost storage IOPS and throughput performance by up to 300 percent as well, it claims. A single large volume – a GP3 disk with 3,000 IOPS – is turned into a virtual volume composed of multiple smaller physical volumes, each providing 3,000 IOPS, with their performance added together.
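
The arithmetic behind that claim, as a minimal sketch – the four-volume split is our illustrative assumption, not a Zesty specification:

```python
# Illustrative arithmetic for the virtual-volume performance claim above.
# Assumption (ours): the virtual volume is striped across four smaller GP3 volumes.
single_volume_iops = 3_000
member_volumes = 4

aggregate_iops = member_volumes * single_volume_iops
boost_pct = (aggregate_iops - single_volume_iops) / single_volume_iops * 100
print(f"Aggregate IOPS: {aggregate_iops:,}")              # 12,000
print(f"Boost over a single volume: {boost_pct:.0f}%")    # 300%
```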

Zesty diagram
Zesty diagram

How does Zesty Disk work? A downloadable Zesty Solution Brief PDF document says it: “continuously tracks usage metrics (Capacity, IOPS, and Read/Write Throughput) as well as Instance and disk metadata (such as instance type, disk type, volume names, etc) which are sent unidirectionally to Zesty’s backend.”

These are processed by an AI model which creates a behavioral profile of the instance volume and uses it to predict usage patterns. When the profile indicates a change in capacity is required, the Zesty Disk backend issues an API request to the cloud provider to shrink or increase the capacity. It also sends an update request to a Zesty Disk collector function on the instance to adjust capacity.
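
A hypothetical sketch of that control loop in Python – the function names, threshold, and polling interval are ours, not Zesty’s, and the cloud calls are stubbed out:

```python
import time

# Hypothetical control loop modeled on the description above; the names,
# thresholds, and interval are illustrative, not Zesty's actual code.
def collect_metrics(volume_id):
    """Stub: would return capacity, IOPS, and throughput usage for the volume."""
    return {"used_pct": 85.0}

def predict_scale_factor(profile, metrics):
    """Stub: a model would forecast near-term capacity needs from the profile."""
    return 1.2 if metrics["used_pct"] > 80 else 0.8

def request_resize(volume_id, scale_factor):
    """Stub: would call the cloud provider's API and notify the on-instance collector."""
    print(f"Resize {volume_id} by factor {scale_factor}")

def control_loop(volume_id, profile, interval_s=60, iterations=3):
    for _ in range(iterations):
        metrics = collect_metrics(volume_id)
        factor = predict_scale_factor(profile, metrics)
        if factor != 1.0:
            request_resize(volume_id, factor)
        time.sleep(interval_s)

control_loop("vol-example", profile={}, interval_s=0)   # runs instantly with stubs
```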

Zesty was founded in 2019 by CEO Maxim Melamedov and CTO Alexey Baikov. It has raised $117.2 million across five funding events, including two seed rounds plus A and B rounds – the B round in September 2022 brought in $75 million. Its software supports both AWS and Azure.

The closest competitor to our knowledge is NetApp’s Spot, cloud cost control software that is part of its BlueXP management suite. NetApp bought Israeli startup Spot in June 2020, and claims the technology can save up to 90 percent of public cloud compute and storage expenses, which it says can account for up to 70 percent of total public cloud spend.

Zesty’s percentage cost savings claims are more modest, but look worth investigating.

Dell-commissioned report praises APEX file services speed in AWS cloud

Color us shocked. A Dell-commissioned report has claimed that Dell’s APEX File Storage for AWS is 4.3 times faster writing data than NetApp’s Cloud Volumes ONTAP in AWS.

Update. NetApp comment added; 7 July 2023.

Prowess Consulting did the research, available as a downloadable PDF. Dell’s APEX File Storage is based on its on-premises PowerScale (Isilon as was) scale-out OneFS software. This was compared with the high-availability configuration of NetApp’s Cloud Volumes ONTAP (CVO). There were several points of comparison, but we want to focus on the performance aspect.

Prowess looked specifically at moving data into the cloud, with a home directory example: “Migrating this data into the public cloud is potentially a massive task involving months of planning, validation, migration, and re-validation. An ‘easy button’ migration tool with native data replication lets you continue to use data services that you’ve already invested in by ensuring that your data and metadata stay intact and error-free during the transfer.” 

CVO has two controller nodes per cluster in HA mode. Dell APEX File Storage (AFS) has from 4 to 6 nodes per cluster and so can bring more compute power and parallel IO processing to bear.

Prowess’ test results apparently found that “Dell APEX File Storage for AWS delivers 4.3x higher bandwidth per server cluster than NetApp Cloud Volumes ONTAP HA configuration” with 100 percent sequential writes. The actual numbers were 3,930 MBps for Dell APEX and 897 MBps for NetApp CVO.
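
A quick check of the headline ratio from those numbers:

```python
# Quick check of the claimed bandwidth ratio from the figures quoted above.
dell_apex_mbps = 3_930
netapp_cvo_mbps = 897
print(f"Ratio: {dell_apex_mbps / netapp_cvo_mbps:.2f}x")   # ~4.38x; the report quotes 4.3x
```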

A table from the Prowess report adds more comparison points:

It also shows that the Dell AFS offering is a scale-out system whereas NetApp’s CVO is scale-up, with only two nodes in the HA configuration. Since NetApp doesn’t have a scale-out CVO, there is no other Dell-to-NetApp comparison Prowess could make in the circumstances.

Prowess did not look at sequential read performance nor random read and write IOPS so its research is incomplete, but then it was a Dell-commissioned study and not a piece of independent research.

Bootnote

A NetApp spokesperson said: “Where Dell is coming to the reality of organizations’ hybrid cloud infrastructures several years late, NetApp has been leading the cloud storage market for the last seven years and our satisfied customers (such as Boomi and Karcher) prove the market direction we’ve set and lead is the right one. 

NetApp ONTAP is the leading Cloud OS and CVO was introduced seven years ago. Today we offer Azure NetApp Files, FSx for NetApp ONTAP, and Cloud Volumes Service for GCP. NetApp is the only storage vendor to offer the simplicity and unparalleled scale of native cloud storage, which translates to integrated, simple, and highly scalable offerings built to support VMware, databases, SAP, and more.

As you know, last November NetApp launched BlueXP to deliver a single, unified management experience for storage and data services on-prem and in the cloud, which is well beyond an “easy button migration tool.” The BlueXP governance and classification feature rapidly scans volumes targeted for cloud migration to ensure correct permissions so sensitive data doesn’t get put into a cloud domain where it shouldn’t be, and customers can identify and avoid moving duplicate, stale, or non-compliant data.”

Western Digital boosts Blue consumer SSD line

Western Digital has upped the bus connection from PCIe gen 3 to gen 4 for its Blue 500 series internal SSD, increasing speed by between 20 and 70 percent.

Blue drives are for desktops and notebooks not needing the highest performance (its Black brand drives provide that). The SN570 used the PCIe gen 3 bus and is an internal M.2 2280 finger-length SSD using BiCS 5 112-layer 3D NAND in 3bits/cell (TLC) format. The drive was fitted with an SLC (1bit/cell) cache and had no onboard DRAM, needing a host memory buffer instead. It was launched in October 2021 and Western Digital has now updated it with a PCIe gen 4 bus connection, twice as fast as PCIe gen 3, while still using the same BiCS 5 112-layer 3D NAND.

There has been no formal announcement by Western Digital but the drive specs have appeared on its website.

Capacities are unchanged at 250GB, 500GB, 1TB and 2TB. So too are the endurance numbers over the 5-year warranty period. This means 150TB written for the entry-level 250GB product, 300TBW for the 500GB variant, 600TBW at the 1TB capacity level and 900TBW for the 2TB product. The mean time to failure (MTTF) rating has improved, though, from the SN570’s 1 million hours to the SN580’s 1.5 million hours.
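
Those TBW figures translate into fairly modest drive-writes-per-day (DWPD) ratings over the five-year warranty – a quick worked calculation:

```python
# Convert the quoted TBW endurance figures into drive writes per day (DWPD)
# over the five-year warranty period.
warranty_days = 5 * 365
models = [("250GB", 0.25, 150), ("500GB", 0.5, 300), ("1TB", 1.0, 600), ("2TB", 2.0, 900)]

for name, capacity_tb, tbw in models:
    full_drive_writes = tbw / capacity_tb
    dwpd = full_drive_writes / warranty_days
    print(f"{name}: {full_drive_writes:.0f} full-drive writes, {dwpd:.2f} DWPD")
# 250GB/500GB/1TB work out to ~0.33 DWPD; the 2TB model to ~0.25 DWPD.
```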

Speeds, detailed in the product brief, vary with capacity:

Western Digital speeds

For comparison the SN570’s speeds at the 1TB capacity point were:

  • Random read IOPS: 460,000
  • Random write IOPS: 450,000
  • Sequential read: 3.5GBps
  • Sequential write: 3GBps

The SN580 is around half as fast again overall as the SN570, in line with the 20 to 70 percent speed-up claim. It has a later version of WD’s nCache technology, the use of an SLC (1bit/cell) cache to speed writes and lower TLC write amplification. This v4 nCache provides faster file copies – “blistering fast” is Western Digital’s term, but no actual numbers are provided.

There are faster PCIe gen 4 M.2 drives, such as Western Digital’s own Black SN770, which runs at up to 740,000/800,000 random read/write IOPS and has 5.1GBps and 4.9GBps maximum sequential read and write bandwidth ratings. But it costs a little more.

The SN580 drive draws 65mW when active and 3.3mW when in sleep mode. The Amazon prices are $27.99 for the 250GB model, $31.99 for the 500GB, $49.99 for 1TB, and the 2TB retails at $109.99.

For comparison, the Black SN770’s Amazon prices are $79.99 for 1TB, $134.98 for 2TB, and $184.98 for 4TB.

Bootnote

Western Digital’s nCache technology uses a combination of both SLC (single level cell) and TLC flash blocks to improve endurance, efficiency, and performance. By writing data to the SLC cache first, write amplification on the TLC blocks is decreased. The SN580 uses the SLC cache for file copying.