Decentralized storage provider Storj has released a research paper claiming its technology can reduce the carbon footprint of storing a TB of data by 66-83 percent when compared against cloud hyperscalers and corporate datacenters.
This claim rests upon two pillars. Firstly, Storj places its data on spare disk drive capacity in existing datacenters, capacity that is already powered and spinning. It does not run its own datacenters or buy racks of new drives. Secondly, it employs distributed Reed-Solomon erasure coding instead of replicated drive copies to safeguard data, and this requires less raw disk capacity to protect data to the same extent.
Storj chief architect JT Olio said: “When assessing the environmental impact of centralized providers, it’s essential to consider not only the electricity generated in powering and cooling data storage devices but the environmental impact from manufacturing and transporting those devices as well. Storj is able to eliminate much of those costs by storing data on the under-utilized capacity of drives that are already manufactured, powered, and cooled for other purposes.”
Decentralized storage places data in multiple locations rather than in a single, centralized datacenter. The paper, titled “How Using Spare Capacity for Data Storage is Better for the Environment,” explains that datacenters over-provision disk capacity to cater for drive failures and future growth. Storj rents that capacity around the globe and uses it to store distributed shards or slices of its erasure-coded data. A segment of data – a file, several files, or part of a file – is divided into 80 erasure-coded pieces, each stored on a different drive in a different part of its network.
Any segment can be reconstituted from just 29 of these pieces, meaning 51 of the 80 drives holding them could fail before data is lost. This means that drive life can safely be extended beyond the three to five-year period that Storj says is normal. It also means that Storj does not have to use standard three-copy replication to ensure data durability and availability, nor for geographic coverage. Data is globally distributed by default.
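The arithmetic behind an 80-piece, 29-needed scheme can be sketched in a few lines of Python. This is an illustration only, not Storj's code: the function name `erasure_profile` and its return fields are made up for this example, and three-copy replication is modeled as a scheme with three pieces of which any one suffices.

```python
def erasure_profile(total_pieces: int, required_pieces: int) -> dict:
    """Summarize a k-of-n redundancy scheme.

    total_pieces    -- n, the number of pieces stored
    required_pieces -- k, the number of pieces needed to rebuild the data
    """
    if not 0 < required_pieces <= total_pieces:
        raise ValueError("need 0 < required_pieces <= total_pieces")
    return {
        # Pieces that can be lost while the data stays recoverable.
        "tolerable_losses": total_pieces - required_pieces,
        # Raw capacity consumed per unit of user data stored.
        "expansion_factor": total_pieces / required_pieces,
    }


# Figures from the article: 80 pieces, any 29 rebuild the segment.
storj = erasure_profile(80, 29)

# Assumption for comparison: three full copies, any one is enough.
replication = erasure_profile(3, 1)

print(storj)
print(replication)
```

Under these assumptions the erasure-coded scheme tolerates far more lost pieces than three-copy replication while consuming slightly less raw capacity per terabyte stored.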
The research paper models disk drive carbon emissions in a hyperscaler datacenter, a traditional datacenter, and in a Storj environment.
Summarizing, the paper says a corporate datacenter generates 523kg of CO2 by storing 1TB of data for three years. Hyperscalers, being more efficient, emit 251kg, but Storj produces just 86kg. Taking everything in the research paper into account, Storj can make up to 66 percent carbon savings relative to hyperscalers and 83 percent compared to corporate datacenters.
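The relationship between the per-terabyte figures and the percentage savings is simple arithmetic, sketched below in Python. The function names are invented for this illustration and nothing here comes from the paper's own model: it takes the 523kg and 251kg baselines and the stated maximum savings, and works back to the footprint each saving would imply.

```python
def saving_pct(baseline_kg: float, alternative_kg: float) -> float:
    """Percentage of CO2 saved by the alternative relative to the baseline."""
    return 100 * (1 - alternative_kg / baseline_kg)


def implied_footprint(baseline_kg: float, saving_percent: float) -> float:
    """Footprint in kg implied by a given percentage saving on a baseline."""
    return baseline_kg * (1 - saving_percent / 100)


CORPORATE_KG = 523    # kg CO2 per TB over three years, from the article
HYPERSCALER_KG = 251  # kg CO2 per TB over three years, from the article

# Footprint implied if the full stated savings are achieved.
vs_hyperscaler = implied_footprint(HYPERSCALER_KG, 66)
vs_corporate = implied_footprint(CORPORATE_KG, 83)

# For reference: how much a hyperscaler saves over a corporate datacenter.
hyperscaler_vs_corporate = saving_pct(CORPORATE_KG, HYPERSCALER_KG)

print(f"{vs_hyperscaler:.1f} kg, {vs_corporate:.1f} kg, "
      f"{hyperscaler_vs_corporate:.0f}%")
```

Both stated savings point to a decentralized footprint in the high 80s of kilograms per terabyte over three years.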
Storj’s technology is S3-compatible object storage. Files are retrieved from the nearest 29 sources in its network, making retrieval faster than S3. View it as fast nearline storage. It shouldn’t be used for primary data needing SSD storage-class speed.
The paper suggests that organizations needing to demonstrate a commitment to lowering their carbon emissions as part of their ESG activity could consider decentralized storage as a way of achieving this. The paper, with appendices detailing its calculations and assumptions, is available here.