Filecoin: The Airbnb of archival storage

Filecoin, from Protocol Labs, is a decentralized storage network for archival data that runs on cryptocurrency incentives. It’s designed to make a network of independent storage providers operate like an enterprise-class setup.

These independent entities cooperate to provide decentralized storage with the scale and immutability of the large public clouds – but at up to 95 percent lower cost (compared to AWS S3 Infrequent Access) and without lock-in. This is like an Airbnb of storage – an artisanal datacenter, as opposed to the Marriotts and Hiltons (played in this analogy by AWS, Google and the like).

That’s what Protocol Labs’s Colin Evran, who looks after the ecosystem and operations, told an IT Press Tour audience in Palo Alto. Protocol Labs is an open source R&D lab whose projects include IPFS, Filecoin, libp2p, and others. It aims “to make human existence orders of magnitude better through technology.”

Filecoin and IPFS

Filecoin is a software platform that uses some concepts from the InterPlanetary File System (IPFS). IPFS is a peer-to-peer file sharing network that uses content addressing (similar in spirit to object storage): each file gets a unique ID, derived from a hash of its contents, in a global namespace which connects IPFS hosts. These hosts, nodes or user-operators each hold a part of the overall data. Protocol Labs says this is an alternative to the world wide web’s HTTP and HTTPS protocols.
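
To make content addressing concrete, here is a minimal Python sketch in which a file’s ID is simply the SHA-256 digest of its bytes. It is an illustration only: real IPFS content identifiers wrap the hash in further encoding and large files are split into chunks, and the names `content_id` and `ContentStore` are hypothetical.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Derive an ID from the content itself (hex SHA-256 digest)."""
    return hashlib.sha256(data).hexdigest()


class ContentStore:
    """Toy content-addressed store: the key is always the hash of the value."""

    def __init__(self) -> None:
        self._blobs = {}  # maps ID (str) -> content (bytes)

    def put(self, data: bytes) -> str:
        cid = content_id(data)
        self._blobs[cid] = data
        return cid

    def get(self, cid: str) -> bytes:
        data = self._blobs[cid]
        # Anyone retrieving the data can re-hash it to check integrity.
        if content_id(data) != cid:
            raise ValueError("content does not match its ID")
        return data


store = ContentStore()
cid = store.put(b"archive me")
assert store.get(cid) == b"archive me"
# Identical content always yields the same ID, wherever it is stored.
assert store.put(b"archive me") == cid
```

Because the ID depends only on the bytes, any host holding the data can serve it and the requester can verify it without trusting that host.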

Filecoin is built on IPFS, based on a blockchain, and used to provide a data storage and retrieval method wherein users effectively rent providers’ unused storage drive space. These rental deals are registered on the blockchain. Transactions are made using FIL – the digital Filecoin cryptocurrency token. The blockchain relies on operators providing Proof-of-Replication (PoRep) and Proof-of-Spacetime (PoSt).

Protocol Labs says this decentralized storage system verifies data integrity every day, and that storage and retrieval transactions, or deals, are struck in a marketplace.

Messari, a market intelligence consultancy for cryptocurrency, says a storage provider gets a user fee for storing data, and a retrieval provider receives another fee for retrieving the data users want. We can view these as ingress and egress fees. The storage provider guarantees data is stored, with the Filecoin network and storage providers verifying its integrity.

This verification is done through “a cryptoeconomic incentive model that verifies the storage with zero-knowledge proofs” involving PoSt window checks carried out at 24-hour intervals showing storage providers are hosting their capacity. Inactive storage providers must pay a fault fee. Storage providers also get FIL token rewards for making capacity available.
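
The general idea of a periodic storage check can be sketched with a naive hash challenge. This is emphatically not Filecoin’s actual proof system, which uses zero-knowledge proofs so that the verifier does not need its own copy of the data; the function names `respond` and `window_check` are hypothetical and the sketch assumes the verifier still holds the original bytes.

```python
import hashlib
import secrets
from typing import Optional


def respond(stored_data: bytes, nonce: bytes) -> str:
    """Provider side: hash the fresh nonce together with the stored bytes."""
    return hashlib.sha256(nonce + stored_data).hexdigest()


def window_check(original: bytes, provider_copy: Optional[bytes]) -> bool:
    """Verifier side: issue a random challenge and compare the answers.

    provider_copy is None when the provider is offline or has lost the data.
    """
    if provider_copy is None:
        return False  # no answer within the window counts as a fault
    # A fresh nonce per check stops the provider replaying an old answer.
    nonce = secrets.token_bytes(32)
    return respond(provider_copy, nonce) == respond(original, nonce)


data = b"cold archive sector"
print(window_check(data, data))             # provider still holds the data
print(window_check(data, b"corrupted"))     # provider's copy has changed
print(window_check(data, None))             # provider did not respond
```

In this toy model a failed check is where a fault fee would be charged; the real network runs its checks and penalties on-chain.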

Storage transactions are done through the blockchain (on-chain) while “retrieval deals can use payment channels to settle payments off-chain, resulting in faster retrieval.” Messari says: “Any storage deal or proof requires a transaction-based network fee. All network participants – both demand and supply side – pay this fee to interact with the network.”

Protocol Labs provides the base open source Filecoin software platform, and encourages storage providers to make capacity available and business startups to offer that capacity to customers in local and/or vertical markets. It acts as a Y Combinator-type organization to get startups involved.

Filecoin growth

Protocol Labs says it has a master plan with three elements:

[Image: Filecoin master plan]

It says it is finding success with the first of these, quoting Filecoin capacity and provider growth numbers. We are told about 4,500 storage providers have joined, with some 17EiB (19.6EB) of capacity. There are 11,300 contributors on GitHub and more than 2,800 projects in the ecosystem. But the number of storage providers actually storing user data (verified deals) is much smaller at 465 – which is up from 120 at the start of the year.

These providers have so far ingested almost 230PiB (259PB) through nine million deals from 1,047 data-storing clients, up 9x this year and with 2 to 3PiB (2.3 to 3.4PB) onboarded daily. It is claimed that this makes Filecoin the world’s largest decentralized storage network. We’re told Filecoin reached this total available capacity faster than AWS, Azure or GCP, and has outstripped its competitors, such as Storj. Protocol Labs makes the somewhat grandiose claim that Filecoin is now the largest aggregation of data storage in the history of the world.
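
The binary-to-decimal conversions in the capacity figures above can be checked with a few lines of Python. This is a small worked calculation using the standard definitions (1PiB = 2^50 bytes, 1EiB = 2^60 bytes, 1PB = 10^15 bytes, 1EB = 10^18 bytes); the helper name `convert` is hypothetical.

```python
# Bytes per unit: binary (IEC) prefixes versus decimal (SI) prefixes.
BINARY = {"PiB": 2**50, "EiB": 2**60}
DECIMAL = {"PB": 10**15, "EB": 10**18}


def convert(value: float, from_unit: str, to_unit: str) -> float:
    """Convert a binary-prefixed quantity to a decimal-prefixed one."""
    return value * BINARY[from_unit] / DECIMAL[to_unit]


print(round(convert(17, "EiB", "EB"), 1))   # total capacity
print(round(convert(230, "PiB", "PB")))     # data ingested so far
print(round(convert(2, "PiB", "PB"), 1),    # daily onboarding, low end
      round(convert(3, "PiB", "PB"), 1))    # daily onboarding, high end
```

The gap between the two systems grows with the prefix: about 12.6 percent at the pebibyte level and about 15.3 percent at the exbibyte level.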

That’s nice, but so what? Enterprises are not going to transition away from datacenters to decentralized PCs.

I/O performance

Filecoin’s actual I/O performance is slow compared to AWS, for example. It can take five to ten minutes for a 1MiB (1.05MB) file to get from the start (deal acceptance) to the end of the upload process (deal blockchain registration, aka appearance on-chain). But note this: “Once the deal shows up on-chain, the storage provider must still complete generating a Proof-of-Replication and sealing the sector. This process is currently estimated to take ~1.5 hours for a 32GB sector on a machine that meets these minimum hardware requirements for storage providers.”

Data retrieval can also be slow, depending on whether the storage provider holds an unsealed copy of the data alongside the sealed one. Retrieving a sealed copy involves decoding the data first, while unsealed data can be sent straight away once payment for the retrieval is complete. Unsealing can take “around ~3 hours for a 32GiB (34.4GB) sector on a machine running minimum hardware requirements.” Then the data has to be sent over the network.
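
A back-of-the-envelope estimate shows how much the unsealing step dominates retrieval time. The sketch below takes the roughly three-hour unsealing figure quoted above at face value and assumes a hypothetical 1Gbit/s link between provider and client; it ignores payment and deal-negotiation overhead, and the function name `retrieval_hours` is illustrative.

```python
UNSEAL_HOURS_PER_SECTOR = 3.0   # "around ~3 hours" for a 32GiB sector
SECTOR_BYTES = 32 * 2**30       # 32GiB


def retrieval_hours(has_unsealed_copy: bool, link_mbit_per_s: float) -> float:
    """Rough time, in hours, to retrieve one full 32GiB sector."""
    unseal = 0.0 if has_unsealed_copy else UNSEAL_HOURS_PER_SECTOR
    transfer_seconds = SECTOR_BYTES * 8 / (link_mbit_per_s * 1_000_000)
    return unseal + transfer_seconds / 3600


# Assumed link speed: 1,000Mbit/s (an assumption, not a Filecoin figure).
fast = retrieval_hours(has_unsealed_copy=True, link_mbit_per_s=1000)
slow = retrieval_hours(has_unsealed_copy=False, link_mbit_per_s=1000)
print(f"unsealed copy available: {fast:.2f} h")
print(f"sealed copy only:        {slow:.2f} h")
```

On that assumed link the transfer itself takes under five minutes, so whether the provider keeps an unsealed copy decides almost all of the wait.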

Dealing with sealed data is obviously extremely slow compared to on-premises filer I/O or even to public cloud ingress/egress. There is a non-negligible compute burden associated with the blockchain transactions and sealing process.

This leaves us where?

Should a classic archival storage customer – say, a bank in Idaho storing cold data in an on-premises NetApp StorageGRID box or up in a Glacier vault in AWS – transfer its storage to Filecoin? The lower cost will be appealing, but the slow I/O will be a disincentive. That is, unless it doesn’t matter – because the apps processing archive data don’t have latency needs strict enough to justify a bigger storage budget. That’s why educational institutions with low budgets and far from real-time data access requirements may be attracted to Filecoin. That, and the open source and decentralized attributes that can resonate with some academic cultures.

From an enterprise point of view, Filecoin is a growing but distant alternative to existing archive storage arrangements. Its use of cryptocurrency, inside a dollars and cents pricing wrapper, can be a distracting and complicating aspect. CIOs may want to watch this space and see whether the distributed datacenter approach becomes more attractive than what corporate IT suppliers offer.