Cloud and backup storage provider Backblaze is speeding up small file uploads with a fast SSD ingest cache it calls a shard stash.
The business stores ingested files on disk drives, which write data more slowly than SSDs. The company is now writing incoming files simultaneously to disk drives and SSDs, with the SSD-held data stored only until the HDDs have received all the data, at which point the SSD copies are deleted. The result is small file upload speeds as much as 30 percent faster than AWS S3, Backblaze claims.
Gleb Budman, Backblaze CEO, claimed: “Backblaze’s pioneering approach delivers cloud storage at 1/5th the price versus legacy vendors, and our latest innovation maintains those savings while delivering 10-30 percent faster performance versus AWS depending on the file size.”
Details are explained in a Backblaze blog, which says: “Prior to this work, when a customer uploaded a file to Backblaze B2, the data was written to multiple hard disk drives (HDDs). Those operations had to be completed before returning a response to the client.
“Now, we write the incoming data to the same HDDs and also, simultaneously, to a pool of solid state drives (SSDs) we call a ‘shard stash,’ waiting only for the HDD writes to make it to the filesystems’ in-memory caches and the SSD writes to complete before returning a response. Once the writes to HDD are complete, we free up the space from the SSDs so it can be reused.”
The gap between SSD and HDD speeds can be seen by comparing the specifications of a Seagate Exos 16 TB disk drive and a Micron 7450 Max 3.2 TB SSD.
The SSD is more than 20 times faster at sustained data transfer, more than 2,200 times faster at reading data, and nearly 900 times faster for writes than the HDD, we’re told.
The blog says: “Let’s consider a real-world operation, say, writing 64 KB of data. Assuming the HDD can write that data to sequential disk sectors, it will spin for an average of 4.2 ms, then spend 0.25 ms writing the data to the disk, for a total of 4.5 ms. The SSD, in contrast, can write the data to any location instantaneously, taking just 27µs (0.027 ms) to do so. This (somewhat theoretical) 167x speed advantage is the basis for the performance improvement.”
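The blog's arithmetic can be reproduced in a few lines. The sketch below uses only the figures quoted above; the variable names are ours. It shows that the 167x figure comes from the rounded 4.5 ms total, while the unrounded sum gives a slightly lower ratio.

```python
# Figures quoted in the Backblaze blog for a 64 KB write.
HDD_ROTATIONAL_LATENCY_MS = 4.2   # average wait for the platter to rotate into position
HDD_TRANSFER_MS = 0.25            # time to write 64 KB to sequential sectors
SSD_WRITE_MS = 0.027              # 27 microseconds

hdd_exact_ms = HDD_ROTATIONAL_LATENCY_MS + HDD_TRANSFER_MS  # unrounded sum
hdd_rounded_ms = 4.5                                        # the blog's rounded total

speedup_exact = hdd_exact_ms / SSD_WRITE_MS
speedup_rounded = hdd_rounded_ms / SSD_WRITE_MS

print(f"HDD total: {hdd_exact_ms:.2f} ms (blog rounds to {hdd_rounded_ms} ms)")
print(f"SSD advantage: {speedup_exact:.0f}x unrounded, {speedup_rounded:.0f}x rounded")
```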
Previously, when a client application uploaded a file to the Backblaze B2 Storage Cloud, a “coordinator pod” split the file into 16 data shards, creating four additional parity shards, and then wrote the resulting 20 shards to 20 different HDDs, each in a different pod.
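As a rough illustration of the 16+4 layout, the sketch below cuts a payload into 16 equal data shards and works out the raw storage overhead. It is not Backblaze's code: the function name and the zero-padding scheme are our assumptions, and the calculation of the four parity shards by an erasure code is deliberately left out.

```python
DATA_SHARDS = 16    # from the article
PARITY_SHARDS = 4   # from the article; parity calculation is not shown here

def split_into_data_shards(payload: bytes, data_shards: int = DATA_SHARDS) -> list[bytes]:
    """Zero-pad the payload to a multiple of data_shards, then cut it into equal pieces.

    Hypothetical helper: the padding scheme is an assumption for illustration.
    """
    shard_size = -(-len(payload) // data_shards)  # ceiling division
    padded = payload.ljust(shard_size * data_shards, b"\0")
    return [padded[i * shard_size:(i + 1) * shard_size] for i in range(data_shards)]

file_bytes = bytes(256 * 1024)            # a 256 KB file of zero bytes as test input
shards = split_into_data_shards(file_bytes)

total_shards = DATA_SHARDS + PARITY_SHARDS
overhead = total_shards / DATA_SHARDS     # raw bytes stored per byte uploaded

print(f"{len(shards)} data shards of {len(shards[0])} bytes each")
print(f"{total_shards} shards in total, storage overhead {overhead:.2f}x")
```

With 20 shards on 20 different drives, the layout stores 1.25 bytes for every byte uploaded.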
Now, “upon receiving a file of 1 MB or less, the coordinator splits it into shards as before, then simultaneously sends the shards to a set of 20 Pods and a separate pool of servers, each populated with 10 of the Micron SSDs described above – a ‘shard stash.’ The shard stash servers easily win the ‘flush the data to disk’ race and return their status to the coordinator in just a few milliseconds. Meanwhile, each HDD Pod writes its shard to the filesystem, queues up a task to flush the shard data to the disk, and returns an acknowledgement to the coordinator.”
“Once the coordinator has received replies establishing that at least 19 of the 20 Pods have written their shards to the filesystem, and at least 19 of the 20 shards have been flushed to the SSDs, it returns its response to the client … If power was to fail at this point, the data has already been safely written to solid state storage.” Then the SSD shard copies can be purged.
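The acknowledgement rule described in the blog can be sketched as two quorum waits running in parallel. This is a simulation under stated assumptions, not Backblaze's implementation: the function names, the fixed latencies, and the failure injection are all hypothetical; only the 19-of-20 thresholds come from the blog.

```python
import asyncio

TOTAL_SHARDS = 20
QUORUM = 19  # from the blog: at least 19 of 20 on each path

async def simulated_write(latency_s: float, succeeds: bool = True) -> bool:
    """Hypothetical stand-in for sending one shard to a Pod or a stash server."""
    await asyncio.sleep(latency_s)
    return succeeds

async def wait_for_quorum(writes, quorum: int) -> bool:
    """Return True as soon as `quorum` writes have succeeded, False if that never happens."""
    tasks = [asyncio.ensure_future(w) for w in writes]
    acks = 0
    for finished in asyncio.as_completed(tasks):
        if await finished:
            acks += 1
        if acks >= quorum:
            return True  # stragglers keep running in the background
    return False

async def upload_small_file(failed_hdd_pods: int = 0, failed_stash_writes: int = 0) -> bool:
    """Respond to the client only once both paths have reached quorum."""
    # Assumed latencies: 2 ms to reach an HDD Pod's filesystem cache, 1 ms to flush to SSD.
    hdd_writes = [simulated_write(0.002, i >= failed_hdd_pods) for i in range(TOTAL_SHARDS)]
    ssd_writes = [simulated_write(0.001, i >= failed_stash_writes) for i in range(TOTAL_SHARDS)]
    hdd_ok, ssd_ok = await asyncio.gather(
        wait_for_quorum(hdd_writes, QUORUM),
        wait_for_quorum(ssd_writes, QUORUM),
    )
    return hdd_ok and ssd_ok

print("all writes succeed:", asyncio.run(upload_small_file()))
print("two HDD Pods fail: ", asyncio.run(upload_small_file(failed_hdd_pods=2)))
```

In this model one failed write on either path is tolerated, while two failures on the same path block the success response.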
Backblaze tested the speed increase. “Over a 12-day period following the shard stash deployment … the average time to upload a 256 KB file was 118 ms, while a 1 MB file clocked in at 137 ms … For comparison, we ran the same test against Amazon S3’s US East (Northern Virginia) region, aka us-east-1, from the same machine in New Jersey. On average, uploading a 256 KB file to S3 took 157 ms, with a 1 MB file taking 153 ms.”
In summary: “We benchmarked the new, improved Backblaze B2 as 30 percent faster than S3 for 256 KB files and 10 percent faster than S3 for 1 MB files.”
Veeam backups were even faster, we’re told: “These low-level tests were confirmed when we timed Veeam Backup & Replication software backing up 1TB of virtual machines with 256k block sizes. Backing the server up to Amazon S3 took three hours and 12 minutes; we measured the same backup to Backblaze B2 at just two hours and 15 minutes, 40 percent faster than S3.”
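The percentages can be checked against the raw timings quoted above. Backblaze does not say which definition of "faster" it used, so the sketch below computes two common ones: the gain in rate (S3 time divided by B2 time, minus one) and the share of time saved. The function names are ours.

```python
def rate_gain_pct(s3_time: float, b2_time: float) -> float:
    """How much more work B2 completes per unit of time, as a percentage."""
    return (s3_time / b2_time - 1) * 100

def time_saved_pct(s3_time: float, b2_time: float) -> float:
    """How much less time B2 takes than S3, as a percentage of the S3 time."""
    return (s3_time - b2_time) / s3_time * 100

# (S3, B2) timings as quoted: uploads in milliseconds, the Veeam backup in minutes.
measurements = {
    "256 KB upload": (157, 118),
    "1 MB upload": (153, 137),
    "Veeam 1 TB backup": (3 * 60 + 12, 2 * 60 + 15),
}

for name, (s3, b2) in measurements.items():
    print(f"{name}: rate gain {rate_gain_pct(s3, b2):.1f}%, "
          f"time saved {time_saved_pct(s3, b2):.1f}%")
```

Measured as a rate gain, the quoted timings land a little above the headline 30, 10, and 40 percent figures; measured as time saved, they land at or below them.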
The increased speeds benefit all Backblaze B2 Cloud Storage customers, especially those who rely on small file uploads of 1 MB or less. Such files make up about 70 percent of all uploads to B2 Cloud Storage and are common in backup and archive workflows. Many data protection software providers split data into smaller, fixed-size blocks for upload to cloud storage, meaning users can expect to see significantly faster upload speeds for smaller files without any change to durability, availability, or pricing.
The shard stash approach has been fully rolled out across Backblaze’s global data regions. All Backblaze B2 customers will, it promises, enjoy faster uploads and downloads, whatever their storage workload. Additional B2 Cloud Storage download performance enhancements are planned over the coming months.
We have asked AWS to comment.