The way data is written and erasure coded is crucial to VAST’s data protection capabilities and centres on striping.
Suppose a QLC drive fails. Its contents have to be rebuilt from the data stripes on the other drives, and here things get clever. A VAST Data system has clusters of databoxes with 20-30 drives each. There can be 1,000 databoxes, and so 20,000-30,000 drives. Because it has so many drives, and a global view of them, VAST has its own global erasure coding scheme with a low overhead. With, for example, a 150+4 stripe, you can lose 4 drives without losing data and have just a 2.7 per cent overhead.
VAST says rebuilds with its 150+4 scheme are 3x faster than classic Reed-Solomon rebuilds of a smaller disk stripe, such as 8+2.
The stripe erasure code parameters are dynamic. As databoxes are added, the stripes can be lengthened, increasing resilience. We could envisage a 500+10 stripe structure: ten drives could be lost and the overhead would be even lower, at two per cent. This scheme is twice as fast at erasure rebuilds as disk drives.
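The overhead figures are simple arithmetic. The sketch below assumes an MDS-style code, in which any m of the k+m strips in a stripe can be lost, and computes parity both as a share of data capacity (which gives the 2.7 per cent quoted for 150+4) and as a share of total raw capacity. The function names are illustrative only.

```python
from fractions import Fraction


def parity_vs_data(k: int, m: int) -> Fraction:
    """Parity capacity relative to data capacity for a k+m stripe."""
    return Fraction(m, k)


def parity_vs_raw(k: int, m: int) -> Fraction:
    """Parity capacity as a share of total raw capacity for a k+m stripe."""
    return Fraction(m, k + m)


# "survives m drive failures" assumes an MDS code (any m strips may be lost).
for k, m in [(8, 2), (150, 4), (500, 10)]:
    print(f"{k}+{m}: survives {m} drive failures, "
          f"parity = {float(parity_vs_data(k, m)):.1%} of data "
          f"({float(parity_vs_raw(k, m)):.1%} of raw)")
# 8+2: survives 2 drive failures, parity = 25.0% of data (20.0% of raw)
# 150+4: survives 4 drive failures, parity = 2.7% of data (2.6% of raw)
# 500+10: survives 10 drive failures, parity = 2.0% of data (2.0% of raw)
```

The wide stripes need roughly a tenth or less of the parity capacity of an 8+2 layout while tolerating more drive failures, which is the low-overhead point made above.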
With a 150+4 structure, recovery has to read from all the drives, but only an amount of data equivalent to a quarter of them. Specific areas in XPoint are set aside for the resiliency scheme, and they tell a compute node to take this part from one drive and that part from another, using locally decodable algorithms.
Because of these locally decodable codes, VAST does not need to read the entire wide stripe to perform a recovery.
So each CPU reads its share of the stripes that spanned the failed drives, and recovery is, VAST says, fast. It's roughly like point-in-time recovery from synthetic backups.
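VAST's locally decodable code itself is not described here, so the sketch below swaps in a much simpler, openly named technique: XOR local parity groups, as used in locally repairable codes. It splits a hypothetical 150-strip stripe into four local groups, so rebuilding one lost strip reads only that strip's group (about a quarter of the stripe) instead of the 150 strips a conventional Reed-Solomon repair would need. It illustrates only the local-read idea; unlike VAST's 150+4 scheme, four XOR local parities cannot survive any four drive failures, because two losses in the same group are unrecoverable. Strips are modelled as 32-bit integers, and `build_groups`, `local_parities` and `rebuild` are hypothetical names.

```python
from functools import reduce
from operator import xor
import random


def build_groups(num_data: int, num_groups: int) -> list[list[int]]:
    """Split strip indices 0..num_data-1 into at most num_groups contiguous groups."""
    size = -(-num_data // num_groups)  # ceiling division
    return [list(range(start, min(start + size, num_data)))
            for start in range(0, num_data, size)]


def local_parities(strips: list[int], groups: list[list[int]]) -> list[int]:
    """One XOR parity value per local group (hypothetical layout, not VAST's)."""
    return [reduce(xor, (strips[i] for i in g), 0) for g in groups]


def rebuild(strips, groups, parities, lost):
    """Recover strip `lost` from its group's parity and surviving members.

    Returns (recovered_value, strips_read); only the lost strip's group is read.
    """
    for group, parity in zip(groups, parities):
        if lost in group:
            survivors = [strips[i] for i in group if i != lost]
            return reduce(xor, survivors, parity), len(survivors) + 1  # +1 for parity
    raise ValueError(f"strip {lost} is not in any group")


random.seed(0)
data = [random.getrandbits(32) for _ in range(150)]  # 150 data strips
groups = build_groups(len(data), 4)                  # 4 local groups of up to 38
parities = local_parities(data, groups)

damaged = list(data)
damaged[42] = None                                   # simulate a failed drive
value, reads = rebuild(damaged, groups, parities, lost=42)
print(value == data[42], reads)  # True 38
```

Reading 38 strips rather than 150 mirrors the "quarter of the drives" figure above; because each stripe's repair is independent, many stripes can be rebuilt in parallel on different CPUs.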