Resilvering

Resilvering – In the ZFS file system, resilvering is the process of rebuilding data onto a disk drive in a ZFS storage pool (for example, a RAIDz pool) from the good copies held on the pool's other devices. It is a disk rebuild operation, but ZFS calls it resilvering. When a device is replaced, for example after a failure, a resilvering operation is initiated to copy the data from the good copies on the other devices to the new device. Its progress can be monitored with the zpool status command.
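As a rough illustration of monitoring, the Python sketch below runs zpool status and checks whether its output mentions a resilver. It assumes the zpool command is on the PATH and that an active resilver is reported with the phrase "resilver in progress"; the exact wording can differ between ZFS versions. The function names and the pool name "tank" are hypothetical.

```python
import subprocess


def is_resilvering(status_text):
    """Return True if zpool status output reports an active resilver.

    Assumption: the status text contains the phrase "resilver in progress"
    while a resilver is running; check the wording on your ZFS version.
    """
    return "resilver in progress" in status_text


def pool_status(pool):
    """Run `zpool status <pool>` and return its text output."""
    result = subprocess.run(
        ["zpool", "status", pool],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if zpool reports a failure
    )
    return result.stdout


if __name__ == "__main__":
    # "tank" is a placeholder pool name, not taken from the text above.
    print(is_resilvering(pool_status("tank")))
```

Keeping the text check separate from the command call means the check can be tried on saved output without access to a live pool.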

The term refers to antique glass mirrors, which were made by coating the back of the glass with a reflective layer of silver. When this layer decayed, the mirror appeared streaky and tarnished and reflected poorly. Resilvering, replacing the silvered coating, restored the mirror's original clarity.

RAIDz is the ZFS version of RAID. It is tightly bound to ZFS in that it does not have a fixed block size and works with ZFS’ copy-on-write technology.

Resilvering can take a lot of time. The Open-E website explains that, in a traditional RAID, where all blocks are regular, you take block 0 from each of the old drives, compute the correct data for block 0 on the missing drive and write that data onto a new device. This process is then repeated for all blocks, even those that hold no data, because a traditional RAID does not know which blocks are in use and which are not. If the array is otherwise idle, serving no user requests during the rebuild, the process runs sequentially from start to end, which is the fastest way to access rotational hard drives.
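The block-by-block rebuild described above can be sketched in Python. This is a minimal sketch that assumes a single-parity layout in which the missing block is the XOR of the surviving blocks (as in RAID 5); the text does not say which RAID level is meant, and the function names and toy drive contents are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing_drive(surviving_drives):
    """Recompute every block of the missing drive, from block 0 to the end.

    Every block index is processed, whether or not it holds data, because
    this layer has no knowledge of which blocks are in use.
    """
    block_count = len(surviving_drives[0])
    return [
        xor_blocks([drive[i] for drive in surviving_drives])
        for i in range(block_count)
    ]


# Toy array: two data drives plus one parity drive, three 4-byte blocks each.
# Block 1 is empty (all zeros) but is still rebuilt like any other block.
drive_a = [b"ZFS!", b"\x00\x00\x00\x00", b"data"]
drive_b = [b"RAID", b"\x00\x00\x00\x00", b"more"]
parity = [xor_blocks([a, b]) for a, b in zip(drive_a, drive_b)]

# Pretend drive_b failed and rebuild it from the two survivors.
rebuilt = rebuild_missing_drive([drive_a, parity])
assert rebuilt == drive_b
```

The loop visits block indexes in ascending order, which corresponds to the sequential, start-to-end disk access the paragraph describes.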

ZFS uses variable-sized blocks. Therefore, for each recordsize worth of data, which can be anywhere from 4KB to 1MB, ZFS needs to consult the block pointer tree to see how the data is laid out on the disks. Because block pointer trees and files are often fragmented, a lot of head movement is involved. Rotational hard drives are much slower when the heads move a lot, so the rebuild throughput in megabytes per second is lower than that of a traditional RAID. However, ZFS rebuilds only the part of the array that is in use and does not rebuild free space. On lightly used pools it may therefore complete faster than a traditional RAID, but this advantage disappears as the pool fills up.
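The effect of following a fragmented block pointer tree can be illustrated with a toy model in Python. This is a sketch under loud assumptions, not a representation of ZFS internals: BlockPointer is a hypothetical, heavily simplified structure, the offsets and sizes are invented, and the distance between consecutive offsets is used only as a crude proxy for head movement.

```python
from dataclasses import dataclass, field


@dataclass
class BlockPointer:
    """Hypothetical, much-simplified stand-in for a block pointer."""
    offset: int  # invented position of the block on the replaced disk, in KB
    size: int    # block size in KB; varies from block to block
    children: list = field(default_factory=list)


def walk(root):
    """Yield (offset, size) for every block reachable from root, in tree order."""
    stack = [root]
    while stack:
        bp = stack.pop()
        yield bp.offset, bp.size
        # Push children reversed so they are visited left to right.
        stack.extend(reversed(bp.children))


def head_travel(offsets):
    """Sum of distances between consecutive offsets (a crude seek proxy)."""
    return sum(abs(b - a) for a, b in zip(offsets, offsets[1:]))


# A small fragmented tree: related blocks are scattered across the disk.
root = BlockPointer(0, 4, [
    BlockPointer(9000, 128, [BlockPointer(200, 16), BlockPointer(7000, 1024)]),
    BlockPointer(500, 4, [BlockPointer(8000, 128)]),
])

visited = list(walk(root))
offsets = [offset for offset, _ in visited]

copied_kb = sum(size for _, size in visited)       # only in-use blocks are copied
tree_order_travel = head_travel(offsets)           # order dictated by the tree
disk_order_travel = head_travel(sorted(offsets))   # ideal sequential order

print(copied_kb, tree_order_travel, disk_order_travel)
```

In this toy tree only the six referenced blocks are copied, which models skipping free space, while visiting them in tree order covers several times more distance than a sequential pass over the same offsets would.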