Log-Structured Merge Tree – The Log-Structured Merge Tree (LSM Tee) concept is used for ingesting and indexing data and persisting it to storage by application storage engines. It has ordered key:value pairs which are held in two or more places such as an in-memory structure and a storage structure. We can envisage three such structures: (1) an in-memory container to hold incoming data requests as a hash table or K:V pair table, (2) when full the in-memory container contents are written to one or more sorted and indexed in-memory delta container, freeing up the in-memory container, (3) a base storage structure into which the delta container contents are batch written or merged in a way that is consistent with the disk structure.
The merging or compaction is a background operation. Once deltas are merged into the base they are deleted. A search is then a three-step process, looking into each data structure in turn until the item is found.
The code for an LSM-Tree does not have to be written from scratch as developers can link to libraries of such code. There are three popular open source libraries for this – LevelDB, RocksDB and Speedb. Speedb is a rewritten RocksDB implementation that, the company claims, is significantly faster than RocksDB.
Thousands of enterprise apps use LSM-Tree-based storage engines and here are some prominent LSM-Tree app users:
It is claimed that these embedded LSM-Tree storage engines offer the highest-performance data storage capabilities, enabling massive scale and stability under pressures. This approach is highly efficient for massive ingest of data because it allows new data to be quickly written from sorted arrays in memory, aligning aggregated bytes (in memory) to fill complete blocks (on disk), while allowing for efficient retrieval of new data from memory, and old data from disk.