Meatier MemSQL database munches more data in memory and goes faster on disk

Version 7 of MemSQL’s relational database, out today, does space-saving in memory and adds hash indexing on disk to speed access. 

This is phase 1 of MemSQL’s single row+column concept, called SingleStore. MemSQL 7.0 also adds accelerated synchronous replication and incremental backup, making it better suited to be a system of record for critical workloads. The upgrade is generally available for on-premises use or via the company’s Helios managed service on AWS.

MemSQL Co-CEO Nikita Shamgunov said in a statement today: “Having MemSQL 7.0 available on AWS gives our customers – which include global enterprises that are leaders in their industries – easy access to a fast, scalable data platform for operational analytics, ML, and AI.

“Upgrading MemSQL Helios with the 7.0’s new SingleStore and system of record functionality continues our relentless focus on eliminating the speed and scale limits that organisations have long struggled with in their operational workloads.”

MemSQL Rowstores and Columnstores

Relational databases (RDBMS) are tables which store data in rows and columns. They can be queried using Structured Query Language (SQL)  and accessed in row or column mode. Rows are identified by keys and can be directly accessed by these keys. This makes them suitable for latency-sensitive applications like online transaction processing (OLTP)

The MemSQL database operates in two formats. The rowstore format executes entirely in memory, making it much faster than the columnstore format which is stored on disk. Slower columnstore access makes it applicable to less latency-sensitive work such as analysis. It is also the choice if you have more data than will fit in DRAM.

MemSQL 7.0 remedies two problems. First, more memory is needed as row format data grows, making it expensive and thereby limiting its applicability.

Secondly, accessing columnstore data means scanning all the columns to find the data record needed, as there is no key access to an individual column. 

Therefore columnstore data in MemSQL is inherently slower to access. It is mostly stored on slow external storage with only a small cache in memory. Also, the RDBMS access method provides no direct access to a column, so millions of columns may have to be scanned to find the one of interest.

Version 7 of the MemSQL database has remedies for both the row memory capacity and columnar disk access time problems.

Sparse compression

Rowstore data tables now get compression of records containing null data. Every record has a fixed-width section containing ordinary data and index keys and a variable width section that can be slimmed if it contains fields with null data. The variable width portion contains sparse data and a bitmap is used to indicate which fields containing null data.

This sparse data compression preserves fast access to rows while shrinking them if they have null data fields. MemSQL rowstore users can now store more data in memory than before, with the extra effective capacity varying with the amount of compressed null data. For instance, if half the values are null you can double the effective memory capacity.

In the future MemSQL intends to compress fields containing blanks and zeros, and store small integers in less than the current 8 bytes. The bitmap is wide enough at 4 bits to handle nulls, blanks, zeros and small integers, according to the company.

Columnar hash indexing

MemSQL columnstore tables are divided into 1 million-row segments. Columns are stored independently within each segment, in contiguous parts of a file or in a file by themselves. Accessing a field in a single row in a segment entails scanning the entire million-row segment, with many many file accesses.

MemSQL 7.0 introduces subsegment access. The software calculates the location of data within a columnstore for a row, and then reads only the part of a column segment that is needed to locate and access that row. This reduces the task from reading a million rows to reading a few thousand.

V7 also introduces hash indexes on columnstores, with users able to specify hash indexing of individual columns. Selection operations can be executed using the index without any scanning. Using hash indexes to enable filters can be orders of magnitude faster than full or partial-segment scanning.

The speed increase in columstore SQL is so large that MemSQL floats the idea that it could be used for some, less latency-sensitive OLTP applications. 

This new version also introduces row-level locking of a columnstore, which is much less access-limiting than the existing segment-level locking.

MemSQL CMO Peter Guagenti discusses MemSQL 7.0 in more detail and you can also read this technical introduction to MemSQL’s database technology.