Los Alamos National Laboratory and SK hynix will demo a computational storage SSD at the Flash Memory Summit next week that uses indexing of key-value stored data to accelerate simulation analysis by three orders of magnitude.
Los Alamos researches the safety and security of the US nuclear stockpile and carries out weapons research. Instead of actual nuclear explosions, it relies on high-performance computing (HPC) simulations. The simulation results are typically stored and analyzed as file-held data, but Lab staff who want to use big data analytics tools are moving to store simulation output in record- and column-based formats that suit those tools.
Gary Grider, High Performance Computing division leader at Los Alamos, said in a statement: “Moving our large-scale physics simulations from file-based I/O to record- and columnar-indexed I/O has shown incredible speedups for analysis of simulation output.”
The Laboratory has shown 1,000X speedups in analyzing simulation output by using indexing in its DeltaFS parallel file system technology to reduce the amount of data each query has to read.
Computational storage offloads work from a host server's processor by running low-level, repetitive operations on a processor attached to the drive. This minimizes data movement to the host server and so speeds up processing. Running that work in parallel on the drive speeds it up further.
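To make the data-movement argument concrete, here is a toy model in Python. It is a sketch under stated assumptions, not a real computational-storage interface: `ToyDrive`, `read_all`, `filter_on_device`, the 64-byte `RECORD_SIZE` and the record contents are all hypothetical. It compares shipping every record to the host for filtering against letting the drive-side processor filter first and send back only the matches.

```python
from dataclasses import dataclass
from typing import Callable, List

RECORD_SIZE = 64  # hypothetical fixed record size in bytes


@dataclass
class ToyDrive:
    """Hypothetical drive model; not a real computational-storage API."""
    records: List[int]

    def read_all(self) -> List[int]:
        # Conventional SSD: every record crosses the bus to the host.
        return list(self.records)

    def filter_on_device(self, pred: Callable[[int], bool]) -> List[int]:
        # Computational storage: the drive-side processor applies the
        # filter, so only matching records are sent to the host.
        return [r for r in self.records if pred(r)]


def bytes_moved(records: List[int]) -> int:
    return len(records) * RECORD_SIZE


def wanted(r: int) -> bool:
    # A selective query that matches 0.1% of the records.
    return r % 1000 == 0


drive = ToyDrive(records=list(range(100_000)))

host_copy = drive.read_all()
host_result = [r for r in host_copy if wanted(r)]  # host does the filtering
device_result = drive.filter_on_device(wanted)     # drive does the filtering

assert host_result == device_result
print(bytes_moved(host_copy), bytes_moved(device_result))  # 6400000 6400
```

Both paths return the same answer; the difference is how many bytes leave the drive. The saving in this toy depends entirely on how selective the made-up query is, so it should not be read as a model of the Los Alamos or SK hynix results.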
A relational database stores data records organized into rows and columns, and a value is addressed by its row and column. A key-value database, such as Redis or RocksDB, stores each record (value) under a unique key. Each record is written as a key-value pair, and the key is used to retrieve the record.
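The contrast can be sketched with plain Python data structures. This is illustrative only: the list of dicts stands in for a relational table and the dict stands in for a key-value store. Neither is the API of Redis, RocksDB or any other real database, and the column names and key format (`timestep:0005/cell:000042`) are hypothetical.

```python
# Relational style: records are rows, and a value is addressed by
# row and column. A list of dicts stands in for a table here.
table = [
    {"cell_id": 17, "temperature": 301.2, "pressure": 1.05},
    {"cell_id": 42, "temperature": 298.7, "pressure": 0.98},
]
print(table[1]["temperature"])  # row 1, column "temperature" -> 298.7

# Key-value style: each record (value) is stored under a unique key,
# and the key alone is used to retrieve it. A dict stands in for the store.
kv_store: dict[str, bytes] = {}
kv_store["timestep:0005/cell:000042"] = b"temperature=298.7;pressure=0.98"
print(kv_store["timestep:0005/cell:000042"])
```

A Python dict is a hash table, so it answers lookups by exact key but keeps no sort order by key. That is the gap an ordered key-value store fills.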
SK hynix research engineers implemented a key-value store on an NVMe SSD in place of the traditional block-based Flash Translation Layer (FTL), and pushed indexing capabilities down to a processor attached to the prototype drive. Working with the Laboratory's security science applications, they enabled those applications to run faster, because this technique can cut data movement during retrieval for analysis by orders of magnitude.
The indexing capabilities enable ordered range queries and point queries, which are common operations in simulation output data analysis. A range query returns all records on a drive whose indexed value falls between a lower and an upper limit, whereas a point query returns the records that match one specific value.
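The two query types can be illustrated with a small ordered index built on Python's standard `bisect` module. This is a minimal sketch under assumptions, not the KV-CSD design, whose on-drive data structures are not described here. `OrderedIndex` and its methods are hypothetical names, and the keys (for example a particle's energy) and record IDs are made up. Keeping keys sorted as records are inserted is what makes both queries a pair of binary searches.

```python
import bisect
from typing import List


class OrderedIndex:
    """Toy sorted index supporting point and range queries (hypothetical)."""

    def __init__(self) -> None:
        self._keys: List[float] = []   # kept in ascending order
        self._values: List[str] = []   # record IDs, aligned with _keys

    def insert(self, key: float, value: str) -> None:
        # Index on write: place each record in key order as it arrives.
        # bisect_right keeps records with equal keys in arrival order.
        i = bisect.bisect_right(self._keys, key)
        self._keys.insert(i, key)
        self._values.insert(i, value)

    def point(self, key: float) -> List[str]:
        # All records whose key equals `key`.
        lo = bisect.bisect_left(self._keys, key)
        hi = bisect.bisect_right(self._keys, key)
        return self._values[lo:hi]

    def range(self, low: float, high: float) -> List[str]:
        # All records with low <= key <= high, returned in key order.
        lo = bisect.bisect_left(self._keys, low)
        hi = bisect.bisect_right(self._keys, high)
        return self._values[lo:hi]


idx = OrderedIndex()
for energy, particle in [(2.5, "p7"), (0.3, "p1"), (1.8, "p4"), (2.5, "p9")]:
    idx.insert(energy, particle)

print(idx.point(2.5))        # ['p7', 'p9']
print(idx.range(1.0, 2.0))   # ['p4']
```

Inserting into a Python list is O(n) per record, which is fine for a sketch. Production ordered key-value stores typically use structures such as LSM-trees or B-trees to keep writes cheap at scale.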
Grider said: “Demonstrations like this show it is possible to build an ordered KV-CSD that moves the ordering and indexing of data as close to the storage device as possible, maximizing the wins on retrieval from on-the-fly indexing as data is written to the storage. The ordering capability enables range queries that are particularly useful in computational science applications as well as point queries that key value storage is known for.”
Charles Ahn, head of solution development at SK hynix, said: “As large-scale simulation data and big data analytics grow, solutions are critical for these communities. We are very excited about continuing our research partnership with Los Alamos on this high-performance innovation.”
Los Alamos National Laboratory and SK hynix have a memorandum of understanding toward the design, implementation and evaluation of the KV-CSD.