Quobyte tackles high-speed file querying demand with new engine

Quobyte, a provider of unified high-performance storage, has taken the wraps off its distributed File Query Engine, which is designed to let users query file system metadata at high speed.

Quobyte enables small teams to run large-scale high-performance computing (HPC) infrastructure in sectors including education and health research.

Aimed at environments with massive data sets, File Query Engine offers a range of capabilities. These include querying user-defined metadata for AI/ML training, which lets users label files directly instead of managing separate small metadata files.

Additionally, administrators can quickly answer operational questions, such as identifying space-consuming cold files or locating files owned by specific users.

File Query Engine replaces slow file system tree walks (such as those performed by “find”), offering a faster and more efficient alternative for large volumes. It is integrated with Quobyte’s distributed, replicated key-value store, which holds the file system metadata.
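For context, the following Python sketch shows the kind of tree walk that File Query Engine is meant to replace, answering one of the operational questions above: which large files have gone cold? It uses only the standard library and is not Quobyte code. The function name, the thresholds, and the choice to treat "cold" as "not modified for N days" are assumptions made for illustration.

```python
import os
import stat
import time
from typing import Iterator, Tuple

SECONDS_PER_DAY = 86_400

def find_cold_files(root: str, min_age_days: float, min_size: int) -> Iterator[Tuple[str, int]]:
    """Yield (path, size) for regular files under `root` whose modification
    time is older than `min_age_days` and whose size is at least `min_size` bytes.

    Like `find`, this visits every directory and stats every file, so its cost
    grows with the total number of files, not with the number of matches.
    """
    # Assumption: "cold" means "not modified recently" (mtime), since access
    # times are often unreliable on file systems mounted with noatime/relatime.
    cutoff = time.time() - min_age_days * SECONDS_PER_DAY
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # do not follow symlinks
            except FileNotFoundError:
                continue  # the file was removed while the walk was running
            if stat.S_ISREG(st.st_mode) and st.st_mtime < cutoff and st.st_size >= min_size:
                yield path, st.st_size
```

On a volume with billions of files, every one of those `lstat` calls is a metadata round-trip, which is why a walk like this becomes slow at scale.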

Unlike other products, the engine does not require an additional database layer, which Quobyte claims results in faster queries and “significant” resource savings. Queries run in parallel on all metadata servers, enabling fast scans of the entire cluster or of selected volumes.
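The parallel fan-out described here can be illustrated with a small, conceptual Python sketch. It is not Quobyte's implementation: the `FileRecord` fields, the in-memory lists standing in for metadata servers, and the `parallel_query` helper are all hypothetical. Each "server" filters only its own records against the same predicate and the partial results are merged, so no separate database layer is involved.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List

@dataclass(frozen=True)
class FileRecord:
    # Hypothetical metadata record; not Quobyte's actual schema.
    path: str
    owner: str
    size: int
    mtime: float

Predicate = Callable[[FileRecord], bool]

def scan_shard(shard: List[FileRecord], pred: Predicate) -> List[FileRecord]:
    # Each metadata server filters only the records it stores locally.
    return [r for r in shard if pred(r)]

def parallel_query(shards: List[List[FileRecord]], pred: Predicate) -> List[FileRecord]:
    # Send the same predicate to every shard at once, then merge the partial results.
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = pool.map(lambda shard: scan_shard(shard, pred), shards)
        return sorted((r for part in partials for r in part), key=lambda r: r.path)
```

For example, locating all files owned by a specific user becomes `parallel_query(shards, lambda r: r.owner == "alice")`. In a real cluster each shard would be a remote server, so the threads would mainly overlap network round-trips; in this local sketch, Python's global interpreter lock means the filtering itself does not run CPU-parallel.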

“File Query Engine is a game-changer for our customers,” said Bjorn Kolbeck, CEO of Quobyte. “It streamlines the process of querying file system metadata, offering fast and efficient results even for large datasets, AI, and machine-learning workloads.”

The technology is part of Quobyte release 3.22 and is available automatically, with no configuration required. Users can run file metadata queries with the “qmgmt” command-line tool, which can output results in CSV or JSON format.

Queries can also be initiated via the Quobyte API, which the provider said offers “flexibility and ease of use”.

Among existing deployments, the HudsonAlpha Institute for Biotechnology in the US uses Quobyte’s unified block, file and object storage to store primary life sciences and genomics data on a hybrid disk and flash system.