AWS – Amazon Web Services. This is Amazon’s public cloud offering, providing compute, storage, and application services. It is the world’s largest public cloud and probably the largest hyperscaler system in the world as well.
Autoloader
Autoloader – A tape storage device with a drive and slots for holding tape cartridges, which are automatically loaded into the drive, and removed from it, by some kind of mechanised picking system or robot.
VAST links arms with Vertica for fast analytics
All-flash scale-out file and object storage supplier VAST Data is using Vertica’s analytics software with the aim of wrangling data from VAST’s array at high speed.
Micro Focus-owned Vertica provides its Unified Analytics Platform as SaaS across all major public clouds and on-premises datacenters, and integrates data in cloud object storage.
Now with VAST and Vertica, customers get to use a single system for both data warehouse and data lake. Jeff Denworth, VAST Data co-founder and CMO, said: “VAST’s entry into the Vertica ecosystem allows customers of both companies to benefit from hyperscale flash storage and industry-leading data reduction, unlocking greater operational efficiency and faster performance to implement advanced analytics – at a fraction of the cost of the public cloud.”
Vertica’s SQL-based Unified Analytics Platform has a massively scalable architecture and analytical functions spanning event and time series, pattern matching, geospatial, and end-to-end in-database machine learning. VAST Universal Storage has a disaggregated shared everything (DASE) architecture with a single tier of flash storage for all data.
The VAST and Vertica alliance supports the fusing of data lake and data warehouse architectures, providing a one-stop shop for customers’ structured, semi-structured, and unstructured data. The companies said customers benefit from real-time storage performance for all their use cases: data warehouse analytics, ad hoc queries, and complex data science jobs. They also get simplified management from eliminating the separation of the data warehouse and data lake.
As Vertica’s Eon Mode separates compute from storage functions, each can scale independently and achieve workload isolation for different types of analytics.

The two also claim that their combination delivers 3x faster database queries than legacy all-flash NAS (meaning, we guess, PowerScale/Isilon), 2x better datacenter density at lower cost, and consistent quality of service without performance degradation or resource contention.
VAST and Vertica are providing an on-premises offering. The pair have published a joint technical validation guide and a solution brief.
Vertica also partnered with NetApp in November last year to have its cloud-native analytics software run on NetApp StorageGRID object storage.
At the time, parent Micro Focus said Vertica with StorageGRID has a fast multi-site, active-active architecture and can perform queries 10-25 times faster than conventional databases. It also claimed that the separation of Vertica’s compute and storage architecture in Eon Mode allowed admin staff to use StorageGRID as the main data warehouse repository or as a data lake.
Seagate object storage used in exascale computing projects
Seagate’s CORTX object storage was used for high-performance research projects in the European Union’s SAGE Exascale computing initiative.
SAGE, started in 2015, is one of those weird made-up acronyms, and apparently stands for Percipient StorAGe for Exascale Data Centric Computing. PSEDCC doesn’t have the same memorable ring to it. Anyway, the SAGE system, which aimed to merge big data analytics and HPC, took a storage-centric approach in that it was meant for storing and processing large data volumes at exascale.
According to an ACM document abstract, “The SAGE storage system consists of multiple types of storage device technologies in a multi-tier I/O hierarchy, including flash, disk, and non-volatile memory technologies. The main SAGE software component is the Seagate Mero Object Storage that is accessible via the Clovis API and higher level interfaces.” [Mero was a prior name for what became CORTX.]

A first prototype of the SAGE system was implemented and installed at the Jülich Supercomputing Center in Germany. A SAGE 2 project was set up in 2018 to validate a next generation storage system building on SAGE for extreme scale computing scientific workflows and AI/deep learning. It “provides a highly performant and resilient, QoS capable multi tiered storage system, with data layouts across the tiers managed by the Mero Object Store, which is capable of handling in-transit/in-situ processing of data within the storage system, accessible through the Clovis API.”
SAGE and SAGE 2 have given rise to research papers, such as a doctoral thesis by Wei Der Chien, a student at the KTH Royal Institute of Technology in Stockholm, entitled “Large-scale I/O Models for Traditional and Emerging HPC Workloads on Next-Generation HPC Storage Systems.” This looked at using an object store for HPC applications. Chien developed a programming interface that can be used to leverage Seagate’s Motr object store.
Motr
Motr, according to GitHub documentation, is a distributed object and key-value storage system that sits at the heart of Seagate’s CORTX object store and uses high-capacity drives. Its design was influenced by the Lustre distributed and parallel filesystem, NFS v4.0, and database technology. Motr interacts directly with block devices and is not layered on top of a local file system. It provides a filesystem interface but is not, itself, a filesystem.
Motr controls a cluster of networked storage nodes which can be disk- or solid state-based, meaning flash, faster PCIe-attached flash, battery-backed memory, and phase-change memory. Each Motr node caches a part of system state. This cache consists of metadata (information about directories, files, and their attributes) and data (file contents, usually in the form of pages). The cache can be stored in volatile memory or on a persistent store.
IO activities result in system state updates which can occur on multiple nodes. State updates are gradually moved towards more persistent stores. For example, an update to an in-memory page cache might be propagated to a cache stored on a flash drive and later to a cache stored on a disk drive.
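To make that concrete, here is a minimal Python sketch of the tier-by-tier propagation idea; the tier names and the propagate() helper are purely illustrative and are not Motr’s actual interfaces.

    # Conceptual illustration of tier-by-tier state propagation; not Motr's API.
    class Tier:
        def __init__(self, name):
            self.name = name
            self.pages = {}                      # page id -> contents

        def write(self, page_id, data):
            self.pages[page_id] = data

    # Fastest to most persistent, mirroring the memory -> flash -> disk example.
    tiers = [Tier("dram_cache"), Tier("flash_cache"), Tier("disk_store")]

    def update(page_id, data):
        # An IO lands in the in-memory cache first.
        tiers[0].write(page_id, data)

    def propagate():
        # Move cached pages one step towards more persistent tiers.
        for faster, slower in reversed(list(zip(tiers, tiers[1:]))):
            for page_id, data in list(faster.pages.items()):
                slower.write(page_id, data)
                del faster.pages[page_id]

    update("page-42", b"new contents")
    propagate()    # the page is now cached on flash
    propagate()    # the page has now reached the disk store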
A Seagate spokesperson told us the SAGE platform at Jülich Supercomputing ran CORTX Motr, with 22 nodes: 8 clients and 14 storage nodes. The storage nodes had multiple tiers: NVRAM, SSD, and HDD – served by different Motr pools. They form a single Motr cluster with these multiple performance tiers.
Users specify which pool to use and there is a user-directed Hierarchical Storage Management (HSM) tool to move data between pools. This connects to a libmotr interface, as do the HPC applications. We’re told that the libmotr interface is more HPC and AI-friendly than Amazon’s S3. Libmotr has high performance options, like scatter-gather, and direct connections via MPI-IO.
Some in the HPC community prefer to avoid high level interfaces like S3, opting instead for low-level interfaces, like libmotr, and APIs which provide greater control.
NoaSci
This month, Chien and others authored a follow-on paper called “NoaSci: A Numerical Object Array Library for I/O of Scientific Applications on Object Storage.” We have not seen the whole document but its abstract states: “While the POSIX-based I/O model dominates modern HPC storage infrastructure, emerging object storage technology can potentially improve I/O performance by eliminating these bottlenecks.” The researchers devised NoaSci, a Numerical Object Array library for scientific applications, which supports different data formats (e.g. HDF5, binary), and focuses on supporting node-local burst buffers and object stores.
They then showed how scientific applications can perform parallel I/O on Seagate’s Motr object store through NoaSci.
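We have not seen NoaSci’s interface, but the general pattern it targets – each process writing its slice of a numerical array as its own object rather than contending for a shared POSIX file – can be sketched with mpi4py. The put_object helper below is a stand-in for whatever object-store client (libmotr, S3, or otherwise) is actually used.

    # Sketch of object-per-rank parallel I/O; illustrative only, not NoaSci's API.
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank, nranks = comm.Get_rank(), comm.Get_size()

    # Each rank owns one slice of a global 1D array.
    global_len = 1_000_000
    local = np.full(global_len // nranks, rank, dtype=np.float64)

    def put_object(key, payload):
        # Stand-in for an object-store put (libmotr, S3, etc.); here just a local file.
        with open(key.replace("/", "_"), "wb") as f:
            f.write(payload)

    # No shared-file locking or POSIX consistency needed: one object per rank per step.
    put_object(f"dataset/step0/rank{rank}", local.tobytes())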
Seagate technical staff in Senior Vice President Ken Claffey’s systems business team were involved in the SAGE and SAGE 2 projects, which in turn informed Chien’s research.
The Motr low-level object API was co-designed by Seagate with its EU HPC partners, including Professor Stefano Markidis at KTH. Chien is a student of Markidis. His Google Scholar page shows that his 6th most cited publication is the original SAGE work, on which Sai Narasimhamurthy, a Seagate UK-based engineering director, was a co-author.
Another cited paper, “MPI windows on storage for HPC applications,” was co-authored by Markidis, Narasimhamurthy, and others.
Seagate told us: “We were honored that CORTX Motr was the chosen object storage system for these projects and greatly benefited from these relationships which drove the CORTX Motr interface to be what it is today and remains the preferred interface for many within this community.”
It has added an S3 interface for enterprise and cloud users who prefer a higher-level interface and are not, typically, willing to rewrite their applications to achieve very high performance.
The SAGE and SAGE 2 projects have ended but Seagate continues its collaboration with KTH and others in the IO-SEA and ESiWACE (https://www.esiwace.eu/) projects.
Comment
MinIO has made most of the perceived running in positioning object storage as a primary data store for applications needing fast access to large amounts of data. Now we find that, nestled in European academic HPC research, Seagate’s CORTX object storage software has a low-level interface to its core Motr system, enabling HPC users to enjoy fast access to object data as well.
But, to enjoy the high speed, CORTX has to be used with the libmotr API interface, meaning application software changes are required. It would be fascinating to see if CORTX, via libmotr, is as fast or even faster than MinIO, and whether CORTX could have a future in the commercial sphere for fast access object storage.
ASIC
ASIC – Application-Specific Integrated Circuit. This is often implemented as a chip whose use is specific to a particular application, rather than general-purpose like a central processing unit (CPU). Example applications include a video codec (compression/decompression) chip that compresses or decompresses a stream of video data.
ASCII
ASCII – American Standard Code for Information Interchange. This is a standard format for encoding letters, numerals, punctuation marks, characters, and symbols as numeric codes when transmitting them in electronic communications. Among other things, the codes are used to represent the characters and symbols on a computer keyboard. The ASCII character set consists of 128 characters, each encoded in 7 bits, including the digits 0 to 9, upper and lower case English letters from A to Z, and various special characters.
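For example, the letter A is encoded as decimal 65 and lower-case a as 97, which a few lines of Python can confirm:

    print(ord("A"))                  # 65
    print(ord("a"))                  # 97
    print(chr(65 + 25))              # Z - upper-case letters occupy codes 65 to 90
    print(format(ord("A"), "07b"))   # 1000001 - the 7-bit ASCII code for A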
Storage news ticker – April 19
AWS DataSync now supports transferring files to and from Amazon FSx for OpenZFS, a fully managed service built on the open-source OpenZFS file system. Using DataSync, you can migrate your on-premises file or object storage to FSx for OpenZFS, or perform ongoing transfers of your data between FSx for OpenZFS and your on-premises storage or other AWS storage services. You can also use DataSync to move data between FSx for OpenZFS file systems.
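For those who script such transfers, creating a DataSync location for an FSx for OpenZFS file system looks roughly like the boto3 sketch below; the ARNs are placeholders and the parameter shapes should be checked against the current DataSync documentation.

    # Rough boto3 sketch; parameter shapes are assumed and the ARNs are placeholders,
    # so check the current DataSync API documentation before relying on this.
    import boto3

    datasync = boto3.client("datasync", region_name="us-east-1")

    # Register the FSx for OpenZFS file system as a DataSync location.
    location = datasync.create_location_fsx_open_zfs(
        FsxFilesystemArn="arn:aws:fsx:us-east-1:111122223333:file-system/fs-EXAMPLE",
        Protocol={"NFS": {"MountOptions": {"Version": "AUTOMATIC"}}},
        SecurityGroupArns=["arn:aws:ec2:us-east-1:111122223333:security-group/sg-EXAMPLE"],
        Subdirectory="/fsx/",
    )

    # A task pairs this location with another, such as an on-premises NFS location.
    datasync.create_task(
        SourceLocationArn="arn:aws:datasync:us-east-1:111122223333:location/loc-EXAMPLE",
        DestinationLocationArn=location["LocationArn"],
        Name="onprem-to-fsx-openzfs",
    )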
…
AWS says you can automatically attach your FSx for ONTAP and FSx for OpenZFS file systems on newly created EC2 instances using the new launch experience on the EC2 Console. The Amazon FSx family of services enables you to launch, run, and scale shared storage powered by commercial and open-source file systems. Amazon FSx for NetApp ONTAP provides fully managed shared storage in the AWS cloud with the data access and management capabilities of ONTAP. Amazon FSx for OpenZFS provides fully managed cost-effective shared storage powered by the OpenZFS file system.
…
ChaosSearch will showcase its operational analytics capabilities at AWS Summit San Francisco, taking place April 20-21. The ChaosSearch Data Lake Platform empowers organizations to activate their data lakes built on Amazon S3 for analytics at scale. The ChaosSearch Data Lake Platform is the only solution that indexes data and renders it immediately available for search and analytics via Elasticsearch, Kibana, and SQL. Users can perform both log analytics and SQL queries concurrently and in-situ from their cloud object storage, without data pipelining, transformation, or movement.
…
Analyst house DCIG has announced the immediate availability of the 2022-23 DCIG TOP 5 Microsoft Hyper-V Backup Solutions report. It evaluated 12 suppliers’ products, looking at backup administration, capabilities, configuration, licensing and pricing, recovery and restore, and service and support. The Top 5, in alphabetical order, are Arcserve, Atempo, Commvault, Unitrends, and Veritas.
…
IBM has announced Spectrum Scale Container Native v5.1.3.0 with Red Hat OpenShift Container Platform. It supports Red Hat OpenShift Container Platform v4.10, an integrated CSI driver for application persistent storage with automated deployment, and an automated IBM Spectrum Scale performance monitoring bridge for Grafana. There is support for compression, quotas, ACLs, ILM, file clones, and snapshots on the storage cluster, and much more.
…

Kioxia and Western Digital have finalized a formal agreement to jointly invest in the first phase of the Fab7 (Y7) manufacturing facility at Kioxia’s Yokkaichi Plant in the Mie Prefecture of Japan. With construction of the first phase of Y7 completed, the joint investment will enable initial production output beginning in the fall of this year. This joint-venture investment adds a sixth flash memory manufacturing facility to the Yokkaichi Plant, enhancing its position as the world’s largest flash memory manufacturing site. The first phase of the Y7 facility will produce 3D flash memory, including 112- and 162-layer products and future nodes.
…
Datacenter composability supplier Liqid has reached an agreement with PIER Group to sell Liqid’s line of composable disaggregated infrastructure (CDI) solutions and services to research and education (R&E) customers. Liqid says its software maximizes datacenter utilization and CPU/GPU performance. The addition of the Liqid Matrix CDI platform to PIER Group’s portfolio is intended to provide customers a cost-effective way to disaggregate and pool datacenter resources such as GPUs, FPGAs, NVMe SSDs, persistent memory, and other accelerators via software, allowing the resources to be dynamically configured as bare-metal servers.
…
Pure Storage has said that more than 10 of the leading autonomous vehicle software development companies are using its FlashBlade all-flash, unified file and object product. These include a major ride-share company that is using FlashBlade to take a new approach to training its automated vehicle systems, a robotics company that develops autonomous driverless delivery vehicles, and an electric vehicle company with demanding engineering workflows that houses its engineering data on FlashBlade.
…

Samsung is reportedly lining up a rugged external SSD product, according to SamMobile. It appears to be a ruggedized version of the T7 SSD, with a USB Type-C port and a status LED.
…
Keysight Technologies, which supplies design and validation systems, was chosen by SK hynix to provide integrated Peripheral Component Interconnect Express (PCIe) 5.0 test platforms to speed the development of memory semiconductors for products that support high data speeds and manage massive amounts of data. Keysight’s integrated solutions for physical layer simulation, characterization, and validation of PCIe 5.0 devices enable SK hynix to speed up test and development of next-generation dynamic random access memory (DRAM) and PCIe devices with CXL high-speed memory interconnect technology.
…
Ocient has come out of stealth and launched Ocient Hyperscale Data Warehouse (OHDW). This enables organizations to execute previously infeasible workloads in interactive time, the firm claimed. They can scale to analyze trillions of data points 10 to 50 times faster than existing offerings, returning results in seconds or minutes versus hours or days, it further claimed. Customers can also tackle CPU-intensive workloads with ease, including large-scale joins and full-table scans with extreme I/O performance. For example: “Because Ocient runs a massively parallel set of nodes with many NVMe drives in each node, the total random read throughput is 12 million 4KB random read IOPS per node,” the company said.
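As a back-of-the-envelope check on that figure (our arithmetic, not Ocient’s), 12 million 4KiB random reads per second works out to roughly 49GB/sec of random read bandwidth per node:

    iops = 12_000_000                  # claimed 4KB random read IOPS per node
    block_bytes = 4 * 1024             # 4KiB
    print(iops * block_bytes / 1e9)    # ~49.2 GB/sec random read bandwidth per node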
Intel FPGA used to hook non-x86 processors to Optane PMem
SMART Modular’s Kestral Optane Memory card connects to non-x86 processors with an Intel Stratix FPGA, which contains Optane controller functions.
Kestral is an add-in PCIe card with up to 2TB of Optane Persistent Memory (PMem), which can be used for memory acceleration or computational storage by Xeon, AMD, Arm, or Nvidia processors. Until now, only gen 2 or later Xeon Scalable CPUs could connect to Optane PMem, because they contain the Optane PMem controller functions without which the PMem cannot function as additional memory alongside the host system’s DRAM. The Stratix 10 DX FPGA used in the Kestral card has those controller functions programmed within it.
Blocks & Files asked SMART Modular’s solutions architect, Pekon Gupta, some questions about how the Kestral card interfaces to hosts, and the FPGA interface for Optane PMem was revealed in his answers.
Re: memory expansion for x86, AMD, Arm, and Nvidia servers, what software would be needed on the servers for this?
Pekon Gupta: Kestral exposes the memory to the accelerator as a memory-mapped I/O (MMIO) region. Applications can benefit from this large pool of extended memory by mapping it into their application space. A standard PCIe driver can enumerate the device. Intel released specific PCIe drivers which can be built on any standard Linux distribution.
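On Linux, once a standard PCIe driver has enumerated the card, an exposed BAR can be memory-mapped from user space and used like ordinary memory. The sketch below shows the general mechanism only; the sysfs path and BAR index are hypothetical and are not the Kestral driver’s actual interface.

    # Illustration only: mapping a PCIe BAR as MMIO from user space on Linux.
    # The sysfs path and BAR index are hypothetical placeholders, not the
    # actual interface exposed by the Kestral driver.
    import mmap, os

    bar = "/sys/bus/pci/devices/0000:17:00.0/resource2"   # placeholder device/BAR
    size = os.path.getsize(bar)                           # BAR size reported by sysfs

    fd = os.open(bar, os.O_RDWR | os.O_SYNC)
    region = mmap.mmap(fd, size, mmap.MAP_SHARED, mmap.PROT_READ | mmap.PROT_WRITE)

    region[0:8] = b"\x00" * 8      # plain load/store access into the mapped memory
    region.close()
    os.close(fd)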
What mode of Optane PMem operation (DAX for example) is supported?
Pekon Gupta: The current version of Kestral supports Intel Optane PMem in Memory Mode only. There are plans to [add] App Direct Mode in future revisions.
How is the Optane memory connected to the memory subsystems on the non-x86 servers which don’t support DDR-T? As I understand it, DDR-T is Intel’s protocol for Xeon CPUs (with embedded Optane controller functions) to talk to Optane PMem. Arm, AMD, and Nvidia processors don’t support it.
Pekon Gupta: The Intel FPGA controller on Kestral Add-in-card converts PCIe protocol to the DDR-T protocol. Therefore, the host platform only needs a 16-lane wide PCIe bus. The host never interacts with Optane PMem directly using any proprietary protocol. This card also supports standard DDR4 RDIMMs, so if the end user needs better performance, they can replace Intel Optane PMem DIMMs with 256GB DDR4 DIMMs.
Why is SMART specifying “Possibly CCIX coherent attached Optane”? What is needed for the possibility to become actual?
Pekon Gupta: This card was designed for a targeted customer when CCIX was still around. Therefore, there is a possibility of adding CCIX support using third-party IP on the FPGA. There are still a few Arm-based systems in the market which support a CCIX Home Agent. These systems can benefit by using the large pool of memory expansion available through this card.
Will PCIe gen 5 be supported?
Pekon Gupta: Not in the current generation of hardware. The current FPGA controller can only support PCIe Gen 4.0.

Will CXL v1.1 be supported?
Pekon Gupta: The current generation of hardware does not support CXL. The current FPGA controller does not support CXL. However, most of our customers are asking for CXL-2.0 support and beyond so we are considering that for future revisions.
Will CXL v2.0 be supported?
Pekon Gupta: CXL is not supported in the current generation of Kestral. We are in discussion with multiple CXL controller suppliers to build a CXL-based add-in card, and would like to hear from interested customers about which features they would like to see in future versions of Kestral or similar CXL-based accelerators.
Memory Acceleration or Storage Cache – how can Optane-based Kestral (with Optane slower than DRAM) be used for memory acceleration?
Pekon Gupta: Although Optane is slower than DRAM, offloading certain fixed functions onto the FPGA compensates for the latency by bringing compute next to the data.
Most architectures today support bringing (copying) the data near the compute engine (CPU or GPU), but if the data runs to hundreds of gigabytes it may be more efficient to bring some fixed compute functions near the data, to prefilter and re-arrange it. This is exactly the same concept used in processing-in-memory (PIM) technology, like that shown in Samsung’s AXDIMM. Kestral is just an extension of the PIM concept.
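A toy Python contrast of the two approaches – not Kestral’s implementation, just the concept – shows why it matters: the near-data version runs a fixed filter where the data lives, so only matching records cross the bus.

    # Toy contrast between host-side filtering and near-data (PIM-style) filtering.
    records = [{"id": i, "value": i % 1000} for i in range(100_000)]

    def host_side(dataset):
        # Ship everything across the bus, then filter on the host CPU.
        copied = list(dataset)                        # models the bulk data movement
        return [r for r in copied if r["value"] > 990]

    def near_data(dataset):
        # A fixed filter runs where the data lives; only matches cross the bus.
        return [r for r in dataset if r["value"] > 990]

    assert host_side(records) == near_data(records)   # same result, far less data moved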
When modern x86 servers support Optane PMem, why would you need a Kestral Storage Cache? Is it for non-Optane PMem-supporting servers?
Pekon Gupta: There are two benefits which are seen by both Intel and non-Intel platforms.
1) By using the Kestral Add-in-Card you can attach the Optane PMem to PCIe slots, which frees up DDR4 or DDR5 DIMM slots for adding more direct-attached high speed memory.
2) Kestral hides the nuances of proprietary protocol of Optane DIMM and allows users to focus on using the memory, or adding custom hardware engines to accelerate their workload.
Host server offload functions such as compression, KV compaction, and RAID 1 are mentioned as use cases. This, as I understand it, is sometimes called compute-in-storage and is also a function of some smart NICs. Putting these functions in an Optane PMem + Arm + FPGA card seems overkill. What advantage does Optane bring here? Is it speed?
Pekon Gupta: Compute-in-storage and processing-in-memory are two sides of the same coin. The idea is to offload a few fixed compute functions near the large pool of data, so that the host does not need to copy large [amounts of] data and then filter and discard most of it. Optane brings the advantage of a large pool of memory. A single Optane DIMM supports 512GB of capacity and four 512GB DIMMs give a total of 2TB of capacity per Kestral card. PIM and computational storage only become cost-effective with a large density of memory or storage, and this high DIMM density keeps the cost per GB at acceptable levels.
Can you provide numbers to justify Optane PMem use?
Pekon Gupta: We have a few benchmark data [points] which can be shared under NDA, as there are some proprietary implementations in them. But we were able to achieve similar performance to what an Optane DIMM gives when directly attached to the processor bus, so we concluded that PCIe is not the bottleneck here.
Web3
Web3 – decentralized storage (dStorage) and computing. Such dStorage is based on capacity provided by a global peer-to-peer network of providers whose trustworthiness is verified by blockchain technology, with the incentive of blockchain-based cryptocurrency such as Protocol Labs’ Filecoin. This Web3 dStorage concept is specifically opposed to so-called Web2 centralized storage provided by the main public clouds such as AWS, Azure, GCP, Wasabi, and others.
GCRAM
GCRAM – Gain-Cell Random Access Memory. This requires three transistors to store a bit of data, as opposed to the 6-8 transistors needed by SRAM, the incumbent high-density embedded memory technology. When used in systems-on-chip (SoCs), GCRAM can provide a 50% area reduction over high-density SRAM and reduce power consumption by a factor of five. GCRAM can be manufactured using the standard CMOS process. Startup RAAAM Memory Technologies says GCRAM technology combines the density advantages of embedded DRAM with SRAM performance, without any modifications to the standard CMOS process.
RAAAM Memory Technologies’ GCRAM uses separate read and write ports, in contrast to DRAM’s combined read/write port per cell. This enables it to amplify the cell’s stored charge and separately optimize the read and write processes while retaining SRAM-like performance.

Archive
Archive – A storage repository for little-accessed data that needs to be retained for a long time. The key difference from a backup is that a backup is a copy of the original data from which the original can be restored, as in restoring a virtual machine; an archive is created by moving data to the storage media, not copying it. Information is retrieved from an archive rather than restored, and content-description metadata is needed to facilitate this retrieval. That metadata is generally part of the archive system software used to manage and operate the archive.

Archived data is best stored on cheap and durable offline storage media, such as tape, which lowers its costs compared to keeping data online in constantly powered disk drives or SSDs. Generally, archive data does not need fast access. An archive vault can be front-ended with a disk or SSD cache to provide faster access to the metadata describing the archive and its contents. Such an archive system may be described as an active archive.