
Pure Storage wants to work with data gravity, not against it

Pure Storage CEO Charles Giancarlo expressed two noteworthy views in an interview with Blocks & Files – that hyperconverged infrastructure doesn’t exist inside hyperscaler datacenters, and that data needs virtualizing.

He expressed many noteworthy views, actually, but these two were particularly striking. First, we asked him whether running applications in the public cloud renders the distinction between DAS (Direct-Attached Storage) and external storage redundant. He said: “In general the public cloud is designed with disaggregated storage in mind… with DAS used for server boot drives.”

The storage systems are connected to compute by high-speed Ethernet networks. 

Pure Storage CEO Charles Giancarlo

Disaggregated storage is more efficient than creating virtual SANs or filers by aggregating each server’s DAS, as HCI (hyperconverged infrastructure) does. HCI was a good approach in the 2000s, when networking speeds were in the 1Gbit/s range, but “now with 100Gbit/s and 400Gbit/s coming, disassociated elements can be used and this is more efficient.”

HCI’s use is limited, in Giancarlo’s view, by scaling difficulties, as the larger an HCI cluster becomes, the more of its resources are applied to internal matters and not to running applications.

Faster networking is a factor in a second point he made about data virtualization: “Networking was virtualized 20 years ago. Compute was virtualized 15 years ago, but storage is still very physical. Initially networking wasn’t fast enough to share storage. That’s not so now.” He noted that applications are becoming containerized (cloud-native) and so able to run anywhere.

He mentioned that large datasets at petabyte scale have data gravity; moving them takes time. With Kubernetes and containers in mind, Pure will soon have Fusion for traditional workloads and Portworx Data Services (PDS) for cloud-native workloads. Both will become generally available in June.

What does this mean? Fusion is Pure’s way of federating all Pure devices – on-premises hardware/software arrays and off-premises, meaning software in the public cloud – with a cloud-like hyperscaler consumption model. PDS, meanwhile, brings the ability to deploy databases on demand in a Kubernetes cluster. Fusion is a self-service, autonomous, SaaS management plane, and PDS is also a SaaS offering for data services.

We should conceive of a customer’s Pure infrastructure, on and off-premises, being combined to form resource pools and presented for use in a public cloud-like way, with service classes, workload placement, and balancing.

Giancarlo said “datasets will be managed through policies” in an orchestrated way, with one benefit being the elimination of uncontrolled copying.

He said: “DBMSes and unstructured data can be replicated 10 or even 20 times for development, testing, analytics, archiving and other reasons. How do people keep track? Dataset management will be automated inside Pure.”

Suppose there is a 1PB dataset in a London datacenter and an app in New York needs it for analysis routines. Do you move the data to New York?

Giancarlo said: “Don’t move the [petabyte-level] dataset. Move the megabytes of application code instead.”

A containerized application can run anywhere. Kubernetes (Portworx) can be used to instantiate it in the London datacenter. In effect, you accept the limits imposed by data gravity and work with them, by moving lightweight containers to heavyweight data sets and not the inverse. You snapshot the dataset in London and the moved containerized app code works against the snapshot and not the original raw data.

When the app’s work is complete, the snapshot is deleted and excess data copying avoided.
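The arithmetic behind that advice is stark. As an illustrative back-of-the-envelope calculation (the link speed and app size below are assumed round numbers, not figures from Pure):

```python
# Illustrative data-gravity arithmetic: compare moving a 1PB dataset
# against moving a few megabytes of containerized application code.
# The 10Gbit/s link and 50MB app size are assumptions for the example.

def transfer_seconds(size_bytes: float, link_bits_per_sec: float) -> float:
    """Time to push size_bytes through a link, ignoring protocol overhead."""
    return (size_bytes * 8) / link_bits_per_sec

LINK = 10e9     # assume a 10Gbit/s London-New York pipe
DATASET = 1e15  # the 1PB dataset
APP = 50e6      # ~50MB of container images

print(f"dataset: {transfer_seconds(DATASET, LINK) / 3600:.0f} hours")  # ~222 hours
print(f"app:     {transfer_seconds(APP, LINK):.2f} seconds")           # 0.04 seconds
```

Even on a fast pipe the dataset transfer is measured in days, while the application transfer is effectively instant — which is why moving compute to data wins.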

Of course data does have to be copied for disaster recovery reasons. Replication can be used for this as it is not so time-critical as an analytics app needing results in seconds rather than waiting for hours as a dataset slowly trundles its way through a 3,500-mile network pipe.

Giancarlo claimed: “With Pure Fusion you can set that up by policy – and keep track of data sovereignty requirements.” 

He said that information lifecycle management ideas need updating with dataset lifecycle management. In his view, Pure needs to be applicable to the very large-scale dataset environments, the ones being addressed by Infinidat and VAST Data. Giancarlo referred to them as up-and-comers, saying they were suppliers Pure watched although he said it didn’t meet them very often in customer bids.

Referring to this high-end market, Giancarlo said: “We clearly want to touch the very large scale environment that our systems haven’t reached yet. We do intend to change that with specific strategies.” He gave no further detail on that. We asked about mainframe connectivity and he said it was relatively low on Pure’s priority list: “Maybe through M&A but we don’t want to fragment the product line.”

Pure’s main competition is from incumbent mainstream suppliers such as Dell EMC, Hitachi Vantara, HPE, IBM, and NetApp. “Our main competitive advantage,” he said, “is we believe data storage is high-technology and our competitors believe it’s a commodity… This changes the way you invest in the market.”

For example, it’s better to have a consistent product set than multiple, different products to fulfill every need. Take that, Dell EMC. It’s also necessary and worthwhile to invest in building one’s own flash drives and not using commodity SSDs.

Our takeaway is that Pure is bringing the cloud-like storage consumption and infrastructure model to the on-premises world, using the containerization movement to its advantage. It will provide data infrastructure management facilities to virtualize datasets and overcome data gravity by moving compute (apps) to data instead of the reverse. Expect announcements about progress along this route at the Pure Accelerate event in June.

It’s a snap: Nebulon’s four-minute TimeJump restore

Nebulon TimeJump can restore a compromised server’s operating environment and data in less than four minutes – because it uses snapshot technology instead of backups and is outside the server’s security domain.

But snapshots are not backups, exactly. That is what gives Nebulon its advantage. We’re talking about servers equipped with a Nebulon Services Processing Unit (SPU) card and formed into an nPod cluster of up to 32 nodes. The SPU functions as a RAID or storage controller and all of a host server’s direct-attached storage drives are accessed through it, with their contents, including boot files, encrypted.

All the servers in the cluster can access their own and the other clustered servers’ direct-attached storage (DAS). Whenever data is updated, it is stored in the NVRAM on the server’s SPU card and then written to the owning server’s DAS. The SPUs can take a timestamped snapshot of the cluster’s data and store that in a partition on the drives. Snapshotting is policy-driven and, since snapshots consist of pointers to the drive blocks containing the data rather than the data itself, they take up very little space.
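As a rough sketch of why pointer-based snapshots are cheap to take and fast to restore — a generic redirect-on-write illustration in Python, not Nebulon's implementation:

```python
# Minimal sketch of pointer-based snapshots: a snapshot records which
# physical blocks a volume's logical blocks point to, not the data
# itself, so it is small and restoring it just swaps the pointer map.
# Generic illustration of the technique, not Nebulon's code.

class Volume:
    def __init__(self):
        self.blocks = {}   # physical block store: block id -> bytes
        self.ptrs = {}     # logical block address -> physical block id
        self._next = 0

    def write(self, lba: int, data: bytes):
        # Redirect-on-write: never overwrite a block a snapshot may reference
        self.blocks[self._next] = data
        self.ptrs[lba] = self._next
        self._next += 1

    def snapshot(self) -> dict:
        return dict(self.ptrs)      # copies pointers only, not data

    def restore(self, snap: dict):
        self.ptrs = dict(snap)      # promotion: swap in the old pointer map

    def read(self, lba: int) -> bytes:
        return self.blocks[self.ptrs[lba]]

vol = Volume()
vol.write(0, b"clean")
snap = vol.snapshot()               # taken before the "attack"
vol.write(0, b"encrypted-by-ransomware")
vol.restore(snap)                   # fast: a metadata change only
print(vol.read(0))                  # b'clean'
```

Restoring is a pointer swap rather than a bulk data copy, which is the basis of the speed claim.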

Nebulon
Nebulon SPU

Nebulon’s Martin Cooper, VP of Customer Experience, said that the SPU knows where the data blocks are because it manages their placement. It controls how data is laid out on physical drive media when the host writes to logical drives through the SPU, much as an external storage array does when a host writes to a LUN. The snapshot metadata is spread over all SSDs in the server.

When a server- or system-compromising event such as a ransomware attack occurs, the snapshot taken just before the attack can be located and then restored – in effect reinstantiating the servers to a pristine state with the SPU’s metadata pointing to clean data. This can take four minutes or less, and the entire nPod is restored in this way.

David Scott.

To the statement that snapshots are not backups, Nebulon executive chairman David Scott told us: “There may be some ambiguity around the terms ‘backup’ and ‘restore’ that is responsible for this contradiction.”

In his view, “Both terms can be used at two levels: a higher level that refers to all approaches of creating copies of primary data that can be used for recovery and ‘restored’, and then at a second level that is used specifically in the context of data protection backup software.”

Nebulon’s snapshotting is backup in the first sense but not in the second sense, since it does not rely on data protection backup software, like Veritas or Veeam or Acronis. Backup processes in the first sense include online snapshots, online (disk/SSD) or offline (tape) backup copies using data protection software and remote copies using replication technology.

According to Scott: “The Nebulon EULA is using the term ‘Backup’ associated with full copies of data volumes via data protection software – i.e. the second level (our contribution to the ambiguity).”

Scott says: “Primary online data volumes can be recovered from any of these approaches through some form of ‘restore’ process.” In the case of snapshots the “restore” process is by promoting a previous snapshot. With backup through data protection software, restore is by using that software’s restore function which, generally speaking, is slower than using a snapshot.

Scott points out: “Snapshot promotion offers the fastest speed of recovery in comparison with a traditional restore from a backup copy using data protection software. … Snapshot promotion also is likely to represent the most recent coherent data set possible (backups are usually taken once a day whereas snapshots can be taken much more frequently).”

He then identifies a specific vulnerability which Nebulon’s technology protects: “Unfortunately, in today’s ransomware attacks the customer’s operating system and management software have often also been compromised and are therefore not in a state to allow snapshot ‘promotion’ to occur.”

But Nebulon’s technology restores pristine copies of the customer’s operating system and management software as well as data volumes. The process involves reinstalling the system software and then altering data volume metadata structures to point to clean data blocks – a comparatively quick process compared to restoring full data volumes from backed-up copies. This is how Nebulon achieves its four-minutes-or-less restore.

CrossBar tries to secure embedded ReRAM IoT market

ReRAM developer CrossBar is separating itself from other ReRAM pioneers by concentrating on the near invincibility of its chips against external content-detection threats.

Update, 26 April 2022: Weebit Nano funding and scaling targets added, plus security comment.

Resistive RAM (ReRAM) is a fast non-volatile memory technology – storage-class memory relying on the formation of electrically conducting oxygen vacancy filaments through an otherwise insulating medium. This alters the resistance of the material between high and low states – the binary signalling mechanism. ReRAM is said to be as fast to read as SRAM (static random access memory), but lower cost with reduced power consumption and non-volatility. Weebit Nano and Intrinsic Technology are also developing ReRAM products. All three use a silicon oxide material and a crosspoint-style architecture.

Ashish Pancholy, CrossBar VP for marketing and sales, told us that CrossBar’s technology has “resistance to an extremely motivated and persistent attacker trying to sniff out memory contents.” This makes the product suited to storing highly sensitive information such as encryption keys. Pancholy believes CrossBar’s technology is ideal for secure CMOS applications as an embedded memory in standard CMOS manufacturing processes. 

CrossBar graphic

Starting last year, the company moved to focus more on secure memory and applications. It has found a niche in the hardware security field with Physically Unclonable Function (PUF) keys as a replacement for SRAM. PUF keys have a unique identifier or fingerprint, and Pancholy said an embedded hardware PUF dramatically reduces a device’s attack surface.

Pancholy enumerated the types of attack that technologically skilled hostiles might use to get at a memory device’s contents:

  • Power analysis – Power use can vary with memory contents and so be used to infer them. ReRAM consumes little power and is resistant to simple/differential power analysis; power draw is constant regardless of stored contents.
  • Optics attack – ReRAM is secure against optics-based attack due to dense metal wiring. Resistant to delayering and backside attacks since ReRAM is implemented in middle layers.
  • No transistor – ReRAM is non-transistor-based and immune to photon emission analysis, unlike SRAM, flash, ROM, and floating-gate/anti-fuse OTP.
  • Transmission Electron Microscopy (TEM) – Techniques such as TEM cannot effectively detect localized atomic-level defects. 
  • Electromagnetic analysis – High “cell” density limits the effectiveness of electromagnetic analysis (size of magnetic probe must be comparable to the target device size for accurate attack). It’s also easy to make physical magnetic shields to protect the ReRAM.
  • Low bit error rate (PUF) – Unlike conventional PUF (physically unclonable function) keys with high bit error rates, ReRAM PUF does not require helper data, which makes it more secure from invasive attacks.
CrossBar attack graphic

We didn’t know that such attack methods existed, nor realize that attackers would go to such extreme lengths to get at the contents of digital devices. Pancholy said that the market for attack-resistant devices is growing with the IoT market.

A Weebit spokesperson offered the thought that the security features, such as immunity to attacks, etc., are really features that are inherent in ReRAM technology in general and not unique to Crossbar.

CrossBar background

CrossBar was founded in 2010 by chief scientist professor Wei Lu, VP engineering Hagop Nazarian, and since-departed CEOs Chris de Groat and George Minassian. The latter left in March 2022. Dr Ker Zhang, an entrepreneur-in-residence at lead investor Kleiner Perkins and executive chairman since March 2019, then became CEO. CrossBar has taken in almost $131 million in funding across four VC rounds – the last in 2015 for $35 million – and three subsequent venture rounds, the most recent for $20.1 million in 2019. There are about 50 employees.

That makes it richly endowed compared to competitor Weebit Nano, which has raised $75 million since being founded in 2014, and ReRAM newcomer Intrinsic Technology, which was started up in 2017 and has raised just $1.8 million.

CrossBar has focused on patent-protecting its technology and has 301 filed patents worldwide with 193 issued in the US. Another 80-plus applications are pending elsewhere. Its revenue comes from licensing its patented technology to producers of CMOS chips embedding its ReRAM to provide fast, dense, non-volatile memory functionality.

It has developed 28nm foundry technology and is working on shrinking that to 12nm with Microsemi. Pancholy said there is potential for 10nm and possibly smaller technology – far below the limits of NAND. For comparison, Weebit and CEA-Leti have scaled Weebit’s technology down to 28nm and are targeting 22nm.

Intrinsic has demonstrated the scaling of its ReRAM technology down to 50nm.

CXL

CXL – Compute Express Link – the extension of the PCIe bus outside a server’s chassis, based on the PCIe 5.0 standard. There are five successive versions of the standard:

  • CXL v1, released in March 2019, enables server CPUs to access shared memory on local accelerator devices with a cache-coherent protocol; memory expansion.
  • CXL v1.1 enables interoperability between the host processor and an attached CXL memory device; still memory expansion.
  • CXL 2.0 provides for memory pooling between a CXL memory device and more than a single accessing host. It enables cache coherency between a server CPU host and three device types.
  • CXL 3.0 uses PCIe gen 6.0 and doubles per-lane bandwidth to 64 gigatransfers/sec (GT/sec). It supports multi-level switching and enables more memory access modes – providing sharing flexibility and more memory sharing topologies – than CXL 2.0.
  • CXL 3.1 introduced:
    • CXL fabric improvements and extensions with, for example, scale-out CXL fabrics using Port-Based Routing (PBR)
    • Trusted-Execution-Environment Security Protocol (TSP) allowing virtualization-based Trusted Execution Environments (TEEs) to host confidential computing workloads
    • Memory expander improvements with up to 32 bits of metadata and RAS capability enhancements

There are three device types in the CXL 2.0 standard:

  • Type 1 devices are I/O accelerators with caches, such as smartNICs, and they use CXL.io protocol and CXL.cache protocol to communicate with the host processor’s DDR memory.
  • Type 2 devices are accelerators fitted with their own DDR or HBM (High Bandwidth Memory) and they use the CXL.io, CXL.cache, and CXL.memory protocols to share host memory with the accelerator and accelerator memory with the host.
  • Type 3 devices are just memory expander buffers or pools and use the CXL.io and CXL.memory protocols to communicate with hosts.
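The three device types and their protocol sets can be tabulated as follows (a hypothetical summary structure; the names are ours, not part of the CXL spec):

```python
# Summary table of the three CXL 2.0 device types described above.
# The dict layout and function name are illustrative, not a real API.

CXL_DEVICE_TYPES = {
    1: {"example": "smartNIC with a cache",
        "protocols": {"CXL.io", "CXL.cache"}},
    2: {"example": "accelerator with its own DDR/HBM",
        "protocols": {"CXL.io", "CXL.cache", "CXL.memory"}},
    3: {"example": "memory expander or pool",
        "protocols": {"CXL.io", "CXL.memory"}},
}

def protocols_for(device_type: int) -> set:
    """Return the CXL protocol set a given device type uses."""
    return CXL_DEVICE_TYPES[device_type]["protocols"]

# Every device type speaks CXL.io; only types 1 and 2 speak CXL.cache.
assert all("CXL.io" in protocols_for(t) for t in (1, 2, 3))
print(protocols_for(3))  # CXL.io and CXL.memory only
```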

A CXL 2.0 host may have its own directly connected DRAM as well as the ability to access external DRAM across the CXL 2.0 link. Such external DRAM access will be slower, by nanoseconds, than the local DRAM access, and system software will be needed to bridge this gap.

MemVerge memory tiering graphic.

A CXL fabric controller can talk to Single Logical Devices (SLDs) or Multi-Logical Devices (MLDs). In a CXL 2.0 domain multiple hosts can be connected to a single CXL device, and it can be useful to divide the device’s resources among multiple hosts. A supporting MLD can be virtually separated into up to 16 logical devices (LDs), each with its own LD-ID.
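A sketch of that MLD carve-up, assuming invented function and field names — the CXL 2.0 spec defines the 16-LD ceiling, not this API:

```python
# Sketch of partitioning a Multi-Logical Device into logical devices,
# each with its own LD-ID, honoring the 16-LD limit in CXL 2.0.
# partition_mld and its fields are hypothetical names for illustration.

MAX_LDS = 16

def partition_mld(total_gib: int, requests: list[int]) -> list[dict]:
    """Assign an LD-ID and a capacity slice to each requesting host."""
    if len(requests) > MAX_LDS:
        raise ValueError(f"an MLD supports at most {MAX_LDS} logical devices")
    if sum(requests) > total_gib:
        raise ValueError("requests exceed the device's capacity")
    return [{"ld_id": i, "gib": gib} for i, gib in enumerate(requests)]

# Three hosts share a 512GiB MLD:
lds = partition_mld(total_gib=512, requests=[128, 128, 64])
print(lds)  # [{'ld_id': 0, 'gib': 128}, {'ld_id': 1, 'gib': 128}, {'ld_id': 2, 'gib': 64}]
```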

    Computational Storage

    Computational Storage – A storage drive with computational facilities in the drive and used for working directly on the data stored in the drive. The aim is to reduce data movement inside a computer system and so save time, electrical energy and host CPU resources. Potential applications include compression/decompression, encryption and video transcoding.

    Composable Systems

    Composable Systems – These dynamically compose servers from separate pools of base server (CPU+DRAM) nodes, GPUs and other accelerators, NVMe storage, Optane SSDs and network interface cards. These servers are then presented to server operating systems and applications as exactly equivalent to static bare metal servers with no change to any upstream software.

    CMOS

    CMOS – Complementary Metal-Oxide-Semiconductor. This is a fabrication technology for the transistors used in the construction of integrated circuit (IC) chips, such as microcontrollers, microprocessors, DRAM and NAND memory chips, and other types of digital logic circuits. It is based on Metal-Oxide-Semiconductor Field-Effect Transistors (MOSFETs), using symmetrical and complementary pairs of p-type and n-type MOSFETs.

    Storage news ticker – April 20

    An updated all-flash array history chart includes newer entrants like VAST Data and StorONE, older ones like Fusion-io, Violin and Vexata, and more acquisitions. Enjoy this snapshot and tell me if there are any omissions. (You can see I’m running out of space!)

    There is a general timescale from left (older) to right (newer) but it’s not that accurate.

    William Blair analyst Jason Ader tells subscribers: “We hosted investor meetings with Box CEO Aaron Levie and CFO Dylan Smith, and came away with reinforced confidence in the increasing value of the Box Cloud, its strong product-market fit and competitive differentiation in an era of distributed work and content fragmentation, and the replicable rhythm of its suite-selling, go-to-market motion. Box’s ultimate objective – which we believe is beginning to come to fruition after many years of toil – is to create a centralized, cloud-based platform that will help distributed workers access and collaborate with their content and eliminate the various content silos that negatively impact their productivity.” Ader says secular trends are working in Box’s favor, and management is executing on both top-line acceleration and higher profitability.

    DPU chip hardware, system, software and composability startup Fungible announced the launch of its Fungible Partner Exchange (FunPX) partner program. FunPX allows resellers, distributors, and technology providers worldwide to connect, collaborate, and grow their composable infrastructure capabilities by delivering best-in-class technology solutions necessary to cloudify the world’s datacenters. Brian McCloskey, Fungible chief revenue officer, said: “We are investing in the tools, people, and processes to enable our partners to grow with us by providing immediate access to the essential information they need to win and help our mutual customers realize the benefits that the Fungible portfolio can bring to their infrastructure.”

    Granular and air-gapped backups are critical to data recovery when – not if – a business falls victim to ransomware. Those are among the key takeaways from a new Enterprise Strategy Group (ESG) study, titled “The Long Road Ahead to Ransomware Preparedness”. The authors surveyed information technology (IT) and cybersecurity professionals working within organizations across North America and Western Europe. The study was sponsored by Keepit, which claims to be the world’s only vendor-neutral and independent cloud dedicated to Software-as-a-Service (SaaS) data protection based on a blockchain-verified offering.

    The Linear Tape Open (LTO) organization has issued a ten-year history of compressed capacity shipped and units shipped from 2011 to 2021. 

    The ** is a pointless note saying only ten years of compressed capacity shipments have been charted.

    Total shipped compressed capacity in exabytes rose 40 per cent from 2020 to 2021, reaching 148EB – a record. The depressed 2018, 2019 and 2020 numbers were due to LTO-8 supply problems as the two media suppliers, Sony and Fujifilm, argued amongst themselves, settling in late 2019. Then the COVID-19 pandemic upset customers’ buying patterns again. Those setbacks have since been overtaken by the great rise in unstructured data needing cheap and resilient storage, which drove the 40 per cent jump in capacity shipped from the 105.7 compressed EB of 2020.

    Unit sales (tape cartridges) rose slightly year-on-year as an LTO-issued chart shows: 

    It would be good to know the split between on-premises and public cloud/hyperscaler sales, but those numbers are a closely guarded secret.

    Media and entertainment market object storage supplier Object Matrix announced that partner Scale Logic launched its Object Storage offering based on Object Matrix tech. Scale Logic says that, with its active archiving functionality for long-term data retention, the system enables users to offload, migrate, store, and centralize files from various sites within a less expensive, secondary tier. This functionality frees up space while decreasing costs and the need for additional IT resources. It includes built-in support for WORM and S3 object locking to enable continuous data integrity.

    HPC storage supplier Panasas has announced the appointments of Jeff Whitaker as VP of marketing and product management, Kieran Penwill and Chris Sassone as sales directors of EMEA and APAC, respectively, and Mike Sheppard as global director of channels and alliances. Brian Peterson, Panasas COO said: “The ActiveStor Ultra product line has received high praise from both established and new customers, and Panasas has enjoyed a sharp increase in demand over the last 18 months. As a result, we are investing further in our teams and our products to support our expanding global customer base.” Current VP corporate marketing Todd Ruff says on LinkedIn he’s looking for a new post.


    Renesas Electronics Corp. introduced the first clock buffers and multiplexers that meet PCIe 6 specifications. It is offering 11 new clock buffers and four new multiplexers. The devices, which also support and provide extra margin for PCIe 5 implementations, complement Renesas’s low-jitter 9SQ440, 9FGV1002 and 9FGV1006 clock generators to offer customers a complete PCIe 6 timing solution for datacenter/cloud computing, networking and high-speed industrial applications. The PCIe 6 standard supports data rates of 64GT/sec (that’s gigatransfers per second) while requiring very low clock jitter performance of less than 100 fs RMS.


    SoftIron, which supplies purpose-built and performance-optimized Ceph-aligned datacenter systems, announced the expansion of its CTO office as it realigns internally and expands into new engineering roles. Kenny Van Alstyne, SoftIron CTO, said the development reflects SoftIron’s desire to expand its patent portfolio and intellectual property while deepening its engagement with the open source community through high-level strategic engineering initiatives. The company is looking for engineers who are passionate and dedicated to the open source ethos and in developing sophisticated software that makes a meaningful impact. The first two projects that will benefit will be Ceph and SONiC, which was recently donated to The Linux Foundation.

    An IDC Perspective report on VAST Data by analyst Eric Burgener – “Growing Data Demands Are Driving VAST Data’s Continued Hypergrowth” – is extremely laudatory. Burgener writes: “With $220 million in the bank, continued triple-digit growth for the future, and a net revenue retention from existing customers of over 300 percent, VAST Data is very strong financially and sustainably differentiated from the competition. Not only is it a vendor to watch in enterprise storage, it is the vendor to watch.” The man is in love. He adds: “One extremely notable aspect of [its Gemini] go-to-market model for customers is that they get hardware discounts based on the volume of all VAST Data purchases that go through Avnet, not just their own. This means that customers deploy like enterprises with appliances but buy like hyperscalers with very high discounts.” Get this seven-page VAST Data love-in doc here.

    CIFS

    CIFS – Common Internet File System, pronounced ‘sifs’. This is a Windows filesystem SMB (Server Message Block) dialect. The CIFS implementation of SMB is rarely used these days and most modern storage systems use SMB 2 or SMB 3. CIFS gained a reputation for problematic client/server interactions, proprietary extensions, and poor performance over high-latency networks. Because of this, Microsoft comprehensively re-engineered it, effectively replacing it with the faster-performing SMB 2 in Windows Vista in 2006 and then SMB 3 in Windows 8 and Windows Server 2012.

    Charge Trap

    Charge Trap – Charge-trap transistors use an insulator, a silicon nitride film, to store electrons in localized traps in a NAND or NOR cell. This is in contrast to NAND and NOR floating-gate transistors, which use a polycrystalline silicon conductive layer, isolated between insulating (dielectric) layers, to store electrons.

    CBA

    CBA – CMOS directly Bonded to Array. In this NAND technology the CMOS control wafer and the cell array wafer are manufactured separately and then bonded together to deliver, Kioxia says, enhanced bit density and fast NAND I/O speed. It is an alternative to monolithic NAND fabrication, in which the control logic and NAND cell array are fabricated on the same wafer. See CMOS entry.

    CAS

    CAS – Content Addressing System. A data item’s contents are binary numbers. They can be mathematically manipulated by a cryptographic function to provide a hash or key, a unique number, which is then used to locate the data item in an address space. A directory stores these addresses with a pointer to the physical storage of the content. Any attempt to store the same file will generate the same key, and any change to the file will generate a new key. This property can be used to show that a file has not been changed. See also object storage.
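A minimal content-addressing store along these lines can be sketched in a few lines of Python, using SHA-256 as the cryptographic function (an illustration of the principle, not any particular product):

```python
# Minimal content-addressing store: the SHA-256 hash of a data item's
# bytes is its address, so identical content maps to one key and any
# change to the content yields a new key.

import hashlib

class CAStore:
    def __init__(self):
        self.directory = {}   # hash key -> stored content

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self.directory[key] = data   # storing a duplicate is a no-op
        return key

    def get(self, key: str) -> bytes:
        return self.directory[key]

store = CAStore()
k1 = store.put(b"quarterly report v1")
k2 = store.put(b"quarterly report v1")   # same content, same key
k3 = store.put(b"quarterly report v2")   # changed content, new key
print(k1 == k2, k1 == k3)                # True False
```

The same-content/same-key property gives deduplication for free, and a changed key is proof the file changed, as the entry describes.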