
Spin orbits could make better MRAM

Magneto-Resistive Random Access Memory (MRAM) has so far generally failed to replace SRAM because its Spin Transfer Torque (STT-MRAM) implementation is too slow and doesn’t last long enough. A new variant — Spin-Orbit Torque MRAM (SOT-MRAM) — promises to be faster, last longer and use less power.

To better understand what’s going on, we have to delve into how STT-MRAM works. This is my understanding, and I am neither a physicist nor a CMOS electrical engineer, so bear with me. STT-MRAM is based on a magnetic tunnel junction: a three-layer device, built in a CMOS (Complementary Metal Oxide Semiconductor) process, with a dielectric or partially insulating layer between two ferromagnetic plates or layers. The thicker layer has a fixed or pinned magnetic direction.

Blocks & Files diagrams.

The upper and thinner layer is called a free layer and its magnetic polarity can be set either way. When the magnetic polarity of both layers is in sync (parallel) then the electrical resistance of the device is lower than when the polarities are opposite, as shown in the diagram above. High or low resistance signals binary one or zero. The resistance is made high or low with a write current, stronger than the resistance-sensing read current, sent through the device.
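As a toy illustration of that read-out scheme: the resistance values, bit convention and sense threshold below are invented for illustration, not taken from any real device.

```python
R_PARALLEL = 2_000       # ohms, low-resistance state (hypothetical value)
R_ANTIPARALLEL = 4_000   # ohms, high-resistance state (hypothetical value)
SENSE_THRESHOLD = 3_000  # ohms, read comparator level (hypothetical value)

def mtj_resistance(fixed: int, free: int) -> int:
    """Resistance of the junction given each layer's polarity (+1 or -1)."""
    return R_PARALLEL if fixed == free else R_ANTIPARALLEL

def read_bit(fixed: int, free: int) -> int:
    """A weak read current senses the resistance; high resistance decodes as 1."""
    return 1 if mtj_resistance(fixed, free) > SENSE_THRESHOLD else 0

print(read_bit(+1, +1))  # parallel -> low resistance -> 0
print(read_bit(+1, -1))  # antiparallel -> high resistance -> 1
```

Which resistance state maps to binary one is a device convention; the point is only that polarity agreement sets resistance, and resistance encodes the bit.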

Electrons in the magnetic layers have spins (angular momentum) which can be in an up or down direction. The current sent through the fixed layer is spin-polarised so that the bulk of its electrons spin in one direction. Some of these travel — or rather tunnel — through the dielectric layer into the free layer and can change its magnetic polarity, and thus the device’s resistance. Hence the name, magneto-resistive RAM. The change is permanent — MRAM is non-volatile, but repeated writes degrade the device’s tunnel barrier material and reduce its life. 

STT-MRAM can deliver higher speed by using a larger write current, but that shortens the device’s endurance. Or it can deliver longer endurance at the cost of slower speed.

Proponents of SOT-MRAM say the problem is due to the write and read currents using the same path, from terminal to terminal, across the device. By separating the two currents you can raise both speed and endurance. But you still need to set the magnetic polarity of the free layer.

This is done by passing the write current through a so-called strap layer set adjacent to the free layer.

Blocks & Files diagram.

This means the device now has three terminals — a point to which we will return. 

According to Bryon Moyer, writing in Semiconductor Engineering (semiengineering.com), writing requires either a specially asymmetric-shaped strap layer or an externally applied magnetic field. Research is progressing into field-free switching using atomic-scale phenomena such as the Rashba effect, which is concerned with spin orbits and crystal asymmetry, and the Dzyaloshinskii–Moriya effect, related to magnetic vortices.

A particle can spin on its own axis or it can spin — orbit — around some other particle.  

We can say that electron spin orbits represent the interaction of a particle’s spin with its motion inside an electrical field, without actually understanding what that means. A “Nanoscale physics and electronics” scientific paper stated: “Spin orbit coupling (SOC) can be regarded as a form of effective magnetic field ‘seen’ by the spin of the electron in the rest frame. Based on the notion of effective magnetic field, it will be straightforward to conceive that spin orbit coupling can be a natural, non-magnetic means of generating spin-polarized electron current.”

It may be “straightforward” to CMOS-level electric engineers and scientists but this writer is now operating way out of any mental comfort zone.

Bryon Moyer of Semiconductor Engineering says that researchers realised that with the right layering and combination of ferromagnetic or ferrimagnetic materials, and with the right spin-index relationships, the magnetic symmetry can be broken to drive the desired [magnetic] orientation. In other words, you can use spin orbits to set the free layer’s magnetic field in the desired direction.

Whatever the method, the magnetic polarity of the free layer can be set, and the tunnel junction’s resistance can then be read as with STT-MRAM.

The three-terminal point means that an SOT-MRAM cell requires an extra select transistor, and this makes an SOT cell bigger than an STT cell. Unless this conundrum can be solved, SOT-MRAM may be restricted to specific niches within the overall SRAM market.

It is likely to be three years or more — perhaps ten years — before any SOT-MRAM products will be ready for testing by customers. In the meantime organisations like Intel, the Taiwan Semiconductor Research Institute (TSRI) and Belgium’s IMEC are researching SOT technology. We’ll keep a watch on what’s going on.

Storage news ticker – January 17

UK-based digital archiver Arkivum has been selected to take its petabyte-scale, SaaS-based ARCHIVER offering through to the final pilot stage, in preparation for commercialisation on the European Open Science Cloud (EOSC) and elsewhere. It has successfully completed both the design and prototyping phases of ARCHIVER (Archiving and Preservation for Research Environments), a €4.8 million project. Arkivum has selected Google Cloud for all phases of the ARCHIVER project.

Dell EMC has just released AppSync v4.4, providing Integrated Copy Data Management (iCDM) with Dell EMC’s primary storage systems. A Dell EMC blog discusses PowerMax secure snapshots, application integration improvements, and deeper platform support for PowerStore arrays with, for example, NVMe-oFC support.

Hyve Solutions’ VP of technology Jay Shenoy told us that he believed Twitter’s Jan 2021 DriveScale acquisition was a normal acqui-hire — Twitter wanting some of DriveScale’s people and not its composable systems products. Shenoy previously worked at Twitter before joining Hyve. DriveScale was founded by CTO and then chief architect Satya Nishtala and chief scientist Tom Lyon. Shenoy said “None of the business people at DriveScale went to Twitter. … Tom and many other people from the software team are the only ones who went to Twitter.” Their remit is to build flexible infrastructure, whether it be on-premises or in the public cloud.

IBM’s Spectrum Fusion HCI v2.1.2 is generally available. It includes enhanced active file management (AFM), a Container Network Interface (CNI) daemon, support for Red Hat OpenShift Container Platform, IBM Spectrum Protect Plus and Spectrum Scale Erasure Code Edition, plus proxy server setup for the OpenShift Container Platform cluster.

We heard Veritas is going to hire Santosh Rao, currently a Gartner analyst with the title “principal product management, Amazon RDS/Aurora”. On asking, a Veritas spokesperson told us: “Veritas does not comment on rumours or speculation.”

Hyve: busy hyperscaler bees buy hardware, not software

Hyve Solutions sells storage hardware to its hyperscaler customers, but not software — because they don’t want it. What, then, do hyperscalers want?

Hyve is a TD Synnex business unit that sells, delivers and deploys IT Infrastructure — compute, networking and storage — to hyperscaler customers. That means the likes of AWS, Azure, Facebook, Twitter, etc., though it would not confirm any as customers. Typically the hardware is rackscale and complies with Open Compute Project standards. Hyve also sells into IoT and edge deployments — such as ones using 5G, where sub-rackscale systems may be typical and site numbers can be in the hundreds.

Jay Shenoy.

We were briefed by Jayarama (Jay) Shenoy, Hyve’s VP of technology, and our conversation ranged from drive and rack connection protocols to SSD formats.

Shenoy said Hyve is an Original Design Manufacturer (ODM) that focusses exclusively on hyperscaler customers and hardware. It doesn’t really have separate storage products — SANs and filers for example — with their own software. Separate, that is, from servers.

Shenoy said “We make boxes, often to requirements and specifications of our hyperscale customers. And we have very little to do with the software part of it.”

Storage and storage servers

Initially hyperscalers used generic servers for everything. “A storage server was different from a compute server only in the number of either disks or SSDs. … There really was not that much different at times about the hardware, being storage. That is changing now. But the change will manifest in, I would say, two or three years. Things move much more slowly than we would like them to move.”

One of the first big changes in the storage area was a drive type change. “So the first thing that happened was hard drives gave way to SSDs. And initially, for several years, SSDs was a test. By now that trend is really, really entrenched so that you may just assume that the standard storage device is an SSD.”

As a consequence, there was a drive connection protocol change. “In the last three years, SATA started giving way to NVMe, to the point where now, SATA SSD would be a novelty. … We still have some legacy, things that we keep shipping. But in new designs, it’s almost all NVMe.” 

Shenoy moved on to drive formats. “Three years ago, hyperscalers were sort of divided between U.2 [2.5-inch] and M.2 [gumstick card format]. All the M.2 people kind of had buyer’s regret. … All three of them have confirmed that they’ve moved or are moving to the E1.S form factor.”

E1.S is the M.2 replacement design in the EDSFF (Enterprise and Datacentre SSD Form Factor) ruler set of new SSD form factors.

Shenoy said “There are primarily three, thank goodness. The whole form factor discussion settled out. That happened maybe a year and a half ago, when Samsung finally gave up on its NF1 and moved to EDSFF.”

There are two basic EDSFF formats: E1 and E3. Each comes in short and long sub-formats, and also with varying thicknesses:

SNIA’s EDSFF form factor details.

The E3 form factor allows for an x4, x8, or x16 PCIe host interface. Shenoy’s basic threesome comprises E1.S, E1.L and E3.

Shenoy told Blocks and Files “At a high enough level, what seems to me to be happening is E3 is going into … what would have been known. What is still known, as enterprise storage with features like dual porting. …  E1 does not really support dual porting.”

Hyperscalers have adopted the E1 format, “but to this date, I have not come across a single one that has an E3 [drive].” He explained that “the people who picked U.2 are happy with U.2 and are going to like U.3 instead of E3.”

The difference for hyperscalers between E1.S and the E1.L format is capacity. “[With] E1.S versus E1.L the difference is exactly one capacity point, meaning the highest capacity point in the SSD line. … Only the highest capacity point will be offered in E1.L. And everything else will be offered in E1.S. So E1.L is basically restricted to whoever is going to be today adopting the 30 terabyte drive capacity point.”

Edge computing

“In terms of hardware, there’s one other subtle, subtle change — well, actually, not so subtle — that’s been happening in hyperscale. When we think of hyperscale, we think of big datacentres, hundreds of thousands of nodes, and racks, and racks, and racks and racks. But edge computing … has stopped being a buzzword and started to very much be a reality for us in the last couple of years.

“The way edge computing changes storage requirements is that hotswap of SSDs has come back. Hotswap — or at least easy field servicing — has come back as a definite requirement.” Shenoy thinks that “Edge computing driving different form factors of servers is a given.”

Another change is disaggregation. Shenoy said “The larger hyperscalers have basically embraced network storage. So it’s basically object storage or object-like storage for the petabytes or exabytes of data [and] that’s placed on hard disks. And SSD is a fast storage. It’s a combination of either — it can be object storage, or sometimes it has to be even block storage.”

PCIe matters

What about the connection protocols in Hyve’s hyperscaler world? For them, “Outside of the rack … can do anything as long as it’s Ethernet. Within the rack, there’s much more leeway in what cables people will tolerate, and what exceptions people are willing to make.” Consider hyperscale storage chassis as basically boxes of flash or disks — JBOFs or JBODs. As often as not they’ll be connected with a fat PCIe cable. In Shenoy’s view, “PCIe within a rack and Ethernet outside of a rack, I think is already a sort of a thing or reality.”

What about PCIe generation transitions — from gen 3 to 4 and from 4 to 5? Shenoy said there were very few gen 4 endpoints at first, the exception being GPUs, which “were almost ready before the server CPUs. … It’s partly the success of AMD, as Rome had to do with being the first with PCIe 4.” In fact the “PCIe gen 4 transition happened rather quickly. As soon as the server CPUs were there, at least some of the endpoints were there. SSDs took a little bit of time to transition to gen 4.”

But PCIe 5 is different: “Gen 5 … and the protocol that rides on top of gen 5, CXL, is turning out to be … very different. It’s turning out to be like the gen 2 to gen 3 transition, where the CPU showed up and nobody else did.”

According to Shenoy “The gen 5 network adapters are also lacking actually … to the point where the first CPU to carry gen 5 will probably go for, I don’t know, a good part of a year without actually attaching to gen 5 devices. That’s a long time.”

Shenoy was keen on this point. “The gen 5 transition is kind of looking a little bit like the gen 3 transition. … Gen 3 stayed around for a really, really long time.”

Chassis and racks

Hyve has standard server and storage chassis building blocks. “We have our standard building blocks. And then we have three levels of customisation. … So chassis is one of the things that we customise pretty frequently.” Edge computing systems vary from racks to sub-racks. The edge computing racks may be sparser — Shenoy’s term — than datacentre racks. Every one, without exception, will contain individual servers and may also contain routers and even long haul connection devices. 

They are more like converged systems than Hyve’s normal datacentre racks.

Customer count

How has Hyve’s hyperscaler customer count changed over the years? Shenoy said “The number of hyperscalers has gone up, or the number of Hyve customers has gone up slightly, I would say, in the last three years. 

“The core hyperscale market has not changed. Three years ago, or five years ago, you had the big four in the US — big five if you count Apple — and then the big three in China. And then there was a bunch of what Intel calls the next wave” — that means companies like eBay and Twitter. “These would have been known ten years ago as CDNs. Today it would be some flavour of edge computing. I am making reference to them without [saying] whether they are our customers or not.”

He said “Then the other type of customers that have come into our view … are 5G telcos.”

That means that the addressable market for Hyve in terms of customer count three to five years ago was around ten, and is now larger than that — possibly in the 15 to 20 range. It is a specialised market, but its buying power and its design influence — witness OCP — is immense.

That design influence is affecting drive formats — witness the EDSFF standards set to replace M.2 and 2.5-inch SSD form factors. Nearline high-capacity disk drives will stay on in their legacy 3.5-inch drive bays. The main point of developing EDSFF was to get more SSD capacity into server chassis while overcoming the consequent power and cooling needs.

Mainstream enterprise storage buyers will probably not come into contact with Hyve, unless they start rolling out high numbers of edge sites using 5G connectivity. Apart from that, Hyve should remain a specialised hyperscale IT infrastructure supplier.

Bootnote: Distributor Synnex became involved in this whole area when it started supplying IT infrastructure to Facebook. That prompted it to form its Hyve business ten years ago. The rise of OCP and its adoption by hyperscalers propelled Hyve’s business upwards.

The TD part of the name comes from Synnex merging with Tech Data in September last year, with combined annual revenues reaching almost $60 billion. This made it the IT industry’s largest distributor.

WAF

WAF – Write Amplification Factor – indicates the extra NAND media write operations an SSD performs in response to host writes (WAF = total NAND media writes / total host-issued writes). The additional media writes are needed because of the way NAND flash handles writes: once programmed, a NAND flash page cannot be overwritten unless the entire flash block (one block = N pages) is erased first.

Since erasing is a costly operation, SSD firmware avoids unnecessary erases by handling writes in a log-structured manner. Any overwrite is redirected to a new flash page and the old page is marked invalid. Eventually many such invalid pages accumulate. The SSD’s Garbage Collection (GC) process handles them by moving the still-valid pages to a new flash block, releasing the old blocks for erasure and, eventually, new writes. This movement of valid pages causes additional writes on top of the ongoing host-initiated writes, and is the root cause of the WAF problem in SSDs.

The extent of the problem varies with the active host workload. For example, a sequential workload might not cause much write amplification, since it aligns with the SSD software’s log-structured write design, while a random workload with lots of overwrites can cause high WAF. [Note from Samsung Tech Blog.]
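The mechanism described above can be illustrated with a toy log-structured flash model. The geometry, over-provisioning level and greedy garbage-collection policy below are invented for illustration; real SSD firmware is far more sophisticated.

```python
import random

PAGES = 64                                   # pages per block
BLOCKS = 10                                  # total flash blocks
USER_PAGES = int(BLOCKS * PAGES * 0.8)       # logical space: 20% over-provisioning

class ToySSD:
    def __init__(self):
        self.block = [[] for _ in range(BLOCKS)]  # programmed entries (lpn or None)
        self.valid = [0] * BLOCKS                 # valid-page count per block
        self.loc = {}                             # logical page -> (block, slot)
        self.free = list(range(BLOCKS))           # erased blocks
        self.active = self.free.pop()             # block currently being filled
        self.host_writes = 0
        self.nand_writes = 0

    def write(self, lpn):                    # host-issued write
        self.host_writes += 1
        self._program(lpn)

    def _program(self, lpn):
        if lpn in self.loc:                  # overwrite: mark the old page invalid
            b, s = self.loc[lpn]
            self.block[b][s] = None
            self.valid[b] -= 1
        self.block[self.active].append(lpn)  # redirect to a fresh page
        self.valid[self.active] += 1
        self.loc[lpn] = (self.active, len(self.block[self.active]) - 1)
        self.nand_writes += 1
        if len(self.block[self.active]) == PAGES:
            self._new_active()

    def _new_active(self):
        if self.free:                        # an erased block is available
            self.active = self.free.pop()
            return
        # Garbage collection: erase the full block with the fewest valid
        # pages, then rewrite those valid pages (the extra NAND writes).
        victim = min((b for b in range(BLOCKS) if b != self.active),
                     key=lambda b: self.valid[b])
        movers = [p for p in self.block[victim] if p is not None]
        for lpn in movers:
            del self.loc[lpn]
        self.block[victim], self.valid[victim] = [], 0
        self.active = victim
        for lpn in movers:
            self._program(lpn)

ssd = ToySSD()
random.seed(1)
for lpn in range(USER_PAGES):                # fill the logical space sequentially
    ssd.write(lpn)
for _ in range(20_000):                      # then hammer it with random overwrites
    ssd.write(random.randrange(USER_PAGES))
waf = ssd.nand_writes / ssd.host_writes
print(f"WAF after random overwrites: {waf:.2f}")  # > 1.0
```

Running the sequential fill alone gives a WAF of exactly 1.0 (no garbage collection is triggered); the random overwrite phase forces GC to relocate valid pages and pushes WAF above 1, matching the workload dependence the note describes.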

CDFP

CDFP – Short for 400 (CD in Roman numerals) Form-factor Pluggable, CDFP is designed to provide a low-cost, high-density 400 Gigabit Ethernet connection.

OSFP MSA

OSFP MSA – Octal Small Form Factor Pluggable (OSFP) Multi Source Agreement (MSA). The OSFP (x8) and its denser OSFP-XD (x16) variants both support the latest signalling rate of 224G PAM4 per lane (for example 8 x 200G = 1.6Tbps Ethernet). They are compatible with PCIe Gen5 / CXL 1.1 and 2.0 (32G NRZ), PCIe Gen6 / CXL 3.x (64G PAM4) and PCIe Gen7 / CXL 4.x (128G PAM4). This OSFP cabling system is future-proof for two generations ahead in the PCIe domain. It is also ready for UALink, which reuses Ethernet IO at the electrical level.
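The per-lane arithmetic in that note is easy to sanity-check; the lane counts and the ~200G of Ethernet payload per 224G PAM4 lane come straight from the text.

```python
# Per-module Ethernet bandwidth = lane count x per-lane payload rate.
LANE_RATE_GBPS = 200  # usable Ethernet rate per 224G PAM4 lane, per the MSA note

def module_bandwidth_gbps(lanes: int) -> int:
    return lanes * LANE_RATE_GBPS

print(module_bandwidth_gbps(8))   # OSFP (x8): 1600 Gbps, i.e. 1.6 Tbps
print(module_bandwidth_gbps(16))  # OSFP-XD (x16): 3200 Gbps, i.e. 3.2 Tbps
```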

QSFP

QSFP – Quad Small Form-factor Pluggable, a standard referring to transceivers for optical fibre or copper cabling, providing four times the speed of the corresponding SFP (Small Form-factor Pluggable) standard. The QSFP28 variant, published in 2014, allowed speeds up to 100Gbps, while the QSFP56 variant was standardised in 2019, doubling the top speed to 200Gbps. A larger variant, Octal Small Form-factor Pluggable (OSFP), had products released in 2022 capable of 800Gbps links between network equipment.

Storage news ticker – January 14

A Backblaze blog claims public cloud service provider data retention minimum periods are like a data deletion tax. “Let’s call retention minimums what they really are: delete penalties. We stand against delete penalties. We don’t charge them. We see them as the enemy of every use case in which data is intentionally replaced or deprecated in hours, days, or weeks instead of months.” AWS S3, for example, has minimum retention periods defined in pricing page footnotes. Other CSPs bury them deep inside their terms of service or FAQs. Backblaze says they should delete the delete penalties. 

The 2022 edition of the Flash Memory Summit is a live event scheduled for August 2–4 at the Santa Clara Convention Centre. FMS says it “offers the unique industry experience to explore the latest data storage standards, the newest innovations of memory technology and product advances to help attendees create the most competitive, high-performance storage solutions for on-premise datacentres and public cloud locations.” The program manager is Tom Coughlin (tom@tomcoughlin.com).

IDC has produced a v2 2021 Worldwide Edge Spending Guide looking at the 2020–2025 period. It defines the edge as “the technology-related actions that are performed outside of the centralized datacenter, where edge is the intermediary between the connected endpoints and the core IT environment. Characteristically, edge is distributed, software defined, and flexible.” Worldwide spending on edge computing is expected to be $176 billion in 2022 — an increase of 14.8 per cent over 2021. For enterprise adopters, the edge use cases with the largest investments in 2022 include manufacturing operations, production asset management, smart grids, omni-channel operations, public safety & emergency response, freight monitoring, and intelligent transportation systems.

Cloud Titan OEM deals mean advantage NetApp

NetApp’s OEM deals with the AWS, Azure and Google public clouds set it ahead of all other file-focused storage providers.

Jason Ader, a William Blair financial analyst, gave subscribers the benefit of his interview with NetApp EVP and general manager for public cloud Antony Lye. “No other storage vendor’s technology sits behind the user consoles of three big cloud service providers and is treated as a first-party service, sold, supported, and billed by the CSPs themselves.”

He is referring to OEM-type deals in which NetApp’s ONTAP file and block data management software is sold as a service by the three big CSPs: AWS, Azure and Google.

NetApp provides two ways for customers to get ONTAP in the three public clouds. The first is as a self-managed Cloud Volumes ONTAP service with software available through the respective marketplaces, while the second is through services sold, supported and billed by the cloud providers themselves.

These are:

Blocks & Files diagram.

NetApp receives a cut of the revenue from these services. Thus, with Azure NetApp Files (ANF), “NetApp is compensated monthly by Microsoft based on sold capacity.” Customers for ANF typically run SAP, VMware-based apps, VDI apps and legacy RDBMS apps. SAP has certified ANF, so “NetApp is able to offer backups, snapshots and clones in the context of the application itself.”

Lye told Ader that Amazon FSx for NetApp ONTAP “took two and a half years to stand up.” And, because FSx sits behind the AWS console, NetApp has been able to provide native integrations with popular Amazon services like S3, EKS, Lambda, Aurora, Redshift and SageMaker.

AWS also offers FSx for Lustre and FSx for Windows File Server, but AWS execs “have publicly stated that FSx for NetApp ONTAP is one of AWS’s fastest-growing services right now.” According to Lye, 60 per cent of FSx for ONTAP customers are new to NetApp.

Because NetApp is seen as “the clear leader in file storage and file-based protocols” the three cloud titans “have not felt the need (at least not yet) to integrate as deeply with NetApp’s competitors.”

Ader’s NetApp public cloud offerings table.

NetApp also provides a set of CloudOps services (Spot by NetApp) covering DevOps, FinOps and SecOps (development, financial and security operations) based on five acquired products: Spot, CloudHawk, CloudJumper, Data Mechanics, and CloudCheckr.

They are aimed at customers with cloud-native application development and their DevOps architects. Such customers are often new to NetApp and represent a storage cross-sell opportunity. 

The company also offers its Cloud Insights facility, cloud instantiations of its OnCommand Insight software to help with “storage resource management, monitoring and security”. This is sold both direct to customers and through the three CSP marketplaces. Altogether the Insight offerings cover on-premises, hybrid and public cloud scenarios, supporting legacy and cloud-native applications.

From Ader’s point of view no other enterprise storage supplier is as well-placed as NetApp for providing hybrid cloud storage and data services. Until its competitors catch up — if they ever do — NetApp has a clear lead and its public cloud-related revenues should increase steadily.

Fewer disk units shipped in Q4 ’21 as nearline rises

There were fewer disk drives shipped in 2021’s last quarter than a year ago, but the nearline segment saw a 38 per cent unit ship rise.

Approximately 64 million drives were shipped in the quarter, according to preliminary numbers from research house TrendFocus — a 9 per cent year-on-year fall. Within that number about 18.5 million nearline (3.5-inch high-capacity) drives were shipped — up 38 per cent from a year ago (13.39 million). 

There was a quarter-on-quarter decline in nearline disk ship units, with 19.75 million shipped in Q3 2021, but Wells Fargo analyst Aaron Rakers thinks this was due to supply chain issues affecting shipments to hyperscalers and some softening of demand in China, not an overall market slowdown.

Rakers estimates that nearline drive capacity shipped in the quarter was between 235 and 240EB, meaning a 60 per cent year-on-year increase, due to the average capacity per drive increasing. We saw 16TB and 18TB drives shipping in 2021.

He says there was no significant change in vendor ship market share. Seagate shipped some 27.2 to 27.7 million drives in the quarter, giving it a near 43 per cent market share. Western Digital was next, with 23.5 to 24 million drives shipped and a near 37 per cent share. Toshiba had a 20 per cent share, having shipped 12.7 to 13 million drives.
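As a quick sanity check on those figures (using midpoints where the article gives a range):

```python
# Midpoints of the ranges quoted above.
total_units_m = 64.0                 # total drives shipped, millions
nearline_units_m = 18.5              # nearline drives, millions
nearline_eb = (235 + 240) / 2        # nearline exabytes shipped
seagate_m = (27.2 + 27.7) / 2
wd_m = (23.5 + 24.0) / 2
toshiba_m = (12.7 + 13.0) / 2

# EB per million drives conveniently equals TB per drive (both scale by 10^6).
avg_nearline_tb = nearline_eb / nearline_units_m
print(f"average nearline drive: {avg_nearline_tb:.1f}TB")    # ~12.8TB
print(f"Seagate share: {seagate_m / total_units_m:.1%}")     # ~42.9%
print(f"WD share: {wd_m / total_units_m:.1%}")               # ~37.1%
print(f"Toshiba share: {toshiba_m / total_units_m:.1%}")     # ~20.1%
```

The ~12.8TB average sits sensibly below the 16TB and 18TB flagship capacities, since the shipped mix includes many smaller drives, and the share percentages match the article's figures.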

The chart of disk drive segment revenue shares above shows a sawtooth line for nearline drives, and also an unexpected rising trend in mission-critical enterprise drives (2.5-inch 10,000rpm – blue line) over the year. In general, 2.5-inch mobile and branded drives and 3.5-inch PC and branded drives continued their decline as SSDs take over their data storage role. That takeover seems to be slowing as a rump of customers prefer disk’s capacity and lower price to SSD’s speed and higher cost.

Storage news ticker – January 13

Data protector Catalogic Software announced general availability for its latest DPX software, with enhancements to agentless backups for virtual environments, cloud archiving, and improved capabilities for compliance and ransomware protection. Version 4.8 has single file recovery enabling restoration of specific files or directories from VMware and Microsoft Hyper-V agentless backups, support for protecting data attached to SATA and NVMe controllers for VMware, and the option to run only full backups of VDI VMs. The vStor component gets AWS Object Lock.

Danielle Sheer.

Commvault has appointed a new chief legal officer, Danielle Sheer, who will lead the company’s global legal and compliance teams and its governance, commercial, intellectual property, and privacy programs. Sheer previously served as general counsel at financial technology services company Bottomline and at cloud-backup SaaS solutions provider Carbonite. She currently serves on the board of directors of LinkSquares, the leadership board at Beth Israel Deaconess Medical Center, and on the steering committee for TechGC. 

MSP and cloud data protector Datto has hired Brooke Cunningham as its CMO. She joins from Splunk, where she was area VP for global partner marketing and experience, with time at Qlik, CA Technologies and SAP before that.

Brooke Cunningham.

Scale-out file system provider Qumulo has announced the availability of AWS Quick Start for its Cloud Q offering running on AWS. This fully automated deployment experience enables customers to build cloud-first AWS file systems ranging from 1TB to 6PB in minutes. Qumulo also offers 1TB and 12TB free trials in the AWS Marketplace. AWS Quick Start for Qumulo Cloud Q supports almost all AWS regions globally, as well as deployments on AWS Local Zones and AWS Outposts.

Seagate says international non-profit organisation CyArk — which digitally records, archives, and shares world cultural heritage sites — has moved its vast data stores to Seagate’s Lyve Cloud. CyArk’s team used Seagate’s Lyve Mobile data transfer services to move its datasets from multiple on-premises storage devices and servers to Lyve Cloud.

AIOps and one time app and infrastructure performance management business Virtana has raised $73 million in funding from Atalaya Capital Management, Elm Park Capital Management, HighBar Partners, and Benhamou Global Ventures. The company says that, with additional capital and resources, it will be able to accelerate its innovation and better meet the needs of customers through increased investment in product development, sales, and marketing. Virtana had a comprehensive exec makeover in November 2020 and the new team has convinced investors to stump up significantly more funding.

ReRAM developer Weebit Nano has joined the Global Semiconductor Alliance (GSA), described as the voice of the global semiconductor industry. Coby Hanoch, CEO of Weebit Nano, said “2022 will be a pivotal year for Weebit as we qualify our ReRAM IP, paving the path to customer volume production. Therefore, as we enter this new commercial phase of the business, it now makes sense for us to join the GSA, enabling us to more deeply engage and collaborate with partners, customers and peers.”

Open source distributed SQL database developer Yugabyte has expanded the course offerings and certification opportunities of its free education program, Yugabyte University. It offers students free resources, including video downloads, hands-on labs, office hours, discussions, and proof of completion. Yugabyte University intends to support the growing demand for distributed SQL database professionals by offering free training courses to over 10,000 new students and awarding more than 4,000 professional certifications in 2022.

Ising on the cake: Sync Computing spots opportunity for cloud resource optimisation

Startup Sync Computing has devised a hardware answer to the problem that NetApp’s Spot solves with software: how to optimise large-scale public cloud compute and storage use.

Update, 14 January 2022: CEO Jeff Chou positions Sync vs NetApp’s Spot. Update, 17 January 2022: SW focus.

It’s operating in near stealth, and what we describe here is not based on company announcements. Instead it relies on an article by one of its funders: The Engine, an MIT-based financial backer.

Enterprises are finding that using hundreds, if not thousands, of cloud compute instances and storage resources costs significant amounts of cash. It’s virtually impossible to navigate the complex compute and storage cloud infrastructure environments in real time or manage them effectively over time, meaning cloud customers spend much more than they actually need to in order to get their application jobs done in AWS, Azure, Google, etc.

The genius of the Spot.io company bought by NetApp lay in recognising that software could help solve the problem. Its Elastigroup product provisions applications with the lowest cost, discounted cloud compute instances, while maintaining service level agreements, and with a 70–90 per cent cost saving.

Now, two years later, a pair of MIT Lincoln Laboratory researchers argue the problem is getting so bad that navigating the maze of instance classes across time and clouds needs attacking with hardware as well as software. They say the problem, classed as combinatorial optimisation (CO), is analogous to physical-world CO issues such as the classic travelling salesman scenario: finding a route for a sales rep between a set of destinations that minimises the time and distance travelled.
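A brute-force version of that travelling-salesman problem shows why the search space explodes: five invented cities already mean 120 candidate tours, and each added destination multiplies the count.

```python
from itertools import permutations
from math import dist, factorial

# Hypothetical city coordinates, for illustration only.
cities = [(0, 0), (3, 0), (3, 4), (0, 4), (1, 2)]

def tour_length(order):
    # sum of leg lengths, closing the loop back to the first city
    return sum(dist(cities[a], cities[b])
               for a, b in zip(order, order[1:] + order[:1]))

# Exhaustive search: feasible for 5 cities, hopeless at scale (n! tours).
best = min(permutations(range(len(cities))), key=tour_length)
print(f"{factorial(len(cities))} candidate tours searched")  # 120
print(f"best tour {best}, length {tour_length(best):.2f}")
```

Exhaustive enumeration like this is exactly what specialised CO hardware aims to avoid; the claim is that an analog machine can settle on a near-optimal answer without walking the whole factorial search space.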

They have applied their CO algorithm expertise to designing hardware — a parallel processing device — to solve the specific cloud instance optimisation problem more effectively.

Suraj Bramhavar (left) and Jeff Chou (right). Image from The Engine.

Sync Computing was founded in 2019 by two people: CEO Jeff Chou and CTO Suraj Bramhavar. Chou was a high-speed optical interconnect researcher at UC Berkeley and a postdoctoral researcher running high-performance computing optical simulations at MIT. Bramhavar was a photonics researcher at Intel and then a technical staff member at MIT, developing photonic ICs and new electronic circuits for unconventional computing architectures.

Their company took in a $1.3 million seed round in November 2019 and more cash from an undisclosed venture round in October 2021. The company website provides a flavour of what they are doing, declaring: “Future performance will be defined not by individual processors but by careful orchestration over thousands of them. The Sync Optimization Engine is key to this transition, instantly unlocking new levels of performance and savings. … Our technology is poised to accelerate scientific simulations, data analytics, financial modeling, machine learning, and more. These workloads are scaling at an unprecedented rate.”

The OPU

Sync Computing’s Optimization Processing Unit (OPU) has a non-conventional circuit architecture designed to cope when the number of potential combinations (of instances and instance types for a job in the cloud) is too high for a conventional server to search through to find the best one. The founders say that, as the number of combinations scales up, the OPU’s performance overtakes that of general-purpose CPUs and GPUs, taking orders of magnitude less time to find the best combination.
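As a rough illustration of that scaling — the task and instance-type counts below are invented, not Sync's figures — even a modest scheduling problem has an astronomically large search space:

```python
# Assigning each of 20 tasks independently to one of 50 hypothetical
# cloud instance types gives 50**20 possible configurations -- far too
# many to enumerate, even before pricing and SLA constraints are added.
tasks = 20
instance_types = 50
combinations = instance_types ** tasks
print(f"{combinations:.3e} possible assignments")
```

It is this kind of exponential blow-up that motivates purpose-built search hardware.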

The OPU uses a design described in a 2019 Nature article by the two founders and others, titled “Analog Coupled Oscillator Based Weighted Ising Machine”. It presents an “analog computing system with coupled non-linear oscillators which is capable of solving complex combinatorial optimisation problems using the weighted Ising model. The circuit is composed of a fully-connected four-node LC oscillator network with low-cost electronic components and compatible with traditional integrated circuit technologies.”

Diagram from Nature paper with rightmost image showing the OPU breadboard system.

The Ising model is a mathematical description of ferromagnetism in statistical mechanics that has become a generalised framework for modelling phase transitions.

The paper showed that the OPU — an oscillator-based Ising machine instantiated as a breadboard — could solve random MAX-CUT problems with a 98 per cent success rate. MAX-CUT is a benchmark CO problem: partition a graph's vertices into two sets so that the number of edges crossing between the sets is as large as possible; a maximum cut is one at least as large as any other cut.
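For readers unfamiliar with the mapping, here is a toy sketch of how MAX-CUT translates into Ising terms: each vertex gets a spin of +1 or −1, and an edge is "cut" when its endpoints carry opposite spins. The five-edge graph below is invented for illustration, and the brute-force search stands in for what an Ising machine does physically.

```python
# A toy brute-force MAX-CUT search phrased in Ising terms. The
# five-edge graph is invented for illustration.
from itertools import product

edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("a", "c")]
nodes = sorted({n for edge in edges for n in edge})

def cut_size(spins):
    # Edges whose endpoints carry opposite spins are "cut".
    return sum(1 for u, v in edges if spins[u] != spins[v])

# Enumerate all 2**n spin assignments; an Ising machine instead
# physically relaxes towards a low-energy (large-cut) state.
best = max(
    (dict(zip(nodes, s)) for s in product((+1, -1), repeat=len(nodes))),
    key=cut_size,
)
print(best, cut_size(best))
```

Maximising the cut corresponds to minimising the Ising energy of the spin configuration, which is the state the coupled-oscillator network settles into without enumerating every assignment.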

The paper argues: “Solutions are obtained within five oscillator cycles, and the time-to-solution has been demonstrated to scale directly with oscillator frequency. We present scaling analysis which suggests that large coupled oscillator networks may be used to solve computationally intensive problems faster and more efficiently than conventional algorithms. The proof-of-concept system presented here provides the foundation for realizing such larger scale systems using existing hardware technologies and could pave the way towards an entirely novel computing paradigm.”

Update. We now understand that Sync is focusing on software rather than hardware for its initial product with hardware becoming necessary as the problem scales.

Sync versus NetApp’s Spot

Chou sent us his views on how Sync’s technology relates to NetApp Spot, saying: “Our solution goes much deeper technically than theirs, in fact you can use us on top of Spot (Duolingo is already using Spot).  The gains we got for them were on top of Spot instances.

“Fundamentally we deploy a level of optimisation that goes from the application down to the hardware, which is how we’re able to get even more gains. We are not just cost based, we can accelerate jobs as well.  We let companies choose if they want to go faster, cheaper or both.

“We are also cloud platform-agnostic, we work with AWS EMR, Databricks, etc. Whereas [NetApp’s] Data Mechanics is only Spark on Kubernetes within the NetApp ecosystem.

“Longer term our ‘Orchestrator’ product goes into cluster-level scheduling to perform a global optimisation of all resources and applications; something nobody else is doing.”

Comment

Sync Computing’s OPU could optimise large-scale public cloud resources better, meaning faster and at lower cost. Dynamically too — beyond the point where conventional server processors and even GPUs give up. It is very early days for this startup, but its area of focus is the core of NetApp’s CloudOps business unit.

Earlier this month data protector Cobalt Iron said it had been awarded a patent that covered technology for the optimal use of on-premises and public cloud resources. This technology is based on operational and infrastructure analytics and responds to changing conditions; it’s dynamic. 

We now have two established companies highlighting software approaches to solving the public cloud CO problem. If they have correctly identified a large and growing problem, then Sync Computing has a good shot at making it.