Ising on the cake: Sync Computing spots opportunity for cloud resource optimisation

Startup Sync Computing has devised a hardware answer to the problem that NetApp’s Spot solves with software: how to optimise large-scale public cloud compute and storage use.

Update. CEO Jeff Chou positions Sync versus NetApp’s Spot, 14 January 2022. Software focus noted, 17 January 2022.

It’s operating in near stealth, and what we describe here is not based on company announcements. Instead it relies on an article by one of its funders: The Engine, an MIT-based financial backer.

Enterprises are finding that using hundreds, if not thousands, of cloud compute instances and storage resources costs significant amounts of cash. It’s virtually impossible to navigate the complex compute and storage cloud infrastructure environments in real time or manage them effectively over time, meaning cloud customers spend much more than they need to in order to get their application jobs done in AWS, Azure, Google and other public clouds.

The genius of the Spot.io company bought by NetApp lay in recognising that software could help solve the problem. Its Elastigroup product provisions applications with the lowest cost, discounted cloud compute instances, while maintaining service level agreements, and with a 70–90 per cent cost saving.

Now, two years later, a pair of MIT Lincoln Laboratory researchers argue the problem is getting so bad that navigating the maze of instance classes across time and clouds needs attacking with hardware as well as software. They say the problem, classed as combinatorial optimisation (CO), is analogous to physical world CO issues, such as the classic travelling salesman scenario. The task there is to find a route for the sales rep that visits a set of different destinations while minimising the time and distance travelled.
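To give a feel for why the travelling salesman scenario is hard, here is a minimal Python sketch that solves it by exhaustive search. It is an illustration only and has nothing to do with Sync’s own method; the function name `shortest_tour` and the city coordinates are made up for the example.

```python
from itertools import permutations
from math import dist


def shortest_tour(cities):
    """Return (length, order) of the shortest closed tour by trying every order.

    `cities` is a list of (x, y) points. City 0 is fixed as the start so that
    rotations of the same tour are not counted twice.
    """
    start, *rest = range(len(cities))
    best_len, best_order = float("inf"), None
    for perm in permutations(rest):
        order = (start, *perm)
        # Sum each leg of the tour, including the leg back to the start.
        length = sum(
            dist(cities[order[i]], cities[order[(i + 1) % len(order)]])
            for i in range(len(order))
        )
        if length < best_len:
            best_len, best_order = length, order
    return best_len, best_order


# Hypothetical destinations: the four corners of a unit square.
cities = [(0, 0), (0, 1), (1, 1), (1, 0)]
length, order = shortest_tour(cities)
```

The loop visits (n − 1)! orderings, so every extra destination multiplies the work. That factorial growth is what makes CO problems impractical to brute-force at scale.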

They have applied their CO algorithm expertise to designing hardware — a parallel processing item — to solve the specific cloud instance optimisation problem more effectively.

Suraj Bramhavar (left) and Jeff Chou (right). Image from The Engine.

Sync Computing was founded in 2019 by two people: CEO Jeff Chou and CTO Suraj Bramhavar. Chou was a high-speed optical interconnect researcher at UC Berkeley and a postdoctoral researcher running high-performance computing optical simulations at MIT. Bramhavar was a photonics researcher at Intel and then a technical staff member at MIT, developing photonic ICs and new electronic circuits for unconventional computing architectures.

Their company took in a $1.3 million seed round in November 2019 and more cash from an undisclosed venture round in October 2021. The company website provides a flavour of what they are doing, declaring: “Future performance will be defined not by individual processors but by careful orchestration over thousands of them. The Sync Optimization Engine is key to this transition, instantly unlocking new levels of performance and savings. … Our technology is poised to accelerate scientific simulations, data analytics, financial modeling, machine learning, and more. These workloads are scaling at an unprecedented rate.”

The OPU

Sync Computing’s Optimization Processing Unit (OPU) has a non-conventional circuit architecture designed to cope when the number of potential combinations (of instances and instance types for a job in the cloud) is too high for a current server to search through and find the best one. They say that, as the number of combinations scales up, their OPU’s performance overtakes that of general-purpose CPUs and GPUs, taking orders of magnitude less time to find the best combination.
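The combination problem can be sketched as a toy search over instance types. The Python below is an assumption-laden illustration, not Sync’s algorithm: the `CATALOGUE` of instance names, prices and speeds is invented, and `cheapest_plan` simply tries every assignment of one instance type per job stage.

```python
from itertools import product

# Hypothetical catalogue: name -> (price per hour in dollars, relative speed).
CATALOGUE = {"small": (1, 1), "medium": (2.5, 2), "large": (6, 4)}


def cheapest_plan(stage_work, deadline_hours):
    """Pick one instance type per stage, minimising cost within a deadline.

    `stage_work` lists the work units in each stage; a stage run on an
    instance of speed s takes work / s hours. Returns (cost, plan), or
    (None, None) if no plan meets the deadline.
    """
    best_cost, best_plan = None, None
    # Exhaustive search: len(CATALOGUE) ** len(stage_work) candidate plans.
    for plan in product(CATALOGUE, repeat=len(stage_work)):
        hours = [work / CATALOGUE[name][1] for work, name in zip(stage_work, plan)]
        if sum(hours) > deadline_hours:
            continue  # too slow, skip this plan
        cost = sum(h * CATALOGUE[name][0] for h, name in zip(hours, plan))
        if best_cost is None or cost < best_cost:
            best_cost, best_plan = cost, plan
    return best_cost, best_plan


cost, plan = cheapest_plan([4, 8], deadline_hours=6)
```

With three instance types and two stages there are only nine plans to check. With dozens of types and hundreds of stages the count grows exponentially, which is the regime in which the founders claim specialised hardware pulls ahead.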

The OPU uses a design mentioned in a 2019 Nature article by the two founders and others, Analog Coupled Oscillator Based Weighted Ising Machine. This describes an “analog computing system with coupled non-linear oscillators which is capable of solving complex combinatorial optimisation problems using the weighted Ising model. The circuit is composed of a fully-connected four-node LC oscillator network with low-cost electronic components and compatible with traditional integrated circuit technologies.”

Diagram from Nature paper with rightmost image showing the OPU breadboard system.

The Ising model is a mathematical description of ferromagnetism in statistical mechanics and has become a generalised mathematical model for handling phase transitions in statistics. 

The paper showed that the OPU, an oscillator-based Ising machine instantiated as a breadboard, could solve random MAX-CUT problems with 98 per cent success. MAX-CUT is a CO benchmark problem in which a graph’s nodes are split into two groups so that the total weight of the edges crossing between the groups (the cut) is at least as large as that of any other cut.
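The link between MAX-CUT and the Ising model can be shown in a few lines of Python. This is a software sketch of the standard textbook mapping, not the paper’s oscillator circuit: each node gets a spin of +1 or −1, the energy is the sum of weight × spin × spin over all edges, and the lowest-energy spin assignment is also the maximum cut. The four-node graph and the function names are hypothetical.

```python
from itertools import product


def ising_energy(weights, spins):
    """Energy of a spin assignment: sum of w_ij * s_i * s_j over all edges."""
    return sum(w * spins[i] * spins[j] for (i, j), w in weights.items())


def cut_value(weights, spins):
    """Total weight of edges whose endpoints carry opposite spins."""
    return sum(w for (i, j), w in weights.items() if spins[i] != spins[j])


def brute_force_max_cut(weights, n):
    """Try all 2**n spin assignments and keep the lowest-energy one."""
    best = min(product((-1, 1), repeat=n), key=lambda s: ising_energy(weights, s))
    return cut_value(weights, best), best


# Hypothetical weighted graph on four nodes: (node, node) -> edge weight.
weights = {(0, 1): 1, (1, 2): 2, (2, 3): 1, (0, 3): 2, (0, 2): 3}
cut, spins = brute_force_max_cut(weights, 4)
```

For any assignment the energy equals the total edge weight minus twice the cut, so minimising one maximises the other. A physical Ising machine aims to settle into that low-energy state directly rather than enumerating all 2^n assignments as this sketch does.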

The paper argues: “Solutions are obtained within five oscillator cycles, and the time-to-solution has been demonstrated to scale directly with oscillator frequency. We present scaling analysis which suggests that large coupled oscillator networks may be used to solve computationally intensive problems faster and more efficiently than conventional algorithms. The proof-of-concept system presented here provides the foundation for realizing such larger scale systems using existing hardware technologies and could pave the way towards an entirely novel computing paradigm.”

Update. We now understand that Sync is focusing on software rather than hardware for its initial product with hardware becoming necessary as the problem scales.

Sync versus NetApp’s Spot

Chou sent us his views on how Sync’s technology relates to NetApp Spot, saying: “Our solution goes much deeper technically than theirs, in fact you can use us on top of Spot (Duolingo is already using Spot). The gains we got for them were on top of Spot instances.

“Fundamentally we deploy a level of optimisation that goes from the application down to the hardware, which is how we’re able to get even more gains. We are not just cost based, we can accelerate jobs as well. We let companies choose if they want to go faster, cheaper or both.

“We are also cloud platform-agnostic, we work with AWS EMR, Databricks, etc. Whereas [NetApp’s] Data Mechanics is only Spark on Kubernetes within the NetApp ecosystem.

“Longer term our ‘Orchestrator’ product goes into cluster level scheduling to perform a global optimisation of all resources and applications; something nobody else is doing.”

Comment

Sync Computing’s OPU could optimise large-scale public cloud resources better, meaning faster and at lower cost. Dynamically too — beyond the point where conventional server processors and even GPUs give up. It is very early days for this startup, but its area of focus is the core of NetApp’s CloudOps business unit.

Earlier this month data protector Cobalt Iron said it had been awarded a patent that covered technology for the optimal use of on-premises and public cloud resources. This technology is based on operational and infrastructure analytics and responds to changing conditions; it’s dynamic. 

We have two established companies highlighting software approaches to solving the public cloud CO problem. If they have identified a large enough problem that is growing then Sync Computing has a good shot at making it.