Live data mover WANdisco says that for mass data needing hyperscale compute analysis, users need to move the data to compute in the cloud, and claims normal data gravity rules don’t apply.
WANdisco’s Data Activation Platform technology bridges internet edge environments, datacenters, and multiple public clouds for AI, machine learning and big data analysis. It positions its technology as suited to constant data movement, unlike one-off migration products. The company has customers in three main markets: automotive, telco and manufacturing.
The big trend in the automotive market is connected electric vehicles (EVs) with ADAS (advanced driver assistance systems), which can generate 25GB of data an hour when being driven. CEO Dave Richards said in a briefing: “In the edge there isn’t enough compute. You have to go to the cloud and its huge elastic compute.”
Richards blogged: “Edge computing doesn’t allow for the collective analysis of data distributed across many edge datacenters – the type of analysis that could inform product roadmaps or shape new business models.”
An edge datacenter in the automotive market could be one of a hundred datacenters fed with data from ADAS EVs and local charging networks. You can’t run a machine learning training algorithm across all of them. To get the best results, the data needs to be analyzed and used for ML training globally, which means running the training across the combined data set. That can mean throwing 5,000 or more CPUs at it, and you can’t install them in an edge datacenter.
Richards said the amount of compute you need at the center is so vast that only the hyperscalers can supply it. Only they have this level of burst demand compute capacity available for hire.
The analysis need not be complex, according to Richards: “Basically it’s BigQuery running on an object store.”
What Richards is implying is that, apart from the need to analyze or train using a single massive data set, the bring-compute-to-storage argument needs reversing. That argument relies on slow data transmission times to say that, in effect, data has gravity, and it is better (easier, faster) to bring compute to the storage rather than the stored data to the compute.
Apart from the situation where real-time decision making is needed at the edge, data can reach such a scale that you can’t bring compute to the edge; too much would be needed. Or it can need aggregating from multiple edge sites for ML training, say, and you can’t analyze at the edge sites in a distributed way; it has to be fed to a single central site.
At a large scale, compute capacity limitations overrule the data gravity argument.
Richards thinks the 200TB/day data level is a rough crossover point. Below that, data gravity rules and it’s feasible to process the data locally. Above that, you can find you need hyperscaler-class burst compute for ML training, and the data needs to be in the cloud: AWS, Azure or GCP.
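To give a feel for what 200TB a day means, here is a back-of-envelope sketch. It is not a WANdisco calculation: it assumes decimal units (1TB = 10^12 bytes) and data flowing evenly around the clock, and it reuses the 25GB-an-hour ADAS figure quoted earlier. The function names are made up for illustration.

```python
# Back-of-envelope arithmetic only. Assumptions (not from WANdisco):
# decimal units (1 TB = 1e12 bytes) and a perfectly even flow over 24 hours.
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_gbit_per_s(terabytes_per_day: float) -> float:
    """Average link rate needed to move a daily volume continuously."""
    bits_per_day = terabytes_per_day * 1e12 * 8
    return bits_per_day / SECONDS_PER_DAY / 1e9


def vehicle_hours_per_day(terabytes_per_day: float, gb_per_vehicle_hour: float) -> float:
    """Hours of driving that would generate the given daily volume."""
    return terabytes_per_day * 1000 / gb_per_vehicle_hour


crossover_tb_per_day = 200  # Richards' rough crossover point
adas_gb_per_hour = 25       # per-vehicle figure quoted for ADAS EVs

print(f"{sustained_gbit_per_s(crossover_tb_per_day):.1f} Gbit/s sustained")
print(f"{vehicle_hours_per_day(crossover_tb_per_day, adas_gb_per_hour):,.0f} vehicle-hours per day")
```

On those assumptions the crossover works out at roughly 18.5Gbit/sec of sustained transfer, or about 8,000 hours of ADAS driving a day.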
WANdisco’s technology can move the data up into the cloud continuously, rather than in single massive hits. It can copy the data from an edge site to a cloud, validate that the transfer is complete and then auto-delete the data at the edge, making room for fresh data to come in.
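The copy, validate, delete sequence can be sketched in a few lines. This is an illustration of the general pattern only, not WANdisco’s implementation: two local directories stand in for the edge store and the cloud object store, a SHA-256 comparison stands in for whatever validation the product performs, and `move_validated` and `sha256_of` are hypothetical names.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def move_validated(edge_dir: Path, cloud_dir: Path) -> list[str]:
    """Copy each file, verify the copy, and only then delete the edge original.

    Hypothetical helper: the directories are stand-ins for an edge site and
    a cloud bucket. Returns the names of the files that were moved.
    """
    cloud_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for source in sorted(p for p in edge_dir.iterdir() if p.is_file()):
        target = cloud_dir / source.name
        shutil.copy2(source, target)
        if sha256_of(source) == sha256_of(target):
            source.unlink()   # validated: free the space at the edge
            moved.append(source.name)
        else:
            target.unlink()   # failed validation: keep the edge copy, drop the bad one
    return moved
```

Run on a schedule or triggered by new files, a loop like this keeps the edge store draining steadily. The key design point is ordering: the edge copy is removed only after the remote copy has been checked.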
WANdisco marketing VP Chris Pemberton said: “80 percent of the data in automotive is learning data – and it’s sitting at the edge,” where it is of little use.
Making EV self-driving capabilities more reliable and certain is the big EV ML training activity we all know about. But another ML training application can be used by EV charging network suppliers to calculate how much electricity to supply to an edge charging site. Generate or buy too much and it has to be wasted, while obtaining an insufficient supply means you can’t satisfy the charging demand. In both instances huge amounts of money are involved, either wasted electricity generating capacity or chargers sitting idle because of inadequate electricity supply.
Richards thinks WANdisco will be selling to every power generator in Europe because of factors like this.
In general, he suggests that in the live data movement market WANdisco has no competition: “We have 100 percent market share.”
The analysis and ML environment can vary with vertical markets and WANdisco’s 10-person salesforce is organized by vertical market. It has just hired Thomas Wirtz away from Databricks to be its Director for Automotive and Manufacturing segments.
Richards said: “Ten enterprise sales guys pulled in $127 million in bookings. We think we can get to a billion dollars of bookings with 20 salespeople, and be the most profitable company the world has ever seen.”
With much of its sales activity taking place in the USA, it’s been rumored that WANdisco could be considering adding a US listing alongside its UK AIM market listing. The company has just put out a statement saying: “As a dual UK and US headquartered technology company, WANdisco has long stated its intention to consider an additional listing of its ordinary shares in the United States. The company can confirm that it is in the early stages of proactively exploring this option.”