Autonomous vehicle data storage: We grill self-driving car experts about sensors, clouds … and robo taxis

In Internet of Things terms, an autonomous or near-autonomous vehicle (AV, NAV) is a mobile edge computing device that generates data which must be processed for the vehicle to make its way across a network of roads. How much data? No one knows.

It’s the wild west in development terms, and experts differ over how much data will be generated and stored each day, with up to 450TB mentioned.

Blocks & Files has previously discussed the autonomous and near-autonomous vehicle (AV, NAV) storage topic, and also asked John Hayes, the CEO of Ghost Automotive, about it in order to get a broad look at the subject. Now we’re going to take a deeper dive, via an email interview with three industry aces.

B&F talked to Christian Renaud, an analyst at 451 Research; Robert Bielby, senior director of Automotive System Architecture at memory-maker Micron; and Thaddeus Fortenberry, who spent four years at Tesla working on Autopilot architecture. He left in September 2019 and is talking to a number of AV companies.

Left to right: Christian Renaud, Robert Bielby, Thaddeus Fortenberry.

We’ll note quickly that the test phase of AV development will generate more data than the operational phase. We’re concerned here with the operational phase unless we say otherwise.

We also point out that there is going to be a difference in data generation between consumer-owned AVs and robo-taxi/ride-share vehicles. The latter will be operational for more hours each day, and the operator will want more, and different, data from the vehicle than the average consumer who owns one.

Blocks & Files: How much data will near-autonomous and autonomous vehicles (AV) generate per day?

Christian Renaud, 451: This depends on a number of factors and there is no one (accurate) answer. If you assume all systems (DCUs, ECUs, multiple LIDARs, multiple long and short range radar, multiple ultrasonics, multiple cameras), operating for a typical consumer two hour duty cycle per day (so, excluding ride share drivers that may drive 10-12 hours a day), then you’re in the neighbourhood of 12-15TB/day generated by all the sensors, and then by the ISP/VSPs, V2X (Vehicle to Everything network links) systems, and the cellular vehicle connectivity gateway.  Seventy-eight per cent of OEMs plan to have between 50-70 per cent of all that data analysed locally (on car) versus being sent off vehicle.

Robert Bielby, Micron: Let’s just start by saying it’s a lot!  Though there is going to be quite a bit of variability to that number. First, it depends if you’re looking at a robo-taxi or a passenger car that has self-driving capabilities. The use profile of a robo-taxi is expected to be as close to 24 hours’ use as possible, whereas, if you were to look at the average use of a vehicle in the US, it’s estimated to be around 17,600 minutes a year, or on average about 48 minutes per day.

The next variable to consider is the types of sensors and the number of sensors employed to realize self-driving capabilities. High-resolution cameras typically produce very large amounts of data – 500-11,500 Mbit/s – whereas LIDAR produces data streams ranging from 20-100 Mbit/s. Today’s autonomous vehicles vary widely in the mix and number of sensor types and in the resolutions of those sensors. It’s not unrealistic to see a self-driving car employing 6-21 cameras – which means lots of data! When you do the math, you find data rates that range from 3 Gbit/s (~1.4TB/h) to 40 Gbit/s (~18TB/h), implying that for the average self-driving car you are looking at about 1-15TB of data per day (for the US), whereas for a robo-taxi this number can get as high as 450TB per day.
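To put Bielby’s arithmetic in context, here is a minimal back-of-envelope sketch that reproduces the low and high ends of his range. The sensor mixes, per-camera rates and duty cycles below are illustrative assumptions chosen to land inside the figures he quotes, not any real vehicle’s configuration.

```python
# Back-of-envelope daily data volume from a given sensor mix.
# Per-sensor rates are within the ranges Bielby quotes above; the exact
# sensor counts and duty cycles are assumptions for illustration only.

GBIT_PER_TB = 8_000  # 1 TB is roughly 8,000 Gbit in decimal units

def daily_volume_tb(camera_count, camera_mbps, lidar_count, lidar_mbps, hours_per_day):
    """Total data generated per day, in TB, for a given sensor mix."""
    total_mbps = camera_count * camera_mbps + lidar_count * lidar_mbps
    total_gbps = total_mbps / 1_000
    return total_gbps * 3_600 * hours_per_day / GBIT_PER_TB

# Consumer car at the low end: 6 cameras at 500 Mbit/s, one LIDAR at
# 20 Mbit/s, driven about 48 minutes (0.8 h) a day.
print(round(daily_volume_tb(6, 500, 1, 20, 0.8), 1))   # ~1.1 TB/day

# Robo-taxi near the high end: 12 cameras at 3,300 Mbit/s, 4 LIDARs at
# 100 Mbit/s (about 40 Gbit/s in total), running 24 hours a day.
print(round(daily_volume_tb(12, 3_300, 4, 100, 24)))   # ~432 TB/day
```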

Thaddeus Fortenberry, ex-Tesla: My near-term estimate is ~1.6TB per car per day – without LIDAR. Intel did a good analysis and projects 3-4TB per car per day. We will likely see this change wildly in the next few years. Innovation is required in data optimization and connectivity choices.

Sending data to the cloud

Blocks & Files: Will this data be transmitted to the cloud for decision-making and analysis? Will the AV be completely self-sufficient and make its own decisions, or will there be a balance between the two approaches?

Christian Renaud, 451: There are control-loop decisions (Advanced Emergency Braking) that are local functions using radar. There are other functions, such as model training (like Tesla does with Autopilot) and V2X that are, by nature, multi-body problems and therefore have to go off-vehicle. Whether this is ‘cloud’ per se, or network operator infrastructure like multi-access edge computing for V2X (which has a 4ms response time requirement, so it’s not going to any clouds), depends on the specific workload. Navigation data is less latency sensitive, V2X being the other extreme.

Short answer, it will be a mix, but cars already (with active ADAS) have to function independently of any connectivity for liability reasons. They’ll just function better when they do have connectivity, just like your Waze or Google Nav does today. An implementation of this that is not critical to the driving function is BMW’s on-street parking information, where BMWs use their ultrasonic sensors to tell other BMW owners where there are open parking places. Distributed sensors, cloud function, not super latency sensitive.
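Renaud’s split boils down to placing each workload according to its latency budget. Below is a minimal sketch of how such placement logic might be expressed; apart from his 4ms V2X figure, the workload names and budgets are illustrative assumptions, not any standard or operator’s policy.

```python
# Illustrative placement of workloads by worst-case latency budget:
# tight control loops stay on the vehicle, V2X-class traffic goes to
# nearby edge (MEC) infrastructure, everything else can reach a cloud.

WORKLOADS_MS = {
    "emergency_braking": 1,        # local control loop, never leaves the vehicle
    "v2x_hazard_warning": 4,       # Renaud's 4 ms V2X response-time requirement
    "navigation_map_update": 5_000,
    "parking_spot_sharing": 30_000,
    "model_training_upload": 60_000,
}

def placement(latency_budget_ms: int) -> str:
    """Pick where a workload can run, given its latency budget in milliseconds."""
    if latency_budget_ms <= 2:
        return "on-vehicle"
    if latency_budget_ms <= 10:
        return "roadside edge (MEC)"
    return "cloud"

for name, budget in WORKLOADS_MS.items():
    print(f"{name}: {placement(budget)}")
```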

Robert Bielby, Micron: There are two distinct phases to realize self-driving capabilities – there is the training phase and the inference phase. During the training phase, the vehicle is concerned with collecting as much data about the surroundings and the road as possible to be used to ultimately train the deep neural network algorithms to accurately detect objects such as pedestrians, street signs, lane markers and the overall operating environment of the vehicle. 

During this phase, typically all sensor data is recorded and stored on large arrays of hard drives or SSDs for data capture. Based on the earlier discussion of data generation, the quantity of storage that is required can be significant. During this part of the training phase, training data is typically stored directly to [flash] memory – not sent to the cloud.

When the self-driving car is operating in the inference mode – once it has been successfully trained – the algorithms will continuously be monitoring the performance of the system for accuracy.  As AI algorithms are ultimately non-deterministic, if the system encounters a situation or object that it has never seen before or doesn’t recognize, the results can be very unpredictable.

A well-designed system will deliberately introduce disparity into the system such that when a “new” situation or object has been encountered or detected, the system will recognize this, and the data associated with this disparity will typically be sent to the cloud for future algorithm improvements.  Similarly, other details such as real-time map updates – very low-rate data sets on the order of Kbit/s – are regularly transmitted to the cloud to ensure a map with the most up-to-date information is maintained and available to all self-driving vehicles.
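One way to picture the disparity gating Bielby describes is a confidence test on the on-vehicle detector: keep inference local, but queue a sensor clip for upload whenever the scene looks unlike anything the model was trained on. The sketch below assumes a simple confidence threshold and a made-up record layout; it is illustrative only, not Micron’s or any OEM’s actual mechanism.

```python
# Flag sensor clips containing low-confidence detections as candidates
# for cloud upload and future retraining. Threshold and data layout are
# assumptions for illustration.

from dataclasses import dataclass

@dataclass
class Detection:
    label: str
    confidence: float   # 0.0 - 1.0 from the on-vehicle inference engine

CONFIDENCE_FLOOR = 0.6  # below this, treat the scene as "new" / unrecognised

def clips_to_upload(detections_per_clip, clip_ids):
    """Return the IDs of clips worth sending to the cloud for retraining."""
    flagged = []
    for clip_id, detections in zip(clip_ids, detections_per_clip):
        if any(d.confidence < CONFIDENCE_FLOOR for d in detections):
            flagged.append(clip_id)
    return flagged

clips = ["clip_001", "clip_002"]
detections = [
    [Detection("pedestrian", 0.92), Detection("stop_sign", 0.88)],  # familiar scene
    [Detection("unknown_object", 0.31)],                            # novel, upload it
]
print(clips_to_upload(detections, clips))   # ['clip_002']
```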

Thaddeus Fortenberry, ex-Tesla: Definitely a combination. The vehicle must be able to function without requiring connectivity, but the vision of ‘fleet learning’ to rapidly assimilate and publish information for more accurate driving requires effective ingest, back-end collaboration and processing and effective publishing. It all requires a managed environment respecting policy and optimizations.

Blocks & Files: Will the AV IT system be a centralised one, with sensors and component systems networked to a single central compute/storage and networking resource, or will it be a distributed one with a set of separate systems, each with their own processor, memory, etc. – i.e. navigation, braking, suspension, engine management and so on?

Christian Renaud, 451: There are ECUs and DCUs, and then GPUs (as well as a variety of embedded compute in sensors, a number of image and video signal processors, etc.). Each performs different functions. The ECU that controls, say, your electric window system on the driver door isn’t a target for consolidation into a GPU that is also doing sensor fusion for ADAS/AV.

Now the AV-specific function, which is a combination of inputs (LIDAR, radar, ultrasonics, cameras, accelerometers, GPS), sensor fusion, path planning, and outputs (robotic actuation) will be a combination of GPUs and ECUs. Some people think a single GPU will do the job (NVIDIA Jetson) and others think it’s a combo of xPUs (Intel Mobileye).

Robert Bielby, Micron: There is no one right school of thought on the approach – each has its own pros and cons, and I expect you will see both architectures co-exist in the market. While arguments have been made that fully centralized architectures, where all processing occurs at a centralized location, lead to a robust architecture that can deliver self-driving capabilities at the most competitive price, these architectures can typically be challenged in the area of thermal management.

Arguments have also been made that this architecture may not be as robust as a distributed architecture – where smart sensors with edge-based processing offer greater resiliency by virtue of the fact that the computation is indeed distributed.

While distributing the computing across the system can provide some relief in thermal management vs. a centralized architecture, close attention needs to be paid to local thermal management when processing sits in or near smart cameras, because heat has detrimental effects on image sensor linearity and response.

Again, no one “best” solution – each represents a different set of tradeoffs and the choice is typically dependent upon the overall objectives of the platform.

Thaddeus Fortenberry, ex-Tesla: The direction I see most automotive companies going is to have component systems with volatile memory/storage where the data is moved and processed on the central ADAS computer. A key point is the level of RAW-ness of the data coming from the micro-controllers and/or sensors. Companies like Tesla want direct sensor data, whereas more traditional companies ask the ODM to give them processed data.

Again, I think we will see a lot of innovation here. Nvidia and Intel are creating solutions for Tier-2 car companies. They will be interested in creating whole vehicle systems incorporating all aspects of relevant data. The Tier-1 vehicle companies must decide where they will directly innovate and who (and how) they will partner with for sensors.
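Fortenberry’s point about the RAW-ness of sensor data is, at heart, a bandwidth trade-off: raw camera frames sent to the central ADAS computer cost orders of magnitude more than an ODM-processed object list. The rough illustration below assumes a 1080p camera, 12 bits per pixel and a 64-byte object record; all of these are assumed values, not figures from the interviewees.

```python
# Compare the bandwidth of a raw camera stream with that of a processed
# object list. Frame size, bit depth, frame rate and object-record size
# are assumptions for illustration.

def raw_camera_mbit_s(width_px, height_px, bits_per_px, fps):
    """Uncompressed camera stream bandwidth in Mbit/s."""
    return width_px * height_px * bits_per_px * fps / 1e6

def object_list_mbit_s(objects_per_frame, bytes_per_object, fps):
    """Bandwidth of a processed object list (positions, classes, velocities)."""
    return objects_per_frame * bytes_per_object * 8 * fps / 1e6

# One 1080p camera, 12 bits/pixel, 30 frames per second: ~746 Mbit/s raw.
print(round(raw_camera_mbit_s(1920, 1080, 12, 30)))   # ~746

# The same view reduced to 20 detected objects of 64 bytes each: ~0.3 Mbit/s.
print(round(object_list_mbit_s(20, 64, 30), 1))       # ~0.3
```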

Blocks & Files summary

We can already see that there are great differences between our three experts in terms of data generation amounts per day. A table illustrates this:

Expert          Consumer AV, per day       Robo-taxi, per day
Renaud          12-15TB                    Not estimated
Bielby          1-15TB                     Up to 450TB
Fortenberry     ~1.6TB (Intel: 3-4TB)      Not estimated

Renaud and Bielby can see a consumer-owned AV generating up to 15TB a day, while Fortenberry reckons 3-4TB a day as an upper limit. Robo-taxi data generation shoots up, though, to as much as 450TB a day – near enough half a petabyte.

They are unanimous in thinking that only data which does not need a fast reaction time – reference data such as map updates and model training samples – will be sent to the cloud for processing and storage by a car fleet operator or manufacturer. How much data this amounts to is unknown.

They also tend to favour a hybrid style of AV IT, with a central computer set-up allied to distributed subsystems. The degree of distribution will be limited by thermal management issues in Bielby’s view, and by ODM/car manufacturer differences in Fortenberry’s view. Renaud thinks the centralised system could be GPU-centric or a mix of different processor types. A clear picture has not yet emerged.

We’ll soon be taking a look at how the AV data might be stored in the AV and transmitted to the cloud in a follow-up article. Stay tuned.