The Ghost in the self-driving storage machine (OK, it’s a car)

Interview: Blocks & Files aims to get a sense of the market for data storage in autonomous and near-autonomous vehicles (AVs). Seagate and Renovo have suggested an AV could generate up to 32TB of data a day. Data will presumably be stored in the vehicle if the vehicle is not in constant or regular connection to a cloud data centre.

How much data storage will an AV need and what storage media will it use? Let’s ask John Hayes, the CEO of Mountain View-based Ghost Locomotion, a self-driving car software startup. He co-founded the company in 2017 with CTO Volkmar Uhlig, who designed and built a fully automated programmatic media trading platform at Adello. Hayes was a Pure Storage co-founder and chief architect at the company from 2009 to 2017.

Ghost is focused initially on producing an AI system for self-driving cars on highways, not on rural or urban roads. The system is designed to retrofit existing vehicles and should launch this year.

Blocks & Files: How much data will near-autonomous and autonomous vehicles (AVs) generate per day?

John Hayes, Ghost Locomotion CEO

John Hayes: At the sensor, an AV might generate TBs a day; however, that’s not practical. It’s like saying your phone camera generates 0.5 GB/sec while recording video, when in practice it writes 0.5 MB/sec after the video is MPEG compressed.

The highest bandwidth sensors are cameras; distant second is lidar where each laser is 40 KB/sec x 256 lasers. That’s 10 MB/sec, but even simple compression brings that down to 1 MB/sec. Everything else (like IMUs, GPS or logs) is trivial amounts of data.
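As a rough check on the lidar arithmetic Hayes quotes, the short Python sketch below multiplies out his per-laser figure. It assumes decimal units (1 KB = 1,000 bytes), and the compression ratio it prints is only implied by his two numbers, not something he states. The function and variable names are illustrative.

```python
# Figures quoted by Hayes: 40 KB/sec per laser, 256 lasers.
# Assumption: decimal units (1 KB = 1,000 bytes, 1 MB = 1,000,000 bytes).
KB = 1_000
MB = 1_000_000


def lidar_raw_rate(per_laser_bytes_per_sec: int, laser_count: int) -> int:
    """Aggregate uncompressed lidar data rate in bytes per second."""
    return per_laser_bytes_per_sec * laser_count


raw_rate = lidar_raw_rate(40 * KB, 256)

# Hayes puts the compressed stream at about 1 MB/sec. The ratio below is
# derived from his two figures; he does not quote a ratio himself.
compressed_rate = 1 * MB
implied_ratio = raw_rate / compressed_rate

print(f"raw lidar rate: {raw_rate / MB:.2f} MB/sec")
print(f"implied compression ratio: about {implied_ratio:.0f}:1")
```

The raw figure works out to 10.24 MB/sec, which Hayes rounds to 10 MB/sec.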

Compression is important because when you’re sending data around a real-time system, any transmission delays within the vehicle increase the latency for decisions. If you want a decision based on information that is less than 100 ms old, that can only be done by compressing before transmission.

Our system, with 8 cameras, stores 10 GB/hour. That’s already a large enough transmission challenge that we cut back on the data to transmit. Doing 10x that would significantly reduce the duty cycle of a vehicle.

Blocks & Files: Will this data be transmitted to the cloud for decision-making and analysis? Will the AV be completely self-sufficient and make its own decisions, or will there be a balance between the two approaches?

Hayes: AVs will have to make their decisions entirely internally. The speed of decision doesn’t support remote transmission, and the Internet is unreliable. The most common reason to connect to a data center is remote driving or other exception handling, where a person has to look around and make a decision for the AV. Here you’re limited by wireless bandwidth, probably to at most a few compressed video streams.

Blocks & Files: Will the AV IT system be centralised, with sensors and component systems networked to a single central compute, storage and networking resource? Or will it be a distributed one, with a set of separate systems, each with its own processor and memory, for functions such as navigation, braking, suspension and engine management?

Hayes: There will most likely be a central computer for driving, and there will be a backup computer with more limited capabilities, like pulling over to a safe location. Powertrain management will probably be a separate computer again for isolation purposes.

Blocks & Files: Will the AV-generated data have to be stored in the vehicle and, if so, for how long?

Hayes: AV-generated data will definitely be stored in the vehicle, and companies that test autonomous vehicles tend to swap the data storage components rather than connecting to either a wired or wireless network. This increases the duty cycle of the expensive ($150-500k) vehicles. In this scenario, data is held no longer than 12 hours.

In personally-owned AVs, it is preferable to store data on the vehicle rather than in the data centre for privacy/legal exposure reasons. It might be stored for a period of days to weeks, but not much longer than three months.

Blocks & Files: How will the data be uploaded to a cloud data centre? How often?

Hayes: There are two models: uploading everything in batch and processing for interesting data during an upload stage, or processing for interesting data and then uploading. AVs for testing will prefer the former because you want to keep the AVs in service. Personally-owned AVs are idle 20-23 hours a day and can use that time to reprocess data and select it for upload. Our upload target is <1 per cent of observed data.

Uploading will use whatever is cheaper: home Wi-Fi or LTE.
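To put Hayes's upload target in perspective, here is a minimal Python sketch of the arithmetic. The 10 GB/hour recording rate and the 1 per cent ceiling come from his answers; the 10 Mbit/sec uplink speed is purely a hypothetical placeholder, not a figure from the interview, and real home Wi-Fi or LTE rates vary widely. Decimal units are assumed throughout.

```python
def upload_gb_per_driving_hour(recorded_gb_per_hour: float,
                               upload_fraction: float) -> float:
    """Data selected for upload per hour of driving, in GB."""
    return recorded_gb_per_hour * upload_fraction


def upload_seconds(gigabytes: float, link_mbit_per_sec: float) -> float:
    """Transfer time in seconds, ignoring protocol overhead."""
    bits = gigabytes * 8e9                    # 1 GB = 8e9 bits (decimal)
    return bits / (link_mbit_per_sec * 1e6)   # 1 Mbit = 1e6 bits


# From the interview: 10 GB/hour recorded, under 1 per cent uploaded.
selected_gb = upload_gb_per_driving_hour(10, 0.01)

# Hypothetical uplink speed, chosen only for illustration.
ASSUMED_LINK_MBIT_PER_SEC = 10

selected_upload_s = upload_seconds(selected_gb, ASSUMED_LINK_MBIT_PER_SEC)
full_upload_s = upload_seconds(10, ASSUMED_LINK_MBIT_PER_SEC)

print(f"selected data per driving hour: {selected_gb * 1000:.0f} MB")
print(f"upload time, selected data: {selected_upload_s:.0f} s")
print(f"upload time, all recorded data: {full_upload_s:.0f} s")
```

Under that assumed link speed, the selected data for an hour of driving would upload in roughly a minute and a half, while the full recording would take more than two hours, which illustrates why selecting data before upload suits a vehicle that is parked most of the day.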

Blocks & Files: What is the maximum amount of storage capacity that will be needed in an AV to cope with the data generation load and the worst case data transmission capability?

Hayes: Space/power limitations create practical limits on the amount of data storage. You can put a rack server in a trunk, but only half depth and 4-6U tall. Never seen one with a drive shelf.

Blocks & Files: Will disk drives or flash storage be used or a combination?

Hayes: Flash will probably be preferred because there are lower service requirements and it’s a tiny percentage of the vehicle cost.

Blocks & Files: Assuming flash storage is used will the workload be a mixed read/write one? If so, how much endurance should the flash have? (AVs could have a 15+ year working life.)

Hayes: AVs will almost certainly not have a 15-year working life. If they’re individually owned, the electronics will be replaced every few years, in line with a typical consumer electronics cycle. If they’re used as a robo-taxi, the lifetime for vehicles is determined more by miles driven than by calendar age, putting their lifetime at 3-5 years.

Blocks & Files: Will the flash have to be ruggedised to cope with the AV environment with its vibrations and temperature/moisture variations?

Hayes: Car interiors are already suitable for people and auto-grade electronics are characterized by extended environmental range, rather than extended warranty. Ordinary max temperatures of 70-85C will work fine.


Semi- and fully autonomous vehicles will not need a lot of in-vehicle data storage. In the test phase, Hayes sees storage drives being swapped out (“swap the data storage components”). In operation, personally owned vehicles will need to store up to a terabyte. That assumes 10GB/hour, with one-hour operation per day and data stored for up to 90 days. This is trivial.
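The sizing estimate can be written out as a small Python sketch. The inputs (10 GB/hour, one hour of driving a day, 90 days of retention) are the ones used in the estimate; the function name is illustrative and decimal units (1 TB = 1,000 GB) are assumed.

```python
def retained_gb(gb_per_hour: float,
                driving_hours_per_day: float,
                retention_days: int) -> float:
    """Storage needed to hold all recorded data for the retention period."""
    return gb_per_hour * driving_hours_per_day * retention_days


# Inputs from the article: 10 GB/hour, 1 hour/day, 90 days.
needed_gb = retained_gb(10, 1, 90)
needed_tb = needed_gb / 1000  # decimal terabytes

print(f"retained data: {needed_gb:.0f} GB ({needed_tb:.1f} TB)")
```

That gives 900 GB, just under a terabyte. Changing `driving_hours_per_day` to 4, the upper end implied by Hayes's figure of 20-23 idle hours a day, gives 3,600 GB.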