Ahana: Lakehouses need community-governed SQL

It’s not just any old SQL that lakehouses need, but open source, community-governed SQL, according to Ahana CEO and co-founder Steven Mih, a member of the governing board of the Presto Foundation. Ahana was started in 2020 with $4.8 million in seed funding. The company raised $20 million in VC funding last year and is developing software for data lake analytics using the Presto distributed SQL query engine.

Update 1. Revised Trino history added; 22 Dec 2022.

Update 2. Linux Foundation and Presto Foundation comments added; 23 Dec 2022.

Steven Mih

Presto is an open source project created by Facebook and used at Uber, Intel and many more businesses. It’s said to be the de facto standard for fast SQL processing of data lakes.

Mih is a past CEO of Alluxio and Aviatrix Systems. He was worldwide sales VP at NoSQL database supplier Couchbase before that. Alluxio produced open source data orchestration system software for analytics and machine learning in the cloud. He became a Governing Board Member of the Presto Foundation in late 2019. We asked Mih some questions to find out more.

Blocks & Files: Set the data analytics scene for us.

Steven Mih: Enterprises naturally want to amass as much data as possible. By combining the data diversity of a data lake – able to store structured, semi-structured and unstructured data that is streamed, loaded, or transformed for interactive, batch, or in-app workloads – with the management capabilities of a data warehouse, the data lakehouse provides the best way to amass the data.

Blocks & Files: But you see a problem?

Steven Mih: The diversity and volume of data in data lakehouses have complicated the world of data analytics. Separate components now handle storage, compute, table types, metadata catalog, and security and permissions. Enterprises can now choose among different fit-for-purpose analytics engines for these components. However, working with various engines requires higher levels of technical expertise and mastery of different query and programming languages. Most organizations have few experts qualified to run sophisticated analytics; therefore, the whole process can become error-prone, slow, and confusing.

Blocks & Files: How do we fix this?

Steven Mih: There’s a need for innovation to make analytics capabilities more streamlined and accessible throughout an organization. One way to accomplish this is to use SQL, which is familiar to many professionals who work with data, as the common query language throughout an organization’s analytics ecosystem.

Blocks & Files: So is it a solved problem or not?

Steven Mih: No, it’s not a solved problem. We need to use open source technologies for the lakehouse components. Unlike proprietary offerings, open source technologies provide the flexibility to adapt and start using tools and technologies without getting locked into a particular vendor or technology platform.

But one more consideration is important: who’s in charge of the technology? When open source technology is managed by a single corporate entity – as with MongoDB/Mongo, Redis/Redis Labs, or Trino/Starburst – the interests of the corporate entity tend to supersede the interests of the users.

Blocks & Files: What is the alternative that you favor?

Steven Mih: Open source technology can be managed by an openly governed community, as with MySQL, Kubernetes, or Linux Foundation Presto. By giving all users and participants “skin in the game,” open communities tend to be more flexible, deliver technical enhancements provided by and specifically beneficial to the whole community, and adopt governance practices that reflect the needs of all their members.

Blocks & Files: Could you describe the features of your ideal query engine?

Steven Mih: To democratize and enhance data analytics for enterprises with data lakehouses, I believe the ideal query engine would have the following characteristics:

  • High price-performance and scalability, to handle the lakehouse’s ever-increasing analytics workload
  • Use standard SQL as a common query language, to make analytics capabilities more widely accessible throughout an organization’s technical staff
  • Be based on open technology, to provide the flexibility to choose the best tools and technologies for specific needs, now and in the future
  • Governed by a truly open community, not beholden to any single vendor’s whims

Comment

The “governed by a truly open community” characteristic is the key aspect here, in our view. It’s possibly more a matter of which open source philosophy you prefer than of product features. Certainly the Presto Foundation has good open source credentials.

Ahana and Starburst are competitors. A Trino spokesperson tells us: “In 2012, Dain Sundstrom, Martin Traverso, David Phillips and Eric Hwang created Presto. They worked on the project for six years at Facebook until 2018 when Facebook made it clear that they wanted tighter control of the project.”

“Maintaining the open source integrity of the project was very important to the original Presto creators, so as Facebook took ownership of the Presto brand as PrestoDB, the Presto creators continued their work on the original Presto, just under the name PrestoSQL (then rebranded to Trino in 2020).”

“So all that is to say that Trino, which is formerly PrestoSQL, is actually the original Presto, and PrestoDB is what Facebook took over and sold to the Linux Foundation.” 

The Trino execs founded Starburst to sell Trino connectors and support.

Linux Foundation Comment

The Linux Foundation took issue with the Trino spokesperson’s comments. Mike Dolan, Linux Foundation SVP & GM of Projects, and Girish Baliga, Presto Foundation Chair, told Blocks & Files: “First, Facebook contributed the Presto project to the Linux Foundation. It was not ‘sold’. This is an open source project that was transitioned into our nonprofit foundation to open the project’s governance to the community under a neutral entity.”

“The foundations also want to clarify some of the points:

  • Presto Foundation has (every year for years now) invited the creators of the Trino fork to join the project. Trino has not yet accepted the invitation. We understand this is because Trino prefers to retain control versus using an open governance decision-making model employed by the Presto community. That’s their decision and an option that may make sense for Trino.
  • The Presto Foundation (not Facebook) is the entity that governs the project. Facebook continues to participate, but the governance of Presto was transferred into the hands of the project’s community contributors. We refer to this as a “do-ocracy” model whereby the people doing the work in the project make decisions for the project.

“The Linux Foundation ensures that the Presto Foundation remains a community-controlled initiative governed by open and neutral contribution guidelines and policies. Everybody is invited to join the Presto Foundation. Anyone can contribute to the Presto project, regardless of whether their organization is an official member. I hope this helps add clarity to the situation.”