The Iceberg age: StarTree latest to adopt popular table format

Iceberg is becoming the lingua franca of data lake table formats, with StarTree the latest supplier to embrace it as a real-time backend.

Open source Iceberg is an open table format for large-scale analytics. It doesn’t store data or execute queries itself, as a traditional database would. Instead, it operates as a software layer atop data files in formats such as Parquet, ORC, and Avro, held in cloud object stores such as AWS S3, Azure Blob Storage, and Google Cloud Storage, and it handles metadata, partitioning, and schema evolution. Iceberg provides ACID transactions, schema versioning, and time travel, with data querying and processing handled by separate software such as Apache Flink, Presto, Spark, Trino, and other analytics engines.
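
To make that division of labor concrete, here is a minimal Python sketch of the general idea behind a table format: the table is only metadata, a list of snapshots that each name the data files making up the table at that point, while a separate engine would read the files. The names (`ToyTable`, `Snapshot`, `commit`, `files_as_of`) and the file paths are hypothetical and purely illustrative; this is not the Iceberg specification or any Iceberg library's API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    """One immutable version of the table: an ID plus the data files it contains."""
    snapshot_id: int
    data_files: tuple


@dataclass
class ToyTable:
    """Hypothetical stand-in for a table format's metadata layer (not Iceberg's real API)."""
    snapshots: list = field(default_factory=list)

    def commit(self, added_files):
        # A commit never rewrites history: it appends a new snapshot that
        # lists the previous files plus the newly added ones.
        previous = self.snapshots[-1].data_files if self.snapshots else ()
        snap = Snapshot(len(self.snapshots) + 1, previous + tuple(added_files))
        self.snapshots.append(snap)
        return snap.snapshot_id

    def files_as_of(self, snapshot_id=None):
        # "Time travel": resolve the file list for an older snapshot,
        # or for the latest one when no ID is given.
        if not self.snapshots:
            return ()
        if snapshot_id is None:
            return self.snapshots[-1].data_files
        for snap in self.snapshots:
            if snap.snapshot_id == snapshot_id:
                return snap.data_files
        raise KeyError(f"unknown snapshot {snapshot_id}")


# Hypothetical object-store paths; nothing is actually read or written.
table = ToyTable()
first = table.commit(["s3://bucket/events/part-0001.parquet"])
second = table.commit(["s3://bucket/events/part-0002.parquet"])

current_files = table.files_as_of()           # files in the latest snapshot
historical_files = table.files_as_of(first)   # files as of the first commit
```

In this toy model a query engine would ask the table which files belong to a snapshot and then read them itself, which mirrors the split between the table format and the engines that query it.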

The StarTree Cloud is a fully managed, cloud-native platform built on Apache Pinot, a real-time distributed OLAP (Online Analytical Processing) datastore. StarTree Cloud is designed for OLAP and enables low-latency querying (milliseconds) and high-throughput processing (10,000+ queries/second) of large-scale data from streaming sources (e.g., Apache Kafka, Amazon Kinesis) and batch sources (e.g., AWS S3, Snowflake). Now it can be both the analytic and serving layer on top of Iceberg.

StarTree claims its Iceberg support can transform Iceberg from a passive storage format into a real-time backend capable of powering customer-facing applications and AI agents at high concurrency, serving thousands of simultaneous users with consistent speed and reliability.

Kishore Gopalakrishna on the summit of Half Dome, Yosemite.

Kishore Gopalakrishna, StarTree co-founder and CEO, stated: “We’re seeing explosive growth in customer-facing, and increasingly agent-facing, data products that demand sub-second responsiveness and fresh insights. At the same time, Iceberg is emerging as the industry standard for managing historical data at scale.”

“As these two trends converge, StarTree is delivering unique value by acting as a real-time serving layer for Iceberg, empowering companies to serve millions of external users and AI agents securely, scalably, and without moving data.”

Recent Iceberg adopters include Snowflake, Confluent, AWS with S3, SingleStore, and Databricks.

Paul Nashawaty, principal analyst at theCUBE Research, said: “Apache Iceberg is rapidly becoming the de facto standard for managing large-scale analytical data in the open data lakehouse—adoption has surged by over 60 percent year-over-year, according to theCUBE Research.”

StarTree asserts that most existing query engines built around Iceberg and Parquet struggle to meet the performance SLAs required for external-facing, high-concurrency analytical applications, and that companies have historically avoided serving data directly from their lakehouse. It claims that, by combining the open Iceberg table format and Parquet file format with Pinot’s indexing and high-performance serving capabilities, StarTree offers real-time query acceleration directly on native Iceberg tables.

StarTree Cloud graphic.

StarTree says that, unlike Presto, Trino, and similar engines, it is built for low-latency, high-concurrency access, integrating directly with Iceberg and boosting performance with features such as:

  • Native support for Apache Iceberg and Parquet in StarTree Cloud
  • Real-time indexing and aggregations, including support for numerical, text, JSON, and geo indexes
  • Intelligent materialized views via the StarTree Index
  • Local caching and pruning for low-latency, high-concurrency queries
  • No data movement required—serve directly from Iceberg
  • Intelligent prefetching from Iceberg, minimizing irrelevant data scans 
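
The pruning idea in the list above can be illustrated with a short Python sketch: when each data file carries minimum and maximum values for a column, an engine can skip any file whose range cannot match a query filter. The names (`FileStats`, `prune_files`) and the sample values are hypothetical, and this is a generic illustration of min/max pruning, not StarTree's or Pinot's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    """Hypothetical per-file statistics for one numeric column."""
    path: str
    min_value: int
    max_value: int


def prune_files(files, lower, upper):
    """Keep only files whose [min, max] range overlaps the query range [lower, upper]."""
    return [f for f in files if f.max_value >= lower and f.min_value <= upper]


# Made-up statistics for three data files, keyed on an event day number.
files = [
    FileStats("part-0001.parquet", 19000, 19030),
    FileStats("part-0002.parquet", 19031, 19060),
    FileStats("part-0003.parquet", 19061, 19090),
]

# A query for days 19040..19050 overlaps only the second file's range,
# so the other two files never need to be opened.
to_scan = prune_files(files, 19040, 19050)
scanned_paths = [f.path for f in to_scan]
```

The fewer files a query has to open, the less data is fetched from object storage, which is the general reason pruning and caching matter for low-latency serving.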

Nashawaty reckons: “StarTree’s ability to serve Iceberg data with sub-second latency and without data duplication is a unique and timely advancement. It addresses a critical performance need for accessing historical data in modern data products.”

Support for Apache Iceberg in StarTree Cloud is available today in private preview. For more information, visit www.startree.ai.