The Iceberg age: StarTree latest to adopt popular table format

Published Wed 23 Jul 2025 // 15:14 UTC

Iceberg is becoming the lingua franca of data lake table formats, with StarTree the latest supplier to embrace it as a real-time backend.

Open source Apache Iceberg is an open table format for large-scale analytics. It doesn’t store data or execute queries itself, as a traditional database would. Instead, it operates as a software layer atop data files in formats like Parquet, ORC, and Avro, held in cloud object stores such as AWS S3, Azure Blob, and Google Cloud Storage, and it handles metadata, partitioning, and schema evolution. Iceberg provides ACID transactions, schema versioning, and time travel, with data querying and processing handled by separate software such as Apache Flink, Presto, Spark, Trino, and other analytics engines.
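
To make the snapshot and time travel idea concrete, here is a minimal Python sketch of a toy table format. It is not Iceberg's API or metadata layout: `ToyTable`, `Snapshot`, `commit`, `files_as_of`, and the file paths are all hypothetical names chosen for illustration. The only point it illustrates is that each commit records an immutable list of data files, so an older snapshot can still be read later.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    data_files: tuple  # paths of the immutable data files visible in this snapshot


@dataclass
class ToyTable:
    """Hypothetical toy model: the table is only metadata pointing at data files."""
    snapshots: list = field(default_factory=list)

    def commit(self, added_files):
        """Record a new snapshot that adds files on top of the latest one."""
        current = self.snapshots[-1].data_files if self.snapshots else ()
        snap = Snapshot(len(self.snapshots) + 1, current + tuple(added_files))
        self.snapshots.append(snap)
        return snap.snapshot_id

    def files_as_of(self, snapshot_id=None):
        """Return the files of the latest snapshot, or of an older one (time travel)."""
        if not self.snapshots:
            return ()
        if snapshot_id is None:
            return self.snapshots[-1].data_files
        for snap in self.snapshots:
            if snap.snapshot_id == snapshot_id:
                return snap.data_files
        raise KeyError(snapshot_id)


table = ToyTable()
first = table.commit(["s3://bucket/events/part-0.parquet"])   # hypothetical path
second = table.commit(["s3://bucket/events/part-1.parquet"])  # hypothetical path
```

A query engine would ask for `table.files_as_of()` to read current data, or `table.files_as_of(first)` to read the table as it stood after the first commit.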

The StarTree Cloud is a fully managed, cloud-native platform built on Apache Pinot, a real-time distributed OLAP (Online Analytical Processing) datastore. StarTree Cloud is designed for OLAP and enables low-latency querying (milliseconds) and high-throughput processing (10,000+ queries/second) of large-scale data from streaming sources (e.g., Apache Kafka, Amazon Kinesis) and batch sources (e.g., AWS S3, Snowflake). Now it can be both the analytic and serving layer on top of Iceberg.

StarTree claims its support can transform Iceberg from a passive storage format into a real-time backend capable of powering customer-facing applications and AI agents at high concurrency, serving thousands of simultaneous users with consistent speed and reliability.

Kishore Gopalakrishna on the summit of Half Dome, Yosemite.

Kishore Gopalakrishna, StarTree co-founder and CEO, stated: “We’re seeing explosive growth in customer-facing, and increasingly agent-facing, data products that demand sub-second responsiveness and fresh insights. At the same time, Iceberg is emerging as the industry standard for managing historical data at scale.”

“As these two trends converge, StarTree is delivering unique value by acting as a real-time serving layer for Iceberg, empowering companies to serve millions of external users and AI agents securely, scalably, and without moving data.”

Recent Iceberg adopters include Snowflake, Confluent, AWS with S3, SingleStore, and Databricks.

Paul Nashawaty, principal analyst at theCUBE Research, said: “Apache Iceberg is rapidly becoming the de facto standard for managing large-scale analytical data in the open data lakehouse—adoption has surged by over 60 percent year-over-year, according to theCUBE Research.”

StarTree asserts that most existing query engines built around Iceberg and Parquet struggle to meet the performance SLAs required for external-facing, high-concurrency analytical applications, and that companies have historically avoided serving data directly from their lakehouse. It claims that, by combining the open Iceberg table format and Parquet file format with Pinot’s indexing and high-performance serving capabilities, StarTree offers real-time query acceleration directly on native Iceberg tables.

Unlike Presto, Trino and similar engines, StarTree says it’s built for low-latency, high-concurrency access, integrating directly with Iceberg, boosting performance with features such as:

Native support for Apache Iceberg and Parquet in StarTree Cloud
Real-time indexing and aggregations, including support for numerical, text, JSON, and geo indexes
Intelligent materialized views via the StarTree Index
Local caching and pruning for low-latency, high-concurrency queries
No data movement required—serve directly from Iceberg
Intelligent prefetching from Iceberg, minimizing irrelevant data scans
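
The article does not describe how StarTree implements pruning, but the general technique can be sketched. The Python example below assumes each data file carries minimum and maximum values for a timestamp column, and skips files whose range cannot overlap the query filter. `FileStats`, `prune`, and the file names are hypothetical and stand in for whatever metadata a real engine would consult.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    path: str
    min_ts: int  # smallest timestamp stored in the file (assumed statistic)
    max_ts: int  # largest timestamp stored in the file (assumed statistic)


def prune(files, lo, hi):
    """Keep only files whose [min_ts, max_ts] range overlaps the filter [lo, hi]."""
    return [f for f in files if f.max_ts >= lo and f.min_ts <= hi]


files = [
    FileStats("part-0.parquet", 0, 99),
    FileStats("part-1.parquet", 100, 199),
    FileStats("part-2.parquet", 200, 299),
]

# A filter on timestamps 150..220 never needs to open part-0.parquet.
selected = prune(files, 150, 220)
```

Here `selected` holds only the second and third files, so a scan touches two files instead of three. The same overlap test is what lets an engine avoid reading irrelevant data before any rows are decoded.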

Nashawaty reckons: “StarTree’s ability to serve Iceberg data with sub-second latency and without data duplication is a unique and timely advancement. It addresses a critical performance need for accessing historical data in modern data products.”

Support for Apache Iceberg in StarTree Cloud is available today in private preview. For more information, visit www.startree.ai.
