AI/ML

Databricks buys AI darling MosaicML for $1.3B

June 26, 2023

Lakehouse developer Databricks is buying generative AI startup MosaicML for $1.3 billion so its customers can build and deploy AI models on their own data.

Large language models (LLMs) have ushered in AI that can understand queries, analyze multiple data sources and respond with natural language answers or even produce programming language. However, they can also produce wrong or imaginary answers, and need significant GPU resources to run. MosaicML helps customers run on minimal systems, and have their models trained with their own and not generally available public data.

Databricks CEO Ali Ghodsi said: “Every organization should be able to benefit from the AI revolution with more control over how their data is used. Databricks and MosaicML have an incredible opportunity to democratize AI and make the Lakehouse the best place to build generative AI and LLMs.”

In April Databricks revealed its updated open source Dolly LLM to make its AI facilities available for business applications without needing massive GPU resources or costly API use. The chatbot can be used to generate queries that run against Databtricks’ lakehouse.

Naveen Rao and Hanlin Tang of MosaicML, which has been bought by Databricks — Naveen Rao and Hanlin Tang

MosaicML was founded in 2021 by CEO Naveen Rao, a former VP and general manager of Intel’s AI Products Group, and CTO Hanling Tang, previously the senior director of Intel’s AI Labs. It has pulled in $64 million in funding. MosaicML’s open source LLMs are based on its MPT-7B architecture, built with 7 billion parameters and a 64,000 token context window.

There have been over 3.3 million downloads of MPT-7B and the recent release of MPT-30B. The latter is significantly more powerful than MPT-7B and outperforms the original GPT-3. MosaicML says the size of MPT-30B was specifically chosen to make it easy to deploy on a single GPU – either 1xA100-80GB in 16-bit precision or 1xA100-40GB in 8-bit precision. MosaicML says other comparable LLMs such as Falcon-40B have larger parameter counts and cannot be served on a single datacenter GPU; this necessitates 2+ GPUs, which increases the minimum inference system cost.

The US firm has offices in San Francisco, New York, Palo Alto and San Diego. Customers include AI2 (Allen Institute for AI), Generally Intelligent, Hippocratic AI, Replit and Scatter Labs.

Databricks says that bringing in MosaicML’s tech will offer its customers a simple and fast way to retain control, security and ownership over their data without incurring high costs.

MosaicML’s optimization provides 2-7x faster model training compared to standard approaches and is linearly scalable. It claims multibillion-parameter models can be trained in hours, not days.

The entire MosaicML team is expected to join Databricks after the transaction closes. MosaicML’s platform will be supported, scaled and integrated over time. Customers will get a unified platform on which they can build, own and secure their generative AI models, training them with their own data.

Rao said: “We started MosaicML to solve the hard engineering and research problems necessary to make large scale training more accessible to everyone. With the recent generative AI wave, this mission has taken center stage. Together with Databricks, we will tip the scales in the favor of many – and we’ll do it as kindred spirits: researchers turned entrepreneurs sharing a similar mission.”

The proposed acquisition is subject to customary closing conditions, including any required regulatory clearances. Other generative AI startups may now be getting approaches from Databricks’ competitors.

Bootnote

Databricks is itself a startup, having been founded in 2013 and raising $3.6 billion across multiple rounds. The acquisition cost includes some retention packages for MosaicML employees.

Databricks buys AI darling MosaicML for $1.3B

Bootnote

ABOUT US

FOLLOW US

VAST Data cracks into HPC with Doudna supercomputer win

Storage news collection – July 3

Kioxia tunes SSD-based vector search for RAG workloads