Databricks Acquires Tabular for $1B+ - What's Behind the Deal?


June 6, 2024

In major news for the data infrastructure industry, Databricks has announced its acquisition of Tabular, a company specializing in open table formats based on Apache Iceberg. The acquisition lets Databricks extend its data lakehouse platform with Tabular's Iceberg expertise and technology, with the aim of giving users a faster, more reliable, and more scalable data solution. Databricks expects the integration to improve how it manages large-scale data storage and analytics, particularly for AI workloads, where efficient data retrieval and processing translate into faster and more accurate insights.

Understanding Apache Iceberg and Its Growing Adoption

Apache Iceberg, an open-source table format designed to handle large analytic datasets in data lakes and warehouses, has been gaining significant traction in the big data community. It addresses key challenges of working with big data: maintaining high performance, managing schema evolution, and enabling ACID transactions. These capabilities have made it a preferred choice for organizations dealing with petabyte-scale data, which is a common requirement in modern data processing, including AI workloads.
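To make schema evolution more concrete, here is a minimal sketch in Python. It is a toy model, not the Iceberg API: the ToySchema class and its method names are hypothetical. It illustrates one idea Iceberg relies on, which is tracking each column by a stable field ID rather than by name, so that renaming or adding a column does not require rewriting data that was stored earlier.

```python
from typing import Dict

# A stored row keeps values keyed by stable field ID, not by column name.
Row = Dict[int, object]


class ToySchema:
    """Hypothetical toy schema that maps stable field IDs to column names."""

    def __init__(self) -> None:
        self._names: Dict[int, str] = {}  # field ID -> current column name
        self._next_id = 1

    def add_column(self, name: str) -> int:
        """Register a new column and return its permanent field ID."""
        field_id = self._next_id
        self._names[field_id] = name
        self._next_id += 1
        return field_id

    def rename_column(self, old: str, new: str) -> None:
        """Change only the name; the field ID and stored data are untouched."""
        for field_id, name in self._names.items():
            if name == old:
                self._names[field_id] = new
                return
        raise KeyError(old)

    def project(self, row: Row) -> Dict[str, object]:
        """Read a stored row under the current schema; absent fields read as None."""
        return {name: row.get(field_id) for field_id, name in self._names.items()}


schema = ToySchema()
user_id = schema.add_column("user_id")
old_row: Row = {user_id: 42}          # written before any schema change

schema.rename_column("user_id", "account_id")
schema.add_column("country")          # added after old_row was written

print(schema.project(old_row))
```

The row written before the rename is still readable under the new column name, and the column added later simply reads as missing for older data.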

Iceberg's support for time travel, which lets users query a table as it existed at an earlier point in time, and its optimization of data layout for read and write performance make it particularly useful for AI use cases that need consistent, quick access to large volumes of historical data for model training. Iceberg also integrates with popular data processing and query engines such as Apache Spark, Flink, Trino, and Presto. This versatility helps organizations streamline their data pipelines and run complex queries more efficiently and at lower cost.
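The idea behind time travel can be sketched with a toy snapshot log in Python. This is an illustration under stated assumptions, not Iceberg's implementation or API: ToyTable, Snapshot, and the file names are hypothetical. The sketch assumes that every commit produces an immutable snapshot listing the data files visible at that moment, so a historical query only needs to pick the right snapshot.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    timestamp_ms: int
    data_files: Tuple[str, ...]  # files visible in this version of the table


class ToyTable:
    """Hypothetical toy table that keeps every committed snapshot."""

    def __init__(self) -> None:
        self._snapshots: List[Snapshot] = []

    def commit(
        self,
        timestamp_ms: int,
        added: Sequence[str],
        removed: Sequence[str] = (),
    ) -> Snapshot:
        """Create a new snapshot; earlier snapshots are never modified."""
        current = self._snapshots[-1].data_files if self._snapshots else ()
        files = tuple(f for f in current if f not in removed) + tuple(added)
        snapshot = Snapshot(len(self._snapshots) + 1, timestamp_ms, files)
        self._snapshots.append(snapshot)
        return snapshot

    def files_as_of(self, timestamp_ms: int) -> Tuple[str, ...]:
        """Return the files of the latest snapshot committed at or before the given time."""
        visible = [s for s in self._snapshots if s.timestamp_ms <= timestamp_ms]
        if not visible:
            raise ValueError("no snapshot at or before the requested time")
        return visible[-1].data_files


table = ToyTable()
table.commit(1000, added=["a.parquet"])
table.commit(2000, added=["b.parquet"])
table.commit(3000, added=["c.parquet"], removed=["a.parquet"])

print(table.files_as_of(2500))  # table state between the second and third commit
print(table.files_as_of(3000))  # table state after the third commit
```

Because old snapshots are kept rather than overwritten, a model training job can keep reading the exact table state it started with even while new commits arrive.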

The robust framework provided by Apache Iceberg for managing large datasets enhances data reliability and consistency, which are critical for AI and machine learning applications. This leads to more efficient data retrieval and processing, resulting in faster and more accurate AI model training and inference, making it a preferred choice for organizations aiming to optimize their data lakes and improve data accessibility and analytics capabilities.

Tabular: Pioneering Open Table Formats

Tabular, founded by former Netflix data team members Ryan Blue and Daniel Weeks, has quickly gained traction in the data infrastructure space by commercializing Apache Iceberg. Iceberg was originally developed at Netflix to tackle the challenges of managing petabyte-scale datasets in the company's data lake, with the goals of improving performance, supporting schema evolution, and enabling more efficient querying and data processing. By open-sourcing Iceberg, Netflix sought to provide a robust table format that the wider industry could adopt to address similar big data challenges.

The widespread adoption of Apache Iceberg by leading companies such as Netflix, Apple, LinkedIn, and Adobe, along with its embrace by numerous data platforms, including Snowflake, Databricks, and Cloudera, demonstrates its effectiveness and versatility in the industry. Tabular's success in raising $37M from prominent investors like a16z, Zetta, and Altimeter further validates its position as a key player in the data infrastructure landscape.

The Strategic Importance of Iceberg to Databricks

Databricks' acquisition of Tabular is a strategic move that not only enhances its data lakehouse capabilities but also strengthens its competitive position. By bringing together the original creators of Apache Iceberg and Delta Lake, Databricks aims to eliminate limitations caused by format incompatibility, allowing users to work seamlessly across different lakehouse formats. While the future trajectory of Apache Iceberg remains to be seen, the integration of Tabular is a step towards more open data formats, with the long-term goal of evolving toward a single, open, and common standard of interoperability.

The industry's reaction to Databricks' influence over the two leading open table formats will be closely watched. Snowflake, for instance, recently unveiled Polaris, an open catalog implementation for Apache Iceberg, and may choose to fork Iceberg to keep it an open format under community-driven governance.

Powering Your Data Infrastructure with Vertical Data

As businesses navigate the rapidly evolving landscape of data management and AI workloads, having access to the right infrastructure solutions is crucial. Vertical Data, a leading independent distributor of data center infrastructure solutions, including NVIDIA GPUs, is well-positioned to help organizations capitalize on the exponential growth in compute demand.

By partnering with Vertical Data, your data center, GPU leasing business, or other NVIDIA H100 or similar GPU-powered center can benefit from rapid equipment access, minimized red tape, maximized infrastructure revenue, efficient acquisition processes, unrivaled customer support, and optimized resource utilization. With $1T worth of data center infrastructure and hardware expected to be built by 2030, Vertical Data's innovative financing solutions can help your business bridge the compute demand-supply imbalance and be a part of this growth in compute power.

Vertical Data's hybrid "Service First" philosophy and deep understanding of global markets, coupled with its rock-solid Tier 1 Supply Chain and financial stability, set it apart in the highly competitive marketplace for data center infrastructure solutions. By minimizing red tape in the procurement and acquisition process and delivering unmatched service and capabilities, Vertical Data is well-equipped to transform your business' role in the data center and compute infrastructure ecosystem.
