Engineering Lakehouses with Open Table Formats : Build scalable and efficient lakehouses with Apache Iceberg, Apache Hudi, and Delta Lake - Dipankar Mazumdar

Engineering Lakehouses with Open Table Formats

Build scalable and efficient lakehouses with Apache Iceberg, Apache Hudi, and Delta Lake

By: Dipankar Mazumdar

eText | 26 December 2025 | Edition Number 1

At a Glance

$54.99

Instant online reading in your Booktopia eTextbook Library *

* eTextbooks are not downloadable to your eReader or an app and can be accessed via web browsers only. You must be connected to the internet and have no technical issues with your device or browser that could prevent the eTextbook from operating.

Jumpstart your journey towards mastering open data architectural patterns by learning the fundamentals and applications of open table formats

Key Features

  • Build open lakehouses with open table formats using popular compute engines such as Apache Spark, Apache Flink, and Trino, as well as Python-based tools
  • Optimize lakehouse performance with advanced techniques such as pruning, partitioning, compaction, indexing, and clustering
  • Learn how to enable seamless integration, data management, and interoperability using Apache XTable
  • Purchase of the print or Kindle book includes a free PDF eBook

Book Description

Engineering Lakehouses with Open Table Formats provides detailed insights into lakehouse concepts and dives deep into the practical implementation of open table formats such as Apache Iceberg, Apache Hudi, and Delta Lake. If you are a data engineer or architect looking to understand the intricacies of open lakehouse architectures, this book is for you. You'll start by exploring the internals of a table format and learn in detail about the transactional capabilities of lakehouses. You'll also work with each table format through hands-on exercises using popular compute engines and tools such as Apache Spark, Apache Flink, Trino, dbt, and Python-based tools. The book addresses advanced topics, including performance optimization techniques and interoperability among different formats, equipping you to build production-ready lakehouses. With step-by-step explanations, you'll get to grips with the key components of lakehouse architecture and learn how to build, maintain, and optimize lakehouses. By the end, you'll be proficient in evaluating and implementing open table formats, optimizing lakehouse performance, and applying these concepts to real-world scenarios, so that you can make informed decisions when selecting the right architecture for your organization's data needs.

What you will learn

  • Explore lakehouse fundamentals such as table formats, file formats, compute engines, and catalogs
  • Gain a complete understanding of data lifecycle management in lakehouses
  • Integrate lakehouses with Apache Airflow, dbt, and Apache Beam
  • Optimize performance with sorting, clustering, and indexing techniques
  • Use data stored in open table formats with ML frameworks and tools such as Spark MLlib, TensorFlow, and MLflow
  • Interoperate across different table formats with Apache XTable and UniForm
  • Secure your lakehouse with access controls and ensure regulatory compliance

Who this book is for

This book is for data engineers, software engineers, and data architects who want to deepen their understanding of open table formats such as Apache Iceberg, Apache Hudi, and Delta Lake, and learn how they are used to build lakehouses. It is also a good fit for professionals working with traditional data warehouses, relational databases, and data lakes who wish to transition to an open data architectural pattern. Basic knowledge of databases, Python, Apache Spark, Java, and SQL is recommended for a smooth learning experience.
