Building Big Data Pipelines with Apache Beam

Use a single programming model for both batch and stream data processing

By: Jan Lukavský

eText | 22 August 2102 | Edition Number 1

Implement, run, operate and test data processing pipelines using Apache Beam

Key Features

Understand how to improve usability and productivity when implementing Beam pipelines
Learn how you can use stateful processing to expand the capabilities of Apache Beam
Implement, test, and run Apache Beam pipelines with the help of practical tips and techniques

Book Description

Apache Beam is an open source, unified programming model for defining and executing data processing pipelines, including extract, transform, and load (ETL), batch, and stream processing.

This book will help you to confidently build data processing pipelines with Apache Beam. You'll start with an overview of Apache Beam and understand how to implement basic pipelines using it. The book covers various techniques to load data, perform transformations, and store the data. You will also learn how to test and run the pipelines effectively. As you progress, you will explore how to implement your own Domain Specific Language (DSL) and also get to grips with using Euphoria DSL. Later chapters will show you how to query your data using SQL before progressing to run a pipeline using a portable runner. Finally, you will learn advanced Apache Beam concepts such as IO connectors and R.

By the end of this book, you will be able to confidently implement batch and streaming data pipelines using Apache Beam.

What you will learn

Understand the core concepts and architecture of Apache Beam
Perform stateless and stateful transforms to build pipelines
Use state and timers for processing events and time
Implement your own DSL using the Join library
Implement SQL to get real-time data to increase productivity and data accessibility
Run a pipeline using a portable runner to implement complex pipelines using Python
Generate the source and modular source using the source API and Splittable DoFn API

Who This Book Is For

This book is for data engineers, data scientists, and data analysts who want to learn how Apache Beam works. Intermediate-level knowledge of the Java programming language is assumed.