Data Engineering with AWS

Learn how to design and build cloud-based data transformation pipelines using AWS

By: Gareth Eagar

eText | 29 December 2021 | Edition Number 1

At a Glance

Format
ePUB

eText

$79.19

Instant online reading in your Booktopia eTextbook Library *

* eTextbooks are not downloadable to your eReader or an app and can be accessed via web browsers only. You must be connected to the internet and have no technical issues with your device or browser that could prevent the eTextbook from operating.

Become an efficient data engineer with this easy-to-follow, hands-on guide to applying a range of data engineering techniques with AWS tools.

Key Features

Get to grips with the common data engineering techniques for building data pipelines on AWS
Explore the different AWS tools to ingest, consume, and transform data and orchestrate pipelines
Learn how to architect and implement data lakes and data lakehouses for big data analytics

Book Description

Knowing how to architect and implement complex data pipelines is a highly sought-after skill. Data engineers are responsible for building these pipelines and for transforming data from one format to another so that a data analyst or data scientist can work with it further. Amazon Web Services offers a range of tools to ease the job of a data engineer, making it the preferred platform for performing data engineering tasks.

This data engineering book will take you through the services and skills you need to architect and implement data pipelines on AWS. You'll begin by understanding data engineering concepts and some of the core AWS tools that form part of the data engineer's toolkit. You'll then architect a data pipeline, review raw data sources, identify varied data consumers, and transform raw datasets to meet their needs. The book will show you how to populate data marts or data warehouses and how a data lakehouse fits into the picture. Next, you'll be introduced to some AWS tools for analyzing your data, including tools for running ad hoc SQL queries and for creating data visualizations and dashboards. In the final chapters, you'll perform predictive analytics using Amazon AI and machine learning tools.

By the end of this book, you'll be able to carry out data engineering tasks and implement a complex data pipeline on AWS independently.

What you will learn

Configure an AWS Glue crawler to automatically populate data catalogs
Implement Amazon Kinesis Firehose to ingest streaming data
Optimize and denormalize your dataset with AWS Glue
Run complex SQL queries on data in the data lake using Amazon Athena
Use Redshift Spectrum to join data lake and data warehouse tables
Create a simple visualization and dashboard using Amazon QuickSight
Load data into Amazon Elasticsearch and index it
Use Amazon Comprehend to get sentiment data from your dataset

Who This Book Is For

This book is for data analysts and data engineers who are new to AWS and want to extend their skills to the AWS cloud, as well as anyone who wants practical experience with common data engineering services on AWS.

A basic understanding of big data-related topics and knowledge of Python programming will help you to get the most out of this book.

Table of Contents

An Introduction to Data Engineering
Data Management Architectures for Analytics
The AWS Data Engineer's Toolkit
Avoiding the Data Swamp - Cataloging, Security and Governance
Architecting Data Engineering Pipelines
Ingesting Batch and Streaming Data
Transforming Data to Optimize for Analytics and Create Value for an Organization
Identifying and Enabling Varied Data Consumers
Loading Data into a Data Mart
Orchestrating the Data Pipeline
Ad-Hoc Queries with Amazon Athena
Visualizing Data with Amazon QuickSight
Search with Amazon Elasticsearch and Kibana
Enabling Artificial Intelligence and Machine Learning
Conclusion