Become an efficient data engineer with this easy-to-follow, hands-on guide to applying data engineering techniques with AWS tools
Key Features
- Get to grips with common data engineering techniques for building data pipelines on AWS
- Explore the different AWS tools to ingest, consume, and transform data and orchestrate pipelines
- Learn how to architect and implement data lakes and data lakehouses for big data analytics
Book Description
Knowing how to architect and implement complex data pipelines is a highly sought-after skill. Data engineers are responsible for building these pipelines and transforming data from one format to another so that a data analyst or data scientist can work with it further. Amazon Web Services offers a range of tools to ease the job of a data engineer, making it the preferred platform for performing data engineering tasks.
This data engineering book will take you through the services and the skills you need to architect and implement data pipelines on AWS. You'll begin by understanding data engineering concepts and some of the core AWS tools that form a part of the data engineer's toolkit. You'll then architect a data pipeline, review raw data sources, identify varied data consumers, and transform raw datasets to meet their needs. The book will show you how to populate data marts or data warehouses and how a data lakehouse fits into the picture. Next, you'll be introduced to some AWS tools for analyzing your data, including tools for running ad-hoc SQL queries and creating data visualizations and dashboards. In the final chapters, you'll perform predictive analytics using Amazon AI and machine learning tools.
By the end of this book, you'll be able to carry out data engineering tasks and implement a complex data pipeline on AWS independently.
What you will learn
- Configure an AWS Glue crawler to automatically populate data catalogs
- Implement Amazon Kinesis Firehose to ingest streaming data
- Optimize and denormalize your dataset with AWS Glue
- Run complex SQL queries on data in the data lake using Amazon Athena
- Use Redshift Spectrum to join data lake and data warehouse tables
- Create a simple visualization and dashboard using Amazon QuickSight
- Load and index data into Amazon ElasticSearch
- Use Amazon Comprehend to get sentiment data from your dataset
Who This Book Is For
The Data Engineering with AWS book is for data analysts and data engineers who are new to AWS and looking to extend their skills to the AWS cloud, as well as anyone who wants to get practical experience with common data engineering services on AWS.
A basic understanding of big data-related topics and knowledge of Python programming will help you to get the most out of this book.
Table of Contents
- An Introduction to Data Engineering
- Data Management Architectures for Analytics
- The AWS Data Engineer's Toolkit
- Avoiding the Data Swamp - Cataloging, Security and Governance
- Architecting Data Engineering Pipelines
- Ingesting Batch and Streaming Data
- Transforming Data to Optimize for Analytics and Create Value for an Organization
- Identifying and Enabling Varied Data Consumers
- Loading Data into a Data Mart
- Orchestrating the Data Pipeline
- Ad-Hoc Queries with Amazon Athena
- Visualizing Data with Amazon QuickSight
- Search with Amazon ElasticSearch and Kibana
- Enabling Artificial Intelligence and Machine Learning
- Conclusion