Use open source Python libraries to build end-to-end, reproducible feature engineering pipelines that can be deployed into production
Key Features
- Build end-to-end feature engineering pipelines that are performant and reproducible
- Learn and implement feature engineering practices
- Reinforce your learning through multiple hands-on recipes
Book Description
Feature engineering, the process of transforming variables and creating features, can be tedious and time-consuming, but it has to be done to ensure your machine learning models are performant.
Python Feature Engineering Cookbook, Second Edition will take the pain out of feature engineering by showing you how to use open source Python libraries to accelerate the process via a plethora of practical, hands-on recipes.
This updated, practical book begins by addressing fundamental data challenges such as missing data and encoding categorical values, then moves on to strategies for dealing with skewed distributions and outliers, and finally shows you how to develop new features from various types of data. Using numerous open source Python libraries, you'll learn how to implement each feature engineering method in a performant, reproducible, and elegant manner. The final chapter of the book ties everything together and shows you how to build end-to-end feature engineering pipelines using the different Python libraries covered.
By the end of this book, you will have all the tools and expertise you need to build end-to-end and reproducible feature engineering pipelines that can be deployed into production.
What you will learn
- Impute missing data using KNN imputation
- Encode categorical variables with one-hot encoding
- Transform, discretize, and scale your variables
- Create variables from date and time with pandas and Feature-engine
- Combine variables into new features
- Extract features from transactional data with Featuretools
- Use tsfresh to create features from time series data
- Create end-to-end feature engineering pipelines that can be deployed
Who This Book Is For
This book is for machine learning and data science students and professionals, as well as software engineers working on machine learning model deployment, who want to learn more about how to transform their data and create new features to train better machine learning models.
Table of Contents
- Imputing Missing Data
- Encoding Categorical Variables
- Transforming Numerical Variables
- Performing Variable Discretization
- Working with Outliers
- Extracting Features from Date and Time Variables
- Performing Feature Scaling
- Creating New Features from Existing Data
- Extracting Features from Relational Data with Featuretools
- Creating Features from Time Series with tsfresh
- Extracting Features from Text Variables