Machine Learning with Python Cookbook
2nd Edition - Practical Solutions from Preprocessing to Deep Learning
By: Kyle Gallatin, Chris Albon
Paperback | 11 August 2023 | Edition Number 2
At a Glance
380 Pages
Revised
24 x 18.5 x 2
Paperback
RRP $281.25
$116.25
59% OFF
Each recipe in this updated edition includes code that you can copy, paste, and run with a toy dataset to ensure that it works. From there, you can adapt these recipes according to your use case or application. Recipes include a discussion that explains the solution and provides meaningful context.
Go beyond theory and concepts by learning the nuts and bolts you need to construct working machine learning applications. You'll find recipes for:
- Vectors, matrices, and arrays
- Working with data from CSV, JSON, SQL, databases, cloud storage, and other sources
- Handling numerical and categorical data, text, images, and dates and times
- Dimensionality reduction using feature extraction or feature selection
- Model evaluation and selection
- Linear and logistic regression, trees and forests, and k-nearest neighbors
- Support vector machines (SVM), naive Bayes, clustering, and tree-based models
- Saving, loading, and serving trained models from multiple frameworks
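As a rough illustration of the copy-paste-and-run style described above, here is a minimal sketch that is not taken from the book. It loads the same toy dataset from CSV, JSON, and SQL sources using only the Python standard library. The dataset, the table name `toy`, and the variable names are all assumptions made up for this example.

```python
import csv
import io
import json
import sqlite3

# Toy dataset (hypothetical; invented for this illustration).
rows = [
    {"name": "a", "value": 1.0},
    {"name": "b", "value": 2.5},
    {"name": "c", "value": 4.0},
]

# CSV: write the rows to an in-memory buffer, then read them back.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "value"])
writer.writeheader()
writer.writerows(rows)
buffer.seek(0)
# csv returns every field as a string, so convert "value" back to float.
from_csv = [
    {"name": r["name"], "value": float(r["value"])}
    for r in csv.DictReader(buffer)
]

# JSON: serialize the rows to a string and parse them again.
from_json = json.loads(json.dumps(rows))

# SQL: store the rows in an in-memory SQLite database and query them.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE toy (name TEXT, value REAL)")
conn.executemany("INSERT INTO toy VALUES (:name, :value)", rows)
from_sql = [
    {"name": name, "value": value}
    for name, value in conn.execute("SELECT name, value FROM toy ORDER BY name")
]
conn.close()

# All three sources should yield the same records.
assert from_csv == rows
assert from_json == rows
assert from_sql == rows
```

In-memory buffers and an in-memory database keep the sketch self-contained; with real data you would point the same calls at files or a database on disk.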
Kyle Gallatin is a software engineer for machine learning infrastructure with years of experience as a data analyst, data scientist, and machine learning engineer. He is also a professional data science mentor and volunteer computer science teacher, and he frequently publishes articles at the intersection of software engineering and machine learning. Currently, Kyle is a software engineer on the machine learning platform team at Etsy. Chris Albon is the Director of Machine Learning at the Wikimedia Foundation, the non-profit that hosts Wikipedia.
- Preface
- Conventions Used in This Book
- Using Code Examples
- O'Reilly Online Learning
- How to Contact Us
- Acknowledgments
- 1. Working with Vectors, Matrices, and Arrays in NumPy
- 1.0. Introduction
- 1.1. Creating a Vector
- 1.2. Creating a Matrix
- 1.3. Creating a Sparse Matrix
- 1.4. Preallocating NumPy Arrays
- 1.5. Selecting Elements
- 1.6. Describing a Matrix
- 1.7. Applying Functions over Each Element
- 1.8. Finding the Maximum and Minimum Values
- 1.9. Calculating the Average, Variance, and Standard Deviation
- 1.10. Reshaping Arrays
- 1.11. Transposing a Vector or Matrix
- 1.12. Flattening a Matrix
- 1.13. Finding the Rank of a Matrix
- 1.14. Getting the Diagonal of a Matrix
- 1.15. Calculating the Trace of a Matrix
- 1.16. Calculating Dot Products
- 1.17. Adding and Subtracting Matrices
- 1.18. Multiplying Matrices
- 1.19. Inverting a Matrix
- 1.20. Generating Random Values
- 2. Loading Data
- 2.0. Introduction
- 2.1. Loading a Sample Dataset
- 2.2. Creating a Simulated Dataset
- 2.3. Loading a CSV File
- 2.4. Loading an Excel File
- 2.5. Loading a JSON File
- 2.6. Loading a Parquet File
- 2.7. Loading an Avro File
- 2.8. Querying a SQLite Database
- 2.9. Querying a Remote SQL Database
- 2.10. Loading Data from a Google Sheet
- 2.11. Loading Data from an S3 Bucket
- 2.12. Loading Unstructured Data
- 3. Data Wrangling
- 3.0. Introduction
- 3.1. Creating a DataFrame
- 3.2. Getting Information about the Data
- 3.3. Slicing DataFrames
- 3.4. Selecting Rows Based on Conditionals
- 3.5. Sorting Values
- 3.6. Replacing Values
- 3.7. Renaming Columns
- 3.8. Finding the Minimum, Maximum, Sum, Average, and Count
- 3.9. Finding Unique Values
- 3.10. Handling Missing Values
- 3.11. Deleting a Column
- 3.12. Deleting a Row
- 3.13. Dropping Duplicate Rows
- 3.14. Grouping Rows by Values
- 3.15. Grouping Rows by Time
- 3.16. Aggregating Operations and Statistics
- 3.17. Looping over a Column
- 3.18. Applying a Function over All Elements in a Column
- 3.19. Applying a Function to Groups
- 3.20. Concatenating DataFrames
- 3.21. Merging DataFrames
- 4. Handling Numerical Data
- 4.0. Introduction
- 4.1. Rescaling a Feature
- 4.2. Standardizing a Feature
- 4.3. Normalizing Observations
- 4.4. Generating Polynomial and Interaction Features
- 4.5. Transforming Features
- 4.6. Detecting Outliers
- 4.7. Handling Outliers
- 4.8. Discretizing Features
- 4.9. Grouping Observations Using Clustering
- 4.10. Deleting Observations with Missing Values
- 4.11. Imputing Missing Values
- 5. Handling Categorical Data
- 5.0. Introduction
- 5.1. Encoding Nominal Categorical Features
- 5.2. Encoding Ordinal Categorical Features
- 5.3. Encoding Dictionaries of Features
- 5.4. Imputing Missing Class Values
- 5.5. Handling Imbalanced Classes
- 6. Handling Text
- 6.0. Introduction
- 6.1. Cleaning Text
- 6.2. Parsing and Cleaning HTML
- 6.3. Removing Punctuation
- 6.4. Tokenizing Text
- 6.5. Removing Stop Words
- 6.6. Stemming Words
- 6.7. Tagging Parts of Speech
- 6.8. Performing Named-Entity Recognition
- 6.9. Encoding Text as a Bag of Words
- 6.10. Weighting Word Importance
- 6.11. Using Text Vectors to Calculate Text Similarity in a Search Query
- 6.12. Using a Sentiment Analysis Classifier
- 7. Handling Dates and Times
- 7.0. Introduction
- 7.1. Converting Strings to Dates
- 7.2. Handling Time Zones
- 7.3. Selecting Dates and Times
- 7.4. Breaking Up Date Data into Multiple Features
- 7.5. Calculating the Difference Between Dates
- 7.6. Encoding Days of the Week
- 7.7. Creating a Lagged Feature
- 7.8. Using Rolling Time Windows
- 7.9. Handling Missing Data in Time Series
- 8. Handling Images
- 8.0. Introduction
- 8.1. Loading Images
- 8.2. Saving Images
- 8.3. Resizing Images
- 8.4. Cropping Images
- 8.5. Blurring Images
- 8.6. Sharpening Images
- 8.7. Enhancing Contrast
- 8.8. Isolating Colors
- 8.9. Binarizing Images
- 8.10. Removing Backgrounds
- 8.11. Detecting Edges
- 8.12. Detecting Corners
- 8.13. Creating Features for Machine Learning
- 8.14. Encoding Color Histograms as Features
- 8.15. Using Pretrained Embeddings as Features
- 8.16. Detecting Objects with OpenCV
- 8.17. Classifying Images with PyTorch
- 9. Dimensionality Reduction Using Feature Extraction
- 9.0. Introduction
- 9.1. Reducing Features Using Principal Components
- 9.2. Reducing Features When Data Is Linearly Inseparable
- 9.3. Reducing Features by Maximizing Class Separability
- 9.4. Reducing Features Using Matrix Factorization
- 9.5. Reducing Features on Sparse Data
- 10. Dimensionality Reduction Using Feature Selection
- 10.0. Introduction
- 10.1. Thresholding Numerical Feature Variance
- 10.2. Thresholding Binary Feature Variance
- 10.3. Handling Highly Correlated Features
- 10.4. Removing Irrelevant Features for Classification
- 10.5. Recursively Eliminating Features
- 11. Model Evaluation
- 11.0. Introduction
- 11.1. Cross-Validating Models
- 11.2. Creating a Baseline Regression Model
- 11.3. Creating a Baseline Classification Model
- 11.4. Evaluating Binary Classifier Predictions
- 11.5. Evaluating Binary Classifier Thresholds
- 11.6. Evaluating Multiclass Classifier Predictions
- 11.7. Visualizing a Classifier's Performance
- 11.8. Evaluating Regression Models
- 11.9. Evaluating Clustering Models
- 11.10. Creating a Custom Evaluation Metric
- 11.11. Visualizing the Effect of Training Set Size
- 11.12. Creating a Text Report of Evaluation Metrics
- 11.13. Visualizing the Effect of Hyperparameter Values
- 12. Model Selection
- 12.0. Introduction
- 12.1. Selecting the Best Models Using Exhaustive Search
- 12.2. Selecting the Best Models Using Randomized Search
- 12.3. Selecting the Best Models from Multiple Learning Algorithms
- 12.4. Selecting the Best Models When Preprocessing
- 12.5. Speeding Up Model Selection with Parallelization
- 12.6. Speeding Up Model Selection Using Algorithm-Specific Methods
- 12.7. Evaluating Performance After Model Selection
- 13. Linear Regression
- 13.0. Introduction
- 13.1. Fitting a Line
- 13.2. Handling Interactive Effects
- 13.3. Fitting a Nonlinear Relationship
- 13.4. Reducing Variance with Regularization
- 13.5. Reducing Features with Lasso Regression
- 14. Trees and Forests
- 14.0. Introduction
- 14.1. Training a Decision Tree Classifier
- 14.2. Training a Decision Tree Regressor
- 14.3. Visualizing a Decision Tree Model
- 14.4. Training a Random Forest Classifier
- 14.5. Training a Random Forest Regressor
- 14.6. Evaluating Random Forests with Out-of-Bag Errors
- 14.7. Identifying Important Features in Random Forests
- 14.8. Selecting Important Features in Random Forests
- 14.9. Handling Imbalanced Classes
- 14.10. Controlling Tree Size
- 14.11. Improving Performance Through Boosting
- 14.12. Training an XGBoost Model
- 14.13. Improving Real-Time Performance with LightGBM
- 15. K-Nearest Neighbors
- 15.0. Introduction
- 15.1. Finding an Observation's Nearest Neighbors
- 15.2. Creating a K-Nearest Neighbors Classifier
- 15.3. Identifying the Best Neighborhood Size
- 15.4. Creating a Radius-Based Nearest Neighbors Classifier
- 15.5. Finding Approximate Nearest Neighbors
- 15.6. Evaluating Approximate Nearest Neighbors
- 16. Logistic Regression
- 16.0. Introduction
- 16.1. Training a Binary Classifier
- 16.2. Training a Multiclass Classifier
- 16.3. Reducing Variance Through Regularization
- 16.4. Training a Classifier on Very Large Data
- 16.5. Handling Imbalanced Classes
- 17. Support Vector Machines
- 17.0. Introduction
- 17.1. Training a Linear Classifier
- 17.2. Handling Linearly Inseparable Classes Using Kernels
- 17.3. Creating Predicted Probabilities
- 17.4. Identifying Support Vectors
- 17.5. Handling Imbalanced Classes
- 18. Naive Bayes
- 18.0. Introduction
- 18.1. Training a Classifier for Continuous Features
- 18.2. Training a Classifier for Discrete and Count Features
- 18.3. Training a Naive Bayes Classifier for Binary Features
- 18.4. Calibrating Predicted Probabilities
- 19. Clustering
- 19.0. Introduction
- 19.1. Clustering Using K-Means
- 19.2. Speeding Up K-Means Clustering
- 19.3. Clustering Using Mean Shift
- 19.4. Clustering Using DBSCAN
- 19.5. Clustering Using Hierarchical Merging
- 20. Tensors with PyTorch
- 20.0. Introduction
- 20.1. Creating a Tensor
- 20.2. Creating a Tensor from NumPy
- 20.3. Creating a Sparse Tensor
- 20.4. Selecting Elements in a Tensor
- 20.5. Describing a Tensor
- 20.6. Applying Operations to Elements
- 20.7. Finding the Maximum and Minimum Values
- 20.8. Reshaping Tensors
- 20.9. Transposing a Tensor
- 20.10. Flattening a Tensor
- 20.11. Calculating Dot Products
- 20.12. Multiplying Tensors
- 21. Neural Networks
- 21.0. Introduction
- 21.1. Using Autograd with PyTorch
- 21.2. Preprocessing Data for Neural Networks
- 21.3. Designing a Neural Network
- 21.4. Training a Binary Classifier
- 21.5. Training a Multiclass Classifier
- 21.6. Training a Regressor
- 21.7. Making Predictions
- 21.8. Visualizing Training History
- 21.9. Reducing Overfitting with Weight Regularization
- 21.10. Reducing Overfitting with Early Stopping
- 21.11. Reducing Overfitting with Dropout
- 21.12. Saving Model Training Progress
- 21.13. Tuning Neural Networks
- 21.14. Visualizing Neural Networks
- 22. Neural Networks for Unstructured Data
- 22.0. Introduction
- 22.1. Training a Neural Network for Image Classification
- 22.2. Training a Neural Network for Text Classification
- 22.3. Fine-Tuning a Pretrained Model for Image Classification
- 22.4. Fine-Tuning a Pretrained Model for Text Classification
- 23. Saving, Loading, and Serving Trained Models
- 23.0. Introduction
- 23.1. Saving and Loading a scikit-learn Model
- 23.2. Saving and Loading a TensorFlow Model
- 23.3. Saving and Loading a PyTorch Model
- 23.4. Serving scikit-learn Models
- 23.5. Serving TensorFlow Models
- 23.6. Serving PyTorch Models in Seldon
- Index
ISBN: 9781098135720
ISBN-10: 1098135725
Published: 11th August 2023
Format: Paperback
Language: English
Number of Pages: 380
Audience: General Adult
Publisher: O'Reilly Media, Inc, USA
Country of Publication: US
Edition Number: 2
Edition Type: Revised
Dimensions (cm): 24 x 18.5 x 2
Weight (kg): 0.75