Databricks ML in Action

Independently manage a Dask cluster and leverage your analytics and data science workflows

By: Stephanie Rivera

eText | 9 September 2021 | Edition Number 1

Scale your data using your existing Python APIs and data structures with the help of Dask clusters

Key Features

  • Build and run your ETL pipeline with Dask delayed and analyze your data
  • Translate a scikit-learn workflow to Dask and perform hyperparameter tuning
  • Set up a Dask cluster in the cloud on major providers such as AWS, Azure, and GCP

Book Description

Data scientists and machine learning engineers are used to building prototypes in pandas, NumPy, and scikit-learn, but this approach often breaks down as data volumes grow or when the code moves into production. Machine Learning and Data Analysis with Dask shows you how Dask can help you tackle this challenge by using existing Python APIs and data structures, so you don't have to completely rewrite your code or retrain to scale up.

The book starts with an introduction to Dask and covers the fundamentals of distributed computation, along with the advantages and potential drawbacks of using Dask. You'll then discover how to build an extract, transform, and load (ETL) pipeline with Dask delayed and compare its flexibility with multithreading and multiprocessing on a single machine. The book then demonstrates how to analyze data with Dask arrays and DataFrames. Later, you'll explore how to distribute Python and R code with Dask and build a machine learning model with Dask-ML. You'll also learn how to run a parameter search a hundred times faster than on a single machine and get to grips with the basics of RAPIDS. Finally, you'll set up Dask clusters on Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP).

By the end of this book, you will have learned how to use Dask for both research and production.

What you will learn

  • Distribute computation both locally and on a cluster
  • Scale and analyze machine learning algorithms on a cluster
  • Create and manage clusters on major cloud providers
  • Explore distributed computation and translate the usual pandas/scikit-learn workflow to Dask for analytics
  • Manage a massive amount of data effectively and keep cloud costs under control
  • Build a machine learning model step-by-step using Dask to process a huge amount of data

Who This Book Is For

This data analysis and machine learning book is for data scientists, ML engineers, and Python users who want to distribute their code using Dask. Beginner-level experience with Python, pandas, and NumPy will help you get the most out of this book.
