
Next-Generation Big Data
A Practical Guide to Apache Kudu, Impala, and Spark
By: Butch Quinto
Paperback | 13 June 2018
At a Glance
584 Pages
25 x 17.5 x 3 cm
Paperback
RRP $89.99
$35.00
61%OFF
Utilize this practical and easy-to-follow guide to modernize traditional enterprise data warehouse and business intelligence environments with next-generation big data technologies.
Next-Generation Big Data takes a holistic approach, covering the most important aspects of modern enterprise big data. The book covers not only the main technology stack but also the next-generation tools and applications used for big data warehousing, data warehouse optimization, real-time and batch data ingestion and processing, real-time data visualization, big data governance, data wrangling, big data cloud deployments, and distributed in-memory big data computing. Finally, the book offers extensive and detailed coverage of big data case studies from Navistar, Cerner, British Telecom, Shopzilla, Thomson Reuters, and Mastercard.
What You'll Learn
- Install Apache Kudu, Impala, and Spark to modernize enterprise data warehouse and business intelligence environments, complete with real-world, easy-to-follow examples, and practical advice
- Integrate HBase, Solr, Oracle, SQL Server, MySQL, Flume, Kafka, HDFS, and Amazon S3 with Apache Kudu, Impala, and Spark
- Use StreamSets, Talend, Pentaho, and CDAP for real-time and batch data ingestion and processing
- Utilize Trifacta, Alteryx, and Datameer for data wrangling and interactive data processing
- Turbocharge Spark with Alluxio, a distributed in-memory storage platform
- Deploy big data in the cloud using Cloudera Director
- Perform real-time data visualization and time series analysis using Zoomdata, Apache Kudu, Impala, and Spark
- Understand enterprise big data topics such as big data governance, metadata management, data lineage, impact analysis, and policy enforcement, and how to use Cloudera Navigator to perform common data governance tasks
- Implement big data use cases such as big data warehousing, data warehouse optimization, Internet of Things, real-time data ingestion and analytics, complex event processing, and scalable predictive modeling
- Study real-world big data case studies from innovative companies, including Navistar, Cerner, British Telecom, Shopzilla, Thomson Reuters, and Mastercard
Who This Book Is For
BI and big data warehouse professionals interested in gaining practical and real-world insight into next-generation big data processing and analytics using Apache Kudu, Impala, and Spark; and those who want to learn more about other advanced enterprise topics
Industry Reviews
* The Current Big Data Landscape
* Big Data in the Enterprise
Chapter 2.- Introduction to Kudu
* Introduction to Kudu
* Kudu History
* Kudu-Impala Integration
* Concepts and Terms
* Architectural Overview
* Use Cases
* Getting Started with Kudu
* Installation Guide
* Configuring Kudu
* Administering Kudu
* Troubleshooting Kudu
* Developing Applications with Kudu
* Kudu Schema Design
* Kudu Security
* Kudu Transaction Semantics
* Background Maintenance Tasks
* Kudu Configuration Reference
* Kudu Command Line Tools Reference
* Known Issues and Limitations
Chapter 3.- Introduction to Impala
* Architecture
* Parquet File Format
* The Impala Shell
* Impala Benchmarks
* Impala SQL
* Impala Functions
o Math Functions
o String Functions
o Date and Time Functions
o Analytics Functions
* Impala User Defined Functions
Chapter 4.- High Performance Data Analysis with Impala and Kudu
* Impala and Kudu Integration
* Impala and Kudu vs Relational Data Warehouse
* Impala and Kudu Schema Design
o Data Types
o Partitioning
* Impala and Kudu Monitoring
* Impala and Kudu Performance Tuning
* Impala and Kudu Troubleshooting
* ODBC/JDBC
o Linked Server from SQL Server
o Oracle DB Link
o BI Applications
o PHP ODBC
o Java JDBC
Chapter 5.- Introduction to Spark
* Introduction
* Introduction to Functional Programming
* Introduction to Scala
* Spark Architecture
* Spark Core
* Spark SQL
* Spark Streaming
* Spark MLlib
* Spark GraphX
Chapter 6.- High-Performance Data Processing with Spark and Kudu
* Kudu and Spark Integration
o Spark and the Kudu context
o Spark and Kudu Examples
* Spark and Kudu in the Enterprise
o CSV, JSON and XML to Kudu
o Oracle and Kudu
o SQL Server and Kudu
o MySQL and Kudu
o HBase and Kudu
o Solr and Kudu
o Parquet and ORC and Kudu
o Amazon S3 and Kudu
o Spark Streaming with Kudu
* Spark and Kudu Monitoring
* Spark and Kudu Performance Tuning
* Spark and Kudu Troubleshooting
Chapter 7.- Batch and Real-time Data Ingestion and Processing
* Introduction to Batch Data Ingestion
* Introduction to Real-time Data Ingestion
* StreamSets
* NiFi
* Cask CDAP
* Talend
* Pentaho
* Other Players
o Informatica PowerCenter
o IBM DataStage
o SQL Server Integration Services
o Oracle Data Integrator
o Syncsort
o SnapLogic
* Native Tools
o Kafka
o Sqoop
o HDFS file commands
o Spark JDBC
o Kudu Java/C++ API
Chapter 8.- Big Data Warehousing and Business Intelligence
* Introduction to Data Warehousing and Business Intelligence
* Data Warehousing and Business Intelligence in the age of Big Data
* EDW Optimization
o ETL Offloading
o Active Archiving
o Data Consolidation
o ODS Replacement
o Data Mart Replacement
o Data Warehouse Replacement
* Data Warehousing
o Star Schema
o Snowflake Schema
* Microsoft SQL Server 2016 Integration
o SQL Server Analysis Services
o SQL Server Reporting Services
o SQL Server Integration Services
o SQL Server PolyBase
o SQL Server Linked Server
* Oracle 12c
o Oracle Gateway - show example
o JDBC - show example
* OBIEE - describe
Chapter 9.- Self-Service Big Data Analysis and Wrangling
* Introduction
* Zoomdata
* Tableau
* Qlik
* Power BI
* Datameer
* Trifacta
* Alteryx
* AtScale
* Hue
* Ambari Views
* Data Science Workbench
* Jupyter
* Apache Zeppelin
Chapter 10.- Distributed Big Data In-Memory Computing
* Introduction
* Alluxio
* Ignite
* Geode
* MemSQL
Chapter 11.- Big Data Governance and Management
* Cloudera Navigator
* Apache Atlas
* Informatica Metadata Manager
* Collibra
* Smartlogic
Chapter 12.- Big Data in the Cloud
* Cloudera Director
* AWS
* Azure
* Cloudera Altus
* EMR
* Azure
Chapter 13.- Big Data Use Cases
* Data Warehousing
* ETL Offloading
* Data Consolidation
* Data Archiving
* Internet of Things Platform
* Cybersecurity
* Fraud Detection
* Audit and Reporting Platform
Chapter 14.- Big Data Case Studies
* AMD - Data Warehousing
* British Telecom - Data Consolidation
* Mastercard - Anti-fraud, Advanced Search
* Cerner - Sepsis Detection
* Navistar - IoT
* Shopzilla - ETL Offloading and Data Science
* Caesars Entertainment - Customer 360
* Wargaming - Machine Learning, Recommendation Engine
ISBN: 9781484231463
ISBN-10: 1484231465
Published: 13th June 2018
Format: Paperback
Language: English
Number of Pages: 584
Audience: General Adult
Publisher: Springer Nature B.V.
Country of Publication: GB
Dimensions (cm): 25 x 17.5 x 3
Weight (kg): 1.11