Description
Chapter Goal: Provide an overview of Apache Spark
No. of pages: 15
Sub-topics:
1. Overview & history
2. Spark concepts & architecture
3. Spark Unified Stack
4. Apache Spark applications

Chapter 2: Working with Apache Spark
Chapter Goal: Provide details about different ways of interacting with Apache Spark
No. of pages: 35
Sub-topics:
1. Downloading and installing Apache Spark
2. Exploring Apache Spark using Spark shells
3. Exploring Apache Spark using Databricks
4. Exploring Apache Spark source code

Chapter 3: Spark SQL - Foundation
Chapter Goal: Provide an overview of the Spark SQL component
No. of pages: 60
Sub-topics:
1. Overview & architecture
2. Introduction to DataFrames Structured APIs
3. Reading & writing data with Spark SQL data sources
4. Introduction to datasets

Chapter 4: Spark SQL - Advanced
Chapter Goal: Go over the advanced features in Spark SQL
No. of pages: 50
Sub-topics:
1. Working with aggregations
2. Joining data
3. Working with analytics functions
4. Exploring the Spark SQL Catalyst optimizer

Chapter 5: Optimizing Apache Spark Applications
Chapter Goal: Go over tips and techniques for dealing with performance issues
No. of pages: 30
Sub-topics:
1. Common performance issues
2. Speeding up performance by leveraging in-memory computation
3. Understanding the different supported joins in Spark
4. Leveraging the Spark UI to diagnose performance issues

Chapter 6: Structured Streaming - Foundation
Chapter Goal: Provide an overview of the Structured Streaming processing engine
No. of pages: 50
Sub-topics:
1. General streaming processing concepts
2. Structured Streaming programming model
3. Working with streaming data sources and sinks
4. Understanding output modes and triggers

Chapter 7: Structured Streaming - Advanced
Chapter Goal: Cover complex issues in streaming processing
No. of pages: 40
Sub-topics:
1. Streaming processing with event time
2. Stateful streaming processing
3. Handling duplicate data
4. Monitoring streaming processing applications

Chapter 8: Machine Learning with Apache Spark
Chapter Goal: Show how to develop machine learning applications using Spark MLlib
No. of pages: 60
Sub-topics:
1. Machine learning overview
2. Taking a tour of supported machine learning algorithms
3. Building machine learning pipelines
4. Machine learning tasks in action
5. Parameter tuning

Chapter 9: Machine Learning Application Development with MLflow
Chapter Goal: Use MLflow to manage the machine learning development lifecycle
No. of pages: 25
Sub-topics:
1. Overview of MLflow
2. Tracking machine learning development experiments
3. Managing & deploying machine learning models
4. Leveraging Spark for batch model predictions

Author: Hien Luu
Publisher: Apress
Published: 12/23/2021
Pages: 390
Binding Type: Paperback
Weight: 1.74 lbs
Size: 10.00h x 7.00w x 0.93d
ISBN13: 9781484273821
ISBN10: 1484273826
BISAC Categories:
- Computers | Information Theory
- Computers | Artificial Intelligence | General
About the Author
Hien Luu has extensive experience in designing and building big data applications and machine learning infrastructure. He is particularly passionate about the intersection between big data and machine learning. Hien enjoys working with open source software and has contributed to Apache Pig and Azkaban. Teaching is also one of his passions, and he serves as an instructor at the UCSC Silicon Valley Extension school teaching Apache Spark. He has given presentations at various conferences such as Data+AI Summit, MLOps World, QCon SF, QCon London, Hadoop Summit, and JavaOne.