Leverage machine and deep learning models to build applications on real-time data using
PySpark. This book is perfect for those who want to learn to use this framework to perform
exploratory data analysis and solve an array of business challenges. You'll start by reviewing
PySpark fundamentals such as Spark's core architecture and see how to use PySpark for big
data processing tasks such as data ingestion, cleaning, and transformation. This is followed
by building workflows for analyzing streaming data using PySpark and a comparison of various
streaming platforms. You'll then see how to schedule different Spark jobs using Airflow with
PySpark and examine how to tune machine and deep learning models for real-time predictions.
This book concludes with a discussion of GraphFrames and of performing network analysis using
graph algorithms in PySpark. All the code presented in the book will be available in Python
scripts on GitHub.

What You'll Learn

Develop pipelines for streaming data processing using PySpark
Build machine learning and deep learning models using PySpark's latest offerings
Perform graph analytics using PySpark
Create sequence embeddings from text data

Who This Book Is For

Data scientists and machine learning and deep learning engineers who want to learn and use
PySpark for real-time analysis of streaming data.