Apply supervised and unsupervised learning to solve practical and real-world big data problems.
This book teaches you how to engineer features, optimize hyperparameters, train and test models,
develop pipelines, and automate the machine learning (ML) process. The book covers an in-memory,
distributed cluster computing framework known as PySpark; machine learning framework platforms
known as scikit-learn, PySpark MLlib, H2O, and XGBoost; and a deep learning (DL) framework
known as Keras. The book starts off presenting supervised and unsupervised ML and DL models,
and then it examines big data frameworks along with ML and DL frameworks. Author Tshepo Chris
Nokeri considers a parametric model known as the Generalized Linear Model and a survival
regression model known as the Cox Proportional Hazards model, along with Accelerated Failure
Time (AFT). Also presented are a binary classification model (logistic regression) and an
ensemble model (Gradient Boosted Trees). The book introduces DL and an artificial neural
network known as the Multilayer Perceptron (MLP) classifier. Cluster analysis using the
K-Means model is also covered. Dimension reduction techniques such as Principal
Components Analysis and Linear Discriminant Analysis are explored. Finally, automated machine
learning is unpacked. This book is for intermediate-level data scientists and machine learning
engineers who want to learn how to apply key big data frameworks and ML and DL frameworks. You
will need prior knowledge of the basics of statistics, Python programming, probability theories,
and predictive analytics.

What You Will Learn

Understand widespread supervised and unsupervised learning, including key dimension reduction techniques
Know the big data analytics layers, such as data visualization, advanced statistics, predictive analytics, machine learning, and deep learning
Integrate big data frameworks with a hybrid of machine learning frameworks and deep learning frameworks
Design, build, test, and validate machine learning models and deep learning models
Optimize model performance using data transformation, regularization, outlier remedying, hyperparameter optimization, and data split ratio alteration

Who This Book Is For

Data scientists and machine learning engineers with basic knowledge and understanding of Python
programming, probability theories, and predictive analytics