Analyze vast amounts of data in record time using Apache Spark with Databricks in the Cloud.
Learn the fundamentals, and more, of running analytics on large clusters in Azure and AWS,
using Apache Spark with Databricks on top. Discover how to squeeze the most value out of your
data at a mere fraction of what classical analytics solutions cost, while at the same time
getting the results you need incrementally faster. This book explains how the confluence of
these pivotal technologies gives you enormous power at low cost when it comes to huge
datasets. You will begin by learning how cloud infrastructure makes it possible to scale your
code to a large number of processing units without having to pay for the machinery in advance.
From there, you will learn how Apache Spark, an open source framework, can put all those CPUs
to use for data analytics. Finally, you will see how services such as Databricks provide the
power of Apache Spark without you having to know anything about configuring hardware or
software. By removing the need for expensive experts and hardware, your resources can instead be allocated
to actually finding business value in the data. This book guides you through some advanced
topics such as analytics in the cloud, data lakes, data ingestion, architecture, machine
learning, and tools including Apache Spark, Apache Hadoop, Apache Hive, Python, and SQL.
Valuable exercises help reinforce what you have learned.

What You Will Learn

Discover the value of big data analytics that leverage the power of the cloud
Get started with Databricks using SQL and Python in either Microsoft Azure or AWS
Understand the underlying technology and how the cloud and Apache Spark fit into the bigger picture
See how these tools are used in the real world
Run basic analytics, including machine learning, on billions of rows at a fraction of the cost or for free

Who This Book Is For

Data engineers, data scientists, and cloud architects who want or need to run advanced
analytics in the cloud. It is assumed that the reader has data experience, but perhaps minimal
exposure to Apache Spark and Azure Databricks. The book is also recommended for people who
want to get started in the analytics field, as it provides a strong foundation.