This book explains how to implement a data lake strategy, covering the technical and business
challenges architects commonly face. It also illustrates how and why client requirements should
drive architectural decisions. Drawing upon a specific case from his own experience, author
Nayanjyoti Paul begins with the consideration from which all subsequent decisions should flow:
what does your customer need? He also describes the importance of identifying key stakeholders
and the key points to focus on when starting a new project. Next, he takes you through the
business and technical requirement-gathering process and how to translate customer
expectations into tangible technical goals. From there, you'll gain insight into the security
model that will allow you to establish security and legal guardrails, as well as different
aspects of security from the end user's perspective. You'll learn which organizational roles
need to be onboarded into the data lake, their responsibilities, the services they need access
to, and how the hierarchy of escalations should work. Subsequent chapters explore how to divide
your data lake into zones, organize data for security and access, manage data sensitivity, and
apply techniques for data obfuscation. Audit and logging capabilities in the data lake are also
covered, before a deep dive into designing data lakes to handle multiple kinds of data, file formats,
and access patterns. The book concludes by focusing on production operationalization and
solutions to implement a production setup. After completing this book, you will understand how
to implement a data lake and the best practices to employ while doing so, and you will be armed with
practical tips to solve business problems.

What You Will Learn

Understand the challenges associated with implementing a data lake
Explore the architectural patterns and processes used to design a new data lake
Design and implement data lake capabilities
Associate business requirements with technical deliverables to drive success

Who This Book Is For

Data Scientists and Architects, Machine Learning Engineers, and Software Engineers