This book provides a complete and modern guide to web scraping using Python as the programming
language, without glossing over important details or best practices. Written with a data
science audience in mind, the book explores both scraping and the larger context of web
technologies in which it operates, to ensure full understanding. The authors recommend web
scraping as a powerful tool for any data scientist's arsenal, as many data science projects
start by obtaining an appropriate data set. Starting with a brief overview of scraping and
real-life use cases, the authors explore the core concepts of HTTP, HTML, and CSS to provide a
solid foundation. Along with a quick Python primer, they cover Selenium for JavaScript-heavy
sites and web crawling in detail. The book finishes with a recap of best practices and a
collection of examples that bring together everything you've learned and illustrate various
data science use cases.

What You'll Learn

- Leverage well-established best practices and commonly used Python packages
- Handle today's web, including JavaScript, cookies, and common web scraping mitigation
  techniques
- Understand the managerial and legal concerns regarding web scraping

Who This Book Is For

A data science-oriented audience that is probably already familiar with Python or another
programming language or analytical toolkit (R, SAS, SPSS, etc.). Students or instructors in
university courses may also benefit. Readers unfamiliar with Python will appreciate the quick
Python primer in chapter 1, which covers the basics and provides pointers to other guides.