A concise introduction to the emerging field of data science, explaining its evolution,
relation to machine learning, current uses, data infrastructure issues, and ethical challenges.
The goal of data science is to improve decision making through the analysis of data. Today data
science determines the ads we see online, the books and movies that are recommended to us
online, which emails are filtered into our spam folders, and even how much we pay for health
insurance. This volume in the MIT Press Essential Knowledge series offers a concise
introduction to the emerging field of data science, explaining its evolution, current uses,
data infrastructure issues, and ethical challenges. It has never been easier for organizations
to gather, store, and process data. Use of data science is driven by the rise of big data and
social media, the development of high-performance computing, and the emergence of such powerful
methods for data analysis and modeling as deep learning. Data science encompasses a set of
principles, problem definitions, algorithms, and processes for extracting non-obvious and
useful patterns from large datasets. It is closely related to the fields of data mining and
machine learning, but broader in scope. This book offers a brief history of the field,
introduces fundamental data concepts, and describes the stages in a data science project. It
considers data infrastructure and the challenges posed by integrating data from multiple
sources, introduces the basics of machine learning, and discusses how to link machine learning
expertise with real-world problems. The book also reviews ethical and legal issues,
developments in data regulation, and computational approaches to preserving privacy. Finally,
it considers the future impact of data science and offers principles for success in data
science projects.