From news and speeches to informal chatter on social media, natural language is one of the
richest and most underutilized sources of data. Not only does it come in a constant stream,
always changing and adapting in context, but it also contains information that is not conveyed
by traditional data sources. The key to unlocking natural language lies in the creative
application of text analytics. This practical book presents a data scientist's approach to
building language-aware products with applied machine learning. You'll learn robust, repeatable,
and scalable techniques for text analysis with Python, including contextual and linguistic
feature engineering, vectorization, classification, topic modeling, entity resolution, graph
analysis, and visual steering. By the end of the book, you'll be equipped with practical
methods to solve any number of complex real-world problems.

- Preprocess and vectorize text into high-dimensional feature representations
- Perform document classification and topic modeling
- Steer the model selection process with visual diagnostics
- Extract key phrases, named entities, and graph structures to reason about data in text
- Build a dialog framework to enable chatbots and language-driven interaction
- Use Spark to scale processing power and neural networks to scale model complexity