As online information grows dramatically, search engines such as Google are playing an increasingly
important role in our lives. Critical to all search engines is the problem of designing an
effective retrieval model that can rank documents accurately for a given query. This has been a
central research problem in information retrieval for several decades. In the past ten years, a
new generation of retrieval models, often referred to as statistical language models, has been
successfully applied to solve many different information retrieval problems. Compared with
traditional models such as the vector space model, these new models have a sounder
statistical foundation and can leverage statistical estimation to optimize retrieval
parameters. They can also be more easily adapted to model non-traditional and complex retrieval
problems. Empirically, they tend to achieve comparable or better performance than a traditional
model with less effort on parameter tuning. This book systematically reviews the large body of
literature on applying statistical language models to information retrieval, with an emphasis on
the underlying principles, empirically effective language models, and language models developed
for non-traditional retrieval tasks. All the relevant literature has been synthesized to make
it easy for a reader to digest the research progress achieved so far and see the frontier of
research in this area. The book also offers practitioners an informative introduction to a set
of practically useful language models that can effectively solve a variety of retrieval
problems. No prior knowledge of information retrieval is required, but some basic knowledge
of probability and statistics would be useful for fully digesting all the details.

Table of Contents: Introduction / Overview of Information Retrieval Models / Simple Query Likelihood
Retrieval Model / Complex Query Likelihood Model / Probabilistic Distance Retrieval Model /
Language Models for Special Retrieval Tasks / Language Models for Latent Topic Analysis /
Conclusions