Build an enterprise search engine using Apache Solr: index and search documents; ingest data
from varied sources; apply various text processing techniques; utilize different search
capabilities; and customize Solr to retrieve the desired results. Apache Solr: A Practical
Approach to Enterprise Search explains each essential concept, backed by practical and industry
examples, to help you attain expert-level knowledge. The book, which assumes a basic knowledge
of Java, starts with an introduction to Solr, followed by steps for setting it up, indexing your
first set of documents, and searching them. It then introduces you to information retrieval and
its implementation in Apache Solr. This will help you understand your search problem, decide
the approach to build an effective solution, and use various metrics to evaluate the results.
The book next covers schema design and techniques to build a text analysis chain for
cleansing, normalizing, and enriching your documents and addressing different types of search
queries. It describes various popular matching techniques, which are generally applied to
improve the precision and recall of searches. You will learn the end-to-end process of data
ingestion from varied sources, metadata extraction, pre-processing and transformation of
content, various search components, query parsers, and other advanced search capabilities. After
covering out-of-the-box features, Solr expert Dikshant Shahi dives into ways you can customize
Solr for your business and its specific requirements, along with ways to plug in your own
components. Most important, you will learn about implementations for Solr scoring, factors
affecting the document score, and tuning the score for the application at hand. The book
explains why textual scoring is not sufficient for practical ranking of documents and shows ways
to integrate real-world factors that contribute to the document ranking. You'll see how to
influence user experience by providing suggestions and recommendations. You'll also see
integration of Solr with important related technologies such as OpenNLP and Tika. Additionally,
you will learn about scaling Solr using SolrCloud. This book concludes with coverage of
semantic search capabilities, which are crucial for taking the search experience to the next
level. By the end of Apache Solr, you will be proficient in designing and developing your
search engine.