In this book, the authors first introduce the research issues by providing a motivating scenario,
followed by an exploration of the principles and techniques of the challenging topics. Then
they solve the raised research issues by developing a series of methodologies. More
specifically, the authors study query optimization and tackle query performance
prediction for knowledge retrieval. They also handle unstructured data processing and data
clustering for knowledge extraction. To optimize the queries issued through interfaces against
knowledge bases, the authors propose a cache-based optimization layer between consumers and the
querying interface to facilitate querying and address the latency issue. The cache depends on
a novel learning method that considers the querying patterns from individuals' historical
queries, without requiring knowledge of the backing systems of the knowledge base. To predict the
query performance for appropriate query scheduling, the authors examine the queries' structural
and syntactical features and apply multiple widely adopted prediction models. Their feature
modelling approach requires no prior knowledge of either the querying language or the
system. To extract knowledge from unstructured Web sources, the authors examine two kinds of Web
sources containing unstructured data: the source code from Web repositories and the posts in
programming question-answering communities. They use natural language processing techniques to
pre-process the source code and obtain the natural language elements. Then they apply
traditional knowledge extraction techniques to extract knowledge. For the data from programming
question-answering communities, the authors take a first step towards building a programming
knowledge base by starting with the paraphrase identification problem, and develop novel features
to accurately identify duplicate posts. For domain-specific knowledge extraction, the authors
propose to use a clustering technique to separate knowledge into different groups. They focus
on developing a new clustering algorithm that uses manifold constraints in the optimization
task and achieves fast and accurate performance. For each model and approach presented in this
book, the authors have conducted extensive experiments to evaluate it using either
public datasets or synthetic data they generated.