Semi-supervised learning is a learning paradigm concerned with how computers and natural
systems, such as humans, learn in the presence of both labeled and unlabeled data.
Traditionally, learning has been studied either in the unsupervised paradigm (e.g., clustering,
outlier detection), where all the data are unlabeled, or in the supervised paradigm (e.g.,
classification, regression), where all the data are labeled. The goal of semi-supervised
learning is to understand how combining labeled and unlabeled data may change the learning
behavior, and to design algorithms that take advantage of such a combination. Semi-supervised
learning is of great interest in machine learning and data mining because it can use readily
available unlabeled data to improve supervised learning tasks when the labeled data are scarce
or expensive. Semi-supervised learning also shows potential as a quantitative tool to
understand human category learning, where most of the input is self-evidently unlabeled. In
this introductory book, we present some popular semi-supervised learning models, including
self-training, mixture models, co-training and multiview learning, graph-based methods, and
semi-supervised support vector machines. For each model, we discuss its basic mathematical
formulation. The success of semi-supervised learning depends critically on some underlying
assumptions. We emphasize the assumptions made by each model and give counterexamples when
appropriate to demonstrate the limitations of the different models. In addition, we discuss
semi-supervised learning for cognitive psychology. Finally, we give a computational learning
theoretic perspective on semi-supervised learning, and we conclude the book with a brief
discussion of open questions in the field.

Table of Contents: Introduction to Statistical Machine Learning / Overview of Semi-Supervised
Learning / Mixture Models and EM / Co-Training / Graph-Based Semi-Supervised Learning /
Semi-Supervised Support Vector Machines / Human Semi-Supervised Learning / Theory and Outlook