Face detection, because of its vast array of applications, is one of the most active research
areas in computer vision. In this book, we review various approaches to face detection
developed in the past decade, with more emphasis on boosting-based learning algorithms. We then
present a series of algorithms that are empowered by the statistical view of boosting and the
concept of multiple instance learning. We start by describing a boosting learning framework
that is capable of handling billions of training examples. It differs from traditional
bootstrapping schemes in that no intermediate thresholds need to be set during training, yet
the total number of negative examples used for feature selection remains constant and focused
(on the poorly performing ones). A multiple instance pruning scheme is then adopted to set the
intermediate thresholds after boosting learning. This algorithm generates detectors that are
both fast and accurate. We then present two multiple instance learning schemes for face
detection: multiple instance learning boosting (MILBoost) and winner-take-all multiple category
boosting (WTA-McBoost). MILBoost addresses the uncertainty in accurately pinpointing the
location of the object being detected, while WTA-McBoost addresses the uncertainty in
determining the most appropriate subcategory label for multiview object detection. Both schemes
can resolve the ambiguity of the labeling process and reduce outliers during training, which
leads to improved detector performance. In many applications, a detector trained with generic
data sets may not perform optimally in a new environment. We propose detector adaptation, which
is a promising solution for this problem. We present an adaptation scheme based on the Taylor
expansion of the boosting learning objective function, and we propose to store the second-order
statistics of the generic training data for future adaptation. We show that, with a small amount
of labeled data in the new environment, the detector's performance can be greatly improved. We
also present two interesting applications where boosting learning was applied successfully. The
first application is face verification for filtering and ranking image/video search results on
celebrities. We present boosted multi-task learning (MTL), yet another boosting learning
algorithm that extends MILBoost with a graphical model. Since the available number of training
images for each celebrity may be limited, learning individual classifiers for each person may
cause overfitting. MTL jointly learns classifiers for multiple people by sharing a few boosting
classifiers in order to avoid overfitting. The second application addresses the need for speaker
detection in conference rooms. The goal is to find who is speaking, given a microphone array
and a panoramic video of the room. We show that by combining audio and visual features in a
boosting framework, we can determine the speaker's position very accurately. Finally, we offer
our thoughts on future directions for face detection.

Table of Contents: A Brief Survey of the Face Detection Literature / Cascade-based Real-Time
Face Detection / Multiple Instance Learning for Face Detection / Detector Adaptation / Other
Applications / Conclusions and Future Work