In its early years the field of computer vision was largely motivated by researchers seeking
computational models of biological vision and solutions to practical problems in manufacturing,
defense, and medicine. For the past two decades or so, there has been an increasing interest in
computer vision as an input modality in the context of human-computer interaction. Such
vision-based interaction can endow interactive systems with visual capabilities similar to
those important to human-human interaction in order to perceive non-verbal cues and
incorporate this information in applications such as interactive gaming, visualization, art
installations, intelligent agent interaction, and various kinds of command and control tasks.
Enabling this kind of rich visual and multimodal interaction requires interactive-time
solutions to problems such as detecting and recognizing faces and facial expressions,
determining a person's direction of gaze and focus of attention, tracking movement of the body,
and recognizing various kinds of gestures. In building technologies for vision-based
interaction, there are choices to be made as to the range of possible sensors employed (e.g.,
single camera, stereo rig, depth camera), the precision and granularity of the desired outputs,
the mobility of the solution, usability issues, etc. Practical considerations dictate that
there is not a one-size-fits-all solution to the variety of interaction scenarios; however,
there are principles and methodological approaches common to a wide range of problems in the
domain. While new sensors such as the Microsoft Kinect are having a major influence on the
research and practice of vision-based interaction in various settings, they are just a starting
point for continued progress in the area. In this book, we discuss the landscape of history,
opportunities, and challenges in this area of vision-based interaction; we review the
state of the art and seminal works in detecting and recognizing the human body and its
components; we explore both static and dynamic approaches to "looking at people" vision problems;
and we place the computer vision work in the context of other modalities and multimodal
applications. Readers should gain a thorough understanding of current and future possibilities
of computer vision technologies in the context of human-computer interaction.