This work focuses on the recognition of complex human activities in video data. A combination of
new features and techniques from speech recognition is used to recognize action units and
their combinations in video sequences. The presented approach shows how motion
information extracted from video data can be used to interpret the underlying structure
of actions, and how higher-level models allow an abstraction of different motion
categories beyond simple classification.