Christian Vogler: Research

I am interested in recognition of gestures and sign languages, as well as tracking the human face and recognizing facial expressions.

American Sign Language recognition

ASL recognition in action

For my Ph.D. research I have worked on a framework for automatic recognition of American Sign Language (ASL). In many ways, this research area is similar to speech recognition (or speech-to-text) systems, in that the ultimate goal is to have a system that automatically converts ASL utterances from a signer to a textual representation.

Still, ASL recognition is much harder than speech recognition, because of the modeling and computational complexity of the task. Much of the complexity stems from the simultaneous events in signed languages: often multiple things happen at the same time during the execution of a sign. For instance, both the left and the right hand of the signer can move simultaneously, or the handshape can change at the same time as the hand moves from one location to another. In contrast, work on speech recognition has generally been able to represent speech, on an abstract level, in a purely sequential manner, such as a sequence of sounds.

My work focuses on modeling the language in a manner that tackles these complexity problems, and on developing appropriate recognition algorithms. In a nutshell, the approach consists of:

  1. Capturing ASL data with a MotionStar system and a Cyberglove.
  2. Breaking down the signs into their constituent phonemes - the basic building blocks of the language, such as handshapes, types of hand movement, and the body locations at which signs are executed.
  3. Assuming that simultaneous events take place in multiple channels that are independent from one another, so as to avoid a combinatorial explosion of simultaneous events.
  4. Training parallel hidden Markov models, one set per channel, and extending the classical Viterbi decoding algorithm to combine information from multiple channels (a sketch follows this list).
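
As a rough illustration of step 4, the following Python sketch sums per-channel log-probabilities inside a single Viterbi pass, under the assumption that the channels are fully independent. The channel names, the frame-level synchronization, and the shared state count are simplifying assumptions made for this example only; the actual algorithms are described in the publications.

    import numpy as np

    # Simplified sketch: Viterbi decoding over parallel, independent channels
    # (e.g. hand movement vs. handshape). Assumes discrete observations, that
    # all channels are synchronized at the frame level, and that they share
    # the same number of states -- simplifications made for illustration only.

    def viterbi_parallel(channels, observations):
        """channels: list of (log_pi, log_A, log_B) arrays, one triple per channel.
        observations: list of observation index sequences, one per channel, equal length.
        Returns the jointly most likely state sequence under channel independence."""
        n_states = channels[0][0].shape[0]
        T = len(observations[0])

        def joint_emission(t, state):
            # Independence assumption: the joint log-likelihood is the sum over channels.
            return sum(log_B[state, obs[t]]
                       for (_, _, log_B), obs in zip(channels, observations))

        def joint_transition(i, j):
            return sum(log_A[i, j] for _, log_A, _ in channels)

        delta = np.full((T, n_states), -np.inf)    # best log-score ending in each state
        back = np.zeros((T, n_states), dtype=int)  # backpointers for path recovery
        for s in range(n_states):
            delta[0, s] = sum(log_pi[s] for log_pi, _, _ in channels) + joint_emission(0, s)

        for t in range(1, T):
            for j in range(n_states):
                scores = [delta[t - 1, i] + joint_transition(i, j) for i in range(n_states)]
                back[t, j] = int(np.argmax(scores))
                delta[t, j] = scores[back[t, j]] + joint_emission(t, j)

        path = [int(np.argmax(delta[-1]))]
        for t in range(T - 1, 0, -1):
            path.append(back[t, path[-1]])
        return list(reversed(path))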

For more information, see my publications.

Deformable model tracking

Face tracking in action
The deformable model

More recently, I have also become interested in face tracking for the purposes of facial expression recognition. The face not only reflects a person's affect and emotions, but also carries a large part of the grammar of sign languages: negation and questions, for instance, are both expressed through the signer's face.

Before it is possible to run facial expression recognition algorithms, however, it is necessary to condense the information about the subject's face into a small set of features. Raw 2D images or videos of the subject's head contain far too much noisy information to serve as features. In addition, 2D images are not invariant with respect to scaling, rotation, and translation.

We use a 3D deformable model approach to track the face from video. The tracking results in a parameter vector describing the orientation and translation of the face, as well as various facial deformations, such as eyebrow raising, jaw opening, lip curving, and so on.
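
To make this concrete, here is a hedged sketch of how such a parameter vector could be applied to a neutral 3D face mesh, assuming a purely linear deformation basis. The variable names and the linear model are illustrative assumptions for this example only; the actual deformable model is described on the project page and in the publications.

    import numpy as np

    # Simplified sketch of a 3D deformable face model: a neutral mesh plus a
    # linear combination of deformation directions (e.g. eyebrow raising, jaw
    # opening), followed by a rigid rotation and translation. The real model
    # is not assumed to be this simple; this only illustrates the kind of
    # parameter vector the tracker estimates.

    def apply_parameters(neutral_vertices, deformation_basis, params):
        """neutral_vertices: (N, 3) neutral face mesh.
        deformation_basis: (K, N, 3) displacement field per deformation parameter.
        params: dict with 'rotation' (3x3), 'translation' (3,), 'deformations' (K,)."""
        # Non-rigid part: weighted sum of the deformation directions.
        deformed = neutral_vertices + np.tensordot(params["deformations"],
                                                   deformation_basis, axes=1)
        # Rigid part: rotate about the origin, then translate.
        return deformed @ params["rotation"].T + params["translation"]

In this sketch, a single entry of the 'deformations' vector would control, say, how far the eyebrows are raised, and the tracker's task is to estimate all of these parameters from the video, frame by frame.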

This project is being developed in close collaboration with Siome Goldenstein at UNICAMP, Brazil. For more information, see the project page, and the publications.