This talk will be structured in two parts. In the first half, I will provide an overview of the activities in my group, with the main emphasis on recent work conducted as part of FP6 EU projects, in particular the Integrated Project CHIL ("Computers in the Human Interaction Loop"). CHIL is a technology-driven project that aims to develop robust audio-visual technologies for the perception of human interaction during meetings and lectures inside smart rooms.
The second part of the talk will delve more deeply into a specific class of audio-visual perceptual technologies, namely audio-visual speech processing, with emphasis on automatic bimodal speech recognition. This line of work aims to exploit visual speech information to improve speech recognition robustness in noisy environments, in a process akin to human lipreading. I will discuss my work in this field in detail, with emphasis on visual feature extraction in realistic environments and on ongoing research in audio-visual fusion.