Lecture
Date: 04 May 2012 Time: 14:00-16:00
Location: "Stelios Orphanoudakis" Seminar Room, FORTH, Heraklion, Crete.
Host: Athanasios Mouchtaris
Abstract:
Speech production has been modeled at the physical level as an accurately
timed choreography performed by interacting articulators in the vocal tract, e.g.,
the tongue or the lips. Each articulator participates in the realization of gestures
that may overlap in time and are directly responsible for generating particular
phoneme sequences. This abstract view of a system as the composition of
multiple interacting units, each with its own constraints and behavioral
characteristics that may also entrain with one another, has also been adopted in
a completely different domain: the study of human-human dyads. In working
toward a common goal, each participant assumes a certain role and tries to
fulfill the personal subgoals involved. The multimodal behavior of the dyad
reflects the realization of these efforts, as the participants are constrained
by individual personality traits and adapt to the specifics of the interaction at
each instant.
Adopting this system-based perspective (as opposed to a phenomenological
approach), I will present a range of computational techniques for modeling and
interpreting the continuous multimodal observations in the two domains on the
basis of the underlying, synchronously or asynchronously interacting processes.
I will focus on key subproblems, including inversion and prototypical behavior
estimation.
Bio:
Nassos Katsamanis received the Diploma in electrical and computer engineering (with highest honors) and the Ph.D. degree from the National Technical University of Athens, Athens, Greece, in 2003 and 2009, respectively. He is currently a Postdoctoral Research Associate at the School of Electrical Engineering of the University of Southern California and a member of the Signal Analysis and Interpretation Laboratory. His current research mainly lies in the areas of speech and multimodal signal analysis and processing, aimed at the broader goal of interpreting and modeling human behavior from audiovisual observations. He is also strongly interested in, and has conducted research on, image, acoustic, and articulatory data processing for speech production modeling. In the course of his Ph.D. studies and of European and U.S. research projects, he has also worked on multimodal speech inversion, aeroacoustics for articulatory speech synthesis, speaker adaptation for non-native and children's speech recognition, and multimodal fusion for audiovisual speech and sign language recognition.