Speech production has been modeled at the physical level as an accurately timed choreography performed by interacting articulators in the vocal tract, e.g., the tongue or the lips. Each of these articulators participates in the realization of gestures that may overlap in time and are directly responsible for the generation of particular phoneme sequences. This abstract view of a system as the composition of multiple interacting units (each with certain constraints and distinct behavioral characteristics that may also entrain with one another) has also been adopted in a completely different domain: the study of human-human dyads. Working towards a common goal, each participant assumes a certain role and tries to fulfill the personal subgoals involved. The multimodal behavior of the dyad reflects the realization of these efforts, as the participants are constrained by individual personality traits and adapt to the specifics of the interaction at each instant.
Adopting this system-based perspective (as opposed to a phenomenological approach), I will present a range of computational techniques to model and interpret the continuous multimodal observations in the two domains on the basis of the underlying, synchronously or asynchronously interacting processes. I will focus, in particular, on inversion and prototypical behavior estimation.
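To make concrete what "asynchronously interacting processes behind a shared observation stream" can mean computationally, the following is a minimal sketch (a toy construction for illustration only, not the models of the talk): two hidden Markov chains, each evolving with its own transition dynamics, coupled solely through a joint Gaussian emission over their product state. The forward algorithm on the product state space then scores a multimodal feature sequence; all dimensions, parameters, and names below are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    n1, n2, d = 3, 2, 2      # states of chain 1 and chain 2, feature dimension (toy values)

    def rand_stochastic(n):
        """Random row-stochastic transition matrix."""
        A = rng.random((n, n))
        return A / A.sum(axis=1, keepdims=True)

    A1, A2 = rand_stochastic(n1), rand_stochastic(n2)  # each chain keeps its own dynamics
    pi1 = np.full(n1, 1.0 / n1)                        # uniform initial distributions
    pi2 = np.full(n2, 1.0 / n2)
    means = rng.normal(size=(n1, n2, d))               # coupled emission: depends on BOTH states

    def emission(x):
        """Unit-variance Gaussian likelihood of x under every product state (n1 x n2)."""
        diff = means - x                               # broadcasts over (n1, n2, d)
        return np.exp(-0.5 * np.sum(diff ** 2, axis=-1)) / (2.0 * np.pi) ** (d / 2)

    def forward_loglik(X):
        """Forward algorithm on the product state space: the chains transition
        independently (allowing asynchrony), but every observation ties them
        together through the joint emission."""
        alpha = np.outer(pi1, pi2) * emission(X[0])
        loglik = np.log(alpha.sum())
        alpha /= alpha.sum()
        for x in X[1:]:
            # Factored transition: pred[k, l] = sum_{i, j} alpha[i, j] A1[i, k] A2[j, l]
            pred = A1.T @ alpha @ A2
            alpha = pred * emission(x)
            scale = alpha.sum()
            loglik += np.log(scale)
            alpha /= scale                             # rescale to avoid underflow
        return loglik

    X = rng.normal(size=(10, d))                       # toy "multimodal" feature sequence
    print(forward_loglik(X))

Factoring the transitions while tying the emission is one simple way to let the component processes evolve on their own schedules yet remain mutually constrained by the observations they jointly generate.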
Nassos Katsamanis received the Diploma in electrical and computer engineering (with highest honors) and the Ph.D. degree from the National Technical University of Athens, Athens, Greece, in 2003 and 2009, respectively. He is currently a Postdoctoral Research Associate at the School of Electrical Engineering of the University of Southern California and a member of the Signal Analysis and Interpretation Laboratory. His current research mainly lies in the areas of speech and multimodal signal analysis and processing, aiming at the broader goal of interpretation and modeling of human behavior from audiovisual observations. He is also strongly interested in, and has been conducting research on, image, acoustic, and articulatory data processing for speech production modeling. In the course of his Ph.D. studies and European and U.S. research projects, he has also worked on multimodal speech inversion, aeroacoustics for articulatory speech synthesis, speaker adaptation for non-native and children's speech recognition, and multimodal fusion for audiovisual speech and sign language recognition.