Speech Processing for Voice Conversion & Intelligibility Enhancement
Speaker: Elizabeth Godoy
Date: 08 August 2013 Time: 11:00 - 12:30
Location: "Alkiviades C. Payatakes" Seminar Room, FORTH, Heraklion, Crete
Host: Athanasios Mouchtaris


With growing numbers of applications using speech technologies, speech processing that aims to make these technologies more effective is in high demand. This presentation addresses two such areas of speech processing, focusing mainly on Voice Conversion (VC) and then briefly discussing intelligibility enhancement. First, in an effort to help personalize speech technologies, VC aims to transform the speech of a source speaker towards that of a different target speaker. While standard approaches to spectral envelope transformation for VC use Gaussian Mixture Models (GMMs) that exploit joint statistics of time-aligned source and target frames, the resulting speech suffers from "over-smoothing" and sounds "muffled." An alternative to GMM-based spectral envelope transformation is presented here that maps source and target features on an acoustic class level, incidentally reducing existing restrictions on using parallel corpora in VC. Specifically, the proposed Dynamic Frequency Warping with Amplitude scaling (DFWA) approach is described and shown to outperform the GMM-based standard, ultimately yielding high-quality VC. Second, in order to increase the effectiveness of speech technologies for listeners in adverse environments, speech processing for intelligibility enhancement is instrumental. To this end, the approach adopted here is to use acoustic analyses of intelligible human speaking styles to inspire enhancement modifications. In particular, the spectral boosting and vowel space expansion observed in Lombard and Clear speech, respectively, are mimicked via signal processing techniques, specifically shaping filters and frequency warping. Evaluations then indicate the intelligibility impact of the proposed modifications and offer insights into the speaking styles and their respective acoustic characteristics.


Dr. Godoy's background is in signal processing, with an emphasis on speech and a passion for conducting research in this field. She received her B.S. (2006) and M.Eng. (2007) degrees in electrical engineering from the Massachusetts Institute of Technology. She received her Ph.D. degree in signal processing and communications from Telecom Bretagne in 2011. Her doctoral research focused on spectral envelope transformation for voice conversion and was carried out as part of the speech synthesis team at Orange Labs R&D in Lannion, France. From 2012 to 2013, Dr. Godoy worked on speech analyses and modifications for intelligibility enhancement as a member of the EU Listening Talker (LISTA) project at the Foundation for Research and Technology Hellas-Institute of Computer Science (FORTH-ICS) in Crete, Greece. Her current research interests are in signal processing for speech analysis, synthesis and transformation.
