Voice conversion (VC) is a technique for modifying nonlinguistic information such as voice characteristics while keeping linguistic information unchanged. In the traditional VC framework, a conversion model such as a Gaussian mixture model (GMM) is trained in advance using a parallel data set consisting of utterance pairs of source and target voices.
Although this framework works reasonably well, its reliance on parallel data for training imposes many limitations on VC applications. To address this problem, I propose two flexible VC frameworks: one-to-many VC and many-to-one VC. One-to-many VC allows conversion from a single source voice to an arbitrary target voice, and many-to-one VC allows conversion from an arbitrary source voice to a single target voice.
To realize these two frameworks, an eigenvoice technique, originally proposed as a speaker adaptation method for speech recognition, is applied to GMM-based VC. An eigenvoice GMM (EV-GMM) is trained in advance using multiple parallel data sets, and the desired conversion model is then obtained by adapting the EV-GMM to an arbitrary target voice in one-to-many VC or to an arbitrary source voice in many-to-one VC. Results of experimental evaluations demonstrate the effectiveness of the proposed VC frameworks.
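
As an illustration of the adaptation idea, the following is a minimal sketch, not the evaluated implementation. It assumes the commonly used eigenvoice parameterization, in which the adapted mean vector of each mixture component is a bias vector plus a weighted sum of basis vectors, with a single weight vector shared by all components. The function name, variable names, and toy values are hypothetical, and the estimation of the weights from adaptation data is not shown.

```python
def adapted_means(bias, basis, weights):
    """Build one adapted mean vector per mixture component.

    Assumed parameterization (not stated in the text above):
        mean_i = bias_i + basis_i @ weights

    bias[i]  : list of D floats, bias vector of mixture i
    basis[i] : D x J nested list, one basis vector per column
    weights  : list of J floats, shared by every mixture
    """
    means = []
    for b_i, B_i in zip(bias, basis):
        # Each output dimension adds the weighted sum of one basis row.
        mean = [
            b_d + sum(B_dj * w_j for B_dj, w_j in zip(row, weights))
            for b_d, row in zip(b_i, B_i)
        ]
        means.append(mean)
    return means


# Toy example: 2 mixtures, 2-dimensional features, 2 basis vectors.
bias = [[0.0, 1.0], [2.0, 3.0]]
basis = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[2.0, 0.0], [0.0, 2.0]],
]
weights = [0.5, -1.0]  # hypothetical weights for one adapted voice

means = adapted_means(bias, basis, weights)
```

Under this assumption, adapting to a new voice only requires estimating the small weight vector, since the bias and basis vectors stay fixed once the EV-GMM has been trained.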