Inside ObEN Labs – Multilingual Speech and Singing Technology

Hi everybody, my name is Pierre. I’m a Speech Research Scientist at ObEN. We wanted to introduce you to some of our latest research on text-to-speech synthesis. The goal of the speech team is to build the voice of your PAI. That’s a challenging task because everybody’s voice is different, and the human voice can vary widely depending on factors such as gender, age, and language. So we’ve been developing AI systems that learn from a large data set of speakers how to reproduce the different kinds of variability in speech. That way, when we want to build your voice, we only need a few recordings of it, because the system already knows how to produce voices similar to yours.
And that’s not the only thing: during this adaptation process, we also transfer all the knowledge acquired from the data set to your PAI, so that your PAI doesn’t only speak like you but also gains skills learned from the data set, such as speaking in other languages.
Here we have an example of this. We have a sample of Adam’s voice, in which he speaks only in Chinese. (Adam’s sample voice speaks in Chinese) We’ve been using a system trained on both Chinese and English voices, so that even though Adam only provided samples in Chinese, his PAI can speak in both English and Chinese. (Adam’s PAI speaks in both English and Chinese) That can be useful, for instance, if you want your PAI to talk with people from other countries, or if you want your PAI to read virtual books or documents that include both Chinese and English words.
Here we also have another example, in which we improve Adam’s singing voice by using recordings of other speakers singing, so that his PAI can now sing with an improved voice compared to Adam’s own singing.
(Adam’s PAI sings in Chinese) That’s an exciting approach because, for instance, in the case where our data set is made up of the voices of users from the PAI community, each time a user from the community provides samples of their voice, the whole system would be updated with the skills learned from those samples. Those skills can then be transferred to everybody’s PAI, so that everybody’s PAI improves each time someone provides samples. That was a brief overview of our latest research, but there is a lot more to come, so we’ll keep you updated.

