TechNewSources: Google’s DeepMind AI fakes some of the most realistic human voices yet

Monday, September 12, 2016

Google’s DeepMind artificial intelligence has produced what could be some of the most realistic-sounding machine speech yet. WaveNet, as the system is called, generates voices by sampling real human speech and directly modeling audio waveforms based on it, as well as on the audio it has already generated. In Google’s tests, both English and Mandarin Chinese listeners found WaveNet more realistic than other types of text-to-speech programs, although it was less convincing than actual human speech. If that weren’t enough, it can also play the piano rather well.

Text-to-speech programs are increasingly important for computing, as people begin to rely on bots and AI personal assistants like Apple’s Siri, Microsoft’s Cortana, Amazon’s Alexa, and the Google Assistant. If you ask Siri or Cortana a question, though, they’ll reply with actual recordings of a human voice, rearranged and combined in small pieces. This is called concatenative text-to-speech, and as one expert puts it, it’s a little like a ransom note. The results are often fairly realistic, but as Google writes, producing a new audio persona or tone of voice requires having an actor record every possible sound in a database.
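To make the "ransom note" idea concrete, here is a toy sketch of concatenative synthesis in Python. It is only an illustration: the `UNIT_DB` dictionary and the `synthesize` function are hypothetical names, and plain strings stand in for recorded audio clips. It is not how Siri, Cortana, or any Google system is actually implemented.

```python
# Toy illustration of concatenative text-to-speech.
# UNIT_DB is a hypothetical stand-in for a database of recorded snippets;
# strings take the place of real audio clips.
UNIT_DB = {
    "hel": "<rec:hel>",
    "lo": "<rec:lo>",
    "world": "<rec:world>",
}


def synthesize(units):
    """Join pre-recorded snippets in order; fail if a unit was never recorded."""
    missing = [u for u in units if u not in UNIT_DB]
    if missing:
        raise KeyError(f"no recording for: {missing}")
    return "".join(UNIT_DB[u] for u in units)


print(synthesize(["hel", "lo", "world"]))
```

The error branch mirrors the limitation described above: a sound that was never recorded cannot be spoken, so every new voice or tone means filling the database again.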

http://www.theverge.com/2016/9/9/12860866/google-deepmind-wavenet-ai-text-to-speech-synthesis