Recently, Google’s DeepMind unit has been working hard in pursuit of super-intelligent computers. Now it has announced technology that it says surpasses existing systems: a machine-generated speech system called WaveNet, which commentators say will be the world’s most realistic. The technology samples real human voices and then generates new voices based on them by modeling the speech waveforms directly, which is why the results sound realistic.
DeepMind is a UK-based company that Google paid around 400 million pounds ($533 million) to acquire. WaveNet is an artificial intelligence technology that mimics human voices more accurately than ever before. In a blog post published on Friday, the company said that the technology examines the sound waves created by human speech. The company conducted blind tests for Mandarin Chinese and U.S. English. Human listeners rated the resulting speech as more impressive than that of any of Google’s existing text-to-speech programs, which operate on different technologies.
Computing benefits from text-to-speech programs
Computing has increasingly benefited from text-to-speech programs. Today, more people rely on robots and digital personal assistants such as Microsoft’s Cortana, Apple’s Siri, Google Assistant and Amazon’s Alexa. If you ask Cortana or Siri a question, these tools respond with actual human voice recordings that have been cut into small pieces, rearranged and merged. One digital expert calls this technology concatenative and compares it to a ransom letter assembled from clippings. While the results are quite realistic, Google states that creating a new tone of voice or audio persona requires a voice actor to record all possible sounds again.
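The ransom-letter idea can be illustrated with a toy sketch. This is not how Siri, Cortana or any Google product is implemented; the fragment names, the sample values and the function `concatenate_units` are all hypothetical, chosen only to show pre-recorded pieces being stitched together in order.

```python
# Hypothetical inventory: each speech fragment maps to a few audio samples.
# A real system would hold recorded waveform snippets from a voice actor.
UNIT_INVENTORY = {
    "he": [0.1, 0.3, 0.2],
    "llo": [0.4, 0.1, -0.2],
    "wor": [-0.1, 0.2, 0.5],
    "ld": [0.3, -0.3, 0.0],
}


def concatenate_units(units, inventory=UNIT_INVENTORY):
    """Stitch pre-recorded fragments together in the order given."""
    waveform = []
    for unit in units:
        if unit not in inventory:
            # Nothing was recorded for this sound, so it cannot be spoken.
            raise KeyError(f"no recording for unit {unit!r}")
        waveform.extend(inventory[unit])
    return waveform


hello = concatenate_units(["he", "llo"])
```

The lookup failure for a missing fragment mirrors the limitation described above: a sound that was never recorded cannot be produced, and a new voice would need a whole new inventory.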
Like many computer speech-generation programs, WaveNet uses large data sets of recordings by one human speaker. Conventional concatenative systems create new words by combining various speech fragments, and anyone listening hears intelligible, human-sounding speech. The challenge is that the sound of the voice cannot easily be modified. Other systems create voices electronically, based on rules for the proper pronunciation of certain letters. While these let engineers manipulate the sound of a voice, they usually sound unnatural compared to programs that use recordings of human speakers.
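The trade-off with electronically generated voices can be shown with an equally small sketch. Here a plain sine tone stands in for a voice, which is a deliberate oversimplification; the function `synthesize_tone` and its parameters are hypothetical and not taken from any real speech system.

```python
import math


def synthesize_tone(pitch_hz, duration_s=0.01, sample_rate=16000):
    """Generate samples from a formula instead of from recordings."""
    n_samples = round(duration_s * sample_rate)
    return [
        math.sin(2 * math.pi * pitch_hz * i / sample_rate)
        for i in range(n_samples)
    ]


# Changing one parameter changes the sound; no new recording is needed.
low_voice = synthesize_tone(110.0)
high_voice = synthesize_tone(220.0)
```

Because the sound comes from a rule rather than a recording, it is easy to manipulate, but a pure tone like this is also a reminder of why such output tends to sound unnatural.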
DeepMind technology cuts power demands and saves resources
Google has also revealed commercial successes from DeepMind’s technology. The search giant says that it has applied DeepMind’s program to cut power demands at its data centers by a phenomenal 40%. In the process, it has saved enough resources to justify its expensive acquisition of the London-based outfit. It has also reported improvements to services offered by Google Play, YouTube and some of its advertising businesses.
Following the breakthrough, other tech companies are likely to pay closer attention. From car manufacturers to mobile phone developers, speech is increasingly important because it helps humans and machines interact. When you look at Apple, Amazon, Microsoft, Alphabet and Google, their investment in personal assistants is evident.
Last week, an international director at Google Play (the unit that deals with Android apps) said that 20% of mobile searches on Google are made by voice rather than typed text. Although technology giants have succeeded in making computers understand human speech and language, they have been less successful at making those computers talk back to users in a human way.
Google’s DeepMind to incorporate emotions and accents
Interestingly, DeepMind’s engineers say that they will incorporate accents and emotions as inputs to increase the authenticity of speech sounds. DeepMind researchers have also noted that enabling people to converse with computers has been a longstanding dream. With that dream moving closer to reality, many people are greeting the news with applause. It remains to be seen how the company’s competitors will develop their own programs in pursuit of excellence in speech technology.