John Legend’s Voice Makes its Debut on Google Home, But it’s a Recording, Not Generative Speech Synthesis
A Reddit user recently noticed the sound of John Legend’s voice emanating from Google Home after saying, “Thank you, Google.” The legendary response was, “You’re very, very welcome.” However, the introduction of John Legend into the Google Home and Google Assistant experience should be viewed as a recording, not a synthesized version of his voice. Android Police conveniently captured the exchange in a YouTube video.
It’s a Legendary Easter Egg. That’s All Folks.
If you want to try it out, you can say, “Thank you, Google” or “Thank you, John Legend.” It works both with Google Home devices and with Google Assistant on a smartphone. There is no setting in the Google Assistant app where you select voices to make this occur or choose John Legend as your default “voice” for other interactions. A recording of Mr. Legend is the response when you say something specific. Think of it as an Easter egg. It also appeared without an app update, so you can bet this is simply an audio file delivered from the cloud, without alteration, when a trigger phrase is heard.
There will likely be more of these canned responses in the future. Sundar Pichai announced at Google I/O in 2018 that users would hear Legend saying different things “in certain contexts.” Unpack that and you get a sprinkling of John Legend statements based on recordings.
You Need a Generative Speech Synthesis Engine to Deliver a Voice
When most people think about an audible voice coming out of a smart speaker or smartphone, they are thinking about Alexa or Siri or one of the “colors” assigned by Google Assistant. These “voices” can say just about anything, and there is no need to record every possible response required to interact with a user. By breaking speech down into its component sounds, a generative speech synthesis engine can combine them into words and phrases that were never recorded. That is similar to how human speech works. We don’t memorize how to say every potential word or phrase. We learn to make sounds and to put them together to form words, phrases, and sentences.
Literate humans can very often read an unfamiliar word and pronounce it correctly because they know what sounds are associated with various letter combinations. WaveNet, the speech synthesis technology behind Google Assistant, Google Duplex and other voice technology services, mimics this approach when responding to user queries. It can generate an effectively unlimited variety of speech because it has learned from recorded speech how to form new word sounds when interpreting text.
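To make the distinction concrete, here is a toy sketch in Python. It is only an illustration under loud assumptions: the file name, the trigger table, and the tiny letter-to-sound rules are all made up, and none of this reflects how Google Assistant or WaveNet is actually built. It simply contrasts a canned response, which can only play back what was recorded, with a generative approach, which assembles output for any text from smaller sound units.

```python
from typing import List, Optional

# Canned responses: a fixed phrase maps to one pre-recorded clip.
# The file name below is hypothetical.
CANNED_RESPONSES = {
    "thank you, google": "legend_youre_welcome.mp3",
    "thank you, john legend": "legend_youre_welcome.mp3",
}

# A made-up, drastically simplified set of letter pairs that map to one sound.
DIGRAPHS = {"th": "TH", "sh": "SH", "ch": "CH", "ng": "NG"}


def canned_reply(utterance: str) -> Optional[str]:
    """Return the recorded clip for a known trigger phrase, else None."""
    return CANNED_RESPONSES.get(utterance.strip().lower())


def synthesize(text: str) -> List[str]:
    """Build a sequence of sound-unit labels for arbitrary text.

    Letter pairs listed in DIGRAPHS become a single unit; every other
    letter becomes its own unit; anything that is not a letter is skipped.
    """
    units: List[str] = []
    lowered = text.lower()
    i = 0
    while i < len(lowered):
        pair = lowered[i:i + 2]
        if pair in DIGRAPHS:
            units.append(DIGRAPHS[pair])
            i += 2
        elif lowered[i].isalpha():
            units.append(lowered[i].upper())
            i += 1
        else:
            i += 1
    return units


if __name__ == "__main__":
    print(canned_reply("Thank you, Google"))   # a recording exists for this phrase
    print(canned_reply("What time is it?"))    # no recording, so nothing to play
    print(synthesize("What time is it?"))      # units can be assembled for any text
```

The canned lookup has nothing to offer for a phrase nobody recorded, while the toy synthesizer produces something for any input. A real engine generates audio rather than labels, but the limitation of the recording-only approach is the same.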
Many people are excited about John Legend’s voice in Google Home, but it is not speech synthesis. It is just a recording, so the variety of speech will be limited to what Mr. Legend recorded. Google has the capability to mimic John Legend’s vocal imprint and speaking style and could likely make a reasonable facsimile that users would accept as the famous recording artist speaking with them. However, Google would need additional recorded speech from Legend, and he would have to agree to let his vocal facsimile say just about anything. That is a risk. Everyone who has seen South Park knows that you can make voice assistants say just about anything, and we could soon be awash with John Legend’s voice making statements the artist never intended and would never endorse.
So, feel free to ignore the hype. It’s just a recording.