Google Duplex Shows the Way to a True Virtual Assistant
We are accustomed to thinking about voice assistants as helpful in fetching information, opening apps and executing simple commands. Today, Google CEO Sundar Pichai demonstrated how a true virtual assistant can be granted agency by the user and execute complex tasks on our behalf. A statement by Google clarified the scope and timeline for Google Duplex.
“This summer, we’ll start testing a new capability within the Google Assistant to help you make restaurant reservations, schedule hair salon appointments, and get holiday hours, over the phone. Powered by a new technology we call Google Duplex, the Assistant can call businesses on your behalf and understand complex sentences, fast speech, and long remarks, so it can get tasks done through a phone conversation.”
The demonstration moderated by Mr. Pichai involved Google Assistant being asked to set an appointment for a haircut between 10 am and 12 noon on a specific day. Google Assistant called the preferred hair salon and negotiated an appropriate time for the appointment.
A lot happened in that interaction. First, Google Assistant was granted agency to complete a task that had some bounded requirements. Second, Google Assistant made a call to a salon and, when a person answered, asked to set an appointment for a “client.” A specific time was requested, and when that time was not available, the receptionist suggested an alternate that fell outside the time window of the original request. Google Assistant then asked for a start time between 10 am and 12 noon, and an appointment was set for 10 am. The Assistant then confirmed the time and created a calendar entry for the user.
That interaction took Google Assistant about 1 minute to complete after the phone was answered. However, it could have taken much longer if the caller had been placed on hold or the receptionist had not been able to handle the request. The user saved at least one minute because the command to Google Assistant took only 1-2 seconds and carried zero cognitive load. Mr. Pichai commented:
“It will save time for people and develop a lot of value for businesses… A common theme across all of these demonstrations is that we are working hard to give users back time.”
This feature represents the next stage in assistant development. It moves virtual assistants out into the world beyond our living room or smartphone screen. Busy people have often wondered what it would be like to be in two places at once and how that could make them more productive. Features like Google Duplex enable Google Assistant to execute tasks on a user’s behalf, so it is like having the productivity of a personal assistant and being able to do two things at once.
This is precisely the value proposition that startup John Done hopes to deliver. In 2017, the company’s co-founder and CEO, Jeff Smith, demonstrated a virtual assistant calling florists in mid-town Manhattan to check on the availability of a certain type of floral arrangement. The results were then reported back to the user, saving considerable time as the first several shops did not have availability.
Google also plans to use the Duplex capability to improve its own knowledge graph. When you search for local businesses Google will often surface the hours of operation. However, those hours are frequently changed during the holidays. Google Assistant can be used to call the retail outlets and automatically update the knowledge graph with the holiday hours.
Using Speech Disfluency to Increase Humanlike Nature of Voices
You may also have noticed in the video that the speech from Assistant was very humanlike. Google Assistant used speech disfluencies such as “Mm, hmm” and “um.” You can see in the second video below that Google Assistant also uses slang such as “gotcha.” Combined with Google’s new WaveNet-powered Assistant voices, this makes the virtual assistant almost indistinguishable on the phone from a human.
Another impressive aspect of this demonstration is how well Google Assistant interpreted heavily accented English and a human speaker who was not answering the questions that were asked. Despite these issues, Assistant was still able to move the conversation forward, learning that it could not make a reservation for a table of four and that wait times on Wednesday evenings were typically not long. It will be interesting to hear how this dynamic NLU and speech disfluency work in the real world when in production this summer, but the demonstration was very impressive.