Amazon Alexa Natural Turn Taking and Preference Learning Could Transform Alexa into a Super Assistant, But Not Yet
- Alexa natural turn taking will enable users to engage Alexa in a conversation and have multiple rounds of interaction with only a single wake word activation
- This feature is impressive and involves many technical capabilities that Alexa has not demonstrated previously, but it will not be available until sometime in 2021
- The ability to tell Alexa your preferences and have your voice assistant experience customized will deliver more immediate benefits for users because they will feel its impact far more often
After you get over the shock of an in-home security drone and the new 360-degree swivel for the Echo Show 10, there is little doubt that the most memorable segment of Amazon’s 2020 product launch was the Alexa natural turn taking feature demo. Rohit Prasad, Amazon vice president and head scientist for Alexa, introduced the feature by discussing the importance of supporting natural conversations and how they differ from the voice assistant experience today.
There is turn taking today when conversing with Alexa. The user speaks, then Alexa speaks. That is followed by the user, then back to Alexa, and so forth. It’s highly structured and doesn’t accommodate interruptions, tangents, or backtracking very well. The current model is decidedly unlike how humans interact in conversation. Natural turn taking is definitely more accommodating to the vagaries of human conversation.
As good as the natural turn taking demo was, the feature that will probably have a bigger impact is the ability to teach Alexa your preferences. This is long overdue. For Alexa to be a truly personal assistant, it needs to know personal preferences. This knowledge can make Alexa more useful every day. Prasad demonstrated this feature as well, telling Alexa what he meant by certain phrases. However, in a two-minute demo, the practical benefits of Alexa remembering your preferences are easily overshadowed by the scope of changes required to support natural turn taking.
This is an Entirely New Alexa
Amazon has some perplexing naming conventions. Natural turn taking undersells what is really going on here. Prasad commented in a blog post about the announcement that natural turn taking “allows you to speak to Alexa without using a wake word during the course of a conversation. This advancement allows you to interact at your own pace, even when multiple people are talking.”
Amazon’s demonstration video offers more context into how this will work. Two women are having a conversation in a kitchen. One suggests they order a pizza and the other agrees. The first woman then invites Alexa to join the conversation by saying, “Alexa, join my conversation.” Alexa then starts listening but doesn’t need to start talking. Instead, Alexa is at the ready. Then Alexa is asked, without the wake word, to order a pizza. While the women discuss what they would like to order, they mention a couple of changes to their preferences, including toppings and size. Alexa recognizes which information belongs to the order and clarifies the details. Then Alexa completes the order. This is followed by an equally complex interaction about a movie search and selection, in which the two women speak intermittently to Alexa and to each other.
During this entire exchange, Alexa is invoked only once and the session never times out. The session context is maintained. It’s as if Alexa were a helper in the room: it doesn’t need to be summoned for each request, nor does it need to be told which information is relevant to the task at hand.
So Many Long-Awaited Features
Even if you haven’t seen the video, you are probably already anticipating many of the important features it demonstrated. The demo included:
- Wake word free interactions
- Conversation pauses
- Crosstalk management
- Multiple speaker identification
- Session context maintenance
- Context switching
- Proactive engagement
- Barge in
This is a good list to begin our discussion, even though other, smaller changes were certainly required to support an exchange like the one outlined above. One wake word with a request to join a conversation begins the session. So far, nothing unusual has occurred. Then, while the women speak with each other, Alexa isn’t addressed explicitly, but the session stays open. Many voice developers have been plagued by Alexa session time-outs after just a few seconds, which can disrupt or destroy the user experience. In this new natural turn taking mode, that is not an issue, at least up to a point. Amazon would not clarify how long Alexa would wait before timing out in this mode. A company spokesperson replied to my question saying:
“Alexa will remain wake word free for a short window following your last request directed at Alexa, or Alexa’s last response. Unless the device detects speech that is directed to Alexa, Alexa will stop processing the conversation. You can close the interaction at any time by saying ‘Alexa, leave my conversation.’”
This response confirms there is a time-out but dodges the more explicit question about how long that period might be. It is fair to say the time-out period is longer than today’s and only requires a conversation to be taking place, even if Alexa isn’t involved. Alexa needs to hear chatter to know the conversation it was invited to join is still in progress.
Crosstalk, Context, and Barge In
Smart speakers don’t deal well with crosstalk when multiple people are speaking at once. Natural turn taking, however, is designed to manage multiple concurrent speakers and listen for cues about what might be relevant to a designated task. The demo showed crosstalk, and Alexa was not confused by it. Beyond this, there was multiple speaker identification, an existing feature that was highlighted when the order was placed and confirmed.
Session context maintenance is a function of Alexa not timing out and remembering key pieces of information shared during the conversation. Time-outs are a big problem today for both first-party and third-party Alexa skills. If you don’t complete your request or transaction before Alexa times out, you are very likely to have to start over or go back some steps when you make another attempt. This capability appears to build on the Alexa Conversations features demonstrated in 2019 and available for one use case since January 2020, as well as the recently announced skill resumption feature from Alexa Live. Alexa sessions today are too often fire-and-forget. There are limits to Alexa’s short-term memory, so continuity between sessions is often fraught, and complex task execution can be inefficient.
We also saw a demonstration of context switching. Alexa completed the pizza order and ordinarily would have ended the session because the task was finished. Instead, Alexa stayed open and in the conversation, so another, unrelated request was made about movie options, with no wake word required. Alexa today doesn’t do context switching. If you want to move on to a new domain or task, a new wake word utterance and a new session are required.
The demo also showed Alexa proactively asking for information, much as it would in an Alexa Conversations experience where there is a cross-skill goal completion need. And amid all these fancy new features, there was also barge in: the user could interrupt Alexa without a wake word utterance. This might be the most common of everyday natural language interactions, and it is something voice assistants simply don’t support well without a cue such as a new wake word utterance.
All of these features are common elements of everyday human conversations. However, voice assistants in general are not designed to accommodate these needs or input methods. These capabilities definitely make Alexa seem like a more robust and useful assistant.
A More Practical Alexa Update
The demonstration was intriguing, but the scenario it depicted is isolated and not that common. How often are you standing around with someone trying to make a joint decision? The value of this capability may be high in some circumstances, but how often it is used will likely be fairly low. It reminded me of the Alexa Conversations demo at re:MARS 2019. The user was planning a night out, and Alexa expertly coordinated multiple third-party skills to fulfill the user’s request and get everything scheduled. That scenario is not an everyday need for most people.
However, the ability to tell Alexa your preferences does have everyday implications. Prasad went through a series of interactions with Alexa to indicate what he wanted for volume or content related to specific commands. In essence, it is like Siri Shortcuts without the need to create a shortcut. You just tell Alexa what you mean when you make certain statements, and your voice assistant experience is customized accordingly. That is a practical benefit.
Natural turn taking definitely will not be available before 2021. Given its complexity, I suspect it is at least 12 months away. Consider that it took about eight months from the first “night out” demonstration until that feature became available. Alexa preference learning is far simpler and long overdue. It will arrive sooner and provide more value. An Amazon spokesperson tells me it will “be available in the U.S. in the coming months.” That is not very specific, but it is also not the same as saying 2021.
There was not a lot of Alexa on display at the Amazon product launch event. The event was about devices first, and the Alexa features that were mentioned are future additions. So, for now, Alexa users will have to be content with the voice assistant as it is.