
Guest Developer Post: Alexa Needs Intent Context in 2019

Dialog and state management are indispensable for delivering the best possible voice experiences and are invaluable tools in an Alexa developer’s tool belt. But there are times when these features are not enough, and we must wait for a new hero to appear: Intent Context.

TL;DR: Where Dialog and State Management Fall Short

  • One word/one phrase answers are part of natural conversation: Yes, No, Fargo, Harry Potter.
  • Generally, all intents in a skill are candidates for all utterances a user speaks, which can lead to intent confusion and the wrong intent being selected by the Alexa service.
  • Dialog management’s purpose is filling one or more slots. It requires an intent whose sample utterances either surround slots with text or are phrases with no slots.
  • Once an intent is selected, dialog management can weigh values for the remaining slots higher than the other intents in the skill.
  • State management is not a first-class construct and is enabled by storing session state and round-tripping those values in subsequent requests.
  • The Alexa service makes no use of STATE when determining which intents are candidates in the current context of the conversation. Once an intent is selected, STATE allows different code to run for the same intent.
  • State management can never solve the issue of conflicting intents.
  • What is needed is a way for developers to influence how the Alexa service selects candidate intents for consideration. When conflicts between intents occur, developers can provide additional context to get the needed conversational experience. Intent context solves this issue.
  • One way you can help get this feature added is by joining the Alexa UserVoice forum and voting. Reaching more than 200 votes will make this the top-listed feature request. If you agree that this is an important feature, please vote here.

Short Responses Naturally Happen in Conversation

Often during conversations, questions are asked and one-word or one-phrase (or, in developer terms, one slot/entity value) responses are given. It is a part of natural conversation flow. Although these short clips are written as a conversation between Alexa and a user, they could just as easily have been overheard in a restaurant, coming from the people at a nearby table:

ALEXA: Do you want to continue?

USER: No

ALEXA: Do you want fries with that?

USER: Yes

ALEXA: Who is your favorite band?

USER: Yes

 

ALEXA: What city are you from?

USER: Fargo

ALEXA: What is your favorite movie?

USER: Fargo

 

ALEXA: What is your favorite movie?

USER: Harry Potter

ALEXA: What is your favorite book?

USER: Harry Potter

ALEXA: What is your favorite character?

USER: Harry Potter

Language Model

When defining custom intents that allow for these one-word or one-phrase responses, at least one sample in the list must not include any surrounding helper words:
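
Here is a minimal sketch of what such an intent could look like, using a hypothetical MovieNameIntent with the built-in AMAZON.Movie slot type (shown as a TypeScript object literal, structurally the same as the JSON in the interaction model):

```typescript
// Hypothetical MovieNameIntent. The bare "{movie}" sample is what
// lets a one-word answer like "Fargo" map to this intent.
const movieNameIntent = {
  name: "MovieNameIntent",
  slots: [{ name: "movie", type: "AMAZON.Movie" }],
  samples: [
    "{movie}",                       // the slot alone, no helper words
    "my favorite movie is {movie}",
    "i like {movie}"
  ]
};
```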

To this we add the built-in intents for Yes and No, which can optionally be extended with additional values:
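
For example, the built-ins might be declared like this (the extra samples are optional and purely illustrative):

```typescript
// AMAZON.YesIntent and AMAZON.NoIntent work with no samples at all;
// extra samples simply extend the phrases they recognize.
const builtInIntents = [
  { name: "AMAZON.YesIntent", samples: ["yes please", "sure thing"] },
  { name: "AMAZON.NoIntent",  samples: ["nope", "not right now"] }
];
```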

Dialog Management Prioritizes Slots, Not Intents

The above example utterances don’t work with dialog management because its purpose is to get you into an intent and then do the job of filling one or more slots. It requires an intent whose sample utterances either surround slots with text or are phrases with no slots. Once an intent is selected, dialog management can weigh values for the remaining slots higher than the other intents in the skill.
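
To make that distinction concrete, here is a sketch of a handler (ASK SDK v2 for Node.js, TypeScript) where dialog management does help: the intent has already been selected, and a Dialog.ElicitSlot directive asks for the missing slot value. MovieNameIntent and its movie slot are the hypothetical names from above.

```typescript
import * as Alexa from 'ask-sdk-core';

const MovieNameHandler: Alexa.RequestHandler = {
  canHandle(handlerInput) {
    return Alexa.getRequestType(handlerInput.requestEnvelope) === 'IntentRequest'
      && Alexa.getIntentName(handlerInput.requestEnvelope) === 'MovieNameIntent';
  },
  handle(handlerInput) {
    const movie = Alexa.getSlotValue(handlerInput.requestEnvelope, 'movie');
    if (!movie) {
      // Dialog management takes over from here: the next utterance is
      // matched against values for the "movie" slot, not other intents.
      return handlerInput.responseBuilder
        .speak('Which movie?')
        .addElicitSlotDirective('movie')
        .getResponse();
    }
    return handlerInput.responseBuilder
      .speak(`${movie}, great choice.`)
      .getResponse();
  },
};
```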

State Management is Ignored by Alexa

The concept of state management is not a first-class construct in the Alexa world. What I mean by that is that while state management is enabled by round-tripping session attributes, the variable could be named STATE or BLAHBLAH and the Alexa service wouldn’t care, because it ignores session attributes completely.

The meaning of state is up to the developer to decide, and its power shows when the same intent can be triggered multiple times and you want different code executed each time. If your skill includes multiple yes/no questions, then chances are you want different things to happen for each:

ALEXA: Do you want to continue?

USER: No

ALEXA: Do you want fries with that?

USER: Yes

State management to the rescue! In the response that asks the first question, set STATE = ‘CONTINUE_STATE’ so the user’s next utterance is routed to the handler for continuing. In the response that asks the second question, set STATE = ‘WANT_FRIES_STATE’, which leads to its corresponding code. But state management can never solve the issue of conflicting intents.
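
Here is a sketch of that routing with the ASK SDK v2 for Node.js (TypeScript). The STATE values are the ones above; only the skill code gives them meaning, never the Alexa service:

```typescript
import * as Alexa from 'ask-sdk-core';

// Handles "yes" only while we are waiting on the continue question.
const ContinueYesHandler: Alexa.RequestHandler = {
  canHandle(handlerInput) {
    const attrs = handlerInput.attributesManager.getSessionAttributes();
    return Alexa.getRequestType(handlerInput.requestEnvelope) === 'IntentRequest'
      && Alexa.getIntentName(handlerInput.requestEnvelope) === 'AMAZON.YesIntent'
      && attrs.STATE === 'CONTINUE_STATE';
  },
  handle(handlerInput) {
    // Ask the next question and set the state for its answer.
    const attrs = handlerInput.attributesManager.getSessionAttributes();
    attrs.STATE = 'WANT_FRIES_STATE';
    handlerInput.attributesManager.setSessionAttributes(attrs);
    return handlerInput.responseBuilder
      .speak('Do you want fries with that?')
      .reprompt('Do you want fries with that?')
      .getResponse();
  },
};
```

A parallel WantFriesYesHandler would match the same AMAZON.YesIntent but with STATE === ‘WANT_FRIES_STATE’, so the same intent runs different code.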

The Challenge is Guiding Alexa to the Correct Intent Prioritization

Currently, Alexa treats all intents in a custom skill as candidates for all user utterances. If my skill can handle both yes/no questions and a FavoriteBandIntent, how can we ensure that the answer “yes” is sent to the right intent handler? The same goes for CityNameIntent versus MovieNameIntent with “Fargo”, or MovieNameIntent versus BookNameIntent with “Harry Potter”.

There is another issue. Often the list of sample values for a given slot type is so large that it clashes with seemingly unrelated intents or slots. I just ran into an instance where AMAZON.YesIntent collided with BookNameIntent when the user answered “yes”.

Even though “yes” is explicitly included in AMAZON.YesIntent and “yes” is not a match in the AMAZON.Book slot type, BookNameIntent was still selected. The request log showed that BookNameIntent was chosen even with an ER_SUCCESS_NO_MATCH entity resolution status code, and that AMAZON.YesIntent was not even listed in ConsideredIntents.

A New Hero for Developers and Alexa User Experience: Hints

One solution to the challenge would be to allow intent hints to be included in the response. These hints would be instructions to the Alexa service to exclude a list of intents from consideration for the next user utterance. If the user is asked for their favorite band, exclude AMAZON.YesIntent from consideration. When there is a conflict, as in the case where AMAZON.YesIntent collided with BookNameIntent, I could exclude BookNameIntent whenever I am asking yes/no questions. Other hints could be added to increase the weight of one or more intents or decrease the weight of others. Hints solve the challenge of intent confusion, but a more powerful solution is Intent Context.
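
To be clear, nothing like this exists in the Alexa Skills Kit today. The directive below is purely a hypothetical illustration of what such a hint might look like in a skill response; the directive type and every field are invented:

```typescript
// Hypothetical "intent hint" directive; none of this exists today.
const response = {
  outputSpeech: { type: 'PlainText', text: 'Who is your favorite band?' },
  directives: [
    {
      type: 'Dialog.IntentHint',              // invented directive type
      excludeIntents: ['AMAZON.YesIntent'],   // drop from consideration
      boostIntents: ['FavoriteBandIntent'],   // weigh higher
    },
  ],
  shouldEndSession: false,
};
```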

A New Super Hero for Developers and Alexa User Experience: Intent Context

Intent context is different from hints in that it is directly associated with the language model. When you define intents, you specify a keyword that identifies which context those intents participate in. Similar to how you can name different STATE values in state management, you would set a context on a skill response to tell the Alexa service which context is currently active. When the Alexa service picks candidate intents to satisfy a user utterance, it could use the intent context keyword in the selection criteria.
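
Again purely hypothetical, here is one way the language model and the response might express a context keyword; the contexts and context properties are invented for illustration:

```typescript
// Invented "contexts" property on an intent, and an invented
// "context" property on the response; neither exists today.
const bookNameIntent = {
  name: 'BookNameIntent',
  contexts: ['FAVORITES'],          // the context this intent participates in
  slots: [{ name: 'book', type: 'AMAZON.Book' }],
  samples: ['{book}'],
};

const response = {
  outputSpeech: { type: 'PlainText', text: 'What is your favorite book?' },
  context: 'FAVORITES',             // tell Alexa which context is now active
  shouldEndSession: false,
};
```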

For Google Actions that use Dialogflow, this concept is called input and output contexts. By following some basic rules and using multiple contexts and lifespan values, you can affect when intents are matched.
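
For reference, here is roughly what setting an output context looks like in a Dialogflow v2 webhook response (the project, session, and context names are illustrative):

```typescript
// A Dialogflow v2 fulfillment response that activates a context for
// the next two turns; intents that list "awaiting-movie" as an input
// context are then favored when matching the user's next utterance.
const fulfillmentResponse = {
  fulfillmentText: 'What is your favorite movie?',
  outputContexts: [
    {
      name: 'projects/my-project/agent/sessions/abc123/contexts/awaiting-movie',
      lifespanCount: 2,
    },
  ],
};
```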

Depending on how this is implemented in Alexa, the context could be round-tripped from skill to device and back, acting the same way state management does. That way the intent context could affect not only which intents are selected but also which code in the skill is executed for a given context. This would have the added benefit of making state management a first-class concept.

Vote for Intent Context as New Alexa Skills Kit Feature Request

Dialog and state management are useful and necessary features for developing Alexa skills, but the much-needed intent context capability is missing. I am hoping that Amazon has already been working on this and that intent context will be released in early 2019. In my opinion, this should be a top priority among new Alexa developer features. In the meantime, I must use a workaround to resolve the intent confusion in my most recent skill.

You can vote for this feature by joining the Alexa UserVoice forum. Tallying more than 200 votes will get this feature listed as the top request. If you agree that this is an important feature, please vote here. Also, please share this post on social media and encourage four others to vote for it.

Mark Tucker is an independent software developer and Alexa Champion with over 20 years of software design and development experience. He is the organizer of the Phoenix Alexa Meetup and the Phoenix Chapter of the Ubiquitous Voice Society. You can find him on Twitter @marktucker or LinkedIn.