Portuguese TWIZ AI Wins Alexa Prize TaskBot Challenge 2 With Generative AI and LLMs
Amazon’s Alexa Prize judges have chosen the TWIZ team from NOVA School of Science and Technology (FCT NOVA) in Lisbon as the winner of the Alexa Prize TaskBot Challenge 2. TWIZ, short for Task Wizard, came in first among the ten teams competing to create a multi-modal voice app capable of guiding users through multi-step, complex tasks.
Team TWIZ earned $500,000 for winning the TaskBot Challenge 2. The University of Glasgow’s GRILLBot, which won last year with TWIZ as runner-up, came in second place and picked up $100,000. Third place and $50,000 went to team ISABEL from the University of Pittsburgh. Each competing team received a $250,000 research grant and various Amazon tech services.
Alexa customers supplied feedback on the taskbots as well. Saying, “Alexa, let’s work together,” to an Amazon Echo or Fire TV device connected the user with a randomly chosen contestant. When the conversation ended, the user was asked to rate and comment on the experience. Teams used that feedback to improve their entries, and the responses played a role in deciding which teams reached the semifinals and finals.
Amazon set up the Alexa TaskBot Challenge as a competition for conversational AI that could act as a partner for users engaging in more complex undertakings. That meant working across multiple interactions and employing both visual and audio elements. The idea is that a TaskBot can extend beyond the standard one-to-one ratio of human requests and AI actions. Even Alexa Routines with multiple parts are still really one command and one collection of errands for the voice assistant. For the first TaskBot Challenge, participants were asked for entries centered around either home improvement or cooking, but this iteration opened the contest to “more hobbies and at-home activities.”
“User dialogues in the Alexa TaskBot are unique, shedding a new light into the execution of manual tasks,” said Rafael Ferreira, the TWIZ team lead. “Leveraged by these dialogues, we learned that using TWIZ allowed us to steer conversations in a more contextual and insightful way.”
The concept of a TaskBot is reminiscent of the ‘copilot’ metaphor deployed by Perplexity, OpenAI, and other brands for generative AI-powered chatbots and conversational partners. Amazon hasn’t shied away from using the term for its own generative AI features. In fact, generative AI and large language models (LLMs) played a major role in the Challenge. In an academic paper about the TaskBot Challenge, Amazon referred to LLMs as “the default workhorse” for successful TaskBots and pointed to the contest as the first time LLMs were “used to drive the whole process of dialog management, rather than just serving as neural generators providing candidate responses.”
“The most encouraging and impressive advances were in the application of large language models to dialog management itself,” said Michael Johnston, Alexa Prize science and engineering support team leader and Alexa AI applied science manager. “Rather than just using LLMs to create candidate responses, teams explored having an instruction-following LLM drive the whole conversation. I think cracking that problem for the task assistance domain was the major contributing factor in the quality and naturalness we saw in the top performing bots.”
The TWIZ team even developed its own model, called TWIZ LLM, to help complete many of its tasks. The team relied on generative AI models for several features, including producing recipes to fit a style of cooking and the available ingredients listed by the user, finding scenes in videos based on spoken descriptions, and even writing suggestions for tasks that users could try.
“I’m extremely happy about the team’s creativity in designing the groundbreaking TWIZ LLM,” said team TWIZ advisor João Magalhães. “Conversations about video content take CX to an all-new level and I’m very proud for helping to pioneer video dialogue in the Alexa Prize. I think there’s a lot to explore here.”
“Compared to previous challenges, it was interesting to see how broadly generative AI and large language models are applied,” Johnston said. “Previous challenges have used earlier language models for generating candidate responses, but with the rise of large capacity language models with the ability to follow instructions, teams use them for many different tasks needed to improve their bots.”