Is Alexa Getting Smarter? It’s Complicated
In December 2018, several tech websites reported that Alexa devices would soon get access to Wolfram|Alpha’s knowledge engine. The move was heralded as one that would let Alexa answer more questions with greater accuracy, with headlines such as “Alexa Just Got a Heck of a Lot Smarter” and “Wolfram Alpha Makes Alexa Smarter.” Fast forward to January 2019, when Wolfram|Alpha issued a press release confirming the integration with Amazon Alexa. According to the press release, “By sourcing answers from Wolfram|Alpha, Alexa greatly enhances its ability to provide users with factual information about history, economics, geography, math, and astronomy.”
The newly announced Alexa and Wolfram|Alpha integration raises an interesting question: is Alexa really getting smarter? And how would we even quantify that claim? At Vocalize.ai we set out to find the answer, and what we discovered is quite surprising. Over time, Alexa was gaining the ability to answer more questions, but she was also forgetting answers she used to know. How is it possible for Alexa to forget facts and answers? To be honest, we are not yet sure, but the data is clear: Alexa is providing new answers while at the same time forgetting previously known ones.
Through the course of Vocalize.ai research and client-directed projects, we have asked tens of thousands of questions to many of the most popular virtual assistants. The questions include common household queries, e.g., “What’s the weather this weekend?”, as well as more complex general knowledge questions, e.g., “Which island located in the Indian Ocean was also known as Ceylon?” Our historical data lets us compare Alexa responses collected before the Wolfram|Alpha integration with newly collected responses from after it.
For this initial comparison, we chose the category of geography, one of the topics called out in the Wolfram|Alpha press release. Using a list of 265 general knowledge geography questions, we compare the results from Alexa-2018 (sans Wolfram|Alpha) and Alexa-2019 (with Wolfram|Alpha). We have comprehensive data on several of the other categories mentioned in the press release, and future comparisons will include science and history.
What Alexa Got Right
The newer Alexa-2019 with Wolfram|Alpha support correctly answered 18 questions that the previous Alexa-2018 had missed. That is nearly 7% of the 265-question set, which most people would consider a nice intelligence boost. By reviewing the logs, we can see that some of this new intelligence was indeed sourced from Wolfram|Alpha (see picture below). However, some of the newly correct answers were sourced from Wikipedia, and some came with no reference source at all.
What Alexa Got Wrong
This was the unexpected turn in our analysis. We found 10 questions that Alexa-2018 had answered correctly but that the new and improved Alexa-2019 repeatedly got wrong. Seemingly, the newer and smarter version of Alexa had forgotten this knowledge. That is roughly 4% of the question set. We have no idea why Alexa can no longer answer these questions, but it certainly gives pause to the claim that Alexa is getting smarter all the time. Some examples of questions that Alexa forgot are provided below.
Question: Which US state is nicknamed Land of Infinite Variety?
Answer: South Dakota
Alexa-2018: The nicknames of South Dakota also called the Land of Infinite Variety.
Alexa-2019: Sorry, I’m not sure.
Question: What is the deepest and oldest freshwater lake on Earth?
Answer: Lake Baikal
Alexa-2018: According to another Amazon customer, the world’s deepest lake, Lake Baikal in Southern Russia is the world’s deepest lake.
Alexa-2019: Sorry, I’m not sure.
Question: Which Caribbean island is shared by France and the Netherlands?
Answer: St. Martin
Alexa-2018: Here’s something I found from the article Hispaniola on Wikipedia. The only other shared island in the Caribbean is St. Martin, which is shared between France and the Netherlands.
Alexa-2019: The flight time between France and The Netherlands is approximately one hour.
The Bottom Line
Within the scope of geography questions, the Alexa-Wolfram|Alpha integration does appear to make Alexa “smarter,” with a 7% gain in correct answers. However, that gain was partially offset by an unexpected 4% loss, leaving a net gain of just 3%, which is hardly worthy of the headlines.
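The percentages above can be reproduced from the raw counts. The short sketch below assumes, as the bottom line implies, that 18 is the gross number of newly correct answers and 10 the number of newly missed ones, and that every percentage is expressed as a share of the full 265-question set. The helper name `share_of_set` is hypothetical and used only for illustration.

```python
def share_of_set(count: int, total: int) -> float:
    """Return count as a percentage of the total question set (hypothetical helper)."""
    return 100.0 * count / total


TOTAL_QUESTIONS = 265  # geography questions in the comparison
NEWLY_CORRECT = 18     # right in Alexa-2019, missed by Alexa-2018
NEWLY_WRONG = 10       # right in Alexa-2018, missed by Alexa-2019

gain = share_of_set(NEWLY_CORRECT, TOTAL_QUESTIONS)
loss = share_of_set(NEWLY_WRONG, TOTAL_QUESTIONS)
net = share_of_set(NEWLY_CORRECT - NEWLY_WRONG, TOTAL_QUESTIONS)

# Prints: gain: 6.8%  loss: 3.8%  net: 3.0%
print(f"gain: {gain:.1f}%  loss: {loss:.1f}%  net: {net:.1f}%")
```

Measuring each figure against the whole question set, rather than against the number of answers Alexa-2018 already got right, keeps the gain, loss, and net figures directly comparable.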
Vocalize.ai did a quick analysis with a set of general knowledge questions, and what we found was quite surprising. We will keep digging deeper on this one and continue to build out our benchmarking services for virtual assistants. In the meantime, be wary of virtual assistant companies claiming that their latest updates make them smarter, more conversational, improved, more accurate, greatly enhanced, etc. Typically, we do not see these claims backed by test results or shared data. Surprisingly, gains in one area may be offset by losses in that same area.
Joe Murphy is CEO and founder of Vocalize.ai. The company develops tools and provides services designed to audit, interrogate, and benchmark the performance of ASR and NLU systems, and specifically conducts benchmarks of voice assistants. You can learn more about the company at Vocalize.ai or contact firstname.lastname@example.org.