Improved Voice Control Accuracy with Domain-Specific Assistants
Cloud computing provides nearly unlimited resources for speech recognition tasks, which is why Alexa, Siri, and Google Assistant do such a good job understanding human speech. However, these voice assistants are expected to understand a wide variety of requests: “What’s the weather,” “play some music,” “who won the election,” “when is my next appointment,” “turn off the lights,” “set a timer,” “what time is the game,” and so on. Because cloud-based assistants must handle such a broad array of requests, they are “generalists” by design and cannot match the accuracy of speech solutions tailored for specific domains, the “specialists.” A recent product evaluation report by Vocalize.ai [1] shows that an embedded, domain-specific solution can provide more accurate speech recognition and natural language understanding than a cloud-based general assistant.
AI Generalists and Specialists
The product evaluation report [2] compared the task-completion rates of two voice-enabled microwave ovens: an AmazonBasics microwave with cloud-based (generalist) voice control supplied by AVS, and a Midea microwave with domain-specific (specialist) voice control from Sensory. Each device was presented with 40 speech utterances, all related to microwave cooking functions, and was scored on how well it completed the requested tasks. The table below shows that the Sensory solution completed 93% of the requested tasks, while the Amazon solution completed only 55%.

Voice Control Solution | Task Completion Rate
Sensory (domain-specific) | 93%
Amazon AVS (cloud-based) | 55%
Trained, Tuned and Weighted
In the comparison above, it is clear that the Sensory-powered microwave is a “specialist” that can understand and execute cooking commands more accurately than Amazon’s “generalist” solution. Just as it did for the microwave oven domain, Sensory has also created domain-specific models that support streaming media, food ordering, camera controls, fitness and exercise, and even a virtual barista.
To demonstrate the benefits of a domain-specific assistant, it is useful to look at some specific examples from the microwave oven comparison. In the first example, “melt chocolate” is far more likely than “milk chocolate” to be encountered within the microwave oven domain, and the language model trained by Sensory correctly recognized it and started the melt function. Amazon’s general model decided on “milk chocolate,” which resulted in no action being taken by the Amazon microwave.
Test Utterance: Melt chocolate, 8 ounces please.
Sensory Result: Melt chocolate, 8 ounces please.
Amazon Result: Milk chocolate, 8 ounces please.
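The mechanism behind this kind of result can be illustrated with a toy sketch: a recognizer produces several candidate transcriptions, and a domain-specific model effectively weights hypotheses containing in-domain vocabulary more heavily. This is only an illustration of the general idea, not Sensory’s actual model or algorithm; the hypothesis scores and keyword weights below are invented.

```python
# Toy sketch: rescoring ASR hypotheses with a domain-specific keyword bias.
# Purely illustrative -- not Sensory's (or anyone's) actual method.
# All scores and weights here are invented for the example.

# Hypothetical n-best list from a generic recognizer: (text, acoustic_score).
nbest = [
    ("milk chocolate, 8 ounces please", 0.52),
    ("melt chocolate, 8 ounces please", 0.48),
]

# Invented microwave-domain vocabulary with positive bias weights.
DOMAIN_WEIGHTS = {"melt": 0.2, "thaw": 0.2, "defrost": 0.2, "cook": 0.1, "reheat": 0.1}

def domain_score(text: str) -> float:
    """Sum a bonus for each domain keyword found in the hypothesis."""
    words = text.lower().replace(",", "").split()
    return sum(DOMAIN_WEIGHTS.get(w, 0.0) for w in words)

def rescore(nbest):
    """Combine the acoustic score with the domain bonus and re-rank."""
    return sorted(nbest, key=lambda h: h[1] + domain_score(h[0]), reverse=True)

best_text, _ = rescore(nbest)[0]
print(best_text)  # the domain bias promotes the "melt chocolate" hypothesis
```

In this sketch the generic recognizer slightly prefers “milk chocolate,” but the domain bias is enough to flip the ranking toward the in-domain command, mirroring the behavior seen in the test above.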
A similar experience occurred with the word “thaw.” The language model trained and tuned by Sensory correctly recognized the word “thaw” and started the defrost function. The Amazon general model incorrectly recognized the command as “pause.”
Test Utterance: Thaw salmon for 5 minutes.
Sensory Result: Thaw salmon for 5 minutes.
Amazon Result: Pause salmon for 5 minutes.
In the final and most egregious example, the Amazon microwave oven ignored the command to “stop cooking.” It recognized “start cooking” and executed that command, but failed to retain the context needed to execute the subsequent “stop cooking” command.
Test Utterance: Stop cooking.
Sensory Result: Action taken = stops cooking.
Amazon Result: Action taken = continues cooking.
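The “stop cooking” failure is a context problem rather than a recognition problem: a device that tracks its own state can always honor a stop command while cooking. The sketch below shows one minimal way a device-side command handler could keep that context; it is a hypothetical illustration, not any vendor’s implementation.

```python
# Toy sketch: a device-side handler that retains enough state to honor
# "stop cooking" after "start cooking". Hypothetical illustration only;
# not Sensory's or Amazon's actual implementation.

class Microwave:
    def __init__(self):
        self.cooking = False  # the device tracks its own cooking state

    def handle(self, command: str) -> str:
        cmd = command.lower().strip().rstrip(".")
        if cmd == "start cooking":
            self.cooking = True
            return "cooking started"
        if cmd == "stop cooking":
            # Because the state lives on the device, "stop" works
            # without depending on any remote session context.
            if self.cooking:
                self.cooking = False
                return "cooking stopped"
            return "not cooking"
        return "unknown command"

oven = Microwave()
oven.handle("Start cooking.")
print(oven.handle("Stop cooking."))  # cooking stopped
```

Keeping the state machine on the device is one plausible reason an embedded, domain-specific assistant can execute “stop cooking” reliably where a generalist, session-based cloud assistant dropped the context.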