Alexa Moves Most Operations to Amazon-Built Chip to Boost Speed and Efficiency
Alexa is moving most of its operations onto Inferentia, a computer chip produced by Amazon. The migration away from the Nvidia chips Alexa has used until now will boost the voice assistant’s speed and reduce its energy demands, according to the company, while also giving Amazon more control over the underlying hardware.
Nvidia’s chips have acted as the processing hub for Alexa. Running in Amazon Web Services data centers, the chips handle the machine learning inference behind Alexa users’ questions and commands, including converting text responses into speech. Now, Inferentia chips will take up that role. Amazon announced the long-rumored custom chips two years ago and unveiled them at a presentation last year. They are built specifically for the machine learning tasks that make up much of Alexa’s work, like recognizing language and images and generating an appropriate response. Along with Alexa, the Inferentia chips will now power Amazon’s image and facial recognition service, Rekognition. The central goal of building the custom chip was improving Alexa’s speed and efficiency compared to the GPU-based setup.
“[Using the new chips] resulted in 25% lower end-to-end latency, and 30% lower cost compared to GPU-based instances for Alexa’s text-to-speech workloads,” AWS Principal Developer Advocate Sébastien Stormacq explained in a blog post about the change. “The lower latency allows Alexa engineers to innovate with more complex algorithms and to improve the overall Alexa experience for our customers.”
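Those percentage claims translate to simple arithmetic. As a toy illustration of what a 25% latency reduction and 30% cost reduction mean in practice (the baseline figures below are hypothetical, not numbers Amazon has published):

```python
def apply_reduction(baseline: float, percent_lower: float) -> float:
    """Return the new value after lowering the baseline by the given percentage."""
    return baseline * (1 - percent_lower / 100)

# Hypothetical GPU-based baselines for a text-to-speech workload.
gpu_latency_ms = 100.0   # end-to-end latency per request
gpu_cost = 10.0          # cost per million requests, in arbitrary units

# The reported Inf1 improvements: 25% lower latency, 30% lower cost.
inf1_latency_ms = apply_reduction(gpu_latency_ms, 25)  # 75.0 ms
inf1_cost = apply_reduction(gpu_cost, 30)              # 7.0 units

print(f"latency: {gpu_latency_ms} -> {inf1_latency_ms} ms")
print(f"cost:    {gpu_cost} -> {inf1_cost} units")
```

The point is that the two reductions compound per request: a service handling the same traffic pays less per response and returns each response faster, which is the headroom Stormacq says engineers can spend on more complex algorithms.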
The improvements to Alexa promised by the new chip still involve transferring data to and from the cloud. In that respect, the move is reminiscent of the low-power Alexa variant Amazon debuted last year, which shifted all of the processing from the device into the cloud. That cut power and memory requirements enormously, making it possible to add Alexa to simple devices, like lightbulbs, that would normally be unable to support the necessary processing. But the reliance on the cloud contrasts with the steady rise in companies producing chips designed to keep voice and AI operations on the device.
For instance, speech tech developer Sensory offers a customizable voice assistant built specifically for smart home appliances with no need for the cloud. Sensory also just debuted its new VoiceHub platform to streamline creating custom wake words for those devices. Similar cloud-free features are part of the Picovoice AI platform and ID R&D’s vocal identification engine, which adds voice-based security to devices without needing to transmit data. The tech giants aren’t automatically opposed to on-edge systems, either. Many Alexa- or Google Assistant-enabled devices include on-edge elements like the analog wake word identifier from Aspinity, Sensory’s TrulyHandsfree speech recognition software for iOS and Android apps, or the Amazon-approved Alexa headset development kit from Knowles.
Amazon is not alone in looking to replace Nvidia and other dedicated chip makers with internally produced alternatives, cloud-free or otherwise. Google is reportedly working with Samsung to design a processor, possibly named Whitechapel, for Pixel smartphones and Chromebook computers. The chip, which would replace the Qualcomm-built ones currently in use, is meant to make Google Assistant perform better on both kinds of devices. Google consistently adds new capabilities to Google Assistant, and hardware upgrades are needed to keep pace with the software. A custom chip can serve that purpose better than one designed for broader usage.
Apple has started looking internally for chips as well. The company went in-house for its newest line of Mac computers, using chips of its own design in place of those from its usual partner, Intel. Apple is likely planning to improve Siri’s efficiency and speed with its own chips, too, although possibly with an emphasis on the edge processing that Amazon’s new chip forgoes. That may be part of why Apple acquired edge-based artificial intelligence startup Xnor.ai in January for a reported $200 million. Xnor’s low-power machine learning technology is designed to run continuously without the cloud, which would mean more efficient, quicker operations.