Dedicated Audio Processors at the Edge Are the Future. Here Are the Reasons Why
In the evolution of computing, digital devices were initially powered by generic microcontrollers and microprocessors. As specialized applications became more complex, VLIW engines and DSPs rose in popularity. A similar progression took place in video processing, with specially optimized video processors emerging to enable advanced functions for digital television, immersive gaming, 3D and VR. Now, as audio emerges as a key interface for human-to-machine and even machine-to-machine interaction, a comparable evolution is under way: DSPs are being optimized to do more than traditional audio processing. For example, audio DSPs augmented with the latest machine learning circuitry do more than improve audio quality. They bring artificial intelligence (AI) to consumer devices, making them smarter and more capable at the edge, without needing a network connection. The implications are significant.
Looking more closely at this audio evolution, here are a few reasons why “audio edge processors” are becoming essential to the design of next-generation sound bars, smart home devices, smart speakers, smart TVs and a whole host of intelligent IoT devices.
#1: Higher-Quality Performance and Smarter Functionality
Most devices today rely on an applications processor to perform a broad range of duties. As AI, machine learning and audio playback capabilities are built into devices, the applications processor has become overtasked. When advanced voice processing has to compete with normal device functions, latency, processor overheating and power budgets all become problems. Adding a second or more powerful applications processor is not practical: it means higher cost, increased power consumption, re-porting audio code or repartitioning algorithms, and possibly recertifying the end device within its ecosystem. The end result would be a more expensive device delivered late to market. Likewise, adding a generic DSP would not be sufficient for high-quality audio performance or intelligent functions.
The ideal solution is to use a specialized audio edge processor with an audio-specific instruction set, machine learning optimizations, and a focus on the audio fabric. Such processors can deliver enough compute power to process complex audio with AI and machine learning algorithms at a fraction of the energy of an applications processor or generic DSP implementation. The result is high-quality audio performance and intelligent functions for smart devices, such as identifying or locating the person speaking or cancelling interfering noise. Because it consumes much less power, the device can also be battery powered and mobile. Lower power also means less heat, which translates into higher reliability and longer device life.
Consider the popular Amazon Alexa devices. Recognizing a wake word such as “Alexa” takes considerable processing power, particularly in noisy conditions and when the user is far from the device. An audio edge processor provides the local compute needed to handle this and other functions that previously might have been handled in the cloud. Indeed, even cloud providers agree that they cannot keep sending everything to the cloud, because of increased latency, network traffic, connection reliability and privacy concerns. Adding a specialized audio edge processor to handle more functions addresses these problems.
#2: Enabling Next-Generation Smart Devices with Video and Audio
A wide variety of designers are building the next generation of smart speakers, and many are adding smart screens. These new video-centric devices combine a smart speaker with a touch screen and an audio edge processor. The same is true of next-generation smart TVs, which are adding voice activation and control. Traditional TV processors are already overtasked with standard TV control functions, increasingly complex menus and new streaming requirements. To implement voice interaction, TV designers need to decide where to run the complex voice interaction software. Using a dedicated audio edge processor has several critical advantages. First, and most obviously, it brings the performance and power-dissipation advantages discussed above.
In addition, TVs generally have the main processing unit at the bottom of the set and several microphones at the top. A dedicated audio edge processor can be placed close to the embedded microphones to process sound and voice commands there, eliminating the expensive shielded cable otherwise needed to carry audio from the microphones at the top of the TV to the applications processor at the bottom. Even more important, TV manufacturers can improve ROI and scale by adding voice to multiple TV product lines with one module that combines an audio edge processor with embedded microphones. This simplifies hardware and software integration with the TV's applications processor and provides a better voice-control experience. A single audio processor can also listen to multiple microphones, up to eight at a time, which can significantly improve performance in these next-generation devices, particularly in noisy environments.
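Why more microphones help in noise can be illustrated with the simplest multi-microphone technique, a delay-and-sum beamformer. The Python sketch below is a textbook illustration under stated assumptions, not Knowles' implementation: the 500 Hz tone, the one-sample delay between adjacent microphones and the independent Gaussian noise at each microphone are all hypothetical, and `simulate` is an illustrative helper. Under those assumptions, aligning and averaging N channels leaves the target signal unchanged while cutting uncorrelated noise power by a factor of N, roughly 9 dB for eight microphones.

```python
import math
import random


def delay_and_sum(channels, delays):
    """Delay-and-sum beamformer with integer sample delays.

    channels: equal-length lists of samples, one per microphone.
    delays:   non-negative integer arrival delay (in samples) of the target
              source at each microphone; channel k is advanced by delays[k]
              so the target signal lines up across all channels.
    Returns the average of the aligned channels.
    """
    n_mics = len(channels)
    length = len(channels[0]) - max(delays)  # samples valid after alignment
    out = []
    for t in range(length):
        acc = sum(ch[t + d] for ch, d in zip(channels, delays))
        out.append(acc / n_mics)
    return out


def snr_db(clean, noisy):
    """SNR of `noisy` relative to the known clean reference, in dB."""
    signal_power = sum(s * s for s in clean) / len(clean)
    noise_power = sum((x - s) ** 2 for s, x in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


def simulate(n_mics=8, fs=16000, freq=500.0, n=4000, noise_std=0.5, seed=1):
    """Compare one microphone against an n_mics delay-and-sum beam.

    Hypothetical setup: a tone reaches adjacent microphones one sample
    apart, and each microphone adds its own independent Gaussian noise.
    """
    rng = random.Random(seed)
    delays = list(range(n_mics))
    total = n + max(delays)
    source = [math.sin(2 * math.pi * freq * t / fs) for t in range(total)]
    channels = []
    for d in delays:
        channels.append([
            (source[t - d] if t >= d else 0.0) + rng.gauss(0.0, noise_std)
            for t in range(total)
        ])
    clean = source[:n]
    single_mic = channels[0][:n]  # microphone 0 has zero delay
    beam = delay_and_sum(channels, delays)
    return snr_db(clean, single_mic), snr_db(clean, beam)


if __name__ == "__main__":
    single, beam = simulate()
    print(f"single mic: {single:.1f} dB, 8-mic beam: {beam:.1f} dB")
```

In practice, room noise is partly correlated across microphones and arrival delays are rarely whole samples, so real-world gains are usually smaller than this idealized case; practical beamformers typically use fractional delays or adaptive weights.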
#3: Increased Privacy and Security
Audio processors can add a higher level of security and privacy to smart home devices. They provide the compute power needed to process audio locally, which eliminates the need to send all captured voice and other sounds to the cloud for processing. Because designers and consumers alike are demanding more privacy, designers can support more voice commands locally, so the device can function without the consumer’s voice being sent to the cloud. This approach is also inherently more resistant to hacking through the device’s internet connection, because the audio edge processor can act as an additional barrier against malicious manipulation of the device.
#4: More Embedded Memory Results in Better Performance and Differentiating Features
As demand for better performance and more use cases grows, audio edge processors are adding memory, in the range of several megabytes, enabling them to do many things at the edge that were previously not possible. For example, more memory means that designers can put more complex wake words, larger command sets and smarter functions such as contextual awareness into smart devices, improving the devices’ understanding of audio and enhancing the user experience.
#5: Open Audio Platform and Broad Ecosystem
Not all DSPs are open, but an audio edge processor that opens its architecture and development environment to a strong ecosystem of audio application developers enables greater innovation and flexibility, and broader access to tools and support. This enables designers to create truly unique devices and applications. More importantly, the adoption of audio as a powerful user interface depends on enabling devices to interact with and understand humans more naturally. This requires years of research and knowledge from the audio community to advance the complex AI and machine learning algorithms that enable natural language understanding.
We’ve already made great strides with low-power software codecs (which enable enhanced audio cost-effectively), active noise cancellation (ANC), beamforming, and machine learning algorithms that recognize sounds such as breaking glass and alarms. Further advances, and the customization that the next generation of devices will require, depend on an open platform and a broad ecosystem of audio partners.
A Sound Investment for the Future
Given these advantages, dedicated audio edge processors represent a very “sound” investment in redefining the audio experience. Smart devices and TVs are changing rapidly, and consumers want more features and functionality, which keeps raising the power and performance demands on these devices. This type of processor addresses the needs of rapidly changing smartphones, wireless earbuds, smart speakers, smart screens and other IoT devices, enabling the continued introduction of next-generation use cases and products.
John Young is senior product line manager for Knowles Intelligent Audio