The Move Towards On-Device Assistants for Performance and Privacy
Voice assistants are growing in both popularity and capability. They are arriving in our homes, cars, and mobile devices, and they now seem to be a standard part of American culture, entering our TV shows, movies, music, and Super Bowl ads. However, this popularity is accompanied by a persistent concern over our privacy and the safety of our personal data when these devices are always listening and always watching.
There is significant distrust of big companies like Facebook, Google, Apple, and Amazon. Facebook and Google have admitted to misusing our private data, and Apple and Amazon have admitted that system failures have led to a loss of private data.
So naturally, there is an advantage in not sending our voices or videos into the cloud and instead doing the processing on-device. Then no voice or video data is at risk of being lost. Cloud-based queries could still occur, but through anonymized text only.
Computing at the Edge Versus the Cloud
There are forces bringing us closer to edge-based assistants and there are other forces leading to data going through the cloud. Here are a few ideas to consider.
- Power and Memory. There is no doubt that cloud-based solutions offer more power and memory, and deep learning approaches can certainly take advantage of those features. However, access speed and available bandwidth are often issues, giving an edge to working on-device. Current state-of-the-art deep net modeling can allow limited-domain natural language engines that require substantially less memory and MIPS than general-purpose models, making natural language on-device realistic today. Furthermore, powerful on-device voice experiences are increasingly realistic as we pack more and more memory and MIPS into smaller and cheaper packages. New chip architectures targeting deep learning methodologies can also lead to on-device breakthroughs, and these designs are now hitting the market.
- Accuracy. Although power and memory may be key factors in influencing accuracy, an on-device assistant may be able to take advantage of sensor and usage data and other embedded information not available to the cloud-based assistant so that it can better adapt to users and their preferences.
- Privacy. Not sending data to the cloud is more private.
Some have argued that we have carried microphones and cameras around with us for years without any issues, but I see this thinking as flawed. Just recently, Apple admitted to a FaceTime bug on mobile phones that enabled “eavesdropping” on others.
Also, if my phone is listening for a wake word, it’s a very different technology model than an IoT device that’s “always on.” Phones are usually designed to listen in arm’s-length situations of 2 or 3 feet. An IoT speaker is designed to listen at up to 20 feet! If we assume constant noise across a room that could make an assistant “false fire” and start listening, then we can think of two listening circles, one with a radius of 3 feet and one with a radius of 20 feet, to compare the listening area of the phone with that of a far-field IoT device such as a smart speaker. With area equal to πr², the phone has a listening area of 9π square feet and the IoT device has a listening area of 400π square feet. So, all else being equal, the IoT device is about 44 times (400/9) more likely to false fire and start listening when it wasn’t intended to.
As cloud-based far-field assistants enter the home, there is a definite risk of our private data getting intercepted. It’s not just machine errors but human errors too, like the Amazon employee who accidentally sent the wrong data to a person who had requested their own.
There are also other ways in which we can lose our cloud-connected private data, like the “dolphin attack,” in which inaudible ultrasonic commands can let outsiders take control of an assistant and listen in.
- The will of Amazon, Google, Apple, government, and others. We should not underestimate the market power and persuasiveness of these tech giants. They want to open our wallets, and the best way to do that is to present us with things we want to buy, whether food, shelter, gifts, or whatever. Amazon is pretty good at selling us stuff. Google is pretty good at making money connecting people with things they want and showing them ads. User data makes all of these things easier and more effective. More effective means they make more money showing us ads and selling us stuff. I suspect that most of these giant players will have strong incentives to keep our assistants and our data flowing into the cloud. Of course, tempering this will are the various government agencies trying to protect consumer privacy. Europe has launched GDPR (ironically leading to the Amazon accident mentioned above!), which could provide some disincentives around using cloud-based services.
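As a quick check on the listening-area comparison in the privacy discussion above, here is a minimal Python sketch. It makes the same simplifying assumptions as the text: noise is uniform across the room, and each device listens over a circle with a radius of 3 feet (phone) or 20 feet (far-field speaker). The function name listening_area is just an illustrative choice.

```python
import math


def listening_area(radius_ft: float) -> float:
    """Area in square feet of a circular listening zone with the given radius."""
    return math.pi * radius_ft ** 2


# Radii taken from the comparison in the text.
phone_area = listening_area(3.0)     # 9 * pi square feet
speaker_area = listening_area(20.0)  # 400 * pi square feet

# Under the uniform-noise assumption, the chance of a false fire
# scales with the area the device is listening over.
ratio = speaker_area / phone_area

print(f"Phone listening area:   {phone_area:.1f} sq ft")
print(f"Speaker listening area: {speaker_area:.1f} sq ft")
print(f"Speaker-to-phone ratio: {ratio:.1f}")
```

The ratio works out to 400/9, or roughly 44. Because π cancels, the comparison depends only on the square of the two radii, so even a modest increase in listening range increases the exposure considerably.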
On-Device Voice Assistants Will Become More Common
My conclusion is that there is a lot of opportunity in bringing assistants onto devices. Doing so can not only protect privacy but, through adaptation and domain limitation, also create a better-customized user experience. I predict that more and more products will use on-device voice control and assistants! Of course, I also predict that more and more devices will use cloud assistants. What wins out in the long run will probably depend more on government legislation and individual privacy concerns than on anything else.