Sensory is Enabling Offline Smart Speakers with No Cloud Connectivity to Maximize Security
Sensory announced at CES that it can now enable a smart speaker that does not require WiFi, internet or cloud access to operate. The Voice Genie Smart Speaker reference design offers customized wake words enabling device makers to differentiate their speakers with a branded trigger word or phrase. In addition, the embedded natural language processing technology enables voice interaction without the privacy concerns some consumers have about the cloud-based speech recognition systems that are common in today’s leading smart speakers. Todd Mozer, CEO of Sensory, said in an interview for the Voicebot Podcast:
“We’ve been told by a lot of our customers that they don’t really want to have an Alexa branded speaker or Google. They want their own brand. We have now enabled the technology so that they can use their own brand and they can do it in a way that it’s embedded. Nothing goes off to the cloud. So, you can do speaker control with your voice with nothing being recorded, nothing going off to the cloud. We can do a live recognition of what’s said and you can control the volume or set your timers or some of the most popular functions that people do with their smart speakers.”
Why Would You Want a Cloud-free Smart Speaker?
Many consumers are aware that Alexa, Google Assistant, and Siri all run their natural language processing in the cloud and then return information or entertainment, or control smart home devices, via the internet. That round trip to a cloud-based server farm raises concerns among some consumers that their privacy may be compromised. However, many consumers are also drawn to the convenience of smart speakers because the far-field microphones enable you to control music hands-free from across the room. Sensory’s solution is a smart speaker where all of the natural language processing and features are controlled on the device without the need for cloud connectivity and therefore without concern for an online privacy compromise. The trade-off is the breadth of features. Mozer added:
“We can actually deliver a higher quality solution with less features…We’re not trying to do a completely open domain thing where I can ask my smart speaker, ‘What’s the weather in Thailand?’ I can ask some simple local functions and it’s the kind of functions that people really like to do. If I want to know the weather in Thailand, then I can pick up my phone…It’s real simple functions…in this smart speaker example. It’s things like volume control. It’s pairing with your phone. It’s next song and pause music. It’s setting a timer. It’s asking what time it is.
“We don’t need a special app to be put on the phone. All of the technology will reside on the product itself. You will just need to pair it with your phone. Standard Bluetooth protocols like AVRCP will allow the Bluetooth speaker to talk to your phone with control language not with human speech. So, human speech doesn’t ever get recorded.”
Basic Functions on the Device, Short Range Bluetooth for Audio Content
If the natural language processing unit is not trying to understand all speech but instead is limited to a few commands, the processing can all be done on the device. This is what Sensory does with its wake word and other embedded speech and biometric technologies. The company has been around for more than 20 years and its technology has shipped in more than three billion devices. At CES 2019, more than 100 products were on display using Sensory’s embedded software ranging from LG phones and Samsung appliances to GoPro cameras, and the new UEI Butler universal remote control. Sensory even provides speech recognition software components to some of the leaders in the field such as Amazon and Google.
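To make the closed-domain idea concrete, here is a minimal sketch of how a small, fixed command set could be matched entirely on the device. It is not Sensory’s implementation: the phrase list, the `interpret` function, and the action names are hypothetical, chosen to mirror the functions Mozer lists (volume, next song, pause, timers, time of day). Actions tagged "phone" stand in for the kind of control commands a Bluetooth profile such as AVRCP would carry to the paired phone; actions tagged "local" would be handled on the speaker itself.

```python
import re
from typing import Optional, Tuple

# Hypothetical closed-domain grammar: phrase -> (target, action).
# "phone" = control command sent to the paired phone (AVRCP-style);
# "local" = handled on the speaker with no phone or network involved.
FIXED_COMMANDS = {
    "next song": ("phone", "NEXT_TRACK"),
    "pause music": ("phone", "PAUSE"),
    "volume up": ("local", "VOLUME_UP"),
    "volume down": ("local", "VOLUME_DOWN"),
    "what time is it": ("local", "TELL_TIME"),
}

# One parameterized command, to show that a small grammar can carry arguments.
TIMER_PATTERN = re.compile(r"set a timer for (\d+) minutes?")


def interpret(transcript: str) -> Optional[Tuple[str, str, Optional[int]]]:
    """Map recognized text to (target, action, argument), or None if out of domain."""
    # Normalize case, whitespace, and trailing punctuation.
    normalized = " ".join(transcript.lower().split()).strip(" ?!.")

    if normalized in FIXED_COMMANDS:
        target, action = FIXED_COMMANDS[normalized]
        return (target, action, None)

    match = TIMER_PATTERN.fullmatch(normalized)
    if match:
        # Argument is the timer length in seconds.
        return ("local", "SET_TIMER", int(match.group(1)) * 60)

    # Anything else (for example, a weather question) is simply not handled.
    return None
```

In this sketch, only the short action token would ever leave the speaker, and only for "phone" commands. An open-domain request such as "What’s the weather in Thailand?" returns `None`, which reflects the trade-off in breadth of features described above.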
The Voice Genie Smart Speaker reference design and software offer manufacturers the core elements of an offline smart speaker that can perform basic voice interactive functions either on the device or by pairing with a nearby smartphone. Leveraging OS-level controls in the smartphone that are Bluetooth accessible enables users to send music, podcasts, and other content to the speaker, while the device interprets voice commands and sends control commands back to the smartphone to manage playback. Brian Roemmele, the author of the forthcoming book “The Last Interface,” says the approach promoted by Sensory is needed:
“This is a very important extension of the voice-first technology. There are many cases where this makes a great deal of sense as over 59% of US smartphone users have purchased or have local music files on their phones and it would make sense to access this via off-line voice-first technology.”
Sensory’s customers for this technology are device makers who market and sell end products to consumers. Mozer tells Voicebot that the technology development was driven by one of the company’s existing customers and that he expects offline smart speakers will be available for demonstration and sale at CES 2020.