Picovoice Launches Speech-to-Text Platform ‘On the Edge’
Voice technology platform developer Picovoice has debuted a new real-time speech-to-text platform. The new engine is designed to run offline, without sending data to the cloud. It converts speech into text more quickly and preserves privacy by keeping the entire process on the user’s device.
Cheap and Easy Speech-to-Text
Speech-to-text services are ubiquitous today. Almost all smartphones and other devices with a voice assistant and a screen can turn voice into text for note-taking, transcription, and related tasks. Those services typically record the audio, digitize it, and send the data to cloud servers, where it is processed into text and relayed back to the device. The Picovoice engine, by contrast, performs the same function without ever transmitting data. The entire procedure is local.
“Currently we support Android, iOS, watchOS, Linux, Mac, Windows, [Raspberry Pi Zero], BeagleBone, and web browsers,” Picovoice founder Alireza Kenarsari told Voicebot in an interview. “The RPi Zero and web browser are the most challenging platforms, actually. There is very little CPU power on [an] RPi Zero, and the model needs to be very small to work on a web browser, otherwise, it would adversely affect websites’ load time.”
Picovoice aims to do precisely the opposite of cloud-based services, Kenarsari said. The engine, appropriately called Cheetah, turns speech into text faster without sending information to the cloud, relying instead on the computational resources available in most modern devices. Skipping the cloud also makes it considerably cheaper to transcribe speech at commercial scale. That lower cost has helped convince smart device makers that offer speech-to-text as a feature to add Picovoice’s software to their products.
“[Picovoice is] different from cloud-based speech technology providers,” Kenarsari said. “Our pricing is not usage-based per API call, and hence we can provide 10 times cost reduction to our licensees at scale. We already have dozens of licensees for our IP portfolio and are in fact profitable. We are about to prepare for a venture round in the near future to scale up.”
Kenarsari founded Picovoice in Vancouver, British Columbia, at the end of 2017 after working as a software engineer at Amazon, where he developed a fraud detection platform and other products. The company has previously built similarly lightweight wake word and speech-to-intent engines. To build Picovoice Cheetah, he and his team had to solve the problem of accuracy, which has usually been the stumbling block for companies that wanted to keep speech-to-text services out of the cloud. The deep learning model Picovoice developed is built to rival the accuracy of cloud engines while remaining efficient enough not to strain a smart device’s resources. Kenarsari said this move from the cloud onto the device is the same shift that other web applications are undergoing.
Picovoice benchmarked its engine on the open-source LibriSpeech dataset, using five hours of speech data from 40 different speakers. At the moment, only English is supported, though other languages may be added soon. According to Picovoice’s research, Cheetah is slightly more accurate than cloud-based Google Speech-to-Text and nearly as accurate as Amazon Transcribe. And while it is slightly less accurate than Mozilla DeepSpeech, which also works offline, Picovoice Cheetah uses only a fraction of the computational resources DeepSpeech requires, Kenarsari said.
On the Edge
Audio technology on the edge, which performs its work without using the cloud, is becoming a bigger part of the industry. That is partly due to the potential for more efficient and cheaper services, but it can also be attributed to ballooning privacy concerns over voice technology. After a summer full of reports that the major voice assistants had shared audio recordings with contractors, the idea of keeping what you say away from those companies may prove appealing to consumers.
Privacy and efficiency have driven companies like Aspinity and Knowles to develop technology that listens for voice assistant wake words without sending audio recordings to the cloud. Knowles created the first Amazon-approved Alexa headset development kit, in part because of that efficiency. Aspinity’s system doesn’t even digitize the audio it checks for wake words, keeping the process analog.