Cerence Introduces a Breathtaking New Synthetic Voice for Reading the News from Reuters
Cerence, the public company spin-out from Nuance focused on voice and AI technologies for the car, has introduced a new synthetic voice called Cerence Reader. It is based on neural text-to-speech (TTS) and designed to “read the news to drivers on the go,” according to a press release issued this morning.
“Cerence Reader features long-form reading capabilities, including natural pausing and breathing, and automatic prediction of the appropriate reading style and emotional tone based on content, context, and category of news, including current events, sports, or documentary-style pieces. Cerence Reader can also be adopted for additional scenarios beyond news reading, such as audiobooks, language learning and more.”
What Voices We Like
The introduction of emotion and breathing sounds does make it sound more humanlike based on the demonstration audio posted to YouTube. Data from a study conducted by Voicebot, Voices.com, and Pulse Labs in 2019 found that consumers favored human voices over synthetic voices by a margin of 71%.
However, the data also revealed that some people did prefer the synthetic voices (they were not told which were human and which were generated), and several incorrectly identified some synthetic voices as human. How do humans make this mistake? Well, human characteristics such as pronunciation, emotion, and breathing can help a synthetic voice seem more humanlike and fit with the natural preference for human voices. And technology advances such as neural TTS are making synthetic voices more similar to human voices.
Opening Up Access to More Content for Listening
This preference for human voices becomes more intense when the passages of content are longer. Voicebot’s What Consumers Want in Voice App Design study found that for a lengthy passage, 55.4% said it was “too long” or “far too long” to listen to when read by a synthetic voice, while only 37.5% had a similar view of a human reader presenting identical content. These longer content formats are common for news and audiobooks, which has often led publishers to use in-house staff or voice-over artists instead of synthetic speech generation. However, there is far more text content than audio, and it is cost-prohibitive to record it all with humans in the loop. So, having a synthetic speech option that sounds more humanlike opens up audio access to far more content than is available today, and it can be available in real time.
Techniques for Human Acceptance
Voicebot hasn’t tested Cerence Reader, so we cannot assess its performance over a wide range of content. We have also only heard the one female voice in the demonstration. However, it is interesting how the breathing and inflection make it seem more humanlike because we don’t expect that from synthetically generated voices.
Google Duplex’s demonstration comes to mind as causing a similar feeling: something known to be synthetic seemed very humanlike. In that case, the humanlike quality was created largely by the introduction of disfluencies such as “umm,” “ahh,” and “Mm-hmm.” As a result, many people receiving a call from Google Duplex didn’t know it was an AI assistant. Cerence Reader will likely have a similar outcome.
However, the intent is not to mislead but rather to increase user acceptance and adoption of synthetically generated voices for long-form content. That can increase user satisfaction as drivers become more receptive to the synthetic voice, and it lets them listen to a wider range of content than recorded audio alone.
Cerence Reader is launching with support for U.S. English and German and “will source news content from Reuters and others, with additional content partners to come,” according to the announcement.