Custom AI-based Voices Offer a Distinctive Sound for People and Brands and Are Growing Fast

on June 23, 2019 at 6:09 pm

Darwinian forces of natural selection favor uniqueness – from birds to people to corporations. Enterprises capture this uniqueness in their brand: a distinctive logo, a preferred font, a color palette, and now more than ever, a vocal identity. With over 2.5 billion voice-enabled devices in use globally and 8 billion projected by 2023, the demand for unique voice is skyrocketing. Corporations want to be heard and listeners want variety.

Today, enterprises often hire a voice-over artist ($4.4 billion global market) to express their brand. Voice talents, however, are ephemeral – their voices change with age and health, they are not always available, and they can defect to a competitor (e.g. Sprint’s poaching of Paul Marcarelli from Verizon). When the volume and velocity of content updates make voice talent cost-prohibitive, companies have relied on generic text-to-speech (TTS; $3 billion global market by 2022) at the expense of brand differentiation. Moreover, relying on the voice infrastructure of large tech companies such as Amazon, Google, Facebook, Apple, and Microsoft means that they now know your business. In the voice-first era, companies that compromise on unique voice simply will not be heard over their competition, and that’s not good for business.

Custom voices offer the flexibility of synthetic speech and the personalization of human talent. A custom vocal persona allows companies to efficiently deliver high-volume, rapidly changing, and contextually relevant content without compromising brand consistency. Custom voices also open up new markets for talent and influencers and provide creative tools for marketing and content delivery. With these exciting opportunities, it’s no surprise that the voice customization market is already $456 million and expected to reach $1.7 billion by 2023 at a compound annual growth rate (CAGR) of 30.7%.
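To make the growth arithmetic concrete, here is a minimal sketch of how the three figures relate. The article does not state the base year for the $456 million figure; the five-year horizon (2018 to 2023) used below is an assumption, and the function names are illustrative only.

```python
def project(start_value: float, rate: float, years: float) -> float:
    """Value after compounding `rate` (as a fraction) once per year for `years` years."""
    return start_value * (1.0 + rate) ** years


def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate, as a fraction, implied by two endpoints."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Figures quoted in the article, in millions of US dollars.
MARKET_NOW = 456.0
MARKET_2023 = 1700.0
QUOTED_CAGR = 0.307

# ASSUMPTION: five compounding years (base year 2018). Not stated in the article.
YEARS = 5

projected = project(MARKET_NOW, QUOTED_CAGR, YEARS)
implied = cagr(MARKET_NOW, MARKET_2023, YEARS)

print(f"Projected 2023 market at {QUOTED_CAGR:.1%}: ${projected:,.0f} million")
print(f"CAGR implied by the two endpoints: {implied:.1%}")
```

Under that five-year assumption, compounding $456 million at 30.7% gives roughly $1.74 billion, and the two endpoints imply a rate of about 30%, so the quoted figures are consistent with each other after rounding.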

From individuals to brands

Unlike conventional methods of synthetic voice creation, VocaliD’s award-winning technology generates high-quality, natural-sounding voices within hours, not months. We leverage cutting-edge machine learning techniques, proprietary voice blending algorithms, and our crowdsourced Voicebank dataset to enable brands and individuals to be heard in a voice that is uniquely theirs.

Based on the discovery that a single vowel contains enough “vocal DNA” to seed the voice personalization process, the company was founded with a strong social mission to create the first-ever custom voices for those living with speechlessness. People who cannot produce audible speech typically communicate through a common set of synthetic voices that all sound identical. VocaliD set out to change that.

Our algorithms combine recipient vocalizations with recordings of matched speaker(s) from the Voicebank to generate unique text-to-speech (TTS) voices. This breakthrough has been life-changing for hundreds of individuals living with speechlessness. Learnings from this mission-focused market have been instrumental in building a scalable, efficient, and robust technological infrastructure. The Voicebank dataset is leveraged to build universal base models for speaker adaptation, making it possible to generate high-quality voices from minimal data in a time- and resource-efficient manner.
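The speaker-matching step can be pictured with a small sketch. This is not VocaliD’s proprietary algorithm, which the article does not describe; it only illustrates the general idea of picking the closest Voicebank donor for a recipient. It assumes each voice has already been reduced to a numeric feature vector, and the feature values, donor names, and the choice of cosine similarity are all hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def match_donors(recipient: list[float],
                 voicebank: dict[str, list[float]],
                 k: int = 1) -> list[str]:
    """Return the names of the k donors most similar to the recipient."""
    ranked = sorted(
        voicebank,
        key=lambda name: cosine_similarity(recipient, voicebank[name]),
        reverse=True,
    )
    return ranked[:k]


# HYPOTHETICAL data: made-up feature vectors standing in for vocal characteristics.
recipient_features = [1.0, 0.0, 0.5]
voicebank_features = {
    "donor_a": [0.0, 1.0, 0.0],
    "donor_b": [0.9, 0.1, 0.5],
    "donor_c": [0.5, 0.5, 0.5],
}

best = match_donors(recipient_features, voicebank_features, k=2)
print(best)
```

In a real system the matched recordings would then feed the voice-building stage; here the sketch stops at ranking donors by similarity.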

A design thinking approach

Rather than focusing solely on the technical challenges associated with creating a custom voice, we have learned the importance of understanding our customers’ perceptions and goals. In the initial design phase, we map out the audience, the emotions and impressions the voice should evoke, and the success metrics. Until now, the primary function of synthetic voices has been to transmit information. However, voices also convey demographic information such as gender, age, and language(s) spoken, and are associated with numerous social and cultural indices. We are now seeing a proliferation of research and commercial activity aimed at mining voice as a biomarker of health and well-being. Voice is an instrument of social connection, and learning to use it effectively can drive adoption, inspire delight, motivate, and open up new channels of interaction in the voice-first era.

Communication is more than just the words – engaging through voice can foster loyalty and deepen trust. Studies such as Accenture’s 2018 report, The Bottom Line on Trust, show that an increase or decrease in consumer trust can significantly impact company revenue, by as much as 21% in some industries. When you consider the financial impact, relatability and trustworthiness should be major factors in deciding upon and designing the voice of a brand.

Transformations are opportunities not existential threats

One of the biggest misconceptions about AI is that it will displace workers. The truth is that AI is just the next wave in our march of advancement. While synthetic voice may appear a threat to voice talent, it is, in fact, a tool to enhance their capabilities. Once a custom voice is built, it can fulfill the mundane tasks and return royalties without additional work. This leaves more time to focus on the expressive, meaningful and creative roles that cannot be relegated to machines.

There are seismic transformations that are going to come from voice AI – from the democratization of the voices we hear in the world around us, to sparking social and creative art forms, to creating new jobs focused on writing for voice, to launching products that unite us rather than divide us by celebrating our diversity. Today’s advances in machine learning, AI, and crowdsourced data, coupled with a design thinking approach, allow us to imagine a future of voice tech that isn’t merely replicating humans but creating novel sound experiences that do not exist in our universe today. What if we can create truly expressive voices that delight, engage, and allow us to enhance our current communication capabilities? And if these new vocal personas can be distinguished as AI, so that our virtual reality doesn’t blur our lived reality but actually augments it – we believe that’s a moonshot worth pursuing.

Rupal Patel is CEO and Founder of VocaliD.
