Why it Matters That Google and Spotify Have Joined the Personalized Audio Content Revolution – Guest Column

on January 26, 2020 at 6:00 pm

Google and Spotify have signaled their intention to take over a new market: personalized audio for spoken word content. Google is focused today on news and has taken some steps to expand into podcasting to assert more control over another form of media discovery. Spotify is focused on podcasting to help drive discovery and expand its users’ time spent listening beyond music.

Spoken word audio is a huge market that is about to get much bigger. According to Edison Research, “Seventy-three percent of the U.S. population listened to spoken word audio” in the past month. U.S. adults consume more than 250 million hours of spoken word audio (i.e., talk audio) each day through podcasting, radio, online, and streaming services. Nearly three in five of those talk audio listeners are characterized as “digital-first,” meaning they favor digital channels over analog sources like radio. Over the past five years, talk audio’s share of all audio listening has risen 20%, while music’s share has fallen 5%. Google and Spotify have this growth market in their sights.

Source: Edison Research / NPR 2019

However, audio has resisted a digital takeover by the big content aggregators because it is harder than text to analyze and organize. Advances in artificial intelligence (AI) are now opening the door to audio analysis, indexing, and personalized distribution at scale, as we have demonstrated at Audioburst. That personalization will not only help grow the audio market but also present entirely new opportunities for content packaging and monetization.

Personalization Proven to Work for Consumers

Content personalization has been a popular topic for 20 years. There are many examples of personalization delivering benefits for advertising and marketing, but only three models that are truly about the consumer experience have succeeded: social media, streaming video, and streaming music.

For better or worse, social media platforms have figured out how to keep consumers engaged for hours with streams of content based on their personal interests and behaviors. Text, images, and video, both highly produced and user-generated, appear in an endless river of content. The success of Facebook, Snap, Instagram, Pinterest, and others demonstrates the power of personalized content.

Like the social media giants, Netflix has succeeded in capturing user attention for hours at a time, sometimes for entire weekends. This phenomenon is often attributed to binge-worthy serialized programs, but Netflix’s personalization engine is the real story. Data presented in June 2019 revealed that 80% of what is watched on Netflix is driven by recommendations from the personalization engine. YouTube is another exemplar of personalization for video. Neal Mohan, YouTube’s chief product officer, said in 2018 that over 70% of user watch time is driven by AI-based recommendations.

In the audio world, Pandora is widely credited with the most important early work on music personalization based on its Music Genome Project. Indexing music content and then delivering a series of songs aligned with user preferences has become so commonplace that it is now simply accepted as a standard feature across all popular music streaming services.

However, talk audio has long remained tied to the undifferentiated broadcast model of radio and podcasts. Users have been left to rely on their own proactive curation efforts or simply accept what a particular program manager decides is best. That is finally changing.

Personalized Spoken Word or Talk Audio

Audioburst set out four years ago to index audio content but took on a more ambitious goal than Pandora. We were not going to simply index a finite music catalog of tens of millions of songs. Instead, we decided to tackle, in real time, a much larger and continuously generated catalog of talk audio that grows by over seven million minutes each month.

The sheer volume of content makes this a challenging technical problem for anyone providing a personalized audio content feed. For consumers, it is an insurmountable problem: they cannot find the talk audio they want, when they want it, in a digestible form. As a result, a great deal of talk audio that would benefit consumers is never heard because it is effectively undiscoverable.

Google and Spotify have both recently set out to address this problem with their own takes on personalized talk audio. In November 2019, Google announced Your News Update for Google Assistant, which indexes prepackaged news content from 47 audio publishers.

Based on what Google knows about the user from past use of Search and other services, Your News Update serves up four to six news segments ranging from 30 seconds to about two minutes, followed by more in-depth pieces, some exceeding 10 minutes. Google relies on its partners for timely news and broad coverage, so the service is limited to what publishers provide. It is also ephemeral: today’s news is available only today.

Spotify is taking a different approach, one closer to Pandora’s Music Genome Project, with its recently launched Your Daily Podcasts. After users have listened to four podcasts, Spotify begins recommending the next best episodes based on their listening history and podcast type. The result is personalized recommendations drawn from a finite catalog of complete podcast episodes.

Google Assistant’s Your News Update is a prepackaged feed of ephemeral news. Spotify’s Your Daily Podcasts is also prepackaged with a mix of news and evergreen content. Both companies have tremendous consumer reach and will introduce many people to personalized talk audio for the first time. They are likely to establish a new set of consumer expectations and bind their services into another set of daily habits.

Audioburst takes this a step further by offering excerpts of talk audio, called Bursts, that are specific to user interests. Whereas Google and Spotify deliver only full-length segments or episodes, Bursts trim away off-topic and extraneous material to deliver just the core information. Audioburst’s AI extracts the Bursts from segments automatically. Users simply listen to the short excerpts in a news-feed sequence aligned with their personal interests, and if a Burst intrigues them, they can jump into the full talk audio segment to hear more. This makes discovery easier and consumption of a personalized talk audio feed more efficient.

How Will Amazon, Apple, Facebook, and Others Respond?

Amazon provides a daily-news-style feature on Alexa called Flash Briefings. It plays the latest content from a user-curated list of publishers in a predetermined sequence. It is not personalized based on behavior or interests, nor does it account for top trending news items. The user simply receives what the selected publishers put out that day, and in most cases the content is updated only once per day.

According to Amazon, Flash Briefings are a popular feature, but the format seems a bit quaint compared to Google’s new Your News Update. The question for Amazon is how Alexa will compete with Google Assistant on timely, personalized news updates that augment Flash Briefings.

Apple faces a similar challenge. Apple offers a personalized news feed, but only for text and video content on iPhones. Siri cannot deliver a stream of personalized talk audio content today, nor does the Apple Podcasts app provide any meaningful personalization. The lack of personalization and quality discovery features puts Apple Podcasts’ dominant position with listeners at risk.

Facebook is even further behind than Apple in talk audio. When the new Facebook assistant does come to market, will it only have content similar to its current feed of text, images, and video? Facebook’s business is built on attention. It has largely reached the limit of the attention it can capture through users’ visual focus. However, it could expand user attention on Facebook, Instagram, and WhatsApp by providing automated audio content personalization.

The Rise of Talk Audio as a Consumer Expectation

Of course, as audio content discovery and personalization become consumer expectations, this feature gap will also become problematic for other companies. Samsung has ambitions for Bixby to move from a productivity assistant to a media content guide, while automakers want to offer great in-car experiences without becoming more dependent on services from the tech giants.

Personalized talk audio is a growth market that has wider implications for the jockeying among tech giants now that Google and Spotify are actively involved. Many companies will soon face a dilemma about how to best meet changing consumer expectations about talk audio personalization without ceding the experience and data once again to Google or providing a vector for Spotify to expand its growing consumer influence.

Amir Hirsh is co-founder and CEO of Audioburst. He is the former head of R&D for Riptech (acquired by Symantec) and President of Collactive (acquired by IAI).
