The Cambrian Explosion of Audio Content – Part 2
In last week’s column, I wrote about how we’re in the midst of a Cambrian Explosion of audio content through the lens of the professional publisher’s financial incentive. The publisher’s motivation stems from the increasing number of devices entering the market that are built for audio consumption: smart speakers, connected cars, and hearables. Audio’s reach has never been bigger.
The Importance of Simplified Content Creation
As I mentioned last week, it was mobile computing that really gave rise to the internet attention economy because it allowed for quick and easy access to each company’s content in exchange for our attention. The other change that mobile introduced with the attention economy was the democratization of content creation. Content creation tools became so simple to use (image filters, video editing options built into the apps, dissemination tools, etc.) that anyone with a smartphone could generate fairly high-quality content and immediately upload it to whichever sites they pleased. This was dramatically different from any earlier medium, where high fixed costs created barriers to entry for producing and distributing content and typically limited content production to well-financed professionals and incumbent networks.
Today, if a company is aiming to capture and monetize attention, it can arm its users with cheap and easy-to-use tools and have them generate the content that captures fellow users’ attention. This democratization of production is evident in offerings such as Instagram Stories and Facebook Live, and in YouTubers racking up millions of views from their basements. These are all free tools for people to create, publish, and share their own unique content, and they have eradicated traditional barriers to entry.
Audio Tools and Access
Spotify announced last week that it had made two acquisitions in the podcasting space: Gimlet Media and Anchor. While the Gimlet acquisition signals Spotify’s move into publishing its own content, Anchor is a toolset that is equally interesting and provides insight into Spotify’s future. Anchor provides a full visual interface that allows podcasters to create and share their audio content seamlessly from the app. Anchor also powers more than 40% of the estimated 550,000 active podcasts that are currently available for download, so it’s one of the tools most widely used by podcast creators.
The other interesting aspect of Anchor is its monetization program, Anchor Sponsorships. Through this program, Anchor is able to connect brand advertising with smaller podcasts. The acquisition means Spotify can now act as an advertising platform similar to Facebook, in that it can enable brands to hyper-target the types of podcasts on which they want their ads to reach Spotify’s 200 million listeners. It’s estimated that podcast advertising spend will reach $700 million by 2020, so an ad-based tool within Spotify makes a lot of sense. This should also bring more advertisers into the fold, which ultimately allows more creators to get paid.
The Rise of Specialty Tools for Audio Content Creation
While podcasting represents long-form content, micro-content such as Flash Briefings is sure to be a part of this explosion in new content as well. The voice-centric company Witlingo has recently rolled out a tool called Castlingo, which allows users to generate a Flash Briefing up to 77 seconds long just by clicking a record button.
It took only eight weeks for Bret Kinsella’s Castlingo-powered daily Flash Briefing, “Voicebot Daily,” and the Alexa Skill and Google Action both named “Voicebot Says,” to rack up 10,000 sessions, displaying the full utility of a tool like Castlingo. Voicebot has a lot of content and a long-form weekly podcast. However, Mr. Kinsella tells me he couldn’t spare the time to create a daily audio news summary until he had a tool that let him complete the process in less than five minutes per day and publish to multiple channels simultaneously. This is an example of user-generated content that would not exist today without a tool that simplified creation and distribution.
Cheap and easy-to-use tools are critical to growing any type of content ecosystem. These types of tools tend to supercharge the ecosystem, as they enable more people to contribute and generate content, and they give high-quality content creators easier ways to monetize their efforts.
Smarter Discovery is Required
There’s one last component needed to take advantage of this Cambrian Explosion of audio content, and that’s curation. All of this content is only as good as our ability to parse through what’s out there and connect with what we’d each find interesting. Otherwise, we’re going to hit “peak audio.” As Steven Goldstein, CEO of Amplifi Media, mentioned at the Alexa Conference, podcasting may very well be on the same path as blogging a decade ago, where “everyone had a blog, but over time, they didn’t.” Goldstein estimates that 75% of the 550,000 podcasts are no longer in production.
This is the same type of paradox that we’re facing more broadly with smart assistants. Yes, we can access 80,000 Alexa skills, but how many people use more than a handful? It’s not a matter of utility but of discoverability; therefore, we need our smart assistants’ help. The answer to the problem would seem to lie in a personalized smart assistant that has a contextual understanding of what we want. In the context of audio consumption, smart assistants would need to learn what we like from our listening habits and behavior, based on the context they can infer from the available data. These data points would include signals such as the time (work hours; post-work hours), geo-location (airport; office), peer behavior (what our friends are listening to), past listening habits, and any other information our assistants can glean from our behavior.
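To make the idea of context-aware curation a little more concrete, here is a minimal sketch of how those signals might be combined into a ranking. Everything in it is hypothetical: the `Context` and `Episode` classes, the `score` and `recommend` functions, and the weights are illustrative assumptions, not how any real smart assistant works.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    """What the assistant knows about the listener right now (hypothetical)."""
    time_of_day: str  # e.g. "work" or "post-work"
    location: str     # e.g. "office" or "airport"


@dataclass
class Episode:
    """A piece of audio content with hand-labeled metadata (hypothetical)."""
    title: str
    topic: str
    good_times: set = field(default_factory=set)   # times of day it suits
    good_places: set = field(default_factory=set)  # locations it suits


def score(episode, context, past_topics, friend_topics):
    """Combine the signals into one number. The weights are arbitrary assumptions."""
    total = 0.0
    if context.time_of_day in episode.good_times:
        total += 1.0  # fits the current time of day
    if context.location in episode.good_places:
        total += 1.0  # fits the current location
    # Past listening habits: share of the listener's history on this topic.
    total += 2.0 * past_topics.count(episode.topic) / max(len(past_topics), 1)
    if episode.topic in friend_topics:
        total += 0.5  # peer behavior: friends listen to this topic
    return total


def recommend(episodes, context, past_topics, friend_topics, top_n=3):
    """Return the top_n episodes, highest score first."""
    ranked = sorted(
        episodes,
        key=lambda e: score(e, context, past_topics, friend_topics),
        reverse=True,
    )
    return ranked[:top_n]
```

A real assistant would presumably learn these weights from behavior rather than hard-code them, but the sketch shows how time, place, peers, and history could each nudge what gets surfaced.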
The companies that monetized our attention over the last decade found ways to effectively curate our timelines (Twitter, Facebook and Instagram) and create compelling recommendation engines to keep us from devoting our attention elsewhere (YouTube). The same can be done with audio content, but the circumstances are different this time around as there’s often no visual aid to lean on. Therefore, the winners of the attention economy for our ears will likely be the ones who can best optimize the content for discovery by a smart assistant, machine learning, or some other intelligent curation tool that understands us well enough to deliver content we like.
Over the last decade, entirely new companies were made possible as smartphones became ubiquitous. Uber, Lyft, and Postmates could leverage our GPS to connect drivers to users. Snapchat and Instagram flourished because we could capture images and video and share them seamlessly with our smartphones. Venmo and Cash App allowed for easy peer-to-peer payment. Lime and Bird use a combination of GPS and QR readers to unlock and lock scooters and bikes. The paradigm shift to mobile unlocked entirely new business opportunities by providing new building blocks and tools for companies to mix together to provide new services and offerings.
As the paradigm shifts again toward a more “Voice First” future, we have a new set of building blocks beginning to emerge that will enable new companies and offerings. Spotify’s Gimlet acquisition signals that there is demand for high-quality audio content, which should encourage more premium audio-based storytelling. The Anchor acquisition fosters growth on the supply side of audio content by centralizing the podcasting industry’s advertising around Spotify and its large user base, which should mean better monetization for podcasters and more targeted advertising for brands. Smart assistants can be utilized to learn from our behaviors and better surface the type of content we’d enjoy. Finally, there are more ways to access audio than ever, with smart speakers, connected cars, and hearables. The supply, the demand, the ways the content can be curated, and the ways we access the content have all evolved to the point where they can be mixed together to create an explosion in content built for this new voice era in computing.