The Voice Experience with Spotify, Pandora and Amazon Music
At Vocalize.ai, we predict that a company-branded voice user interface (VUI) will soon be a must-have feature for almost all mobile phone apps. We already see music streaming services leading the way as apps evolve toward multimodal, voice-enabled experiences. Amazon brought voice control to the Amazon Music app back in 2017 and added the Alexa wake word in 2018. Also in 2018, Spotify enabled Spotify Voice. More recently, in 2019, Pandora introduced Voice Mode. All of this makes a lot of sense because reports have shown that smart speaker owners listen to more music and listen for longer. No Alexa, I don’t need you to ask Spotify to skip this song. I will speak to Spotify about it directly!
Vocalize.ai set out to compare the voice user experience (VUX) across some of the most popular music streaming apps. Adding voice to any app is a tricky task. This is especially true in a category as broad and subjective as music, which typically involves a database with millions of songs, albums, genres, moods, etc.
Comparing Voice Performance on Four Leading iOS Music Streaming Apps
The goal is to compare the voice user experience across several popular music apps. We used an iPhone XR running the latest version of iOS. The iPhone's default/native music app is Apple Music, with voice control by Siri. The third-party apps evaluated are Amazon Music, Spotify, and Pandora. Each app supports a voice mode, and each was tested with a premium/unlimited account. It is important to note that Pandora's voice feature is still in beta and not yet readily available to the general public.
Evaluating Navigation, Playback, and Volume Control Capabilities
Vocalize.ai created a music-specific dataset with over 900 utterances. For this initial evaluation, we chose a subset of roughly 230 utterances curated to focus on navigation, playback control, and volume control. Examples of each are provided below.
- Music Navigation: 136 utterances across genre, mood, artist, album, track
  - Play Imagine
  - Play music for working out
  - Shuffle songs by Drake
- Playback Control: 44 utterances for starting, stopping and adjusting playback
  - Skip this song
  - Pause/Resume the music
- Volume Control: 49 utterances focused on adjusting the volume
  - Raise the volume
  - Turn it up
Apple’s iOS Advantage Sets the Benchmark
Of course, Apple has home-field advantage on an iPhone. Accordingly, Apple Music scored highest, with nearly perfect results across all categories. This is to be expected given Apple's tight integration of hardware and software, as well as its long history with music (iPod, iTunes, Beats, etc.). It is also safe to assume that Apple engineers have access to parts of iOS that are off-limits to third parties. In any event, this serves as a useful voice user experience benchmark for the third-party apps. The table below shows that the voice bar is set high by Siri.
Table: Apple Benchmark Scores (columns: Apple Music with Siri, Score)
Spotify Lags Peers, Pandora Marks a Strong Debut
Among the third-party apps, Pandora was the most accurate, but it still has some room for improvement to catch up with Siri's performance. Pandora delivered passing results on 76% of the utterances. Amazon came in second at 66% and Spotify was last at 56%. It may seem counterintuitive that the newest entrant to voice, Pandora, outperformed Amazon and Spotify, but digging a bit deeper reveals how this happened.
The above graph reveals an overall spread of 36 percentage points between the best (Apple) and the worst (Spotify). This indicates that early users can expect great variability and potential frustration when using voice commands with their music streaming service.
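The spread quoted above is a difference in percentage points, not a relative change. Here is a minimal sketch of the arithmetic, using the three third-party pass rates reported in this article. The Apple figure of 92% is not stated directly; it is inferred here from the 36-point spread, so treat it as an assumption.

```python
# Overall pass rates in percent. The third-party values are quoted in the article.
# The Apple value is an assumption: Spotify's 56 plus the reported 36-point spread.
pass_rates = {
    "Apple Music": 92,  # assumed, not stated directly
    "Pandora": 76,
    "Amazon Music": 66,
    "Spotify": 56,
}

# Pick the apps with the highest and lowest overall pass rate.
best = max(pass_rates, key=pass_rates.get)
worst = min(pass_rates, key=pass_rates.get)

# Spread expressed in percentage points (a simple difference).
spread_points = pass_rates[best] - pass_rates[worst]

print(f"{best} vs {worst}: {spread_points} percentage points")
```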
Turn up the Volume, Please
When the results are broken down by category, a clearer picture emerges. The most eye-popping finding is that neither Amazon Music nor Spotify supports volume control with voice commands. Initially, we thought this might be a limitation of iOS, but Pandora demonstrated that it is possible for a third party to control the volume. This is also why Pandora skewed much higher in overall accuracy.
Another trend revealed is that Spotify does not outscore the competition in any of the three categories. Amazon and Pandora score higher than Spotify in navigation and playback. Pandora again beats Spotify with volume control support.
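To see how much a missing category can weigh on the overall result, here is a rough sketch. It assumes the overall score is simply passed utterances divided by total utterances, which this article does not confirm, and it uses the per-category utterance counts listed in the test-set description. The function name `max_overall_without` is hypothetical.

```python
# Utterance counts per category, as listed in the test-set description.
CATEGORY_COUNTS = {"navigation": 136, "playback": 44, "volume": 49}


def max_overall_without(unsupported, counts=CATEGORY_COUNTS):
    """Best possible overall pass rate (percent) if every utterance in one
    category fails and every other utterance passes.

    Assumption: overall score = passed utterances / total utterances.
    """
    total = sum(counts.values())
    passed = total - counts[unsupported]
    return 100.0 * passed / total


ceiling = max_overall_without("volume")
print(f"Ceiling without volume control: {ceiling:.1f}%")
```

Under these assumptions, an app with no volume support cannot exceed roughly 79% overall, however well it handles navigation and playback.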
Building a Frictionless Music Experience with Voice
First and foremost, Amazon and Spotify need to implement volume control ASAP. Its absence is a huge miss for a music voice experience. “Can’t control the volume” is no longer an acceptable answer. Using voice should be as simple as: help me find what I want, play what I want…turn it up!
navigation + playback + volume = frictionless listening experience
Second, the most common failure mode was selecting a specific song when the user had requested a genre or playlist. This failure was observed across all apps. For example, the request “Play party mood music” causes Amazon Music to start the obscure track “Party Mood” by Jaye Hammer. A more relevant response would be selecting an existing station or playlist, such as the “Forever Fun” playlist or the “Dance Favorites” station. I completely understand that this is subjective and musical tastes vary, but I am pretty sure no one wants to hear Jaye Hammer at their party (sorry Jaye). Plus, the request was “Party Mood music” and not “Party Mood song.” The natural language understanding (NLU) should be able to make the distinction. Sorry to repeat myself, but this happened time and time again across all apps.
Third, Siri never seems to get good press when compared to other virtual assistants like Alexa and Google Assistant. It always seems like Siri scores lower on answering questions and providing information. However, music is where Siri really shines. This may also help explain why the HomePod has been marketed as a ‘music first’ smart speaker. For a company considering adding voice to its music streaming service, Siri's performance is a good benchmark to start from.
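The “music” versus “song” distinction can be illustrated with a toy rule. This is only a sketch of the idea, not how any of the tested apps work: the function `guess_target`, the cue-word lists, and the return labels are all hypothetical, and a production NLU would more likely rely on a trained model than on keyword matching.

```python
import re

# Hypothetical cue words; a real system would learn these from data.
COLLECTION_CUES = {"music", "playlist", "station", "mix", "radio"}
TRACK_CUES = {"song", "track"}


def guess_target(utterance):
    """Guess whether a request wants a collection (playlist/station) or a single track."""
    words = set(re.findall(r"[a-z]+", utterance.lower()))
    if words & TRACK_CUES:
        return "track"
    if words & COLLECTION_CUES:
        return "collection"
    # No cue word (e.g. "Play Imagine"): this toy rule cannot decide.
    return "unknown"


for request in ("Play party mood music", "Play the party mood song", "Play Imagine"):
    print(request, "->", guess_target(request))
```

Even a rule this simple would route “Play party mood music” toward a playlist or station rather than a track that happens to be titled “Party Mood.”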
About the Author: Joe Murphy is founder and CEO of Vocalize.ai. The company was created to help provide an excellent user experience with voice and specializes in evaluating voice interfaces by answering two basic questions: 1) Can you hear me? 2) Can you understand me? You can reach Joe to learn more about competitive benchmarking and how to improve voice experiences at firstname.lastname@example.org.