
How Livestreaming Video Games on Twitch is Building an Audio Database for AI Training

Researchers at Carnegie Mellon University are building an open-source database of sounds in people’s homes using a video game they designed to be played on the popular livestreaming platform Twitch. The Rolling Rhapsody game the team developed is played by volunteers who donate the sounds recorded from their homes through an app, building up a database that can be used to train artificial intelligence programs in the future.

Game On

Any audio-based smart device, especially one used in the home, could benefit from being better able to identify and filter out irrelevant noises. Smart security devices and child-monitoring tech could benefit from refining their ability to distinguish regular house noises from those that require an alarm. That’s in addition to helping cure the plague of accidental awakenings that smart speaker owners deal with. A recent study found more than a thousand words and phrases that might alert a voice assistant unintentionally, for instance. Developers are continually rolling out partial solutions, including upping Alexa security requirements and allowing Google Assistant users to adjust wake word sensitivity.

To gather the sounds that could alleviate some of those problems, the researchers designed Rolling Rhapsody. The game is not complicated. Players roll a ball around the screen, collecting treasure from a pirate base as they go, with each level offering a more complex map and more obstacles between the player and the treasure. Before playing, users agree to upload sounds through an iOS or Android app, although they have the chance to delete any recording that may contain personal information. During the game, sounds from around the player are picked up and recorded for the database. The player and anyone watching the stream hear the recorded sounds whenever treasure is collected. For now, players have to label the sounds manually, which leaves plenty of room for user error. Still, the database is growing, thanks partly to in-game rewards for players who contribute a lot of audio or any unusual sounds. Every noise in the house is relevant for comprehensive AI training, according to the researchers, whether it is a door closing, someone sighing, a dishwasher starting its cycle, or even the creak of a chair.

“This data could be used to create extremely useful technologies,” CMU Professor Jessica Hammer, part of the Polyphonic faculty, explained in a statement. “For example, if AI can detect a loud thud coming from my daughter’s room, it could wake me up. It can notify me if my dryer sounds different and I need to change the lint trap, or it can create an alert if it hears someone who can’t stop coughing.”

Listen Learning

The livestream sound project is part of Polyphonic, a collection of audio research projects at Carnegie Mellon that includes sound labeling and a shared database of sounds for researchers to use in their studies. CMU is also home to the researchers working with Apple on the Listen Learner project, which aims to teach AI to recognize sounds with minimal training. Listen Learner takes almost the opposite approach to Rolling Rhapsody and its goal of a vast database of sounds for training an AI. Listen Learner is designed around keeping microphones on all the time, with the AI asking its user what the different sounds it hears mean as a way of labeling them for future reference. While theoretically a good method for quickly training a personalized voice assistant, Listen Learner also raises privacy concerns because the microphones are always listening.

That’s problematic when privacy is central to so many people’s reluctance to use voice assistants. After a year when almost every voice assistant developer had to deal with accusations of privacy violations and subsequently change how they collect data, an extensive database of sounds provided voluntarily to train an AI is likely to appeal to any developer looking to sell a voice assistant to the general public. It also sets up a model for other data collection research and development, which the CMU researchers acknowledge.

“We can use this as a proof of concept for a new kind of game experience that can result in ethical data collection from the home,” Hammer said. “We can collect data in a way that’s fun and feels good for everybody involved.”

