Skyrim Fan Video Replaces Actors With AI-Cloned Voices
An avid fan of the video game Skyrim named Adriac has made a trailer voiced solely by synthetic voices created using machine learning. The original voices from the game speak his script thanks to an AI tool that can analyze voices well enough to imitate them saying different lines, albeit not quite as smoothly.
Adriac has made fan films before, but they all relied on stitching together existing audio. The Skyrim video’s dialogue is entirely original. After composing his script, Adriac put together his faux-voice cast by feeding real audio from the game into xVASynth, a synthetic voice tool designed specifically for Skyrim’s voices. Getting the audio into the tool was not a fast process, according to Adriac, and required a lot of rewriting because the tool could only handle up to five seconds of audio; anything longer and it fell apart. The same tool is used for getting characters to say all kinds of things, but the effort put into the video is impressive considering the limitations of the software. The final result is still far from flawless. The uncanny valley of synthetic audio seems a lot deeper with Scandinavian accents, and the tool used is free for a reason. That said, the difference between the existing video and a mostly flawless imitation of the human voices is arguably just a matter of degree.
“I don’t think it will ever surpass a voice actor, but i think this could be incredible for the future to keep voice actors’ voices alive even hundreds of years after they’ve passed,” Adriac wrote in a Reddit post about his video. “I mean can you imagine some sci-fi school in the future hav[ing] an AI Laura Bailey to teach actors how they did things in the 21st century? Imagine if we had something like this now for William Shakespeare.”
The potential applications of synthetic speech generation are enormous, and voice acting is only one of them. Voice cloning startups like Replica Studios and Resemble AI have their own versions of home tools for generating an artificial voice from recorded audio. Beyond homemade entertainment, synthetic voice generation is worth a lot of money to companies large and small. Amazon used its advanced voice synthesis to give Alexa the voice of Samuel L. Jackson, and its Brand Voice feature is giving a unique voice to branded Alexa skills, like making KFC Canada’s Alexa skill speak like Colonel Sanders. Automotive AI developer Cerence even offers a way for people to make their car’s voice assistant sound like themselves. While Adriac used a tool designed for Skyrim, plenty of other companies offer custom voices for businesses and the general public. More seriously, concerns about deepfakes are percolating in the public consciousness, as highly advanced versions of the technology can make anyone appear to say anything. Combined with video tech, it can make even Queen Elizabeth II do a table dance.