10 Minutes On Text-to-Video and GPT-3 with Arnon Kahani of Hour One

Hour One announced its new Script Wizard feature this week, which employs GPT-3 to help users write or update their video scripts. The company’s Reals solution enables users to select a digital human presenter and background scene from a gallery, enter text and images, and automatically generate a video. Script Wizard will write or rewrite the text for you.

Arnon Kahani is head of engineering at Hour One. He joined me for our 10 Minutes On video series to discuss Script Wizard and its key use cases. The video includes a short demo of the product, the motivation behind the new feature, and a wide-ranging discussion about how synthetic media can help people better express themselves. We also discuss how ChatGPT has changed expectations.

I think the Stable Diffusions and the GPT-3s of this world will be the infrastructure of the generative AI space. They are the tools that need to be used within the platforms. I can give you the raw tools, but you might not know how to use them. That’s our idea to abstract this and give you the best results that you can…We think people should be able to generate content with ease and better express their ideas…The narrative is what makes it compelling and more interesting. Until three months ago, it was really hard to create good narratives and good stories…What you saw in the video was us using GPT and large language models to enable this storytelling and bridge the gap between an idea and a compelling story and a compelling message. – Arnon Kahani of Hour One

Generative AI and Virtual Humans

A key thesis at Voicebot is that the various technology segments within synthetic media and generative AI have value on their own but become more powerful, and create new value, when combined. Below is an example I created using Hour One’s Reals virtual human video generator and the new Script Wizard feature. In this case, I started with a short script about the company’s announcement and used the GPT-3-powered feature to rewrite and extend the ideas in the text. I did this for multiple scenes, which together make up the one-minute video.

This took me about the same amount of time as writing an article from scratch, but it also produced a full video, which I can post here and make available through other channels. Using the large language model as a writing assistant, along with the Stable Diffusion image generator feature now in Reals, I could do this much more quickly than by applying each of these tools independently. We cannot use GPT-3 for every task, but it is a nice addition to these types of presenter-led text-to-video solutions.

More About 10 Minutes On with Hour One

