Nico Perony, Director of AI Research at Unity on Generative AI, Complex Systems, and More – Voicebot Podcast Ep 314
“What we set out to do was to introduce empathy in the understanding that computers can have of human conversations. Back when we started OTO in 2017, we had speech recognition systems. You could talk to a computer, and the computer would use tokens, and that reflected what you said but never how you said it. So, when we are speaking now, you listen to me and you try to understand how I feel, what I am actually thinking, what I mean when I say a certain thing. If I say, ‘HEY BRET!’ or ‘Hey, Bret,’ these are literally two different things, yet their transcription in a normal speech recognition system is the same. These two tokens with no nuance to them. So, what we set out to do is build machine learning artificial intelligence systems that could decipher the true meaning of human conversations,” said Nico Perony.
Nico Perony is the director of AI research at the game development platform Unity. He was a co-founder and CTO of OTO, which was acquired by Unity in 2021. OTO was a pioneer in emotional intelligence for conversation data. He characterized the company as “Enabling emotional intelligence everywhere, so human and artificial intelligence can interact with awareness and empathy.”
Perony led the integration of OTO technology into the Unity platform and, more recently, has focused on new conversation AI features and generative AI tooling for game developers. These include tools for player safety and for streamlining game development. He was previously the founder of Slow Motion Projects and an engineer at Hyperloop Transportation Technologies. Perony has a PhD in complex systems and a Master’s degree in electrical engineering.
Perony and I also discussed conversational understanding and large language models. The conversation covered the impact of hallucinations, positive emergent behaviors of the models, and what the systems are actually designed to do. One area where he succinctly broke down a common misperception is search. According to Perony:
Fundamentally, LLMs are not made to spit out facts. They are trained to predict the next token. So, they are trained to predict statistical regularities in text at a very large scale that is so large we humans cannot comprehend … Language models are not trained to produce facts. They are trained to approximate some amount of reasoning about the world. It’s incredibly crude in terms of approximating intelligence. The fact that they can actually talk about stuff and state facts is a byproduct of the entire training scheme. So, if you want a language model to search–call it generative search–what you need to do is pair it with a structured knowledge base.
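To make the distinction in that quote concrete, here is a deliberately tiny Python sketch. It is not how any production language model or Unity tool works: a bigram counter stands in for "predicting the next token," and a plain dictionary stands in for a "structured knowledge base." The corpus, the function names, and the knowledge-base entries are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus; a real model trains on vastly more text.
CORPUS = (
    "unity acquired oto in 2021 . "
    "unity makes a game engine . "
    "unity makes tools for game developers ."
).split()


def train_bigrams(tokens):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts, token):
    """Return the most frequent follower of `token`, or None if unseen."""
    followers = counts.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical structured knowledge base keyed by (subject, relation).
KNOWLEDGE_BASE = {("unity", "acquired"): "oto"}


def lookup_fact(subject, relation):
    """Answer from the knowledge base rather than from token statistics."""
    return KNOWLEDGE_BASE.get((subject, relation))


counts = train_bigrams(CORPUS)
statistical_guess = predict_next(counts, "unity")   # most common follower
grounded_answer = lookup_fact("unity", "acquired")  # explicit stored fact
```

In this toy setup the predictor continues "unity" with whichever word followed it most often in the corpus, which may or may not be the fact a user wanted, while the lookup returns only what was explicitly stored. Pairing the two, as the quote suggests for generative search, means letting the model handle language while the knowledge base supplies the facts.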
Nico Perony Interview – Show Notes
- Follow Unity on Twitter and the web: @unitygames unity.com
- Follow Nico Perony on Twitter: @nicholasperony