When you think of voice assistants like Amazon’s Alexa and Apple’s Siri, the words “emotional” and “expressive” probably don’t come to mind. Instead, there’s that recognizably flat and polite voice, devoid of all affect — which is fine for an assistant, but isn’t going to work if you want to use synthetic voices in games, movies and other storytelling media.
That’s why a startup called Sonantic is trying to create AI that can convincingly cry and convey “deep human emotion.” The U.K.-based startup announced last month that it has raised €2.3 million in funding led by EQT Ventures, and today it’s releasing a video that shows off what its technology is capable of.
You can judge the results for yourself in the video below; Sonantic says all the voices were created by its technology. Personally, I’m not sure I’d say the performances were interchangeable with those of a talented human voice actor, but they’re certainly more impressive than anything synthetic that I’ve heard before.
Sonantic’s actual product is an audio editor that it’s already testing with game makers. The editor includes a variety of different voice models, and co-founder and CEO Zeena Qureshi said those models are based on and developed with actual voice actors, who then get to share in the profits.
“We delve into the details of voice, the nuances of breath,” Qureshi said. “That voice itself needs to tell a story.”
Co-founder and CTO John Flynn added that game studios are an obvious starting point, as they often need to record tens of thousands of lines of dialogue. Sonantic’s technology could allow them to iterate more quickly, he said, to alter voices for different in-game circumstances (like when a character is running and should sound out of breath) and to spare actors from voice strain when characters are supposed to do things like cry or shout.
At the same time, Flynn comes from the world of movie post-production, and he suggested that the technology applies to many industries beyond gaming. The goal isn’t to replace actors, but instead to explore new kinds of storytelling opportunities.
“Look how much CGI technology has supported live-action films,” he said. “It’s not an either-or. A new technology allows you to tell new stories in a fantastic way.”
Sonantic also put me in touch with Arabella Day, one of the actors who helped develop the initial voice models. Day remembered spending hours recording different lines, then finally getting a phone call from Flynn, who proceeded to play her a synthesized version of her own voice.
“I said to him, ‘Is that me? Did I record that?’ ” she recalled.
She described the work with Sonantic as “a real partnership,” one in which she provides new recordings and feedback to continually improve the model (apparently her latest work involves American accents). She said the company wanted her to be comfortable with how her voice might be used, even asking her if there were any companies she wanted to blacklist.
“As an actor, I’m not at all thinking that the future of acting is AI,” Day said. “I’m hoping this is one component of what I’m doing, an extra possible edge that I have.”
At the same time, she said that there are “legitimate” concerns in many fields about AI replacing human workers.
“If it’s going to be the future of entertainment, I want to be a part of it,” she said. “But I want to be a part of it and work with it.”