Prompt: Hollywood loses its mind over an advanced new generative AI model called Sora that can create lifelike, movie-trailer quality videos from a few short lines of text in minutes.
That scenario unfolded last week when OpenAI, the San Francisco-based tech company behind the text-generating app ChatGPT and the image-generating tool DALL-E, teased its latest project, text-to-video AI model Sora. (The name is a Japanese word meaning sky that the creators chose because it “evokes the idea of limitless creative potential.” Or maybe they’re “Kingdom Hearts” fans.)
After seeing what Sora could do, Tyler Perry was the biggest name to sound the alarm. He told THR he put an $800 million planned expansion of his Atlanta studio space on hold. “Jobs are going to be lost,” he said.
The Sora videos are striking. Woolly mammoths march toward you in cascading snow. People walk through a snowy, bustling Tokyo street as the camera swoops over the buildings. “A gorgeously rendered papercraft world of a coral reef, rife with colorful fish and sea creatures.” That last one is a specific prompt written by OpenAI to create this impressive, 20-second video clip.
Filmmakers and developers within the AI community see Sora as a huge leap forward and significant step for generative AI, a tool already capable of things once viewed as distant possibilities.
“It was 18 months ahead of where I thought we were. I was totally mind-blown,” said Edward Saatchi, an AI producer with Fable Studios. For him, the hype and excitement of seeing people create brief clips or images in generative AI was “dying down” and people were thinking “more realistically about how soon we’d see an AI movie in the cinemas.” Sora, on the other hand, feels like a game changer.
“It was getting a bit much. ‘Check out the latest 30 clips!’,” he said. “And they were okay, but they weren’t at that level of polish.”
While there were rumblings of OpenAI working on a video tool, the launch came as a shock. Even people on other teams at OpenAI didn’t know it was coming. We’ve only seen what OpenAI created with Sora, but the tech company said the tool is in the hands of some visual artists, designers, and filmmakers as well as “red teamers” who look for ways it can be misused. The public has yet to see those results or try the tool in the wild.
Filmmaker Paul Trillo, who is known for his acclaimed AI short “Thank You For Not Answering” and consults with various AI companies on the development of their tools, says he’s impressed by the video quality and its capabilities. But until it becomes an open-sourced app that gives creators full customization and control, he’s unsure whether it’s capable of disrupting the industry or is simply a “great tech company product demo.”
“There’s a long way to go from isolated clips to making a tool that works in the form of a story that doesn’t take the audience out of it when they’re watching them,” he said. “I think it’s going to be amazing for people that are still getting into filmmaking and want to play around and test their ideas but they don’t have a lot of resources. But I am a little skeptical from a professional standpoint because it’s all about control and how much our true intention and vision can be executed.”
Sora is a step up from competitors’ models at startup Runway or tech giants like Meta and Google. Sora’s higher resolutions mask the pixel-y aesthetic of many generative videos, improving details like skin texture, hair, reflections, water, leaves, and more. Sora also allows for videos up to 60 seconds; previously, the limit was 3-8 seconds.
Saatchi said it’s the biggest sign yet that AI movies will go beyond two-minute shorts to approximate a short film or TV episode.
“We were at the limit of the kinds of stories that you could tell with 3-8 second shots,” Saatchi said. “We were in a rut as a community. This opens up the ability to tell much more complex stories.”
Sora also has a strong understanding of how things move in the world. Other generative AI video tools allow prompts to add directorial movements or instructions that simulate camera movements. But Saatchi said Sora has distinct background characters, realistic movements, and subjects capable of interacting and reacting. Videos released by OpenAI include waves crashing against cliffs, baby animals playing, and reflections in the window of a moving train.
Trillo also said he’s blown away by what he called Sora’s “temporal coherence.” Most AI video doesn’t comprehend what happens in a shot from beginning to end; from a single generated frame, it extrapolates (or guesses) motion. Glitchy sequences result, as do “Gumby legs.” Yes, there’s a Sora video in which a woman’s legs swap places mid-stride, but when it comes to walking, other models walked so Sora could run.
Runway got closer to temporal coherence, but Trillo said it’s more of an “illusion.” Rather than a typical text-to-video generator, OpenAI is calling Sora a “world model” that works on space-time. If generative video “is ever going to be taken seriously, it needs to have this level of coherence and control,” he said. “[Sora] doesn’t feel like it’s guessing. It feels like there’s a determined path.”
Another thing Trillo described as a major breakthrough (and a little “unsettling”) is Sora’s ability to break down a prompt into time. In this video of a woodland creature hopping through the forest, the clip ends with the creature coming upon a mushroom with fairies dancing on top. Sora understands the sequence of events of a complicated prompt in which multiple things are supposed to happen, making it “a step closer to being a usable storytelling tool.”
“It just did not do that before,” he said. “It approximates what it thinks you’re asking of it.”
Other Sora assets include seamless video looping that stems from its ability to understand motion and “sampling flexibility” that allows you to view the same prompt from alternate perspectives, framings, or different aspect ratios.
There’s also video-to-video editing that allows the user to connect videos. OpenAI offered a demonstration in which it showed a drone flying through the Colosseum and a butterfly floating through a coral reef, then merged the two videos seamlessly.
While most people stared at the subjects of Sora’s videos, Trillo was transfixed by their backgrounds. AI often has a problem with “occluding,” in which a foreground object passes a background object that changes or disappears. OpenAI said Sora still has some imperfections in this regard, but Trillo noticed Sora videos in which a person passed in front of text on a wall and the text remained consistent. He said that indicates Sora isn’t just a diffusion-based model but a hybrid of more traditional 3-D animation environments and special effects.
So should Hollywood be more scared today about being replaced by a machine than it was a few weeks ago, let alone six months ago?
“This is the first I’ve felt the ground was a little uneven or ground was starting to give, in the same way illustrators felt a few years ago,” Trillo said. “It is unsettling, but it’s hard not to be excited about it at the same time.”
Sora still has obvious shortcomings. For one, there’s no dialogue. A person’s mouth, Saatchi said, is something AI still can’t get right; making that happen will be key. And while Sora can create one incredible 60-second shot, that doesn’t translate to creating a coherent film.
“It looks great in a blog post, but we’ll see how it works if you want to do 10 shots of the same person in the same location,” Trillo said.
Sora also looks a little too perfect; Trillo says it may lack the unpredictable, hallucinatory or imaginative quality of other AI tools. And since OpenAI is extremely concerned about the tool’s misuse, there are strict parameters that prevent applications around sex and violence. (Filmmakers who tried telling AI that it’s ketchup, not blood, have been disappointed.)
“We got a new Hays Code,” Saatchi said. “Maybe you can make a very dramatic, theatrical movie, but that’s the worst thing for AI.”
Any AI tool will also be only as good as its interface. If Sora is limited in its customization, or if the functionality is clunky, it won’t be adopted by filmmakers or at-home creators. Still, Trillo said these are “temporary hurdles,” and it could be the Sora copycat that reaches wide-scale adoption.
“Maybe in two years from now there’s an open-source model that has a lot of control and gives filmmakers the level of detail that they need,” Trillo said. “The easier, faster tool always wins.”
Even if Hollywood wanted to use generative AI today, content created by AI cannot be copyrighted. Edward Klaris, an attorney and managing partner with Klaris Law, said studios have to worry about whether what they create can be protected and make sure it isn’t viewed by the Copyright Office as machine-generated.
“Studios will have to be very careful not to integrate generative AI into their process,” he said. “They’re basically producing public-domain works, so there’s a real risk of incorporating generative AI into the workflow.”
While cinema may not be ripe for disruption, marketing certainly could be; Sora’s 60-second clips are perfect for ads. Trillo said the stock footage industry should also be worried. Shutterstock recently formed a partnership with OpenAI and much of Sora’s model is likely trained on its library. Trillo imagines a near future in which Shutterstock would allow a service to create AI-generated video in place of using existing stock footage.
Trillo believes that while Sora might allow some people to fake their way into the industry, the artists who will succeed are those with a traditional skillset and vision. “My overly optimistic view is people will still get paid the same amount, but won’t have to kill themselves to do it,” he said.
Saatchi, who was part of the research team that developed the AI tool capable of self-generating episodes of “South Park,” believes that we’re inching closer to a world of automated showrunners. Content generated without real input from people could easily compete for eyeballs with film and TV.
“Is cinema a collaborative medium? Fully automated content would lose that,” Saatchi said.
Still, he offered a caveat: A year ago, AI advocates were ready to declare that everything had changed. It’s hard to imagine a role that won’t be affected by AI, but so far “nothing changed.”
“Every single three years, Silicon Valley tells Hollywood that they’re gonna totally disaggregate them and change everything and they’re finished, and Hollywood survives and thrives,” Saatchi said. “I don’t want people to worry too much without the perspective of the Valley is always trying to be rude to Hollywood.”