Sora, OpenAI's new text-to-video model

How does Nvidia scientist @DrJimFan evaluate Sora?

1/ Sora is a data-driven physics engine that simulates many worlds
2/ Seemingly simple scenes involve a great deal of technology and simulation
3/ In the future, it will replace all hand-designed graphics pipelines

Full text of the post:

Sora is a data-driven physics engine. It is a simulation of many worlds, real or imaginary. The simulator learns intricate rendering, "intuitive" physics, long-horizon reasoning, and semantic grounding, all through some denoising and gradient math.

I wouldn’t be surprised if Sora were trained on large amounts of synthetic data generated with Unreal Engine 5. It has to be!

Let’s break down the video below. Prompt: “Photorealistic close-up video of two pirate ships battling each other as they sail inside a cup of coffee.”

The simulator instantiates two exquisite 3D assets: pirate ships with different decorations. Sora must implicitly solve the text-to-3D problem in its latent space.
The 3D objects stay consistently animated as they sail and avoid each other’s paths.
The fluid dynamics of the coffee, down to the foam that forms around the ships. Fluid simulation is an entire sub-field of computer graphics that has traditionally required very complex algorithms and equations.
Photorealism that looks almost like ray-traced rendering.
The simulator accounts for how small the cup is compared to an ocean and applies tilt-shift photography to create a “miniature” feel.
The scene’s semantics do not exist in the real world, yet the engine still applies the physical rules we would expect.

Original post on X: https://x.com/DrJimFan/status/1758210245799920123?s=20
