Sora, OpenAI’s new text-to-video model

How does Nvidia scientist @DrJimFan evaluate Sora?

1/ Sora is a data-driven physics engine that simulates many worlds
2/ Seemingly simple scenes involve a great deal of technology and simulation
3/ In the future, it will replace all hand-designed graphics pipelines

The following is the main text:

Sora is a data-driven physics engine. It is a simulation of many worlds, real or imaginary. The simulator learns intricate rendering, “intuitive” physics, long-horizon reasoning, and semantics, all through some denoising and gradient math.

I wouldn’t be surprised if Sora was trained on large amounts of synthetic data generated with Unreal Engine 5. It has to be!

Let’s break down the video below. Prompt: “Realistic close-up video of two pirate ships fighting each other as they sail within a cup of coffee.”

  • The simulator instantiates two beautiful 3D assets: pirate ships with different decorations. Sora must implicitly solve the text-to-3D problem in its latent space.
  • The 3D objects are animated consistently as they sail and avoid each other’s paths.
  • The fluid dynamics of the coffee are simulated, down to the foam that forms around the ships. Fluid simulation is an entire sub-field of computer graphics that has traditionally required very complex algorithms and equations.
  • The photorealism is almost on par with ray-traced rendering.
  • The simulator takes into account the small size of the cup compared to the ocean and applies tilt-shift photography to create a “miniature” feel.
  • The scene’s semantics do not exist in the real world, yet the engine still applies the physical rules we expect.

Original post on X: https://x.com/DrJimFan/status/1758210245799920123?s=20
