Revolutionary breakthrough in AI video creation

ChatGPT, DALL-E and Midjourney now make headlines not only in the tech media but also in mainstream news outlets. But the next AI revolution will come in video output, not in text or images.
VideoLDM, the text-to-video AI model that Nvidia introduced recently, appears to have opened the door to that revolution.

Breakthrough from Nvidia

Just a few months ago, generative text-to-video AI was treated as little more than a joke, with “Will Smith eating spaghetti” as the best-known example. Nvidia’s VideoLDM model will make you forget those earlier examples. Nvidia developed the technology in collaboration with Cornell University researchers. In simple terms, the model turns a text description into video at a resolution of up to 2048 x 1280 pixels, at 24 frames per second, for up to 4.7 seconds.
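To put those figures in perspective, here is a quick back-of-the-envelope calculation. It uses only the numbers quoted above; rounding the frame count to a whole number is our own assumption.

```python
# Maximum output figures quoted in the article for VideoLDM.
WIDTH, HEIGHT = 2048, 1280   # pixels
FPS = 24                     # frames per second
DURATION_S = 4.7             # seconds

# Assumption: the longest clip is rounded to a whole number of frames.
frames = round(FPS * DURATION_S)
pixels_per_frame = WIDTH * HEIGHT
total_pixels = frames * pixels_per_frame  # per colour channel

print(f"{frames} frames, {pixels_per_frame:,} pixels each, {total_pixels:,} in total")
```

In other words, the longest clip is roughly 113 frames, each with about 2.6 million pixels.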

The model has 4.1 billion parameters, of which only 2.7 billion were trained on video. That may sound like a huge number, but it is small by today’s AI standards. To create video, Nvidia builds on a pre-trained latent diffusion model (LDM). The model treats time as an additional dimension and tries to predict what might change in each area of an image over a given period of time. The tool first creates a series of keyframes spread across the sequence, then uses another LDM to interpolate the frames between those keyframes.
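The two-stage idea (sparse keyframes first, in-between frames second) can be illustrated with a toy sketch. This is not Nvidia’s code: the function names are hypothetical, the “frames” are just short lists of numbers, and plain linear blending stands in for the second diffusion model purely to show the structure.

```python
from typing import List

Frame = List[float]  # toy "frame": a flat list of values


def make_keyframes(num_keyframes: int, size: int) -> List[Frame]:
    """Stand-in for the first model (hypothetical placeholder).

    Keyframe k is simply filled with the value k. A real system would
    sample keyframes from a diffusion model conditioned on a text prompt.
    """
    return [[float(k)] * size for k in range(num_keyframes)]


def interpolate(keyframes: List[Frame], steps_between: int) -> List[Frame]:
    """Stand-in for the second model: fill in frames between keyframes.

    Uses linear blending, which is NOT what a diffusion model does; it
    only mirrors the keyframes-then-infill structure described above.
    """
    video: List[Frame] = []
    for start, end in zip(keyframes, keyframes[1:]):
        # step 0 emits the keyframe itself, later steps emit in-between frames
        for step in range(steps_between + 1):
            t = step / (steps_between + 1)
            video.append([(1 - t) * a + t * b for a, b in zip(start, end)])
    video.append(keyframes[-1])  # close the sequence with the last keyframe
    return video


keyframes = make_keyframes(num_keyframes=4, size=3)
video = interpolate(keyframes, steps_between=3)
print(len(video))  # 3 gaps * 4 frames per gap + 1 final keyframe = 13
```

The point of the split is that the first stage only has to plan a handful of frames across the whole clip, while the second stage handles smooth motion between them.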

Of course, in its current form VideoLDM cannot produce video convincing enough to fool anyone. Compared with the examples we saw a month or two ago, however, the progress is huge. For now, as Nvidia’s own demonstrations show, text-to-video AI is used to create GIFs. We expect it will not be long before Nvidia introduces more advanced technology for creating longer video clips from text. The company will present the work at the Computer Vision and Pattern Recognition Conference (CVPR), which will be held in Vancouver on June 18-22.
