- Nvidia develops a new deep learning-based technique that converts a standard video to superior-quality slow-motion video.
- It’s a variable-length multi-frame interpolation technique for generating any number of intermediate frames.
In video editing, one of the most difficult effects is slow motion. It requires software to generate hundreds of new in-between frames that are both spatially and temporally coherent. For example, to generate 240fps (frames per second) videos from standard 30fps sequences, software needs to interpolate 7 intermediate frames between 2 consecutive frames.
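The arithmetic above is simple but worth making concrete; here is a small sketch (the helper name is illustrative, not from Nvidia's code):

```python
def intermediate_frames(source_fps: int, target_fps: int) -> int:
    """Frames to interpolate between each pair of consecutive
    source frames to reach the target frame rate."""
    if target_fps % source_fps != 0:
        raise ValueError("target_fps must be a multiple of source_fps")
    return target_fps // source_fps - 1

# 30fps -> 240fps: 7 new frames per consecutive pair
print(intermediate_frames(30, 240))  # 7
```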
The existing state-of-the-art methods produce stuttering, unconvincing slow-motion video. To create high-quality interpolation results, the software needs to understand occlusion and motion between the two input frames.
Nvidia has built a new deep learning-based technique that converts a standard video to superior-quality slow-motion video. It can be used to create smooth view transitions, and also has some interesting applications in self-supervised learning.
Limitation of Existing Methods
The techniques we use today for applying slow-motion effects mainly rely on single-frame video interpolation. You can't use them to create arbitrary higher-frame-rate clips, as they have two big drawbacks:
A. They are slow because they use recursive single-frame interpolation, in which some frames can’t be generated until other frames are completely processed.
B. They can only produce 2^n − 1 intermediate frames (1, 3, 7, 15, and so on). You can't simply use these techniques to create a 1008fps clip from 24fps footage, which requires 41 in-between frames per pair of consecutive frames.
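A quick sketch of this limitation: each recursive doubling pass inserts one frame into every gap, so after n passes only 2^n − 1 intermediate frames are possible, and 41 is never reachable:

```python
def doubling_intermediates(passes: int) -> int:
    # Each pass doubles the frame rate, adding one new frame per gap:
    # 1, 3, 7, 15, ... intermediate frames after n passes.
    return 2 ** passes - 1

reachable = [doubling_intermediates(n) for n in range(1, 7)]
print(reachable)        # [1, 3, 7, 15, 31, 63]
print(41 in reachable)  # False — so 24fps -> 1008fps is impossible this way
```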
New Interpolation Method
The new variable-length multi-frame interpolation technique can generate any number of intermediate frames. It was developed by combining two Convolutional Neural Networks (CNNs):
- Flow Computation CNN: To estimate bi-directional optical flow between two pictures.
- Flow Interpolation CNN: To refine flow approximation and predict soft visibility maps.
Both CNNs are independent of the particular time step being interpolated, and thus the system can create as many in-between frames as required in parallel.
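A minimal NumPy sketch of why this works: given the bidirectional flows between frames 0 and 1, the flow from any intermediate time t back to either frame can be approximated under a linear-motion assumption (the weighting below follows Nvidia's Super SloMo paper; treat the exact coefficients as an assumption of this sketch, which also omits the refinement and visibility-map steps):

```python
import numpy as np

def approx_intermediate_flows(f_01, f_10, t):
    """Approximate flow from the (unseen) frame at time t back to
    frames 0 and 1, given bidirectional flows f_01 and f_10.
    Assumes locally linear motion between the two frames."""
    f_t0 = -(1.0 - t) * t * f_01 + t * t * f_10
    f_t1 = (1.0 - t) ** 2 * f_01 - t * (1.0 - t) * f_10
    return f_t0, f_t1

# Pure translation: every pixel moves (4, 0) px from frame 0 to frame 1.
f_01 = np.full((2, 2, 2), [4.0, 0.0])  # H x W x 2 flow field
f_10 = -f_01
f_t0, f_t1 = approx_intermediate_flows(f_01, f_10, t=0.5)
# At the halfway point, flow back to frame 0 is (-2, 0)
# and flow to frame 1 is (2, 0), as expected.
```

Because the result depends only on t and the two input flows, every intermediate frame can be computed independently of the others, which is what makes parallel generation possible.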
Training and Testing
The researchers trained their system on more than 11,000 videos shot at 240fps, using Nvidia Tesla V100 GPUs and the PyTorch deep learning framework accelerated by the Nvidia CUDA Deep Neural Network library (cuDNN).
Once trained, the CNNs computed the extra frames more accurately than existing methods, and the researchers validated the system's accuracy on several independent datasets. The resulting clips appear less blurry and more fluid.
To demonstrate the AI, they took footage from the slow-motion YouTube channel The Slow Mo Guys and made the videos even slower.
The AI can be used to capture some of your precious moments and slow them down to make them even more special, giving a cinematic look and adding emphasis or suspense.
Although it’s possible to record 240fps video with smartphones, recording everything at high frame rates isn’t always practical. As you increase the frame rate, the resolution decreases because of the sheer bandwidth of data being generated on the fly.
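To put that bandwidth point in perspective, here's a back-of-the-envelope calculation for uncompressed capture (the 8-bit RGB assumption is illustrative; real cameras use compression and different bit depths):

```python
def raw_data_rate_mb_s(width: int, height: int, fps: int,
                       bytes_per_pixel: int = 3) -> float:
    """Uncompressed data rate in MB/s, assuming 8-bit RGB frames."""
    return width * height * bytes_per_pixel * fps / 1e6

# 1080p at 30fps vs 240fps: an 8x jump in sustained write bandwidth.
print(raw_data_rate_mb_s(1920, 1080, 30))   # ~186.6 MB/s
print(raw_data_rate_mb_s(1920, 1080, 240))  # ~1493.0 MB/s
```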
The AI offers a cheaper alternative to expensive high-speed digital cameras. The results aren’t instantaneous, though: even with Nvidia’s high-end GPUs, the algorithm needs time to process the video after it’s recorded.
Since the processing power of handheld devices is increasing at an impressive rate, you may soon be able to use this kind of AI to fake slow-motion 8K videos with a single tap.