Samsung AI Can Create Talking Footage From A Single Photo

  • A new AI model can fabricate talking avatars from a single image. 
  • Researchers applied the model to famous images, including the Mona Lisa and portraits of Leonardo da Vinci. 
  • The outcomes have some visual glitches, but they are far more impressive than previous techniques. 

Software for generating deepfakes (an artificial intelligence-based technique for human image synthesis) typically requires large sets of images to build a realistic forgery. Recent advances in neural networks have shown how highly realistic human images can be obtained by training the network on broad datasets.

However, developers at Samsung's research center in Moscow have now developed a new artificial intelligence (AI) model that can create talking avatars from a single image. Although it's possible to fabricate a video clip from one image, training the model on several pictures results in better identity preservation and higher realism.

The talking heads generated by this model can handle various poses, including ones that go beyond the abilities of warping-based systems. You may notice some visual glitches, but the outcomes are far more impressive than previous techniques. The model leads to the creation of multimedia that will ultimately be hard to distinguish from real video.

Challenges Involved

Fabricating realistic talking avatar sequences is difficult for two main reasons:

  1. Human heads have high kinematic, geometric, and photometric complexity. It's necessary to accurately model hair, eyes, the mouth cavity, and many other elements.
  2. The human visual system is acutely sensitive to even tiny errors in the appearance of a modeled human head.

To address these issues, the new AI model trains three neural networks during the learning process. It builds an embedder network that maps frames and their face landmarks to embedding vectors. It then builds a generator network that maps the landmarks of a target pose into synthesized frames, conditioned on the embedding. Finally, a discriminator network evaluates the pose and realism of the generated frames.
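The three-network pipeline can be sketched as follows. This is a minimal, purely illustrative NumPy mock-up of the data flow (embedder → generator → discriminator): the layer structure, dimensions, and random weights are assumptions for illustration, not the actual architecture from the paper.

```python
import numpy as np

EMB_DIM = 512          # size of the identity embedding vector (illustrative)
LANDMARK_DIM = 68 * 2  # 68 (x, y) face landmarks, a common convention
FRAME_DIM = 32 * 32    # tiny flattened "frame" for illustration only

rng = np.random.default_rng(0)

def embedder(frames, landmarks):
    """Map K (frame, landmark) pairs to one identity embedding.
    Few-shot behavior comes from averaging over the K examples."""
    W = rng.standard_normal((FRAME_DIM + LANDMARK_DIM, EMB_DIM)) * 0.01
    feats = np.concatenate([frames, landmarks], axis=1)  # (K, in_dim)
    return np.tanh(feats @ W).mean(axis=0)               # (EMB_DIM,)

def generator(target_landmarks, embedding):
    """Synthesize a frame from target landmarks, conditioned on the
    identity embedding (a stand-in for the paper's conditioning)."""
    W = rng.standard_normal((LANDMARK_DIM + EMB_DIM, FRAME_DIM)) * 0.01
    x = np.concatenate([target_landmarks, embedding])
    return np.tanh(x @ W)                                # (FRAME_DIM,)

def discriminator(frame, target_landmarks, embedding):
    """Score how realistic the frame is and how well it matches
    the target pose; higher is better. Returns a scalar."""
    W = rng.standard_normal((FRAME_DIM + LANDMARK_DIM, EMB_DIM)) * 0.01
    feats = np.concatenate([frame, target_landmarks])
    return float(np.tanh(feats @ W) @ embedding)

# One-shot use: a single source photo plus target landmarks.
source_frame = rng.standard_normal((1, FRAME_DIM))
source_lm = rng.standard_normal((1, LANDMARK_DIM))
target_lm = rng.standard_normal(LANDMARK_DIM)

e = embedder(source_frame, source_lm)       # identity embedding
fake = generator(target_lm, e)              # synthesized frame
score = discriminator(fake, target_lm, e)   # realism / pose score
```

In the real system all three networks are deep convolutional models trained jointly in an adversarial setup; the sketch only shows how the embedding ties the three networks together.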

Reference: arXiv:1905.08233 | YouTube

To better understand face landmarks and movements, the researchers trained the networks on thousands of YouTube videos of people talking. The generated talking heads were then compared with those produced by alternative neural networks using quantitative metrics.

Results

The team applied the model to images of many popular figures, such as the Mona Lisa, Leonardo da Vinci, and Albert Einstein. The AI was able to fabricate talking videos from a single image, bringing classic portraits to life. While a single photo is enough to create a video, a model fine-tuned on 32 pictures achieves a better personalization score and higher realism.

[Image: talking footage generated from a single photo]

This type of AI can have several practical applications, including telepresence, multiplayer games, video conferencing, and the special-effects industry.

Read: IBM Develops An AI That Detects Scene In A Video

On the downside, the rapid development of such techniques could raise the risks of misinformation, impersonation, fraud, and election tampering.

Written by
Varun Kumar

Varun Kumar is a professional science and technology journalist and a big fan of AI, machines, and space exploration. He received a Master's degree in computer science from Indraprastha University. To find out about his latest projects, feel free to directly email him at [email protected] 
