Samsung AI Can Create Talking Footage From A Single Photo

  • A new AI model can fabricate talking avatars from a single image.
  • Developers applied this model to images of famous figures, including Leonardo da Vinci and the Mona Lisa.
  • The outcomes have some visual glitches, but they are far more impressive than those of previous techniques.

Software for generating deepfakes (an artificial intelligence-based technique for human image synthesis) typically requires large sets of images to build a realistic forgery. Recent advances in neural networks have shown that highly realistic human images can be obtained by training a network on large datasets.

However, developers at Samsung's research center in Moscow have now developed a new artificial intelligence (AI) model that can create talking avatars from a single image. Although a video clip can be fabricated from one image, training the model on several pictures results in better identity preservation and higher realism.

The talking heads generated by this model can handle a variety of poses, including ones beyond the abilities of warping-based systems. The results contain some visual glitches, but they are far more impressive than those of previous techniques. Models like this could eventually produce multimedia that is hard to distinguish from real video.

Challenges Involved

Fabricating realistic talking avatar sequences is difficult for two main reasons:

  1. Human heads have high kinematic, geometric, and photometric complexity. It’s necessary to accurately model hair, eyes, mouth cavity, and many other elements.
  2. The human visual system is acutely sensitive to even tiny errors in the appearance modeling of human heads.

To address these issues, the new AI model trains three neural networks during the learning process. An embedder network maps frames and their face landmarks to embedding vectors. A generator network then maps landmarks into synthesized frames. Finally, a discriminator network evaluates the pose and realism of the generated frames.
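
The data flow between the three networks described above can be sketched in a few lines of Python. This is only a toy illustration, not the researchers' implementation: the real networks are deep convolutional models, while here each one is replaced by a single random linear map. The dimensions, the weight matrices, and the function names (`embedder`, `generator`, `discriminator`) are all assumptions made for the example, as is the choice to average per-photo embeddings when more than one photo is available.

```python
import math
import random

random.seed(0)  # reproducible toy weights

# Hypothetical sizes, chosen only to keep the example small.
EMB_DIM = 4        # length of the identity embedding vector
LANDMARK_DIM = 6   # flattened face landmark coordinates per frame
FRAME_DIM = 8      # flattened "pixels" per frame


def random_matrix(rows, cols):
    """Stand-in for learned weights: a small random matrix."""
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


W_EMBED = random_matrix(EMB_DIM, FRAME_DIM + LANDMARK_DIM)
W_GEN = random_matrix(FRAME_DIM, LANDMARK_DIM + EMB_DIM)
W_DISC = random_matrix(1, FRAME_DIM + LANDMARK_DIM)


def embedder(frames, landmarks):
    """Map K (frame, landmarks) pairs to one averaged embedding vector."""
    per_frame = [matvec(W_EMBED, f + lm) for f, lm in zip(frames, landmarks)]
    k = len(per_frame)
    return [sum(e[i] for e in per_frame) / k for i in range(EMB_DIM)]


def generator(target_landmarks, embedding):
    """Synthesize a frame for a new pose, conditioned on the embedding."""
    return [math.tanh(v) for v in matvec(W_GEN, target_landmarks + embedding)]


def discriminator(frame, landmarks):
    """Return a single score for how well a frame fits the given pose."""
    return matvec(W_DISC, frame + landmarks)[0]


# One source photo (K = 1) with its landmarks, plus a new target pose.
source_frames = [[random.random() for _ in range(FRAME_DIM)]]
source_landmarks = [[random.random() for _ in range(LANDMARK_DIM)]]
target_pose = [random.random() for _ in range(LANDMARK_DIM)]

identity = embedder(source_frames, source_landmarks)
fake_frame = generator(target_pose, identity)
score = discriminator(fake_frame, target_pose)
print(len(identity), len(fake_frame), type(score).__name__)  # 4 8 float
```

Passing more photos of the same person to `embedder` simply averages more per-photo vectors. That is one plausible way a few extra pictures could give a steadier identity estimate than a single one.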

Reference: arXiv:1905.08233 | YouTube

To better understand face landmarks and movements, researchers trained the networks on thousands of YouTube videos of humans talking. The resulting talking heads were then compared with those of alternative neural networks using quantitative measurements.


The team applied this model to images of many well-known figures, such as the Mona Lisa, Leonardo da Vinci, and Albert Einstein. The AI was able to fabricate talking videos from a single image, bringing classic portraits to life. However, a model trained on 32 pictures achieves a better personalization score and greater realism.

This type of AI could have several practical applications in telepresence, including video conferencing and multiplayer games, as well as in the special effects industry.

On the downside, the rapid development of such techniques could raise the risk of misinformation, impersonation, fraud, and election tampering.

Written by
Varun Kumar