- Researchers train a deep learning model to generate new dance steps that are realistic, diverse, and style-consistent.
- The model can assist and expand content creation in several areas, such as rhythmic gymnastics and theatrical performance.
As scientists gradually inch computers toward human levels of intelligence, they have started tackling some very human endeavors. We have certainly reached a point where artificial intelligence could help choreographers mix things up by suggesting thousands of different styles.
Recently, researchers at the University of California developed a deep learning model to generate new dance steps that are realistic, diverse, and style-consistent. It contains a synthesis-by-analysis learning framework to generate beat-matching dance from music.
Building such a music-to-dance framework is a challenging task, but it can assist and expand content creation in several areas, such as figure skating, rhythmic gymnastics, and theatrical performance.
The Core Of The AI Choreographer
To synthesize dance from music, the researchers developed a decomposition-to-composition framework, which first learns how to move (in the decomposition phase) and then how to arrange basic movements into a sequence (in the composition phase).
In the first phase, they extracted movement beats from each dancing sequence using a kinematic beat detector. Each sequence was then temporally normalized into a series of dance units, and individual dance units were disentangled into an initial pose and a movement.
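The kinematic beat extraction step can be sketched in a few lines. The snippet below is a minimal illustration, assuming a movement beat occurs where overall joint motion reaches a local minimum (the dancer briefly pauses or reverses direction); the function name and threshold are illustrative assumptions, not taken from the released code.

```python
import numpy as np

def kinematic_beats(poses, threshold=0.05):
    """Detect movement beats in a pose sequence.

    poses: array of shape (T, J, 2) -- T frames, J joints, (x, y) coords.
    A beat is marked where overall joint speed hits a local minimum,
    i.e. the dancer's motion briefly pauses or reverses direction.
    """
    # Per-frame motion magnitude: mean joint displacement between frames.
    velocity = np.linalg.norm(np.diff(poses, axis=0), axis=2).mean(axis=1)
    beats = []
    for t in range(1, len(velocity) - 1):
        # Local minimum of speed, below a stillness threshold.
        if (velocity[t] < velocity[t - 1]
                and velocity[t] <= velocity[t + 1]
                and velocity[t] < threshold):
            beats.append(t + 1)  # index into the original frame sequence
    return beats
```

Fed a sequence that momentarily stops mid-motion, a detector like this flags the pause frame as a beat, which is then used to segment the sequence into dance units.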
In the second phase, the researchers proposed a music-to-movement model to create a sequence of movements matching the input music. At run time, they extracted beat and style information from the audio, sequentially produced a series of dance units according to the music style, and finally warped the dance units to align with the extracted audio beats.
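The final beat-warping step can be illustrated with a simple piecewise-linear time warp: the unit is stretched or compressed so that its movement beat lands on the target audio beat. This is a sketch under that assumption; the actual system may warp differently, and the function name is hypothetical.

```python
import numpy as np

def warp_to_beat(unit, src_beat, dst_beat):
    """Linearly time-warp a dance unit so that its movement beat at
    frame `src_beat` lands on frame `dst_beat` of the output.

    unit: (T, D) array of per-frame pose features.
    Frames before the beat are resampled into [0, dst_beat], frames
    after it into [dst_beat, T - 1], keeping both endpoints fixed.
    """
    T, D = unit.shape
    src = np.arange(T, dtype=float)
    dst = np.arange(T, dtype=float)
    # Piecewise-linear mapping: output frame index -> source frame index.
    src_times = np.interp(dst, [0, dst_beat, T - 1], [0, src_beat, T - 1])
    # Resample each feature dimension at the warped time points.
    return np.stack([np.interp(src_times, src, unit[:, d])
                     for d in range(D)], axis=1)
```

Because the mapping pins the first frame, the beat frame, and the last frame, consecutive warped units still join smoothly while their internal beats snap to the music.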
To train the network, the team collected more than 360,000 video clips totaling 71 hours. These videos included three dance categories: Hip-Hop, Zumba, and Ballet.
To extract poses, they used OpenPose, a real-time multi-person system that jointly detects human body, face, hand, and foot keypoints in single images. For performance evaluation, they used several metrics to examine style consistency, realism, diversity, and beat matching.
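As one concrete example of such a metric, diversity is often scored as the average pairwise distance between generated sequences, usually in a learned feature space; the sketch below uses raw pose features purely for illustration, and the function name is an assumption rather than the paper's code.

```python
import numpy as np

def diversity_score(sequences):
    """Average pairwise Euclidean distance between generated sequences.

    sequences: (N, T, D) array -- N generated dances, each with T frames
    of D-dimensional pose features. Higher scores mean more varied output.
    """
    flat = sequences.reshape(len(sequences), -1)  # one vector per dance
    n = len(flat)
    dists = [np.linalg.norm(flat[i] - flat[j])
             for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(dists))
```

A model that collapses to a single dance scores near zero on such a metric, which is why diversity is reported alongside realism and style consistency.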
Mapping generated dances to photo-realistic videos | Courtesy of researchers
Researchers also mapped the synthesized pose sequences to photo-realistic videos to better visualize the outcomes. The large-scale paired dance and music dataset, along with the source code, is available on GitHub.
The generative adversarial network was trained with the PyTorch deep learning framework on NVIDIA V100 GPUs. In the near future, the researchers plan to add more dancing styles (such as partner dance) to make the system even better.