- A new machine learning model predicts which audiences are most likely to watch a movie, based on its trailer.
- It extracts trailer features such as faces, objects, and landscapes with convolutional neural networks and feeds them to a Collaborative Filtering model.
- These features are then combined with attendance and demographic data to predict audience attendance.
Trailers are among the most important parts of the marketing campaign for a new movie. They present characters, hint at the plot and storyline, and increase awareness among movie lovers.
For filmmakers, a trailer is also an opportunity to learn the audience’s perspective: what viewers liked and what didn’t impress them. These details usually help them plan the next stage of the marketing campaign.
To help pick the best previews for a trailer, engineers at the 20th Century Fox film studio built a machine learning system named Merlin Video that predicts which audiences are most likely to watch a movie, based on its trailer.
How Does It Work?
Merlin Video generates dense representations of a trailer and uses them to analyze and predict the behavior of an audience. According to the research team, this is the first time a movie studio has used a low-level representation of trailers to measure audience interest.
It is based on a state-of-the-art Collaborative Filtering model. Features extracted from the trailer, such as illumination, objects, colors, and faces, are combined with attendance and demographic data to accurately forecast audience attendance for existing films as well as films yet to be released.
Convolutional neural networks extract low-level features frame by frame. Pre-trained networks can be used to detect and analyze the features in the relevant frames of a trailer. By feeding appropriate representations of these features to neural networks trained on historical records, one can discover significant links between the features of a movie trailer and future audience preferences.
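The step from per-frame features to a single trailer representation can be sketched as follows. This is only an illustration: the article does not say how Merlin Video aggregates frame features, so the mean pooling used here is an assumption, and the function name `mean_pool` and the tiny three-dimensional vectors are hypothetical stand-ins for real network outputs.

```python
from typing import List


def mean_pool(frame_features: List[List[float]]) -> List[float]:
    """Average per-frame feature vectors into one trailer-level vector.

    Mean pooling is an assumed aggregation step, not the method
    confirmed by the article.
    """
    if not frame_features:
        raise ValueError("need at least one frame")
    dim = len(frame_features[0])
    if any(len(f) != dim for f in frame_features):
        raise ValueError("all frame vectors must have the same length")
    n = len(frame_features)
    # Average each feature dimension across all frames.
    return [sum(f[i] for f in frame_features) / n for i in range(dim)]


# Hypothetical features for three frames (a real network would emit
# hundreds or thousands of dimensions per frame).
frames = [[1.0, 0.0, 2.0], [3.0, 2.0, 2.0], [2.0, 4.0, 2.0]]
trailer_vector = mean_pool(frames)
print(trailer_vector)
```

Pooling gives every trailer a fixed-length vector regardless of how many frames it has, which is what a downstream model needs in order to compare trailers with one another.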
Overview of Merlin Video | Courtesy of researchers
More specifically, Merlin Video contains a logistic regression layer that merges the distance-based Collaborative Filtering model with user recency and user frequency to generate the probability of audience attendance. The system is trained end-to-end, and logistic regression loss is propagated back to all trainable modules.
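A minimal sketch of such a logistic layer is shown below. It assumes the distance-based score is a squared Euclidean distance between a user vector and a trailer vector, and that recency and frequency are single numeric values; the article confirms none of these details. The function names and the hand-picked weights are hypothetical, whereas in the described system the weights would be learned end-to-end.

```python
import math
from typing import Sequence, Tuple


def squared_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum((a - b) ** 2 for a, b in zip(u, v))


def attendance_probability(
    user_vec: Sequence[float],
    trailer_vec: Sequence[float],
    recency: float,
    frequency: float,
    weights: Tuple[float, float, float],
    bias: float,
) -> float:
    """Merge distance, recency, and frequency through a logistic function.

    The weights are illustrative; a trained model would learn them.
    """
    w_dist, w_rec, w_freq = weights
    distance = squared_distance(user_vec, trailer_vec)
    logit = bias + w_dist * distance + w_rec * recency + w_freq * frequency
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical weights: a larger distance lowers the probability,
# a higher attendance frequency raises it.
weights = (-1.0, -0.5, 0.8)
trailer = [0.2, 0.9]
near_user = attendance_probability([0.1, 0.8], trailer, 1.0, 2.0, weights, 0.0)
far_user = attendance_probability([2.0, -1.0], trailer, 1.0, 2.0, weights, 0.0)
print(near_user > far_user)
```

With the other inputs held equal, the user whose vector lies closer to the trailer vector receives the higher attendance probability, which is the behavior a distance-based Collaborative Filtering model is meant to capture.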
In summary, the engineers made three main contributions in this study:
- They developed a recommendation model for movie releases, designed specifically to handle cold-start and theatrical recommendations using the contents of the trailer.
- They measured the performance of multiple versions of Merlin Video and demonstrated how it can be used for decision making in real-world scenarios.
- They discussed feasible ways to combine video and text inputs to improve prediction accuracy.
The neural network is trained on hundreds of trailers released in recent years and millions of attendance records. To train the model, the team used NVIDIA Tesla P100 GPU hardware on Google Cloud, with TensorFlow accelerated by the CUDA Deep Neural Network library (cuDNN).
In future work, engineers will focus on building a model that utilizes both video and text features to forecast a film’s success.