- A new self-learning program can predict future activities within a time horizon of up to 5 minutes.
- This includes sequence, duration and type of activities.
- The program achieved over 40% accuracy, but its performance got worse when it had to look far into the future.
Over the last few years, we’ve seen an enormous growth in the capabilities of computer programs to analyze, classify and segment tasks in video clips. However, this is not sufficient for applications where the system has to interact with humans.
Developing algorithms that can accurately estimate future activities from video data is a challenging job. Existing methods can predict activities only a few seconds ahead.
Now, researchers at the University of Bonn, Germany have developed new self-learning software that can predict future activities within a time horizon of up to 5 minutes. This includes the type, sequence and duration of the activities that are going to happen in the next 5 minutes.
How Was It Developed?
The authors wanted to estimate the timing and duration of activities from video clips, minutes and even hours before they actually happen. The program they developed first examines the order of activities in videos such as cooking tutorials. Based on what it has learned, it predicts what the chef will most likely do at what moment.
While existing methods focus on predicting only one action in the future, this is the first software to forecast the content of a video up to several minutes ahead.
Proposed approach for predicting future actions
To do this, they used two novel techniques:
- A recurrent neural network (RNN) that forecasts the remaining duration of the ongoing activity, as well as the class and duration of the next activity.
- A convolutional neural network (CNN) that predicts a matrix encoding the action labels and lengths of the anticipated activities.
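To make the idea of a matrix that encodes action labels and lengths more concrete, here is a minimal Python sketch of one plausible encoding. It is an illustration only, not necessarily the layout the authors used: the function names, the action labels and the choice to store each length as a fraction of the prediction horizon are all assumptions.

```python
from typing import List, Tuple

Segment = Tuple[str, float]  # (action label, duration in seconds)


def encode_segments(segments: List[Segment], classes: List[str],
                    horizon: float) -> List[List[float]]:
    """Build one row per segment: one-hot class columns, then the
    segment length as a fraction of the horizon (assumed layout)."""
    rows = []
    for label, duration in segments:
        one_hot = [1.0 if c == label else 0.0 for c in classes]
        rows.append(one_hot + [duration / horizon])
    return rows


def decode_segments(matrix: List[List[float]], classes: List[str],
                    horizon: float) -> List[Segment]:
    """Turn a (predicted) matrix back into labelled segments by taking
    the highest-scoring class in each row."""
    segments = []
    for row in matrix:
        scores, length = row[:-1], row[-1]
        best = max(range(len(classes)), key=lambda i: scores[i])
        segments.append((classes[best], length * horizon))
    return segments


# Hypothetical labels and durations for a two-minute horizon.
classes = ["cut_tomato", "mix_dressing", "serve_salad"]
future = [("cut_tomato", 30.0), ("mix_dressing", 60.0), ("serve_salad", 30.0)]
matrix = encode_segments(future, classes, horizon=120.0)
```

In a setup like this, a network would be trained to output such a matrix from the observed part of the video, and `decode_segments` would translate its output back into a readable schedule.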
Training and Testing
The researchers trained their program on 4 hours of cooking videos showing the preparation of different types of salads. Every video was annotated with when each action started and how long it took to complete.
By analyzing every task in the videos, the program learned which actions follow one another and how long they usually last. Of course, the order of the actions depends on the recipe.
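The kind of regularity the program picks up (which action tends to follow which, and how long each usually lasts) can be illustrated with a much simpler stand-in than a neural network. The sketch below just counts transitions and averages durations over annotated videos. It is a simple counting baseline, not the researchers' method, and the function names and sample annotations are hypothetical.

```python
from collections import Counter, defaultdict


def learn_statistics(videos):
    """Count which action follows which, and average each action's duration."""
    transitions = defaultdict(Counter)
    durations = defaultdict(list)
    for segments in videos:
        for label, seconds in segments:
            durations[label].append(seconds)
        # Pair every segment with the one that comes right after it.
        for (current, _), (following, _) in zip(segments, segments[1:]):
            transitions[current][following] += 1
    mean_duration = {a: sum(d) / len(d) for a, d in durations.items()}
    return transitions, mean_duration


def predict_next(action, transitions, mean_duration):
    """Return the most frequent follow-up action and its mean duration,
    or None if the action was never followed by anything in training."""
    followers = transitions.get(action)
    if not followers:
        return None
    following = followers.most_common(1)[0][0]
    return following, mean_duration[following]


# Hypothetical annotations: (action, duration in seconds) per video.
videos = [
    [("cut_tomato", 20.0), ("mix_dressing", 40.0), ("serve_salad", 10.0)],
    [("cut_tomato", 30.0), ("mix_dressing", 50.0), ("serve_salad", 20.0)],
    [("cut_tomato", 25.0), ("serve_salad", 15.0)],
]
transitions, mean_duration = learn_statistics(videos)
prediction = predict_next("cut_tomato", transitions, mean_duration)
```

With these sample annotations, cutting tomatoes is followed by mixing the dressing in two of three videos, so that is the predicted next step. A neural network can go further by taking the whole observed sequence into account rather than only the last action.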
After training, the researchers tested how well the program had learned these patterns by feeding it new cooking videos. This time, the program saw only the first 20-30% of each video and had to estimate what would likely happen during the rest of the clip.
While the recurrent and the convolutional neural networks performed similarly for long time horizons of over 40 seconds, the recurrent neural network performed better for short time horizons of less than 20 seconds.
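The test setup can be sketched in a few lines of Python: hide everything after the observed part of a video, then compare the predicted labels for the hidden part with the real ones. The frame-by-frame accuracy used here is an assumed metric chosen for simplicity and may differ from the one the researchers reported. The function names and labels are hypothetical.

```python
def split_observation(frame_labels, observed_percent):
    """Split per-frame labels into the observed start and the hidden rest."""
    cut = len(frame_labels) * observed_percent // 100
    return frame_labels[:cut], frame_labels[cut:]


def frame_accuracy(predicted, actual):
    """Fraction of hidden frames whose predicted label matches the real one."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        return 0.0
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# A hypothetical 10-frame video: 4 frames cutting, 4 mixing, 2 serving.
video = ["cut"] * 4 + ["mix"] * 4 + ["serve"] * 2
observed, future = split_observation(video, observed_percent=30)

# A hypothetical forecast that ends the mixing step one frame too early.
predicted = ["cut"] * 1 + ["mix"] * 3 + ["serve"] * 3
accuracy = frame_accuracy(predicted, future)
```

Here the forecast gets 6 of the 7 hidden frames right. Measuring accuracy separately for frames close to the cut and frames far from it would show how performance changes with the prediction horizon.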
Reference: Anticipating Temporal Occurrences of Activities | University of Bonn
Overall, the program achieved more than 40% accuracy for short prediction periods. However, its performance got worse when it had to look far into the future. The accuracy dropped to 15% for activities three minutes ahead.
The program could be embedded in a kitchen robot, which could then perform small but useful actions, such as preheating the oven to save time, passing ingredients as soon as they are required, and even warning the chef if she is forgetting something.
The program has many other potential applications. For example, it could be used in a robotic vacuum cleaner to help it make smart decisions: if you are in the living room with friends, the vacuum cleaner would know it has no business there at that moment and would take care of the kitchen or bedrooms instead.