- Researchers at Facebook AI have developed a new reinforcement learning algorithm named DD-PPO.
- It can navigate complex environments using only a compass, GPS, and an RGB-D camera.
Developing intelligent machines that interact smartly with the physical world has been a long-term goal of the AI community. The major challenge is teaching these machines to efficiently navigate complex, unfamiliar environments without a map.
Real-world maps usually become outdated within months, as buildings and structures change and objects are moved around. That’s why it’s necessary to build AI for the physical world that can navigate without a map.
Keeping these things in mind, researchers at Facebook AI have developed a new reinforcement learning (RL) algorithm that effectively solves the point-goal navigation task using only a compass, GPS, and an RGB-D camera. This large-scale algorithm is named DD-PPO (decentralized distributed proximal policy optimization).
New RL Distributed Architecture Scales Well
Nowadays, machine learning-based systems are capable of outperforming human experts in various complex games. But since these systems rely on a massive volume of training samples, it is practically impossible to build them without large-scale, distributed parallelization.
Current distributed reinforcement learning architectures, which pair thousands of workers (CPUs) with a single parameter server, do not scale well. That’s why the researchers proposed a synchronous, distributed reinforcement learning technique instead.
DD-PPO runs across several machines and has no parameter server. Each worker (CPU) alternates between gathering experience in a GPU-accelerated, resource-intensive simulated environment and optimizing the model. In an explicit communication stage, all workers synchronize their updates to the model. In other words, the distribution is synchronous.
All workers simulate an agent performing point-goal navigation, and then optimize the model and synchronize their updates | This is how data is shared during training with DD-PPO
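The loop above can be sketched in miniature. This is not Facebook's implementation; it is a toy NumPy sketch in which `rollout` stands in for experience collection in a simulator (here it just returns a noisy gradient of a made-up quadratic loss), and the "explicit communication stage" is an averaged all-reduce of gradients with no parameter server:

```python
import numpy as np

def rollout(params, rng):
    # Stand-in for gathering experience in a simulated environment:
    # returns a noisy gradient of a toy loss ||params - target||^2.
    target = np.ones_like(params)
    return 2.0 * (params - target) + rng.normal(scale=0.1, size=params.shape)

def ddppo_step(worker_params, rngs, lr=0.1):
    # 1) Each worker gathers experience and computes a local gradient.
    grads = [rollout(p, r) for p, r in zip(worker_params, rngs)]
    # 2) Explicit communication stage: all-reduce (average) the gradients.
    #    Every worker ends up holding the same averaged gradient --
    #    there is no central parameter server.
    avg_grad = np.mean(grads, axis=0)
    # 3) Every worker applies the identical synchronized update,
    #    so all model replicas stay in lockstep.
    return [p - lr * avg_grad for p in worker_params]

params = [np.zeros(4) for _ in range(8)]            # 8 identical model replicas
rngs = [np.random.default_rng(i) for i in range(8)]  # per-worker randomness
for _ in range(100):
    params = ddppo_step(params, rngs)
```

After training, every replica holds exactly the same parameters, which is what makes the scheme synchronous: no worker ever acts on a stale model.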
Using this approach, DD-PPO exhibited near-linear scaling, achieving a 107x speedup on 128 GPUs over a serial implementation.
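To put that figure in perspective, scaling efficiency is just the achieved speedup divided by the ideal linear speedup (one unit per GPU):

```python
# Scaling efficiency implied by the reported numbers:
# a 107x speedup on 128 GPUs versus perfect linear scaling.
gpus = 128
speedup = 107
efficiency = speedup / gpus  # fraction of ideal linear scaling
print(f"{efficiency:.1%}")   # prints "83.6%"
```

In other words, each GPU retains roughly 84% of its standalone throughput even at 128 GPUs, which is what "near-linear" means here.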
Near-Perfect Point-Goal Navigation
In point-goal navigation, an agent is placed at a random initial position and orientation in an unfamiliar environment and tasked with navigating to target coordinates without using any map. It can use only a compass, GPS, and either an RGB or RGB-D camera.
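The task setup can be made concrete with a small sketch. The function names here are hypothetical, and the 0.2 m success radius is an assumption borrowed from common point-goal navigation benchmarks rather than a number stated in this article. GPS gives the agent's position relative to its start, the compass gives its heading, and the camera frame (omitted here) is the visual input:

```python
import math

def goal_vector(agent_xy, agent_heading, goal_xy):
    """Distance and egocentric bearing to the goal, computed from
    GPS position and compass heading (hypothetical helper)."""
    dx = goal_xy[0] - agent_xy[0]
    dy = goal_xy[1] - agent_xy[1]
    dist = math.hypot(dx, dy)
    bearing = math.atan2(dy, dx) - agent_heading  # relative to heading
    return dist, bearing

def is_success(agent_xy, goal_xy, threshold=0.2):
    # An episode typically counts as a success when the agent stops
    # within a small radius of the goal; 0.2 m is a common benchmark
    # threshold (an assumption, not from this article).
    return math.hypot(goal_xy[0] - agent_xy[0],
                      goal_xy[1] - agent_xy[1]) < threshold
```

Note that success is judged purely on the final position: the agent never sees a map, only the compass/GPS-derived goal vector and camera frames.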
Researchers leveraged the scaling of DD-PPO to train the agent for 2.5 billion steps, which is equivalent to 80 years of human experience. Training that would otherwise take months was completed in under three days on 64 GPUs.
The results showed that 90% of peak performance was reached within the first 100 million steps, using far fewer computing resources (8 GPUs). With billions of steps of experience, the agent reaches a 99.9% success rate; previous systems, in contrast, achieved 92%.
The agent backtracks after choosing the wrong path to get to its target position | Courtesy of researchers
These AI agents could assist people in the physical world. For example, they could show relevant information to users wearing augmented reality glasses, power robots that retrieve items from a desk upstairs, or help people with visual impairments.
The models built in this study can work in usual settings, such as inside laboratories and office buildings, where additional data points (maps and GPS data) are not available.
Although the model outperforms ImageNet pre-trained convolutional neural networks and can serve as a universal resource, there is still a lot to do to develop systems that learn to navigate through complex environments. Researchers are currently exploring new approaches to implement RGB-only point-goal navigation.