- The new deep learning system teaches a robot how to perform specific tasks by observing humans.
- The system observes human actions and generates human-readable programs describing the task; its perception networks are trained on synthetic data produced with an image-centric domain randomization method.
- The robot then executes these human-readable programs step by step.
Now researchers at NVIDIA have built a new deep learning system that teaches robots how to perform specific tasks by observing humans. According to the developers, it is the first AI system designed to teach machines directly from actions performed by humans, and it should improve communication between robots and humans while enabling people to work seamlessly alongside robots.
How It Works
Robots can execute useful tasks in the real world only if there is an effective way to communicate the task to them, including both the desired final outcome and any hints about the best means of achieving it.
The robot should also be able to execute the task robustly in varying environments. Teaching a machine by demonstration is an effective way to cope with issues such as noisy sensor input and imprecise control output.
Humans can show robots what to do by demonstrating a task and providing hints about how to perform it efficiently. Ideally, a single demonstration should be enough for such a system to teach a robot a new task.
In this study, the developers built a system of multiple neural networks trained in simulation. To keep the problem tractable, they focused on stacking cubes of different colors, either into a pyramid or vertically.
The developers trained and ran a series of neural networks on NVIDIA Titan X GPUs. These networks handle the tasks of perception, program generation, and execution, and together they let the robot learn a task from a single real-world demonstration.
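The three stages described above can be sketched as a simple pipeline. This is an illustrative toy, not NVIDIA's actual code: the function names, the dictionary-based world state, and the stub perception are all assumptions made for demonstration.

```python
# Toy sketch of the three-stage pipeline: perception, program generation,
# execution. All names and data structures here are hypothetical.

def perceive(image):
    """Perception stage: map a camera image to object positions.
    This stub returns a fixed world state (all cubes on the table)."""
    return {"yellow": "table", "red": "table", "green": "table", "blue": "table"}

def generate_program(before, after):
    """Program-generation stage: compare the scene before and after the
    human demonstration and emit human-readable 'put X on Y' steps."""
    steps = []
    for cube, support in after.items():
        if before.get(cube) != support and support != "table":
            steps.append(f"put {cube} on {support}")
    return steps

def execute(program):
    """Execution stage: run each step (here we just collect the actions)."""
    return [f"executing: {step}" for step in program]

before = perceive("image_before_demo")
# Scene observed after the human stacked the cubes into a tower:
after = {"yellow": "red", "green": "yellow", "blue": "green", "red": "table"}
program = generate_program(before, after)
print(program)  # ['put yellow on red', 'put green on yellow', 'put blue on green']
```

In the real system each stage is a trained neural network; the sketch only shows how the stages hand data to one another.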
The system uses an image-centric domain randomization method to generate highly diverse synthetic training data, so that the perception network treats real-world images as just another variation of what it saw in training. The data are processed in an image-centric way, making the networks independent of the particular camera or environment.
More specifically, this technique makes fewer assumptions about the camera's position and the visibility of fixed objects (such as a chair or table) in the environment, so it transfers to new scenarios without retraining.
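Domain randomization of this kind amounts to sampling a different scene configuration for every synthetic training image. The sketch below is a minimal illustration with made-up parameter ranges; a real setup would feed each configuration to a simulator/renderer to produce the actual image.

```python
import random

# Illustrative domain randomization: sample a new randomized scene
# configuration per training image. All ranges here are assumptions.

def random_scene_config(rng):
    """Sample one randomized scene configuration."""
    return {
        # camera pose varied widely, so the network assumes little about it
        "camera_position": [rng.uniform(-1.0, 1.0) for _ in range(3)],
        "camera_tilt_deg": rng.uniform(-30, 30),
        # lighting and cube colors perturbed on every sample
        "light_intensity": rng.uniform(0.3, 1.5),
        "cube_colors": rng.sample(["red", "green", "blue", "yellow"], k=4),
        # random distractor objects placed in the background
        "n_distractors": rng.randint(0, 5),
    }

rng = random.Random(0)
dataset = [random_scene_config(rng) for _ in range(3)]
for cfg in dataset:
    print(cfg["camera_tilt_deg"], cfg["n_distractors"])
```

Because no single camera pose or background dominates the training set, the perception network cannot overfit to one environment, which is what makes the approach portable to new scenes.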
This is a snapshot of a demonstration (left column) and the robotic execution of specific tasks (remaining columns). The three rows show how the robot executes the automatically generated programs:
- Put yellow on red, green on yellow, and blue on green (upper row).
- Put yellow on green, and blue on red (middle row; the robot recovered from misplacing the yellow cube).
- Put car on yellow.
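The recovery in the second row suggests closed-loop execution: the robot re-checks the scene after each step and retries steps that failed. The sketch below illustrates that idea under stated assumptions; the function names and the simulated-failure mechanism are hypothetical, not NVIDIA's code.

```python
# Minimal closed-loop executor sketch: after each "put X on Y" step the
# robot re-perceives the scene and retries a failed step, which is how it
# can recover from misplacing a cube. All names here are illustrative.

def run_program(program, attempt_step, check_step, max_retries=2):
    """Execute each step, retrying up to max_retries times on failure."""
    log = []
    for step in program:
        for attempt in range(max_retries + 1):
            attempt_step(step)
            if check_step(step):          # perception confirms success
                log.append((step, attempt))
                break
        else:
            raise RuntimeError(f"step failed: {step}")
    return log

# Simulated robot that misplaces the yellow cube on its first try.
failures = {"put yellow on green": 1}     # fail once, then succeed
def attempt_step(step):
    pass                                  # placeholder for motor commands
def check_step(step):
    if failures.get(step, 0) > 0:
        failures[step] -= 1
        return False
    return True

log = run_program(["put yellow on green", "put blue on red"],
                  attempt_step, check_step)
print(log)  # [('put yellow on green', 1), ('put blue on red', 0)]
```

The log records how many retries each step needed: the misplaced yellow cube required one retry, while the blue cube succeeded immediately.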
The study leaves several issues open, such as improving the robustness of domain randomization to variations in color and lighting, using data from previous executions to overcome sensing limitations, incorporating context to better handle occlusion, and extending the vocabulary of human-readable programs.