- The new machine learning-based perception framework can recognize over 90 different objects by touch.
- It uses both visual and tactile observations to identify whether these observations correspond to the same object.
Humans are good at associating the appearance and material properties of objects across multiple modalities. When we see a pair of scissors, we can imagine how the metal surface would feel under our fingers, and we can picture the object in our mind – not just its identity, but also its size, shape, and proportions.
The perception of robots, on the other hand, isn’t inherently multi-modal. Although existing robots equipped with advanced cameras can distinguish between different objects, vision alone often proves inadequate, especially under occlusion or poor lighting conditions.
Now, researchers at the University of California, Berkeley have developed a method that allows a robotic manipulator to learn human-like multi-modal associations. It uses both visual and tactile observations to determine whether they correspond to the same object.
What Exactly Did They Do?
The research team employed high-resolution touch sensing, via two GelSight sensors (one attached to each of the robot’s fingers), together with convolutional neural networks (CNNs) for multi-modal association.
These sensors generate readings by means of a camera paired with an elastomer gel: the camera records the indentations created in the gel by contact with objects. These readings are then fed to CNNs for processing.
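The idea behind such a reading can be sketched in a few lines. The snippet below is a hypothetical illustration (not the GelSight software): it approximates an indentation signal as the difference between the gel image during contact and a reference image of the undeformed gel; the `noise_floor` threshold is an assumed stand-in for sensor-noise rejection.

```python
import numpy as np

def indentation_map(contact_frame: np.ndarray, reference_frame: np.ndarray,
                    noise_floor: float = 0.02) -> np.ndarray:
    """Per-pixel indentation signal: absolute difference from the
    undeformed-gel reference, with small sensor noise zeroed out."""
    diff = np.abs(contact_frame.astype(float) - reference_frame.astype(float))
    diff[diff < noise_floor] = 0.0
    return diff

# Toy example: a uniform gel image, pressed by an object in one corner.
reference = np.full((4, 4), 0.5)
contact = reference.copy()
contact[:2, :2] += 0.3          # simulated indentation from contact
reading = indentation_map(contact, reference)
# The contact region carries a nonzero signal; the rest stays zero.
```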
The researchers trained these CNNs to take in tactile readings from the sensors and an object image from a camera, and to identify whether the inputs represent the same object. To perform instance recognition, they combined the robot’s tactile readings with a visual observation of the query object.
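The matching idea can be sketched as follows. This is a toy NumPy illustration, not the authors’ network: in the real system each branch would be a trained CNN, whereas here the two branches are random linear projections used purely to show the data flow – each modality is embedded into a shared space, and a similarity score decides whether the pair matches.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "branches": each maps an observation to a shared embedding space.
# The weights are random placeholders, not trained parameters.
W_tactile = rng.normal(size=(8, 16))   # tactile features (16-dim) -> 8-dim embedding
W_visual = rng.normal(size=(8, 16))    # image features (16-dim) -> 8-dim embedding

def embed(W: np.ndarray, x: np.ndarray) -> np.ndarray:
    v = W @ x
    return v / np.linalg.norm(v)       # unit-normalize the embedding

def match_score(tactile: np.ndarray, image: np.ndarray) -> float:
    """Cosine similarity between the two embeddings: higher means the
    tactile reading and the image are more likely the same object."""
    return float(embed(W_tactile, tactile) @ embed(W_visual, image))

tactile = rng.normal(size=16)
image = rng.normal(size=16)
score = match_score(tactile, image)    # a value in [-1, 1]
```

A trained system would threshold or rank such scores; the random weights here only demonstrate the two-branch structure.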
Reference: arXiv:1903.03591 | UC Berkeley
They used NVIDIA GeForce GTX 1080 and TITAN X GPUs with the CUDA parallel computing platform to train and test the CNN for multi-modal association on more than 33,000 images.
Robot (left) consisting of two GelSight tactile sensors (one on each finger) and a frontal RGB camera | Examples of tactile observations (middle) and object images (right) corresponding to a single object | Courtesy of researchers
The results demonstrate that it is possible to recognize object instances from tactile readings alone, including instances that were never seen during training. In fact, the CNN outperformed some human volunteers as well as alternative methods.
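Instance recognition then amounts to ranking candidate object images by their match score against a tactile reading and picking the best one. The sketch below is again hypothetical: a single random projection stands in for both modalities’ learned branches, and the query is made identical to one candidate so that the ranking is unambiguous.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(8, 16))           # random stand-in for a learned embedding

def embedding(x: np.ndarray) -> np.ndarray:
    v = W @ x
    return v / np.linalg.norm(v)       # unit-normalized embedding

def recognize(tactile_reading: np.ndarray, candidate_images: list) -> int:
    """Score the tactile reading against every candidate image and
    return the index of the best-matching candidate."""
    scores = [embedding(tactile_reading) @ embedding(img)
              for img in candidate_images]
    return int(np.argmax(scores))

# The query equals candidate 2, so its cosine similarity with itself (1.0)
# dominates the random candidates and index 2 is recognized.
candidates = [rng.normal(size=16) for _ in range(5)]
best = recognize(candidates[2], candidates)
```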
So far, the researchers have only considered individual grasps. In future work, they plan to use multiple tactile interactions to build a more complete picture of the query object.
The team also plans to extend the system to robotic warehouses, where robots could look at product images and retrieve the corresponding items by feeling for them on shelves. The method could also be applied to robots in home environments, enabling them to retrieve objects from hard-to-reach spots.