- The new deep reinforcement learning approach named DeepCubeA can solve the Rubik’s cube within a few seconds.
- The deep learning model can be applied to various other fields, including robotics and natural sciences.
Artificial Intelligence (AI) has already proven successful in chess and Go, but more difficult puzzles like the Rubik’s Cube have been harder to crack with machine intelligence. It is a classic combinatorial puzzle that poses unique and intriguing challenges for machine learning.
Now, researchers at the University of California, Irvine, have built a deep reinforcement learning approach named DeepCubeA that can solve an incredibly complex puzzle without any specific domain knowledge. It can solve a Rubik’s cube within a few seconds with no in-game coaching from humans.
As a puzzle’s dimensions increase, the complexity of the underlying combinatorial problem increases dramatically. Finding an optimal solution to the 15 puzzle, for instance, takes a fraction of a second on a conventional computer, whereas finding an optimal solution to the 24 puzzle could take days on the same machine.
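To get a feel for this growth, the short sketch below counts the reachable configurations of square sliding-tile puzzles. It relies on the standard result that exactly half of all tile arrangements can be reached from the solved state; the function name `sliding_puzzle_states` is made up for this illustration and does not come from the study.

```python
from math import factorial


def sliding_puzzle_states(side: int) -> int:
    """Count reachable configurations of a side x side sliding-tile puzzle.

    The board has side * side cells (tiles plus one blank), and only half
    of all arrangements are reachable from the solved state.
    """
    cells = side * side
    return factorial(cells) // 2


for side in (3, 4, 5):
    tiles = side * side - 1
    print(f"{tiles} puzzle: {sliding_puzzle_states(side):.3e} states")
```

Going from the 15 puzzle to the 24 puzzle multiplies the number of states by a factor of several hundred billion, which is consistent with the jump from fractions of a second to days described above.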
In this study, researchers tried to develop a machine learning model that can learn how to solve a variety of puzzles without relying on domain-specific human knowledge. They combined three approaches to develop DeepCubeA:
- Deep learning
- Classical reinforcement learning (approximate value iteration)
- Path-finding methods (weighted A* search)
It trains a deep neural network to estimate the cost of reaching the solved state from any given configuration, and then uses that estimate as the heuristic in a weighted A* search to solve the Rubik’s Cube.
The researchers used the TensorFlow deep learning framework to train the network on about 10 billion simulations of the scrambled and completed puzzle. Training ran for about 1,000,000 iterations and took 36 hours.
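The value iteration idea behind this training can be illustrated on a toy problem. The sketch below is not the researchers' code: it replaces the neural network with a plain lookup table, and it uses a made-up puzzle (sorting four numbers with adjacent swaps) so that every state can be enumerated. Each pass sets a state's estimated cost to one move plus the best estimate among its neighbors, with the solved state fixed at zero.

```python
from itertools import permutations

# Hypothetical toy puzzle: sort four numbers using adjacent swaps only.
GOAL = (0, 1, 2, 3)
MOVES = [(0, 1), (1, 2), (2, 3)]  # each move swaps two adjacent positions


def apply_move(state, move):
    """Return the state reached by swapping the two positions in `move`."""
    i, j = move
    items = list(state)
    items[i], items[j] = items[j], items[i]
    return tuple(items)


def value_iteration(num_iterations=10):
    """Estimate the number of moves needed to reach GOAL from every state."""
    states = list(permutations(GOAL))
    cost_to_go = {s: 0 for s in states}  # a table stands in for the network
    for _ in range(num_iterations):
        updated = {}
        for s in states:
            if s == GOAL:
                updated[s] = 0
            else:
                # One move plus the best current estimate among successors.
                updated[s] = 1 + min(
                    cost_to_go[apply_move(s, m)] for m in MOVES
                )
        cost_to_go = updated
    return cost_to_go


cost_to_go = value_iteration()
print(cost_to_go[(3, 2, 1, 0)])  # fully reversed order needs 6 swaps
```

On a puzzle the size of the Rubik’s Cube a table is out of the question, which is where a neural network comes in: it is trained to approximate these cost estimates for states it has never seen.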
Reference: Nature Machine Intelligence | DOI:10.1038/s42256-019-0070-z | UCI | Online Demo
Once trained, DeepCubeA solved 100% of the test configurations, finding a shortest path to the final state 60.3% of the time.
DeepCubeA’s search is guided by a heuristic function that estimates the cost of a shortest path. When such a heuristic never overestimates that cost, weighted A* search guarantees a bound on how much the solution length can exceed the length of an optimal solution.
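A small example makes the bound concrete. The sketch below runs weighted A* on the 8 puzzle using the textbook priority f = g + w * h and the Manhattan distance, a hand-written heuristic that never overestimates. This is an illustration only: DeepCubeA uses a learned heuristic, and its exact weighting scheme may differ from the one assumed here. All function names are made up for the example.

```python
import heapq
import random

GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)  # 0 marks the blank square


def neighbors(state):
    """Yield every state reachable by sliding one tile into the blank."""
    blank = state.index(0)
    row, col = divmod(blank, 3)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < 3 and 0 <= c < 3:
            target = r * 3 + c
            items = list(state)
            items[blank], items[target] = items[target], items[blank]
            yield tuple(items)


def manhattan(state):
    """Sum of tile distances from their goal squares (never overestimates)."""
    total = 0
    for idx, tile in enumerate(state):
        if tile:
            goal_idx = tile - 1
            total += abs(idx // 3 - goal_idx // 3) + abs(idx % 3 - goal_idx % 3)
    return total


def weighted_a_star(start, weight=1.0):
    """Return the length of the solution found, or None if there is none."""
    frontier = [(weight * manhattan(start), 0, start)]
    best_g = {start: 0}
    while frontier:
        _, g, state = heapq.heappop(frontier)
        if state == GOAL:
            return g
        if g > best_g[state]:
            continue  # stale queue entry; a cheaper path was found later
        for nxt in neighbors(state):
            new_g = g + 1
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                priority = new_g + weight * manhattan(nxt)
                heapq.heappush(frontier, (priority, new_g, nxt))
    return None


def scramble(num_moves, seed=0):
    """Random walk away from GOAL, so the result is always solvable."""
    rng = random.Random(seed)
    state = GOAL
    for _ in range(num_moves):
        state = rng.choice(list(neighbors(state)))
    return state


start = scramble(40)
optimal = weighted_a_star(start, weight=1.0)
fast = weighted_a_star(start, weight=2.0)
print(f"optimal length: {optimal}, weighted (w=2) length: {fast}")
```

With a weight of 1 the search returns a shortest solution. With a weight of 2 it typically expands fewer states, and the solution it returns is at most twice as long as the shortest one.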
Applications Beyond Combinatorial Puzzles
The research team also trained DeepCubeA on other puzzles, including the 24 puzzle, Lights Out, and Sokoban. It was able to find the shortest path in the majority of verifiable cases.
The ultimate objective of studies like this one is to develop next-generation deep learning models that can be applied in fields beyond combinatorial puzzles, ranging from robotics to the natural sciences.
We already interact with AI on a daily basis through search engines and apps such as Alexa and Siri. However, these systems are not really intelligent: they can be easily manipulated or fooled.
We need to build AI that is more robust, smarter and capable of understanding, reasoning, and planning. The study is a small step toward this massive goal.