- A new algorithm exponentially speeds up computation by drastically decreasing the number of iterations required to solve a problem.
- It performs far better than conventional (sequential) algorithms on large-scale datasets, such as those in social media analytics and genetic-data clustering.
Thousands of optimization problems (a problem of finding the best solution among all feasible solutions), like allocating funds to stocks to minimize risk on returns or assigning employees to available offices to maximize workflow and employee satisfaction, rely heavily on sequential algorithms.
The basic working pattern of these algorithms hasn’t changed since they were first established in the early 1970s: they solve a given problem sequentially, in “n” steps.
The number of steps depends on the size of the problem (which supplies the values the algorithm takes as input). This method usually results in a computational bottleneck: the relative gain from each iteration becomes smaller and smaller as the algorithm progresses.
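To see the bottleneck concretely, here is a minimal sketch (not the researchers' code) of the classic sequential pattern: greedy selection for an invented coverage-style objective. Each of the k iterations scans every remaining candidate and adds the single best one, so the number of sequential steps grows with the size of the solution.

```python
import random

random.seed(0)

# Invented data: each of 50 candidates "covers" 15 random items out of 100.
candidates = {i: set(random.sample(range(100), 15)) for i in range(50)}

def greedy_cover(candidates, k):
    """Pick k candidates one at a time, always taking the largest marginal gain."""
    covered, chosen, gains = set(), [], []
    for _ in range(k):  # one candidate per sequential step
        best = max((c for c in candidates if c not in chosen),
                   key=lambda c: len(candidates[c] - covered))
        gains.append(len(candidates[best] - covered))
        covered |= candidates[best]
        chosen.append(best)
    return chosen, gains

chosen, gains = greedy_cover(candidates, k=10)
# The marginal gain recorded at each step shrinks as coverage grows:
# exactly the diminishing returns described above.
```

The shrinking `gains` list illustrates why late iterations contribute so little relative to their cost.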
What if an algorithm could take a few jumps, rather than taking thousands of small steps to solve the problem? What if we could make a large set of today’s widely used algorithms work exponentially faster? We are talking about the algorithms that help us discover new drugs and avoid traffic.
To make this possible, researchers at Harvard University have come up with a new type of algorithm, which they describe as a breakthrough, that exponentially speeds up computation by drastically decreasing the number of iterations required to solve a problem.
It speeds up computation for a wide range of problems across several different areas, such as information extraction, auction design, machine vision, computational biology, network analysis, and more.
According to the developers, it can perform in a few seconds large computations that would previously have taken days or weeks. This could open the door to new wide-scale parallelization approaches, enabling practical summarization processes to be built at exceptional scale.
How It Works
Sequential algorithms work by narrowing down the number of feasible solutions one step at a time. The new algorithm, in contrast, samples many directions in parallel, eliminates the less relevant ones, and keeps the most favorable (high-valued) directions on its way to the solution. It selectively discards values that future iterations would ignore anyway.
Breakthrough algorithm uses adaptive sampling | Courtesy of researchers
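A toy sketch of one such sampling-and-pruning round, with invented data and thresholds (this is an illustration of the idea, not the researchers' algorithm): many random blocks are scored, which could happen entirely in parallel, and only elements appearing in the top-scoring blocks survive to the next round.

```python
import random

random.seed(1)

# Invented data: 200 items, each with a hidden value; the sum of values in a
# block stands in for the objective being maximized.
weight = {x: random.random() for x in range(200)}

def value(block):
    return sum(weight[x] for x in block)

def adaptive_round(ground, block_size=20, num_blocks=50):
    """One round: score many random blocks (all scorable in parallel), then
    keep only elements that appear in the top-scoring blocks."""
    blocks = [random.sample(ground, block_size) for _ in range(num_blocks)]
    blocks.sort(key=value, reverse=True)
    survivors = {x for b in blocks[:5] for x in b}  # keep the top 10% of blocks
    return [x for x in ground if x in survivors]

ground = list(weight)
for _ in range(3):          # a handful of rounds instead of ~n small steps
    if len(ground) <= 25:
        break
    ground = adaptive_round(ground)
```

Because each round can discard a large fraction of the ground set at once, only a logarithmic number of rounds is needed, in contrast to the one-element-per-step sequential pattern.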
More specifically, the algorithm requires O(log n) sequential steps and reaches an approximation arbitrarily close to 1/3. With parallelization enabled, the algorithm reaches a constant-factor approximation exponentially faster than any existing method for submodular maximization.
For example, if the task is to recommend movies similar to Star Wars, a conventional algorithm would add one movie at each step whose attributes (action, adventure, fantasy) are similar to those of Star Wars.
The newly developed algorithm, on the other hand, randomly samples a set of movies and prunes those that do not match Star Wars at all. This yields a diverse collection of movies similar to Star Wars (obviously, you do not want 10 Superman movies in your recommendations).
The algorithm continues to add a variety of movies at each step until it has enough items to recommend. The key to making valuable decisions at every single step lies in the adaptive sampling process.
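One step of the movie example above could look like the following sketch, using an invented toy catalogue of genre tags (titles, tags, and the match threshold are all hypothetical): sample a batch, prune poor matches, then skip near-duplicates so the recommendations stay diverse.

```python
import random

random.seed(2)

# Hypothetical catalogue (invented data): movie -> genre tags.
catalogue = {
    "Superman":      {"action", "fantasy"},
    "Superman II":   {"action", "fantasy"},
    "Indiana Jones": {"action", "adventure"},
    "The Hobbit":    {"adventure", "fantasy"},
    "Casablanca":    {"drama", "romance"},
    "Toy Story":     {"animation", "comedy"},
    "Mad Max":       {"action", "adventure"},
}
target = {"action", "adventure", "fantasy"}  # Star Wars' tags

# One sampling step: draw a random batch, prune movies that barely match the
# target, then drop movies with duplicate tag sets to keep variety.
batch = random.sample(list(catalogue), 5)
matches = [m for m in batch if len(catalogue[m] & target) >= 2]
recs = []
for m in matches:
    if catalogue[m] not in [catalogue[s] for s in recs]:  # no duplicate tag sets
        recs.append(m)
```

The duplicate check is what prevents, say, both Superman movies from being recommended together, keeping the batch diverse rather than redundant.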
Number of steps taken by a sequential (black) and the breakthrough (red) algorithm to solve a problem
Testing And Applications
Researchers tested their adaptive-sampling algorithm on a large dataset containing 1 million ratings on 4,000 movies from 6,000 users. It successfully recommended a personalized and varied selection of movies 20 times faster than conventional algorithms.
They also applied the algorithm to a taxi dispatch problem: pick the best locations to serve the maximum number of customers with a limited number of taxis. For 2 million taxi trips, the algorithm worked 6 times faster than the state of the art.
It can yield far better results on large-scale datasets, like social media analytics or clustering genetic data. Beyond this, the algorithm could be applied to designing clinical trials for multiple diseases, sensor arrays for medical imaging, and detecting interactions between drugs.
Nowadays, finding an effective subset of data from millions of images or videos to train deep learning networks has become a challenging task. This study could help extract valuable subsets rapidly and could have a significant impact on massive-scale data summarization problems.