- Engineers designed a new device that could be connected to a low-powered PC/laptop to process large graphs.
- It could handle massive graphs with up to 4 billion nodes and 128 billion edges.
- Tech companies could use this device to decrease energy footprint by using fewer servers.
In computer science, graphs are structures of nodes and edges (lines connecting those nodes) that are used to map weights of complex data structure efficiently. They have a wide range of application, like data organization, representation of communication networks, flow of computation, and more.
For instance, graphs are used to connect friends on social media, ranking web pages, find routes on Google Maps, show recommendations on eCommerce websites, or plot neuron structures in the brain.
A large graph consists of billions of vertices and edges, and occupies terabytes of space on memory. Usually they are executed in DRAM (dynamic random access memory) across several power-consuming servers.
Now MIT engineers have designed a device that allows a personal computer or laptop to process huge graphs. This device contains an array of flash chip (common in large business storage systems and small computing devices like smartphones) and a computation accelerator.
Although DRAM are faster than flash storage at processing graph data structure, they are quite expensive. To enhance the performance of flash chips, they designed a computation accelerator, and a novel algorithm for sorting all access requests of graph data into a sequential order.
The algorithm combines some request to decrease the overhead (computation time, bandwidth and memory) of sorting. Flash storage can process these sequentially-arranged request easily and quickly.
How It Works?
Conventional systems process graph in DRAM, which is both expensive as well as power-consuming. To reduce the cost, some systems offload a portion of data to flash storage (less efficient and slower), but they still need significant size of DRAM.
The new device is packed with a ‘sort-reduce‘ algorithm that puts all direct access requests in a sequence, ordered by identifiers. These identifiers represent request destinations that are grouped together, for example, all updates for vertex X, all updates for vertex Y, updates for vertex Z, and so on. Flash is capable of accessing thousands of those chunks (KB-sized) at once.
Device with 8 black flash chips and 1 square accelerator (on the left) | Image source: MIT’s CSAIL
Also, the algorithm concurrently combines the data into smallest clusters, saving more bandwidth and computation power. When it detects similar identifiers, it adds them to create a single data packet, for instance, X1 and X2 become X3.
The algorithm continues to do so, until it generates a packet with smallest possible size. This eliminates the duplicate node-access requests, and allows developers to decrease up to 90% of the total data that is required to be updated in flash storage.
Since this is a resource-hungry algorithm for a host PC, engineers integrated a custom accelerator in the device, which serve as an interface between flash storage and host. It executes all calculation required by sort-reduce algorithm so that the host can be any low-powered laptop or PC, running other minor computations while managing sorted data.
Engineers tested this device against different massive graphs, including hyperlink graph by web data commons, which consists of 3.5 billion nodes (webpages) and 128 billion edges (hyperlink).
A small graph with 11 nodes and 13 edges
To process this graph, a conventional system requires 128 GB of DRAM, which overall costs thousands of dollars. The new system, on the other hand, can do the same processing using two devices plugged into a PC. It consists of 1 GB or DRAM and 1 TB of flash storage.
If you connect multiple devices, it could handle even large graphs — with 4 billion nodes and 128 billion edges — that no other conventional system could process on the 128 GB DRAM server.
So the device could cut costs and energy consumption while improving performance. Tech giants like IBM, Google and Microsoft could use this device to decrease energy footprint by using fewer servers to process graphs.
According to researchers, the system can be easily scaled horizontally to run on a cluster, by dividing the intermediate update list across servers. Furthermore, it should work well on graphical processing units (GPUs), that can boost the speed of sorting and merging tasks to make better use of flash bandwidth.