Microsoft has released a new real-time hardware architecture, dubbed “Brainwave”. It is a deep learning acceleration system that achieves a major leap forward in both performance and flexibility. With rapidly growing deep learning applications like image style transfer, voice synthesis, handwriting generation, automatic game playing, etc., Brainwave could be a boon for AI-based services.
Real-time AI means that the hardware handles and processes requests the moment it receives them, with very low latency. This benefits cloud infrastructure that processes live data streams such as video, search queries, or user interactions.
Three Main Components of Brainwave
- A distributed system architecture that improves resource utilization while keeping latencies ultra low.
- A DNN processor synthesized on 14nm class Altera FPGAs.
- A compiler and runtime environment for straightforward deployment of trained neural network models.
Brainwave takes advantage of the large pool of FPGAs (Field-Programmable Gate Arrays) that Microsoft has been deploying over the past few years. The architecture reduces latency while allowing high throughput.
Brainwave uses a “soft” DNN Processing Unit (DPU), synthesized onto FPGAs. Its design combines ASIC (Application-Specific Integrated Circuit) digital signal processing blocks with synthesizable logic to provide highly optimized functional units.
Brainwave has a software stack developed to support different deep learning frameworks. For now, it supports Google’s TensorFlow and, of course, Microsoft Cognitive Toolkit, with plans to support more in the future.
The system is designed for real-time AI. It can handle memory-intensive, complex models like LSTMs (a type of recurrent neural network) without using batching to juice throughput.
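The batching trade-off mentioned above can be illustrated with a toy sketch (sizes, timings, and the NumPy workload are all hypothetical, not Brainwave measurements): serving each request the moment it arrives keeps per-request latency low, while batching raises throughput at the cost of making early requests wait for the batch to fill.

```python
import time
import numpy as np

# Hypothetical illustration only: batch-1 (real-time) serving vs. batched
# serving of the same matrix-vector workload.
rng = np.random.default_rng(0)
W = rng.standard_normal((2048, 2048)).astype(np.float32)
requests = [rng.standard_normal(2048).astype(np.float32) for _ in range(32)]

# Batch-1: each request is processed as soon as it arrives.
t0 = time.perf_counter()
outputs_single = [W @ x for x in requests]
t1 = time.perf_counter()

# Batched: requests are queued and run as one matrix multiply; better
# hardware utilization, but the first request waits for the whole batch.
batch = np.stack(requests)
t2 = time.perf_counter()
outputs_batched = batch @ W.T
t3 = time.perf_counter()

print(f"batch-1 total: {(t1 - t0) * 1e3:.2f} ms")
print(f"batched total: {(t3 - t2) * 1e3:.2f} ms")
```

Both paths produce identical results; only the latency and throughput characteristics differ, which is why avoiding batching without losing throughput is notable.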
On early Stratix 10 silicon, Brainwave ran a large GRU model with no batching and achieved far superior results. It used a custom 8-bit floating-point format (developed by Microsoft itself) that does not suffer accuracy losses across different models.
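Microsoft did not publish the exact bit layout of its custom 8-bit format, so the sketch below assumes a 2-bit mantissa purely for illustration. The idea it demonstrates is general: low-precision floats snap each value to the nearest point on a coarse power-of-two grid, trading precision for much smaller and faster arithmetic units.

```python
import numpy as np

# Illustrative low-precision float quantization. The 2-bit mantissa here is
# an assumption for demonstration, not Microsoft's published format.
MANT_BITS = 2

def quantize_lowprec(x: np.ndarray) -> np.ndarray:
    """Round each value to the nearest number with MANT_BITS mantissa bits."""
    x = np.asarray(x, dtype=np.float64)
    out = np.zeros_like(x)
    nonzero = x != 0
    e = np.floor(np.log2(np.abs(x[nonzero])))            # power-of-two exponent
    step = 2.0 ** (e - MANT_BITS)                        # grid spacing at that exponent
    out[nonzero] = np.round(x[nonzero] / step) * step    # snap to nearest grid point
    return out

weights = np.array([0.9013, -0.1207, 0.5, 3.14159])
print(quantize_lowprec(weights))
```

With round-to-nearest, the relative error per value stays bounded (here below about 12.5%), which is why well-trained networks can often tolerate such aggressive quantization.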
Running at 39.5 teraflops, the Stratix 10 executed each request within one millisecond. At this performance level, Brainwave sustains more than 130,000 operations per cycle (with one macro-instruction issued every 10 cycles), achieving exceptional demonstrated real-time AI performance on complex and highly challenging models.
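A quick back-of-the-envelope check shows the quoted figures are consistent: 39.5 teraflops sustained at roughly 130,000 operations per cycle implies a clock rate around 300 MHz, a plausible frequency for FPGA fabric.

```python
# Sanity-check the quoted Brainwave figures.
teraflops = 39.5e12        # sustained operations per second
ops_per_cycle = 130_000    # parallel operations completed each cycle

clock_hz = teraflops / ops_per_cycle
print(f"implied clock: {clock_hz / 1e6:.0f} MHz")  # ≈ 304 MHz
```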
Other Similar Systems
At the Hot Chips 2017 conference, Baidu announced a new architecture, “XPU”, that combines GPU, CPU, and FPGA on a Xilinx platform. According to the company, it will be much easier to program than the conventional low-level methodologies programmers use today for FPGAs. At the same conference, Amazon Web Services reported progress on its F1 acceleration platform, which supports 8-node Xilinx EC2 instances.
Google has also developed its own machine learning hardware, the Tensor Processing Unit, which delivered up to 30 times the performance of contemporary GPUs and CPUs. Google is commercializing the second-generation TPU as a cloud service. However, Microsoft has an edge over its rivals, as Azure already has a large user base.
As researchers continue to develop and tune the architecture over the next few months, further significant performance improvements are expected.
Microsoft is currently working to bring this real-time AI capability to Azure users and to other services like Bing. The company will soon announce when Azure customers will be able to run their complex deep learning models at this level of performance.