The New Graphcore AI Chips Are 100 Times Faster

Graphcore, one of the hottest U.K.-based startups, focuses on accelerating complex machine learning models for both training and inference. The company is working on an artificial intelligence chip that it says will lower the cost of AI applications in enterprise datacenters and the cloud, and increase performance by up to 100 times compared to the fastest systems available today.

The company plans to ship its first AI chip to its first customer in early 2018. It will take longer for the chips to reach devices like smartphones, however, because many internet services are processed in the cloud.

Many internet companies, including Facebook and Google, struggle to make their video ads more relevant to what people are watching. Their algorithms cannot see the images flashing across each video, which would help them target human eyeballs more effectively. Doing so would be an extremely complex process, involving huge computations in real time on a massive scale.

This is just one example of the demanding problems that artificial intelligence is being asked to solve. Targeting users more effectively with video ads requires more powerful chips to process all the visual information uploaded each day. Graphcore is building cutting-edge processors to power the next generation of AI software that will understand its users better, and that could drive the ad-targeting software on big video platforms like YouTube.

Intelligence Processing Unit

Graphcore has developed a new processor dedicated to machine intelligence workloads and named it the Intelligence Processing Unit (IPU). It has been tuned to work efficiently on the large, complex, high-dimensional models that these workloads require. The system features more than 14,000 independent processor threads, emphasizes parallel, low-precision floating-point computation, and offers higher compute density than other solutions.

How Intelligence Compute Works

  1. LEARNING – put data into a high-dimensional knowledge model
  2. JUDGEMENT – summarize the data of the knowledge model
  3. PREDICTION – give some inputs via knowledge model to deduce suitable outputs
  4. INFERENCE – give some outputs via knowledge model to deduce suitable inputs

All four are optimization processes, so a single kind of compute machine can serve them all. Knowledge models are usually represented as graphs, in which vertices are features and edges are correlations.

Poplar

Graphcore uses a C++ framework called Poplar for IPU-accelerated platforms. It is designed to provide a seamless interface to popular machine learning frameworks, including MXNet and TensorFlow, so that existing applications are compatible with the IPU.

The Poplar graph framework consists of a full set of drivers, application libraries, analysis and debugging tools for optimizing performance, and Python and C++ interfaces for developing applications.

Poplar comes with a graph compiler that has been developed from scratch to convert the standard operations used by machine learning frameworks into optimized application code for the IPU. It creates an intermediate representation of the graph that can be deployed across several IPUs. The compiler can also display this graph, so code written at the machine learning framework level yields an image of the graph that runs on the IPU.

In one example, the compiler converted a network description into a computational graph of about 18.7 million vertices and 115.8 million edges, which represents the deep neural network AlexNet as an execution plan for the IPU. The vertices represent computation processes and the edges represent communication between those processes.
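The idea of vertices as computation and edges as communication can be illustrated with a toy example. The sketch below is plain Python, not Poplar code; the `ComputeGraph` class and its method names are hypothetical and exist only to show how such a graph can be counted and executed.

```python
class ComputeGraph:
    """Toy computational graph: vertices compute, edges carry values."""

    def __init__(self):
        self.ops = {}     # vertex name -> function that computes its value
        self.inputs = {}  # vertex name -> names of the vertices feeding it

    def add_vertex(self, name, fn, inputs=()):
        self.ops[name] = fn
        self.inputs[name] = list(inputs)

    def num_vertices(self):
        return len(self.ops)

    def num_edges(self):
        # Every (source -> destination) dependency is one edge.
        return sum(len(srcs) for srcs in self.inputs.values())

    def run(self, name, cache=None):
        # Evaluate a vertex after evaluating everything it depends on.
        cache = {} if cache is None else cache
        if name not in cache:
            args = [self.run(src, cache) for src in self.inputs[name]]
            cache[name] = self.ops[name](*args)
        return cache[name]


# A single "neuron": relu(x * w + b), with made-up constant inputs.
g = ComputeGraph()
g.add_vertex("x", lambda: 2.0)
g.add_vertex("w", lambda: 3.0)
g.add_vertex("b", lambda: 1.0)
g.add_vertex("mul", lambda a, b: a * b, inputs=["x", "w"])
g.add_vertex("add", lambda a, b: a + b, inputs=["mul", "b"])
g.add_vertex("relu", lambda a: max(a, 0.0), inputs=["add"])

print(g.num_vertices(), g.num_edges(), g.run("relu"))  # 6 5 7.0
```

Even this single neuron needs six vertices and five edges, which gives a feel for how a full network such as AlexNet can expand into millions of each.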

A graph processor like the IPU is designed specifically for building and executing computational graphs for deep learning models of all kinds. Machine learning is widely expected to shape the future of computing, and Graphcore is betting that IPU-like architectures will carry this next wave forward.

Benchmarks

The batch size is the number of data items processed in parallel with the current set of parameters while training a machine learning model. The chart shows the estimated performance of the IPU, in images per second, when training the ResNet-50 neural network to classify images from the ImageNet dataset.

The chart shows a significant performance gain even at low batch sizes. A batch size of 64 was used when scaling up to eight IPU accelerator cards. At every point on the chart, an IPU system would deliver a notable performance gain over existing technologies. For comparison, the best performance achieved on a 300-watt GPU accelerator is around 580 images/second.
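To make the relationship between batch size and images per second concrete, here is a small arithmetic sketch. The function names are hypothetical, and it assumes, purely for illustration, that the 580 images/second GPU figure was measured at a batch size of 64, which the benchmark does not state.

```python
def images_per_second(batch_size, step_time_s):
    """Training throughput: images processed per second of wall-clock time."""
    return batch_size / step_time_s


def step_time_for(batch_size, throughput):
    """Time one training step must take to reach a given throughput."""
    return batch_size / throughput


# Assumption: batch size 64 paired with the quoted 580 images/second.
gpu_step = step_time_for(64, 580.0)
print(round(gpu_step, 4))  # 0.1103 seconds per training step
```

Under that assumption, any accelerator that finishes a 64-image step in less than about 0.11 seconds would beat the quoted GPU figure.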

Long Short-Term Memory (LSTM) Inference

The data dependencies in recurrent networks limit the available parallelism as well as the number of operations per data item. The IPU and the Poplar libraries are designed to handle these limitations. The graph compares single-layer LSTM network performance on the IPU and on a GPU under different latency constraints.
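The data dependency is easiest to see in code. The sketch below uses a simplified scalar recurrent cell rather than a full LSTM, and the weights `w_x` and `w_h` are arbitrary illustrative values, but the same constraint applies: each step needs the previous step's state, so the time loop cannot be run in parallel.

```python
import math


def run_recurrent(xs, w_x=0.5, w_h=0.9, h0=0.0):
    """Run a simplified recurrent cell over a sequence of inputs."""
    h = h0
    states = []
    for x in xs:
        # h depends on the h computed in the previous iteration,
        # so step t cannot start before step t-1 has finished.
        h = math.tanh(w_x * x + w_h * h)
        states.append(h)
    return states


states = run_recurrent([1.0, 0.0, 0.0])
print(len(states), states[2] != 0.0)  # 3 True
```

The first input still influences the last state even though the later inputs are zero, which is exactly the dependency that forces sequential execution.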

Generative Networks

The next chart compares the performance of the Deep Voice generation algorithm on the IPU and on other platforms. The experiment considers two performance metrics: how quickly samples can be generated and, if a real-time stream can be generated, how many audio channels can be produced at once.
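The two metrics are linked by simple arithmetic, sketched below. The function name and all of the numbers are hypothetical and are not taken from the benchmark: a generator keeps up with real time only if it produces at least as many samples per second as the audio sample rate, and every further multiple of that rate supports one more channel.

```python
def realtime_channels(samples_per_second, sample_rate_hz):
    """Number of audio channels a generator can sustain in real time."""
    return int(samples_per_second // sample_rate_hz)


# Hypothetical figures: 16 kHz audio and two made-up generation speeds.
print(realtime_channels(40_000, 16_000))  # 2
print(realtime_channels(10_000, 16_000))  # 0, slower than real time
```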

Funding

In roughly one year, the company has raised $110 million in three rounds of funding. Most recently, on 12 November 2017, Graphcore closed a $50 million Series C funding round with Sequoia Capital, a firm not known for large investments in Europe. This follows the $60 million that Graphcore had already raised in previous funding rounds.

Its other investors include Samsung, Bosch, Atomico, Dell Technologies Capital, Draper Esprit, Amadeus Capital, Foundation Capital and C4 Ventures. CEO Nigel Toon said the company will use this money to scale up production, attract more developers to its platform, and spend time with potential customers to understand whether its product is solving real problems.

Rivals

Nvidia and Intel are both making strong claims on this market, and they bring decades of experience along with large budgets for research, development and marketing. In fact, they are turning into giant providers of hardware for almost everything, from gaming and machine learning to cryptocurrency mining.

Apple has its own hardware in the A11 Bionic chip, and Google has the Tensor Processing Unit, which plays nicely with TensorFlow. There are also reports suggesting that Tesla may be working with AMD to build its own AI chip for self-driving cars. These are all big players, but the market opportunity seems large enough for AI startups like Graphcore to go after those giants.

Written by
Varun Kumar
