One of the hottest U.K.-based startups, Graphcore focuses on accelerating complex machine learning models for both training and inference. The company is building an artificial intelligence chip intended to lower the cost of AI applications in enterprise datacenters and the cloud, and to increase performance by up to 100 times compared to the fastest systems available today.
The company plans to ship its first AI chips to its first customer in early 2018. It will take longer for such chips to appear in devices like smartphones, since most AI-powered internet services are currently processed in the cloud rather than on the device.
Many internet companies, including Facebook and Google, struggle to make their video ads relevant to what people are watching. Their algorithms cannot see the images flashing across each video, which would help them target viewers more effectively. Doing so would be an extremely complex process, involving huge computations in real time on a massive scale.
This is just one example of a demanding task for artificial intelligence. Targeting users more effectively with video ads requires far more powerful chips to process all the visual information uploaded each day. Graphcore is building cutting-edge processors to power the next generation of AI software, which could understand users better and drive the ad-targeting systems of big video platforms like YouTube.
Intelligence Processing Unit
Graphcore has developed a new processor dedicated to machine intelligence workloads, named the Intelligence Processing Unit (IPU). It is tuned to work efficiently on the large, complex, high-dimensional models that machine intelligence requires. The chip features more than 14,000 independent processor threads, emphasizes highly parallel, low-precision floating-point computation, and offers higher compute density than other solutions.
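As a rough, hedged illustration of the low-precision trade-off described above (a minimal NumPy sketch; the matrix sizes are arbitrary and nothing here is Graphcore code):

```python
import numpy as np

# Hypothetical illustration (not Graphcore code): the same matrix multiply
# in float32 and float16. Halving the bytes per value is what lets a highly
# parallel chip fit more arithmetic into the same power and memory budget,
# at the cost of some numerical precision.
rng = np.random.default_rng(0)
a = rng.standard_normal((512, 512)).astype(np.float32)
b = rng.standard_normal((512, 512)).astype(np.float32)

full = a @ b                                              # float32 reference
half = (a.astype(np.float16) @ b.astype(np.float16)).astype(np.float32)

print("bytes per value:", np.dtype(np.float32).itemsize, "vs",
      np.dtype(np.float16).itemsize)
print("max abs deviation in float16:", np.abs(full - half).max())
```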
How Intelligence Compute Works
- LEARNING – build a high-dimensional knowledge model from data
- JUDGEMENT – summarize the data held in the knowledge model
- PREDICTION – feed inputs through the knowledge model to deduce likely outputs
- INFERENCE – feed outputs through the knowledge model to deduce the likely inputs
All four are optimization processes suited to the same kind of compute machine. Knowledge models are usually represented as graphs, in which vertices represent features and edges represent the correlations between them.
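To make that framing concrete, here is a minimal sketch in plain Python (illustrative only, not Graphcore code) of a knowledge model stored as a graph whose vertices are features and whose edges carry correlation weights:

```python
# Minimal sketch of a knowledge model as a graph: vertices are features,
# edges are weighted correlations between them. All names are illustrative.
from collections import defaultdict

class KnowledgeGraph:
    def __init__(self):
        self.edges = defaultdict(dict)   # feature -> {neighbor: correlation}

    def correlate(self, a, b, weight):
        # Correlations are symmetric, so store the edge in both directions.
        self.edges[a][b] = weight
        self.edges[b][a] = weight

    def predict(self, feature):
        # PREDICTION: given an input feature, return the most strongly
        # correlated features as candidate outputs.
        return sorted(self.edges[feature].items(), key=lambda kv: -kv[1])

g = KnowledgeGraph()
g.correlate("watches_sports_videos", "clicks_sneaker_ads", 0.8)
g.correlate("watches_sports_videos", "clicks_cooking_ads", 0.1)
print(g.predict("watches_sports_videos"))
```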
Poplar
Graphcore provides a C++ framework called Poplar for its IPU-accelerated platforms. It is designed to offer a seamless interface to popular machine learning frameworks, including MXNet and TensorFlow, so that existing applications work on the IPU.
The Poplar graph framework consists of a full set of drivers, application libraries, analysis and debugging tools for optimizing performance, and Python and C++ interfaces for developing applications.
Poplar ships with a graph compiler, developed from scratch, that converts the standard operations used by machine learning frameworks into optimized application code for the IPU. It builds an intermediate representation of the computational graph, which is then deployed across one or more IPUs. The compiler can also visualize this graph, so code written at the machine learning framework level reveals an image of the computational graph that runs on the IPU.
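Poplar's internals are not public in detail, but the general idea of lowering framework-level operations into a vertex-and-edge execution graph can be sketched as follows (hypothetical code with invented names; this is not the Poplar API):

```python
# Hypothetical sketch of a graph compiler's intermediate representation:
# each framework-level op becomes a compute vertex, and each tensor that
# flows between ops becomes a communication edge. Not the Poplar API.

class Vertex:
    def __init__(self, op_name):
        self.op_name = op_name
        self.inputs = []     # vertices feeding this one

def lower(framework_ops):
    """Convert a list of (op, input_names, output_name) into a graph."""
    producers = {}           # tensor name -> vertex that produces it
    vertices, edges = [], []
    for op, input_names, output_name in framework_ops:
        v = Vertex(op)
        for name in input_names:
            if name in producers:
                edges.append((producers[name], v))   # communication edge
                v.inputs.append(producers[name])
        producers[output_name] = v
        vertices.append(v)
    return vertices, edges

# A toy two-layer network expressed as framework-level ops:
ops = [
    ("matmul", ["x", "w1"], "h"),
    ("relu",   ["h"],       "h_act"),
    ("matmul", ["h_act", "w2"], "y"),
]
vertices, edges = lower(ops)
print(len(vertices), "vertices,", len(edges), "edges")
```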
As an example, the compiler has converted a network description of AlexNet, a well-known deep neural network, into a computational graph of about 18.7 million vertices and 115.8 million edges that serves as an execution plan for the IPU. The vertices represent computation processes and the edges represent communication between them.

Benchmarks
The batch size is the number of data items processed in parallel with the current set of parameters while training a machine learning model. The chart shows the estimated performance of the IPU, in images per second, when training the ResNet-50 neural network for image classification on the ImageNet dataset.
The chart shows a significant performance gain even at low batch sizes. A batch size of 64 is used when scaling up to eight IPU accelerator cards. At every point, the IPU system delivers a notable performance gain over existing technologies; for comparison, the best performance achieved on a 300-watt GPU accelerator is around 580 images per second.
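For readers unfamiliar with the term, a minimal training loop (generic Python with NumPy, not tied to the benchmark above) shows exactly where the batch size enters:

```python
import numpy as np

# Minimal sketch: the batch size is how many samples contribute to each
# single parameter update. Toy linear model with a mean-squared-error loss.
def grad(w, xb, yb):
    return 2 * xb.T @ (xb @ w - yb) / len(xb)   # MSE gradient over one batch

def train(x, y, w, batch_size, lr=0.01, epochs=200):
    for _ in range(epochs):
        for i in range(0, len(x), batch_size):
            xb, yb = x[i:i + batch_size], y[i:i + batch_size]
            w -= lr * grad(w, xb, yb)           # one update per batch
    return w

rng = np.random.default_rng(0)
x = rng.standard_normal((256, 4))
true_w = np.array([1.0, -2.0, 0.5, 3.0])
y = x @ true_w
print(train(x, y, np.zeros(4), batch_size=64))  # converges toward true_w
```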
Long Short-Term Memory (LSTM) Inference
The data dependencies in recurrent networks limit the available parallelism as well as the number of operations per item of data. The IPU and the Poplar libraries are designed to work around these limitations. The chart shows the performance of a single-layer LSTM network under different latency constraints, compared to a GPU.
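A single-layer LSTM written out in NumPy (an illustrative sketch, not Graphcore's implementation) makes the dependency explicit: the hidden and cell states at step t feed step t+1, so the timestep loop cannot be parallelized:

```python
import numpy as np

# Illustrative single-layer LSTM inference in NumPy. The loop over
# timesteps cannot be parallelized, because h and c at step t depend
# on h and c at step t-1 -- the data dependency the text refers to.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(x_seq, W, U, b, hidden):
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x in x_seq:                        # inherently sequential loop
        z = W @ x + U @ h + b              # all four gates in one matmul
        i, f, o, g = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h

rng = np.random.default_rng(0)
inp, hid, steps = 8, 16, 10
W = rng.standard_normal((4 * hid, inp)) * 0.1
U = rng.standard_normal((4 * hid, hid)) * 0.1
b = np.zeros(4 * hid)
print(lstm_forward(rng.standard_normal((steps, inp)), W, U, b, hid).shape)
```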
Generative Networks
The next chart shows the performance of the Deep Voice speech-generation algorithm compared to other platforms. The experiment considers two performance metrics: how quickly samples can be generated, and, if a real-time stream can be generated, how many audio channels can be produced at once.
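The second metric follows from simple arithmetic: each real-time audio channel needs a fixed number of samples per second, so surplus throughput converts directly into extra channels (a hedged sketch; the 16 kHz rate is a common choice for speech synthesis, and the throughput figure is hypothetical rather than read from the chart):

```python
# If a platform generates samples faster than real time, the surplus can
# be spent on extra audio channels. Assumed sample rate: 16 kHz, a common
# choice for speech synthesis; the throughput figure is hypothetical.
SAMPLE_RATE = 16_000                 # samples needed per second, per channel

def realtime_channels(samples_per_second):
    return samples_per_second // SAMPLE_RATE

print(realtime_channels(160_000))    # hypothetical throughput -> 10 channels
```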
Funding
In roughly one year, the company has raised $110 million across three rounds of funding. Most recently, on 12 November 2017, Graphcore closed a $50 million Series C round led by Sequoia Capital, a firm not known for large investments in Europe. This follows the $60 million Graphcore had already raised in previous rounds.
Other investors include Samsung, Bosch, Atomico, Dell Technologies Capital, Draper Esprit, Amadeus Capital, Foundation Capital, and C4 Ventures. CEO Nigel Toon said the company will use the money to scale up production, attract more developers to its platform, and spend time with potential customers to understand whether the product is solving real problems.
Rivals
No doubt, Nvidia and Intel are staking strong claims on this market, and they bring decades of experience and enormous budgets for research, development, and marketing. In fact, they have become giant providers of hardware for almost everything, from gaming and machine learning to mining cryptocurrency.
Apple has its own hardware in the A11 Bionic chip, and Google has the Tensor Processing Unit, which plays nicely with TensorFlow. There are also reports suggesting that Tesla may be working with AMD to build its own AI chip for self-driving cars. These are all big players, but the market opportunity seems big enough for AI startups like Graphcore to go after the giants.