- The Cerebras Wafer-Scale Engine is a 46,225 mm² chip consisting of 1.2 trillion transistors and 400,000 AI-optimized cores.
- The chip has 3,000 times more on-chip memory and 10,000 times greater memory bandwidth than the largest GPU on the market.
In the past decade, deep learning has risen from obscurity to top-of-mind awareness. Complex tasks that historically could only be performed by humans are now carried out by computers at superhuman levels.
Deep learning is a computationally intensive method. According to an OpenAI report, the amount of compute used to train the largest models increased 300,000-fold between 2012 and 2018. In other words, artificial intelligence (AI) computing is growing at a phenomenal rate: the computational demand is doubling roughly every 3.5 months.
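Those two figures are roughly consistent with each other: a 300,000-fold increase over six years corresponds to about 18 doublings, which works out to a doubling time in the 3.5-to-4-month range (the exact number depends on the window measured). A minimal sketch of the arithmetic:

```python
# Back-of-the-envelope check of the growth figures above (illustrative only).
import math

growth = 300_000        # reported compute growth, 2012-2018
months = 6 * 12         # span of the measurement in months

doublings = math.log2(growth)        # ~18.2 doublings
doubling_time = months / doublings   # ~4 months per doubling

print(f"{doublings:.1f} doublings -> one doubling every {doubling_time:.1f} months")
```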
Now, a team of computer architects and deep learning researchers has come up with a new chip, the Cerebras Wafer-Scale Engine, that can accelerate AI by orders of magnitude beyond the present state of the art.
The Cerebras Architecture
The chip is built exclusively for AI applications. The primary aim is to speed up both calculation and communication, and the approach is a straightforward function of the Wafer-Scale Engine's size.
As of 2019, it is the largest chip ever built: a 46,225 mm² die containing 1.2 trillion transistors and 400,000 AI-optimized cores. In contrast, the largest GPU measures 815 mm² and has 21.1 billion transistors.
Developing such a large chip is not easy. The design, manufacturing, coordination, power, and cooling challenges are immense. However, the potential performance gains are huge.
With 56 times more silicon area than the largest GPU, the Wafer-Scale Engine provides 78 times more cores to perform calculations, 18 GB of on-chip memory so that cores can operate faster, and 100 petabits per second of low-latency bandwidth between cores so that they can collaborate efficiently.
Source: Cerebras | White Paper
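The 56x figure follows directly from the two die sizes quoted earlier; a one-line check:

```python
# Rough ratio check using the die sizes quoted above (illustrative only).
wse_area_mm2 = 46_225   # Cerebras Wafer-Scale Engine
gpu_area_mm2 = 815      # largest GPU die cited above

print(f"Silicon area ratio: {wse_area_mm2 / gpu_area_mm2:.1f}x")  # ~56.7x
```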
All cores on the chip are linked in a two-dimensional mesh via the Swarm communication fabric, a hardware routing engine that delivers exceptionally high bandwidth and low latency while consuming far less power.
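To make the mesh topology concrete, here is a toy sketch of how cores in a two-dimensional mesh reach each other through nearest-neighbor hops. Cerebras has not published its router internals, so this is a generic illustration, not the Swarm design; the grid dimensions (roughly 633 x 633, which holds about 400,000 cores) are likewise an assumption made only for the example:

```python
# Toy model of a 2-D mesh interconnect: each core talks to its four
# nearest neighbors, and a message's minimum hop count is the Manhattan
# distance. Illustrates the topology only, not Cerebras's actual router.

def neighbors(x: int, y: int, width: int, height: int):
    """Cores directly linked to core (x, y) in a width x height mesh."""
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [(cx, cy) for cx, cy in candidates
            if 0 <= cx < width and 0 <= cy < height]

def hop_count(src: tuple, dst: tuple) -> int:
    """Minimum hops between two cores (Manhattan distance)."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])

# Assumed ~633 x 633 grid (~400,000 cores) purely for illustration.
print(neighbors(0, 0, 633, 633))   # a corner core has only two links
print(hop_count((0, 0), (10, 4)))  # 14 hops
```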
All cores can be configured via Cerebras software, enabling precise communication for training user-specific models. Unlike CPUs and GPUs, which have hard-coded communication paths, Swarm provides a unique, optimized communication path for each neural network.
It requires no communication software such as MPI or TCP/IP, eliminating the associated performance penalties. To keep things simple and intuitive for developers, the Cerebras software offers an easy-to-use interface to popular high-level machine learning frameworks, including PyTorch and TensorFlow.
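In practice, that means model code stays in the framework's native idiom. The sketch below is ordinary PyTorch with nothing accelerator-specific in it; under the Cerebras stack, a graph compiler (whose API is not shown here and is not part of this sketch) would take such a graph and place it onto the fabric:

```python
# Standard PyTorch model definition -- nothing Cerebras-specific appears.
# Compilation onto the wafer would be handled by Cerebras's toolchain,
# which is not shown in this sketch.
import torch
import torch.nn as nn

class SmallConvNet(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        return self.classifier(x.flatten(start_dim=1))

model = SmallConvNet()
out = model(torch.randn(8, 3, 32, 32))  # batch of 8 RGB 32x32 images
print(out.shape)                        # torch.Size([8, 10])
```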
Along with conventional execution modes, the Wafer-Scale Engine enables novel techniques of model-parallel execution: it can hold an entire neural network on the fabric at once. Users can therefore stream data through the pipeline, running all layers of the neural network concurrently, as the toy simulation below illustrates. Such an approach is possible only on a chip of this scale.
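A small simulation makes the layer-pipelined idea concrete. In this conceptual sketch (not Cerebras's actual scheduler), each layer is pinned to its own region of the chip; once the pipeline fills, every layer is processing a different sample in the same step:

```python
# Conceptual sketch of layer-pipelined execution: with each of L layers
# occupying its own stage, sample i enters layer j at step i + j, so
# after the pipeline fills, all layers run concurrently.
NUM_LAYERS = 4
NUM_SAMPLES = 6

for step in range(NUM_SAMPLES + NUM_LAYERS - 1):
    active = [(step - layer, layer)
              for layer in range(NUM_LAYERS)
              if 0 <= step - layer < NUM_SAMPLES]
    print(f"step {step}: " +
          ", ".join(f"sample {s} in layer {l}" for s, l in active))
```

From step 3 onward, all four layers are busy at once, which is the payoff of streaming data through a network that fits entirely on the chip.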
Overall, the Cerebras Wafer-Scale Engine allows developers to test hypotheses rapidly and explore methods that today are untestable on legacy architectures.