What Is Tensor Processing Unit (TPU)? How Is It Different From GPU?

  • TPU is an AI accelerator chip specifically designed for neural network machine learning.
  • TPUs are good for deep learning tasks involving TensorFlow, while GPUs are more general-purpose and flexible massively parallel processors.

Machine learning, a branch of artificial intelligence (AI), is a buzzword in the tech field right now. Although its importance is growing rapidly, traditional processors have struggled to handle it efficiently, whether for training or for running neural networks.

Several manufacturers have built GPUs with highly parallel architectures for fast graphics processing. They are far more powerful than CPUs for these workloads, but they were not designed specifically for neural networks.

To address this problem, Google developed an AI accelerator: an application-specific integrated circuit (ASIC) built for neural network machine learning. Since the chip was designed specifically for the TensorFlow framework, Google named it the Tensor Processing Unit (TPU).

For those who don’t know, TensorFlow is an open source library for dataflow programming and various machine learning tasks. It was developed by Google and is now available for macOS, Windows, and Linux distributions. In recent years, it has become one of the most starred AI frameworks on GitHub.

How Are TPUs Different From GPUs?

GPUs were originally built to handle the heavy calculations required to rotate graphics on screen and to speed up other graphical tasks, such as rendering scenes in games. The aim was to take that burden off the CPU and free it up for other processes.

In 2007, NVIDIA released CUDA, a parallel computing platform and application programming interface that lets developers use GPUs for general-purpose computation, including almost any kind of vector, scalar, or matrix multiplication and addition.

A conventional CPU has one to sixteen cores, whereas a GPU has hundreds or even thousands. TPUs and GPUs are not the same technology, but both are massively parallel accelerators, and given the appropriate compiler support, both can carry out the same computational tasks.

[Image: TPU version 3.0]

Today, GPUs can serve as general-purpose parallel processors and can be programmed to carry out neural network operations efficiently. A TPU, on the other hand, is not a fully general-purpose processor. Because its software support (such as compilers) is built around TensorFlow, a TPU can hardly run anything other than a TensorFlow model.

TPUs Are Now Commercially Available

Google offers its ‘Cloud TPU’ to train and run machine learning models. Cloud TPU v2 is billed for both on-demand and preemptible use. A single Cloud TPU v2 device delivers up to 180 teraflops of performance with 64 GB of high-bandwidth memory, and devices can be linked over Google’s custom high-speed network.

According to Google’s pricing information, each TPU costs $4.50 per hour (on-demand). Google lists multiple TPU versions on its product page, each with different clock speeds and memory sizes.
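As a rough illustration of how hourly billing adds up, the sketch below multiplies the on-demand rate quoted above ($4.50 per TPU per hour) by device count and run time. The function name and the example workloads are hypothetical, and the rate is treated as a fixed assumption; real prices vary by TPU version, region, and over time, and preemptible use is cheaper.

```python
def cloud_tpu_cost(hours: float, devices: int = 1, rate_per_hour: float = 4.50) -> float:
    """Estimate the on-demand cost in USD of `devices` TPUs running `hours` each.

    The default rate is the on-demand figure quoted in the article; it is an
    assumption here, not a live price.
    """
    if hours < 0 or devices < 0:
        raise ValueError("hours and devices must be non-negative")
    return round(hours * devices * rate_per_hour, 2)


# Hypothetical workload: one TPU for a 10-hour training run.
print(cloud_tpu_cost(10))        # 45.0
# The same work spread over four TPUs for 2.5 hours each (ideal scaling assumed).
print(cloud_tpu_cost(2.5, 4))    # 45.0
```

Under perfectly linear scaling, adding devices shortens the run without changing the bill, which is why scaling efficiency matters for cost as well as speed.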


TPU Architecture

Users can run machine learning workloads on TPU hardware using TensorFlow. With Cloud TPU, developers can build TensorFlow compute clusters that combine GPUs, TPUs, and CPUs, and they can easily run replicated models on Cloud TPU hardware using high-level TensorFlow APIs.
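Running a replicated model usually means synchronous data parallelism: each core holds a copy of the model, processes its own slice of the batch, and the cores average their gradients so every copy applies the same update. In TensorFlow this is the job of `tf.distribute.TPUStrategy`; the pure-Python sketch below only models the idea, using a hypothetical one-parameter linear model and a plain average standing in for the cross-core all-reduce. All function names are illustrative.

```python
from statistics import fmean


def shard(batch, num_replicas):
    """Split a global batch into equal, contiguous per-replica shards."""
    if len(batch) % num_replicas:
        raise ValueError("global batch size must be divisible by replica count")
    size = len(batch) // num_replicas
    return [batch[i * size:(i + 1) * size] for i in range(num_replicas)]


def local_gradient(w, examples):
    """Gradient of mean squared error for the 1-D model y_hat = w * x."""
    return fmean(2 * (w * x - y) * x for x, y in examples)


def replicated_step(w, batch, num_replicas=8, lr=0.005):
    """One synchronous data-parallel step.

    Every replica computes a gradient on its own shard, the gradients are
    averaged (standing in for an all-reduce), and every replica applies the
    same update, so all copies of `w` stay identical.
    """
    grads = [local_gradient(w, s) for s in shard(batch, num_replicas)]
    return w - lr * fmean(grads)


batch = [(float(x), 3.0 * x) for x in range(1, 17)]  # 16 examples of y = 3x
w = 0.0
for _ in range(50):
    w = replicated_step(w, batch, num_replicas=8)
print(round(w, 4))  # 3.0
```

Because the shards are equal in size, the average of the per-replica gradients equals the full-batch gradient, so eight replicas learn exactly what one device would, just with the work split eight ways.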

TPUs connect to Google Cloud host machines through a PCI interface, much as gamers add NVIDIA graphics cards to a PC to boost its performance.

Hardware support built into the TPU chips is designed to let performance scale linearly across a wide range of deep learning workloads.
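Linear scaling means throughput grows in direct proportion to the number of chips. A common way to check how close a real job comes is scaling efficiency: measured throughput on n devices divided by n times the single-device throughput. The helper and the measurements below are hypothetical, for illustration only.

```python
def scaling_efficiency(throughput_1: float, throughput_n: float, n: int) -> float:
    """Return measured speedup divided by the ideal (linear) speedup of n.

    1.0 means perfectly linear scaling; values below 1.0 mean some time is
    lost to communication or other overhead.
    """
    if throughput_1 <= 0 or n <= 0:
        raise ValueError("throughput_1 and n must be positive")
    speedup = throughput_n / throughput_1
    return speedup / n


# Hypothetical measurements in examples per second.
print(scaling_efficiency(1000.0, 8000.0, 8))  # 1.0 (linear)
print(scaling_efficiency(1000.0, 7200.0, 8))  # 0.9
```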

Cloud TPU hardware consists of 4 independent chips, and each chip contains 2 compute cores known as Tensor Cores. Each Tensor Core consists of scalar, vector, and matrix units (MXUs). On Cloud TPU v2, each chip has 16 GB of high-bandwidth memory (HBM), for 64 GB in total.

Each of the eight cores can run user tasks independently, while high-bandwidth interconnects enable the chips to communicate directly with each other.
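The MXU is a fixed-size grid of multiply-accumulate cells (128 × 128 on TPU v2 and v3), so larger matrix products are broken into tiles that fit it. The pure-Python sketch below mimics that tiling with a deliberately tiny, hypothetical tile size; it models only the arithmetic, not the TPU's systolic dataflow or its reduced-precision inputs.

```python
def matmul_tiled(a, b, tile=2):
    """Multiply a (n x k) by b (k x m), both lists of lists, block by block.

    Each innermost block update stands in for one pass through a fixed-size
    matrix unit: multiply a tile of `a` by a tile of `b` and accumulate the
    result into the matching tile of the output.
    """
    n, k, m = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for p0 in range(0, k, tile):
                # One "MXU pass" over a tile x tile block.
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for p in range(p0, min(p0 + tile, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
print(matmul_tiled(a, b))  # [[58.0, 64.0], [139.0, 154.0]]
```

Tiling changes only the order of the multiply-accumulate operations, not the result, which is why a fixed-size unit can serve matrices of any shape.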

Advantages Of TPU

Using Google’s TPU can be beneficial in many ways. We have listed a few advantages –

  1. Significantly faster linear algebra computations.
  2. Shorter time-to-accuracy when training massive, complex machine learning models.
  3. Computations that scale across several machines with little or no extra code.
  4. Models that previously took weeks to train on other hardware platforms can be trained in hours on newer TPU versions.


When To Use Cloud TPUs

TPUs are optimized for certain tasks only. Therefore, in some cases, you should prefer GPUs over TPUs, especially for –

  1. Models that aren’t written in TensorFlow
  2. Models that use TensorFlow ops not available on Google’s Cloud TPU
  3. Models whose source is too difficult to change or does not exist at all
  4. Models with custom TensorFlow operations that must run at least partially on CPUs

The following are the scenarios where you should prefer TPUs over GPUs –

  1. Models that have no custom TensorFlow operations
  2. Models that include a lot of matrix calculations
  3. Large models with huge effective batch sizes
  4. Models that train for months
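The two checklists above can be read as a simple decision rule: any GPU-first condition rules the TPU out, and the TPU-friendly traits then tip the balance. The sketch below encodes that reading; the `Workload` fields and `suggest_accelerator` are hypothetical names, and the "two or more traits" threshold is an assumption, not official guidance.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical yes/no answers to the checklist questions."""
    written_in_tensorflow: bool
    all_ops_supported_on_tpu: bool
    source_editable: bool
    needs_custom_cpu_ops: bool
    matrix_heavy: bool = False
    large_effective_batch: bool = False
    trains_for_months: bool = False


def suggest_accelerator(w: Workload) -> str:
    """Return "GPU", "TPU", or "either" following the checklists."""
    # Any GPU-first condition rules the TPU out.
    if (not w.written_in_tensorflow or not w.all_ops_supported_on_tpu
            or not w.source_editable or w.needs_custom_cpu_ops):
        return "GPU"
    # Assumption: two or more TPU-friendly traits favour a TPU.
    traits = sum([w.matrix_heavy, w.large_effective_batch, w.trains_for_months])
    return "TPU" if traits >= 2 else "either"


big_tf_model = Workload(written_in_tensorflow=True, all_ops_supported_on_tpu=True,
                        source_editable=True, needs_custom_cpu_ops=False,
                        matrix_heavy=True, large_effective_batch=True)
print(suggest_accelerator(big_tf_model))  # TPU
```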


TPUs have been used in several Google products and research projects, including the AlphaGo and AlphaZero systems developed by Google DeepMind. AlphaZero was built to master Go, shogi, and chess; within 24 hours it reached a superhuman level of play, beating the leading programs in those games.

The company also used this hardware to process text in Google Street View imagery and was able to extract all the text in the Street View database within 5 days.

Moreover, TPUs power RankBrain, a machine learning-based search ranking algorithm, to provide more relevant search results.

In Google Photos, each TPU can process more than 100 million images per day.
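To put the Google Photos figure in per-second terms: a day has 86,400 seconds, so 100 million images per day averages out to roughly 1,157 images per second per TPU. The helper name below is illustrative, and the result is an average that assumes a steady load over the day.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(items_per_day: float) -> float:
    """Convert a daily throughput figure to an average per-second rate."""
    return items_per_day / SECONDS_PER_DAY


print(round(per_second(100_000_000)))  # 1157
```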

Competition For Google’s TPU

At present, NVIDIA dominates the machine learning processor market. Its latest TITAN GPUs are built on NVIDIA Volta, which the company calls the world’s most advanced GPU architecture, to deliver new levels of performance.

The NVIDIA TITAN V, for instance, features 21.1 billion transistors and 640 Tensor Cores (NVIDIA’s own matrix units, unrelated to the TPU’s Tensor Cores) that deliver up to 110 teraflops of deep learning performance.


Movidius (acquired by Intel) makes a Vision Processing Unit (VPU) called Myriad 2 that works efficiently on power-constrained devices. The third-generation VPU, Myriad X, is a strong option for on-device neural networks and computer vision applications.

In 2016, Intel revealed an AI processor named Nervana for both training and inference. It is designed with high-speed on-chip and off-chip interconnects to achieve true model parallelism where neural network parameters are distributed across multiple chips.

Microsoft is working on Project Brainwave, a deep learning platform for real-time AI serving in the cloud. It uses high-performance field-programmable gate arrays (FPGAs) in its data centers to accelerate deep neural network inference.


Other players in the AI chip market include IBM, ARM, Cerebras, Graphcore, and Vathys. Soon, we will see these machine learning chips everywhere.

Written by
Varun Kumar

I am a professional technology and business research analyst with more than a decade of experience in the field. My main areas of expertise include software technologies, business strategies, competitive analysis, and staying up-to-date with market trends.

I hold a Master's degree in computer science from GGSIPU University. If you'd like to learn more about my latest projects and insights, please don't hesitate to reach out to me via email at [email protected].
