Tensor processing unit (TPU)

Also known as: TPU

A tensor processing unit (TPU) is a custom chip designed by Google to accelerate machine-learning workloads, built around large arrays that perform the matrix multiplications at the heart of neural networks.¹

Overview

A TPU is an application-specific integrated circuit (ASIC). Instead of offering the general-purpose flexibility of a CPU or GPU, it dedicates most of its silicon to a systolic array: a grid of multiply-accumulate units through which operands stream in lockstep, computing matrix products with very high throughput per watt. The trade-off is narrow specialization: a TPU runs tensor math (typically in reduced precision such as bfloat16) extremely well and does little else. Google announced TPUs publicly in 2016, after first using them to power its own services, and now offers them through its cloud; the small Coral Edge TPU brings the design to embedded devices.²
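To make the systolic-array idea concrete, the following is a minimal software sketch, not a description of Google's actual hardware. It assumes an output-stationary layout in which each processing element keeps one accumulator and operands arrive skewed by one cycle per row and column. The names to_bfloat16 and systolic_matmul are hypothetical, the bfloat16 conversion simply truncates (real hardware typically rounds), and a Python float stands in for the wider accumulator.

```python
import struct


def to_bfloat16(x):
    """Truncate x to bfloat16 precision by keeping the top 16 bits of its float32 form.

    Assumption: plain truncation for simplicity; hardware usually rounds instead.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


def systolic_matmul(a, b):
    """Simulate an output-stationary systolic array computing the product a @ b.

    Returns (product, cycles). Inputs are reduced to bfloat16 precision first;
    the accumulators keep full Python float precision.
    """
    n, k, m = len(a), len(b), len(b[0])
    a = [[to_bfloat16(v) for v in row] for row in a]
    b = [[to_bfloat16(v) for v in row] for row in b]

    # One accumulator per processing element (PE) in an n x m grid.
    acc = [[0.0] * m for _ in range(n)]

    # With skewed inputs, a[i][p] and b[p][j] meet at PE (i, j) on cycle i + j + p,
    # so the last multiply-accumulate happens on cycle (n - 1) + (m - 1) + (k - 1).
    cycles = n + m + k - 2
    for t in range(cycles):
        for i in range(n):
            for j in range(m):
                p = t - i - j
                if 0 <= p < k:
                    acc[i][j] += a[i][p] * b[p][j]  # one multiply-accumulate
    return acc, cycles


product, cycles = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(product)  # [[19.0, 22.0], [43.0, 50.0]]
print(cycles)   # 4
```

In hardware, every processing element performs its multiply-accumulate in parallel on each cycle, so the two inner loops are only a serial stand-in for the grid. The reduced-precision step is visible on values that bfloat16 cannot hold exactly: to_bfloat16(3.14159) returns 3.140625.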

Where it fits

The TPU is the canonical example of a purpose-built AI accelerator. It competes with the GPU for training and inference in the data center and with the on-device NPU at the edge. Its strength is data-center-scale neural-network training and serving. It is not a general-purpose DSP engine, so it has no direct role in GopherTrunk’s signal chain, though edge-AI parts from the same family (see Google Coral) could classify decoded traffic.

Sources

¹ Tensor Processing Unit: Wikipedia article on Google’s machine-learning accelerator.
² Cloud TPU: Google’s documentation for the TPU and its architecture.
