Also known as: Compute Unified Device Architecture
CUDA is NVIDIA’s parallel computing platform and programming model that lets ordinary, general-purpose code run on the GPU instead of only the CPU.[1]
Overview
A CUDA program offloads its data-parallel work to kernels: small functions that run in parallel across thousands of lightweight threads, typically with each thread handling one element of the data. The platform exposes the GPU through extensions to C and C++ (and bindings for Python, Fortran, and others), plus tuned libraries such as cuBLAS for linear algebra and cuDNN for neural networks. CUDA runs only on NVIDIA hardware, so it competes with the cross-vendor OpenCL and with newer portable frameworks, but its mature tooling made it the de facto standard for GPU computing.[2]
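The one-thread-per-element model can be illustrated without a GPU. What follows is a sequential plain-Python emulation, not CUDA code, and it is only a sketch. The names `saxpy_kernel` and `launch` are hypothetical, and the block size of 4 is arbitrary. The index calculation `block_idx * block_dim + thread_idx` mirrors the built-in variables `blockIdx.x`, `blockDim.x`, and `threadIdx.x` that a real CUDA C++ kernel would read.

```python
def saxpy_kernel(block_idx, block_dim, thread_idx, n, a, x, y, out):
    """Body run by one emulated thread: computes a single output element."""
    i = block_idx * block_dim + thread_idx  # global index of this thread
    if i < n:  # guard: the last block may hold more threads than elements
        out[i] = a * x[i] + y[i]


def launch(kernel, grid_dim, block_dim, *args):
    """Invoke the kernel once per (block, thread) pair.

    A GPU would run these invocations in parallel; this loop is sequential
    and exists only to show how the indices are assigned.
    """
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


n = 10
x = [float(i) for i in range(n)]
y = [1.0] * n
out = [0.0] * n

block_dim = 4                                # threads per block (illustrative)
grid_dim = (n + block_dim - 1) // block_dim  # enough blocks to cover n elements
launch(saxpy_kernel, grid_dim, block_dim, n, 2.0, x, y, out)
print(out)
```

Rounding the block count up means a few threads in the last block have no element to process, which is why the kernel body checks `i < n` before writing.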
Where it fits
CUDA is the bridge that turned the GPU from a graphics device into a general accelerator (see GPGPU), and it underpins most modern AI accelerator workloads on NVIDIA hardware. For a signal-processing pipeline like GopherTrunk, a CUDA kernel can run massively parallel work — large FFTs across many channels, or batched filtering — far faster than a CPU, though for a handful of narrowband channels the data-transfer overhead to the GPU often outweighs the gain.
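The transfer-overhead trade-off can be made concrete with back-of-envelope arithmetic. The sketch below assumes a simple cost model: GPU time is a fixed launch and synchronization cost, plus the time to copy data to the device and back, plus compute time. The function names and every rate in it are hypothetical placeholders chosen for illustration, not measurements of GopherTrunk or of any particular hardware.

```python
def cpu_seconds(n_samples, cpu_rate):
    """Time to process n_samples on the CPU at cpu_rate samples per second."""
    return n_samples / cpu_rate


def gpu_seconds(n_samples, gpu_rate, bytes_per_sample, link_bandwidth,
                fixed_overhead):
    """Time to process n_samples on the GPU, including the round-trip copy."""
    # Data crosses the host-device link twice: input in, results out.
    transfer = 2 * n_samples * bytes_per_sample / link_bandwidth
    return fixed_overhead + transfer + n_samples / gpu_rate


# Placeholder figures (assumptions, not benchmarks).
CPU_RATE = 1e8          # samples per second on the CPU
GPU_RATE = 5e9          # samples per second on the GPU
BYTES_PER_SAMPLE = 8    # e.g. one complex sample as two 32-bit floats
LINK_BANDWIDTH = 1e10   # bytes per second between host and device
FIXED_OVERHEAD = 1e-4   # seconds of launch and synchronization latency

small = 4096            # a short buffer from a handful of narrowband channels
large = 10_000_000      # a large batch spanning many channels

for n_samples in (small, large):
    cpu = cpu_seconds(n_samples, CPU_RATE)
    gpu = gpu_seconds(n_samples, GPU_RATE, BYTES_PER_SAMPLE,
                      LINK_BANDWIDTH, FIXED_OVERHEAD)
    winner = "GPU" if gpu < cpu else "CPU"
    print(f"{n_samples} samples: cpu={cpu:.6f}s gpu={gpu:.6f}s -> {winner}")
```

Under these assumed figures the fixed overhead dominates the small buffer, so the CPU finishes first, while the large batch amortizes the copy and favors the GPU. Real crossover points depend on the actual hardware and workload and are best found by measurement.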