Before running any lecture code, you need a working CUDA environment with PyTorch, Triton, and the NVIDIA profiling tools. This page walks through everything you need from a fresh Linux machine with an NVIDIA GPU.
Prerequisites
- An NVIDIA GPU (Ampere or newer recommended for all lectures)
- Linux (Ubuntu 20.04+ or similar)
- NVIDIA driver installed (nvidia-smi should run without errors)
- Python 3.10+
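The prerequisites above can be checked from Python before installing anything. The following is a minimal sketch using only the standard library; the function names check_prerequisites and gpu_summary are made up for this example, and the script only reports what it finds rather than fixing anything.

```python
import platform
import shutil
import subprocess
import sys


def check_prerequisites() -> dict:
    """Return a pass/fail flag for each prerequisite that can be tested locally."""
    return {
        "python>=3.10": sys.version_info >= (3, 10),
        "linux": platform.system() == "Linux",
        "nvidia-smi on PATH": shutil.which("nvidia-smi") is not None,
    }


def gpu_summary():
    """Return 'name, driver_version' lines from nvidia-smi, or None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    proc = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    return proc.stdout.strip() if proc.returncode == 0 else None


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{'OK     ' if ok else 'MISSING'} {name}")
    print("GPU:", gpu_summary())
```

A MISSING line for nvidia-smi usually means the driver is not installed or not on PATH; whether the GPU is Ampere or newer still has to be read off the reported device name.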
Installation
Install PyTorch with CUDA
Install PyTorch with the CUDA toolkit bundled. Choose the version that matches your driver.
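As a rough sketch of what "matches your driver" means in practice, the script below reads the maximum CUDA version from the nvidia-smi header and compares it with the CUDA version of the build you intend to install. The helper names are hypothetical, and the 12.1 default is only an example target. After installation, torch_cuda_report shows which CUDA version the installed PyTorch was built for.

```python
import re
import shutil
import subprocess


def max_cuda_from_smi(smi_text: str):
    """Extract (major, minor) from the 'CUDA Version: X.Y' field of nvidia-smi output."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def driver_supports(smi_text: str, wanted=(12, 1)) -> bool:
    """True if the driver's maximum CUDA version is at least the wanted build version."""
    found = max_cuda_from_smi(smi_text)
    return found is not None and found >= wanted


def torch_cuda_report():
    """Summarise the installed PyTorch build, or return None if PyTorch is absent."""
    try:
        import torch
    except ImportError:
        return None
    return {
        "torch": torch.__version__,
        "built_for_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    if shutil.which("nvidia-smi"):
        text = subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
        print("driver max CUDA:", max_cuda_from_smi(text))
        print("CUDA 12.1 build usable:", driver_supports(text, (12, 1)))
    else:
        print("nvidia-smi not found")
    print("PyTorch:", torch_cuda_report())
```

If cuda_available comes back False on a machine with a working driver, the usual cause is a CPU-only build or a build targeting a newer CUDA version than the driver supports.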
Replace cu121 / pytorch-cuda=12.1 with the CUDA version matching your driver. Run nvidia-smi to check: the top-right corner of its output shows the maximum CUDA version the driver supports.
Install Triton
Triton is used in several lectures for high-level GPU kernel authoring. It ships with recent PyTorch installs, but it can also be installed directly as a separate package. After installing, verify that it works.
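One way to verify the install, sketched here under the assumption that a CUDA device is present, is to compile and launch a small vector-add kernel. This goes further than printing the version: it exercises the Triton compiler and the GPU together. The names add_kernel and check_triton are chosen for this example only.

```python
try:
    import torch
    import triton
    import triton.language as tl
except ImportError:  # PyTorch or Triton missing: report it instead of crashing
    triton = None

if triton is not None:

    @triton.jit
    def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
        # Each program instance handles one contiguous block of elements.
        pid = tl.program_id(axis=0)
        offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
        mask = offsets < n_elements  # guard the final, partially filled block
        x = tl.load(x_ptr + offsets, mask=mask)
        y = tl.load(y_ptr + offsets, mask=mask)
        tl.store(out_ptr + offsets, x + y, mask=mask)


def check_triton(n: int = 4096) -> str:
    """Launch the kernel and compare against PyTorch; return a short status string."""
    if triton is None:
        return "triton-or-torch-missing"
    if not torch.cuda.is_available():
        return "no-cuda-device"
    x = torch.rand(n, device="cuda")
    y = torch.rand(n, device="cuda")
    out = torch.empty_like(x)
    block = 1024
    grid = (triton.cdiv(n, block),)  # one program per block of elements
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=block)
    return "ok" if torch.allclose(out, x + y) else "mismatch"


if __name__ == "__main__":
    print("triton version:", None if triton is None else triton.__version__)
    print("status:", check_triton())
```

The first launch is noticeably slower than later ones because Triton compiles the kernel on first use.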
Install Numba and other dependencies
Some lectures use Numba for CUDA kernel authoring and Matplotlib for visualisations.
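Once Numba is installed, a quick smoke test is to run a trivial kernel. The sketch below assumes Numba can find both a CUDA driver and the toolkit libraries; the name check_numba is hypothetical. The kernel is defined only after the availability check so the script still runs on machines without a GPU.

```python
def check_numba(n: int = 1024) -> str:
    """Double an array on the GPU with Numba and return a short status string."""
    try:
        import numpy as np
        from numba import cuda
    except ImportError:
        return "numba-missing"
    if not cuda.is_available():
        return "no-cuda-device"

    @cuda.jit
    def double_in_place(arr):
        i = cuda.grid(1)  # absolute thread index in a 1D launch
        if i < arr.size:  # threads past the end of the array do nothing
            arr[i] *= 2.0

    host = np.arange(n, dtype=np.float32)
    dev = cuda.to_device(host)
    threads = 128
    blocks = (n + threads - 1) // threads  # ceiling division
    double_in_place[blocks, threads](dev)
    return "ok" if np.allclose(dev.copy_to_host(), host * 2) else "mismatch"


if __name__ == "__main__":
    print("numba status:", check_numba())
```

If the script reports a device but the launch then fails with a library-loading error, the host CUDA toolkit is the usual suspect.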
Numba requires the CUDA toolkit to be installed separately on the host (not just bundled with PyTorch). Install it via conda install cudatoolkit or from developer.nvidia.com/cuda-downloads.
Profiling tools
Two NVIDIA tools are used throughout the lectures. Both are installed with the CUDA toolkit.
Nsight Systems (nsys)
Nsight Systems provides a timeline view of CPU and GPU activity. It is the right tool for finding where time is spent at a coarse level: which kernels launch, how long they take, and whether the CPU is the bottleneck.
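A typical nsys invocation wraps the command you would normally run. The sketch below builds such a command line from Python so it can be reused in notebooks or scripts. The helper names, the script name train.py, and the report name timeline are all placeholders, and the chosen trace set (CUDA, NVTX, OS runtime) is one reasonable default rather than a requirement.

```python
import shutil
import subprocess
import sys


def nsys_command(script: str, report: str = "timeline", extra_args=()) -> list:
    """Build an 'nsys profile' command line that profiles a Python script."""
    return [
        "nsys", "profile",
        "--trace=cuda,nvtx,osrt",   # CUDA API calls and kernels, NVTX ranges, OS runtime
        "--force-overwrite=true",   # replace an existing report with the same name
        "-o", report,               # output report base name
        sys.executable, script, *extra_args,
    ]


def run_nsys(script: str, report: str = "timeline"):
    """Run the profile if nsys is installed; return its exit code or None."""
    if shutil.which("nsys") is None:
        return None
    return subprocess.run(nsys_command(script, report), check=False).returncode


if __name__ == "__main__":
    print(" ".join(nsys_command("train.py")))
```

The report file that nsys writes can be opened in the Nsight Systems GUI, or summarised in the terminal with the nsys stats subcommand.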
Nsight Compute (ncu)
Nsight Compute profiles individual CUDA kernels with hardware-counter metrics such as memory throughput, occupancy, and warp stalls. Use it after nsys has told you which kernel to optimise.
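Because ncu replays each kernel to collect counters, it is usually worth restricting it to the kernel of interest. The sketch below builds a command line that does this; the helper name, the script name, and the kernel name scale_kernel are placeholders for whatever the timeline pointed at.

```python
import sys


def ncu_command(script: str, kernel_name=None, report: str = "kernel_report",
                launch_count: int = 1) -> list:
    """Build an 'ncu' command line that profiles selected kernels of a Python script."""
    cmd = [
        "ncu",
        "--set", "full",                      # collect the full set of metric sections
        "-o", report,                         # output report base name
        "--launch-count", str(launch_count),  # stop after this many profiled launches
    ]
    if kernel_name:
        cmd += ["--kernel-name", kernel_name]  # only profile kernels with this name
    cmd += [sys.executable, script]
    return cmd


if __name__ == "__main__":
    print(" ".join(ncu_command("train.py", kernel_name="scale_kernel")))
```

On many systems ncu needs elevated permissions to read hardware counters, so a permissions error on the first run does not necessarily mean the install is broken.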
CUDA boilerplate from utils.py
Every lecture that writes a custom CUDA extension starts from the cuda_begin constant in utils.py. It includes the standard headers, input-checking macros, and a GPU error handler:
utils.py — cuda_begin
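The snippet below is an illustrative sketch of what such a constant can look like, based only on the description above (standard headers, input-checking macros, a GPU error handler). It is not a verbatim copy of utils.py, so consult the repository for the authoritative version. The names CUDA_BEGIN_SKETCH, SCALE_SRC, scale_kernel, and build_scale_extension are made up for this example; torch.utils.cpp_extension.load_inline is the real PyTorch API used to compile the source.

```python
# Assumed preamble: headers, input checks, and an error handler (a sketch, not utils.py).
CUDA_BEGIN_SKETCH = r'''
#include <torch/extension.h>
#include <stdio.h>
#include <c10/cuda/CUDAException.h>

#define CHECK_CUDA(x) TORCH_CHECK(x.device().is_cuda(), #x " must be a CUDA tensor")
#define CHECK_CONTIGUOUS(x) TORCH_CHECK(x.is_contiguous(), #x " must be contiguous")
#define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)

#define CUDA_ERR(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort = true) {
    if (code != cudaSuccess) {
        fprintf(stderr, "GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
        if (abort) exit(code);
    }
}
'''

# A small float32 kernel appended to the preamble, as a lecture extension might do.
SCALE_SRC = CUDA_BEGIN_SKETCH + r'''
__global__ void scale_kernel(float* x, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= factor;
}

torch::Tensor scale(torch::Tensor x, double factor) {
    CHECK_INPUT(x);
    auto out = x.clone();
    int n = out.numel();
    int threads = 256;
    scale_kernel<<<(n + threads - 1) / threads, threads>>>(
        out.data_ptr<float>(), (float)factor, n);
    C10_CUDA_KERNEL_LAUNCH_CHECK();
    return out;
}
'''


def build_scale_extension():
    """Compile SCALE_SRC with load_inline; requires PyTorch, nvcc, and a C++ compiler."""
    from torch.utils.cpp_extension import load_inline

    return load_inline(
        name="scale_ext",
        cpp_sources="torch::Tensor scale(torch::Tensor x, double factor);",
        cuda_sources=SCALE_SRC,
        functions=["scale"],
        verbose=False,
    )
```

With a GPU available, build_scale_extension().scale(torch.ones(8, device="cuda"), 3.0) should return a tensor of threes; passing a CPU tensor should trip CHECK_CUDA with a readable error instead of crashing inside the kernel.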
Running notebooks and Colab
Most lectures are accompanied by Jupyter notebooks.
If you don’t have a local NVIDIA GPU, Google Colab provides free T4 access. Open any .ipynb from the repository in Colab and set the runtime to GPU under Runtime → Change runtime type. The utils.py helpers work unchanged in Colab; clone the repo or copy the file into your Colab session first.
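After cloning or copying, the notebook still needs to be able to import utils. A minimal sketch of a first notebook cell follows; the helper name add_utils_to_path and the directory name lectures are assumptions, so point it at wherever utils.py actually lives in your session.

```python
import sys
from pathlib import Path


def add_utils_to_path(repo_dir: str) -> bool:
    """Put repo_dir on sys.path if it contains utils.py; return True on success."""
    root = Path(repo_dir).resolve()
    if not (root / "utils.py").is_file():
        return False
    if str(root) not in sys.path:
        sys.path.insert(0, str(root))  # searched first, so 'import utils' finds this copy
    return True


if __name__ == "__main__":
    # "lectures" is a placeholder for the cloned repository directory.
    if add_utils_to_path("lectures"):
        print("utils.py found; 'import utils' should now work")
    else:
        print("utils.py not found; clone the repo or upload the file first")
```

Run the cell after selecting the GPU runtime, since changing the runtime type restarts the session and clears any files copied in earlier.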