GPU Mode is an open community dedicated to teaching GPU programming for AI and machine learning. The lecture series covers everything from CUDA fundamentals and memory architecture to cutting-edge topics like Flash Attention, quantized training, and multi-GPU communication — all delivered by engineers and researchers from NVIDIA, Meta, Google, and more.

Get Set Up

Install PyTorch, CUDA, and the tools you need to run lecture code locally

Profiling & CUDA in PyTorch

Lecture 1: Profile GPU kernels and integrate CUDA into PyTorch

CUDA Fundamentals

Start from the PMPP book and build up to real kernel writing

Flash Attention

Understand the IO-aware attention algorithm that powers modern LLMs

What you’ll find here

GPU Mode lectures span the full stack of GPU programming — from writing your first CUDA kernel to optimizing inference for production LLMs. Each lecture includes slides, code, and often a runnable Jupyter notebook or Colab link.

CUDA Fundamentals

Threads, blocks, memory hierarchy, reductions, and performance checklists

Advanced GPU Programming

Flash Attention, fused kernels, tensor cores, CUTLASS, and SASS

Triton & Frameworks

Write GPU kernels in Python with Triton, including internals deep-dives

Quantization & Optimization

INT8, low-bit training, BitBLAS, and numerical precision for AI

Multi-GPU & Systems

NCCL collectives, speculative decoding, SGLang, and distributed training

Hardware Targets

Apple Metal, ARM low-bit kernels, and AMD ROCm / Composable Kernel

Community

GPU Mode is built around a Discord community where practitioners share kernels, benchmarks, and research. Join the conversation and follow along with new lectures on the YouTube channel.

Discord Community

Join thousands of GPU programmers on the GPU Mode Discord

YouTube Channel

Watch lecture recordings on the GPU Mode YouTube channel

1. Set up your environment

Follow the setup guide to install CUDA, PyTorch, and profiling tools.

2. Start with CUDA fundamentals

3. Write kernels in Triton

The Practitioner’s Guide to Triton gets you writing GPU kernels in Python quickly.

4. Explore advanced topics

Dive into Flash Attention, quantization, or multi-GPU systems depending on your goals.
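
Before starting the first step, it can help to confirm what is already installed. The sketch below is not part of the setup guide; it uses only the Python standard library to check whether a few command-line tools are on `PATH` and whether PyTorch is importable. The tool names (`nvcc`, `nvidia-smi`, `ncu`) are an assumption about what "CUDA and profiling tools" means on a typical NVIDIA setup, and `check_environment` is a hypothetical helper name. Adjust the lists to match whatever the setup guide asks for.

```python
import importlib.util
import shutil


def check_environment(tools=("nvcc", "nvidia-smi", "ncu"), modules=("torch",)):
    """Return a dict mapping each tool or module to True if it was found.

    The default tool and module names are illustrative assumptions,
    not a list taken from the GPU Mode setup guide.
    """
    report = {}
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        report[f"tool:{tool}"] = shutil.which(tool) is not None
    for module in modules:
        # find_spec locates a top-level module without importing it.
        report[f"module:{module}"] = importlib.util.find_spec(module) is not None
    return report


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name:20s} {'found' if found else 'missing'}")
```

A "missing" entry only means the tool was not found on `PATH` or the module is not installed in the current Python environment; it says nothing about whether the GPU driver itself works.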