GPU Mode is an open community dedicated to teaching GPU programming for AI and machine learning. The lecture series covers everything from CUDA fundamentals and memory architecture to cutting-edge topics like Flash Attention, quantized training, and multi-GPU communication, all delivered by engineers and researchers from NVIDIA, Meta, Google, and more.
Get Set Up
Install PyTorch, CUDA, and the tools you need to run lecture code locally
Profiling & CUDA in PyTorch
Lecture 1: Profile GPU kernels and integrate CUDA into PyTorch
CUDA Fundamentals
Start from the book Programming Massively Parallel Processors (PMPP) and build up to writing real kernels
Flash Attention
Understand the IO-aware attention algorithm that powers modern LLMs
What you’ll find here
GPU Mode lectures span the full stack of GPU programming, from writing your first CUDA kernel to optimizing inference for production LLMs. Each lecture includes slides, code, and often a runnable Jupyter notebook or Colab link.
CUDA Fundamentals
Threads, blocks, memory hierarchy, reductions, and performance checklists
Advanced GPU Programming
Flash Attention, fused kernels, tensor cores, CUTLASS, and SASS
Triton & Frameworks
Write GPU kernels in Python with Triton, including internals deep-dives
Quantization & Optimization
INT8, low-bit training, BitBLAS, and numerical precision for AI
Multi-GPU & Systems
NCCL collectives, speculative decoding, SGLang, and distributed training
Hardware Targets
Apple Metal, ARM low-bit kernels, and AMD ROCm / Composable Kernel
Community
GPU Mode is built around a Discord community where practitioners share kernels, benchmarks, and research. Join the conversation and follow along with new lectures on the YouTube channel.
Discord Community
Join thousands of GPU programmers on the GPU Mode Discord
YouTube Channel
Watch lecture recordings on the GPU Mode YouTube channel
Recommended learning path
Set up your environment
Follow the setup guide to install CUDA, PyTorch, and profiling tools.
Start with CUDA fundamentals
Work through the PMPP overview, memory architecture, and the performance checklist.
Write kernels in Triton
The Practitioner’s Guide to Triton gets you writing GPU kernels in Python quickly.
Explore advanced topics
Dive into Flash Attention, quantization, or multi-GPU systems depending on your goals.
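As a rough companion to the first step of the learning path, the sketch below checks whether common pieces of a GPU development setup are discoverable on the current machine. It uses only the Python standard library, so it runs even before PyTorch is installed. The function name `check_environment` and the specific lists of packages (`torch`, `triton`) and command-line tools (`nvcc`, `nvidia-smi`, `ncu`) are assumptions chosen for illustration; the setup guide is the authority on what is actually required.

```python
import importlib.util
import shutil


def check_environment(
    modules=("torch", "triton"),          # assumed Python packages
    tools=("nvcc", "nvidia-smi", "ncu"),  # assumed command-line tools
):
    """Return a dict mapping each name to True if it is discoverable."""
    report = {}
    for name in modules:
        # find_spec looks the package up on the import path without importing it
        report[name] = importlib.util.find_spec(name) is not None
    for name in tools:
        # shutil.which searches PATH for an executable with this name
        report[name] = shutil.which(name) is not None
    return report


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name:12s} {'found' if found else 'missing'}")
```

A `missing` entry only means the name was not found on the import path or on `PATH`; it says nothing about driver or version compatibility.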