GPU Mode Lectures: Master GPU Programming for AI

GPU Mode is an open community dedicated to teaching GPU programming for AI and machine learning. The lecture series covers everything from CUDA fundamentals and memory architecture to cutting-edge topics like Flash Attention, quantized training, and multi-GPU communication — all delivered by engineers and researchers from NVIDIA, Meta, Google, and more.

Get Set Up

Install PyTorch, CUDA, and the tools you need to run lecture code locally

Profiling & CUDA in PyTorch

Lecture 1: Profile GPU kernels and integrate CUDA into PyTorch

CUDA Fundamentals

Start from the Programming Massively Parallel Processors (PMPP) book and build up to writing real kernels

Flash Attention

Understand the IO-aware attention algorithm that powers modern LLMs

What you’ll find here

GPU Mode lectures span the full stack of GPU programming — from writing your first CUDA kernel to optimizing inference for production LLMs. Each lecture includes slides, code, and often a runnable Jupyter notebook or Colab link.

CUDA Fundamentals

Threads, blocks, memory hierarchy, reductions, and performance checklists

Advanced GPU Programming

Flash Attention, fused kernels, tensor cores, CUTLASS, and SASS

Triton & Frameworks

Write GPU kernels in Python with Triton, including internals deep-dives

Quantization & Optimization

INT8, low-bit training, BitBLAS, and numerical precision for AI

Multi-GPU & Systems

NCCL collectives, speculative decoding, SGLang, and distributed training

Hardware Targets

Apple Metal, ARM low-bit kernels, and AMD ROCm / Composable Kernel

Community

GPU Mode is built around a Discord community where practitioners share kernels, benchmarks, and research. Join the conversation and follow along with new lectures on the YouTube channel.

Discord Community

Join thousands of GPU programmers on the GPU Mode Discord

YouTube Channel

Watch lecture recordings on the GPU Mode YouTube channel

Recommended learning path

Set up your environment

Follow the setup guide to install CUDA, PyTorch, and profiling tools.

Start with CUDA fundamentals

Work through the PMPP overview, memory architecture, and the performance checklist.

Write kernels in Triton

The Practitioner’s Guide to Triton gets you writing GPU kernels in Python quickly.

Explore advanced topics

Dive into Flash Attention, quantization, or multi-GPU systems depending on your goals.
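
Before moving past the first step of the path above, it can help to confirm that the pieces are actually installed. The sketch below is not part of the setup guide; it is a minimal check that uses only the Python standard library. The function name `check_environment` is hypothetical, and the default lists of packages (`torch`, `triton`) and command-line tools (`nvcc`, `nvidia-smi`, `ncu`) are assumptions about what a typical lecture needs, so adjust them to your own setup.

```python
import importlib.util
import shutil


def check_environment(modules=("torch", "triton"),
                      tools=("nvcc", "nvidia-smi", "ncu")):
    """Return {name: bool} saying whether each package or CLI tool was found.

    The default names are assumptions, not requirements from the lectures.
    """
    report = {}
    for name in modules:
        # find_spec returns None when a top-level package is not installed;
        # it locates the package without importing it.
        report[name] = importlib.util.find_spec(name) is not None
    for name in tools:
        # shutil.which returns None when the executable is not on PATH.
        report[name] = shutil.which(name) is not None
    return report


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name:<12} {'found' if found else 'missing'}")
```

A "found" result only means the package or executable is present. It does not verify that the driver, CUDA toolkit, and PyTorch build are compatible with each other, which the setup guide covers.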
