The Linux memory management subsystem is one of the most complex parts of the kernel. It is responsible for mapping physical RAM into a form that processes and the kernel itself can use safely and efficiently, handling everything from raw page frame allocation to per-object caching, NUMA-aware placement, and memory reclaim under pressure.
Memory models
Linux abstracts physical memory diversity using one of two memory models selected at build time: FLATMEM and SPARSEMEM. FLATMEM suits non-NUMA systems with contiguous physical memory. A global mem_map array maps every page frame number (PFN) directly to its struct page.
SPARSEMEM suits systems whose physical address space has holes, multiple NUMA nodes, or hot-pluggable memory. It divides physical memory into sections, each described by a struct mem_section. With CONFIG_SPARSEMEM_VMEMMAP enabled, a virtually contiguous vmemmap array makes pfn_to_page() as cheap as an array index.
SPARSEMEM with VMEMMAP is the default on most 64-bit architectures, including x86-64 and arm64.
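The PFN-to-page arithmetic behind both models can be sketched in user space. This is a toy model, not kernel code: the struct page size, the vmemmap base address, and ARCH_PFN_OFFSET below are illustrative assumptions, and the function names are hypothetical.

```python
# Toy model of pfn_to_page() under the two memory models.
# All constants are illustrative assumptions, not values read from a kernel.

PAGE_STRUCT_SIZE = 64                  # assumed sizeof(struct page) in bytes
VMEMMAP_START = 0xFFFF_EA00_0000_0000  # assumed vmemmap base (x86-64, 4-level paging)
ARCH_PFN_OFFSET = 0                    # assumed PFN of the first entry in mem_map


def flatmem_pfn_to_index(pfn: int) -> int:
    """FLATMEM: the struct page for a PFN is mem_map[pfn - ARCH_PFN_OFFSET]."""
    if pfn < ARCH_PFN_OFFSET:
        raise ValueError("PFN lies below the start of mem_map")
    return pfn - ARCH_PFN_OFFSET


def vmemmap_pfn_to_page_addr(pfn: int) -> int:
    """SPARSEMEM_VMEMMAP: page address = vmemmap base + pfn * sizeof(struct page)."""
    return VMEMMAP_START + pfn * PAGE_STRUCT_SIZE


def vmemmap_page_addr_to_pfn(addr: int) -> int:
    """Inverse mapping, the analogue of page_to_pfn()."""
    return (addr - VMEMMAP_START) // PAGE_STRUCT_SIZE
```

In both cases the lookup is a single add or multiply-add, which is why the vmemmap variant costs about the same as a flat array index.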
Memory zones
Within each NUMA node the kernel divides physical memory into zones, each constraining what kind of allocations it satisfies:

| Zone | Purpose |
|---|---|
| ZONE_DMA | Pages reachable by legacy ISA DMA (first 16 MiB on x86) |
| ZONE_DMA32 | Pages reachable by 32-bit DMA devices |
| ZONE_NORMAL | Directly mapped kernel memory; the primary allocation zone |
| ZONE_HIGHMEM | Memory above the kernel’s direct mapping (32-bit only) |
| ZONE_MOVABLE | Physically movable pages for memory hot-remove |
| ZONE_DEVICE | Memory-mapped device ranges (persistent memory, GPU memory) |
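As a rough illustration of how the address-limited zones in the table partition memory, the sketch below classifies a physical address on an assumed x86-64 layout. It is a simplification: it ignores ZONE_MOVABLE, ZONE_DEVICE, and any configuration that disables a zone, and zone_for_phys_addr is a hypothetical helper, not a kernel function.

```python
MIB = 1 << 20
GIB = 1 << 30

# Assumed x86-64 boundaries: 16 MiB for legacy ISA DMA, 4 GiB for 32-bit DMA.
DMA_LIMIT = 16 * MIB
DMA32_LIMIT = 4 * GIB


def zone_for_phys_addr(addr: int) -> str:
    """Return the address-limited zone an (assumed x86-64) physical address falls in."""
    if addr < 0:
        raise ValueError("physical addresses are non-negative")
    if addr < DMA_LIMIT:
        return "ZONE_DMA"
    if addr < DMA32_LIMIT:
        return "ZONE_DMA32"
    return "ZONE_NORMAL"
```

For example, zone_for_phys_addr(0x1000) gives ZONE_DMA, while an address at or above 4 GiB falls in ZONE_NORMAL.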
Page allocator
The buddy allocator is the foundation of physical memory allocation. It tracks free memory in blocks of power-of-two orders (with 4 KiB pages: order 0 = 4 KiB, order 1 = 8 KiB, …, order MAX_ORDER = 4 MiB by default). Callers describe how an allocation may behave with GFP flags:

| Flag | Meaning |
|---|---|
| GFP_KERNEL | May sleep; standard kernel allocation |
| GFP_ATOMIC | Must not sleep; for interrupt context |
| GFP_USER | Userspace allocation; may be reclaimed |
| GFP_NOWAIT | Non-blocking; fail rather than wait |
| __GFP_ZERO | Zero the allocated page(s) |
| __GFP_NOFAIL | Retry until successful (use sparingly) |
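The power-of-two order arithmetic used by the buddy allocator can be illustrated with a small user-space sketch. It assumes 4 KiB pages (PAGE_SHIFT = 12); get_order here only mimics the rounding behaviour of the kernel helper of the same name, and block_size and buddy_pfn are hypothetical names for this example.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB base pages


def block_size(order: int) -> int:
    """Size in bytes of a block of the given order (2**order pages)."""
    return (1 << order) << PAGE_SHIFT


def get_order(size: int) -> int:
    """Smallest order whose block can hold `size` bytes."""
    if size <= 0:
        raise ValueError("size must be positive")
    pages = (size + (1 << PAGE_SHIFT) - 1) >> PAGE_SHIFT  # round up to whole pages
    return (pages - 1).bit_length()                       # round up to a power of two


def buddy_pfn(pfn: int, order: int) -> int:
    """PFN of the buddy block: flip the bit that corresponds to the order."""
    return pfn ^ (1 << order)
```

Under these assumptions block_size(10) is 4 MiB, and a request for three pages rounds up to order 2. When a block and its buddy are both free, the allocator can merge them into one block of the next order.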
Slab allocator
The slab layer sits above the buddy allocator and provides efficient, cache-friendly allocation of fixed-size kernel objects. The current implementation is SLUB (the default since 2.6.23).
Virtual memory areas and mm_struct
Every process has an mm_struct that describes its entire virtual address space. Individual mappings (anonymous memory, file-backed pages, stack, heap) are each represented by a struct vm_area_struct (VMA). VMAs are created and removed by system calls such as mmap() and munmap().
Page fault handling flow
When a userspace access hits an unmapped or not-present address, the CPU raises a page fault. The kernel’s fault handler (exc_page_fault() on current x86 kernels, do_page_fault() on older ones) looks up the faulting address in the VMA tree:
- If no VMA covers the address → SIGSEGV.
- If the VMA is present but the page table entry is absent → allocate a physical page, map it, and return.
- If the page is in swap → read it back from swap and remap it.
- For a write to a copy-on-write (COW) page → allocate a new page, copy the content, and update the PTE.
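The four cases above can be written down as a small decision function. This is a toy model of the decision order only, assuming a simplified page-table-entry state; the Pte enum and classify_fault are hypothetical names, and the real handler checks far more (permissions, huge pages, and so on).

```python
from enum import Enum, auto


class Pte(Enum):
    NOT_PRESENT = auto()   # no page table entry yet
    IN_SWAP = auto()       # entry points at a swap slot
    COW_READONLY = auto()  # present, but write-protected for copy-on-write
    PRESENT = auto()       # present and usable as-is


def classify_fault(vma_covers_addr: bool, pte: Pte, is_write: bool) -> str:
    """Map a fault onto one of the outcomes listed in the text."""
    if not vma_covers_addr:
        return "SIGSEGV"
    if pte is Pte.NOT_PRESENT:
        return "allocate page and map it"
    if pte is Pte.IN_SWAP:
        return "read page from swap and remap it"
    if pte is Pte.COW_READONLY and is_write:
        return "copy page and update PTE"
    return "nothing to do"
```

The VMA check comes first: without a covering VMA there is no mapping to repair, so the state of the page table entry is irrelevant.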
mmap() and anonymous memory
mmap(MAP_ANONYMOUS) creates a new VMA without backing storage. Pages are not allocated until first access (demand paging). A read fault on an untouched page maps the shared zero page; a later write to that page triggers copy-on-write and allocates a private page.
File-backed mappings (mmap(fd, ...)) integrate with the page cache: the same physical page can be shared between multiple processes mapping the same file.
NUMA memory topology
On NUMA systems Linux divides hardware into nodes, each with CPUs and local memory. Allocations default to the node of the CPU executing the request (local allocation), minimising cross-interconnect traffic. Administrators can override this placement with numactl(1) or the MPOL_BIND memory policy, and can inspect topology via /sys/devices/system/node/.
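The sysfs directory mentioned above can be walked from user space. The sketch below assumes the usual layout of one nodeN directory per node, each containing a cpulist file; list_numa_nodes is a hypothetical helper, and it returns an empty result on systems where the directory does not exist.

```python
import os
import re

NODE_ROOT = "/sys/devices/system/node"


def list_numa_nodes(root: str = NODE_ROOT) -> dict[int, str]:
    """Return {node id: CPU list string} for each nodeN directory under root."""
    nodes: dict[int, str] = {}
    if not os.path.isdir(root):
        return nodes  # no NUMA sysfs tree on this system
    for entry in os.listdir(root):
        match = re.fullmatch(r"node(\d+)", entry)
        if not match:
            continue  # skip files such as "online" or "possible"
        try:
            with open(os.path.join(root, entry, "cpulist")) as f:
                cpus = f.read().strip()
        except OSError:
            cpus = ""  # node directory without a readable cpulist
        nodes[int(match.group(1))] = cpus
    return dict(sorted(nodes.items()))
```

Taking the root path as a parameter keeps the function usable against a copied or mocked sysfs tree.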
Memory reclaim and OOM killer
When free memory falls below a watermark, kswapd wakes and scans the LRU lists for pages to reclaim. The multi-generational LRU (MGLRU, merged in 6.1) tracks access age across multiple generations to make better eviction decisions.
Memory reclaim paths:
- Anonymous pages: written to swap if a swap device exists.
- File-backed clean pages: simply discarded; re-read from disk on next access.
- File-backed dirty pages: written back (flushed) before being freed.
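The reclaim paths listed above reduce to a small decision table. The following is a toy sketch under simplifying assumptions: a single watermark value, pages described only by kind and dirtiness, and hypothetical function names that do not correspond to kernel symbols.

```python
def kswapd_should_wake(free_pages: int, watermark: int) -> bool:
    """Background reclaim starts once free memory drops below the watermark."""
    return free_pages < watermark


def reclaim_action(kind: str, dirty: bool = False, swap_available: bool = True) -> str:
    """Return what reclaim does with a page of the given kind ("anon" or "file")."""
    if kind == "anon":
        # Anonymous pages have no file behind them, so swap is the only way out.
        return "write to swap" if swap_available else "keep in memory"
    if kind == "file":
        # Clean file pages can be re-read from disk; dirty ones must be flushed first.
        return "write back, then free" if dirty else "discard"
    raise ValueError(f"unknown page kind: {kind!r}")
```

Clean file-backed pages are the cheapest to reclaim because freeing them requires no I/O.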
Transparent huge pages
Transparent Huge Pages (THP) allow the kernel to use 2 MiB (or larger) pages for anonymous and file-backed memory without requiring application changes. THP reduces TLB pressure on workloads with large working sets. khugepaged scans for suitable 4 KiB page clusters and promotes them in the background.
DAMON — data access monitor
DAMON (Data Access MONitor) provides lightweight, accurate monitoring of actual memory access patterns. It operates in kernel space but exposes results through a sysfs interface (/sys/kernel/mm/damon/) and can drive memory management actions such as reclaim, THP promotion/demotion, and NUMA migration.
Process scheduling
How the kernel schedules tasks across CPUs using CFS, real-time policies, and NUMA-aware load balancing.
Locking primitives
Spinlocks, mutexes, RCU, and other synchronisation mechanisms used throughout the MM subsystem.
Filesystems
How the page cache and VFS layer interact with filesystem implementations.
Networking
Socket buffers and how the networking stack allocates and manages memory.
