The Linux memory management subsystem is responsible for every byte of RAM on the system. It partitions physical memory into zones, maintains free-page lists per node and zone, routes allocation requests through a hierarchy of allocators, and reclaims pages under pressure. Understanding these layers is essential before writing kernel code that allocates or frees memory.
## Memory zones
Linux divides physical memory into zones whose boundaries are set by architectural constraints. Each zone is described by `struct zone` and belongs to a `pg_data_t` node.
| Zone | Description |
|---|---|
| `ZONE_DMA` | Memory below 16 MB on x86. Required by legacy ISA DMA devices. Enabled with `CONFIG_ZONE_DMA`. |
| `ZONE_DMA32` | Memory addressable by 32-bit DMA engines on 64-bit platforms. Enabled with `CONFIG_ZONE_DMA32`. |
| `ZONE_NORMAL` | Directly mapped kernel memory. Always enabled and the most performance-critical zone. |
| `ZONE_HIGHMEM` | Physical memory not permanently mapped by the kernel on 32-bit architectures. Enabled with `CONFIG_HIGHMEM`. |
| `ZONE_MOVABLE` | Normal memory whose pages may be migrated or reclaimed, used mainly for memory hot-plug. |
| `ZONE_DEVICE` | Memory residing on devices such as persistent memory (PMEM) and GPUs. Enabled with `CONFIG_ZONE_DEVICE`. |
Many kernel operations require `ZONE_NORMAL` memory. Requesting memory from `ZONE_DMA` or `ZONE_DMA32` exhausts a scarce resource — avoid those zones unless the hardware explicitly requires them.

## GFP flags

Every allocation in the kernel carries a `gfp_t` bitmask that tells the allocator which zones are acceptable, whether it may sleep, and how hard it should try to reclaim memory. The acronym stands for "get free pages", the name of the underlying page allocator function.
The high-level composite flags defined in `include/linux/gfp_types.h` are:
### GFP_KERNEL
The default flag for most kernel allocations. The allocator may sleep, perform direct reclaim, start I/O, and call into the filesystem. Use this in any process context that can block.
### GFP_ATOMIC
Must not sleep. Grants access to atomic reserves via `__GFP_HIGH`. Use inside interrupt handlers, spinlock-protected sections, or any context where sleeping is forbidden.

### GFP_NOIO

May reclaim clean pages but must not start physical I/O. Use inside block-layer code paths to prevent recursion into I/O submission.
### GFP_NOFS
May start physical I/O but must not call into the filesystem. Use inside filesystem code to prevent re-entrancy.
### GFP_DMA / GFP_DMA32
Restrict allocation to `ZONE_DMA` or `ZONE_DMA32`. Use only when the hardware cannot address higher memory.

These composite flags can be combined with low-level modifier bits such as `__GFP_ZERO` (return a zeroed allocation), `__GFP_NOWARN` (suppress allocation-failure messages), `__GFP_NOFAIL` (retry infinitely — use with extreme care), and `__GFP_NORETRY` (fail quickly without invoking the OOM killer).
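A minimal sketch of how the composite flags and modifier bits combine with bitwise OR; the function and its `len` parameter are hypothetical:

```c
#include <linux/slab.h>

/* Sketch: choosing GFP flags by calling context. */
static void *ctx_alloc(size_t len, bool in_atomic_context)
{
	if (in_atomic_context)
		/* Interrupt handler or under a spinlock: must not sleep. */
		return kmalloc(len, GFP_ATOMIC);

	/* Normal process context: may sleep, reclaim, and start I/O.
	 * __GFP_ZERO hands back the buffer already zeroed. */
	return kmalloc(len, GFP_KERNEL | __GFP_ZERO);
}
```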
## Page allocator
The page allocator (also called the buddy allocator) is the lowest-level allocator. It works in units of physically contiguous pages, with sizes expressed as a power-of-two order. `alloc_pages()` returns a `struct page *`; `__get_free_pages()` returns the virtual address of the first page. `order` is the log₂ of the number of pages — order 0 is one page, order 1 is two pages, order 2 is four pages, and so on.
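For example, an order-2 request yields four physically contiguous pages. A sketch of the allocate/free pairing (the surrounding driver is hypothetical):

```c
#include <linux/gfp.h>

static unsigned long buf;

static int buf_init(void)
{
	/* Order 2 = 2^2 = 4 contiguous pages; returns a virtual address. */
	buf = __get_free_pages(GFP_KERNEL, 2);
	if (!buf)
		return -ENOMEM;
	return 0;
}

static void buf_exit(void)
{
	/* The order passed to free_pages() must match the allocation. */
	free_pages(buf, 2);
}
```

`alloc_pages(GFP_KERNEL, 2)` is the `struct page *` counterpart, paired with `__free_pages()`.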
### NUMA-aware page allocation
On NUMA systems you can constrain the page allocator to a specific node. Pass `NUMA_NO_NODE` to allow allocation from the current CPU's local node with automatic fallback. Use `numa_node_id()` or `cpu_to_node()` to obtain the node ID of the calling CPU. Linux builds an ordered zonelist per node so that, when the local zone overflows, it falls back to the nearest node before trying remote nodes.
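A sketch using `alloc_pages_node()`, the node-constrained variant of `alloc_pages()`; the wrapper function is illustrative:

```c
#include <linux/gfp.h>
#include <linux/topology.h>

/* Sketch: allocate one page near the calling CPU. Passing
 * NUMA_NO_NODE instead of nid would let the allocator pick the
 * local node with automatic fallback. */
static struct page *alloc_near_me(void)
{
	int nid = numa_node_id();	/* node of the calling CPU */

	return alloc_pages_node(nid, GFP_KERNEL, 0);
}
```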
## Slab allocator
The slab allocator sits above the page allocator and carves pages into fixed-size object caches. The current implementation is SLUB. It reduces fragmentation, improves cache locality through per-CPU free lists, and optionally validates allocation/free patterns when `CONFIG_SLUB_DEBUG` is enabled.
### kmalloc and friends
`kmalloc` is the general-purpose slab allocation function, defined in `include/linux/slab.h`:
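A sketch of the basic allocate/check/free pattern; `struct my_data` is hypothetical:

```c
#include <linux/slab.h>

struct my_data {
	int id;
	char name[32];
};

static struct my_data *my_data_alloc(void)
{
	struct my_data *d;

	d = kmalloc(sizeof(*d), GFP_KERNEL);
	if (!d)
		return NULL;	/* allocation can fail: always check */
	d->id = 0;
	return d;
}

static void my_data_free(struct my_data *d)
{
	kfree(d);	/* kfree(NULL) is a safe no-op */
}
```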
`kzalloc` is equivalent to `kmalloc` followed by a `memset` to zero, but expressed as a single call. Prefer it whenever you need a zeroed buffer.
`kmalloc` has a maximum allocation size of `KMALLOC_MAX_SIZE` (`1UL << KMALLOC_SHIFT_MAX`). SLUB maps requests directly to a slab cache for sizes up to `KMALLOC_MAX_CACHE_SIZE` (two pages); larger requests fall through to the page allocator.
### Per-type object caches
When you allocate many objects of the same type, create a dedicated cache with `kmem_cache_create`:
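A sketch of creating and using a dedicated cache; `struct request_ctx` and the cache name are hypothetical:

```c
#include <linux/slab.h>

struct request_ctx {
	u64 tag;
	void *payload;
};

static struct kmem_cache *req_cache;

static int req_cache_init(void)
{
	req_cache = kmem_cache_create("request_ctx",
				      sizeof(struct request_ctx),
				      0,			/* align */
				      SLAB_HWCACHE_ALIGN,	/* flags */
				      NULL);			/* ctor */
	if (!req_cache)
		return -ENOMEM;
	return 0;
}
```

Objects then come from `kmem_cache_alloc(req_cache, GFP_KERNEL)` and return via `kmem_cache_free()`; the cache itself is torn down with `kmem_cache_destroy()`.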
The `KMEM_CACHE` macro provides a convenient shorthand:
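A sketch of the shorthand form, which derives the cache name, object size, and alignment from the struct type; `struct io_ctx` is hypothetical:

```c
#include <linux/slab.h>

struct io_ctx {
	u64 sector;
	int status;
};

static struct kmem_cache *io_cache;

static int io_cache_init(void)
{
	/* Equivalent to kmem_cache_create("io_ctx", sizeof(struct io_ctx),
	 * __alignof__(struct io_ctx), SLAB_HWCACHE_ALIGN, NULL). */
	io_cache = KMEM_CACHE(io_ctx, SLAB_HWCACHE_ALIGN);
	return io_cache ? 0 : -ENOMEM;
}
```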
| Flag | Effect |
|---|---|
| `SLAB_HWCACHE_ALIGN` | Align objects on cache-line boundaries. |
| `SLAB_PANIC` | Panic on allocation failure during cache creation. |
| `SLAB_TYPESAFE_BY_RCU` | Delay page freeing by an RCU grace period (does not delay object freeing). |
| `SLAB_RECLAIM_ACCOUNT` | Objects are reclaimable; pages are charged to `SReclaimable` in `/proc/meminfo`. |
## vmalloc
`vmalloc` allocates virtually contiguous but physically non-contiguous memory. It is slower than `kmalloc` because it must allocate pages individually and map them into a contiguous virtual range with new page table entries.
Use `vmalloc` when:

- The allocation is large (many megabytes) and physical contiguity is not required.
- The allocation is long-lived and `kmalloc` would fragment the buddy allocator.
- You need to map I/O memory or firmware buffers into the kernel virtual address space.
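A sketch of the first case, a large long-lived table; the 16 MB size and the function names are illustrative:

```c
#include <linux/vmalloc.h>

static void *table;

static int table_init(void)
{
	/* vzalloc = vmalloc + zeroing; no physical contiguity needed. */
	table = vzalloc(16 * 1024 * 1024);
	if (!table)
		return -ENOMEM;
	return 0;
}

static void table_exit(void)
{
	vfree(table);	/* never kfree() a vmalloc'd pointer */
}
```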
## Memory reclaim and the OOM killer
When free memory falls below a watermark, the kernel attempts to reclaim pages:

1. **kswapd wakes.** When the low watermark of a zone is crossed, `kswapd` (the background reclaim daemon) wakes and scans the LRU lists looking for reclaimable pages.
2. **Direct reclaim.** If `kswapd` cannot keep up and an allocation is failing, the allocating task itself enters direct reclaim. This is triggered by `GFP_KERNEL` (and other flags that include `__GFP_DIRECT_RECLAIM`).
3. **OOM killer.** If reclaim still cannot satisfy the allocation, the out-of-memory killer selects and kills a victim process. You can read a process's badness score from `/proc/<pid>/oom_score` and influence it with `/proc/<pid>/oom_score_adj` (range −1000 to +1000). Kernel threads and processes that set `oom_score_adj` to −1000 are protected from the OOM killer.
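From user space, `oom_score_adj` is just a `/proc` file. A sketch (raising the value needs no privileges; lowering it below the current value requires `CAP_SYS_RESOURCE`):

```c
#include <stdio.h>

/* Write adj to /proc/self/oom_score_adj and return the value read
 * back, or -1001 on error (the valid range is -1000..1000). */
int set_self_oom_score_adj(int adj)
{
	FILE *f = fopen("/proc/self/oom_score_adj", "w");
	int readback = -1001;

	if (!f)
		return -1001;
	fprintf(f, "%d\n", adj);
	fclose(f);

	f = fopen("/proc/self/oom_score_adj", "r");
	if (!f)
		return -1001;
	if (fscanf(f, "%d", &readback) != 1)
		readback = -1001;
	fclose(f);
	return readback;
}
```

Writing 500 here makes the process a preferred OOM victim relative to its peers.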
## Further reading
### Locking and concurrency
Learn which locking primitives to use when protecting shared memory structures.
### Networking stack
See how `sk_buff` manages packet memory and interacts with the allocator.