Kernel self-protection is the design and implementation of mechanisms within the Linux kernel that reduce the impact of security flaws in the kernel itself. The core insight, from the kernel’s own documentation, is that the worst-case attacker has arbitrary read and write access to kernel memory. If defenses hold under that assumption, they will also hold under the more limited access that most real-world bugs provide. The goals for a self-protection system are that it is effective, on by default, requires no opt-in by developers, has no measurable performance impact, does not impede kernel debugging, and has tests. Meeting all of these simultaneously is uncommon, but they remain the standard against which proposals are evaluated.

Defense-in-depth philosophy

No single mechanism is sufficient. Kernel hardening is applied in layers: some mechanisms prevent bugs from being exploitable at all (read-only memory, strict RWX), some raise the cost of exploitation (KASLR, stack canaries), some detect exploitation in progress (KASAN, KFENCE), and some limit what a successful exploit can do (seccomp, namespaces, LSM policy). This document covers the mechanisms applied at the kernel level itself.
Kernel hardening options are independent of userspace access control. They protect the kernel from exploitation of its own bugs — whether triggered by an unprivileged local attacker or a privileged one who has loaded a malicious module.

Strict kernel memory permissions

Executable code and read-only data

The most direct way to prevent an attacker from redirecting kernel execution is to ensure that kernel code pages are never writable and kernel data pages are never executable. CONFIG_STRICT_KERNEL_RWX enforces this split:
  • Kernel text (.text) and read-only data (.rodata) are mapped non-writable.
  • Kernel data (.data, .bss) is mapped non-executable.
  • Module code and data are treated the same way via CONFIG_STRICT_MODULE_RWX.
Most architectures enable these options by default and do not expose a Kconfig prompt. A small number of architectures (arm) allow opt-out via ARCH_OPTIONAL_KERNEL_RWX.

Immutable function pointers

Kernel data structures contain many function pointer tables (file operations, network protocol handlers, descriptor tables). These are prime targets for overwrite attacks. Variables that are set once at boot can be marked __ro_after_init, which places them in a region that becomes read-only after kernel initialization completes:
static struct file_operations my_fops __ro_after_init = {
    .read  = my_read,
    .write = my_write,
};
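Use this attribute for variables that are initialized during __init and then stay constant for the rest of the kernel’s life. Tables whose contents are fully known at build time should simply be declared const. Either way, page permissions block runtime overwrites without the overhead of cryptographic integrity checking.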
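The same write-then-seal idea can be tried from userspace. The sketch below is only an analogy, not kernel code: it fills a function-pointer table in an anonymous page and then removes write permission with mprotect(). The names demo_ops, demo_double, and demo_ops_init_and_seal are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical ops table standing in for a kernel function-pointer table. */
struct demo_ops {
    int (*handler)(int);
};

int demo_double(int x) { return 2 * x; }

/*
 * Allocate one page, fill in the table, then drop write permission.
 * Returns a pointer to the sealed table, or NULL on failure.
 */
struct demo_ops *demo_ops_init_and_seal(void)
{
    struct demo_ops init = { .handler = demo_double };
    long page = sysconf(_SC_PAGESIZE);
    void *mem;

    if (page <= 0)
        return NULL;
    mem = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;

    memcpy(mem, &init, sizeof(init));   /* "init time": page is writable */

    /* "after init": any later write to the table now faults */
    if (mprotect(mem, (size_t)page, PROT_READ) != 0) {
        munmap(mem, (size_t)page);
        return NULL;
    }
    return mem;
}
```

Calling ops->handler(21) through the sealed table still works, while a store to ops->handler would raise SIGSEGV. In the kernel the equivalent sealing happens once for the whole __ro_after_init section rather than per object.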

KASLR — kernel address space layout randomization

Since knowing the address of kernel code or data structures is a prerequisite for most kernel exploits, randomizing those addresses at boot raises the cost of an attack significantly. CONFIG_RANDOMIZE_BASE randomizes the physical and virtual load address of the kernel at each boot. Even if an attacker knows the kernel version, they cannot assume the base address. The module loading address is offset separately, so a fixed module load order does not reveal the kernel base.
# Root can read the randomized kernel base from /proc/kallsyms.
# Unprivileged readers typically see zeros, as does everyone when kptr_restrict=2.
sudo grep ' T _text$' /proc/kallsyms
KASLR also applies to:
  • Stack base — the kernel stack base varies between processes and can vary between syscalls, making stack-targeted attacks harder to aim.
  • Dynamic memory base — the base addresses for kmalloc and vmalloc regions are randomized between boots, frustrating layout-dependent heap exploits.
  • Structure layout — with CONFIG_RANDSTRUCT, the field order of sensitive kernel structures is randomized per build. An exploit tuned to one kernel build will fail on another.
KASLR is a probabilistic defense. If an attacker can leak a kernel address (e.g., via a /proc file, an uninitialized memory read, or a format string bug), KASLR is defeated for that session. Pair KASLR with kptr_restrict and other information-exposure mitigations.
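To make the leak scenario concrete, here is a minimal sketch of how a single exposed /proc/kallsyms line reveals the KASLR offset. The helper names are hypothetical, and the link-time base used in the usage note (0xffffffff81000000, the usual x86-64 default) is an assumption that may not match your build.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse one /proc/kallsyms line ("<hex addr> <type> <name>").
 * Returns 1 and stores the address if the line describes symbol `want`,
 * 0 otherwise. A stored address of 0 means the kernel masked the value.
 */
int kallsyms_match(const char *line, const char *want,
                   unsigned long long *addr)
{
    char type;
    char name[128];

    if (sscanf(line, "%llx %c %127s", addr, &type, name) != 3)
        return 0;
    return strcmp(name, want) == 0;
}

/* Offset between the observed address and the assumed link-time address. */
unsigned long long kaslr_slide(unsigned long long runtime_addr,
                               unsigned long long link_addr)
{
    return runtime_addr - link_addr;
}
```

For the line "ffffffff9a000000 T _text" and the assumed base above, kaslr_slide() yields 0x19000000. Once an attacker has that one number, every other symbol address for that boot follows from the kernel image, which is why a single leak defeats KASLR.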

Stack protection

Stack canaries

The classic stack buffer overflow overwrites the saved return address on the stack. A stack canary is a secret value placed between local variables and the return address; in each protected function the compiler inserts a check before the return. If the canary has been overwritten, the kernel panics rather than returning to an attacker-controlled address. CONFIG_STACKPROTECTOR enables basic canaries, which cover functions with a character array of 8 bytes or more on the stack. CONFIG_STACKPROTECTOR_STRONG extends coverage to functions with any local array (including one inside a structure or union) or any local variable whose address is taken, at a small performance cost. STRONG is the recommended setting for production kernels.
grep CONFIG_STACKPROTECTOR /boot/config-$(uname -r)
# CONFIG_STACKPROTECTOR_STRONG=y
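The canary check can be modelled in plain C. This is a simulation under stated assumptions, not what the compiler emits: the struct sim_frame layout, the sim_call() helper, and the fixed return address are all hypothetical, and the "overflow" is confined to a byte copy of the simulated frame so the program itself stays well defined.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_LEN 16

/* Simulated frame: locals first, then the canary, then the return address. */
struct sim_frame {
    unsigned char buf[BUF_LEN];
    uint64_t canary;
    uint64_t ret_addr;
};

/*
 * Copy `len` bytes of untrusted input over the frame starting at buf,
 * without checking against BUF_LEN (the simulated bug), then run the
 * epilogue check. Returns 1 if the canary is intact, 0 if it was clobbered.
 */
int sim_call(const unsigned char *input, size_t len, uint64_t secret)
{
    struct sim_frame f = { .canary = secret, .ret_addr = 0x1234 };
    unsigned char raw[sizeof(f)];

    if (len > sizeof(f))
        len = sizeof(f);            /* keep the simulation itself in bounds */

    memcpy(raw, &f, sizeof(f));
    memcpy(raw, input, len);        /* a long input runs past buf */
    memcpy(&f, raw, sizeof(f));

    return f.canary == secret;      /* the kernel would panic on mismatch */
}
```

A 16-byte input leaves the canary untouched, while a 24-byte input overwrites it before reaching ret_addr. That ordering is the point of the design: a linear overflow cannot reach the return address without first damaging the canary.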

Shadow call stack

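On AArch64 kernels with CONFIG_SHADOW_CALL_STACK, each function also saves its return address to a separate shadow call stack, whose location is held in a reserved register (x18) rather than stored on the regular stack. The function epilogue loads the return address from the shadow stack, so an overflow that corrupts the copy on the regular stack does not change where the function returns.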

Stack depth overflow

A stack depth overflow (unbounded recursion or large stack allocations) can write past the bottom of the preallocated kernel stack into adjacent memory. The thread_info structure has been moved off the stack on most architectures, and a faulting guard page (CONFIG_VMAP_STACK) catches overflows before they corrupt other objects.

Heap protection

KASAN

The Kernel Address Sanitizer (CONFIG_KASAN) uses compiler instrumentation of memory accesses to detect out-of-bounds reads and writes and use-after-free accesses at runtime. It maintains a shadow memory region that records which bytes of memory are currently valid to access, and reports any access to invalid memory. KASAN has a significant memory and performance overhead (typically 2-3x slowdown and substantial memory cost), so it is used in testing and CI environments rather than production. It is invaluable for finding memory bugs before they become exploits.
# KASAN reports appear in the kernel log
dmesg | grep -A 30 "BUG: KASAN"
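The shadow-memory lookup can be sketched in a few lines. The model below follows the generic KASAN encoding (one shadow byte per 8 bytes of memory: 0 means all 8 bytes are valid, 1 to 7 means only that many leading bytes are valid, negative means none are). The arena size, the POISON marker, and the function names are illustrative assumptions, not kernel definitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GRANULE    8               /* one shadow byte covers 8 bytes */
#define ARENA_SIZE 64              /* hypothetical tracked region */
#define POISON     ((int8_t)-1)    /* illustrative "invalid" marker */

static int8_t shadow[ARENA_SIZE / GRANULE];

/* Mark the whole arena inaccessible. */
void shadow_reset(void)
{
    memset(shadow, POISON, sizeof(shadow));
}

/* Mark [off, off + size) accessible; off must be GRANULE-aligned. */
void shadow_unpoison(size_t off, size_t size)
{
    size_t g = off / GRANULE;

    for (; size >= GRANULE; size -= GRANULE)
        shadow[g++] = 0;             /* all 8 bytes valid */
    if (size)
        shadow[g] = (int8_t)size;    /* only the first `size` bytes valid */
}

/* Return 1 if a 1-byte access at `off` is valid, 0 if it would be reported. */
int shadow_access_ok(size_t off)
{
    int8_t s;

    if (off >= ARENA_SIZE)
        return 0;
    s = shadow[off / GRANULE];
    if (s == 0)
        return 1;
    if (s < 0)
        return 0;
    return (int8_t)(off % GRANULE) < s;
}
```

After shadow_unpoison(0, 13), offsets 0 through 12 pass the check and offset 13 fails it, which is how a one-byte overrun of a 13-byte allocation gets caught. The cost of running this check on every instrumented access is where most of KASAN's slowdown comes from.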

KFENCE

CONFIG_KFENCE (Kernel Electric Fence) is a lightweight, production-usable alternative to KASAN. It uses a probabilistic sampling approach: a small fraction of allocations are placed in specially guarded pages so that any out-of-bounds access or use-after-free triggers an immediate trap. The performance overhead is negligible, making KFENCE suitable for enabling in production kernels.

Slab hardening

CONFIG_SLAB_FREELIST_RANDOM randomizes the order of free objects in the slab allocator’s freelists, frustrating predictable heap spray attacks. CONFIG_SLAB_FREELIST_HARDENED adds integrity metadata to freelist pointers so that a corrupted pointer is detected before use. Memory poisoning, which overwrites freed allocations with a known pattern (or zeroes them with init_on_free=1), prevents use-after-free attacks from reading stale contents. The same idea applies to the stack: CONFIG_KSTACK_ERASE (named CONFIG_GCC_PLUGIN_STACKLEAK in older kernels) clears the kernel stack on syscall return.
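One way to picture freelist pointer hardening is as a keyed encoding of the "next free object" pointer. The sketch below is loosely modelled on the SLUB approach of XORing the pointer with a per-cache secret and the byte-swapped address where it is stored; the exact kernel scheme varies by version, and the function names here are hypothetical. It relies on the GCC/Clang builtin __builtin_bswap64.

```c
#include <stdint.h>

/*
 * Encode a freelist pointer before storing it at slot_addr.
 * secret: per-cache random value chosen at cache creation (assumed).
 */
uint64_t freelist_encode(uint64_t ptr, uint64_t secret, uint64_t slot_addr)
{
    return ptr ^ secret ^ __builtin_bswap64(slot_addr);
}

/* Decoding is the same XOR, applied to the stored value. */
uint64_t freelist_decode(uint64_t stored, uint64_t secret, uint64_t slot_addr)
{
    return stored ^ secret ^ __builtin_bswap64(slot_addr);
}
```

An attacker who overwrites the stored value with a raw target address, without knowing the secret, gets back a scrambled pointer on decode rather than the address they chose. That turns a controlled allocation into a likely crash.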

CPU-assisted protections

Modern x86 processors provide hardware enforcement of kernel/userspace separation:
SMEP (Supervisor Mode Execution Prevention), enabled by the SMEP bit in CR4, raises a page fault if the CPU attempts to fetch an instruction from a user-space page while in supervisor mode. This prevents the classic attack of mapping shellcode in user space and redirecting a kernel function pointer to it. Linux enables SMEP unconditionally on CPUs that support it.
SMAP (Supervisor Mode Access Prevention), enabled by the SMAP bit in CR4, raises a page fault if the kernel reads or writes user-space memory while the AC (alignment check) flag in EFLAGS is clear. The kernel sets AC with stac before an intentional user access and clears it again with clac. This prevents the kernel from being tricked into dereferencing attacker-controlled user-space pointers without explicit intent. copy_to_user() and copy_from_user() handle the flag manipulation correctly; a direct dereference of a user pointer faults. Linux enables SMAP unconditionally on supporting CPUs.
Intel CET provides two complementary mechanisms: Indirect Branch Tracking (IBT), which ensures that indirect calls and jumps land only on ENDBR instructions, and Shadow Stack (SHSTK), which maintains a separate hardware-protected stack of return addresses. Linux supports IBT for the kernel itself via CONFIG_X86_KERNEL_IBT (since 5.18) and shadow stacks for userspace via CONFIG_X86_USER_SHADOW_STACK (since 6.6).
On ARM, Privileged Execute Never (PXN) prevents the kernel from executing user-space pages (equivalent to SMEP), and Privileged Access Never (PAN) prevents the kernel from directly accessing user-space memory (equivalent to SMAP). Linux enables both on supporting ARM hardware.
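On x86, a quick way to confirm that the CPU advertises these features is to look for the smep and smap words in the flags line of /proc/cpuinfo. The helper below is a small sketch of that whole-word match; cpu_flags_has() is a hypothetical name, and reading the file is left to the caller.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if `flag` appears as a whole whitespace-separated word in
 * `flags` (for example, the "flags" line from /proc/cpuinfo), else 0.
 */
int cpu_flags_has(const char *flags, const char *flag)
{
    size_t n = strlen(flag);
    const char *p = flags;

    if (n == 0)
        return 0;
    while ((p = strstr(p, flag)) != NULL) {
        int starts = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
        int ends = p[n] == '\0' || p[n] == ' ' || p[n] == '\n';

        if (starts && ends)
            return 1;
        p++;    /* partial match such as "nosmep": keep scanning */
    }
    return 0;
}
```

The word-boundary check matters because a plain substring search would match unrelated tokens that merely contain the flag name.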

Control-flow integrity

Control-flow integrity (CFI) enforces that indirect control transfers can only reach legitimate destinations. With CONFIG_CFI_CLANG, the Clang compiler instruments every indirect call to check that the target function’s type signature matches the call site. A type-confused function pointer dereference (a common exploitation technique) triggers a kernel panic instead of executing attacker-controlled code. CFI requires the kernel to be built with Clang and is currently most mature on AArch64. It protects forward edges (indirect calls and jumps), so it complements stack canaries and shadow stacks, which protect return addresses.
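The type check can be modelled in userspace by tagging each function pointer with an identifier for its signature and comparing that tag at the call site. This is a conceptual sketch only: the compiler derives its tags from the function type and stores them next to the function itself, whereas the struct tagged_fn wrapper, the TYPE_ID_* constants, and checked_call() below are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef int (*unary_fn)(int);

/* A function pointer paired with an ID for its type signature. */
struct tagged_fn {
    uint32_t type_id;
    unary_fn fn;
};

#define TYPE_ID_INT_INT 0x1001u    /* hypothetical ID for int (*)(int) */
#define TYPE_ID_OTHER   0x2002u    /* hypothetical ID for some other type */

int add_one(int x) { return x + 1; }

/*
 * Checked indirect call. Returns 0 and stores the result on success,
 * or -1 when the tag does not match what the call site expects.
 */
int checked_call(const struct tagged_fn *t, uint32_t expected,
                 int arg, int *out)
{
    if (t == NULL || t->fn == NULL || t->type_id != expected)
        return -1;                 /* the kernel would trap here */
    *out = t->fn(arg);
    return 0;
}
```

A pointer swapped for a function of a different type fails the tag comparison and is never called. A pointer swapped for another function of the same type still passes, which is the known granularity limit of type-based CFI.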

Seccomp filter deployment

Seccomp is covered in more detail in the security overview, but from a hardening perspective, the key deployment patterns are:
  • Apply seccomp filters as early as possible in a process’s lifecycle, before any untrusted input is processed.
  • Use SECCOMP_RET_KILL_PROCESS for truly disallowed syscalls to prevent partial-execution attacks.
  • Always check the arch field in struct seccomp_data before matching on syscall number — on multi-ABI architectures (x86-64 with compat), syscall numbers overlap between calling conventions.
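The three points above can be sketched as a small classic-BPF deny-list filter. It assumes an x86-64 target and Linux UAPI headers new enough to define SECCOMP_RET_KILL_PROCESS; the filter contents (blocking only ptrace) and the names deny_ptrace, install_deny_ptrace, and X32_SYSCALL_BIT are illustrative choices, not a recommended policy.

```c
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>

/* x32 syscalls report AUDIT_ARCH_X86_64 but set this bit in nr. */
#define X32_SYSCALL_BIT 0x40000000u

struct sock_filter deny_ptrace[] = {
    /* [0] load the architecture token */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
    /* [1] expected arch: skip the kill at [2]; anything else falls into it */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_X86_64, 1, 0),
    /* [2] unexpected calling convention */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
    /* [3] only now is it safe to interpret the syscall number */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    /* [4] x32 numbers overlap the native table: jump to the kill at [6] */
    BPF_JUMP(BPF_JMP | BPF_JGE | BPF_K, X32_SYSCALL_BIT, 1, 0),
    /* [5] ptrace falls into the kill at [6]; everything else skips to [7] */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_ptrace, 0, 1),
    /* [6] */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
    /* [7] */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

/* Install the filter on the calling thread; returns 0 on success. */
int install_deny_ptrace(void)
{
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(deny_ptrace) / sizeof(deny_ptrace[0])),
        .filter = deny_ptrace,
    };

    /* Required for unprivileged callers before installing a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

Call install_deny_ptrace() early in main(), before parsing any untrusted input. A production filter is more often an allow-list generated by a library such as libseccomp, but the ordering shown here (architecture first, syscall number second) applies either way.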

Preventing kernel pointer leaks

kptr_restrict

The kptr_restrict sysctl controls how the %pK printk format specifier prints kernel addresses (plain %p is hashed by default on current kernels, regardless of this setting):
# 0: %pK output is hashed, like %p (the kernel default)
# 1: %pK prints zeros unless the reader has CAP_SYSLOG
# 2: %pK prints zeros for all users, including root
sudo sysctl kernel.kptr_restrict=1
Set kptr_restrict=1 at minimum in production. This prevents an unprivileged attacker from learning kernel addresses via /proc/kallsyms or similar interfaces, preserving KASLR’s effectiveness.
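A startup self-check can verify that recommendation programmatically. The sketch below reads /proc/sys/kernel/kptr_restrict and tests it against the minimum of 1 suggested above; the function names are hypothetical, and the parsing is split out so it can be exercised without touching the live system.

```c
#include <stdio.h>
#include <stdlib.h>

/* Parse sysctl file contents such as "1\n". Returns 0..2, or -1 if invalid. */
int parse_kptr_restrict(const char *text)
{
    char *end;
    long v = strtol(text, &end, 10);

    if (end == text || v < 0 || v > 2)
        return -1;
    return (int)v;
}

/* Read the live setting. Returns -1 if the file cannot be read or parsed. */
int read_kptr_restrict(void)
{
    char buf[16];
    int v = -1;
    FILE *f = fopen("/proc/sys/kernel/kptr_restrict", "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof(buf), f) != NULL)
        v = parse_kptr_restrict(buf);
    fclose(f);
    return v;
}

/* 1 if the value meets the production minimum of kptr_restrict=1. */
int kptr_restrict_ok(int value)
{
    return value >= 1;
}
```

A service could log a warning at startup when kptr_restrict_ok(read_kptr_restrict()) is 0. Treating an unreadable file (-1) as a failure is the conservative choice.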

Suppressing raw address printing

Code that writes to user-readable files (/proc, /sys, seq_file-backed files) should use %pK instead of %px or %p for addresses, and should avoid printing raw addresses unless the information is necessary and access is restricted to root. As of kernel 4.15, the plain %p specifier hashes the address before printing. Use %px only in contexts where the raw address is genuinely needed for debugging and the file is not user-readable.

Hardening Kconfig checklist

Use this checklist when configuring a kernel for a security-sensitive environment. Options marked with a * are typically enabled by default on mainstream distributions.
# Strict memory permissions
CONFIG_STRICT_KERNEL_RWX=y          # * kernel text/data RWX split
CONFIG_STRICT_MODULE_RWX=y          # * module text/data RWX split
CONFIG_DEBUG_RODATA=y               # read-only data enforcement (older kernels)

# Stack protection
CONFIG_STACKPROTECTOR_STRONG=y      # * stack canaries for most functions
CONFIG_VMAP_STACK=y                 # * guard page at stack bottom
CONFIG_KSTACK_ERASE=y               # clear kernel stack on syscall return

# Heap integrity
CONFIG_SLAB_FREELIST_RANDOM=y       # randomize freelist order
CONFIG_SLAB_FREELIST_HARDENED=y     # harden freelist pointers

# Address randomization
CONFIG_RANDOMIZE_BASE=y             # * KASLR
CONFIG_RANDOMIZE_MEMORY=y           # randomize direct-map/vmalloc/vmemmap bases (x86-64)

# Userspace access hardening
CONFIG_HARDENED_USERCOPY=y          # * bounds-check copy_to/from_user
CONFIG_FORTIFY_SOURCE=y             # * compile-time and run-time buffer overflow detection

# Control-flow integrity (Clang builds)
CONFIG_CFI_CLANG=y                  # indirect call type checking

# Information exposure
CONFIG_SECURITY_DMESG_RESTRICT=y    # restrict dmesg to root

# Production memory error detection
CONFIG_KFENCE=y                     # lightweight use-after-free/OOB detection
Run kernel-hardening-checker (a community tool) against your .config to get a scored summary of your kernel’s hardening posture against a curated checklist derived from Kernel Self Protection Project recommendations.
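For a quick scripted check of just a few options, the test is simple enough to write by hand. The sketch below scans the text of a .config for exact "OPTION=y" lines; config_enabled() and config_missing() are hypothetical helpers, not part of any existing tool, and they make no attempt at the version-aware scoring that kernel-hardening-checker performs.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if `config` contains a line exactly equal to "<option>=y". */
int config_enabled(const char *config, const char *option)
{
    size_t n = strlen(option);
    const char *line = config;

    while (line != NULL && *line != '\0') {
        const char *nl = strchr(line, '\n');
        size_t len = nl ? (size_t)(nl - line) : strlen(line);

        if (len == n + 2 && strncmp(line, option, n) == 0 &&
            line[n] == '=' && line[n + 1] == 'y')
            return 1;
        line = nl ? nl + 1 : NULL;
    }
    return 0;
}

/* Count how many of the `count` required options are not set to y. */
int config_missing(const char *config, const char *const *required,
                   size_t count)
{
    int missing = 0;
    size_t i;

    for (i = 0; i < count; i++)
        if (!config_enabled(config, required[i]))
            missing++;
    return missing;
}
```

Matching the whole line avoids false positives from prefixes (CONFIG_STACKPROTECTOR versus CONFIG_STACKPROTECTOR_STRONG) and from "# CONFIG_X is not set" comments.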
