The Linux kernel implements security as a layered system built from multiple complementary mechanisms. Rather than relying on any single control, it combines discretionary access control, POSIX capabilities, namespace isolation, syscall filtering, and the Linux Security Modules framework to achieve defense in depth. Understanding how these layers interact is essential for anyone building, configuring, or auditing a Linux-based system.

Discretionary access control

Linux inherits UNIX discretionary access control (DAC): every file and process has an owner (UID) and a group (GID), and permissions are granted separately to three classes (owner, group, and others). The kernel evaluates read, write, and execute bits on every filesystem access. Beyond the traditional UNIX permission mask, Linux supports POSIX Access Control Lists (ACLs), which allow filesystem objects to carry per-user and per-group permission entries that are more expressive than the three fixed classes. DAC is the baseline check and runs before any MAC policy, so even with a mandatory access control module active, a DAC denial stands regardless of what the MAC policy would allow.
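The class selection described above can be sketched in a few lines of C. This is a simplified model, not the kernel's implementation: the function name dac_allows and the WANT_* constants are hypothetical, and the sketch ignores supplementary groups, ACLs, and capabilities such as CAP_DAC_OVERRIDE. It does show one property that often surprises people: only the first matching class is consulted.

```c
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical request flags, using the same bit values as the rwx triplet. */
enum { WANT_READ = 4, WANT_WRITE = 2, WANT_EXEC = 1 };

/*
 * Simplified DAC check: pick exactly one class (owner, then group, then
 * other) and test the requested bits against that class only.
 */
static bool dac_allows(mode_t mode, uid_t file_uid, gid_t file_gid,
                       uid_t proc_uid, gid_t proc_gid, unsigned want)
{
    unsigned bits;

    if (proc_uid == file_uid)
        bits = (mode >> 6) & 7;   /* owner class */
    else if (proc_gid == file_gid)
        bits = (mode >> 3) & 7;   /* group class */
    else
        bits = mode & 7;          /* other class */

    return (bits & want) == want;
}
```

With mode 0047, the owner is denied read access even though "others" have full access, because the owner class matches first and grants nothing.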

Linux capabilities

The traditional root/non-root split is too coarse for production systems. A process running as UID 0 passes every kernel privilege check; a process running as any other UID passes almost none. The Linux capabilities system divides the privileges historically associated with root into discrete units that can be granted independently.
| Capability | Purpose |
| --- | --- |
| CAP_NET_ADMIN | Configure network interfaces, routing tables, and firewall rules |
| CAP_SYS_ADMIN | A wide range of administrative operations; treat as near-equivalent to root |
| CAP_SYS_PTRACE | Trace or inspect arbitrary processes |
| CAP_DAC_OVERRIDE | Bypass file read, write, and execute permission checks |
| CAP_SETUID / CAP_SETGID | Change UID/GID of the current process |
| CAP_NET_BIND_SERVICE | Bind to TCP/UDP ports below 1024 |
| CAP_SYS_MODULE | Load and unload kernel modules |
| CAP_SYS_CHROOT | Use chroot() |
| CAP_AUDIT_WRITE | Write records to the kernel audit log |
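Each thread carries several capability sets. The four that matter most here are:
  • Permitted: the capabilities the thread may raise into its effective set
  • Effective: the capabilities the kernel actually checks when the thread attempts a privileged operation
  • Inheritable: the capabilities that may be preserved across execve()
  • Bounding: an upper bound on the capabilities the thread can gain across execve(), particularly relevant when executing setuid-root binaries
A fifth set, ambient (Linux 4.3 and later), lets a program that is neither setuid nor tagged with file capabilities keep capabilities across execve().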
The capset() system call and file capability extended attributes (security.capability) allow fine-grained privilege assignment without requiring a setuid binary.
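To make the capability sets concrete, the following sketch reads the calling thread's effective set with the raw capget() system call and tests individual bits. The helper names cap_mask_has and read_effective_caps are hypothetical; the header structures and the CAP_* constants come from <linux/capability.h>. Treat it as an illustration under those assumptions. Production code would more commonly use libcap.

```c
#define _GNU_SOURCE
#include <linux/capability.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Return 1 if capability number `cap` is set in a 64-bit capability mask. */
static int cap_mask_has(uint64_t mask, int cap)
{
    if (cap < 0 || cap > 63)
        return 0;
    return (int)((mask >> cap) & 1);
}

/* Read the effective capability set of the calling thread via capget(). */
static int read_effective_caps(uint64_t *out)
{
    struct __user_cap_header_struct hdr;
    struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3];

    memset(&hdr, 0, sizeof(hdr));
    memset(data, 0, sizeof(data));
    hdr.version = _LINUX_CAPABILITY_VERSION_3;
    hdr.pid = 0;                          /* 0 means the calling thread */

    if (syscall(SYS_capget, &hdr, data) != 0)
        return -1;

    /* Version 3 splits the 64-bit sets into two 32-bit halves. */
    *out = (uint64_t)data[0].effective |
           ((uint64_t)data[1].effective << 32);
    return 0;
}
```

A service could call read_effective_caps() at startup and refuse to run if cap_mask_has(mask, CAP_SYS_ADMIN) is true, as a cheap guard against being launched with more privilege than intended.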
Avoid granting CAP_SYS_ADMIN to containers or services unless strictly necessary. Its scope is so broad that it effectively restores root-level kernel access.

Namespaces as security boundaries

Linux namespaces allow the kernel to present different views of global resources to different sets of processes. They are the foundation of container isolation.
User namespaces map UIDs and GIDs inside the namespace to a different range outside. A process that appears to be UID 0 inside a user namespace has no kernel-level privilege outside it. This allows unprivileged users to create isolated environments without requiring CAP_SYS_ADMIN in the root namespace.
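The mapping itself is simple arithmetic. Each line of /proc/<pid>/uid_map holds three numbers: the first ID inside the namespace, the first ID outside it, and the length of the range. The sketch below applies such a table to translate an inside ID to an outside ID; the struct and function names are hypothetical and chosen for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* One line of a uid_map or gid_map file (hypothetical struct name). */
struct id_map_entry {
    uint32_t inside;   /* first ID as seen inside the namespace */
    uint32_t outside;  /* first ID as seen in the parent namespace */
    uint32_t count;    /* number of consecutive IDs mapped */
};

/*
 * Translate an ID inside the namespace to the parent's view.
 * Returns 0 and stores the result in *out, or -1 if the ID is unmapped.
 */
static int map_id_to_parent(const struct id_map_entry *map, size_t n,
                            uint32_t id, uint32_t *out)
{
    for (size_t i = 0; i < n; i++) {
        if (id >= map[i].inside && id - map[i].inside < map[i].count) {
            *out = map[i].outside + (id - map[i].inside);
            return 0;
        }
    }
    return -1;
}
```

With the single entry {0, 100000, 65536}, UID 0 inside the namespace is UID 100000 outside it, which is why "root" in such a namespace holds no special privilege on the host.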
PID namespaces give a process tree its own PID numbering. Process 1 inside the namespace is the namespace init: if it exits, the kernel sends SIGKILL to every other process in the namespace. Processes in the parent namespace can still see and signal namespace-internal processes, using the PIDs those processes have in the parent namespace.
Each network namespace has its own network interfaces, routing tables, firewall rules, and socket table. Containers use this to provide isolated network stacks. The veth device pair is the standard mechanism for connecting a namespace to the root network.
Mount namespaces provide an independent view of the filesystem hierarchy. UTS namespaces give each namespace its own hostname. IPC namespaces isolate System V IPC objects and POSIX message queues.
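One way to check whether two processes share a namespace is to compare the entries under /proc/<pid>/ns/: two processes are in the same namespace when the corresponding entries resolve to the same device and inode. The sketch below assumes /proc is mounted and readable; the function name same_namespace is hypothetical.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Compare the namespace of kind `type` ("pid", "net", "mnt", "uts",
 * "ipc", "user", ...) for two processes.
 * Returns 1 if shared, 0 if different, -1 if either entry is unreadable.
 */
static int same_namespace(pid_t a, pid_t b, const char *type)
{
    char path_a[64], path_b[64];
    struct stat st_a, st_b;

    snprintf(path_a, sizeof(path_a), "/proc/%ld/ns/%s", (long)a, type);
    snprintf(path_b, sizeof(path_b), "/proc/%ld/ns/%s", (long)b, type);

    /* stat() follows the link, so st_ino identifies the namespace itself. */
    if (stat(path_a, &st_a) != 0 || stat(path_b, &st_b) != 0)
        return -1;

    return st_a.st_dev == st_b.st_dev && st_a.st_ino == st_b.st_ino;
}
```

Reading another user's /proc/<pid>/ns entries generally requires ptrace-level access to that process, so a result of -1 for a foreign PID can simply mean the caller lacks permission.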
User namespaces significantly expand the kernel attack surface available to unprivileged users. Some distributions restrict unprivileged user namespace creation via kernel.unprivileged_userns_clone or user.max_user_namespaces.

Seccomp for syscall filtering

Seccomp (secure computing) lets a process restrict the system calls it can make. The BPF-based filter mode (SECCOMP_MODE_FILTER) allows a process to install a Berkeley Packet Filter program that inspects each syscall number and its arguments, then returns an action.
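/* Install a seccomp filter using prctl; prog is a struct sock_fprog holding the BPF program */
if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
    perror("PR_SET_NO_NEW_PRIVS");
else if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) != 0)
    perror("PR_SET_SECCOMP");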
The process must call prctl(PR_SET_NO_NEW_PRIVS, 1) first (or hold CAP_SYS_ADMIN in its user namespace); otherwise an unprivileged process could install a filter and then use it to confuse a setuid program it executes. Commonly used return actions include:
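| Return value | Effect |
| --- | --- |
| SECCOMP_RET_ALLOW | Syscall proceeds normally |
| SECCOMP_RET_ERRNO | Syscall is skipped and fails with the errno value carried in the filter's return data |
| SECCOMP_RET_KILL_PROCESS | The whole process is terminated immediately, as if killed by SIGSYS |
| SECCOMP_RET_TRAP | Syscall is skipped and the kernel delivers a catchable SIGSYS to the calling thread |
| SECCOMP_RET_USER_NOTIF | Syscall is suspended and a notification is sent to a supervisor process, which decides the outcome |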
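Filters are inherited across fork() and preserved across execve(). Additional filters can be layered but never removed: the kernel runs every installed filter on each syscall and applies the most restrictive action any of them returns. Seccomp is most effective when combined with an LSM policy.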
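Putting the pieces together, a minimal filter might look like the sketch below. It makes one chosen syscall fail with a given errno and allows everything else. The function name deny_syscall_with_errno and the NATIVE_AUDIT_ARCH macro are hypothetical, and only x86-64 and AArch64 are handled. A real policy would normally use an allowlist, often generated with libseccomp, rather than a single-syscall denylist.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

#if defined(__x86_64__)
#define NATIVE_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define NATIVE_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "add the AUDIT_ARCH_* value for this architecture"
#endif

/* Make syscall `nr` fail with errno `err`; allow everything else. */
static int deny_syscall_with_errno(int nr, int err)
{
    struct sock_filter insns[] = {
        /* Syscall numbers are per-architecture, so check the arch first. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                 offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, NATIVE_AUDIT_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
        /* Load the syscall number and compare it with the target. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                 offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, (unsigned int)nr, 0, 1),
        BPF_STMT(BPF_RET | BPF_K,
                 SECCOMP_RET_ERRNO | ((unsigned int)err & SECCOMP_RET_DATA)),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(insns) / sizeof(insns[0])),
        .filter = insns,
    };

    /* Required for unprivileged callers; also irreversible. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

After deny_syscall_with_errno(SYS_getppid, EPERM) succeeds, a direct syscall(SYS_getppid) in the same process returns -1 with errno set to EPERM, while other syscalls continue to work. The filter cannot be removed for the lifetime of the process.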

Linux Security Modules framework

The Linux Security Modules (LSM) framework provides a hook-based infrastructure for implementing mandatory access control and other security policies inside the kernel. At critical points in the kernel — file opens, socket connections, process credential changes — LSM calls registered hook functions that can allow or deny the operation. The active LSMs on a running system are listed at:
cat /sys/kernel/security/lsm
A typical output might be:
capability,landlock,lockdown,yama,selinux
See Linux Security Modules Framework for a detailed explanation of how the framework works, how to configure individual LSMs, and how to write a custom LSM.
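A program that needs to adapt to the active LSMs can read that file and look for a name in the comma-separated list. The helper below is a hypothetical sketch that works on the file's contents once they have been read into a string. It matches whole names only, so "land" does not match "landlock".

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return true if `name` appears as a complete entry in a comma-separated
 * LSM list such as the contents of /sys/kernel/security/lsm.
 * A trailing newline in the list is tolerated.
 */
static bool lsm_list_contains(const char *list, const char *name)
{
    size_t name_len = strlen(name);
    const char *p = list;

    if (name_len == 0)
        return false;

    while (*p != '\0') {
        size_t tok_len = strcspn(p, ",\n");   /* length of current entry */

        if (tok_len == name_len && strncmp(p, name, name_len) == 0)
            return true;

        p += tok_len;
        if (*p != '\0')
            p++;                               /* skip the separator */
    }
    return false;
}
```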

Available LSMs

SELinux

Policy-based mandatory access control developed by the NSA. Every subject and object receives a label; a policy database determines which label combinations are allowed. Used by default on RHEL, Fedora, and Android.

AppArmor

Path-based MAC that confines programs to a declared set of files, capabilities, and network access via per-application profiles. Used by default on Ubuntu and Debian.

Smack

Simplified Mandatory Access Control Kernel. Uses short labels on files and processes; a simple rule set controls which labels can read or write to which other labels.

TOMOYO

Pathname-based MAC that builds a learning profile of allowed operations over time and can then enforce that profile. Focused on reducing false positives via training mode.

Landlock

User-space sandboxing API. Any process — including unprivileged ones — can voluntarily restrict its own filesystem and network access by constructing and enforcing a ruleset.

BPF LSM

Allows eBPF programs to be attached to LSM hooks, enabling policy enforcement that can be loaded and updated at runtime without recompiling the kernel.
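Of the modules above, Landlock is the one an ordinary application can drive directly. The sketch below assumes kernel headers that ship <linux/landlock.h> (Linux 5.13 or later). The function names are hypothetical. The first probes the Landlock ABI version. The second builds a ruleset that handles file writes, adds no allow rules, and applies it to the calling thread, so every subsequent attempt to open a file for writing is denied.

```c
#define _GNU_SOURCE
#include <linux/landlock.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Returns the Landlock ABI version (>= 1), or -1 if Landlock is unavailable. */
static int landlock_abi_version(void)
{
    return (int)syscall(__NR_landlock_create_ruleset, NULL, 0,
                        LANDLOCK_CREATE_RULESET_VERSION);
}

/* Deny opening any file for writing from now on. Returns 0 on success. */
static int landlock_deny_file_writes(void)
{
    /* Handled access rights are denied unless a rule allows them. */
    struct landlock_ruleset_attr attr = {
        .handled_access_fs = LANDLOCK_ACCESS_FS_WRITE_FILE,
    };
    int rc = -1;
    int fd = (int)syscall(__NR_landlock_create_ruleset,
                          &attr, sizeof(attr), 0);

    if (fd < 0)
        return -1;

    /* Like seccomp, Landlock requires no_new_privs for unprivileged use. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) == 0 &&
        syscall(__NR_landlock_restrict_self, fd, 0) == 0)
        rc = 0;

    close(fd);
    return rc;
}
```

File descriptors that were already open before the call keep working; the restriction applies to new open requests. A practical sandbox would add rules for the paths the program legitimately needs before calling landlock_restrict_self.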

Kernel hardening mechanisms

Beyond access control, the kernel contains a suite of self-protection mechanisms that make exploitation of kernel bugs significantly harder.
  • KASLR (CONFIG_RANDOMIZE_BASE): randomizes the kernel's load address at each boot so that an attacker cannot rely on fixed memory addresses.
  • SMEP: Supervisor Mode Execution Prevention, an x86 CPU feature that prevents the kernel from executing code in user-space pages.
  • SMAP: Supervisor Mode Access Prevention, an x86 CPU feature that prevents the kernel from reading or writing user-space memory outside explicitly marked access windows.
  • Stack canaries (CONFIG_STACKPROTECTOR): a secret value placed between local variables and the return address, checked before a function returns.
  • KASAN (CONFIG_KASAN): Kernel Address Sanitizer, a runtime detector for out-of-bounds accesses and use-after-free bugs. Its overhead makes it mainly a testing and debugging aid rather than a production hardening feature.
  • CONFIG_STRICT_KERNEL_RWX: enforces that kernel text is not writable and kernel data is not executable.
See Kernel Self-Protection and Hardening for a full breakdown of hardening Kconfig options and deployment guidance.
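An audit script often starts by checking which of these options a kernel was built with. Many distributions ship the build configuration as /boot/config-<release>, and kernels built with CONFIG_IKCONFIG_PROC expose it at /proc/config.gz. The helper below is a hypothetical sketch that scans configuration text already loaded into memory for an exact "OPTION=y" line.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return true if `config_text` contains a line that is exactly
 * "<option>=y". Lines such as "# CONFIG_FOO is not set" and options
 * that merely share a prefix do not match.
 */
static bool kconfig_enabled(const char *config_text, const char *option)
{
    size_t opt_len = strlen(option);
    const char *p = config_text;

    while (*p != '\0') {
        size_t line_len = strcspn(p, "\n");

        if (line_len == opt_len + 2 &&
            strncmp(p, option, opt_len) == 0 &&
            p[opt_len] == '=' && p[opt_len + 1] == 'y')
            return true;

        p += line_len;
        if (*p == '\n')
            p++;
    }
    return false;
}
```

Note that SMEP and SMAP are CPU features rather than build options, so they show up as flags in /proc/cpuinfo on x86 instead of in the kernel configuration.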

Reporting vulnerabilities

The Linux kernel security team handles embargoed vulnerability reports at security@kernel.org. Reports should include a description of the vulnerability, affected kernel versions, and reproduction steps if available. The team coordinates with distributors and upstream maintainers to prepare fixes before public disclosure.

LSM framework

How LSM hooks work, configuring SELinux and AppArmor, and writing a custom LSM.

Kernel hardening

KASLR, stack protection, heap integrity, and the Kconfig hardening checklist.
