The Linux kernel implements security as a layered system built from multiple complementary mechanisms. Rather than relying on any single control, it combines discretionary access control, POSIX capabilities, namespace isolation, syscall filtering, and the Linux Security Modules framework to achieve defense in depth. Understanding how these layers interact is essential for anyone building, configuring, or auditing a Linux-based system.
Discretionary access control
Linux inherits UNIX discretionary access control (DAC): every file and process has an owner (UID) and a group (GID), and permissions are enforced based on three classes: owner, group, and others. The kernel evaluates read, write, and execute bits on every filesystem access. Beyond the traditional UNIX permission mask, Linux supports POSIX Access Control Lists (ACLs), which allow filesystem objects to carry per-user and per-group permission entries that are more expressive than the three fixed classes. DAC is the baseline check that runs before any MAC policy. Even with a mandatory access control module active, DAC denials are enforced first.
Linux capabilities
The traditional root/non-root split is too coarse for production systems. A process running as UID 0 bypasses all kernel permission checks, while a process running as any other UID is subject to full permission checking. The Linux capabilities system divides the privileges historically associated with root into discrete units that can be granted independently.
Commonly used capabilities
| Capability | Purpose |
|---|---|
| CAP_NET_ADMIN | Configure network interfaces, routing tables, and firewall rules |
| CAP_SYS_ADMIN | A wide range of administrative operations; treat as near-equivalent to root |
| CAP_SYS_PTRACE | Trace or inspect arbitrary processes |
| CAP_DAC_OVERRIDE | Bypass file read, write, and execute permission checks |
| CAP_SETUID / CAP_SETGID | Change UID/GID of the current process |
| CAP_NET_BIND_SERVICE | Bind to TCP/UDP ports below 1024 |
| CAP_SYS_MODULE | Load and unload kernel modules |
| CAP_SYS_CHROOT | Use chroot() |
| CAP_AUDIT_WRITE | Write records to the kernel audit log |
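You can check which of these capabilities a process actually holds by reading the hex masks the kernel publishes in /proc/<pid>/status (the CapInh, CapPrm, CapEff, CapBnd, and CapAmb fields). Bit N of each mask corresponds to capability number N in <linux/capability.h>. For example, CAP_NET_BIND_SERVICE is bit 10. The following is a minimal sketch that parses the effective mask and tests individual bits. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <linux/capability.h>   /* CAP_* numbers, e.g. CAP_SYS_ADMIN */

/* Parse the hex mask that follows `field` (e.g. "CapEff:") in a
   /proc/<pid>/status buffer. Returns true on success. */
static bool parse_cap_field(const char *status, const char *field, uint64_t *out)
{
    const char *p = strstr(status, field);
    if (p == NULL)
        return false;
    p += strlen(field);
    char *end;
    unsigned long long v = strtoull(p, &end, 16);  /* skips the tab */
    if (end == p)
        return false;
    *out = (uint64_t)v;
    return true;
}

/* True if capability number `cap` is set in `mask`. */
static bool cap_in_mask(uint64_t mask, int cap)
{
    assert(cap >= 0 && cap < 64);
    return (mask >> cap) & 1u;
}

/* Read the calling process's effective set from /proc/self/status. */
static bool read_own_effective_caps(uint64_t *out)
{
    FILE *f = fopen("/proc/self/status", "r");
    if (f == NULL)
        return false;
    char line[256];
    bool ok = false;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "CapEff:", 7) == 0) {
            ok = parse_cap_field(line, "CapEff:", out);
            break;
        }
    }
    fclose(f);
    return ok;
}
```

For example, a non-root web server started with only CAP_NET_BIND_SERVICE would show CapEff: 0000000000000400, and cap_in_mask(mask, CAP_SYS_ADMIN) would be false.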
Capability sets
Each task carries four core capability sets. Linux 4.3 added a fifth, the ambient set, which preserves capabilities across execve() of programs that are not privileged.
- Permitted: a limiting superset of the capabilities the process may enable in its effective set
- Effective: the capabilities currently active and checked by the kernel
- Inheritable: the capabilities that may be passed across execve()
- Bounding: an upper limit on the capabilities a program can gain from file capabilities during execve(), particularly relevant when executing setuid-root binaries
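During execve() the kernel derives the new program's sets from the caller's sets and the executable's file capabilities. The following sketch models the transformation rules documented in capabilities(7) as plain bit arithmetic. It assumes an empty ambient set, no securebits, and a caller that is not root. The struct, macro, and function names are hypothetical.

When a root caller or a setuid-root binary is involved, the kernel treats the file's permitted and inheritable sets as all ones. In that case the bounding set does all of the limiting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <linux/capability.h>   /* CAP_NET_BIND_SERVICE, CAP_SYS_ADMIN */

#define CAP_BIT(c) (UINT64_C(1) << (c))   /* hypothetical helper */

struct task_caps {
    uint64_t permitted, effective, inheritable, bounding;
};

/* Capabilities attached to an executable file. */
struct file_caps {
    uint64_t permitted, inheritable;
    bool effective;   /* the file's single "effective" flag */
};

/* New task sets after execve(), with an empty ambient set:
     P'(permitted)   = (P(inheritable) & F(inheritable)) | (F(permitted) & P(bounding))
     P'(effective)   = F(effective) ? P'(permitted) : 0
     P'(inheritable) = P(inheritable)
     P'(bounding)    = P(bounding)                                              */
static struct task_caps caps_after_execve(struct task_caps p, struct file_caps f)
{
    struct task_caps n;
    n.permitted   = (p.inheritable & f.inheritable) | (f.permitted & p.bounding);
    n.effective   = f.effective ? n.permitted : 0;
    n.inheritable = p.inheritable;
    n.bounding    = p.bounding;
    assert((n.effective & ~n.permitted) == 0);  /* effective is always a subset */
    return n;
}
```

Modelling a setuid-root binary as a file_caps value with every bit set shows the consequence: the new permitted set equals the caller's bounding set. Dropping CAP_SYS_ADMIN from the bounding set therefore keeps it out of reach even for setuid-root programs.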
The capset() system call and file capabilities stored in the security.capability extended attribute allow fine-grained privilege assignment without requiring a setuid binary.
Avoid granting CAP_SYS_ADMIN to containers or services unless strictly necessary. Its scope is so broad that it effectively restores root-level kernel access.
Namespaces as security boundaries
Linux namespaces allow the kernel to present different views of global resources to different sets of processes. They are the foundation of container isolation.
User namespaces
User namespaces map UIDs and GIDs inside the namespace to a different range outside. A process running as UID 0 inside a user namespace holds capabilities only over resources governed by that namespace and has no privilege outside it. This allows unprivileged users to create isolated environments without requiring CAP_SYS_ADMIN in the root namespace.
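The mapping is expressed as lines of /proc/<pid>/uid_map (and gid_map). Each line has the form "ID-inside-ns ID-outside-ns length". The following sketch parses such lines and translates an ID used inside the namespace into the parent namespace's view. The struct and function names are hypothetical, and the sample ranges are only examples: a single "0 1000 1" line maps root-in-namespace to UID 1000 outside.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* One uid_map/gid_map line. */
struct id_extent {
    uint32_t inside;   /* first ID as seen inside the namespace */
    uint32_t outside;  /* corresponding ID in the parent namespace */
    uint32_t count;    /* number of consecutive IDs in the range */
};

/* Parse one map line. Returns 1 on success, 0 otherwise. */
static int parse_extent(const char *line, struct id_extent *e)
{
    assert(line != NULL && e != NULL);
    return sscanf(line, "%" SCNu32 " %" SCNu32 " %" SCNu32,
                  &e->inside, &e->outside, &e->count) == 3;
}

/* Translate an inside ID to the parent namespace's view.
   Returns -1 if no extent covers the ID, i.e. the ID is unmapped. */
static int64_t map_to_outside(const struct id_extent *map, size_t n, uint32_t id)
{
    for (size_t i = 0; i < n; i++) {
        if (id >= map[i].inside && id - map[i].inside < map[i].count)
            return (int64_t)map[i].outside + (id - map[i].inside);
    }
    return -1;
}
```

With the single-line map, UID 0 inside becomes UID 1000 outside, and UID 1 inside has no mapping at all. This is why the namespace's "root" holds no privilege over host-owned files.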
PID namespaces
PID namespaces give a process tree its own PID numbering. Process 1 inside the namespace is the namespace's init process. If it terminates, the kernel sends SIGKILL to every other process in the namespace. Processes in the parent namespace can still see and signal processes inside the namespace, using the PIDs those processes have in the parent namespace.
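The kernel reports a process's PID in every namespace it belongs to on the NSpid line of /proc/<pid>/status. The leftmost value is relative to the PID namespace of the procfs mount, and each following value belongs to the next nested namespace. The following sketch parses that line. The function names and the NSPID_MAX bound are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NSPID_MAX 64   /* generous bound for this sketch */

/* Parse an "NSpid:" line into pids[]. The last value is the PID the
   process has in its own (innermost) namespace. Returns the number of
   values, or -1 if the line is not an NSpid line. */
static int parse_nspid(const char *line, long pids[], int max)
{
    assert(max > 0);
    if (strncmp(line, "NSpid:", 6) != 0)
        return -1;
    const char *p = line + 6;
    int n = 0;
    while (n < max) {
        char *end;
        long v = strtol(p, &end, 10);
        if (end == p)
            break;              /* no more numbers on the line */
        pids[n++] = v;
        p = end;
    }
    return n;
}

/* Find and parse the NSpid line of the calling process. */
static int read_own_nspid(long pids[], int max)
{
    FILE *f = fopen("/proc/self/status", "r");
    if (f == NULL)
        return -1;
    char line[512];
    int n = -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "NSpid:", 6) == 0) {
            n = parse_nspid(line, pids, max);
            break;
        }
    }
    fclose(f);
    return n;
}
```

Read through the host's /proc, a containerized process might show a line such as "NSpid: 4242 1". That is PID 4242 from the host's view and PID 1 (the namespace init) from inside the container.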
Network namespaces
Each network namespace has its own network interfaces, routing tables, firewall rules, and socket table. Containers use this to provide isolated network stacks. A veth (virtual Ethernet) device pair is the standard mechanism for connecting a namespace to the root network namespace. One end stays in the root namespace and the other is moved into the container's namespace.
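Because interfaces belong to exactly one network namespace, a process can only enumerate the interfaces in its own namespace. The following sketch uses the POSIX if_nameindex() call to count them. Inside a freshly created network namespace it would report only the loopback device, while on a host it would also list physical and veth interfaces. The function name is hypothetical.

```c
#include <assert.h>
#include <net/if.h>
#include <stddef.h>
#include <string.h>

/* Count the interfaces visible in the caller's network namespace and
   report whether one called `name` is among them.
   Returns the count, or -1 if the list cannot be read. */
static int count_interfaces(const char *name, int *found)
{
    assert(found != NULL);
    struct if_nameindex *list = if_nameindex();
    if (list == NULL)
        return -1;

    int n = 0;
    *found = 0;
    /* The array ends with an entry whose if_index is 0. */
    for (struct if_nameindex *it = list; it->if_index != 0; it++) {
        n++;
        if (name != NULL && strcmp(it->if_name, name) == 0)
            *found = 1;
    }
    if_freenameindex(list);
    return n;
}
```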
Mount, UTS, and IPC namespaces
Mount namespaces provide an independent view of the filesystem hierarchy. UTS namespaces give each namespace its own hostname. IPC namespaces isolate System V IPC objects and POSIX message queues.
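Every namespace a process belongs to is visible as a symlink under /proc/<pid>/ns/. Reading the link yields an identity string such as "uts:[4026531838]". Comparing these strings is a common quick check for whether two processes share a mount, UTS, or IPC namespace. namespaces(7) describes the precise test as comparing the device and inode numbers from stat(2). The following sketch reads the identity for the calling process. The function name is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Read the identity of the caller's namespace of the given type
   ("mnt", "uts", "ipc", "net", ...) into buf.
   Returns 0 on success, -1 on failure. */
static int read_ns_identity(const char *type, char *buf, size_t len)
{
    char path[64];
    assert(len > 1);
    snprintf(path, sizeof path, "/proc/self/ns/%s", type);
    ssize_t n = readlink(path, buf, len - 1);
    if (n < 0)
        return -1;
    buf[n] = '\0';   /* readlink() does not NUL-terminate */
    return 0;
}
```

Running the same code in a container started with its own UTS namespace would print a different inode number for "uts" than on the host.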
Seccomp for syscall filtering
Seccomp (secure computing) lets a process restrict the system calls it can make. The BPF-based filter mode (SECCOMP_MODE_FILTER) allows a process to install a Berkeley Packet Filter program that inspects each syscall number and its arguments, then returns an action.
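To install a filter, a process must first call prctl(PR_SET_NO_NEW_PRIVS, 1) or hold CAP_SYS_ADMIN. This requirement prevents privilege escalation through the filter. A filter returns an action for each syscall. The most commonly used actions are: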
| Return value | Effect |
|---|---|
| SECCOMP_RET_ALLOW | Syscall proceeds normally |
| SECCOMP_RET_ERRNO | Syscall is not executed and returns the errno value encoded in the action's data bits |
| SECCOMP_RET_KILL_PROCESS | The whole process is killed immediately, as if by SIGSYS |
| SECCOMP_RET_TRAP | Kernel sends SIGSYS to the calling thread |
| SECCOMP_RET_USER_NOTIF | Syscall is forwarded to a supervisor process, which decides the outcome |
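Filters are inherited by child processes created with fork() and are preserved across execve(). Additional filters can be layered on top. When several filters are installed, all of them run, and the action with the highest precedence (the most restrictive one) wins. Seccomp is most effective when combined with an LSM policy.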
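The filter mode can be illustrated with the classic BPF macros from <linux/filter.h>. The following minimal sketch is not a hardened policy. It first checks the arch field of struct seccomp_data, then returns SECCOMP_RET_ERRNO with EPERM for one chosen syscall and SECCOMP_RET_ALLOW for everything else. It assumes an x86_64 or aarch64 build, and the function name is hypothetical. Production filters are usually allow-lists rather than this single-syscall deny rule.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

#if defined(__x86_64__)
#define EXPECTED_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define EXPECTED_ARCH AUDIT_ARCH_AARCH64
#else
#error "this sketch only knows the x86_64 and aarch64 audit arch values"
#endif

/* Make syscall `nr` fail with EPERM; allow every other syscall.
   Returns 0 on success, -1 (errno set) on failure. */
static int deny_syscall_with_eperm(unsigned int nr)
{
    struct sock_filter filter[] = {
        /* Syscall numbers differ between ABIs, so kill the process if the
           call comes from an unexpected architecture. (On x86_64 a complete
           filter would also reject numbers with __X32_SYSCALL_BIT set.) */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, EXPECTED_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
        /* Load the syscall number and compare it with the one to deny. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, nr, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof filter / sizeof filter[0]),
        .filter = filter,
    };

    /* Required unless the caller holds CAP_SYS_ADMIN. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

After a successful deny_syscall_with_eperm(SYS_getppid), a raw syscall(SYS_getppid) returns -1 with errno set to EPERM. The filter cannot be removed, and it is inherited by any children the process creates.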
Linux Security Modules framework
The Linux Security Modules (LSM) framework provides a hook-based infrastructure for implementing mandatory access control and other security policies inside the kernel. At critical points in the kernel (file opens, socket connections, process credential changes), LSM calls registered hook functions that can allow or deny the operation. The active LSMs on a running system are listed at:
Available LSMs
SELinux
Policy-based mandatory access control developed by the NSA. Every subject and object receives a label; a policy database determines which label combinations are allowed. Used by default on RHEL, Fedora, and Android.
AppArmor
Path-based MAC that confines programs to a declared set of files, capabilities, and network access via per-application profiles. Used by default on Ubuntu and Debian.
Smack
Simplified Mandatory Access Control Kernel. Uses short labels on files and processes; a simple rule set controls which labels can read or write to which other labels.
TOMOYO
Pathname-based MAC that builds a learning profile of allowed operations over time and can then enforce that profile. Focused on reducing false positives via training mode.
Landlock
User-space sandboxing API. Any process — including unprivileged ones — can voluntarily restrict its own filesystem and network access by constructing and enforcing a ruleset.
BPF LSM
Allows eBPF programs to be attached to LSM hooks, enabling policy enforcement that can be loaded and updated at runtime without recompiling the kernel.
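Whichever modules are active, hook dispatch follows the same pattern. The kernel calls each registered hook for an operation in turn. For most hooks that return an int, the first non-zero result (a negative errno) denies the operation and stops the chain. The following user-space model illustrates that pattern. The hook type, the two example policies, and all function names are hypothetical and are not kernel APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical "file open" hook: 0 allows, a negative errno denies. */
typedef int (*file_open_hook)(const char *path, int want_write);

/* Example policy A: no writes anywhere under /etc/. */
static int deny_writes_under_etc(const char *path, int want_write)
{
    return (want_write && strncmp(path, "/etc/", 5) == 0) ? -EACCES : 0;
}

/* Example policy B: no access at all to paths containing "secret". */
static int deny_secret(const char *path, int want_write)
{
    (void)want_write;
    return strstr(path, "secret") != NULL ? -EPERM : 0;
}

/* Run every hook in registration order; the first denial wins and
   later hooks are not consulted. */
static int run_file_open_hooks(const file_open_hook *hooks, size_t n,
                               const char *path, int want_write)
{
    assert(hooks != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int rc = hooks[i](path, want_write);
        if (rc != 0)
            return rc;
    }
    return 0;
}
```

Registering both policies and opening /etc/secret for writing returns -EACCES from the first hook, so the second is never called. An operation succeeds only if every module allows it. This is why stacking LSMs can only tighten, never loosen, access.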
Kernel hardening mechanisms
Beyond access control, the kernel contains a suite of self-protection mechanisms that make exploitation of kernel bugs significantly harder.
- KASLR (CONFIG_RANDOMIZE_BASE): randomizes the kernel’s load address at each boot so that an attacker cannot rely on fixed memory addresses.
- SMEP: Supervisor Mode Execution Prevention, a CPU feature that prevents the kernel from executing code in user-space pages.
- SMAP: Supervisor Mode Access Prevention, a CPU feature that prevents the kernel from reading or writing user-space memory except in code paths that explicitly allow it, such as copy_from_user() and copy_to_user().
- Stack canaries (CONFIG_STACKPROTECTOR): a secret value placed between a function's local variables and its return address and checked before the function returns. A mismatch indicates a stack buffer overflow and causes a kernel panic.
- KASAN: Kernel Address Sanitizer, a runtime detector for out-of-bounds accesses and use-after-free bugs. Because of its memory and performance overhead, it is mainly used in testing and fuzzing builds.
- CONFIG_STRICT_KERNEL_RWX: enforces that kernel text is not writable and kernel data is not executable.
Reporting vulnerabilities
The Linux kernel security team handles embargoed vulnerability reports at security@kernel.org. Reports should include a description of the vulnerability, affected kernel versions, and reproduction steps if available. The team coordinates with distributors and upstream maintainers to prepare fixes before public disclosure.
