
The Linux kernel runs on symmetric multiprocessor (SMP) systems where multiple CPUs execute kernel code simultaneously, and also deals with asynchronous interrupts and software interrupts (softirqs) that can preempt code at arbitrary points. Choosing the right synchronisation primitive for a given situation directly affects both correctness and performance.

Atomic operations

Atomic operations are the lowest-level synchronisation primitives. They guarantee that a read-modify-write sequence on a single variable completes without interference from another CPU.
#include <linux/atomic.h>

atomic_t refcount = ATOMIC_INIT(0);

atomic_inc(&refcount);               /* increment */
atomic_dec(&refcount);               /* decrement */
int val = atomic_read(&refcount);    /* read current value */
atomic_set(&refcount, 5);            /* set to a value */

/* Decrement and test: returns true if result is zero */
if (atomic_dec_and_test(&refcount))
    free_object();

/* Compare and exchange: if refcount == expected, atomically set it to
 * new_val; returns the value observed before the operation */
int old = atomic_cmpxchg(&refcount, expected, new_val);
64-bit variants (atomic64_t) and bitwise operations (set_bit(), clear_bit(), test_and_set_bit()) are also available. Ordering follows the rules in Documentation/atomic_t.txt: value-returning read-modify-write operations such as atomic_dec_and_test() and atomic_cmpxchg() are fully ordered, non-value-returning operations such as atomic_inc() imply no ordering, and atomic_read()/atomic_set() carry no ordering guarantees; relaxed versions of the value-returning operations are provided by the atomic_*_relaxed() family.
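As a brief sketch of those helpers (the bit numbers and the start_processing() call are illustrative, not taken from any particular driver):
#include <linux/atomic.h>
#include <linux/bitops.h>

unsigned long state = 0;

set_bit(0, &state);                    /* atomically set bit 0 */
clear_bit(0, &state);                  /* atomically clear bit 0 */

/* Atomically set bit 1 and return its previous value */
if (!test_and_set_bit(1, &state))
    start_processing();                /* we were the first to claim it */

/* 64-bit counter, useful where a 32-bit atomic_t could overflow */
atomic64_t total_bytes = ATOMIC64_INIT(0);

atomic64_add(1500, &total_bytes);
long long total = atomic64_read(&total_bytes);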

Spinlocks

A spinlock is the most basic SMP locking primitive. A CPU attempting to acquire a locked spinlock busy-waits (“spins”) until the lock is released. Spinlocks are appropriate for critical sections that are short and cannot sleep.
#include <linux/spinlock.h>

/* Static initialisation */
static DEFINE_SPINLOCK(my_lock);

/* Dynamic initialisation */
spinlock_t lock;
spin_lock_init(&lock);

/* Basic acquire/release (process context only, no interrupts) */
spin_lock(&my_lock);
/* ... critical section ... */
spin_unlock(&my_lock);

/* Safe in interrupt context: saves and disables local IRQs */
unsigned long flags;
spin_lock_irqsave(&my_lock, flags);
/* ... critical section ... */
spin_unlock_irqrestore(&my_lock, flags);

/* Safe when BH (softirq) context is the concern */
spin_lock_bh(&my_lock);
/* ... critical section ... */
spin_unlock_bh(&my_lock);
Never call any function that can sleep (memory allocation with GFP_KERNEL, mutex_lock(), copy_from_user(), etc.) while holding a spinlock. The lock is held with preemption disabled, so sleeping inside the critical section triggers a "scheduling while atomic" bug and can deadlock the system.

When to use irqsave

If the protected data is shared between process context and an interrupt handler, process-context code must use spin_lock_irqsave(). Disabling local interrupts while the lock is held prevents the handler from running on that CPU and spinning forever on a lock its own CPU already holds, which is a self-deadlock. The interrupt handler itself can use plain spin_lock(), because interrupts are already disabled locally while it runs, and a handler running on another CPU simply spins until the lock is released. In practice you can rarely guarantee which CPU an interrupt is delivered to, so spin_lock_irqsave() is the safe default whenever interrupt-context access is possible.
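A minimal sketch of that pattern, assuming a hypothetical struct my_dev and handler name:
#include <linux/interrupt.h>
#include <linux/spinlock.h>

struct my_dev {
    spinlock_t lock;
    unsigned long rx_packets;
    unsigned long tx_packets;
};

/* Process context: disable local interrupts while holding the lock */
static void my_dev_count_tx(struct my_dev *dev)
{
    unsigned long flags;

    spin_lock_irqsave(&dev->lock, flags);
    dev->tx_packets++;
    spin_unlock_irqrestore(&dev->lock, flags);
}

/* Interrupt handler: local interrupts are already disabled here */
static irqreturn_t my_dev_isr(int irq, void *data)
{
    struct my_dev *dev = data;

    spin_lock(&dev->lock);
    dev->rx_packets++;
    spin_unlock(&dev->lock);

    return IRQ_HANDLED;
}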

Reader-writer spinlocks

When reads greatly outnumber writes, rwlock_t allows multiple concurrent readers while still serialising writes:
rwlock_t rw_lock = __RW_LOCK_UNLOCKED(rw_lock);

/* Multiple readers can hold this simultaneously */
read_lock_irqsave(&rw_lock, flags);
/* ... read-only critical section ... */
read_unlock_irqrestore(&rw_lock, flags);

/* Exclusive write access */
write_lock_irqsave(&rw_lock, flags);
/* ... read-write critical section ... */
write_unlock_irqrestore(&rw_lock, flags);
Reader-writer spinlocks carry more overhead than plain spinlocks and cannot be upgraded from read to write. Prefer RCU for read-mostly data structures; the kernel is actively removing rwlocks in favour of RCU.

Mutexes

Mutexes are sleeping locks: a task that cannot acquire a mutex is put to sleep (added to the mutex’s wait queue) rather than spinning. This makes them suitable for critical sections that may be long or that call blocking operations.
#include <linux/mutex.h>

/* Static initialisation */
static DEFINE_MUTEX(my_mutex);

/* Dynamic initialisation */
struct mutex lock;
mutex_init(&lock);

/* Acquire (may sleep) */
mutex_lock(&my_mutex);
/* ... critical section (may sleep) ... */
mutex_unlock(&my_mutex);

/* Acquire interruptibly (returns -EINTR if signalled) */
if (mutex_lock_interruptible(&my_mutex))
    return -ERESTARTSYS;

/* Try to acquire without blocking */
if (mutex_trylock(&my_mutex)) {
    /* ... got the lock ... */
    mutex_unlock(&my_mutex);
}

/* Check if the mutex is locked */
bool locked = mutex_is_locked(&my_mutex);

Mutex implementation details

Mutex acquisition follows three paths, as documented in Documentation/locking/mutex-design.rst:
  1. Fastpath: cmpxchg() atomically sets the owner to the current task. Succeeds instantly in the uncontended case.
  2. Midpath (optimistic spinning): If the lock owner is currently running on another CPU, the waiter spins using an MCS lock rather than sleeping immediately. This avoids the overhead of a context switch for short-lived contention.
  3. Slowpath: The task is added to the wait queue and sleeps as TASK_UNINTERRUPTIBLE until woken by mutex_unlock().
The kernel enforces strict mutex semantics: only the owner can unlock, recursive locking is forbidden, and mutexes cannot be used in interrupt context.
Do not use mutexes in hardware or software interrupt context (ISRs, tasklets, timers). Use spinlocks instead, since interrupt handlers cannot sleep.

RCU — Read-Copy-Update

RCU is a synchronisation mechanism optimised for read-mostly data structures. Readers acquire no lock and execute with minimal overhead — typically just a preemption disable. Writers proceed in two phases: they create an updated copy of the data, atomically publish the pointer to the new version, and then wait for all readers that started before the update to finish (a grace period) before freeing the old version.
#include <linux/rcupdate.h>

/* Reader side: no explicit lock, just disable preemption */
rcu_read_lock();
struct my_data *p = rcu_dereference(global_ptr); /* safe dereference */
if (p)
    use_data(p);
rcu_read_unlock();

/* Writer side: copy the current data, update the copy, publish it,
 * then wait for pre-existing readers before freeing the old copy.
 * 'old' is the currently published version; concurrent writers must
 * be serialised separately (e.g. with a spinlock or mutex). */
struct my_data *new = kmalloc(sizeof(*new), GFP_KERNEL);
*new = *old;
new->field = new_value;

/* Atomically publish the new pointer */
rcu_assign_pointer(global_ptr, new);

/* Wait for all pre-existing readers to finish */
synchronize_rcu();

/* Now safe to free the old data */
kfree(old);
For situations where the writer cannot block, call_rcu() registers a callback that is invoked after the grace period:
static void my_rcu_callback(struct rcu_head *head) {
    struct my_data *p = container_of(head, struct my_data, rcu);
    kfree(p);
}

/* Non-blocking: schedule free after grace period */
call_rcu(&old->rcu, my_rcu_callback);
RCU is ideal for linked lists, hash tables, and pointer-based structures where reads dominate. The kernel uses RCU extensively in the networking stack (routing tables, netfilter), the VFS (dcache lookups), and the task list.
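As an illustration of the linked-list case, the helpers in <linux/rculist.h> combine the pattern above with list traversal. The item structure, the external writer lock, and the names wanted_key, new_item, old_item and item_free_rcu (a callback like my_rcu_callback above) are assumptions for this sketch:
#include <linux/rculist.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

struct item {
    int key;
    struct list_head node;
    struct rcu_head rcu;
};

static LIST_HEAD(item_list);
static DEFINE_SPINLOCK(item_lock);        /* serialises writers only */

/* Reader: lockless traversal under rcu_read_lock() */
rcu_read_lock();
struct item *it;
list_for_each_entry_rcu(it, &item_list, node) {
    if (it->key == wanted_key)
        break;
}
rcu_read_unlock();

/* Writer: publish a new element and defer-free a removed one */
spin_lock(&item_lock);
list_add_rcu(&new_item->node, &item_list);
list_del_rcu(&old_item->node);
spin_unlock(&item_lock);
call_rcu(&old_item->rcu, item_free_rcu);  /* frees after a grace period */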

Seqlocks

Seqlocks (sequence locks) provide a reader-writer mechanism where writers never wait for readers. Readers detect concurrent writes by checking a sequence counter before and after their critical section; if the counter changed (indicating a write occurred), they retry.
#include <linux/seqlock.h>

/* Static initialisation */
static DEFINE_SEQLOCK(my_seqlock);

/* Writer: holds the embedded spinlock, increments counter */
write_seqlock(&my_seqlock);
/* ... update the protected data ... */
write_sequnlock(&my_seqlock);

/* Reader: retry loop */
unsigned int seq;
do {
    seq = read_seqbegin(&my_seqlock);
    /* ... read the protected data ... */
} while (read_seqretry(&my_seqlock, seq));
The sequence counter is odd during a write (indicating an in-progress update) and even when no write is active. If a reader observes an odd counter, or if the counter changes between read_seqbegin() and read_seqretry(), it retries. Seqlocks are used for frequently-read, infrequently-written data such as timekeeping: the 64-bit jiffies value is read through a seqlock on 32-bit systems, and the wall-clock time (historically xtime) is protected by a sequence counter.
Seqlocks cannot protect data that contains pointers. A writer may free a pointer between the reader’s read_seqbegin() and the point where the reader dereferences it. Use RCU for pointer-based data.

Per-CPU variables

Per-CPU variables give each CPU its own copy of a variable, eliminating all locking for the common case where each CPU only accesses its own copy:
#include <linux/percpu.h>

/* Define a per-CPU variable */
DEFINE_PER_CPU(long, my_counter);

/* Increment the current CPU's counter (preemption disabled) */
this_cpu_inc(my_counter);

/* Read the current CPU's value */
long val = this_cpu_read(my_counter);

/* Access another CPU's variable */
long remote = per_cpu(my_counter, cpu_id);
Per-CPU variables are safe to access without locks as long as the task cannot migrate to another CPU mid-access. The this_cpu_* accessors guarantee this for a single operation; longer sequences of accesses disable preemption explicitly with get_cpu()/put_cpu(). Per-CPU data is used throughout the kernel for statistics counters, per-CPU allocator caches, and runqueue data structures.
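Per-CPU storage can also be allocated at runtime with alloc_percpu(). A brief sketch, with the summation loop showing one common way to fold the per-CPU slots into a single total:
#include <linux/percpu.h>

/* Allocate one long per possible CPU (NULL on failure) */
long __percpu *counters = alloc_percpu(long);

/* Fast path: bump this CPU's slot without any lock */
this_cpu_inc(*counters);

/* Slow path (e.g. reading statistics): sum every CPU's slot */
long total = 0;
int cpu;

for_each_possible_cpu(cpu)
    total += *per_cpu_ptr(counters, cpu);

free_percpu(counters);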

Memory barriers

Modern CPUs and compilers reorder memory accesses for performance. Memory barriers prevent specific reorderings that would break concurrent code:
#include <asm/barrier.h>

/* Full memory barrier: orders all loads and stores before/after */
mb();

/* Read barrier: orders loads */
rmb();

/* Write barrier: orders stores */
wmb();

/* SMP-safe variants (no-ops on UP, barriers on SMP) */
smp_mb();
smp_rmb();
smp_wmb();

/* Read with acquire semantics (pairs with smp_store_release) */
val = smp_load_acquire(&ptr);

/* Write with release semantics */
smp_store_release(&ptr, val);
Explicit barriers are rarely needed in code that uses the standard locking APIs, since spinlock/mutex acquire and release imply the necessary barriers. Barriers are required when implementing lock-free algorithms or communicating between contexts using shared variables without locks.
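As one example of such lockless communication, the acquire/release pair above implements simple message passing between a producer and a consumer. The payload/ready variables and the compute_value()/use_value() calls are illustrative:
static int payload;
static int ready;

/* Producer: the release ensures the payload write is visible
 * before the flag can be observed as set */
payload = compute_value();
smp_store_release(&ready, 1);

/* Consumer: the acquire ensures the payload read happens after
 * the flag read, so a set flag implies a fully written payload */
if (smp_load_acquire(&ready))
    use_value(payload);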

Lock debugging with lockdep

lockdep is the kernel’s run-time lock validator. When CONFIG_PROVE_LOCKING is enabled, every lock acquisition and release is tracked. lockdep maintains a directed graph of lock-ordering dependencies and reports:
  • Circular dependency: lock A → lock B → lock A would deadlock.
  • Lock ordering inversions: taking locks in different orders in different code paths.
  • Lock usage violations: taking a sleeping lock in interrupt context.
# lockdep warnings appear in dmesg, e.g.:
# WARNING: possible circular locking dependency detected
# <task> is trying to acquire lock:
# <lock A>
# but task is already holding lock:
# <lock B>
Run your driver or subsystem under a debug kernel (CONFIG_DEBUG_SPINLOCK, CONFIG_DEBUG_MUTEXES, CONFIG_PROVE_LOCKING) during development. lockdep catches many bugs that only manifest under rare timing conditions in production.
Lock statistics are available in /proc/lock_stat when CONFIG_LOCK_STAT is enabled, showing contention counts and waiting times per lock class.
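In addition to the automatic tracking, code can state its own locking assumptions with annotations such as lockdep_assert_held(), which warns at runtime when the expected lock is not held. The structure and function names below are hypothetical:
#include <linux/lockdep.h>
#include <linux/spinlock.h>

struct my_table {
    spinlock_t lock;
    /* ... protected fields ... */
};

static void table_update_locked(struct my_table *t)
{
    /* Warns (under CONFIG_PROVE_LOCKING) if the caller forgot
     * to take t->lock before calling this helper */
    lockdep_assert_held(&t->lock);

    /* ... modify the table ... */
}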

Where these primitives appear in the kernel

Memory management

The buddy allocator, slab caches, and page reclaim all use spinlocks and RCU internally.

Scheduling

Per-CPU runqueue spinlocks and RCU-based task list traversal in the scheduler.

Networking

RCU in the routing table and socket hash tables; spinlocks in the NAPI receive path.

Filesystems

Inode semaphores, RCU-walk in the dcache, and seqlocks for filesystem timestamps.
