

The Linux kernel runs on systems with hundreds of CPUs and must protect shared data structures from concurrent access while staying responsive. It provides a rich set of synchronization primitives, each with different performance characteristics and usage constraints. Choosing the wrong primitive can cause deadlocks, priority inversion, or subtle data corruption — understanding when to use each one is a core kernel programming skill.

Lock categories

The kernel divides locking primitives into three categories:
Category        | Examples                             | May sleep? | Notes
----------------|--------------------------------------|------------|---------------------------------------------
Sleeping locks  | mutex, rw_semaphore, semaphore       | Yes        | Only valid in preemptible task context
CPU-local locks | local_lock                           | No         | Disables preemption or interrupts on one CPU
Spinning locks  | spinlock_t, rwlock_t, raw_spinlock_t | No         | Busy-wait; implicitly disable preemption
On PREEMPT_RT kernels spinlock_t and rwlock_t are converted to sleeping RT-mutex-based locks, which changes their context constraints. Use raw_spinlock_t only when you need a non-sleeping spinlock on all kernel configurations.

Spinlocks

Spinlocks are the most fundamental primitive. They busy-wait until the lock is available, disable preemption, and optionally disable interrupts. All spinlock operations are defined in include/linux/spinlock.h.
/* Static initialization */
static DEFINE_SPINLOCK(my_lock);

/* Dynamic initialization */
spinlock_t my_lock;
spin_lock_init(&my_lock);

Choosing the right spinlock variant

/* Process context only — no interrupt involvement */
spin_lock(&lock);
/* ... critical section ... */
spin_unlock(&lock);

/* When the lock may also be taken in a softirq / tasklet */
spin_lock_bh(&lock);
/* ... */
spin_unlock_bh(&lock);

/* When the lock may also be taken in a hardware interrupt handler */
unsigned long flags;
spin_lock_irqsave(&lock, flags);
/* ... */
spin_unlock_irqrestore(&lock, flags);
If a spinlock is ever acquired inside a hardware interrupt handler, every other context that takes that lock must disable interrupts with spin_lock_irqsave. Otherwise an interrupt can fire on a CPU that already holds the lock, and the handler will spin forever waiting for a lock its own CPU can never release — a self-deadlock.
The irqsave/irqrestore variant saves the current interrupt-enable state into flags and restores it on unlock. Use this instead of spin_lock_irq whenever the interrupt state before the lock is unknown.

Reader-writer spinlocks

rwlock_t allows many concurrent readers but only one writer at a time:
/* Documentation/locking/spinlocks.rst */
rwlock_t xxx_rw_lock = __RW_LOCK_UNLOCKED(xxx_rw_lock);

unsigned long flags;

read_lock_irqsave(&xxx_rw_lock, flags);
/* ... read-only critical section ... */
read_unlock_irqrestore(&xxx_rw_lock, flags);

write_lock_irqsave(&xxx_rw_lock, flags);
/* ... read/write critical section ... */
write_unlock_irqrestore(&xxx_rw_lock, flags);
The kernel community is actively removing rwlock_t from most subsystems because the read-write overhead exceeds the benefit for short critical sections. Prefer RCU for read-heavy data structures.

Mutexes

Mutexes are sleeping locks defined in include/linux/mutex.h. A task that finds the mutex contended is put to sleep, freeing the CPU for other work. Mutexes have strict semantics enforced by CONFIG_DEBUG_MUTEXES:
  • Only one task may hold the mutex at a time.
  • Only the owner may unlock it.
  • Recursive locking is not permitted.
  • A mutex must not be used in hardware or software interrupt context.
  • A task must not exit while holding a mutex.
/* Static initialization */
static DEFINE_MUTEX(my_mutex);

/* Dynamic initialization */
struct mutex my_mutex;
mutex_init(&my_mutex);

/* Acquire (uninterruptible — cannot be interrupted by a signal) */
void mutex_lock(struct mutex *lock);

/* Acquire (interruptible — returns -EINTR on signal) */
int  mutex_lock_interruptible(struct mutex *lock);

/* Try to acquire without blocking (returns 1 on success, 0 on failure) */
int  mutex_trylock(struct mutex *lock);

/* Release */
void mutex_unlock(struct mutex *lock);

/* Query */
int  mutex_is_locked(struct mutex *lock);
Mutexes use an optimistic spinning path (MCS lock) when the owner is running on another CPU, making them behave more like a hybrid than a pure sleeping lock. This significantly improves performance on lightly contended locks.

RCU: Read-Copy Update

RCU (Read-Copy Update) is a synchronization mechanism optimised for read-heavy workloads. Readers hold no locks, incur no cache traffic, and never block. Writers make a copy of the data, modify the copy, then atomically publish it; the old version is freed only after all pre-existing readers have finished.
/* Reader side — protected by rcu_read_lock / rcu_read_unlock */
rcu_read_lock();
struct my_node *p = rcu_dereference(global_ptr);
if (p)
    use(p->field);
rcu_read_unlock();

/* Writer side */
struct my_node *old_p;
struct my_node *new_p = kmalloc(sizeof(*new_p), GFP_KERNEL);

spin_lock(&update_lock);
old_p  = rcu_dereference_protected(global_ptr,
                                   lockdep_is_held(&update_lock));
*new_p = *old_p;         /* copy */
new_p->field = new_val;  /* modify copy */
rcu_assign_pointer(global_ptr, new_p); /* publish */
spin_unlock(&update_lock);

synchronize_rcu();       /* wait for all existing readers to finish */
kfree(old_p);
Key RCU APIs:
API                                 | Purpose
------------------------------------|------------------------------------------------------------------------
rcu_read_lock() / rcu_read_unlock() | Mark the boundaries of an RCU read-side critical section.
rcu_dereference(p)                  | Safely dereference an RCU-protected pointer inside a read-side section.
rcu_assign_pointer(p, v)            | Publish a new pointer value with the correct memory barrier.
synchronize_rcu()                   | Sleep until all pre-existing RCU readers have completed (writer side).
call_rcu(&head, func)               | Asynchronous variant: invokes func after a grace period (does not sleep).
If you only need to free memory after readers finish, kfree_rcu(ptr, rcu_head_field) is a convenient shorthand for call_rcu + kfree.

Atomic operations

The kernel provides atomic integer types whose operations execute indivisibly without a lock — a single instruction on some architectures, a load-linked/store-conditional loop on others:
/* include/linux/atomic.h */
atomic_t     counter = ATOMIC_INIT(0);
atomic64_t   counter64 = ATOMIC64_INIT(0);

void  atomic_set(atomic_t *v, int i);
int   atomic_read(const atomic_t *v);
void  atomic_inc(atomic_t *v);
void  atomic_dec(atomic_t *v);
int   atomic_inc_and_test(atomic_t *v);   /* returns true if result == 0 */
int   atomic_dec_and_test(atomic_t *v);   /* returns true if result == 0 */
int   atomic_add_return(int i, atomic_t *v);
int   atomic_cmpxchg(atomic_t *v, int old, int new);  /* if *v == old, set *v = new; returns the prior value */
atomic_dec_and_test is commonly used for reference counting:
struct my_obj {
    atomic_t refcount;
    /* ... */
};

static inline void my_obj_get(struct my_obj *obj)
{
    atomic_inc(&obj->refcount);
}

static inline void my_obj_put(struct my_obj *obj)
{
    if (atomic_dec_and_test(&obj->refcount))
        kfree(obj);
}
Prefer refcount_t over atomic_t for reference counting. refcount_t adds overflow and underflow detection and is harder to misuse. It is defined in include/linux/refcount.h.
For bitwise operations on single bits, use set_bit(), clear_bit(), test_bit(), and test_and_set_bit().

Semaphores

struct semaphore is a counting semaphore defined in include/linux/semaphore.h. It allows a configurable number of concurrent holders.
struct semaphore sem;
sema_init(&sem, 1);    /* binary semaphore (like a mutex) */

down(&sem);            /* acquire (uninterruptible) */
down_interruptible(&sem);  /* acquire (interruptible) */
up(&sem);              /* release */
New code should rarely use semaphore. Semaphores have no concept of ownership, so they cannot participate in priority inheritance and are susceptible to priority inversion. Prefer mutex for mutual exclusion and completion for signalling between tasks.

Reader-writer semaphores

rw_semaphore allows multiple concurrent readers or one exclusive writer:
struct rw_semaphore rwsem;
init_rwsem(&rwsem);

down_read(&rwsem);
/* ... read-only critical section ... */
up_read(&rwsem);

down_write(&rwsem);
/* ... read/write critical section ... */
up_write(&rwsem);

/* Downgrade a write lock to a read lock (avoids starvation) */
downgrade_write(&rwsem);

Choosing the right primitive

  • Short critical sections that must not sleep (including interrupt context): use spinlock. If the lock is ever taken in a hardware interrupt handler, use spin_lock_irqsave; if only in softirq/tasklet context, use spin_lock_bh; for process-context-only code, use spin_lock.
  • Longer critical sections in task context that may sleep: use mutex. A contending task sleeps instead of wasting CPU cycles, and mutexes integrate with lockdep and priority inheritance (on RT kernels).
  • Read-mostly data structures: use RCU. Readers have near-zero overhead and never block. Writers copy the data, update, and publish atomically, then defer freeing the old version with synchronize_rcu or call_rcu.
  • Reference counting: use refcount_t or, if you need additional atomic arithmetic, atomic_t. Never use locks for a simple reference count — the atomic primitives are sufficient and far cheaper.
  • One-shot signalling between tasks: use completion (include/linux/completion.h). One task calls wait_for_completion() and another calls complete(). This is cleaner than using a semaphore as a one-shot signal.

Common locking pitfalls

Acquiring two or more locks in different orders on different code paths will deadlock. Establish a global lock ordering and always acquire locks in that order. Enable CONFIG_LOCKDEP during development — lockdep tracks acquisition sequences and reports ordering violations.
Calling any function that may sleep — kmalloc(GFP_KERNEL), mutex_lock(), msleep(), copy_from_user() — while holding a spinlock or in interrupt context will hang or corrupt the kernel. Use GFP_ATOMIC for allocations inside spinlock-protected sections or interrupt handlers.
On architectures with weak memory ordering (ARM, POWER), accesses may be reordered by the CPU or compiler. The RCU primitives (rcu_assign_pointer, rcu_dereference) include the correct barriers. For other patterns, use smp_store_release() / smp_load_acquire() or the WRITE_ONCE() / READ_ONCE() macros.
A function that acquires a lock must release it on every return path, including error paths. Use goto labels with cleanup paths, or the scoped_guard() macro from include/linux/cleanup.h (available since Linux 6.5), to ensure the lock is always released.

Further reading

Memory management

GFP flags determine whether an allocation may sleep, which directly affects which lock is safe to hold.

Filesystems

The VFS uses a combination of mutexes, spinlocks, and RCU to protect inodes, dentries, and superblocks.
