
Concurrency in the kernel is pervasive: interrupt handlers, softirqs, worker threads, and multiple CPU cores can all execute your code simultaneously. The kernel provides a layered set of synchronization primitives, each with distinct performance and context constraints. Picking the wrong primitive is a common source of deadlocks, priority inversions, and subtle data corruption. This reference covers the full set: atomic operations for lockless counters, spinlocks for short critical sections in any context, mutexes for sleeping locks in process context, RCU for high-read-frequency data, semaphores for resource counting, completions for event signaling, and memory barriers for ordering guarantees.

Atomic operations

Lock-free integer operations for counters and flags.

Spinlocks

Busy-wait locks safe in interrupt and atomic context.

Mutexes

Sleeping mutual exclusion for process context.

RCU

Read-Copy-Update for fast, scalable read-mostly data.

Semaphores

Counting semaphores for resource limiting.

Completions

One-shot event notification between kernel threads.

Choosing a primitive

The right primitive depends on two constraints: who holds the lock (only process context, or also interrupt handlers and softirqs?) and how long the critical section is (microseconds or potentially milliseconds?).
Atomic operations: use for simple integer counters and boolean flags shared between contexts. Very low overhead on modern CPUs. Suitable in any context, including hardirq.
Spinlocks: disable preemption on the local CPU while held. Can be taken in hardirq context (with spin_lock_irqsave). The critical section must not sleep. Ideal for protecting small data structures for a handful of instructions.
Mutexes: the holder sleeps if the mutex is contended. Cannot be acquired in interrupt context. Use when the critical section might allocate memory, call copy_from_user(), or do I/O. Preferred over spinlocks when sleeping is safe.
RCU: readers are never blocked. Writers make a copy, update it, then wait for a grace period before freeing the old version. Ideal for routing tables, device lists, and other data read on every packet or syscall.
Semaphores: like a mutex but with a count > 1. Use for limiting concurrent access to a pool of resources (e.g., at most N concurrent DMA transfers). For binary signaling between threads, prefer struct completion.
Completions: one thread waits; another signals. Cleaner than a semaphore initialised to 0 for this pattern. Use for “wait for hardware to finish” or “wait for thread to start”.

Atomic operations

atomic_t wraps a 32-bit integer with CPU-level atomic read-modify-write instructions. No lock is required; all operations are indivisible with respect to other CPUs.
#include <linux/atomic.h>

atomic_t refcount = ATOMIC_INIT(1);

/* Read and write */
int val = atomic_read(&refcount);       /* returns current value */
atomic_set(&refcount, 5);              /* set to 5 */

/* Arithmetic */
atomic_inc(&refcount);                 /* add 1 */
atomic_dec(&refcount);                 /* subtract 1 */
atomic_add(3, &refcount);             /* add 3 */
atomic_sub(2, &refcount);             /* subtract 2 */

/* Arithmetic with return value */
int new = atomic_inc_return(&refcount);        /* returns value after increment */
int old = atomic_fetch_add(5, &refcount);      /* returns value before add */
bool zero = atomic_dec_and_test(&refcount);    /* true if result is 0 */

/* Compare-and-exchange */
int expected = 1;
int desired  = 2;
/* Returns the old value; if old == expected, the exchange happened */
int prev = atomic_cmpxchg(&refcount, expected, desired);
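
A common application of atomic_dec_and_test() is reference counting. A minimal sketch, assuming a hypothetical struct my_obj:

struct my_obj {
    atomic_t refcount;
    /* ... payload ... */
};

static void my_obj_put(struct my_obj *obj)
{
    /* Free the object only when the last reference is dropped */
    if (atomic_dec_and_test(&obj->refcount))
        kfree(obj);
}

For real reference counts, prefer refcount_t (refcount_inc() / refcount_dec_and_test()), which saturates on overflow and catches use-after-free bugs that raw atomic_t silently permits.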

64-bit atomics

#include <linux/atomic.h>

atomic64_t counter = ATOMIC64_INIT(0);

atomic64_inc(&counter);
s64 val = atomic64_read(&counter);
atomic64_set(&counter, 0LL);
Non-value-returning atomic_t operations (atomic_inc(), atomic_add(), ...) imply no memory barriers; value-returning ones (atomic_inc_return(), atomic_cmpxchg(), ...) are fully ordered. If you need ordering guarantees around a non-returning operation, use smp_mb__before_atomic() / smp_mb__after_atomic(), or choose an explicit barrier variant like atomic_fetch_add_acquire().
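
For example, to order a plain store before a non-returning atomic increment (the stats fields here are illustrative):

/* Publish the payload before bumping the event counter */
WRITE_ONCE(stats->bytes, n);
smp_mb__before_atomic();   /* order the store above before the increment */
atomic_inc(&stats->events);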

Spinlocks

Spinlocks are the correct choice when a lock must be acquired from interrupt context, or when the critical section is very short (tens of instructions). A contended waiter busy-waits rather than sleeping, so a long critical section wastes CPU cycles on every waiting core.
#include <linux/spinlock.h>

static DEFINE_SPINLOCK(my_lock);   /* static initialisation */

/* Dynamic initialisation */
spinlock_t lock;
spin_lock_init(&lock);

/* Basic lock / unlock (process context, interrupts not disabled) */
spin_lock(&my_lock);
/* ... critical section ... */
spin_unlock(&my_lock);

/* Lock with local IRQ disabled — required if an interrupt handler also takes this lock */
unsigned long flags;
spin_lock_irqsave(&my_lock, flags);
/* ... critical section ... */
spin_unlock_irqrestore(&my_lock, flags);

/* Lock with BH disabled — required if a softirq/tasklet also takes this lock */
spin_lock_bh(&my_lock);
/* ... critical section ... */
spin_unlock_bh(&my_lock);

/* Non-blocking try */
if (spin_trylock(&my_lock)) {
    /* ... */
    spin_unlock(&my_lock);
}

Choosing the right spinlock variant

Interrupt handler takes the lock?   Softirq/tasklet takes it?   Use
No                                  No                          spin_lock / spin_unlock
No                                  Yes                         spin_lock_bh / spin_unlock_bh
Yes                                 (either)                    spin_lock_irqsave / spin_unlock_irqrestore
Never call a function that can sleep while holding a spinlock. This includes kmalloc(GFP_KERNEL), mutex_lock(), copy_from_user(), msleep(), and any function that may wait for I/O. Doing so risks deadlock and triggers a “BUG: sleeping function called from invalid context” splat on kernels with CONFIG_DEBUG_ATOMIC_SLEEP.
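
Putting the variants together: a sketch of a driver whose process-context path shares a lock with its interrupt handler (my_dev and its fields are hypothetical):

#include <linux/interrupt.h>

struct my_dev {
    spinlock_t lock;
    u32 pending;
};

/* Process context: local IRQs must be disabled while the lock is held */
static void my_dev_submit(struct my_dev *dev, u32 req)
{
    unsigned long flags;

    spin_lock_irqsave(&dev->lock, flags);
    dev->pending |= req;
    spin_unlock_irqrestore(&dev->lock, flags);
}

/* Hardirq context: local IRQs are already off, so plain spin_lock suffices */
static irqreturn_t my_dev_irq(int irq, void *data)
{
    struct my_dev *dev = data;

    spin_lock(&dev->lock);
    dev->pending = 0;
    spin_unlock(&dev->lock);
    return IRQ_HANDLED;
}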

Mutexes

A mutex serializes access to a resource in process context. If the mutex is already held when a thread calls mutex_lock(), the thread is put to sleep and only woken when the mutex is released. This makes mutexes unsuitable for interrupt handlers but ideal for protecting state that requires memory allocation, userspace copies, or device I/O.
#include <linux/mutex.h>

static DEFINE_MUTEX(my_mutex);   /* static initialisation */

/* Dynamic initialisation */
struct mutex lock;
mutex_init(&lock);

/* Lock — sleeps until available */
mutex_lock(&my_mutex);
/* ... critical section ... */
mutex_unlock(&my_mutex);

/* Interruptible lock — returns -EINTR if a signal arrives */
if (mutex_lock_interruptible(&my_mutex))
    return -EINTR;
/* ... */
mutex_unlock(&my_mutex);

/* Killable lock — interrupted only by fatal signals (SIGKILL) */
if (mutex_lock_killable(&my_mutex))
    return -EINTR;
mutex_unlock(&my_mutex);

/* Non-blocking try — returns 1 if acquired, 0 if not */
if (mutex_trylock(&my_mutex)) {
    /* acquired */
    mutex_unlock(&my_mutex);
}
A mutex must be released by the same task that acquired it. This is enforced in debug builds. If you need a lock that can be released by a different task (e.g., producer/consumer), use a semaphore.
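
A typical case where a mutex is the right tool: a write handler that must call copy_from_user(), which may sleep. A sketch with illustrative names:

#include <linux/uaccess.h>

static char config_buf[64];
static DEFINE_MUTEX(config_mutex);

static ssize_t config_write(struct file *file, const char __user *ubuf,
                            size_t len, loff_t *ppos)
{
    if (len > sizeof(config_buf))
        return -EINVAL;

    if (mutex_lock_interruptible(&config_mutex))
        return -EINTR;

    /* copy_from_user() may sleep: legal under a mutex, never under a spinlock */
    if (copy_from_user(config_buf, ubuf, len)) {
        mutex_unlock(&config_mutex);
        return -EFAULT;
    }

    mutex_unlock(&config_mutex);
    return len;
}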

RCU (Read-Copy-Update)

RCU is a synchronization mechanism optimized for data that is read far more often than it is written. Readers acquire no lock and are never blocked. Writers make a copy of the data, modify it, atomically swap in the new version, and then wait for all pre-existing readers to finish before freeing the old version.

Reader side

#include <linux/rcupdate.h>

rcu_read_lock();
/*
 * Inside this critical section you may dereference RCU-protected pointers.
 * You must not sleep (including blocking allocations).
 */
struct my_data *data = rcu_dereference(global_data_ptr);
if (data)
    use(data->value);
rcu_read_unlock();

Writer side

/* Allocate and populate the new version */
struct my_data *new_data = kmalloc(sizeof(*new_data), GFP_KERNEL);
if (!new_data)
    return -ENOMEM;
new_data->value = new_value;

/* Grab the old version; writers are serialised, so plain access is safe here */
struct my_data *old_data = rcu_dereference_protected(global_data_ptr, 1);

/* Atomically publish the new pointer */
rcu_assign_pointer(global_data_ptr, new_data);

/*
 * synchronize_rcu() blocks until all pre-existing read-side
 * critical sections have completed. After this returns, no reader
 * can hold a reference to old_data.
 */
synchronize_rcu();
kfree(old_data);

Asynchronous callback (call_rcu)

When the writer cannot sleep (e.g., it holds a spinlock), use call_rcu() to schedule the free callback asynchronously after the grace period:
struct my_data {
    int value;
    struct rcu_head rcu;   /* embedded in the structure being freed */
};

static void my_free_callback(struct rcu_head *head)
{
    struct my_data *data = container_of(head, struct my_data, rcu);
    kfree(data);
}

/* Schedule the free; returns immediately */
call_rcu(&old_data->rcu, my_free_callback);

RCU-protected lists

#include <linux/rculist.h>

/* Writer: add to list */
spin_lock(&list_lock);
list_add_rcu(&new->list, &head);
spin_unlock(&list_lock);

/* Writer: delete from list */
spin_lock(&list_lock);
list_del_rcu(&entry->list);
spin_unlock(&list_lock);
synchronize_rcu();   /* or call_rcu() */
kfree(entry);

/* Reader: traverse without any lock */
rcu_read_lock();
list_for_each_entry_rcu(entry, &head, list) {
    /* ... */
}
rcu_read_unlock();
Always use rcu_dereference() to read RCU-protected pointers inside a read-side critical section, and rcu_assign_pointer() to publish them on the writer side. These macros enforce the memory barriers required for correct ordering on all architectures.
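
Putting it together, a lookup that copies the value out while still inside the read-side critical section (struct my_data and its fields are the hypothetical ones used above):

static int my_data_lookup(int id, int *out)
{
    struct my_data *entry;
    int ret = -ENOENT;

    rcu_read_lock();
    list_for_each_entry_rcu(entry, &head, list) {
        if (entry->id == id) {
            /* Copy out now: the entry may be freed any time after
             * rcu_read_unlock(), so the pointer must not escape. */
            *out = entry->value;
            ret = 0;
            break;
        }
    }
    rcu_read_unlock();
    return ret;
}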

Semaphores

Semaphores maintain an integer count. down() decrements and blocks if the count would go negative; up() increments and wakes a waiter. Binary semaphores (count = 1) behave like sleeping mutexes but without the ownership constraint—any task can call up().
#include <linux/semaphore.h>

static DEFINE_SEMAPHORE(my_sem, 1);   /* binary semaphore, initial count 1 */

/* Dynamic initialisation */
struct semaphore sem;
sema_init(&sem, N);   /* counting semaphore allowing N concurrent holders */

/* Acquire — sleeps until count > 0 */
down(&my_sem);
/* ... */
up(&my_sem);

/* Interruptible acquire — returns -EINTR on signal */
if (down_interruptible(&my_sem))
    return -EINTR;
up(&my_sem);

/* Killable acquire — returns -EINTR on fatal signal only */
if (down_killable(&my_sem))
    return -EINTR;
up(&my_sem);

/* Non-blocking attempt — returns 0 if acquired, non-zero if not */
if (!down_trylock(&my_sem)) {
    /* acquired */
    up(&my_sem);
}
For the common case of mutual exclusion in process context, prefer struct mutex over a binary semaphore. A mutex has stricter semantics (ownership tracking, priority inheritance on RT), more debugging support, and is faster on most architectures.
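
As a concrete resource-counting sketch, capping the number of in-flight DMA transfers; start_dma() is a hypothetical synchronous helper:

#define MAX_INFLIGHT_DMA 4

static struct semaphore dma_sem;   /* sema_init(&dma_sem, MAX_INFLIGHT_DMA) at probe time */

static int submit_dma(struct my_dev *dev)
{
    int ret;

    /* Blocks while MAX_INFLIGHT_DMA transfers are already running */
    if (down_interruptible(&dma_sem))
        return -EINTR;

    ret = start_dma(dev);   /* hypothetical: runs the transfer synchronously */

    up(&dma_sem);           /* release the slot for the next caller */
    return ret;
}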

Completions

struct completion is the idiomatic way to signal a one-shot event from one kernel thread (or interrupt handler) to another. It is cleaner and more obvious in intent than a semaphore initialised to 0.
#include <linux/completion.h>

static DECLARE_COMPLETION(data_ready);   /* static initialisation */

/* Dynamic initialisation */
struct completion done;
init_completion(&done);

/* --- Thread / driver that waits for the event --- */

/* Block indefinitely */
wait_for_completion(&data_ready);

/* Block with a timeout (returns remaining jiffies, or 0 on timeout) */
unsigned long remaining =
    wait_for_completion_timeout(&data_ready, msecs_to_jiffies(1000));
if (!remaining)
    return -ETIMEDOUT;

/* Interruptible wait */
if (wait_for_completion_interruptible(&data_ready))
    return -ERESTARTSYS;

/* --- Thread / IRQ handler that signals the event --- */

complete(&data_ready);          /* wake one waiter */
complete_all(&data_ready);      /* wake all waiters */

/* Reinitialise for reuse */
reinit_completion(&data_ready);
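
The canonical pattern: a thread starts the hardware and sleeps until the interrupt handler signals completion (my_dev, start_transfer(), and the timeout are illustrative):

struct my_dev {
    struct completion done;   /* init_completion(&dev->done) at probe time */
};

static irqreturn_t my_dev_irq(int irq, void *data)
{
    struct my_dev *dev = data;

    complete(&dev->done);   /* wake the waiting thread */
    return IRQ_HANDLED;
}

static int my_dev_transfer(struct my_dev *dev)
{
    reinit_completion(&dev->done);   /* arm before kicking the hardware */
    start_transfer(dev);             /* hypothetical: start the device */

    if (!wait_for_completion_timeout(&dev->done, msecs_to_jiffies(100)))
        return -ETIMEDOUT;
    return 0;
}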

Memory barriers

The CPU and compiler may reorder memory operations for performance. Memory barriers enforce ordering constraints where the architecture’s relaxed memory model would otherwise allow reordering visible to other CPUs.
#include <asm/barrier.h>

/*
 * smp_mb() — full memory barrier
 * All loads and stores before the barrier appear to complete
 * before all loads and stores after it, on all CPUs.
 */
smp_mb();

/*
 * smp_rmb() — read (load) memory barrier
 * All loads before the barrier complete before all loads after it.
 * Cheaper than smp_mb() on architectures with a Total Store Order model.
 */
smp_rmb();

/*
 * smp_wmb() — write (store) memory barrier
 * All stores before the barrier complete before all stores after it.
 */
smp_wmb();

/*
 * READ_ONCE / WRITE_ONCE — prevent compiler from eliminating or reordering
 * an individual access. Required when reading/writing a variable shared
 * between contexts without a lock.
 */
int val = READ_ONCE(shared_flag);
WRITE_ONCE(shared_flag, 1);

When barriers are needed

/* Producer: publish data, then set a flag */
buffer[idx] = data;
smp_wmb();                    /* ensure buffer write is visible before flag */
WRITE_ONCE(ready_flag, 1);

/* Consumer: check flag, then read data */
if (READ_ONCE(ready_flag)) {
    smp_rmb();                /* ensure flag read is ordered before buffer read */
    process(buffer[idx]);
}
rcu_assign_pointer() and rcu_dereference() already include the appropriate barriers for their usage context. spin_lock() / spin_unlock() and mutex_lock() / mutex_unlock() likewise imply the ordering their critical sections need on all supported architectures. Manual barriers are needed mainly in lockless code that uses none of these primitives.

Barrier summary

Barrier             Orders                   Use case
smp_mb()            All loads and stores     General producer/consumer flag protocols
smp_rmb()           Loads only               Checking a flag then reading the data it guards
smp_wmb()           Stores only              Writing data then publishing a pointer to it
READ_ONCE(x)        Single load              Reading a variable shared without a lock
WRITE_ONCE(x, v)    Single store             Writing a variable shared without a lock
dma_wmb()           Stores to DMA memory     Ensuring a DMA descriptor is ready before ringing the doorbell
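
For dma_wmb(), the typical pattern is filling a descriptor in coherent DMA memory and only then transferring ownership to the device; the descriptor layout, ownership bit, and doorbell register are illustrative:

/* Fill the descriptor fields first */
desc->addr = cpu_to_le64(buf_dma);
desc->len  = cpu_to_le32(buf_len);

dma_wmb();   /* descriptor contents reach the device before the ownership flip */

desc->flags = cpu_to_le32(DESC_HW_OWNED);   /* hypothetical ownership bit */
writel(1, dev->doorbell);                   /* ring the doorbell (MMIO) */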
