Power management in the Linux kernel divides into two distinct but complementary models: system sleep, where the entire system transitions to a low-power state like suspend-to-RAM or hibernation, and runtime PM, where individual devices power down independently while the rest of the system keeps running. Most drivers will encounter both, and the same dev_pm_ops structure is used to implement callbacks for each. Understanding when each callback is invoked, and what the kernel guarantees about ordering, is essential to writing correct PM support.
System sleep states
The kernel exposes several system sleep states, corresponding to the ACPI S-states on x86:
Suspend-to-Idle (S0ix)
The shallowest sleep. CPUs enter deep idle states; devices are suspended but memory is self-refreshed. Fast resume; used in modern laptops for “connected standby”.
Standby / Shallow sleep (S1)
A lightly powered state where the CPU is stopped but hardware context is preserved. Less power saving than suspend-to-RAM.
Suspend-to-RAM (S3)
The most commonly implemented suspend state. All device drivers are asked to suspend; main memory is kept alive but almost everything else powers off. Resume restores hardware context.
Suspend-to-Disk / Hibernation (S4)
The kernel writes a memory image to swap or a dedicated partition, then powers off completely. On resume, the image is loaded and memory state is restored. Slowest to resume; survives power loss.
The dev_pm_ops struct
All PM callbacks are gathered in struct dev_pm_ops, defined in include/linux/pm.h. You assign a populated instance to driver.pm.
SYSTEM_SLEEP_PM_OPS() and RUNTIME_PM_OPS() are convenience macros that populate the correct fields and handle CONFIG_PM ifdefs cleanly. Without them you would set .suspend, .resume, .runtime_suspend, .runtime_resume, and .runtime_idle directly.
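As a minimal sketch of what such a declaration might look like, the fragment below fills a dev_pm_ops instance with the two macros and hooks it into a platform driver. The "foo" names and the empty callback bodies are hypothetical placeholders, and the choice of a platform driver is an assumption. Like any kernel code, it builds only inside a kernel tree.

```c
#include <linux/device.h>
#include <linux/platform_device.h>
#include <linux/pm.h>

/* Hypothetical callbacks for an imaginary "foo" device. */
static int foo_suspend(struct device *dev)
{
	/* Quiesce the hardware and save state that the sleep state loses. */
	return 0;
}

static int foo_resume(struct device *dev)
{
	/* Restore the state saved in foo_suspend(). */
	return 0;
}

static int foo_runtime_suspend(struct device *dev)
{
	/* Gate clocks or otherwise power the device down. */
	return 0;
}

static int foo_runtime_resume(struct device *dev)
{
	/* Power the device back up. */
	return 0;
}

static const struct dev_pm_ops foo_pm_ops = {
	/* Each macro expands to a list of initializers ending in a comma. */
	SYSTEM_SLEEP_PM_OPS(foo_suspend, foo_resume)
	RUNTIME_PM_OPS(foo_runtime_suspend, foo_runtime_resume, NULL)
};

static struct platform_driver foo_driver = {
	.driver = {
		.name = "foo",
		/* pm_ptr() evaluates to NULL when CONFIG_PM is disabled. */
		.pm = pm_ptr(&foo_pm_ops),
	},
};
```

The third argument of RUNTIME_PM_OPS() is the runtime_idle callback; passing NULL leaves it unset.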
System sleep callback sequence
When the system suspends, the kernel executes callbacks in a fixed order for each device (using whichever callback is present among pm_domain, device type, class, bus type, and driver, in that precedence order). Suspend runs prepare, suspend, suspend_late, and suspend_noirq; resume runs the mirror sequence resume_noirq, resume_early, resume, and complete.
The suspend_noirq and resume_noirq phases run with interrupts disabled except for those marked IRQF_NO_SUSPEND. Do not acquire regular spinlocks or interact with devices that require interrupt-driven completion in these phases.
Runtime PM
Runtime PM lets a device power down while the system is still running. The PM core tracks a usage count and an autosuspend timer for each device. When the usage count drops to zero (and an optional autosuspend delay has expired), the core calls the driver’s runtime_suspend callback.
Enabling runtime PM in probe()
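One possible shape for this step is sketched below. It assumes the device is already powered when probe() runs; foo_probe() and the 2000 ms delay are hypothetical choices for illustration, not values taken from any real driver.

```c
#include <linux/platform_device.h>
#include <linux/pm_runtime.h>

/* Hypothetical probe() for an imaginary "foo" platform device. */
static int foo_probe(struct platform_device *pdev)
{
	struct device *dev = &pdev->dev;
	int ret;

	/* ... power on and initialise the hardware here ... */

	/* The hardware is running, so record that before enabling runtime PM. */
	ret = pm_runtime_set_active(dev);
	if (ret)
		return ret;

	/* Assumed policy: suspend after 2000 ms without activity. */
	pm_runtime_set_autosuspend_delay(dev, 2000);
	pm_runtime_use_autosuspend(dev);

	pm_runtime_enable(dev);

	return 0;
}
```

A matching remove path would normally call pm_runtime_disable() so that no callbacks run after the driver has let go of the device.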
Acquiring and releasing the device in driver I/O paths
Every time your driver needs the hardware to be powered on, it increments the PM usage count. When it is done, it decrements the count, potentially triggering suspend.

| Function | Effect |
|---|---|
| pm_runtime_enable(dev) | Allow runtime PM for this device |
| pm_runtime_disable(dev) | Prevent new suspend/resume calls |
| pm_runtime_get_sync(dev) | Increment usage count; power on if suspended (synchronous) |
| pm_runtime_get_noresume(dev) | Increment usage count without resuming |
| pm_runtime_put(dev) | Decrement usage count; may trigger suspend |
| pm_runtime_put_autosuspend(dev) | Decrement count; trigger autosuspend timer |
| pm_runtime_put_noidle(dev) | Decrement without scheduling suspend |
| pm_runtime_mark_last_busy(dev) | Reset the autosuspend timer |
| pm_runtime_set_active(dev) | Mark device as active (use before pm_runtime_enable) |
| pm_runtime_set_suspended(dev) | Mark device as suspended (use after pm_runtime_disable) |
| pm_runtime_suspended(dev) | Return true if device is runtime-suspended |
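The get/put pattern described above might be sketched as follows. foo_do_transfer() is a hypothetical helper, and the fragment assumes autosuspend has been configured for the device.

```c
#include <linux/device.h>
#include <linux/pm_runtime.h>

/* Hypothetical I/O helper for an imaginary "foo" device. */
static int foo_do_transfer(struct device *dev)
{
	int ret;

	/* Take a usage reference and resume the device if it is suspended. */
	ret = pm_runtime_get_sync(dev);
	if (ret < 0) {
		/* The usage count was incremented even on failure; drop it. */
		pm_runtime_put_noidle(dev);
		return ret;
	}

	/* ... access hardware registers here ... */

	/* Restart the inactivity window, then release the reference. */
	pm_runtime_mark_last_busy(dev);
	pm_runtime_put_autosuspend(dev);

	return 0;
}
```

Note that pm_runtime_get_sync() returns 1 when the device was already active, so only negative values indicate an error. Kernels that provide pm_runtime_resume_and_get() offer a variant that drops the reference itself when the resume fails.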
Power domains
Sometimes multiple devices share a clock, voltage rail, or reset signal. They cannot be individually suspended; the shared resource must be controlled as a unit. The kernel supports this through PM domains, represented by struct dev_pm_domain.
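To make the idea concrete, here is a rough sketch of a hand-written PM domain that wraps the driver's runtime callbacks. All "foo" names are hypothetical, and most platforms would use the generic PM domain (genpd) framework rather than open-coding a domain like this.

```c
#include <linux/device.h>
#include <linux/pm.h>
#include <linux/pm_domain.h>
#include <linux/pm_runtime.h>

/* Hypothetical domain callback: suspend the device, then the shared rail. */
static int foo_domain_runtime_suspend(struct device *dev)
{
	int ret;

	/* Run the driver's own runtime_suspend callback first. */
	ret = pm_generic_runtime_suspend(dev);
	if (ret)
		return ret;

	/* ... gate the shared clock or voltage rail here ... */
	return 0;
}

/* Hypothetical domain callback: restore the rail, then the device. */
static int foo_domain_runtime_resume(struct device *dev)
{
	/* ... ungate the shared clock or voltage rail here ... */

	return pm_generic_runtime_resume(dev);
}

static struct dev_pm_domain foo_pm_domain = {
	.ops = {
		.runtime_suspend = foo_domain_runtime_suspend,
		.runtime_resume = foo_domain_runtime_resume,
	},
};

/* Platform code would call this before a driver binds to the device. */
static void foo_attach_domain(struct device *dev)
{
	dev_pm_domain_set(dev, &foo_pm_domain);
}
```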
When dev->pm_domain is set, the PM core uses the domain’s ops callbacks instead of the bus or driver callbacks. This allows platform code to group devices and control the shared resource without modifying individual drivers.
PM domains are set up by platform code (SoC PM drivers), not by individual device drivers. If your hardware platform uses a PM domain controller (common on ARM SoCs with genpd), the assignment happens automatically through Device Tree power-domains bindings.
ACPI power management
On x86 systems with ACPI firmware, many devices are described in the ACPI namespace and the kernel integrates with the ACPI PM layer. The ACPI layer hooks into the same dev_pm_ops callbacks but also handles D-states (D0–D3cold) as defined by the PCI and ACPI specifications.
Drivers on ACPI systems generally do not need to interact with ACPI directly. The PCI or platform bus layer handles the ACPI calls, and the driver’s suspend/resume callbacks are invoked at the right time. For devices that need explicit ACPI interaction, use the ACPI_COMPANION() macro to obtain the ACPI companion device (a struct acpi_device) from a struct device, and pass it to acpi_device_set_power().
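For the uncommon case where explicit interaction is needed, a sketch could look like the following. foo_acpi_power_on() is a hypothetical helper, and whether a driver should force D0 this way depends on the device.

```c
#include <linux/acpi.h>
#include <linux/device.h>
#include <linux/errno.h>

/* Hypothetical helper: put the ACPI companion of dev into D0. */
static int foo_acpi_power_on(struct device *dev)
{
	struct acpi_device *adev = ACPI_COMPANION(dev);

	/* Devices without an ACPI description have no companion. */
	if (!adev)
		return -ENODEV;

	return acpi_device_set_power(adev, ACPI_STATE_D0);
}
```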
Wakeup sources
Some devices can generate a wakeup event that forces the system out of a sleep state: Ethernet wake-on-LAN, RTC alarms, keyboard activity, USB remote wakeup. A driver declares wakeup capability with device_init_wakeup() and arms the wakeup interrupt with enable_irq_wake().
device_init_wakeup() registers the device as wakeup-capable and creates the /sys/devices/.../power/wakeup sysfs file. User space can write "enabled" or "disabled" to that file to control whether the device’s wakeup mechanism is actually armed during suspend.
enable_irq_wake() tells the interrupt controller to treat the specified IRQ as a wakeup source. It must be paired with disable_irq_wake() on resume.
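Put together, these calls might be arranged as in the sketch below. struct foo_priv, the "foo_wake" function names, and the use of the first platform IRQ are assumptions made for illustration.

```c
#include <linux/device.h>
#include <linux/errno.h>
#include <linux/interrupt.h>
#include <linux/platform_device.h>
#include <linux/pm_wakeup.h>

/* Hypothetical per-device state. */
struct foo_priv {
	int irq;
};

static int foo_wake_probe(struct platform_device *pdev)
{
	struct foo_priv *priv;

	priv = devm_kzalloc(&pdev->dev, sizeof(*priv), GFP_KERNEL);
	if (!priv)
		return -ENOMEM;

	priv->irq = platform_get_irq(pdev, 0);
	if (priv->irq < 0)
		return priv->irq;

	platform_set_drvdata(pdev, priv);

	/* Mark the device wakeup-capable and enable wakeup by default. */
	device_init_wakeup(&pdev->dev, true);

	return 0;
}

static int foo_wake_suspend(struct device *dev)
{
	struct foo_priv *priv = dev_get_drvdata(dev);

	/* Honour the user-space setting in power/wakeup. */
	if (device_may_wakeup(dev))
		enable_irq_wake(priv->irq);

	return 0;
}

static int foo_wake_resume(struct device *dev)
{
	struct foo_priv *priv = dev_get_drvdata(dev);

	/* Undo the arming done in foo_wake_suspend(). */
	if (device_may_wakeup(dev))
		disable_irq_wake(priv->irq);

	return 0;
}
```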
Wakeup source accounting
The PM core tracks active wakeup events to avoid suspending while an event is being processed. If your driver generates software-initiated wakeup events (not hardware IRQs), use the wakeup source API, for example pm_stay_awake(), pm_relax(), and pm_wakeup_event().
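A short sketch of how a driver might report such events follows. The "foo" function names and the 500 ms window are hypothetical, and the fragment assumes the device was already made wakeup-capable with device_init_wakeup().

```c
#include <linux/device.h>
#include <linux/pm_wakeup.h>

/* Hypothetical: work on an event starts, so block system suspend. */
static void foo_event_begin(struct device *dev)
{
	pm_stay_awake(dev);
}

/* Hypothetical: the event has been fully handled, so allow suspend again. */
static void foo_event_done(struct device *dev)
{
	pm_relax(dev);
}

/* Hypothetical: report an event and hold off suspend for about 500 ms. */
static void foo_event_report(struct device *dev)
{
	pm_wakeup_event(dev, 500);
}
```

The begin/done pair suits events whose processing time is known to the driver; the timed variant suits events that are handed off to user space.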
Runtime PM and system sleep interaction
If a device is already runtime-suspended when a system suspend begins, the PM core can skip the driver’s suspend callback and leave the device in its current low-power state, provided the driver’s prepare callback returns a positive value. This optimization, called direct-complete, reduces suspend latency significantly on systems with many idle devices.
Declare PM ops
Define a const struct dev_pm_ops with the SYSTEM_SLEEP_PM_OPS() and RUNTIME_PM_OPS() macros. Assign it to driver.pm.
Enable runtime PM in probe()
Call pm_runtime_set_active(), optionally configure the autosuspend delay with pm_runtime_set_autosuspend_delay() and pm_runtime_use_autosuspend(), then call pm_runtime_enable().
Guard I/O paths
Wrap hardware access with pm_runtime_get_sync() / pm_runtime_put_autosuspend() pairs. When using autosuspend, call pm_runtime_mark_last_busy() before the put.
Implement suspend/resume
In suspend(), quiesce DMA, save hardware state, and optionally call enable_irq_wake(). In resume(), restore state and re-initialize hardware.
Declare wakeup capability (if applicable)
Call device_init_wakeup() in probe(). In suspend(), call enable_irq_wake() when device_may_wakeup() returns true.