PREEMPT_RT reduces real-time latency by reshaping RCU read-side semantics in the Linux kernel, avoiding the delays introduced by softirq processing and non-preemptible code paths. This article focuses on the Read-Copy Update model, the grace period mechanism, and how Preemptible RCU replaces preempt_disable() with nesting counters.
Keywords: PREEMPT_RT, RCU, Grace Period.
Technical specification snapshot
| Parameter | Description |
|---|---|
| Technical topic | Linux kernel RCU / PREEMPT_RT |
| Primary language | C |
| Runtime environment | Linux kernel |
| Core mechanisms | Read-Copy Update, Grace Period, Preemptible RCU |
| Related configuration | CONFIG_PREEMPT_RCU |
| Protocol / interface | Kernel synchronization primitives |
| Core dependencies | Scheduler, softirq, RCU subsystem, atomic pointer publication |
RCU is a highly concurrent synchronization mechanism for read-mostly workloads
RCU (Read-Copy Update) is well suited to data structures with many readers and few writers. Its goal is not to make writes faster. Instead, it makes the read path almost free. Readers do not compete for a mutex, which gives RCU excellent scalability on multicore systems.
In the Linux kernel, RCU's central guarantee to readers is that they see either the old version or the new version of an object, never a half-updated intermediate state. That property matters most for heavily read structures such as linked lists, task structures, and routing tables.
struct foo *p;
rcu_read_lock(); // Mark the RCU read-side critical section
p = rcu_dereference(gbl_foo); // Safely read the global pointer
if (p)
    do_something(p); // Use the currently visible version of the data
rcu_read_unlock(); // Exit the read-side critical section
This code shows the basic RCU read-side pattern: enter the critical section, read a published pointer, and access the data without locking.
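The publish and read halves of this pattern can be approximated in userspace with C11 atomics. The sketch below is only an analogy, not the kernel implementation: `publish_foo`, `read_foo`, and `sum_foo` are hypothetical helpers, the fields of `struct foo` are invented for illustration, and the acquire load is stronger than the dependency ordering that `rcu_dereference()` actually relies on.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical payload; the article does not show the fields of struct foo. */
struct foo {
    int a;
    int b;
};

/* Published pointer, standing in for gbl_foo. Starts out NULL. */
static _Atomic(struct foo *) gbl_foo;

/* Rough analog of rcu_assign_pointer(): the release store orders the
 * initialization of *p before the pointer becomes visible to readers. */
static void publish_foo(struct foo *p)
{
    atomic_store_explicit(&gbl_foo, p, memory_order_release);
}

/* Rough analog of rcu_dereference(): an acquire load pairs with the
 * release store above (an assumption of this model, see lead-in). */
static struct foo *read_foo(void)
{
    return atomic_load_explicit(&gbl_foo, memory_order_acquire);
}

/* Reader: take ONE snapshot of the pointer and use only that snapshot,
 * so both fields always come from the same version of the object. */
static int sum_foo(void)
{
    struct foo *p = read_foo();

    return p ? p->a + p->b : -1;
}
```

Because the reader loads the pointer once and then works only through that snapshot, it observes one complete version of the object even if a writer publishes a replacement in between.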
RCU splits write operations into replacement and reclamation phases
Writers do not modify a shared object in place. Instead, they first create a copy, apply changes to that copy, and then publish the new pointer atomically. The old object is not freed immediately. It is reclaimed only after the grace period ends.
This design separates visibility switching from memory reclamation. Because the writer defers freeing until no reader can still hold the old pointer, readers never touch freed memory, which avoids use-after-free bugs.
AI Visual Insight: The diagram shows the five-stage RCU update pipeline: the old object is copied first, the writer applies changes to the copy, and the new object is then published through an atomic pointer switch. After that, the system enters a grace period and waits for any readers that might still reference the old object to exit their critical sections before finally freeing the old memory. This flow highlights the core safety model of “publish first, reclaim later.”
The write-side update usually follows a fixed sequence
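struct foo *old, *new;
mutex_lock(&foo_mutex);                        // Serialize writers; this is what lockdep_is_held() checks
old = rcu_dereference_protected(gbl_foo, lockdep_is_held(&foo_mutex));
new = kmemdup(old, sizeof(*old), GFP_KERNEL);  // Copy the old object
if (!new) {                                    // Allocation can fail
    mutex_unlock(&foo_mutex);
    return -ENOMEM;                            // Assumes an int-returning caller
}
new->value = 42;                               // Modify the copy
rcu_assign_pointer(gbl_foo, new);              // Atomically publish the new object
mutex_unlock(&foo_mutex);
synchronize_rcu();                             // Wait for the grace period to finish
kfree(old);                                    // Safely reclaim the old object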
This code summarizes the typical RCU write-side flow: copy, modify, publish, wait, and reclaim.
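The same sequence can be exercised outside the kernel with a toy model. The following is a minimal single-threaded sketch under stated assumptions: `reader_enter`, `reader_exit`, `update_value`, and `try_reclaim` are hypothetical names, and the global `active_readers` counter is only a stand-in for grace-period detection. Real RCU deliberately avoids a shared reader counter like this one.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

struct foo {
    int value;
};

static _Atomic(struct foo *) gbl_foo;
static atomic_int active_readers;   /* toy stand-in for grace-period tracking */

/* Reader entry: register, then take a snapshot of the published pointer. */
static struct foo *reader_enter(void)
{
    atomic_fetch_add(&active_readers, 1);
    return atomic_load_explicit(&gbl_foo, memory_order_acquire);
}

static void reader_exit(void)
{
    int prev = atomic_fetch_sub(&active_readers, 1);

    assert(prev > 0);   /* unbalanced exit would be a bug in the caller */
}

/* Copy, modify, publish. Returns 0 on success and hands the replaced
 * object back through old_out so the caller can reclaim it later. */
static int update_value(int value, struct foo **old_out)
{
    struct foo *old = atomic_load_explicit(&gbl_foo, memory_order_relaxed);
    struct foo *copy = malloc(sizeof(*copy));

    if (!copy)
        return -1;
    if (old)
        memcpy(copy, old, sizeof(*copy));   /* copy the old object */
    else
        memset(copy, 0, sizeof(*copy));     /* first publication */
    copy->value = value;                    /* modify only the copy */
    atomic_store_explicit(&gbl_foo, copy, memory_order_release);
    *old_out = old;
    return 0;
}

/* Reclaim only when no reader is registered. Returns 1 if freed. */
static int try_reclaim(struct foo *old)
{
    if (atomic_load(&active_readers) != 0)
        return 0;
    free(old);
    return 1;
}
```

A reader that entered before the update keeps seeing the unmodified old object, and `try_reclaim` refuses to free it until that reader has called `reader_exit`.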
The grace period determines when old data can be freed safely
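The grace period is the core of RCU. It is the interval that starts when the new pointer is published and ends once every reader that was already inside a read-side critical section at that moment, and might therefore still hold a reference to the old object, has exited. Readers that start later can only see the new pointer, so the grace period does not need to wait for them.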
Once the system confirms that every CPU has passed through a point where it is no longer inside any old read-side critical section, the old object can be freed safely. This confirmation process usually relies on quiescent states such as scheduling points, context switches, transitions to user mode, or idle states.
AI Visual Insight: The diagram emphasizes that a grace period is not a fixed-duration timer. It is a completion condition derived from system execution state. Only when all CPUs are known to have passed through quiescent states can the system conclude that no old readers remain. The image clearly illustrates the relationship between the grace period, CPU observation points, and reader exit events.
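The idea that a grace period is a completion condition rather than a timer can be sketched with per-CPU counters. This is a deliberately simplified model and not how the kernel tracks quiescent states internally; `NR_CPUS` is an arbitrary model size, and `qs_count`, `gp_snapshot`, `note_quiescent_state`, `grace_period_start`, and `grace_period_done` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4   /* arbitrary CPU count for this model */

/* Number of quiescent states each CPU has passed through so far. */
static unsigned long qs_count[NR_CPUS];
/* Copy of qs_count taken at the moment the grace period started. */
static unsigned long gp_snapshot[NR_CPUS];

/* Would be called at a context switch, a return to user mode, or idle entry. */
static void note_quiescent_state(int cpu)
{
    qs_count[cpu]++;
}

/* Start a grace period by recording where every CPU currently stands. */
static void grace_period_start(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        gp_snapshot[cpu] = qs_count[cpu];
}

/* The grace period is over only when EVERY CPU has reported at least
 * one quiescent state since grace_period_start(). No clock is involved. */
static bool grace_period_done(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (qs_count[cpu] == gp_snapshot[cpu])
            return false;
    }
    return true;
}
```

A single CPU that never reports a quiescent state holds the grace period open indefinitely, which is why long non-preemptible sections matter for reclamation as well as for latency.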
The essence of a grace period is waiting for all old readers to leave
On a regular kernel, that wait is usually acceptable. Under PREEMPT_RT, however, any grace-period or callback processing that runs for a long time in non-preemptible context adds directly to scheduling latency and harms real-time behavior.
PREEMPT_RT must redefine RCU behavior for real-time constraints
In a non-RT kernel, some RCU work, such as invoking callbacks once a grace period has ended, runs in softirq context, and softirq paths often run with preemption disabled. That is acceptable for general throughput-oriented workloads, but in real-time systems it creates significant long-tail latency.
For that reason, PREEMPT_RT is typically paired with a set of RCU optimizations: callback offloading, priority boosting, expedited grace periods, and most importantly, preemptible RCU. All of these adjustments serve the same goal: reduce the blocking impact of non-real-time context on high-priority tasks.
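Callback offloading is easier to picture with a small model. In the kernel, call_rcu() queues a callback that runs after a grace period, and with offloading those callbacks are invoked from kernel threads instead of softirq context. The sketch below is a single-threaded userspace model of that split between queueing and invoking. `cb_head`, `queue_callback`, `run_callbacks`, and `free_foo_cb` are hypothetical names, not kernel APIs, and grace-period detection is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal callback header, embedded in the object to be reclaimed. */
struct cb_head {
    struct cb_head *next;
    void (*func)(struct cb_head *head);
};

struct foo {
    int value;
    struct cb_head cb;
};

static struct cb_head *cb_list;               /* pending callbacks, FIFO */
static struct cb_head **cb_tail = &cb_list;   /* where the next one is linked */
static int freed_count;                       /* bookkeeping for the example */

/* Updater side: cheap, only links the callback into the pending list. */
static void queue_callback(struct cb_head *head,
                           void (*func)(struct cb_head *head))
{
    head->func = func;
    head->next = NULL;
    *cb_tail = head;
    cb_tail = &head->next;
}

/* Callback that recovers the enclosing struct foo and frees it. */
static void free_foo_cb(struct cb_head *head)
{
    struct foo *f = (struct foo *)((char *)head - offsetof(struct foo, cb));

    free(f);
    freed_count++;
}

/* Invocation side: in this model, whichever context calls this function
 * pays the cost of running the callbacks. Returns how many were invoked. */
static int run_callbacks(void)
{
    struct cb_head *list = cb_list;
    int invoked = 0;

    cb_list = NULL;
    cb_tail = &cb_list;
    while (list) {
        struct cb_head *next = list->next;   /* read before func frees it */

        list->func(list);
        list = next;
        invoked++;
    }
    return invoked;
}
```

The point of the split is that `queue_callback` stays cheap for the updater, while the cost of `run_callbacks` lands on whichever context invokes it. Offloading moves that cost out of softirq context and into schedulable threads.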
/* PREEMPT_RT focus: allow read-side critical sections without relying on disabled preemption.
 * Simplified sketch; the real function carries additional checks. */
#ifdef CONFIG_PREEMPT_RCU
void __rcu_read_lock(void)
{
    WRITE_ONCE(current->rcu_read_lock_nesting,
               READ_ONCE(current->rcu_read_lock_nesting) + 1); // Record the nesting depth
    barrier(); // Keep the critical section after the entry code
}
#else
static inline void __rcu_read_lock(void)
{
    preempt_disable(); // Traditional implementation disables preemption directly
}
#endif
This code shows the key change introduced by PREEMPT_RCU: entering the read side no longer simply relies on preempt_disable().
Preemptible RCU replaces disabled preemption with nesting counters
Non-preemptible RCU relies on one key assumption: a read-side critical section runs with preemption disabled, so a reader can never be switched out in the middle of it. A context switch on a CPU therefore proves that every reader that was running on that CPU has finished, and grace period detection can use scheduling events as its signal.
In PREEMPT_RT, however, keeping preemption disabled for a long time damages real-time responsiveness. So Preemptible RCU instead maintains the nesting depth in current->rcu_read_lock_nesting. A task may be preempted even while it is inside a read-side critical section, and the system only needs to track precisely whether that task still holds RCU read-side protection.
This change makes real-time behavior more predictable
The read path remains lightweight because entering and leaving the critical section only updates a counter. At the same time, grace period detection shifts from “is preemption disabled” to “does any read-side nesting level remain active.”
That means PREEMPT_RT no longer trades coarse-grained scheduling blockage for safety. Instead, it uses finer-grained state tracking to guarantee correct reclamation. For high-priority real-time tasks, this is a more practical engineering tradeoff.
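The shift from "is preemption disabled" to "is any nesting level still active" can be modeled with per-task counters. This is a minimal sketch under stated assumptions, not the kernel's bookkeeping. The field name `rcu_read_lock_nesting` mirrors the one quoted above, but `struct task`, `blocks_gp`, `task_read_lock`, `task_read_unlock`, `preempt_gp_start`, and `preempt_gp_done` are hypothetical. The scheduler is not modeled, so a task with a nonzero counter counts as a reader whether it is running or preempted.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_TASKS 3   /* arbitrary task count for this model */

struct task {
    int rcu_read_lock_nesting;   /* depth of nested read-side sections */
    bool blocks_gp;              /* reader that predates the current grace period */
};

static struct task tasks[NR_TASKS];

/* Entering a read-side section only bumps the task's own counter. */
static void task_read_lock(struct task *t)
{
    t->rcu_read_lock_nesting++;
}

/* Only the outermost unlock ends the reader and releases the grace period. */
static void task_read_unlock(struct task *t)
{
    assert(t->rcu_read_lock_nesting > 0);
    if (--t->rcu_read_lock_nesting == 0)
        t->blocks_gp = false;
}

/* A new grace period must wait for tasks that are readers right now. */
static void preempt_gp_start(void)
{
    for (int i = 0; i < NR_TASKS; i++)
        tasks[i].blocks_gp = tasks[i].rcu_read_lock_nesting > 0;
}

/* Done once none of those pre-existing readers is still inside its section. */
static bool preempt_gp_done(void)
{
    for (int i = 0; i < NR_TASKS; i++) {
        if (tasks[i].blocks_gp)
            return false;
    }
    return true;
}
```

Nothing in this model disables preemption. A reader that starts after `preempt_gp_start` does not delay the grace period, and a nested reader holds it open until its outermost unlock.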
This design is especially suitable for highly concurrent and latency-sensitive kernel paths
If a system performs many lookups, traversals, and state reads while updates are relatively infrequent, RCU remains one of the most cost-effective synchronization mechanisms available. The value of PREEMPT_RT is not that it changes the theoretical RCU model, but that it allows the model to satisfy real-time constraints.
From an implementation perspective, the core difference can be summarized in one sentence: non-preemptible RCU relies on non-preemptible context to define who counts as a reader, while Preemptible RCU, as used under PREEMPT_RT, relies on per-task nesting state to define readers.
FAQ
1. Why can the RCU read side be almost lock-free?
Because readers do not directly participate in write-side mutual exclusion and do not modify the shared object. Readers only consume published versions, while writers avoid conflicts through copying and deferred reclamation. That makes the read path extremely lightweight.
2. Why can PREEMPT_RT no longer rely on preempt_disable()?
Because disabling preemption extends the waiting time for high-priority tasks and creates unpredictable latency. Real-time systems care more about worst-case response time, so read-side critical sections must remain preemptible.
3. What determines that a grace period has ended?
The decision is based not on elapsed time, but on whether all readers that could still access the old object have exited their critical sections. Under PREEMPT_RCU, this usually depends on coordination between per-task read-side nesting counters and system quiescent states.
Core summary: This article reviews how Linux kernel RCU is implemented in a PREEMPT_RT environment, focusing on lockless reads, grace period detection, and the key differences introduced by Preemptible RCU. It explains how the kernel reduces real-time latency while still guaranteeing safe data reclamation.