[AI Readability Summary] JDK 21 virtual threads dramatically reduce thread overhead in high-concurrency I/O workloads through M:N scheduling, heap-stored stack frames, and extremely low creation cost. They address both the high cost of platform threads and the complexity of asynchronous programming. Keywords: JDK 21, Virtual Threads, Performance Tuning.
Technical Specifications Snapshot
| Parameter | Details |
|---|---|
| Language | Java |
| JDK Version | JDK 21 |
| Concurrency Model | Virtual Threads / Project Loom |
| Scheduling Model | M:N scheduling, with virtual threads mapped to carrier threads |
| Default Scheduler | ForkJoinPool |
| Core Dependencies | OpenJDK 21, java.util.concurrent, HttpClient |
| Recommended Use Cases | I/O-intensive services, network requests, database access, message processing |
Virtual threads remove two core bottlenecks in traditional Java concurrency
Platform threads typically map directly to operating system kernel threads. Their creation, destruction, and context switching are all expensive. In production systems, once the thread count grows too large, memory pressure and scheduling overhead quickly consume throughput.
The other practical issue is the complexity of asynchronous models. Although CompletableFuture and reactive programming can improve resource utilization, chained callbacks, broken stack traces during debugging, and exception propagation significantly increase maintenance cost.
Virtual threads provide a more practical concurrency model
Virtual threads preserve the intuitive “one request, one thread” programming model while reducing thread creation cost to a very low level. For web services, database calls, and remote RPC, this model aligns more closely with how developers naturally write code and makes troubleshooting easier.
// Start a virtual thread directly in JDK 21
Thread.startVirtualThread(() -> {
System.out.println("hello virtual thread"); // Core logic: execute the task inside a virtual thread
});
This snippet shows the minimal way to use a virtual thread. The code still looks synchronous, but the underlying scheduling model is completely different.
The core mechanism of virtual threads is JVM-managed M:N scheduling
The traditional thread model is 1:1: one Java thread maps to one kernel thread. Virtual threads use an M:N model instead: many virtual threads are mounted onto a small number of carrier threads, which are then mapped to OS threads.
When a virtual thread encounters a suspendable blocking point, such as sleep, network I/O, waiting for a lock, or other blocking operations, the JVM attempts to unmount it from the carrier thread so that the carrier thread can continue running other tasks. This is the fundamental reason virtual threads improve resource utilization under high concurrency.
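A minimal sketch (assuming JDK 21; the class name `UnmountDemo` is hypothetical) makes the unmounting visible: a thousand virtual threads all block in sleep at once, yet total wall time stays close to a single sleep, because each blocked thread releases its carrier.

```java
import java.util.ArrayList;
import java.util.List;

public class UnmountDemo {
    public static void main(String[] args) throws InterruptedException {
        int tasks = 1_000;
        long start = System.nanoTime();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            // Each virtual thread blocks in sleep; the JVM unmounts it,
            // freeing its carrier thread to run other virtual threads
            threads.add(Thread.startVirtualThread(() -> {
                try {
                    Thread.sleep(200);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
        }
        for (Thread t : threads) t.join();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        // 1,000 sequential 200 ms sleeps would take ~200 s; here all sleeps
        // overlap on a few carrier threads, so wall time stays near 200 ms
        System.out.println("elapsed ms: " + elapsedMs);
    }
}
```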
The scheduling relationship can be summarized as a three-layer structure
Virtual Threads (massive scale)
↓ JVM scheduling
Carrier Threads (usually close to the number of CPU cores)
↓ 1:1
OS Kernel Threads
This means the number of virtual threads can be far greater than the number of machine threads, while scarce kernel threads are not occupied unnecessarily for long periods of time.
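One quick way to observe the mapping (a sketch, assuming JDK 21; the string form of a virtual thread is unspecified and may change between builds) is to print the current thread from inside a virtual thread — current builds typically show the scheduler worker it is mounted on:

```java
public class CarrierDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread vt = Thread.startVirtualThread(() -> {
            // While mounted, the thread's string form typically names its carrier,
            // e.g. VirtualThread[#21]/runnable@ForkJoinPool-1-worker-1
            // (the exact format is unspecified and may change)
            System.out.println(Thread.currentThread());
            System.out.println("virtual: " + Thread.currentThread().isVirtual());
        });
        vt.join();
    }
}
```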
Virtual threads are lightweight because their stack frames can live on the heap
Platform threads usually reserve a large contiguous native stack, commonly around 1 MB. Virtual threads work differently: their execution state can be frozen through the Continuation mechanism and stored as heap objects, with an initial footprint of only a few hundred bytes.
That means blocking preserves a resumable call context rather than occupying an entire native stack continuously. This approach saves memory and also allows the garbage collector to manage the stored state.
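A small sketch conveys the difference in scale (the class name `FootprintDemo` is hypothetical): 100,000 platform threads at roughly 1 MB of reserved stack each would demand on the order of 100 GB, while the same number of virtual threads starts comfortably on an ordinary machine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class FootprintDemo {
    public static void main(String[] args) throws InterruptedException {
        // 100,000 platform threads at ~1 MB of reserved stack each would need
        // on the order of 100 GB; virtual threads start with a few hundred
        // bytes of heap-stored state, so this loop runs comfortably
        AtomicInteger done = new AtomicInteger();
        List<Thread> threads = new ArrayList<>(100_000);
        for (int i = 0; i < 100_000; i++) {
            threads.add(Thread.startVirtualThread(done::incrementAndGet));
        }
        for (Thread t : threads) t.join();
        System.out.println("completed: " + done.get());
    }
}
```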
Continuation is the key low-level abstraction behind virtual threads
// Illustrative code: shows the core idea of how virtual threads preserve the call stack
class Continuation {
private Object[] stackFrames; // Frozen stack frames are stored in heap objects
private int framePointer; // Records the current execution position
}
This illustrative example shows that virtual threads do not eliminate stacks. Instead, they turn the stack into a more flexible and movable runtime structure.
The default scheduler uses ForkJoinPool to execute virtual threads
In JDK 21, virtual threads are scheduled by default on a dedicated ForkJoinPool instance (separate from the common pool). Its parallelism defaults to the number of available processors, so the scheduler does not allocate one platform thread per task; it reuses a small set of carrier threads.
In high-concurrency environments, you can tune scheduler parallelism and maximum pool size through JVM options to match container quotas, CPU limits, or specific workload characteristics.
java \
-Djdk.virtualThreadScheduler.parallelism=8 \
-Djdk.virtualThreadScheduler.maxPoolSize=16 \
-jar app.jar
These options control the size of the virtual thread scheduler so the defaults do not conflict with actual machine resources.
The correct production pattern is one virtual thread per task, not wrapping them in a traditional thread pool
The recommended virtual thread usage model is per-task execution. Create one virtual thread for each task, and let it terminate automatically when the task completes. This avoids artificially limiting concurrency and reduces the need for complex thread pool tuning.
If you keep applying the fixed-thread-pool mindset and pool virtual threads, you risk cancelling out their elasticity benefits and even introducing unnecessary blocking backlogs.
newVirtualThreadPerTaskExecutor is the recommended API
try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
for (int i = 0; i < 100_000; i++) {
executor.submit(() -> {
Thread.sleep(1000); // Simulate I/O wait; the virtual thread is suspended instead of occupying a carrier thread for the full duration
return null;
});
}
} // Automatically waits for task completion when the scope ends
This example demonstrates the most typical virtual thread usage pattern and works well for large numbers of short blocking tasks.
Benchmark results show that virtual threads are better suited for high-concurrency I/O workloads
In an environment with 8 CPU cores, 16 GB of memory, and OpenJDK 21, a test with 100,000 short tasks showed that a fixed platform thread pool completed in about 11.2 seconds, while virtual threads completed in about 1.3 seconds. A cached platform thread pool could even run out of memory because of uncontrolled thread growth.
In simulated network I/O, virtual threads improved throughput by roughly 37% compared with an asynchronous approach backed by a platform thread pool. The main gains came from fewer context switches and much higher thread density.
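The 100,000-task comparison above can be sketched with a minimal harness (illustrative only: absolute timings depend on hardware, and the task count here is scaled down to 10,000):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BenchSketch {
    static long run(ExecutorService executor, int tasks) throws InterruptedException {
        long start = System.nanoTime();
        try (executor) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    Thread.sleep(10); // Simulate a short I/O wait
                    return null;
                });
            }
        } // close() waits for all submitted tasks to finish
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        int tasks = 10_000;
        long fixed = run(Executors.newFixedThreadPool(200), tasks);
        long virt  = run(Executors.newVirtualThreadPerTaskExecutor(), tasks);
        // With 200 platform threads, 10,000 x 10 ms sleeps take at least
        // ~500 ms; one virtual thread per task lets all sleeps overlap
        System.out.println("fixed pool ms:      " + fixed);
        System.out.println("virtual threads ms: " + virt);
    }
}
```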
Here is a simplified HTTP concurrency test example
HttpClient client = HttpClient.newHttpClient();
try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
for (int i = 0; i < 20_000; i++) {
executor.submit(() -> {
var req = HttpRequest.newBuilder(URI.create("http://localhost:8080/delay?ms=100"))
.GET()
.build();
client.send(req, HttpResponse.BodyHandlers.ofString()); // Synchronous call, but virtual threads carry the high concurrency
return null;
});
}
}
This example shows that virtual threads let developers write highly concurrent I/O code in a synchronous style.
The key to performance tuning is not adding more threads blindly, but identifying pinned threads
The main risk to watch for with virtual threads is pinning, where a virtual thread becomes stuck on a carrier thread and cannot be unmounted. Common triggers include running long blocking operations inside synchronized blocks or entering certain native or non-suspendable regions.
Once pinning occurs, the carrier thread becomes truly blocked, and throughput can drop sharply. In the original data, a long sleep inside a synchronized block caused performance to degrade by about 8x.
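A minimal way to reproduce this (a sketch, assuming JDK 21, where blocking inside synchronized still pins the carrier; later JDK releases relax this limitation): run the same sleeping tasks with and without a synchronized wrapper and compare wall time. Launching with `-Djdk.tracePinnedThreads=short` prints the pinned stacks.

```java
public class PinningDemo {
    static long runTasks(boolean useSynchronized, int tasks) throws InterruptedException {
        long start = System.nanoTime();
        Thread[] threads = new Thread[tasks];
        for (int i = 0; i < tasks; i++) {
            final Object monitor = new Object();
            threads[i] = Thread.startVirtualThread(() -> {
                try {
                    if (useSynchronized) {
                        // On JDK 21, sleeping while holding a monitor pins the
                        // virtual thread: its carrier cannot run other work
                        synchronized (monitor) {
                            Thread.sleep(100);
                        }
                    } else {
                        Thread.sleep(100); // Unmounts normally
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        for (Thread t : threads) t.join();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        // More tasks than carrier threads, so pinning forces serialized rounds
        int tasks = Runtime.getRuntime().availableProcessors() * 8;
        System.out.println("pinned ms:   " + runTasks(true, tasks));
        System.out.println("unpinned ms: " + runTasks(false, tasks));
    }
}
```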
Replace synchronized with ReentrantLock around blocking operations
ReentrantLock lock = new ReentrantLock();
lock.lock();
try {
Thread.sleep(1000); // Blocking while holding a ReentrantLock lets the virtual thread unmount; the same sleep inside a synchronized block would pin the carrier
} finally {
lock.unlock(); // Always release the lock
}
The point of this example is not that Lock is always faster, but that it better fits the suspend-and-resume semantics of virtual threads.
Enable pinned thread tracing in production diagnostics
java \
-Djdk.tracePinnedThreads=short \
-Djdk.virtualThreadScheduler.parallelism=16 \
-XX:+UseZGC \
-jar myapp.jar
These startup options help you quickly identify hotspots where virtual threads lose their lightweight behavior.
Project Loom structured concurrency completes task lifecycle management
Virtual threads reduce execution cost, while Structured Concurrency solves task organization. It keeps multiple concurrent subtasks within the same scope, making failure handling, cancellation, and result aggregation more predictable.
For aggregate queries, parallel RPC, or scenarios such as fetching user data and order data together, this model is much clearer than managing Future objects manually.
// Preview API in JDK 21 (JEP 453); compile and run with --enable-preview
try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
    StructuredTaskScope.Subtask<String> user = scope.fork(() -> fetchUser());    // Query user information concurrently
    StructuredTaskScope.Subtask<Integer> order = scope.fork(() -> fetchOrder()); // Query order count concurrently
    scope.join();
    scope.throwIfFailed();
    return user.get() + ":" + order.get();
}
This example shows how structured concurrency improves readability and failure propagation in concurrent logic.
Migration to virtual threads should follow clear boundaries rather than a full replacement strategy
The best candidates for migration are I/O-intensive applications, such as web services, database access, message consumption, and long-lived connection handling. CPU-intensive tasks do not become faster automatically with virtual threads and still require a sensible number of platform threads and proper task partitioning.
During migration, first inspect blocking points, long critical sections guarded by synchronized, and whether third-party drivers are Loom-friendly. Then gradually replace executors and container configurations. For example, Tomcat 11+ and Jetty 12 already support virtual threads.
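The executor replacement step often looks like the following sketch (`handleRequest` is a hypothetical stand-in for real I/O work):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class MigrationSketch {
    public static void main(String[] args) {
        // Before: a fixed pool caps concurrency at 200 and needs tuning
        // ExecutorService executor = Executors.newFixedThreadPool(200);

        // After: one virtual thread per task; blocking I/O in handleRequest
        // no longer ties up a platform thread
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 1_000; i++) {
                int requestId = i;
                executor.submit(() -> handleRequest(requestId));
            }
        } // close() waits for all submitted tasks
    }

    // Hypothetical request handler standing in for a database or HTTP call
    static void handleRequest(int id) {
        try {
            Thread.sleep(5);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```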
FAQ
1. Can virtual threads replace all thread pools?
No. They primarily optimize I/O-intensive concurrency. CPU-intensive computation should still control the number of platform threads; otherwise, you only increase scheduling overhead.
2. Why does virtual-thread code look blocking but still deliver high throughput?
Because when blocking occurs, the JVM unmounts the virtual thread from the carrier thread and frees the underlying thread to execute other work. Unlike platform threads, it does not keep occupying kernel resources throughout the wait.
3. What should I investigate first when migrating to virtual threads?
Start by checking for long blocking operations inside synchronized, native calls, outdated drivers, and misuse of thread pools. Then enable jdk.tracePinnedThreads to locate pinning issues.
Key takeaways
This article systematically reconstructs the essential technical points behind JDK 21 virtual threads, covering M:N scheduling, Continuation-based stack storage, the default scheduler, benchmark results, pinned thread risks, and migration strategy. It helps Java developers adopt virtual threads safely in high-concurrency I/O scenarios.