Multi-Level Cache Architecture in Practice: A Guide to Coordinating Caffeine, Redis, and MySQL

[AI Readability Summary] For high-concurrency read workloads, this article redesigns the caching layer into a three-tier Caffeine + Redis + MySQL architecture. Its core value is to decouple hotspot queries from the bottlenecks of a single Redis layer, reduce network overhead and single-node CPU pressure, and still preserve consistency and recovery capabilities. Keywords: multi-level cache, Caffeine, Redis.

The technical specification snapshot defines the stack clearly

| Parameter | Description |
| --- | --- |
| Language | Java |
| Protocols | HTTP, Redis protocol, MySQL protocol |
| GitHub Stars | Not provided in the source |
| Core Dependencies | Caffeine, Spring Boot, Redis, MySQL, XXL-Job |

A single Redis layer eventually hits a performance ceiling in hotspot read scenarios

A single Redis layer can sustain high QPS, but it does not provide infinite scalability. Hotspot endpoints such as flash-sale pages and course detail pages concentrate massive traffic on a small number of keys. The final bottlenecks typically appear in single-node CPU utilization, network round trips, and bursty expiration events.

When 2,000 concurrent requests continuously read the same course ID, Redis may still process commands very quickly, but network RT usually consumes most of the end-to-end time. As a result, upstream API latency increases. Even if the database remains protected, the cache layer itself becomes the hotspot amplifier first.

The three primary problems with a single Redis layer

Concentrated access to hotspot keys
→ Redis processes commands serially in a single thread
→ Single-core CPU approaches saturation
→ RT rises from 1ms to 50ms

Remote client access
→ Every request incurs a network round trip
→ 0.5ms to 2ms latency gets amplified under concurrency

Simultaneous cache expiration
→ Large numbers of requests penetrate to lower layers
→ Cache refill creates a SET storm

This flow shows that the issue with a single Redis layer is not just raw access speed. The real problem is the system-wide amplification effect under concentrated hotspot traffic.

Read/write separation is the first principle of a three-tier cache design

A multi-level cache does not mean that every request should always go through the cache first. A truly stable design caches only read requests, while write requests bypass the cache chain and execute actual mutations directly in Redis or the database.

The purpose is straightforward: avoid letting local cache intercept write operations such as inventory deduction, order creation, or payment confirmation. Otherwise, you risk stale data or delayed updates. Cache should accelerate reads, not carry business write semantics.

The read and write paths must be separated explicitly

@PostMapping("/seckill/deduct")
public Result deductStock(@RequestBody SeckillReq req) {
    // Write requests execute stock deduction directly without passing through local cache
    Long remain = redisLuaUtil.deductStock(req.getSkuId(), req.getUserId(), req.getCount());
    return Result.ok("Queued");
}

@GetMapping("/stock/{skuId}")
public Result<Integer> getStock(@PathVariable Long skuId) {
    // Read requests may pass through the L1/L2/L3 multi-level cache
    Integer stock = stockQueryService.queryWithMultiLevelCache(skuId);
    return Result.success(stock);
}

This code illustrates a critical boundary: write operations go directly to the real data layer, while only read operations enter the cache interception chain.

The three-tier cache architecture should be divided by responsibility rather than by technology stack

L1 is JVM-local Caffeine, which serves as an ultra-low-latency hotspot cache. L2 is Redis, which maintains shared state across instances. L3 is MySQL, which stores the complete strongly consistent dataset. These three layers do not duplicate storage blindly. Instead, they distribute different access costs across layers.

In the ideal path, the request checks Caffeine first and returns immediately on a hit. On a miss, it checks Redis. Only after another miss does it fall back to MySQL, then writes back to L2 and L1 in order. This approach absorbs hotspot traffic while preserving shared consistency.

The three cache layers have distinct responsibilities

| Cache Layer | Location | Typical Capacity | Response Time | Best Suited For | Expiration Strategy |
| --- | --- | --- | --- | --- | --- |
| L1 Caffeine | JVM heap | Hundreds of MB | <0.1ms | Hotspot details, short-lived inventory views | 3-30 seconds |
| L2 Redis | Dedicated node or cluster | GB-level | 1-3ms | Shared state, inventory, object cache | 1-5 minutes |
| L3 MySQL | Disk | TB-level | 10-100ms | Full business data | Persistent |

The standard fallback flow follows a cache-aside pattern

public Course getCourse(Long courseId) {
    // Check local cache first and return immediately on hit
    Course course = courseCache.getIfPresent(courseId);
    if (course != null) return course;

    String redisKey = "course:" + courseId;
    String json = redisTemplate.opsForValue().get(redisKey);
    if ("NULL".equals(json)) {
        // A cached null marker means the record does not exist; return without touching the database
        return null;
    }
    if (json != null) {
        // On a Redis hit, write back to L1 to reduce future remote access
        course = JSON.parseObject(json, Course.class);
        courseCache.put(courseId, course);
        return course;
    }

    // Fall back to the database when both cache layers miss
    course = courseMapper.selectById(courseId);
    if (course != null) {
        int expire = 300 + ThreadLocalRandom.current().nextInt(60); // Randomized expiration to prevent cache avalanche
        redisTemplate.opsForValue().set(redisKey, JSON.toJSONString(course), expire, TimeUnit.SECONDS);
        courseCache.put(courseId, course);
    } else {
        redisTemplate.opsForValue().set(redisKey, "NULL", 10, TimeUnit.SECONDS); // Cache null values to prevent cache penetration
    }
    return course;
}

This code implements a typical multi-level cache-aside fallback flow and includes two key safeguards: null caching and randomized TTL.

Caffeine is the preferred local cache because its performance and eviction strategy fit hotspot workloads better

Caffeine is not only fast. Its real advantage also comes from the W-TinyLFU eviction policy. It considers both access frequency and time decay, which helps prevent burst traffic from pushing genuinely hot data out of the cache.

Compared with traditional LRU, W-TinyLFU behaves more reliably for short traffic spikes, long-term hotspots, and hotspot switching. For pages such as flash-sale activity pages and course detail pages, this model matches real production traffic better than a strategy that only tracks recent access time.

Basic Caffeine configuration should be designed around the hotspot window

@Configuration
public class CaffeineConfig {

    @Bean
    public Cache<Long, Course> courseCache() {
        return Caffeine.newBuilder()
                .maximumSize(1000) // Control local cache size to avoid heap growth
                .expireAfterWrite(30, TimeUnit.SECONDS) // Allow short-term reuse for hotspot details
                .recordStats() // Enable hit-rate metrics for tuning
                .build();
    }
}

The goal of this configuration is not to cache as much as possible. It is to stabilize the most valuable hotspot window data.
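
Because recordStats is enabled, the hit rate of that window can be verified directly. The snippet below is a minimal sketch of sampling those metrics, for example from a scheduled job; it assumes the courseCache bean from the configuration above and an ordinary SLF4J logger named log.

// Sample L1 metrics periodically to confirm the hotspot window is actually being held
CacheStats stats = courseCache.stats();
log.info("L1 hit rate: {}, evictions: {}", stats.hitRate(), stats.evictionCount());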

Cache preheating and hotspot detection determine whether the system can survive the first flash-sale spike

If cache construction waits for the first user request, the first traffic spike at peak time will directly hit lower layers. The correct approach is to preload critical data such as course details, activity configurations, and inventory into Redis before the business event starts. If necessary, load them into Caffeine as well.

In addition, hotspots are not always known in advance. At runtime, you can detect high-frequency keys with sliding-window counters and Top-K strategies, then promote them proactively into local cache to create dynamic hotspot governance.
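
As a rough illustration of that idea, the sketch below counts per-key reads over a short window with Caffeine and flags keys that cross a threshold. It is a simplified, fixed-window approximation of the sliding-window approach; the HotKeyDetector class name, the 10-second window, and the HOT_THRESHOLD value are illustrative assumptions rather than part of the original implementation.

@Component
public class HotKeyDetector {

    // Per-key access counters that are dropped automatically after a short window
    private final Cache<Long, AtomicLong> counters = Caffeine.newBuilder()
            .maximumSize(10_000)
            .expireAfterWrite(10, TimeUnit.SECONDS) // window length is an assumption
            .build();

    private static final long HOT_THRESHOLD = 500; // illustrative promotion threshold per window

    // Record one read of the key and report whether it should be promoted into L1
    public boolean recordAccessAndCheckHot(Long courseId) {
        AtomicLong counter = counters.get(courseId, id -> new AtomicLong());
        return counter.incrementAndGet() >= HOT_THRESHOLD;
    }
}

On the read path, a key flagged as hot can be written into courseCache right after the Redis lookup, so subsequent reads for that key stay inside the JVM.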

A typical cache preheating implementation before a flash sale

@XxlJob("cachePreheatJob")
public void preheat() {
    List<Course> courses = courseMapper.selectTodaySeckillCourses();
    for (Course course : courses) {
        // Preload inventory into Redis for both write and read requests
        redisTemplate.opsForValue().set(
                "seckill:stock:" + course.getId(),
                String.valueOf(course.getSeckillStock()),
                2, TimeUnit.HOURS
        );

        // Preload details into both Redis and Caffeine
        redisTemplate.opsForValue().set(
                "course:" + course.getId(),
                JSON.toJSONString(course),
                2, TimeUnit.HOURS
        );
        courseCache.put(course.getId(), course);
    }
}

This preheating logic shifts the fallback cost of the first request batch into off-peak periods.

Consistency in a multi-level cache should be controllable rather than absolutely synchronous

A three-tier cache naturally introduces timing gaps between layers, so the consistency strategy must align with business tolerance. For read scenarios such as displayed inventory values, a local cache expiration of about three seconds is usually acceptable. For highly sensitive fields, you should shorten the stale-data window through active invalidation.

In practice, two strategies are common. The first is staggered TTL, where L1 expires faster than L2. The second is to broadcast invalidation events through Redis Pub/Sub, allowing each instance to clear its local cache proactively.

Active invalidation can significantly reduce the stale-read window

public void onStockChanged(Long courseId, int newStock) {
    // Update the shared layer first so the new value takes effect across instances
    redisTemplate.opsForValue().set("seckill:stock:" + courseId, String.valueOf(newStock));

    // Then broadcast an invalidation message to clear local hotspot caches in all JVMs
    redisTemplate.convertAndSend("cache:invalidate", "course:" + courseId);
}

The core effect of this logic is clear: it changes the consistency boundary from waiting for expiration to invalidating as soon as an event occurs.
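
The publish side above needs a matching subscriber in every instance. The following is a minimal sketch of that listener using Spring Data Redis, assuming the payload arrives as a plain string and that the courseCache bean from earlier is available; the bean name and the parsing logic are illustrative.

@Bean
public RedisMessageListenerContainer cacheInvalidateListener(
        RedisConnectionFactory connectionFactory, Cache<Long, Course> courseCache) {
    RedisMessageListenerContainer container = new RedisMessageListenerContainer();
    container.setConnectionFactory(connectionFactory);
    container.addMessageListener((message, pattern) -> {
        // The payload is the key published above, e.g. "course:123"
        String key = new String(message.getBody(), StandardCharsets.UTF_8);
        if (key.startsWith("course:")) {
            courseCache.invalidate(Long.valueOf(key.substring("course:".length())));
        }
    }, new ChannelTopic("cache:invalidate"));
    return container;
}

One caveat worth keeping in mind: Redis Pub/Sub is fire-and-forget, so a lost invalidation message is not redelivered. The short L1 TTL therefore remains the safety net even with active invalidation in place.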

Redis crash recovery must rely on a replayable business fact table

If inventory data exists only in Redis memory, it will be lost after a node restart. The key to recovery is not simply backing up the cache. The key is being able to derive the true state again from persistent facts.

The most important foundation here is the transaction log table. The recovery formula is straightforward: real inventory = original total inventory in MySQL – successfully deducted quantity recorded in the log table. Even if Redis data disappears, the system can still rebuild state through queries or scheduled jobs.

Query fallback recovery logic

public Integer queryStock(Long skuId) {
    String stockKey = "seckill:stock:" + skuId;
    String stock = redisTemplate.opsForValue().get(stockKey);
    if (stock != null) {
        return Integer.parseInt(stock);
    }

    // Rebuild the real inventory from original stock and the transaction log when Redis data is lost
    Integer totalStock = courseMapper.selectById(skuId).getSeckillStock();
    Integer deductedCount = deductLogMapper.getDeductedCount(skuId);
    Integer realStock = totalStock - deductedCount;

    redisTemplate.opsForValue().set(stockKey, String.valueOf(realStock), 2, TimeUnit.HOURS);
    return realStock;
}

This fallback code moves Redis from being the only source of truth back to being a rebuildable high-performance replica.

Cache penetration, breakdown, and avalanche must be governed in layers

Cache penetration happens when requests query non-existent data. The standard solutions are parameter validation, Bloom filters, and null caching. Cache breakdown happens when a hotspot key expires and many requests fall back at the same time. The standard solutions are distributed locks or a single-flight mechanism. Cache avalanche happens when many keys expire together, which requires randomized TTL and graceful degradation.
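
For the breakdown case specifically, a minimal sketch of a lock-protected rebuild is shown below, layered on top of the getCourse method from earlier; the lock key format, the 5-second lock TTL, and the retry delay are illustrative assumptions.

public Course getCourseWithBreakdownProtection(Long courseId) {
    String lockKey = "lock:course:" + courseId;
    // Only the caller that wins the lock rebuilds the key from MySQL
    Boolean acquired = redisTemplate.opsForValue()
            .setIfAbsent(lockKey, "1", 5, TimeUnit.SECONDS);
    if (Boolean.TRUE.equals(acquired)) {
        try {
            return getCourse(courseId); // winner falls back to MySQL and refills L2/L1
        } finally {
            redisTemplate.delete(lockKey);
        }
    }
    // Losers back off briefly and retry; by then the winner has usually refilled Redis
    try {
        Thread.sleep(50);
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
    return getCourse(courseId);
}

Within a single JVM, Caffeine's get(key, loader) already computes each key at most once for concurrent callers, so the distributed lock mainly matters for coordinating rebuilds across instances.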

Pressure test results show that after adding Caffeine, the vast majority of hotspot requests are absorbed inside the JVM. Redis CPU usage drops from its peak, and P99 latency decreases significantly. For high-concurrency detail pages, this gain is much larger than continuing to optimize SQL alone.

AI Visual Insight: In production traffic patterns with concentrated hotspot reads, the largest latency reduction usually comes from eliminating repeated remote cache round trips rather than from shaving a few milliseconds off database queries. L1 local cache absorbs the sharpest spikes, L2 Redis preserves shared state, and L3 MySQL guarantees reconstructable truth.

FAQ

Q1: Why add Caffeine when Redis is already in place?

A1: Because Redis still has network overhead and a single-node hotspot ceiling. Caffeine keeps the highest-frequency reads inside the JVM, which significantly reduces remote access frequency and Redis CPU pressure.

Q2: Is inventory lookup always suitable for local cache?

A2: Not necessarily. If the business can tolerate 1 to 3 seconds of display delay, Caffeine is a good fit. If the business requires extreme real-time accuracy, keep inventory queries at the Redis layer only, or even read directly from the atomic inventory source.

Q3: How can a three-tier cache balance performance and consistency?

A3: The common approach combines read/write separation, staggered TTL, null caching, hotspot preheating, and active invalidation through Redis Pub/Sub. For critical fields, aim for fast invalidation. For general read scenarios, accept short-lived eventual consistency.

Core takeaway: This article systematically breaks down how Caffeine, Redis, and MySQL work together in a three-tier cache architecture. It covers read/write separation, hotspot detection, cache preheating, consistency control, Redis crash recovery, and pressure-test outcomes, helping you reduce hotspot query latency to the millisecond level in high-concurrency systems.