Redis Cluster in Production: High Availability Architecture, Hash Slots, and a 6-Node Deployment Guide

Redis Cluster is Redis’s native distributed solution. It addresses single-node capacity limits, the inability to scale writes horizontally, and failover processes that otherwise depend on manual intervention. This article covers a comparison of high-availability modes, hash slot routing, a 6-node deployment, and troubleshooting.

Technical Specifications at a Glance

| Parameter | Value |
| --- | --- |
| Core Topic | Redis Cluster |
| Languages | Bash, INI |
| Protocols / Mechanisms | Gossip, primary-replica replication, Raft-like election |
| Redis Version | 5.0.14 |
| Deployment Topology | 3 masters, 3 replicas |
| Operating System | OpenEuler 24 |
| Service Port | 6379 |
| Cluster Bus Port | 16379 |
| Core Dependencies | gcc, make, tar, zlib-devel |

Redis high availability evolves in layers

Common Redis high-availability patterns include primary-replica replication, Sentinel, and Redis Cluster. These are not mutually exclusive. Instead, they represent different capability tiers for different scales.

Primary-replica replication solves data backup and read scaling, but write traffic still concentrates on the primary node. Sentinel adds automatic failover, but it does not provide true sharding. Redis Cluster provides distributed sharding, automatic failover, and horizontal scaling at the same time.

The three modes serve different purposes

| Mode | Core Capability | Main Limitation | Best Fit |
| --- | --- | --- | --- |
| Primary-replica replication | Read/write separation, data backup | Failover depends on manual intervention | Small workloads |
| Sentinel | Automatic primary failover | Write scaling remains limited | Medium-sized workloads |
| Redis Cluster | Sharding, scaling, automatic failover | Multi-key operations are limited, and only database 0 is supported | Large-scale, high-concurrency workloads |
```bash
# You must use -c when connecting to a cluster
redis-cli -h 192.168.10.101 -p 6379 -c

# After a write, the client automatically follows the redirection
# to the node that owns the target slot
set test_key hello_cluster
get test_key
```

These commands verify whether the client supports automatic cluster routing.

Redis Cluster uses hash slots for native sharding

Redis Cluster maintains exactly 16,384 hash slots. Every key is first processed with CRC16, then modulo 16384, and finally mapped to a specific primary node.

This design removes the need for application-side shard routing and turns scale-out data movement into slot migration instead of full database migration. Only primary nodes serve slot reads and writes. Replica nodes handle replication and failover takeover.

Hash slots and cluster communication form the basis of stability

The routing formula is CRC16(key) mod 16384. Nodes exchange status, slot distribution, and primary-replica role information through the Gossip protocol, which makes Redis Cluster fundamentally a decentralized topology.
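As a sketch of the routing rule, the following Python snippet implements the CRC16 variant Redis uses for key hashing (CRC16-CCITT/XModem, polynomial 0x1021, zero initial value; Redis implements this in C with a lookup table) and the slot mapping:

```python
def crc16(data: bytes) -> int:
    """CRC16-CCITT (XModem), the variant Redis Cluster uses for key hashing."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            # Shift left, applying the 0x1021 polynomial when the high bit is set
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else crc << 1
            crc &= 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Map a key to one of the 16,384 hash slots: CRC16(key) mod 16384."""
    return crc16(key.encode()) % 16384
```

You can cross-check the results against a live node with the CLUSTER KEYSLOT command; for example, a cluster node reports slot 12182 for the key foo.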

When a primary node becomes unreachable, other primaries first flag it as possibly failed (PFAIL). Once more than half of the primary nodes agree, the flag is upgraded to FAIL and the node's replicas start an election. The replica that wins a majority of primary votes is promoted to the new primary. After the original primary recovers, it typically rejoins as a replica.
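The majority rule behind failure detection can be sketched in a few lines of Python (the helper names are illustrative, not Redis APIs):

```python
def fail_quorum(total_masters: int) -> int:
    # Minimum number of primaries that must agree before a peer is declared down
    return total_masters // 2 + 1

def is_marked_fail(pfail_reports: int, total_masters: int) -> bool:
    # PFAIL (possibly failed) escalates to FAIL once a majority of primaries concur
    return pfail_reports >= fail_quorum(total_masters)
```

With three primaries the quorum is two, so a single primary's report alone can never trigger a failover, which is exactly why network jitter on one link does not destabilize the cluster.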

```ini
bind 127.0.0.1 192.168.10.101
protected-mode yes
port 6379
daemonize yes
appendonly yes

# Enable cluster mode
cluster-enabled yes
cluster-config-file nodes-6379.conf
cluster-node-timeout 15000
cluster-require-full-coverage no
```

This configuration defines the minimum required conditions for running Redis Cluster.

A 6-node deployment is the minimum high-availability unit for production

A 3-master, 3-replica deployment is the recommended baseline. The reason is straightforward: you need at least three primary nodes to form a stable majority, which ensures reliable failure detection and primary-replica failover.

The sample environment uses OpenEuler 24. Node IPs range from 192.168.10.101 to 192.168.10.106. All nodes use port 6379, and the Redis version is 5.0.14.

Every node must complete baseline initialization first

First disable the firewall and SELinux, then install build dependencies, extract the source package, and complete the installation. All six machines must use a consistent configuration. Otherwise, cluster creation can fail because of mismatched node capabilities.

```bash
# Disable the firewall and SELinux
systemctl stop firewalld
systemctl disable firewalld
setenforce 0
sed -i 's/SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config

# Install build dependencies
dnf -y install gcc* zlib-devel tar make

# Compile and install Redis
tar -zxvf redis-5.0.14.tar.gz
cd redis-5.0.14
make
make PREFIX=/usr/local/redis install
ln -s /usr/local/redis/bin/* /usr/local/bin/
cd utils
./install_server.sh
```

This script completes Redis compilation, installation, symlink creation, and service initialization.

Provide the full node list at cluster creation time

After all nodes are started, run the cluster creation command on any primary node. --cluster-replicas 1 means Redis automatically assigns one replica to each primary, resulting in a 3-master, 3-replica topology.

```bash
# Create a 3-master, 3-replica cluster in one step
redis-cli --cluster create \
  192.168.10.101:6379 \
  192.168.10.102:6379 \
  192.168.10.103:6379 \
  192.168.10.104:6379 \
  192.168.10.105:6379 \
  192.168.10.106:6379 \
  --cluster-replicas 1
```

This command automatically assigns slots and establishes primary-replica replication relationships.
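The even split that creation produces can be sketched as follows; this mirrors the contiguous-range assignment redis-cli performs, though the helper name is mine:

```python
def split_slots(n_masters: int, total_slots: int = 16384):
    """Divide the slot space into contiguous, near-equal inclusive ranges."""
    per_node = total_slots / n_masters
    ranges, first, cursor = [], 0, 0.0
    for i in range(n_masters):
        # The last master always takes the remainder up to slot 16383
        last = total_slots - 1 if i == n_masters - 1 else round(cursor + per_node - 1)
        ranges.append((first, last))
        first, cursor = last + 1, cursor + per_node
    return ranges
```

For three masters this yields the familiar layout 0-5460, 5461-10922, and 10923-16383, which is what `redis-cli --cluster check` reports right after creation.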

Cluster scaling must revolve around slot migration

There are two common ways to add a node: cluster meet introduces the node to the cluster, while add-node is better suited for standardized operational workflows. After a node joins, you must rebalance slots if the node is expected to carry traffic.

If you add a replica node, first join it to the cluster, then bind it to the target primary with cluster replicate. Before you remove a primary node, you must clear its slots. Otherwise, the removal fails because the node still owns hash slots.

Common operations should be documented as a fixed runbook

```bash
# View node and cluster status
redis-cli -h 192.168.10.101 -p 6379 cluster nodes
redis-cli --cluster check 192.168.10.101:6379
redis-cli -h 192.168.10.101 -p 6379 cluster info

# Add a node and trigger rebalancing
redis-cli --cluster add-node 192.168.10.108:6379 192.168.10.101:6379
redis-cli --cluster rebalance 192.168.10.101:6379 \
  --cluster-threshold 1 --cluster-use-empty-masters

# Migrate a fixed number of slots between two specific nodes
redis-cli --cluster reshard 192.168.10.101:6379 \
  --cluster-from <source-node-id> --cluster-to <target-node-id> \
  --cluster-slots 1000 --cluster-yes
```

These commands cover three high-frequency operational tasks: cluster inspection, scale-out onboarding, and slot rebalancing.

Most common failures relate to residual state and network connectivity

The error "Slot 0 is already busy" usually means the node still holds stale slot metadata or its nodes-6379.conf file was not cleaned up. Simply restarting the service rarely helps; you must flush the data and reset the cluster state.

"Waiting for cluster to join" is commonly caused by the cluster bus port 16379 being blocked, nodes never having performed a meet, or an incorrect bind configuration. In addition to the service port, Redis Cluster depends on the cluster bus port (service port + 10000) for node coordination.

```bash
# Clean up residual cluster state (run against each affected node)
redis-cli -h <node-IP> -p 6379 flushall
redis-cli -h <node-IP> -p 6379 cluster reset
rm -f /var/lib/redis/6379/nodes-6379.conf
redis-server /etc/redis/6379.conf
```

These commands fix slot conflict issues caused by residual node state.

In production, recoverability matters more than peak performance

Enable AOF persistence, distribute primary and replica nodes across racks, and expose only ports 6379 and 16379 to trusted network ranges. For scaling, follow this sequence: add the node, configure replication, then rebalance. This helps avoid unstable routing during migration.

At a minimum, monitoring should cover node health, slot distribution, replication lag, command latency, and memory utilization. If replica nodes need to serve read traffic, issue the READONLY command after connecting.
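As a building block for such monitoring, here is a minimal sketch that turns CLUSTER INFO output into a health check. The field names come from the real command; the helper functions themselves are illustrative:

```python
def parse_cluster_info(raw: str) -> dict:
    """Parse the 'key:value' lines returned by CLUSTER INFO."""
    info = {}
    for line in raw.splitlines():
        line = line.strip()
        if ":" in line:
            key, _, value = line.partition(":")
            info[key] = value
    return info

def cluster_healthy(info: dict) -> bool:
    # Healthy means the cluster accepts writes and all 16,384 slots are assigned
    return (info.get("cluster_state") == "ok"
            and info.get("cluster_slots_assigned") == "16384")

# Sample output as a 3-master, 3-replica cluster might report it
sample = (
    "cluster_state:ok\n"
    "cluster_slots_assigned:16384\n"
    "cluster_known_nodes:6\n"
    "cluster_size:3\n"
)
```

Feeding the output of `redis-cli -h <node> -p 6379 cluster info` into a check like this is an easy way to wire cluster state into an existing alerting pipeline.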

FAQ

Why is a 3-master, 3-replica topology the minimum recommendation for Redis Cluster?

Because failure detection and new primary election depend on majority voting. With fewer than three primary nodes, network jitter or a single-node failure significantly reduces cluster availability and the accuracy of failure decisions.

Why must the client use -c when connecting to a Redis Cluster?

Because cluster data is distributed across different primary nodes by slot. The -c option allows the client to automatically follow MOVED/ASK redirections. Without it, you will frequently see errors or only access data from a single node.
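The redirection that -c follows is a plain error string of the form "MOVED &lt;slot&gt; &lt;host&gt;:&lt;port&gt;" (or "ASK ..." during slot migration). A sketch of parsing it client-side, with an illustrative helper name:

```python
def parse_redirect(error: str):
    """Parse a '-MOVED <slot> <host>:<port>' or '-ASK ...' redirection error."""
    kind, slot, endpoint = error.lstrip("-").split()
    if kind not in ("MOVED", "ASK"):
        raise ValueError(f"not a redirection error: {error}")
    # rpartition handles hosts that themselves contain colons (e.g. IPv6)
    host, _, port = endpoint.rpartition(":")
    return kind, int(slot), host, int(port)
```

A cluster-aware client uses the parsed slot and endpoint to update its local slot map and retry the command against the owning node, which is exactly what redis-cli does under the hood when -c is set.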

Why must you clear slots before removing a primary node?

If a primary node still owns slots, it is still responsible for data routing. Removing it directly breaks slot integrity and leaves the cluster unhealthy. You must migrate or clear its slots before deletion.

Summary

This article systematically reconstructs the core principles and implementation workflow of Redis Cluster. It covers a comparison of three high-availability modes, the 16,384-hash-slot model, Gossip-based communication, failover mechanisms, and the deployment, scaling, troubleshooting, and production practices for a 3-master, 3-replica cluster. It is well suited for development and operations teams that need to quickly build a horizontally scalable Redis cluster.