[AI Readability Summary] This article focuses on request distribution after deploying multiple instances in a Spring Cloud microservices environment. It explains manual round-robin selection, Spring Cloud LoadBalancer integration, custom strategies, and Linux deployment verification. It addresses three common problems: single-node overload, hardcoded service addresses, and uneven traffic distribution after scaling.
Technical specifications are summarized below:
| Parameter | Description |
|---|---|
| Core Language | Java |
| Core Frameworks | Spring Boot, Spring Cloud |
| Service Discovery Protocol | Eureka-based service registration and discovery |
| Load Balancing Mode | Client-side load balancing |
| Default Strategy | Round-robin |
| Extendable Strategies | Random, custom ReactorLoadBalancer |
| Deployment Environment | Linux, multiple JVM instances |
| Key Dependencies | spring-cloud-starter-loadbalancer, spring-web |
Direct calls to a single instance naturally cause traffic skew
In a microservices architecture, the goal of multi-instance deployment is not simply to “run more processes.” The real goal is to distribute traffic evenly across multiple nodes. If the consumer always selects the first instance from the service list, scaling out becomes meaningless.
A common mistake is to treat service discovery as only an “instance lookup” operation, without including “instance selection” in the invocation path. As a result, three machines may all be online, but requests still hit the same node every time, creating a pseudo-cluster.
// Only retrieve the instance list without performing load-balanced selection
List<ServiceInstance> instances = discoveryClient.getInstances("product-service");
// Directly selecting the first instance causes traffic to be pinned to one node
ServiceInstance instance = instances.get(0);
String url = instance.getUri() + "/product/1";
The problem with this code is not whether the call succeeds. The real issue is that it breaks the basic assumptions behind high availability and horizontal scaling.
Log output can directly reveal load imbalance
When product-service starts three instances on ports 9090, 9091, and 9092, and the logs keep showing only 9090, the consumer is clearly not applying any balancing strategy.
product-service:9090
product-service:9090
product-service:9090
This pattern means the hot instance will be overloaded first, while idle instances never participate in request handling.
Manual round robin is useful for understanding the mechanism but not for production
To quickly understand load balancing, you can implement the most basic round-robin strategy with AtomicInteger. The idea is simple: maintain an incrementing counter, then use modulo against the number of instances to determine which node should receive the current request.
@Configuration
public class BeanConfig {
    private static final AtomicInteger COUNTER = new AtomicInteger(0); // Atomic counter for thread safety

    public ServiceInstance choose(List<ServiceInstance> instances) {
        // Math.floorMod keeps the index non-negative even after the counter overflows
        int index = Math.floorMod(COUNTER.getAndIncrement(), instances.size());
        return instances.get(index); // Return the selected service instance
    }
}
This code builds the smallest usable round-robin selector. It works well for teaching and debugging, but it should not directly carry production traffic.
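The selector can be exercised outside Spring with plain strings standing in for ServiceInstance objects. This is a minimal sketch for experimentation, not production code; note the use of Math.floorMod, since a bare `%` would return a negative index once the int counter overflows:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class RoundRobinDemo {
    private static final AtomicInteger COUNTER = new AtomicInteger(0);

    // Same idea as the selector above: increment, then map the counter onto
    // a valid index. floorMod stays non-negative even after int overflow.
    static String choose(List<String> instances) {
        int index = Math.floorMod(COUNTER.getAndIncrement(), instances.size());
        return instances.get(index);
    }

    public static void main(String[] args) {
        List<String> instances = List.of("9090", "9091", "9092");
        for (int i = 0; i < 6; i++) {
            System.out.println(choose(instances)); // cycles through the three ports
        }
    }
}
```

Running the demo prints the three ports in rotation, which is exactly the log pattern a healthy round robin should produce.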
The limitations of a handwritten solution are obvious
First, service-governance logic leaks into business code. Second, you must implement instance list caching, unhealthy instance eviction, and retry recovery on your own. Third, once you expand the strategy to support weights, randomness, or canary releases, complexity grows rapidly.
9091
9090
9092
9091
When the logs begin switching ports evenly, it only proves that round robin works. It does not mean the solution is maintainable at an engineering level.
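To see how quickly complexity grows, consider extending the handwritten selector with weights. The sketch below uses hypothetical weights and the simplest possible trick, expanding the instance list by weight; even this toy version already needs extra state and setup logic that plain round robin never required:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public class WeightedRoundRobin {
    private final AtomicInteger counter = new AtomicInteger(0);
    private final List<String> expanded = new ArrayList<>();

    // Hypothetical weights, e.g. {"9090": 2, "9091": 1} if 9090 has twice
    // the capacity. Each instance gets one slot per unit of weight.
    public WeightedRoundRobin(Map<String, Integer> weights) {
        weights.forEach((instance, weight) -> {
            for (int i = 0; i < weight; i++) {
                expanded.add(instance);
            }
        });
    }

    public String choose() {
        return expanded.get(Math.floorMod(counter.getAndIncrement(), expanded.size()));
    }

    public static void main(String[] args) {
        WeightedRoundRobin lb = new WeightedRoundRobin(Map.of("9090", 2, "9091", 1));
        for (int i = 0; i < 6; i++) {
            System.out.println(lb.choose()); // 9090 appears twice as often as 9091
        }
    }
}
```

Add health checks, instance eviction, and canary rules on top of this, and the handwritten path quickly becomes a maintenance burden, which is the argument for delegating to the framework.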
The responsibility boundary between client-side and server-side load balancing must remain clear
Server-side load balancing is typically handled by standalone proxies such as Nginx, LVS, or F5. The caller connects only to the proxy entry point, and the proxy maintains the backend node list and decides where to forward each request.
Client-side load balancing places instance-selection logic inside the consumer itself. The consumer first retrieves the instance list from the registry, then selects a target node locally before sending the request. Spring Cloud LoadBalancer belongs to this category.
The core difference is who maintains the instance list
The server-side model centralizes topology management at the proxy layer, which simplifies unified governance. The client-side model pushes selection down to the caller, reducing extra forwarding hops and avoiding a single proxy bottleneck.
String url = "http://product-service/product/" + productId; // Use the service name instead of a fixed IP
ProductInfo productInfo = restTemplate.getForObject(url, ProductInfo.class);
The key change here is not the URL syntax itself. The real change is switching the addressing target from a physical address to a logical service name.
Spring Cloud LoadBalancer is the officially recommended client-side solution
Since the Spring Cloud 2020.0 release train, Ribbon has been removed from the mainstream path, and Spring Cloud LoadBalancer has become the official recommendation. Its strengths include simple integration, alignment with the Spring ecosystem, and support for per-service strategy customization.
The most common integration pattern is to inject a RestTemplate with @LoadBalanced so that each request automatically passes through the load-balancing interceptor chain before execution.
@Configuration
public class RestTemplateConfig {
@Bean
@LoadBalanced // Enable client-side load balancing
public RestTemplate restTemplate() {
return new RestTemplate();
}
}
This configuration gives RestTemplate the ability to resolve service names into instances. It does much more than simply register an HTTP client.
Service invocation code becomes significantly cleaner
After integration, the business layer no longer manually assembles the IP and port of a specific instance. It only needs to call the service by name. Service discovery, instance selection, and address replacement are handled uniformly by the framework.
public OrderInfo selectOrderById(Integer orderId) {
OrderInfo orderInfo = orderMapper.selectOrderById(orderId);
String url = "http://product-service/product/" + orderInfo.getProductId(); // Use the service name directly
ProductInfo productInfo = restTemplate.getForObject(url, ProductInfo.class);
orderInfo.setProductInfo(productInfo);
return orderInfo;
}
This code brings the business focus back to “what data to request” instead of “which machine to request it from.”
Custom load-balancing strategies support more advanced traffic governance
Spring Cloud LoadBalancer uses round robin by default, but you can also switch to random selection. For scenarios involving large instance differences, canary releases, or zone affinity, custom strategies provide much more value.
A common implementation pattern is to provide a ReactorLoadBalancer bean and bind it to a specific service through @LoadBalancerClient, rather than applying a coarse global override.
public class LoadBalancerConfig {
    @Bean
    public ReactorLoadBalancer<ServiceInstance> randomLoadBalancer(
            Environment environment,
            LoadBalancerClientFactory factory) {
        String name = environment.getProperty(LoadBalancerClientFactory.PROPERTY_NAME);
        return new RandomLoadBalancer(
                factory.getLazyProvider(name, ServiceInstanceListSupplier.class),
                name); // Inject a random strategy for the target service
    }
}
This configuration replaces the default round-robin behavior with random selection and works well for quickly verifying a strategy switch.
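Conceptually, the selection step of a random strategy reduces to picking a uniformly random index into the current instance list. The sketch below is not the framework's actual implementation, just the core idea with strings standing in for instances and an injectable Random for deterministic testing:

```java
import java.util.List;
import java.util.Random;

public class RandomChooser {
    private final Random random;

    // Random is injected so the selection can be seeded and tested
    public RandomChooser(Random random) {
        this.random = random;
    }

    // Pick a uniformly random instance from the current list
    public String choose(List<String> instances) {
        return instances.get(random.nextInt(instances.size()));
    }

    public static void main(String[] args) {
        RandomChooser chooser = new RandomChooser(new Random());
        List<String> instances = List.of("9090", "9091", "9092");
        for (int i = 0; i < 5; i++) {
            System.out.println(chooser.choose(instances));
        }
    }
}
```

Over many requests the counts converge to roughly equal shares, but unlike round robin, short log windows may legitimately show the same port twice in a row.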
Avoid accidental global scanning when binding custom strategies
If a custom strategy class is mistakenly scanned as a global configuration, all services may end up sharing the same strategy, which breaks the design goal of per-service customization.
@Configuration
@LoadBalancerClient(name = "product-service", configuration = LoadBalancerConfig.class)
public class BeanConfig {
@Bean
@LoadBalanced
public RestTemplate restTemplate() {
return new RestTemplate();
}
}
This code enables the specified strategy only for product-service and does not affect other downstream services.
The interceptor mechanism explains how a service name becomes a real address
The core of LoadBalancer is not mysterious. It simply adds an interception layer before a request is sent. It extracts the host from the URI as the service name, selects a target instance from the registry, and then rewrites the request address.
public ClientHttpResponse intercept(HttpRequest request, byte[] body, ClientHttpRequestExecution execution) throws IOException {
    String serviceName = request.getURI().getHost(); // Extract the logical service name
    ServiceInstance instance = loadBalancer.choose(serviceName); // Select an instance according to the strategy
    HttpRequest rewritten = rewriteToInstanceAddress(request, instance); // Replace the service name with the instance's host and port
    return execution.execute(rewritten, body); // Continue forwarding with the selected real address
}
This pseudocode reveals the essence of client-side load balancing: logical-name addressing, local instance selection, and request rewriting.
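The rewriting step itself is plain string and URI manipulation. A minimal sketch of the idea, assuming a hypothetical helper that swaps the logical host for a concrete instance address while preserving path and query:

```java
import java.net.URI;

public class UriRewriteDemo {
    // Rewrite the logical service-name URI into a concrete instance address,
    // mirroring what the load-balancing interceptor does before sending
    static URI reconstruct(URI original, String host, int port) {
        String query = original.getRawQuery() == null ? "" : "?" + original.getRawQuery();
        return URI.create(original.getScheme() + "://" + host + ":" + port
                + original.getRawPath() + query);
    }

    public static void main(String[] args) {
        URI logical = URI.create("http://product-service/product/1");
        // Hypothetical instance address selected by the strategy
        System.out.println(reconstruct(logical, "192.168.1.10", 9091));
        // -> http://192.168.1.10:9091/product/1
    }
}
```

Everything after the host segment stays untouched, which is why business code can keep building URLs against the service name alone.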
The Linux multi-instance deployment process determines whether the solution truly works in practice
Running successfully in a local IDE does not mean the solution is production-ready. In a real deployment, you must at least complete five steps: package the JAR files, start the registry first, run services in the background, open the required ports, and verify instance status.
# Start the registry
nohup java -jar eureka-server.jar > logs/eureka.log &
# Start the order service
nohup java -jar order-service.jar > logs/order.log &
# Start three product service instances
nohup java -jar product-service.jar --server.port=9090 > logs/product-9090.log &
nohup java -jar product-service.jar --server.port=9091 > logs/product-9091.log &
nohup java -jar product-service.jar --server.port=9092 > logs/product-9092.log &
These commands deploy multiple ports for the same service in parallel, which is the basic prerequisite for verifying whether client-side load balancing is working.
Verification must check both the control plane and the data plane
First, open the Eureka console and confirm that all three product-service instances are in the UP state. Then call the order API and verify that product-service logs are distributed across 9090, 9091, and 9092. Only when both the control plane and the data plane behave correctly can you consider the deployment fully validated.
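A quick way to check the data plane is to count port occurrences in the consumer's log. The log format below is hypothetical; adjust the grep pattern to whatever your service actually prints:

```shell
# Hypothetical log excerpt; in practice use the real file, e.g. logs/order.log
cat > /tmp/order-sample.log <<'EOF'
calling product-service:9090
calling product-service:9091
calling product-service:9092
calling product-service:9091
EOF

# Count requests per instance port; roughly equal counts mean balancing works
grep -o 'product-service:[0-9]*' /tmp/order-sample.log | sort | uniq -c
```

If one port dominates the counts after a burst of test requests, the consumer is still bypassing the load balancer somewhere in its call path.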
The conclusion is that production environments should prefer the official load-balancing component
Manual round robin is useful for understanding the underlying principle, but it is not suitable for long-term maintenance. Through standard annotations, extensible strategies, and a unified interceptor chain, Spring Cloud LoadBalancer elevates client-side load balancing from a coding trick into a framework capability.
In multi-instance microservices, the best practice is to invoke downstream services by service name, let LoadBalancer handle instance selection, and then improve production governance further with health checks, canary releases, and weighted routing.
FAQ
Q1: Why do requests still hit only one machine even though multiple instances are registered?
A: Because successful registration only makes instances discoverable. It does not automatically balance traffic for the caller. If your code directly uses instances.get(0) or hardcodes an IP address, traffic will still converge on a single node.
Q2: What does @LoadBalanced actually do?
A: It gives RestTemplate the ability to resolve service names and apply load-balancing interception, so a logical address such as http://product-service can be replaced with a real instance address before the request is sent.
Q3: When should you customize a LoadBalancer strategy?
A: You should customize the strategy at the service level when instance performance differs, when you need random traffic splitting, canary releases, same-zone preference, or weighted scheduling, instead of relying only on the default round-robin behavior.