Spring Boot WebClient Connection Pool Tuning

Jun 02, 2026

Outbound HTTP calls can become one of the busiest parts of a Spring Boot service. A single user request may reach a controller, and the service may then call several external APIs before it can send the response back. If every outbound call opened a brand-new TCP connection, the service would pay the connection setup cost again and again. WebClient avoids that cost by delegating to an underlying HTTP client, and with Reactor Netty available, that client can reuse pooled connections. Pool tuning controls how many connections can be active at once, how long callers can wait for a connection, how long idle connections stay ready for reuse, and how the service reacts when traffic arrives in bursts. Good tuning does not mean raising every limit. It means matching the pool to the downstream service, the expected request rate, the latency target, and the failure behavior you want under pressure.

How WebClient Uses Pooled Connections

WebClient is the API your Spring Boot code calls, but the underlying HTTP client owns the socket behavior. With Reactor Netty available, WebClient can run through Reactor Netty’s HttpClient, and that client can keep connections open after a response finishes. Later requests to the same remote address can reuse those open connections instead of opening a brand-new TCP connection every time. Connection reuse is what makes pooling useful. The application code still reads like ordinary request code, but the client layer manages a small group of network connections behind it, tracking which connections are active, which are idle, and which can be reused for the next call.
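To make the reuse idea concrete, here is a toy model in plain Java. It is not Reactor Netty code, and ToyConnectionPool, ToyConnection, and their methods are hypothetical names for this sketch. It only shows the bookkeeping a pool does: hand out an idle connection for the same address when one exists, otherwise open a new one, and take connections back once a response has finished.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Toy model of keep-alive reuse; not how Reactor Netty is implemented internally.
final class ToyConnectionPool {

    // Stand-in for an open socket to one remote address.
    record ToyConnection(int id, String address) { }

    private final Map<String, Deque<ToyConnection>> idleByAddress = new HashMap<>();
    private int opened;

    // Reuse an idle connection for this address if one exists, otherwise "open" a new one.
    ToyConnection acquire(String address) {
        Deque<ToyConnection> idle = idleByAddress.get(address);
        if (idle != null && !idle.isEmpty()) {
            return idle.pop();
        }
        opened++;
        return new ToyConnection(opened, address);
    }

    // Called after the response has been fully read, so the connection becomes idle again.
    void release(ToyConnection connection) {
        idleByAddress.computeIfAbsent(connection.address(), key -> new ArrayDeque<>())
                .push(connection);
    }

    int openedCount() {
        return opened;
    }
}
```

Three sequential calls to the same address open one connection, because each call returns it before the next one starts. Two overlapping calls need two connections, and a different address never shares them.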

Client Connector Selection

WebClient needs a ClientHttpConnector before it can send HTTP traffic. The connector adapts Spring’s client API to the HTTP client library that performs the network I/O. In a Spring Boot application with Reactor Netty available, ReactorClientHttpConnector connects WebClient to Reactor Netty’s HttpClient.

The WebClient.Builder is usually the best place to attach this configuration. Service classes can then receive a named WebClient bean and make calls without rebuilding the HTTP client for every request. That keeps the connection resources tied to the Spring container instead of scattering client creation across several classes.

package com.alex.demo.http;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.reactive.function.client.WebClient;

@Configuration(proxyBeanMethods = false)
class OrdersClientConfig {

    @Bean
    WebClient ordersWebClient(WebClient.Builder builder) {
        return builder
                .baseUrl("https://orders.example.com")
                .build();
    }
}

This version relies on the builder that Spring Boot provides. It is enough for a basic outbound client, and it keeps the WebClient reusable. Service code does not need to create a new client for every request, and the connector details stay inside the bean configuration.

Custom pool settings start when the application creates a Reactor Netty ConnectionProvider, passes it to an HttpClient, and then gives that client to ReactorClientHttpConnector.

package com.alex.demo.http;

import java.time.Duration;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.http.client.reactive.ReactorClientHttpConnector;
import org.springframework.web.reactive.function.client.WebClient;

import reactor.netty.http.client.HttpClient;
import reactor.netty.resources.ConnectionProvider;

@Configuration(proxyBeanMethods = false)
class InventoryClientConfig {

    @Bean(destroyMethod = "dispose")
    ConnectionProvider inventoryConnectionProvider() {
        return ConnectionProvider.builder("inventory-pool")
                .maxConnections(60)
                .pendingAcquireTimeout(Duration.ofMillis(750))
                .maxIdleTime(Duration.ofSeconds(25))
                .build();
    }

    @Bean
    WebClient inventoryWebClient(
            WebClient.Builder builder,
            ConnectionProvider inventoryConnectionProvider
    ) {
        HttpClient httpClient = HttpClient.create(inventoryConnectionProvider);

        return builder
                .baseUrl("https://inventory.example.com")
                .clientConnector(new ReactorClientHttpConnector(httpClient))
                .build();
    }
}

The pool name gives the provider a readable identity. The destroyMethod gives Spring a way to dispose of the custom provider during shutdown, which keeps the client resource lifecycle tied to the application context.

Pool Reuse

Reusable connections come from HTTP keep-alive behavior. After a request completes and the response body has been consumed or released, Reactor Netty can return the channel to the pool. A later request to the same remote address can then receive that idle channel instead of opening a new socket. The service method does not manually borrow and return connections. Calls such as retrieve, bodyToMono, and bodyToFlux run through the configured connector, and Reactor Netty handles the pool interaction as part of the exchange. The main point for application code is that the response pipeline should finish normally so the connection can become reusable again.

package com.alex.demo.orders;

import org.springframework.stereotype.Service;
import org.springframework.web.reactive.function.client.WebClient;

import reactor.core.publisher.Mono;

@Service
class OrderLookupService {

    private final WebClient ordersWebClient;

    OrderLookupService(WebClient ordersWebClient) {
        this.ordersWebClient = ordersWebClient;
    }

    Mono<OrderSummary> findOrder(String orderId) {
        return ordersWebClient.get()
                .uri("/orders/{orderId}", orderId)
                .retrieve()
                .bodyToMono(OrderSummary.class);
    }
}

Nothing in that method mentions a pool, but the request still travels through the connector and its HttpClient. If the pool already has an idle connection for orders.example.com, the client can reuse it. If no idle channel is ready and the pool has room, the client can open a new connection.

Streaming responses behave differently from short JSON responses because the connection stays active while the stream remains open. That can be perfectly valid, but the pool sees that connection as active for the lifetime of the stream. Take this for example:

package com.alex.demo.events;

import org.springframework.core.ParameterizedTypeReference;
import org.springframework.stereotype.Service;
import org.springframework.web.reactive.function.client.WebClient;

import reactor.core.publisher.Flux;

@Service
class ShipmentEventService {

    private final WebClient shipmentWebClient;

    ShipmentEventService(WebClient shipmentWebClient) {
        this.shipmentWebClient = shipmentWebClient;
    }

    Flux<ShipmentEvent> streamEvents() {
        return shipmentWebClient.get()
                .uri("/shipments/events")
                .retrieve()
                .bodyToFlux(new ParameterizedTypeReference<>() {
                });
    }
}

While that Flux is active, the connection cannot return to the idle pool. Short responses can finish, release the channel, and make the connection available again. Long streams keep their channels busy, so we count them as active connections for as long as the stream stays open.
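A plain-Java way to picture this is to treat the active connection limit as a fixed number of permits. The sketch below uses java.util.concurrent.Semaphore as a stand-in for the pool (it is not Reactor Netty’s implementation), and ToySlots is a hypothetical name. Open streams keep their permits, so short calls only get whatever is left.

```java
import java.util.concurrent.Semaphore;

// Stand-in for a pool's active-connection limit: one permit per active connection.
final class ToySlots {

    private final Semaphore permits;

    ToySlots(int maxConnections) {
        this.permits = new Semaphore(maxConnections);
    }

    // A long stream takes a permit and keeps it until the stream ends.
    boolean openStream() {
        return permits.tryAcquire();
    }

    void closeStream() {
        permits.release();
    }

    // A short call takes a permit and hands it back as soon as its response is read.
    boolean shortCall() {
        if (!permits.tryAcquire()) {
            return false; // no free slot right now; a real pool would queue or fail the call
        }
        permits.release();
        return true;
    }

    int freeSlots() {
        return permits.availablePermits();
    }
}
```

With two slots and two open streams, a short call finds nothing free until one of the streams closes.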

Remote Host Boundaries

Connection reuse stays tied to the remote address. Sockets opened for https://orders.example.com cannot be reused for https://inventory.example.com, because those sockets are already connected to a different destination. Different ports and schemes are treated as separate targets too. This means a warm pool for one downstream API does not warm every other downstream API. If a service calls an orders API, a profile API, and a billing API, those destinations do not share the same idle connections. They may share application code and configuration style, but the actual pooled channels still belong to their matching host and port.

package com.alex.demo.http;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.reactive.function.client.WebClient;

@Configuration(proxyBeanMethods = false)
class DownstreamClientsConfig {

    @Bean
    WebClient ordersWebClient(WebClient.Builder builder) {
        return builder
                .baseUrl("https://orders.example.com")
                .build();
    }

    @Bean
    WebClient profilesWebClient(WebClient.Builder builder) {
        return builder
                .baseUrl("https://profiles.example.com")
                .build();
    }
}

These two clients call different hosts, so connection reuse stays separated by destination. A connection returned by the orders API cannot serve a request to the profiles API. This is why pool size is best planned per downstream target rather than as one large number for the whole service.
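Inside one ConnectionProvider, Reactor Netty keeps a separate pool per remote address, so limits such as maxConnections apply to each destination rather than to the provider as a whole. When several hosts share one provider, the builder’s forRemoteHost method can give a single host its own limits. This is a sketch, assuming a Reactor Netty 1.x release that includes ConnectionProvider.Builder.forRemoteHost; the shared-pool name, the choice of the profiles host, and the numbers are illustrative. The address must match how the client reaches that host (host name and port), so it is worth confirming through the pool metrics that the override is actually picked up.

```java
package com.alex.demo.http;

import java.net.InetSocketAddress;
import java.time.Duration;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import reactor.netty.resources.ConnectionProvider;

@Configuration(proxyBeanMethods = false)
class SharedPoolConfig {

    @Bean(destroyMethod = "dispose")
    ConnectionProvider sharedConnectionProvider() {
        return ConnectionProvider.builder("shared-pool")
                // Default limits, applied to each remote address separately.
                .maxConnections(40)
                .pendingAcquireTimeout(Duration.ofMillis(500))
                // Tighter limits for one downstream host assumed to be smaller.
                .forRemoteHost(InetSocketAddress.createUnresolved("profiles.example.com", 443),
                        spec -> spec.maxConnections(10)
                                .pendingAcquireTimeout(Duration.ofMillis(250)))
                .build();
    }
}
```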

Different schemes also create different targets. https://billing.example.com on port 443 and http://billing.example.com on port 80 are not the same connection destination. The pool treats them separately because the underlying socket target is different.
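The scheme matters because it decides the default port when the URL does not spell one out. Here is a small plain-Java sketch of that rule using java.net.URI. PoolTarget is a hypothetical type; it only models the host-and-port part of a destination, not everything Reactor Netty puts into its pool keys, and it assumes the URL scheme is either http or https.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical model of "which remote target does this URL point at".
record PoolTarget(String host, int port) {

    static PoolTarget from(String url) {
        URI uri = URI.create(url);
        int port = uri.getPort();
        if (port == -1) {
            // No explicit port: use the default for the scheme (assumes http or https).
            port = "https".equalsIgnoreCase(uri.getScheme()) ? 443 : 80;
        }
        return new PoolTarget(uri.getHost().toLowerCase(Locale.ROOT), port);
    }
}
```

An explicit :443 on an https URL lands on the same target as the bare https URL, while the http URL for the same host lands on port 80 and a different target.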

Timeout Layers

Several timeout values can be involved in one outbound call, and they do not all cover the same part of the request. Pool acquisition time is the wait for a connection from the pool. Connection time is the wait while opening a socket to the remote server. Response time is the wait after the request has been sent and the client is waiting for the remote response. Idle time applies after a connection has returned to the pool.

pendingAcquireTimeout belongs to the pool. It applies when a request is waiting to acquire a connection. If an idle connection is available right away, this timeout never comes into play. If the pool has no available connection, the request can wait only as long as this value allows.

maxIdleTime applies to returned channels. Returned channels that sit idle longer than the configured time can be closed instead of reused, which helps keep the idle pool from holding quiet connections too long.

maxLifeTime is based on total connection age. Connections can be retired after they reach that age, even after they have handled several successful requests.

responseTimeout applies after the request has a connection and is waiting for the HTTP response. It does not control how long a request waits to acquire a pooled connection.

package com.alex.demo.http;

import java.time.Duration;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.http.client.reactive.ReactorClientHttpConnector;
import org.springframework.web.reactive.function.client.WebClient;

import io.netty.channel.ChannelOption;
import reactor.netty.http.client.HttpClient;
import reactor.netty.resources.ConnectionProvider;

@Configuration(proxyBeanMethods = false)
class CatalogClientConfig {

    @Bean(destroyMethod = "dispose")
    ConnectionProvider catalogConnectionProvider() {
        return ConnectionProvider.builder("catalog-pool")
                .maxConnections(40)
                .pendingAcquireTimeout(Duration.ofMillis(500))
                .maxIdleTime(Duration.ofSeconds(20))
                .maxLifeTime(Duration.ofMinutes(3))
                .build();
    }

    @Bean
    WebClient catalogWebClient(
            WebClient.Builder builder,
            ConnectionProvider catalogConnectionProvider
    ) {
        HttpClient httpClient = HttpClient.create(catalogConnectionProvider)
                .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 2000)
                .responseTimeout(Duration.ofSeconds(4));

        return builder
                .baseUrl("https://catalog.example.com")
                .clientConnector(new ReactorClientHttpConnector(httpClient))
                .build();
    }
}

The values in that client cover different stages of the outbound call. pendingAcquireTimeout controls pool waiting. CONNECT_TIMEOUT_MILLIS controls socket connection time. responseTimeout controls the wait for the response after the request has been sent. maxIdleTime and maxLifeTime control what happens to channels after they return to the pool.
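The two retirement rules can be written as one small check. This is a plain-Java model of the rules described above, not Reactor Netty’s eviction code. ChannelAge and reusable are hypothetical names, and a null limit stands for "not configured".

```java
import java.time.Duration;
import java.time.Instant;

// Models the two retirement rules for a pooled channel.
final class ChannelAge {

    private ChannelAge() {
    }

    static boolean reusable(Instant createdAt, Instant returnedAt, Instant now,
                            Duration maxIdleTime, Duration maxLifeTime) {
        // maxIdleTime: how long the channel has sat idle since it was returned to the pool.
        if (maxIdleTime != null && Duration.between(returnedAt, now).compareTo(maxIdleTime) > 0) {
            return false;
        }
        // maxLifeTime: total age since the channel was opened, however busy it has been.
        if (maxLifeTime != null && Duration.between(createdAt, now).compareTo(maxLifeTime) > 0) {
            return false;
        }
        return true;
    }
}
```

With the catalog values (20 seconds idle, 3 minutes life), a channel that has been idle for 25 seconds is retired by maxIdleTime, and a channel that was busy a moment ago but is 4 minutes old is retired by maxLifeTime.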

Keeping these timeout layers separate prevents one delay from being confused with another. Raising responseTimeout will not help a request that fails while waiting for a pooled connection. Raising pendingAcquireTimeout will not help a remote API that accepts the request and then takes too long to respond.
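Because the layers cover different stages, a rough worst case for a single attempt is their sum, which is worth comparing against the caller’s own latency budget. Here is a back-of-the-envelope sketch using the catalog client’s values. TimeoutBudget is a hypothetical helper, and the result is only an estimate: it leaves out stages these three settings do not cover, such as DNS resolution and the TLS handshake, and any retries multiply it.

```java
import java.time.Duration;

// Rough per-attempt worst case: each layer may use up to its full timeout in turn.
final class TimeoutBudget {

    private TimeoutBudget() {
    }

    static Duration roughWorstCase(Duration pendingAcquireTimeout,
                                   Duration connectTimeout,
                                   Duration responseTimeout) {
        return pendingAcquireTimeout.plus(connectTimeout).plus(responseTimeout);
    }
}
```

For the catalog client that comes to 6.5 seconds. If the endpoint making this call aimed to answer within, say, 3 seconds, one outbound attempt could already run past that budget before any retry.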

Pool Pressure During Request Bursts

Traffic rarely arrives at a perfectly even pace. A service can spend most of the day making a normal number of outbound calls, then receive a rush from users, scheduled jobs, retries, or upstream services. During those rushes, the connection pool decides how much traffic can move right away, how much waits for a connection, and when waiting turns into failure. The pool does not make outbound traffic unlimited, and it does not remove the limits of the downstream API. It gives the client a bounded place to reuse connections, cap active network calls, and place extra acquire requests into a waiting area before they are allowed to send HTTP traffic.

That waiting area can protect the client from opening too many sockets at the same time, but it also adds time before the outbound request reaches the downstream API.

Pending Acquire Queue

Requests enter pending acquire when the pool has no idle connection ready and has already reached its active connection limit. At that point, the client stops opening new sockets. The acquire request waits for an active connection to finish and return to the pool, or it fails when the queue limit or the timeout is reached. maxConnections controls how many active connections the provider allows for each remote address, pendingAcquireMaxCount controls how many acquire requests can wait once that limit has been reached, and pendingAcquireTimeout controls how long an acquire request can wait before Reactor Netty fails it.

package com.alex.demo.http;

import java.time.Duration;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import reactor.netty.resources.ConnectionProvider;

@Configuration(proxyBeanMethods = false)
class PaymentPoolConfig {

    @Bean(destroyMethod = "dispose")
    ConnectionProvider paymentConnectionProvider() {
        return ConnectionProvider.builder("payment-pool")
                .maxConnections(50)
                .pendingAcquireMaxCount(100)
                .pendingAcquireTimeout(Duration.ofMillis(400))
                .metrics(true)
                .build();
    }
}

With these values, up to 50 requests can hold active connections at the same time for that pool. After those active slots are full, up to 100 more acquire requests can wait. If a waiting acquire request does not receive a connection within 400 milliseconds, it fails instead of staying parked behind the full pool.
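The admission rules above can be modeled with a few counters. This plain-Java sketch is not Reactor Netty’s pool; AcquireModel and Outcome are hypothetical names, and the model ignores timing so it can focus on which requests hold a connection, which wait, and which are rejected outright. As in the builder, -1 stands for an unbounded queue.

```java
// Counts how a burst of simultaneous acquire requests is split up.
final class AcquireModel {

    enum Outcome { ACTIVE, PENDING, REJECTED }

    private final int maxConnections;
    private final int pendingAcquireMaxCount;
    private int active;
    private int pending;

    AcquireModel(int maxConnections, int pendingAcquireMaxCount) {
        this.maxConnections = maxConnections;
        this.pendingAcquireMaxCount = pendingAcquireMaxCount;
    }

    Outcome acquire() {
        if (active < maxConnections) {
            active++;            // free slot: the request gets a connection now
            return Outcome.ACTIVE;
        }
        if (pendingAcquireMaxCount == -1 || pending < pendingAcquireMaxCount) {
            pending++;           // slots full: wait in the pending queue
            return Outcome.PENDING;
        }
        return Outcome.REJECTED; // queue full too: fail before any HTTP traffic is sent
    }

    int active() {
        return active;
    }

    int pending() {
        return pending;
    }
}
```

With the payment-pool numbers, a burst of 160 simultaneous calls leaves 50 active, 100 waiting, and 10 rejected immediately. Any of the 100 waiting requests that are still waiting after 400 milliseconds fail as well.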

The pending queue is not a second connection pool. It contains requests waiting for a connection, not extra sockets that are ready to send traffic. That distinction is worth keeping in mind because a request waiting in the pending queue has not reached the downstream API yet. From the caller’s point of view, time is already passing, but the remote service has not received that HTTP request.

Positive values for pendingAcquireMaxCount let the client absorb short bursts. Setting it to -1 removes the upper queue limit, which can create a large backlog during a downstream slowdown. Zero is not an option: Reactor Netty’s builder only accepts -1 or a strictly positive value and throws an IllegalArgumentException otherwise, so the closest thing to failing fast is a very small queue paired with a very short pendingAcquireTimeout. Most services get better failure behavior from a bounded queue because the application stops accepting more waiting than it can complete in time.

We can get close to a fail-fast style for a low-latency client like this:

package com.alex.demo.http;

import java.time.Duration;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import reactor.netty.resources.ConnectionProvider;

@Configuration(proxyBeanMethods = false)
class PricingPoolConfig {

    @Bean(destroyMethod = "dispose")
    ConnectionProvider pricingConnectionProvider() {
        return ConnectionProvider.builder("pricing-pool")
                .maxConnections(30)
                // Smallest queue the builder accepts; 0 is rejected.
                .pendingAcquireMaxCount(1)
                // The single waiting request gives up almost immediately.
                .pendingAcquireTimeout(Duration.ofMillis(1))
                .metrics(true)
                .build();
    }
}

With this configuration, at most one request can wait when all 30 connections are busy, and only for about a millisecond; any further requests fail right away instead of joining a backlog. That choice can fit an endpoint where stale answers are not useful and a quick failure is better than waiting behind a backlog that may already be too old by the time it runs.

Metrics give us a better view of what the queue is doing during traffic. With metrics(true) set on the builder and Micrometer on the classpath, Reactor Netty publishes connection provider metrics through Micrometer. Values such as active connections, idle connections, pending connections, max connections, and pending acquire time tell us where the delay is building. If pending connections rise while active connections are at the limit, the pool is saturated. If pending time rises during the same window, callers are spending part of their latency budget before the outbound request is sent.

Burst Latency

Request bursts raise latency in layers. The fastest calls get idle connections that are already in the pool. If the pool still has capacity, later calls can open new connections. After the active connection limit is reached, the next calls wait in pending acquire. If the pending queue fills or the acquire timeout expires, those calls fail before they send HTTP traffic to the downstream service.

That chain explains why average latency can look fine during calm traffic but climb quickly during a rush. Warm idle connections avoid connection setup cost, while pending acquire adds waiting time before the request leaves the client. The user-facing request can become slow before the remote API has seen that specific outbound call.
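Here is a worked version of that chain with simple numbers. The sketch assumes every call takes exactly the same time, that all requests arrive together, that the pending queue is large enough to hold them, and that a finishing call hands its connection straight to the next waiter. Real traffic never behaves that neatly. BurstWaves is a hypothetical helper that only shows how waiting time grows in steps once the active limit is reached.

```java
import java.time.Duration;

// Idealized burst: n requests arrive at once, each call holds a connection for serviceTime.
final class BurstWaves {

    private BurstWaves() {
    }

    // Time request number i (0-based) waits in pending acquire before it gets a connection.
    static Duration waitFor(int i, int maxConnections, Duration serviceTime) {
        return serviceTime.multipliedBy(i / maxConnections);
    }

    // How many of the n requests would exceed the pending acquire timeout.
    static int timedOut(int n, int maxConnections, Duration serviceTime, Duration pendingAcquireTimeout) {
        int failed = 0;
        for (int i = 0; i < n; i++) {
            if (waitFor(i, maxConnections, serviceTime).compareTo(pendingAcquireTimeout) > 0) {
                failed++;
            }
        }
        return failed;
    }
}
```

With 50 connections, 300 millisecond calls, and a 400 millisecond pending acquire timeout, a burst of 120 requests runs in three waves that wait 0, 300, and 600 milliseconds, so the last 20 fail without sending anything. If the downstream slows to 500 milliseconds per call, the second wave also exceeds the timeout and 70 requests fail.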

Raising maxConnections can reduce local pending time when the downstream service can handle the extra concurrent requests. If the downstream service is already slow, a larger pool can send more traffic into a service that is already taking longer to answer. Bounded pending acquire, shorter timeouts, retry limits, and less fan-out per user request can give the client better behavior during that kind of pressure.
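One common way to reason about this is Little’s law: the average number of calls in flight equals the arrival rate multiplied by the average time each call takes. Applied to one downstream target, it gives a rough estimate of how many connections steady traffic keeps busy. The rates and latencies below are illustrative assumptions, and ConcurrencyEstimate is a hypothetical helper.

```java
import java.time.Duration;

// Little's law: connections in use ≈ requests per second × seconds per request.
final class ConcurrencyEstimate {

    private ConcurrencyEstimate() {
    }

    static long connectionsInUse(long requestsPerSecond, Duration latency) {
        // Work in milliseconds to avoid floating-point rounding, then round up.
        long inFlightTimesThousand = requestsPerSecond * latency.toMillis();
        return (inFlightTimesThousand + 999) / 1000;
    }
}
```

At 400 requests per second and 100 milliseconds per call, about 40 connections stay busy. If the downstream slows to 250 milliseconds, the same traffic keeps about 100 busy. A larger pool lets those extra calls through, but they are all landing on a service that is already answering more slowly.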

Fan-out can multiply burst traffic quickly. An incoming request may call several downstream endpoints, so a small rush of incoming requests can turn into a larger number of outbound calls. We can cap concurrent outbound calls inside a reactive flow with the concurrency argument on flatMap:

package com.alex.demo.orders;

import java.util.List;

import org.springframework.stereotype.Service;
import org.springframework.web.reactive.function.client.WebClient;

import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

@Service
class OrderBatchService {

    private final WebClient ordersWebClient;

    OrderBatchService(WebClient ordersWebClient) {
        this.ordersWebClient = ordersWebClient;
    }

    Mono<List<OrderStatus>> loadStatuses(List<String> orderIds) {
        return Flux.fromIterable(orderIds)
                .flatMap(this::loadStatus, 12)
                .collectList();
    }

    private Mono<OrderStatus> loadStatus(String orderId) {
        return ordersWebClient.get()
                .uri("/orders/{orderId}/status", orderId)
                .retrieve()
                .bodyToMono(OrderStatus.class);
    }
}

The 12 passed to flatMap limits how many status calls this flow runs at the same time. That limit does not replace the connection pool, but it reduces how aggressively this batch operation pushes requests toward the pool. The pool still protects the HTTP client across callers, while the local concurrency cap keeps this flow from filling the pool by itself.

Idle timeout behavior also affects burst latency. If idle connections are retired too quickly between bursts, the next rush has to open more new connections before reuse can help. If idle connections are held much longer than the downstream server or load balancer allows, the client may try to reuse a connection that the other side has already closed. A practical value is usually lower than the known idle timeout used by the downstream server or the load balancer in front of it.
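One way to turn that guidance into a number is to take the shortest idle timeout known along the path and subtract a safety margin. Here is a plain-Java sketch; IdleTimeoutChoice is a hypothetical helper, and the 60 and 75 second timeouts and the 5 second margin in the example are illustrative assumptions, not recommended defaults.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.List;

final class IdleTimeoutChoice {

    private IdleTimeoutChoice() {
    }

    // Pick a maxIdleTime a little below the shortest idle timeout on the path,
    // so the client retires a quiet connection before the other side closes it.
    static Duration maxIdleTime(List<Duration> knownIdleTimeouts, Duration safetyMargin) {
        Duration shortest = Collections.min(knownIdleTimeouts);
        Duration candidate = shortest.minus(safetyMargin);
        if (candidate.isNegative() || candidate.isZero()) {
            throw new IllegalArgumentException("safety margin is not smaller than the shortest idle timeout");
        }
        return candidate;
    }
}
```

With a load balancer that closes idle connections after 60 seconds and a server that does so after 75, this picks 55 seconds, which would then go into maxIdleTime on the ConnectionProvider builder.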

During a burst, active connections tell us how many pool slots are occupied, idle connections tell us how much immediate reuse capacity remains, and pending connections tell us how much demand is waiting for access. High active usage with growing pending demand points to pool saturation. High response time with high active usage can mean connections are staying busy longer because the downstream service is answering more slowly. High idle count during slow requests points away from pool saturation and toward delay elsewhere in the call chain.
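Those readings can be summarized as a small decision function. This is a plain-Java sketch of the rules of thumb in the paragraph above, not a monitoring API. PoolSnapshot, Diagnosis, and the 80 percent threshold for "high" active usage are hypothetical choices, and a single snapshot with pending demand stands in for the rising trend you would actually watch on a dashboard.

```java
// Hypothetical labels for what one window of pool readings suggests.
enum Diagnosis { POOL_SATURATED, DOWNSTREAM_SLOW, DELAY_ELSEWHERE, NO_POOL_PRESSURE }

// One window of pool gauges plus a flag for whether outbound responses were slow.
record PoolSnapshot(int active, int idle, int pending, int maxConnections, boolean slowResponses) {

    Diagnosis diagnose() {
        // Active slots full and demand waiting: the pool itself is the bottleneck.
        if (active >= maxConnections && pending > 0) {
            return Diagnosis.POOL_SATURATED;
        }
        // Assumed threshold: "high" active usage means 80% of maxConnections or more.
        boolean highActive = active * 100 >= maxConnections * 80;
        if (highActive && slowResponses) {
            return Diagnosis.DOWNSTREAM_SLOW;
        }
        // Idle connections available and nothing waiting, yet calls are slow: look past the pool.
        if (slowResponses && idle > 0 && pending == 0) {
            return Diagnosis.DELAY_ELSEWHERE;
        }
        return Diagnosis.NO_POOL_PRESSURE;
    }
}
```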

Good burst tuning keeps waiting bounded; the goal is not to hide every spike behind a long queue. Short queues can absorb a brief rush, but large queues can turn pressure into delayed failures and long user-facing latency. The pool should give the service room to handle normal bursts while still failing in a controlled way when the downstream call volume exceeds what the client and remote service can complete in time.

Conclusion

WebClient pool tuning comes down to how we let outbound calls get a connection, reuse it, wait for it, or fail when pressure gets too high. The connector sends our calls through Reactor Netty, the pool keeps reusable channels tied to their remote host, and the timeout settings separate pool waiting from socket connection time and response time. During bursts, active connections, idle connections, and pending acquire behavior show how traffic is moving through the client. Good tuning keeps that movement bounded so the service can reuse connections efficiently without hiding too much pressure behind long queues.

Alexander Obregon's Substack
