Every incoming request should have a time budget that covers the full trip through the service, from local logic and database access to remote HTTP calls, JSON serialization, and the final response. Without that shared limit, a downstream call can receive its own full timeout, spend almost the entire window waiting, and leave the rest of the request with little time left to finish. Deadline propagation handles that by turning the request budget into an exact expiration time, passing that value through the call chain, and checking how much time remains before starting the next remote operation.
Request Time Budgets
An incoming request gets one time window, and all processing for that request has to fit inside it. That includes local validation, controller logic, service calls, database queries, remote HTTP calls, JSON mapping, response serialization, and the final write back to the caller. If the request budget is 2 seconds, the app does not really have 2 seconds for inventory, 2 seconds for pricing, and 2 seconds for shipping. It has 2 seconds total for the full request.
Timeout behavior can get surprising here because separate timeout values do not automatically protect the full request. Developers usually add timeouts to HTTP clients, database pools, async calls, and queue reads because no service should wait forever, and that habit is still right. The missing piece is the shared request limit. Without it, one slow dependency can spend most of the caller’s patience before the app even reaches the next step.
Operation-level timeouts still have value, but they answer a narrower question. They tell one client, query, or async operation how long it may wait. Request budgets answer a wider question about how long the caller can wait for the whole response.
Timeouts Compared With Deadlines
The difference between a timeout and a deadline comes down to duration versus expiration. A timeout is a length of time, such as 250 milliseconds to connect or 900 milliseconds to wait for a response. A deadline is the exact moment when the request should be treated as out of time. If a request starts at 10:00:00.000 with a 2 second budget, the deadline is 10:00:02.000. Later steps subtract the current time from that deadline to find the remaining budget.
That difference changes how we read the flow of a request. Copying the same timeout into every call ignores earlier cost. Carrying a deadline keeps that cost visible. If validation took 80 milliseconds, a database lookup took 220 milliseconds, and inventory took 700 milliseconds, the next call does not receive the original 2 seconds. It receives what remains after those earlier steps have already spent part of the request window.
This small calculation keeps the time budget visible as the request moves through its steps:
import java.time.Duration;
import java.time.Instant;

public class DeadlineMathDemo {

    public static void runDemo() {
        Instant requestStartedAt = Instant.parse("2026-06-01T15:00:00Z");
        Duration requestBudget = Duration.ofSeconds(2);
        Instant deadline = requestStartedAt.plus(requestBudget);

        Instant afterValidation = requestStartedAt.plusMillis(80);
        Instant afterDatabaseLookup = afterValidation.plusMillis(220);
        Instant afterInventoryCall = afterDatabaseLookup.plusMillis(700);

        Duration remaining = Duration.between(afterInventoryCall, deadline);

        System.out.println("Deadline = " + deadline);
        System.out.println("Remaining milliseconds = " + remaining.toMillis());
    }
}

We start with a 2 second request budget, then subtract the time already spent by validation, a database lookup, and the inventory call. The result leaves 1000 milliseconds. That remaining time still has to cover the next remote call, response mapping, and the final response write. If the next client call has a 1500 millisecond read timeout, that timeout no longer matches the request’s actual budget.
The app can set that client timeout, but the caller’s request has only 1000 milliseconds left. Deadline math makes that visible before the next remote call starts.
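One way to act on that mismatch is to clamp a configured timeout to whatever the request has left. The sketch below is a minimal illustration, assuming a hypothetical helper named TimeoutClamp; neither the class nor its method comes from Spring or the JDK.

```java
import java.time.Duration;

// Hypothetical helper: picks the smaller of a client's configured timeout
// and the time the request still has left.
final class TimeoutClamp {

    private TimeoutClamp() {
    }

    static Duration effectiveTimeout(Duration configuredTimeout, Duration remainingBudget) {
        if (remainingBudget.isNegative() || remainingBudget.isZero()) {
            // Nothing is left, so the caller should not start the call at all.
            return Duration.ZERO;
        }
        return configuredTimeout.compareTo(remainingBudget) <= 0
                ? configuredTimeout
                : remainingBudget;
    }

    static void runDemo() {
        Duration effective = effectiveTimeout(Duration.ofMillis(1500), Duration.ofMillis(1000));
        System.out.println("Effective timeout = " + effective.toMillis() + " ms");
    }
}
```

With the numbers from the example, the 1500 millisecond read timeout is cut down to the 1000 milliseconds that remain.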
Deadlines also give us a better way to reject late processing before it starts. If the request has 40 milliseconds left and a downstream call usually needs several hundred milliseconds, starting that call adds traffic with very little chance of returning a useful result. The service can fail fast, skip optional processing when the API contract allows it, or return a timeout response that matches the caller’s time window.
This helper keeps the remaining-time check close to the request budget itself:
import java.time.Duration;
import java.time.Instant;

public final class RequestTimeBudget {

    private final Instant deadline;

    private RequestTimeBudget(Instant deadline) {
        this.deadline = deadline;
    }

    public static RequestTimeBudget fromNow(Duration totalBudget) {
        return new RequestTimeBudget(Instant.now().plus(totalBudget));
    }

    public Duration timeLeft() {
        Duration remaining = Duration.between(Instant.now(), deadline);
        if (remaining.isNegative() || remaining.isZero()) {
            return Duration.ZERO;
        }
        return remaining;
    }

    public boolean hasAtLeast(Duration minimumNeeded) {
        return timeLeft().compareTo(minimumNeeded) >= 0;
    }
}

The hasAtLeast method reads like a request-level question rather than a network-level question. We are not asking how long an HTTP socket can wait. We are asking if the request still has enough time left to start the next step with a reasonable chance of finishing.
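The three outcomes described earlier (start the call, skip optional work, or fail fast) can be written as a small decision function. This is only a sketch: NextStepPolicy and its Decision enum are hypothetical names, and the threshold passed in is whatever the service considers the minimum useful window for that step.

```java
import java.time.Duration;

// Hypothetical policy: maps the time left on the request to one of three actions.
final class NextStepPolicy {

    enum Decision { START_CALL, SKIP_OPTIONAL, FAIL_FAST }

    private NextStepPolicy() {
    }

    static Decision decide(Duration timeLeft, Duration minimumNeeded, boolean stepIsOptional) {
        if (timeLeft.compareTo(minimumNeeded) >= 0) {
            return Decision.START_CALL;
        }
        // Not enough time: optional steps are skipped, required steps end the request early.
        return stepIsOptional ? Decision.SKIP_OPTIONAL : Decision.FAIL_FAST;
    }
}
```

With 40 milliseconds left and a step that needs 300, a required step yields FAIL_FAST and an optional one yields SKIP_OPTIONAL. Whether a step counts as optional depends on the API contract, not on this helper.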
Timeout values still belong in the service. Connection timeouts, read timeouts, query timeouts, and reactive operator timeouts all stop specific waits from running too long. The deadline sits above those operation-level limits and keeps them tied to the caller’s full request window.
Budget Loss Across Calls
Remote calls spend the request budget one after the other unless they run in parallel. In a checkout-style flow, inventory may run first, pricing may run second, and shipping may run third. If every downstream client has a 1 second timeout, the code can wait up to 3 seconds across those calls before counting local logic, while the incoming caller may have expected an answer within 1500 milliseconds. That mismatch does not require a broken service or a missing timeout. The service can have timeouts everywhere and still miss the caller’s window because those timeouts are independent. Every downstream call receives a fresh wait limit, while the caller experiences one continuous wait from the outside.
The timeline below shows the mismatch in Java:
import java.time.Duration;

public class SeparateTimeoutsDemo {

    public static void runDemo() {
        Duration incomingRequestBudget = Duration.ofMillis(1500);

        Duration inventoryTimeout = Duration.ofMillis(1000);
        Duration pricingTimeout = Duration.ofMillis(1000);
        Duration shippingTimeout = Duration.ofMillis(1000);

        Duration possibleWait =
                inventoryTimeout
                        .plus(pricingTimeout)
                        .plus(shippingTimeout);

        System.out.println("Incoming budget = " + incomingRequestBudget.toMillis() + " ms");
        System.out.println("Possible downstream wait = " + possibleWait.toMillis() + " ms");
    }
}

The output shows 1500 milliseconds for the incoming request, while the downstream wait can add up to 3000 milliseconds. The local service could be behaving exactly as configured and still return too late for the caller.
Shared deadlines change the calculation after every completed step. If inventory consumes 950 milliseconds from a 1500 millisecond budget, only about 550 milliseconds remain. Pricing cannot honestly receive a full 1000 milliseconds anymore. If pricing then consumes 400 milliseconds, shipping has roughly 150 milliseconds left before the caller’s request window is gone.
import java.time.Duration;

public class SharedDeadlineTimeline {

    public static void runDemo() {
        Duration requestBudget = Duration.ofMillis(1500);
        Duration inventoryElapsed = Duration.ofMillis(950);
        Duration pricingElapsed = Duration.ofMillis(400);

        Duration afterInventory = requestBudget.minus(inventoryElapsed);
        Duration afterPricing = afterInventory.minus(pricingElapsed);

        System.out.println("Remaining after inventory = " + afterInventory.toMillis() + " ms");
        System.out.println("Remaining after pricing = " + afterPricing.toMillis() + " ms");
    }
}

We should not read that last 150 milliseconds as an automatic failure. Some calls are cached, local, or fast enough to finish inside a small window. The decision should come from the time left, not from the original per-call timeout. If shipping needs a network round trip and response body parsing, 150 milliseconds may be too small for a useful attempt.
Retries can drain the same budget even faster. Starting a retry can sound harmless when it is framed as one more attempt, but it still spends time from the original caller window. If a first attempt waits 600 milliseconds, then a retry waits another 600 milliseconds, that pair already consumed 1200 milliseconds before the service has handled the rest of the request.
import java.time.Duration;

public class RetryBudgetDemo {

    public static void runDemo() {
        Duration requestBudget = Duration.ofMillis(1500);
        Duration firstAttempt = Duration.ofMillis(600);
        Duration retryAttempt = Duration.ofMillis(600);
        Duration responseMargin = Duration.ofMillis(100);

        Duration remaining =
                requestBudget
                        .minus(firstAttempt)
                        .minus(retryAttempt)
                        .minus(responseMargin);

        System.out.println("Remaining after retry plan = " + remaining.toMillis() + " ms");
    }
}

The final 100 milliseconds in that calculation is reserved for response handling. Returning a timeout response still takes time because the service has to map the failure, set the status, build the body if it returns one, serialize it, and send it back. Spending the last millisecond on a remote call can leave the app with no room to return a useful answer.
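The same arithmetic can be turned around into a planning question: given the budget, a per-attempt wait, and a response margin, how many attempts fit? The sketch below assumes a hypothetical RetryPlan helper and treats every attempt as costing its full wait, which is the worst case.

```java
import java.time.Duration;

// Hypothetical planner: counts how many full-length attempts fit inside the
// request budget once the response margin has been set aside.
final class RetryPlan {

    private RetryPlan() {
    }

    static int attemptsThatFit(Duration requestBudget, Duration perAttemptWait, Duration responseMargin) {
        if (perAttemptWait.isZero() || perAttemptWait.isNegative()) {
            throw new IllegalArgumentException("perAttemptWait must be positive");
        }
        Duration usable = requestBudget.minus(responseMargin);
        if (usable.isZero() || usable.isNegative()) {
            return 0;
        }
        // Duration.dividedBy(Duration) returns the whole number of times the divisor fits.
        return Math.toIntExact(usable.dividedBy(perAttemptWait));
    }
}
```

For a 1500 millisecond budget, 600 millisecond attempts, and a 100 millisecond margin, two attempts fit. A third attempt would push past the margin reserved for the response.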
Parallel calls change the timeline, but they do not remove the need for a request budget. If inventory, pricing, and shipping all start at the same time, the total wait is closer to the slowest call than the sum of all three. The request still has one deadline, and a slow shipping call can still consume the full window if the service waits for it before responding.
When parallel processing is required, the deadline still tells the service when to stop waiting for the remaining branch and move into timeout handling. Budget loss comes from the gap between operation-level waiting and caller-level waiting. Operation-level timeouts answer how long one thing can wait, while request budgets answer how long the caller can wait for everything. Spring Boot services need both ideas because callers see the full request, not the internal list of waits that happened along the way.
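For parallel branches, one plain-JDK way to stop waiting at the deadline is to combine the branches with CompletableFuture.allOf and bound the wait with the remaining time. The sketch below is an assumption about how a service might structure this, not a Spring feature; ParallelDeadlineWait and its Outcome enum are hypothetical names.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: waits for parallel branches, but never longer than the
// time the request has left.
final class ParallelDeadlineWait {

    enum Outcome { ALL_FINISHED, DEADLINE_REACHED, BRANCH_FAILED }

    private ParallelDeadlineWait() {
    }

    static Outcome waitForAll(Duration remaining, CompletableFuture<?>... branches) {
        CompletableFuture<Void> all = CompletableFuture.allOf(branches);
        try {
            all.get(remaining.toMillis(), TimeUnit.MILLISECONDS);
            return Outcome.ALL_FINISHED;
        } catch (TimeoutException ex) {
            // Marks unfinished branches as cancelled. This does not interrupt
            // work that is already running on another thread.
            for (CompletableFuture<?> branch : branches) {
                branch.cancel(true);
            }
            return Outcome.DEADLINE_REACHED;
        } catch (ExecutionException ex) {
            return Outcome.BRANCH_FAILED;
        } catch (InterruptedException ex) {
            // Restore the interrupt flag and treat the wait as failed.
            Thread.currentThread().interrupt();
            return Outcome.BRANCH_FAILED;
        }
    }
}
```

The total wait is bounded by the remaining budget rather than by the slowest branch, so a stuck shipping call ends in DEADLINE_REACHED and the service can move into timeout handling.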
Deadline Propagation in Code
Spring Boot code has to carry the deadline from the edge of the request to the places that spend time. We capture the deadline near the HTTP entry point, store it for the active request flow, pass it to downstream calls, and check it before slower operations begin. The storage choice depends on the web stack. Servlet-based Spring MVC code can keep the deadline in a request-scoped holder during the request thread. Reactive WebFlux code should pass the deadline through Reactor context or method arguments because reactive execution can resume on a different thread. The main idea stays the same. Later code reads the same expiration time instead of creating a fresh budget.
Inbound Budget Capture
Servlet requests enter the app through the servlet container before they reach a controller, so OncePerRequestFilter gives us a good place to read or create the deadline. We can look for a caller-provided deadline header, fall back to a local default, cap overly large caller values, and reject the request if the deadline has already expired.
The deadline type below keeps the time math away from controller and service methods:
package com.example.deadline;

import java.time.Duration;
import java.time.Instant;

public final class RequestDeadline {

    private final Instant expiresAt;

    private RequestDeadline(Instant expiresAt) {
        this.expiresAt = expiresAt;
    }

    public static RequestDeadline after(Duration budget) {
        return new RequestDeadline(Instant.now().plus(budget));
    }

    public static RequestDeadline fromHeader(
            String rawHeader,
            Duration fallbackBudget,
            Duration maximumBudget) {
        Instant localMaximum = Instant.now().plus(maximumBudget);

        if (rawHeader == null || rawHeader.isBlank()) {
            return after(fallbackBudget);
        }

        try {
            long epochMillis = Long.parseLong(rawHeader);
            Instant callerDeadline = Instant.ofEpochMilli(epochMillis);
            if (callerDeadline.isAfter(localMaximum)) {
                return new RequestDeadline(localMaximum);
            }
            return new RequestDeadline(callerDeadline);
        } catch (NumberFormatException ex) {
            return after(fallbackBudget);
        }
    }

    public Duration remaining() {
        Duration remaining = Duration.between(Instant.now(), expiresAt);
        if (remaining.isNegative() || remaining.isZero()) {
            return Duration.ZERO;
        }
        return remaining;
    }

    public boolean expired() {
        return remaining().isZero();
    }

    public long toEpochMillis() {
        return expiresAt.toEpochMilli();
    }
}

We store the deadline as an Instant, not as a remaining duration. Duration values age immediately after we calculate them, while an absolute expiration time can be checked again later. The remaining method reads the clock at the moment code calls it, so service code gets the latest amount of time left.
The local cap protects service policy. Callers should not be able to grant themselves a longer request window by sending a far-future header. In the code above, maximumBudget sets the local upper bound, while fallbackBudget handles requests that do not send a header or send a malformed value.
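To see the fallback and cap rules with concrete values, the same logic can be written as a pure function that takes the current time as an argument. This is a sketch for illustration only: DeadlineHeaderRules is a hypothetical name, and the 2 second fallback and 5 second maximum in the demo are assumed values, not the ones a real service must use.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical restatement of the header rules with an explicit "now",
// so each outcome can be checked against fixed values.
final class DeadlineHeaderRules {

    private DeadlineHeaderRules() {
    }

    static Instant resolve(String rawHeader, Instant now, Duration fallbackBudget, Duration maximumBudget) {
        Instant localMaximum = now.plus(maximumBudget);
        if (rawHeader == null || rawHeader.isBlank()) {
            return now.plus(fallbackBudget); // no header: use the local default
        }
        try {
            Instant callerDeadline = Instant.ofEpochMilli(Long.parseLong(rawHeader));
            // A far-future caller value is cut down to the local maximum.
            return callerDeadline.isAfter(localMaximum) ? localMaximum : callerDeadline;
        } catch (NumberFormatException ex) {
            return now.plus(fallbackBudget); // malformed header: use the local default
        }
    }

    static void runDemo() {
        Instant now = Instant.parse("2026-06-01T15:00:00Z");
        Duration fallback = Duration.ofSeconds(2); // assumed value
        Duration maximum = Duration.ofSeconds(5);  // assumed value
        String farFuture = Long.toString(now.plusSeconds(60).toEpochMilli());
        System.out.println("Missing header -> " + resolve(null, now, fallback, maximum));
        System.out.println("Far-future header -> " + resolve(farFuture, now, fallback, maximum));
        System.out.println("Malformed header -> " + resolve("soon", now, fallback, maximum));
    }
}
```

A header that asks for 60 seconds resolves to the 5 second local maximum, while a missing or malformed header resolves to the 2 second fallback.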
For servlet code that stays on the request thread, ThreadLocal can hold the deadline during the request:
package com.example.deadline;

public final class RequestDeadlineContext {

    private static final ThreadLocal<RequestDeadline> CURRENT = new ThreadLocal<>();

    private RequestDeadlineContext() {
    }

    public static void set(RequestDeadline deadline) {
        CURRENT.set(deadline);
    }

    public static RequestDeadline get() {
        return CURRENT.get();
    }

    public static void reset() {
        CURRENT.remove();
    }
}

This holder stores only the active request deadline and removes it when request handling finishes. The reset call is not optional in servlet code because containers reuse threads. Leaving request data in a reused thread can cause a later request to read the wrong deadline.
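The thread reuse problem is easy to reproduce outside a servlet container. The sketch below uses a single-thread executor as a stand-in for a container worker thread; ThreadReuseDemo and the string values are hypothetical and exist only for this illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo: two "requests" run one after the other on the same worker thread.
final class ThreadReuseDemo {

    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    private ThreadReuseDemo() {
    }

    // Returns what the second request reads from the ThreadLocal.
    static String valueSeenBySecondRequest(boolean cleanUpAfterFirst) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            Runnable firstRequest = () -> {
                CURRENT.set("deadline-of-request-1");
                if (cleanUpAfterFirst) {
                    CURRENT.remove();
                }
            };
            Callable<String> secondRequest = CURRENT::get;

            worker.submit(firstRequest).get();
            return worker.submit(secondRequest).get();
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        } catch (ExecutionException ex) {
            throw new IllegalStateException(ex);
        } finally {
            worker.shutdownNow();
        }
    }
}
```

Without cleanup, the second request reads the first request's value. With cleanup, it reads null, which is what a fresh request should see before its own deadline is set.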
Now we can connect the deadline type and holder inside a Spring filter:
package com.example.deadline;

import java.io.IOException;
import java.time.Duration;

import jakarta.servlet.FilterChain;
import jakarta.servlet.ServletException;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;

import org.springframework.http.HttpStatus;
import org.springframework.stereotype.Component;
import org.springframework.web.filter.OncePerRequestFilter;

@Component
public class RequestDeadlineFilter extends OncePerRequestFilter {

    private static final String DEADLINE_HEADER = "X-Request-Deadline-Ms";
    private static final Duration FALLBACK_BUDGET = Duration.ofSeconds(2);
    private static final Duration MAXIMUM_BUDGET = Duration.ofSeconds(2);

    @Override
    protected void doFilterInternal(
            HttpServletRequest request,
            HttpServletResponse response,
            FilterChain filterChain) throws ServletException, IOException {
        String rawDeadline = request.getHeader(DEADLINE_HEADER);
        RequestDeadline deadline = RequestDeadline.fromHeader(
                rawDeadline,
                FALLBACK_BUDGET,
                MAXIMUM_BUDGET);

        if (deadline.expired()) {
            response.sendError(
                    HttpStatus.GATEWAY_TIMEOUT.value(),
                    "Request deadline expired");
            return;
        }

        RequestDeadlineContext.set(deadline);
        try {
            filterChain.doFilter(request, response);
        } finally {
            RequestDeadlineContext.reset();
        }
    }
}

Here we read the header before the controller runs, build the request deadline, and store it for the servlet flow. If the deadline has already expired, the controller does not run. That stops late requests before they start database access or remote HTTP calls that already missed the caller’s time window.
Outbound Header Propagation
Downstream services can follow the same deadline only when we send it forward. The outbound call should carry the absolute expiration time, and it can also send the current remaining milliseconds for logs or gateway behavior. The absolute value does the main job because every service can calculate its own remaining time from the same expiration point.
Spring’s RestClient supports request interceptors, so we can add the deadline header to every outbound request from a client. The interceptor can also stop the call before it leaves the service if the request is already out of time:
package com.example.deadline;

import java.io.IOException;
import java.net.SocketTimeoutException;
import java.time.Duration;

import org.springframework.http.HttpRequest;
import org.springframework.http.client.ClientHttpRequestExecution;
import org.springframework.http.client.ClientHttpRequestInterceptor;
import org.springframework.http.client.ClientHttpResponse;

public class DeadlineForwardingInterceptor implements ClientHttpRequestInterceptor {

    private static final String DEADLINE_HEADER = "X-Request-Deadline-Ms";
    private static final String REMAINING_HEADER = "X-Request-Timeout-Ms";

    @Override
    public ClientHttpResponse intercept(
            HttpRequest request,
            byte[] body,
            ClientHttpRequestExecution execution) throws IOException {
        RequestDeadline deadline = RequestDeadlineContext.get();
        if (deadline == null) {
            return execution.execute(request, body);
        }

        Duration remaining = deadline.remaining();
        if (remaining.isZero()) {
            throw new SocketTimeoutException("Request deadline expired before outbound call");
        }

        request.getHeaders().set(DEADLINE_HEADER, Long.toString(deadline.toEpochMillis()));
        request.getHeaders().set(REMAINING_HEADER, Long.toString(remaining.toMillis()));
        return execution.execute(request, body);
    }
}

This interceptor does not choose the HTTP client’s connect or read timeout. It reads the request deadline, adds headers, and prevents late outbound traffic. Client-level timeouts still belong in the client configuration because sockets and response reads need their own limits.
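Some HTTP clients also accept a timeout per request, which lets the remaining budget shape an individual call. As a plain-JDK illustration, and not a description of RestClient, the sketch below builds a java.net.http.HttpRequest whose timeout is the smaller of a configured limit and the time left. JdkDeadlineRequest is a hypothetical name, and the header names are copied from the interceptor.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;
import java.time.Instant;

// Hypothetical builder using the JDK HTTP client types as a stand-in.
final class JdkDeadlineRequest {

    private JdkDeadlineRequest() {
    }

    static HttpRequest build(URI uri, Instant deadline, Instant now, Duration configuredTimeout) {
        Duration remaining = Duration.between(now, deadline);
        if (remaining.isZero() || remaining.isNegative()) {
            throw new IllegalStateException("Request deadline expired before outbound call");
        }
        // Use whichever is smaller: the configured limit or the time left.
        Duration perCall = remaining.compareTo(configuredTimeout) < 0 ? remaining : configuredTimeout;
        return HttpRequest.newBuilder(uri)
                .timeout(perCall) // must be positive, which the check above guarantees
                .header("X-Request-Deadline-Ms", Long.toString(deadline.toEpochMilli()))
                .header("X-Request-Timeout-Ms", Long.toString(remaining.toMillis()))
                .GET()
                .build();
    }
}
```

Building the request does not send anything. With 400 milliseconds left and a 900 millisecond configured limit, the request carries a 400 millisecond timeout and both deadline headers.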
Header propagation also needs a trust boundary. Services usually accept deadline headers from internal callers, API gateways, or trusted service-to-service traffic. Public callers should not be allowed to extend the service’s request window, which is why inbound code caps the deadline before storing it.
The remaining-time header helps with logs, but it should not replace the absolute deadline. The value changes while the request travels across the network. The receiving service should prefer the absolute deadline header, then calculate remaining time when the request arrives.
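A small calculation shows why the remaining-time header goes stale in transit. The sketch below assumes a hypothetical ArrivalBudget helper, a made-up 120 millisecond network and queueing delay, and reasonably synchronized clocks between the two services.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical receiver-side helper: derives the remaining time from the
// absolute deadline header at the moment the request arrives.
final class ArrivalBudget {

    private ArrivalBudget() {
    }

    static Duration fromAbsoluteDeadline(long deadlineEpochMillis, Instant arrivedAt) {
        Duration remaining = Duration.between(arrivedAt, Instant.ofEpochMilli(deadlineEpochMillis));
        return remaining.isNegative() ? Duration.ZERO : remaining;
    }

    static void runDemo() {
        Instant sentAt = Instant.parse("2026-06-01T15:00:00Z");
        long deadlineHeader = sentAt.plusMillis(800).toEpochMilli(); // X-Request-Deadline-Ms
        long remainingHeader = 800;                                  // X-Request-Timeout-Ms when sent
        Instant arrivedAt = sentAt.plusMillis(120);                  // assumed transit delay

        System.out.println("Stale remaining header = " + remainingHeader + " ms");
        System.out.println("Remaining from deadline = "
                + fromAbsoluteDeadline(deadlineHeader, arrivedAt).toMillis() + " ms");
    }
}
```

The remaining-time header still says 800 milliseconds on arrival, while the absolute deadline yields 680 milliseconds, which is the figure the receiving service should act on.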
RestClient Per Call Flow
Synchronous HTTP calls block the current thread until the response arrives, a connection failure happens, or the configured client timeout fires. That means timeout layering has to be deliberate. The request deadline tells us how much caller time remains, while the RestClient request factory controls network-level limits such as connection time and read time.
Spring Boot can build the request factory from HttpClientSettings. We can set conservative network limits and attach the deadline interceptor to the client:
package com.example.deadline;

import java.time.Duration;

import org.springframework.boot.http.client.ClientHttpRequestFactoryBuilder;
import org.springframework.boot.http.client.HttpClientSettings;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.http.client.ClientHttpRequestFactory;
import org.springframework.web.client.RestClient;

@Configuration(proxyBeanMethods = false)
public class InventoryRestClientConfig {

    @Bean
    RestClient inventoryRestClient(RestClient.Builder builder) {
        HttpClientSettings settings = HttpClientSettings.defaults()
                .withConnectTimeout(Duration.ofMillis(250))
                .withReadTimeout(Duration.ofMillis(900));

        ClientHttpRequestFactory requestFactory =
                ClientHttpRequestFactoryBuilder.detect().build(settings);

        return builder
                .baseUrl("https://inventory.internal")
                .requestFactory(requestFactory)
                .requestInterceptor(new DeadlineForwardingInterceptor())
                .build();
    }
}

We give the client a short connect timeout because failed connection attempts should not consume much of the request window. The read timeout is longer because response data can reasonably take more time than opening a connection. Those values are still operation-level limits. Before a service method calls the client, it should check the request deadline too.
package com.example.deadline;

import java.time.Duration;

import org.springframework.http.HttpStatus;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestClient;
import org.springframework.web.server.ResponseStatusException;

@Service
public class InventoryLookupService {

    private final RestClient inventoryRestClient;

    public InventoryLookupService(RestClient inventoryRestClient) {
        this.inventoryRestClient = inventoryRestClient;
    }

    public InventoryDto findBySku(String sku) {
        RequestDeadline deadline = RequestDeadlineContext.get();
        if (deadline != null && !deadlineHasRoom(deadline, Duration.ofMillis(75))) {
            throw new ResponseStatusException(
                    HttpStatus.GATEWAY_TIMEOUT,
                    "Request deadline expired before inventory lookup");
        }

        return inventoryRestClient.get()
                .uri("/inventory/{sku}", sku)
                .retrieve()
                .body(InventoryDto.class);
    }

    private boolean deadlineHasRoom(RequestDeadline deadline, Duration minimumRemaining) {
        return deadline.remaining().compareTo(minimumRemaining) >= 0;
    }
}

We check for a small remaining window before starting the remote call. The value does not have to match the read timeout. It represents the lowest amount of caller time that makes the call worth attempting. Some endpoints can use a smaller threshold. Other endpoints need a larger one because they still have database updates, response mapping, or follow-up calls after the remote response returns.

Blocking cancellation has limits though. If calling code wraps a blocking RestClient operation in a future and stops waiting early, the underlying HTTP exchange can keep running until the client timeout or I/O layer stops it. Short connection timeouts, bounded read timeouts, a deadline check before starting the call, and retry logic tied to the same deadline make the blocking flow safer.
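The gap between "the waiter gave up" and "the work stopped" can be shown with plain JDK concurrency types. In the sketch below, a latch that is never released stands in for a slow remote response; AbandonedBlockingCall is a hypothetical name and no HTTP client is involved.

```java
import java.time.Duration;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical demo: the caller stops waiting, but the wrapped blocking work keeps going.
final class AbandonedBlockingCall {

    private AbandonedBlockingCall() {
    }

    // Returns true when the blocking task is still running after the waiter timed out.
    static boolean stillRunningAfterWaiterGivesUp(Duration waitBudget) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        CountDownLatch slowResponse = new CountDownLatch(1); // never released in this demo
        try {
            Future<String> call = worker.submit(() -> {
                slowResponse.await(); // blocks like a read with no bound of its own
                return "response";
            });
            try {
                call.get(waitBudget.toMillis(), TimeUnit.MILLISECONDS);
                return false;
            } catch (TimeoutException ex) {
                // The waiter stopped here. The task itself has not finished.
                return !call.isDone();
            } catch (ExecutionException ex) {
                return false;
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            worker.shutdownNow(); // interrupts the worker so the demo does not leak a thread
        }
    }
}
```

The method returns true: after the waiter times out, the task is still blocked. In a real service, the client's own connect and read timeouts are what eventually end that work.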
Retry decisions should read the deadline again before every attempt. The second attempt should not receive the same wait window as the first attempt. It should run only if the request still has enough time left for that attempt and for the response that follows.
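A retry loop that follows this rule might look like the sketch below. It assumes a hypothetical convention where an attempt returns an empty Optional on failure, and it takes a java.time.Clock so the time source can be replaced in tests. DeadlineAwareRetry is not part of Spring or any retry library.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical retry loop: re-reads the deadline before every attempt.
final class DeadlineAwareRetry {

    private DeadlineAwareRetry() {
    }

    static <T> Optional<T> call(
            Supplier<Optional<T>> attempt,
            Instant deadline,
            Duration minimumPerAttempt,
            int maxAttempts,
            Clock clock) {
        for (int i = 0; i < maxAttempts; i++) {
            Duration remaining = Duration.between(clock.instant(), deadline);
            if (remaining.compareTo(minimumPerAttempt) < 0) {
                // Not enough time for this attempt plus the response that follows.
                return Optional.empty();
            }
            Optional<T> result = attempt.get();
            if (result.isPresent()) {
                return result;
            }
        }
        return Optional.empty();
    }
}
```

The minimumPerAttempt value should cover both the attempt and the response handling after it. When the remaining time drops below that threshold, the loop stops without starting another call.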
WebClient Cancellation Timing
Reactive HTTP calls return a publisher instead of blocking the caller thread. That gives WebClient a natural place to apply the remaining request time with Reactor’s timeout operator. If the remote response does not arrive before the remaining time expires, the publisher fails and the subscription is canceled from the client side.

Reactive code should not depend on the servlet-style ThreadLocal holder. Reactor chains can resume on a different thread, so the deadline should travel through Reactor context or be passed as a method argument. The example below reads the deadline from Reactor context, forwards the header, and applies the remaining time to the outbound call:
package com.example.deadline;

import java.time.Duration;
import java.util.concurrent.TimeoutException;

import org.springframework.stereotype.Service;
import org.springframework.web.reactive.function.client.WebClient;

import reactor.core.publisher.Mono;

@Service
public class ReactiveInventoryLookupService {

    private final WebClient inventoryWebClient;

    public ReactiveInventoryLookupService(WebClient.Builder builder) {
        this.inventoryWebClient = builder
                .baseUrl("https://inventory.internal")
                .build();
    }

    public Mono<InventoryDto> findBySku(String sku) {
        return Mono.deferContextual(contextView -> {
            RequestDeadline deadline = contextView.get(RequestDeadline.class);
            Duration remaining = deadline.remaining();
            if (remaining.isZero()) {
                return Mono.error(new TimeoutException(
                        "Request deadline expired before inventory lookup"));
            }

            return inventoryWebClient.get()
                    .uri("/inventory/{sku}", sku)
                    .header("X-Request-Deadline-Ms", Long.toString(deadline.toEpochMillis()))
                    .header("X-Request-Timeout-Ms", Long.toString(remaining.toMillis()))
                    .retrieve()
                    .bodyToMono(InventoryDto.class)
                    .timeout(remaining);
        });
    }
}

The timeout operator protects the reactive chain from waiting past the request deadline. Cancellation affects the local subscriber first. The caller no longer waits for the response, and the client can close or release resources based on the HTTP client state. The downstream service can continue briefly unless it notices the closed connection, reads the propagated deadline, or has its own timeout checks.
Network-level timeouts still belong beside the reactive deadline. Connection timeout protects the connection phase, while response timeout protects the HTTP exchange at the client library layer. The reactive timeout call protects the caller’s remaining request window. These limits cover different parts of the flow, so we treat them as layers rather than duplicates.
Reactor context also lets deeper reactive calls read the deadline without passing it through every method parameter. At the request entry point, we can attach the deadline to the context before the chain reaches the service layer. Downstream methods can then read it with deferContextual at the moment the call starts, which gives them the latest remaining time instead of an old duration.
Conclusion
Deadline propagation keeps request timing tied to the full request instead of letting every downstream call start with a fresh wait window. We start with one budget, turn it into an expiration time, carry that value through Spring Boot filters, client interceptors, RestClient calls, and WebClient chains, then check the remaining time before more processing begins. That flow keeps slow downstream calls from quietly consuming the whole request window, while connect timeouts, read timeouts, and reactive timeout calls still protect the individual waits inside the larger request.


