Reliable event delivery gets hard as soon as a service has to save database data and send that same change to other services. If the database commit finishes but the broker publish never happens, downstream services never receive the event. If the message is sent first and the transaction rolls back, those services react to something that never became true in the database. The outbox flow closes that gap by writing the business row and the outbox row inside one local transaction, then leaving message delivery to a later publisher that can retry after a crash, restart, or network failure. That keeps the database change and the outbound event tied to the same committed transaction without pulling in two-phase commit.
Why the Outbox Flow Exists
The hard part starts when one service has to commit its own database change and also record that an outbound event still needs to be delivered. Those two actions are tied to the same business moment, but they do not happen in the same place. The database transaction handles local state, while message delivery reaches outside the service and can fail later for reasons the database cannot control. The outbox flow exists to bridge that gap by turning the outbound event into committed database data first, so the service has a durable record of what still needs to be published.
Local Transaction Writes
Most Spring applications place this boundary on a public service method marked with @Transactional. Spring applies declarative transactions through a proxy, so code that enters through that proxy runs inside the transaction, but a call from one method to another method on the same object does not cross that boundary. For that reason, the business insert and the outbox insert belong in the same service method, or in separate beans reached through Spring-managed calls.
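The self-invocation point can be traced without Spring at all. The following is a minimal sketch that uses a JDK dynamic proxy as a stand-in for Spring's transactional proxy; OrderService, PlainOrderService, and ProxyBoundaryDemo are hypothetical names invented for illustration, and the recording handler only marks where a transaction would begin.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface standing in for a Spring-managed service.
interface OrderService {
    void placeOrder();
    void writeOutboxRow();
}

class PlainOrderService implements OrderService {
    @Override
    public void placeOrder() {
        // Self-invocation: this call goes straight to "this", not through the proxy.
        writeOutboxRow();
    }

    @Override
    public void writeOutboxRow() {
        // A real service would insert the outbox row here.
    }
}

final class ProxyBoundaryDemo {
    // Wraps the target in a JDK dynamic proxy that records every call it intercepts,
    // roughly where a transactional proxy would begin or join a transaction.
    static OrderService wrap(OrderService target, List<String> intercepted) {
        InvocationHandler handler = (proxy, method, args) -> {
            intercepted.add(method.getName());
            return method.invoke(target, args);
        };
        return (OrderService) Proxy.newProxyInstance(
                OrderService.class.getClassLoader(),
                new Class<?>[] {OrderService.class},
                handler);
    }

    // Returns the method names the proxy saw for one external call to placeOrder().
    static List<String> interceptedCalls() {
        List<String> intercepted = new ArrayList<>();
        wrap(new PlainOrderService(), intercepted).placeOrder();
        return intercepted;
    }
}
```

Only placeOrder shows up in the intercepted list. The nested writeOutboxRow call never reaches the handler, which is the same reason an annotation on an internally called method has no effect on its own.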
At the database level, the outbox row is just an extra table row. Typical columns hold a stable event id, aggregate type, aggregate id, event name, payload, status, creation time, retry timing, claim data, and attempt count. That row records a debt to publish a message, and it records that debt inside the same commit that saved the business change. If the transaction rolls back, both rows disappear. If the transaction commits, both rows remain. That transaction boundary is the heart of the flow.
Seeing the write side in code helps make that boundary more tangible:
package com.example.accounts;

import java.time.Instant;
import java.util.UUID;

import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

import com.example.outbox.OutboxMessage;
import com.example.outbox.OutboxMessageRepository;

@Service
public class AccountApplicationService {

    private final AccountRepository accountRepository;
    private final OutboxMessageRepository outboxMessageRepository;
    private final OutboxJson outboxJson;

    public AccountApplicationService(
            AccountRepository accountRepository,
            OutboxMessageRepository outboxMessageRepository,
            OutboxJson outboxJson) {
        this.accountRepository = accountRepository;
        this.outboxMessageRepository = outboxMessageRepository;
        this.outboxJson = outboxJson;
    }

    // Both saves below run inside one local transaction.
    @Transactional
    public UUID openAccount(String email) {
        UUID accountId = UUID.randomUUID();

        Account account = new Account(
                accountId,
                email,
                Instant.now());
        accountRepository.save(account);

        AccountOpenedEvent event = new AccountOpenedEvent(
                UUID.randomUUID(),
                accountId,
                email,
                Instant.now());

        // The outbox row records the pending event; no broker call happens here.
        OutboxMessage message = OutboxMessage.newMessage(
                event.eventId(),
                "Account",
                accountId.toString(),
                "account.opened",
                outboxJson.write(event),
                Instant.now());
        outboxMessageRepository.save(message);

        return accountId;
    }
}
What gives that method value is not the Java syntax. The point is that Account and OutboxMessage live inside the same transaction boundary. There is no broker call in this method, no attempt to contact Kafka or RabbitMQ from the request thread, and no stage where the service commits business data but leaves no durable outbound record behind. The database either accepts both inserts or rejects both.
Keeping the row itself plain also helps:
package com.example.outbox;

import java.time.Instant;
import java.util.UUID;

import jakarta.persistence.Column;
import jakarta.persistence.Entity;
import jakarta.persistence.EnumType;
import jakarta.persistence.Enumerated;
import jakarta.persistence.Id;
import jakarta.persistence.Table;

@Entity
@Table(name = "outbox_message")
public class OutboxMessage {

    @Id
    private UUID id;

    @Column(nullable = false, unique = true)
    private UUID eventId;

    @Column(nullable = false)
    private String aggregateType;

    @Column(nullable = false)
    private String aggregateId;

    @Column(nullable = false)
    private String eventType;

    @Column(nullable = false, length = 10000)
    private String payload;

    @Enumerated(EnumType.STRING)
    @Column(nullable = false)
    private OutboxStatus status;

    @Column(nullable = false)
    private Instant createdAt;

    @Column(nullable = false)
    private Instant nextAttemptAt;

    // Claim data stays null until a relay takes ownership of the row.
    private String claimedBy;
    private Instant claimedUntil;

    @Column(nullable = false)
    private int attemptCount;

    protected OutboxMessage() {
    }

    public static OutboxMessage newMessage(
            UUID eventId,
            String aggregateType,
            String aggregateId,
            String eventType,
            String payload,
            Instant createdAt) {
        OutboxMessage message = new OutboxMessage();
        message.id = UUID.randomUUID();
        message.eventId = eventId;
        message.aggregateType = aggregateType;
        message.aggregateId = aggregateId;
        message.eventType = eventType;
        message.payload = payload;
        message.status = OutboxStatus.NEW;
        message.createdAt = createdAt;
        // A new row is eligible for pickup immediately.
        message.nextAttemptAt = createdAt;
        message.attemptCount = 0;
        return message;
    }

    public UUID getEventId() {
        return eventId;
    }

    public String getEventType() {
        return eventType;
    }

    public String getPayload() {
        return payload;
    }

    public int getAttemptCount() {
        return attemptCount;
    }
}
Fields such as eventId, aggregateId, and status are there because later delivery needs stable data that can be read back from the database. The event id gives downstream consumers something durable to track. The aggregate fields point back to the business change that created the row. The status tells the service which rows are fresh and which have already moved past the first stage.
Spring also has @TransactionalEventListener, whose default phase is AFTER_COMMIT. That can fit in-process follow-up logic after a successful commit, but by itself it does not create a durable outbox row that survives a crash between commit and outbound delivery. That gap is a major reason the outbox table exists.
Partial Failure Timeline
Three failure branches explain why this flow exists. If a transaction dies before commit, neither the business row nor the outbox row becomes visible to other transactions. If failure comes after commit but before any broker send, the committed outbox row is still there for later delivery. The third branch appears after the broker accepts the message but before the service records that outbound step in the database. That can lead to repeat delivery later, which is why duplicate-safe consumers belong in the larger story.
A small failure-path example traces the first branch:
package com.example.billing;

import java.math.BigDecimal;
import java.time.Instant;
import java.util.UUID;

import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

import com.example.outbox.OutboxMessage;
import com.example.outbox.OutboxMessageRepository;

@Service
public class InvoiceApplicationService {

    private final InvoiceRepository invoiceRepository;
    private final OutboxMessageRepository outboxMessageRepository;

    public InvoiceApplicationService(
            InvoiceRepository invoiceRepository,
            OutboxMessageRepository outboxMessageRepository) {
        this.invoiceRepository = invoiceRepository;
        this.outboxMessageRepository = outboxMessageRepository;
    }

    @Transactional
    public void createInvoiceThenFail(BigDecimal total) {
        UUID invoiceId = UUID.randomUUID();
        invoiceRepository.save(new Invoice(invoiceId, total, Instant.now()));

        outboxMessageRepository.save(
                OutboxMessage.newMessage(
                        UUID.randomUUID(),
                        "Invoice",
                        invoiceId.toString(),
                        "invoice.created",
                        "{\"invoiceId\":\"" + invoiceId + "\",\"total\":\"" + total + "\"}",
                        Instant.now()));

        // The unchecked exception leaves the method before commit, so both inserts roll back.
        throw new IllegalStateException("Stopping before commit");
    }
}
If that method is called and the exception reaches the transaction boundary, Spring marks the transaction for rollback and neither insert remains in the database. That outcome is exactly what the flow needs. The service does not keep an invoice row with no matching outbound record, and it does not keep an outbox row for an invoice that never committed. Both writes rise or fall with the same transaction boundary.
This timeline also explains why a plain after-commit callback does not fully solve the problem. Code that runs only after commit is already past the database transaction. If the process stops in the narrow gap between commit completion and outbound delivery, a callback by itself leaves no table row behind for recovery. The outbox row closes that gap by turning the outbound message into committed data first, then handing later delivery to a separate stage.
A subtle detail hides inside these failure branches. The outbox flow is not trying to make broker delivery and database commit happen as one giant atomic action across two different systems. The database transaction covers only the local write side. After that point, the outbox row acts as a durable record of pending delivery. The service can crash, restart, reconnect, and still find which outbound messages need attention because the database already recorded them as committed rows.
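The first two branches can be traced with a small in-memory model. This is only a sketch under simplifying assumptions: InMemoryDb and FailureTimelineDemo are hypothetical names, the two lists stand in for the business table and the outbox table, and staging rows in temporary lists stands in for a real database transaction.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for a database holding two tables.
final class InMemoryDb {
    final List<String> businessRows = new ArrayList<>();
    final List<String> outboxRows = new ArrayList<>();

    interface TxWork {
        void run(List<String> business, List<String> outbox);
    }

    // Runs the work against staged lists; the staged rows become visible together
    // only if the work finishes without throwing.
    void inTransaction(TxWork work) {
        List<String> stagedBusiness = new ArrayList<>();
        List<String> stagedOutbox = new ArrayList<>();
        work.run(stagedBusiness, stagedOutbox); // an exception here discards both lists
        businessRows.addAll(stagedBusiness);
        outboxRows.addAll(stagedOutbox);
    }
}

final class FailureTimelineDemo {
    // Branch 1: failure before commit. Neither row becomes visible.
    static InMemoryDb failBeforeCommit() {
        InMemoryDb db = new InMemoryDb();
        try {
            db.inTransaction((business, outbox) -> {
                business.add("invoice-1");
                outbox.add("invoice.created:invoice-1");
                throw new IllegalStateException("Stopping before commit");
            });
        } catch (IllegalStateException expected) {
            // The simulated transaction rolled back; nothing was published either.
        }
        return db;
    }

    // Branch 2: commit succeeds, then the process "crashes" before any broker send.
    // The committed outbox row is still there for a later relay pass.
    static InMemoryDb commitThenCrashBeforeSend() {
        InMemoryDb db = new InMemoryDb();
        db.inTransaction((business, outbox) -> {
            business.add("invoice-2");
            outbox.add("invoice.created:invoice-2");
        });
        return db; // no send happened; the row remains as pending work
    }
}
```

In the first branch both lists stay empty. In the second, the outbox list still holds the pending event after the simulated crash, which is exactly what a restarted relay would look for.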
Publishing From the Outbox Table
After the transaction commits, the outbox row still has to leave the database and reach the broker. That second stage needs its own flow because the request that created the row is already finished. Spring scheduling fits well here after scheduling support is turned on with @EnableScheduling in a configuration class or the Spring Boot application class. Relay methods marked with @Scheduled can wake up at short intervals, scan the table, and hand pending rows to publishing code without tying that delivery step to a user request.
Row Pickup
Committed rows do nothing until a relay finds them. Short repeating polls are usually better than a huge pass that grabs every pending row in sight. Smaller batches reduce database pressure, keep lock time lower, and let the relay come back quickly for newly committed rows. That relay is detached from any user request, which is why a scheduled method fits so naturally. More than one service instance changes the pickup problem right away. Without a claim step, two nodes can read the same pending row and both try to publish it. Most outbox tables handle that by moving a row from NEW to IN_PROGRESS and storing a lease deadline. That lease gives a single node temporary ownership. If the node dies during delivery, the lease can expire and the row can be claimed again.
The relay can keep its job narrow by fetching a small batch of ids and passing them to the dispatch step:
package com.example.outbox;

import java.time.Instant;
import java.util.List;
import java.util.UUID;

import org.springframework.data.domain.PageRequest;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class OutboxRelay {

    private final OutboxMessageRepository repository;
    private final OutboxDispatchService dispatchService;

    public OutboxRelay(
            OutboxMessageRepository repository,
            OutboxDispatchService dispatchService) {
        this.repository = repository;
        this.dispatchService = dispatchService;
    }

    // Waits two seconds after each pass finishes before starting the next one.
    @Scheduled(fixedDelayString = "PT2S")
    public void relayBatch() {
        Instant now = Instant.now();
        // Fetch at most 25 ready ids per pass.
        List<UUID> ids = repository.findReadyIds(now, PageRequest.of(0, 25));
        for (UUID id : ids) {
            dispatchService.dispatchSingleMessage(id, now);
        }
    }
}
Its job stays intentionally narrow: the method asks the table for a limited batch and hands each id to a dispatch service. Narrow scheduled methods are easier to read, and they make later transaction boundaries less tangled.
The database should decide row ownership through a conditional update, so only one relay instance can claim it at a time:
package com.example.outbox;

import java.time.Instant;
import java.util.List;
import java.util.UUID;

import org.springframework.data.domain.Pageable;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Modifying;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.repository.query.Param;

public interface OutboxMessageRepository extends JpaRepository<OutboxMessage, UUID> {

    @Query("""
            select m.id
            from OutboxMessage m
            where m.nextAttemptAt <= :now
              and (
                    m.status = com.example.outbox.OutboxStatus.NEW
                    or (
                        m.status = com.example.outbox.OutboxStatus.IN_PROGRESS
                        and m.claimedUntil <= :now
                    )
              )
            order by m.createdAt asc
            """)
    List<UUID> findReadyIds(@Param("now") Instant now, Pageable pageable);

    @Modifying
    @Query("""
            update OutboxMessage m
            set m.status = com.example.outbox.OutboxStatus.IN_PROGRESS,
                m.claimedBy = :claimedBy,
                m.claimedUntil = :claimedUntil
            where m.id = :id
              and m.nextAttemptAt <= :now
              and (
                    m.status = com.example.outbox.OutboxStatus.NEW
                    or (
                        m.status = com.example.outbox.OutboxStatus.IN_PROGRESS
                        and m.claimedUntil <= :now
                    )
              )
            """)
    int claim(
            @Param("id") UUID id,
            @Param("claimedBy") String claimedBy,
            @Param("now") Instant now,
            @Param("claimedUntil") Instant claimedUntil);
}
That return value tells the relay if it truly owns the row. 1 means the state changed for this caller. 0 means a different node claimed it first or the row no longer matched the pickup rules. Database-side claiming closes a race where two nodes read the same row and both think they can move forward.
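The claim rule itself can be checked in isolation. The sketch below is a hypothetical in-memory model, not the repository used by the relay: LeaseTable is an invented name, the synchronized methods stand in for the row-level locking a database applies during an update, and the nextAttemptAt condition is left out to keep the example short.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of the claim rule; a real relay lets the database run it.
final class LeaseTable {
    private static final class Row {
        String status = "NEW";
        String claimedBy;
        Instant claimedUntil;
    }

    private final Map<String, Row> rows = new HashMap<>();

    synchronized void insertNew(String id) {
        rows.put(id, new Row());
    }

    // Mirrors the conditional update: returns 1 if this caller now owns the row, else 0.
    synchronized int claim(String id, String claimedBy, Instant now, Duration lease) {
        Row row = rows.get(id);
        if (row == null) {
            return 0;
        }
        boolean fresh = row.status.equals("NEW");
        boolean leaseExpired = row.status.equals("IN_PROGRESS")
                && !row.claimedUntil.isAfter(now); // same as claimedUntil <= now
        if (!fresh && !leaseExpired) {
            return 0;
        }
        row.status = "IN_PROGRESS";
        row.claimedBy = claimedBy;
        row.claimedUntil = now.plus(lease);
        return 1;
    }

    // Returns the current owner, or null if the row is unknown or unclaimed.
    synchronized String owner(String id) {
        Row row = rows.get(id);
        return row == null ? null : row.claimedBy;
    }
}
```

With a 30 second lease, a second node that tries to claim the row 10 seconds after the first gets 0, while a claim made once the lease deadline has been reached gets 1 and takes over ownership.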
Send Result Handling
Broker delivery belongs outside the transaction that created the business change. By the time the relay runs, the database write is already committed, so the relay should keep database transactions short and keep the network call to the broker outside those windows. TransactionTemplate fits well here because the relay can wrap claim, published marking, and failure recording in separate transaction callbacks.
Ordering is what makes this stage safe. Marking a row as published before the broker accepts the message creates silent loss. Letting the broker accept the message first and then failing to mark the row can lead to repeated delivery later. That second outcome is far better because downstream handlers can guard against duplicates, while silent loss is much harder to recover from.
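The difference between the two orderings shows up clearly in a toy model. The following sketch is illustrative only: OrderingDemo, its Broker, and its Row are hypothetical in-memory types, and the crashBeforeMark flag simulates a process that dies between the broker call and the database update.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model: one outbox row plus a broker that can be told to fail.
final class OrderingDemo {
    static final class Broker {
        final List<String> delivered = new ArrayList<>();
        boolean down;

        void publish(String eventId) {
            if (down) {
                throw new IllegalStateException("broker unavailable");
            }
            delivered.add(eventId);
        }
    }

    static final class Row {
        final String eventId;
        boolean published;

        Row(String eventId) {
            this.eventId = eventId;
        }
    }

    // Unsafe order: the row is marked before the broker accepts the message.
    static void markThenPublish(Row row, Broker broker) {
        row.published = true;
        broker.publish(row.eventId);
    }

    // Safe order: the row is marked only after the broker call returns.
    // crashBeforeMark simulates the process dying between the two steps.
    static void publishThenMark(Row row, Broker broker, boolean crashBeforeMark) {
        broker.publish(row.eventId);
        if (crashBeforeMark) {
            return;
        }
        row.published = true;
    }
}
```

With the broker down, markThenPublish leaves a row flagged as published and nothing delivered, which is the silent loss case. With publishThenMark and a simulated crash, the row stays unpublished, a second pass sends the same event id again, and the consumer sees a duplicate that it can detect.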
Dispatch code can keep the database state changes brief while leaving the broker call in the middle:
package com.example.outbox;

import java.time.Instant;
import java.util.UUID;

import org.springframework.stereotype.Service;
import org.springframework.transaction.PlatformTransactionManager;
import org.springframework.transaction.support.TransactionTemplate;

@Service
public class OutboxDispatchService {

    private final OutboxMessageRepository repository;
    private final BrokerPublisher brokerPublisher;
    private final TransactionTemplate tx;

    public OutboxDispatchService(
            OutboxMessageRepository repository,
            BrokerPublisher brokerPublisher,
            PlatformTransactionManager transactionManager) {
        this.repository = repository;
        this.brokerPublisher = brokerPublisher;
        this.tx = new TransactionTemplate(transactionManager);
    }

    public void dispatchSingleMessage(UUID id, Instant now) {
        // Short transaction 1: try to take ownership of the row with a 30 second lease.
        // "relay-1" is a placeholder owner name.
        boolean claimed = Boolean.TRUE.equals(tx.execute(status ->
                repository.claim(id, "relay-1", now, now.plusSeconds(30)) == 1));
        if (!claimed) {
            return;
        }

        OutboxMessage message = repository.findById(id).orElseThrow();
        try {
            // The broker call runs between the short transactions, not inside one.
            brokerPublisher.publish(message.getEventType(), message.getPayload(), message.getEventId());
            // Short transaction 2: mark the row only after the broker accepted the message.
            tx.executeWithoutResult(status ->
                    repository.markPublished(id, Instant.now()));
        } catch (Exception ex) {
            // Short transaction 3: record the failure so a later pass can retry.
            tx.executeWithoutResult(status ->
                    repository.recordFailure(
                            id,
                            message.getAttemptCount() + 1,
                            Instant.now(),
                            abbreviate(ex.getMessage())));
        }
    }

    // Trims long error messages to at most 500 characters before they are stored.
    private String abbreviate(String value) {
        if (value == null || value.length() <= 500) {
            return value;
        }
        return value.substring(0, 500);
    }
}
Claiming, published marking, and failure recording all stay inside short transactions. The broker call sits between those state changes, so a slow network trip does not leave database resources open for the full delivery attempt. The markPublished and recordFailure calls are assumed to be further update queries on OutboxMessageRepository that the repository listing does not show.
Retry Windows
Failed sends should not spin in a hot loop. Rows that cannot be delivered right now need a future attempt time, a higher attempt count, and enough failure detail for later diagnosis. Keeping that data on the row lets the relay stop, restart, or move to a different node without losing track of what should happen next.
Backoff belongs to that retry state. Early failures can retry quickly, while later failures should wait longer so the service does not hammer a broker that is down or only partly reachable. The exact spacing depends on traffic volume and delivery expectations, but the rule stays the same. Retry timing belongs in row data, not in local memory inside the relay. That way the table still tells the truth after a crash or deployment restart.
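One common way to space retries is exponential backoff with a cap. The sketch below is a hypothetical policy, not something the relay code above prescribes: RetryBackoff is an invented name, and the 2 second base and 5 minute cap are illustrative values that a real service would tune to its own traffic.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical backoff policy; the base delay and the cap are illustrative values.
final class RetryBackoff {
    private static final Duration BASE = Duration.ofSeconds(2);
    private static final Duration CAP = Duration.ofMinutes(5);

    // Delay before the next attempt, given how many attempts have failed so far (1, 2, 3, ...).
    static Duration delayAfter(int failedAttempts) {
        if (failedAttempts < 1) {
            throw new IllegalArgumentException("failedAttempts must be at least 1");
        }
        // Doubles each time: 2s, 4s, 8s, ... The shift is bounded so the multiplier stays small.
        int shift = Math.min(failedAttempts - 1, 20);
        Duration delay = BASE.multipliedBy(1L << shift);
        return delay.compareTo(CAP) > 0 ? CAP : delay;
    }

    // The value a relay would store in the row's nextAttemptAt column.
    static Instant nextAttemptAt(Instant failedAt, int failedAttempts) {
        return failedAt.plus(delayAfter(failedAttempts));
    }
}
```

Because the result is written to the row rather than held in a timer, any relay instance that reads the table later sees the same next attempt time.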
Rows also need a landing place after repeated failure. Some services keep trying for a long time. Others move the row into FAILED after a fixed attempt limit and surface that state for review. That stops a permanently broken message from blocking newer rows forever. A failed row still carries value because it preserves the payload, event id, broker error, and attempt count.
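The decision between another retry and parking the row can be written as a small pure function. This is a sketch under assumptions that the article does not fix: FailureOutcome is a hypothetical type, the limit of 5 attempts and the 30 second delay are illustrative, and returning a retryable row to NEW is one possible convention that matches the pickup query's status filter.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical row state after a failed send; MAX_ATTEMPTS and RETRY_DELAY are illustrative.
final class FailureOutcome {
    static final int MAX_ATTEMPTS = 5;
    static final Duration RETRY_DELAY = Duration.ofSeconds(30);

    final String status;         // "NEW" (eligible again later) or "FAILED" (parked for review)
    final int attemptCount;
    final Instant nextAttemptAt; // kept non-null; FAILED rows are excluded by status instead
    final String lastError;

    private FailureOutcome(String status, int attemptCount, Instant nextAttemptAt, String lastError) {
        this.status = status;
        this.attemptCount = attemptCount;
        this.nextAttemptAt = nextAttemptAt;
        this.lastError = lastError;
    }

    // Computes the values to store on the row after one more failed attempt.
    static FailureOutcome afterFailure(int previousAttempts, Instant failedAt, String error) {
        int attempts = previousAttempts + 1;
        if (attempts >= MAX_ATTEMPTS) {
            // Parked: payload, event id, and error stay on the row for later review.
            return new FailureOutcome("FAILED", attempts, failedAt, error);
        }
        return new FailureOutcome("NEW", attempts, failedAt.plus(RETRY_DELAY), error);
    }
}
```

Keeping this as a pure function makes the attempt limit easy to test without a database, and the caller only has to persist the returned values in one short transaction.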
Duplicate Safety
At-least-once delivery means the same event can arrive more than once. That happens when the broker accepts the message and the relay dies before the outbox row is marked as published. After restart, the row still looks unfinished, so the relay tries again. That behavior is normal for this flow, which means downstream consumers need their own duplicate guard.
Stable event ids are the usual answer. The producer writes the event id into the outbox row when the business transaction commits, then sends that same id with the broker message. Consumers can try to insert that id into a table with a unique constraint before applying their own state change. If the insert succeeds, the event has not been handled yet. If the insert fails on the unique constraint, the consumer already saw that message and can stop before repeating the side effect.
Consumer-side duplicate guards can stay short and local:
package com.example.shipping;

import java.time.Instant;
import java.util.UUID;

import org.springframework.dao.DataIntegrityViolationException;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class ShipmentEventHandler {

    private final ProcessedEventRepository processedEventRepository;
    private final ShipmentRepository shipmentRepository;

    public ShipmentEventHandler(
            ProcessedEventRepository processedEventRepository,
            ShipmentRepository shipmentRepository) {
        this.processedEventRepository = processedEventRepository;
        this.shipmentRepository = shipmentRepository;
    }

    @Transactional
    public void handleOrderPlaced(UUID eventId, UUID orderId) {
        try {
            // Flushing forces the insert now, so a unique-constraint violation surfaces here.
            processedEventRepository.saveAndFlush(new ProcessedEvent(eventId, Instant.now()));
        } catch (DataIntegrityViolationException ex) {
            // The event id is already recorded: a repeat delivery, so skip the side effect.
            return;
        }
        shipmentRepository.createPendingShipment(orderId, Instant.now());
    }
}
That handler records the event id first inside its own local transaction. If a repeated message arrives later with the same id, the insert fails and the method returns before creating a second shipment row. One caveat applies to this sketch: with Spring Data JPA, an exception that passes through the repository's own transactional proxy can mark the surrounding transaction rollback-only, so returning normally from the catch block may still end in an UnexpectedRollbackException at commit. Nothing is written twice in that case, but a handler that needs to return cleanly can check for the id first or use an insert that ignores conflicts. Some consumers get the same protection from a natural business constraint or an upsert-style write, but the principle stays the same. Repeated delivery is part of the deal, so the receiving side has to treat message handling as repeatable.
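The guard can also be exercised without a database. In the minimal sketch below, DedupingConsumer is a hypothetical in-memory consumer whose Set plays the role of the table with a unique constraint, and the synchronized methods stand in for the isolation a local transaction would provide.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory consumer; the Set plays the role of the unique-constrained table.
final class DedupingConsumer {
    private final Set<String> processedEventIds = new HashSet<>();
    private final List<String> shipments = new ArrayList<>();

    // Returns true if the event was applied, false if it was recognized as a repeat.
    synchronized boolean handleOrderPlaced(String eventId, String orderId) {
        // Set.add returns false when the id is already present, like a rejected insert.
        if (!processedEventIds.add(eventId)) {
            return false;
        }
        shipments.add("pending:" + orderId);
        return true;
    }

    // Returns a copy so callers cannot modify the consumer's internal state.
    synchronized List<String> shipments() {
        return new ArrayList<>(shipments);
    }
}
```

Delivering the same event id twice creates one shipment, not two, while a different event id is applied as usual.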
Conclusion
The outbox flow keeps one hard problem manageable by splitting it into two committed steps. First, the service writes its business change and its outbox row in the same local transaction, so the database ends up with a durable record of what still needs to be sent. After that, a relay reads pending rows, claims them, sends them to the broker, records success or failure, and retries later when delivery does not finish. That flow keeps database state, outbound events, and recovery after partial failure moving through the same stored record instead of relying on timing or luck.


