Heavy-Light Decomposition in Java

Jun 11, 2026

Tree queries get expensive when every request walks from node to node across a long route. Heavy-light decomposition gives that route an array-based layout instead. The tree is rooted, each node chooses one heavy child, and those heavy chains receive nearby positions in an array. From there, one route between two nodes breaks into a small set of array ranges. With a segment tree over those positions, path sums, path maximums, and other associative queries run in O(log^2 n) time after O(n) preprocessing.

Tree Paths as Chains

Generally speaking, tree routes become much easier to reason about after we choose a root. The root gives every other node a parent, a depth, and a subtree below it. From there, we mark one child edge from a node as heavy and treat the remaining child edges as light. Following heavy edges gives us downward chains that group large branches before we assign any array positions later.

The main benefit comes from how those chains are chosen. We keep the largest child branch attached to its parent through a heavy edge, while smaller child branches start new chains after a light edge. That split gives us long heavy runs through the tree and keeps the number of light crossings small.

Heavy Child Choice

We choose the child with the largest subtree as the heavy child for its parent. Subtree size means the number of nodes below a node, including that node itself. If node 3 has child branches rooted at 4, 8, and 11, we compare the sizes of those three branches. The child branch with the largest size gets the heavy edge from 3.

Only one child can be heavy for a node. If two children have the same subtree size, either child can be selected and the time bound still holds. We do not need extra tie logic beyond making a consistent choice in code, because the selected child still belongs to the group of largest children.

We can compute the subtree sizes with a first DFS, then store the largest child while each recursive call returns:

private int dfsSizes(int node, int parentNode) {
    parent[node] = parentNode;
    subtreeSize[node] = 1;
    heavyChild[node] = -1;

    int largestSize = 0;

    for (int next : graph.get(node)) {
        if (next == parentNode) {
            continue;
        }

        depth[next] = depth[node] + 1;
        int childSize = dfsSizes(next, node);
        subtreeSize[node] += childSize;

        if (childSize > largestSize) {
            largestSize = childSize;
            heavyChild[node] = next;
        }
    }

    return subtreeSize[node];
}

We visit neighbors through the adjacency list, and the parent check prevents the traversal from moving back toward the node we came from. Leaf nodes return a subtree size of 1, while internal nodes add the returned sizes from their children. After every child has returned, subtreeSize[node] contains the full branch size below that node.

The value -1 marks a node with no heavy child. That happens at leaves, and it also tells the later chain-building pass where a heavy chain stops. If a node has exactly one child, that child becomes the heavy child because it is the largest available child branch.

The comparison uses > rather than >=. With >, the first child found with the largest size remains selected during a tie. With >=, the later tied child would replace it. Either choice is valid, so long as we select only one of the largest children.

For small tests, we can add a tiny helper that reads the selected edge in the parent-to-child direction:

private boolean isHeavyEdge(int from, int to) {
    return heavyChild[from] == to;
}

We do not need this helper for the final query logic, but it can make test output easier to read while building the decomposition. If isHeavyEdge(3, 8) returns true, then 8 is the selected heavy child of 3. If it returns false, that edge is either light or not the parent-to-child direction in the rooted tree.

After we select heavy children, the chains are already defined by the heavyChild array. Starting at any node and repeatedly following heavyChild[node] walks down a single heavy chain until the next value becomes -1. Child edges that were not selected as heavy become the light edges that connect separate chains.

Light Edge Count

The light edges give heavy-light decomposition its logarithmic bound. Moving from a node to a light child means we are entering a child branch that was not the largest branch. Because the heavy child has the largest subtree, every light child has a subtree size no greater than half of the current node’s subtree size.

If a light child had more than half of the parent’s subtree, no sibling branch could be larger than it. That child would have been selected as heavy. So every light move drops into a branch that is at most half as large as the branch we just left.

That halving fact limits how far light edges can stack up on a route from the root to any node. Starting with n nodes, the branch size can only be cut in half O(log n) times before it reaches 1. Heavy edges may continue for a long distance, but they remain within the same chain, so they do not increase the number of chain changes.

We can check the idea on a built tree by counting light edges while moving from a node back toward the root:

private int countLightEdgesToRoot(int node) {
    int count = 0;

    while (parent[node] != -1) {
        int p = parent[node];

        if (heavyChild[p] != node) {
            count++;
        }

        node = p;
    }

    return count;
}

We move upward through parent, then inspect the parent’s selected heavy child. If the current node is not that selected child, the downward edge from the parent to the current node is light, so we add to the count.

This count is not needed by the query code, but it matches the reason the decomposition is fast. We can think of a route between two nodes as climbing toward a shared ancestor, then continuing toward the other node. Every time we leave the top of a chain and move to the parent side, we cross a light edge. Because a root-to-node route can cross only O(log n) light edges, the number of chain segments stays within O(log n) too.

Heavy edges do not shrink the subtree size in the same way, and they do not need to. We can group heavy edges into chains so a later range query can cover an entire chain segment at a time. Light edges control how often we have to move between chains, while heavy edges give those chains their length.

The preprocessing that picks heavy children still takes linear time. The DFS touches each tree edge a constant number of times, so the subtree and heavy-child pass costs O(n). The arrays for parent, depth, subtreeSize, and heavyChild each use O(n) space.

Java Range Mapping

With heavy chains already selected, the next step is to give tree nodes positions in a flat array. The original node labels still exist, but the decomposition adds a second label for range queries. That second label is the node’s array position. We place nodes from the same heavy chain next to each other. When two nodes belong to the same chain, the route between them turns into a normal interval in the base array. The segment tree can then read that interval without walking across tree edges one at a time.

Position Numbers

We assign positions during a second DFS after the subtree sizes and heavy children are already known. The traversal starts at a chain head, records the current node, then follows the heavy child before it visits light children. That heavy-first order keeps one heavy chain packed into consecutive indexes.

Every node gets two stored values. The head array stores the top node of the chain that contains the current node. The position array stores the node’s index inside the flat base array. If two nodes have the same head, they belong to the same chain. The heavy child keeps the same chain head as its parent. Light children start new chains, so each light child becomes its own chain head. That difference gives the later query code a fast way to tell when two nodes are still in separate chains.

The second DFS can be written like this:

private void decompose(int node, int chainHead, int[] values) {
    head[node] = chainHead;
    position[node] = currentPosition;
    base[currentPosition] = values[node];
    currentPosition++;

    if (heavyChild[node] != -1) {
        decompose(heavyChild[node], chainHead, values);
    }

    for (int next : graph.get(node)) {
        if (next != parent[node] && next != heavyChild[node]) {
            decompose(next, next, values);
        }
    }
}

We store the current node at the array slot held by currentPosition. The value from values[node] moves into base[currentPosition], so the segment tree later reads node values in decomposition order rather than original node-label order.

The recursive call for heavyChild[node] keeps the same chainHead. That is the part that keeps a heavy chain continuous in the base array. The loop afterward visits light children, and every light child starts a fresh chain with itself as the head.

For node values, the mapping is direct. Node v stores its value at base[position[v]]. For edge values, the usual mapping stores the value of the edge from parent[v] to v at base[position[v]], because v is the deeper endpoint of that edge. The root has no parent edge, so the root’s edge value is normally 0.

Segment Tree Storage

The segment tree reads the base array created by the decomposition. It does not need parent links, depths, chain heads, or heavy children. It receives an array and answers range requests over indexes.

For route sums with node values, every range answer is a sum over base[left] through base[right]. The same structure can handle maximum, minimum, greatest common divisor, or bitwise operations when we change the merge operation and the neutral return value.

The build step below stores sums:

private void build(int index, int left, int right, int[] base) {
    if (left == right) {
        tree[index] = base[left];
        return;
    }

    int mid = left + (right - left) / 2;

    build(index * 2, left, mid, base);
    build(index * 2 + 1, mid + 1, right, base);

    tree[index] = tree[index * 2] + tree[index * 2 + 1];
}

The leaf case stores a single base-array value. Internal nodes store the sum from their two child ranges, so a larger query interval can be answered by combining smaller covered intervals.

Point updates follow the same layout. When a node value changes, we do not search by the original node label inside the segment tree. We update position[node], because that is where the node was placed in the base array.

public void updateNodeValue(int node, int newValue) {
    update(1, 0, n - 1, position[node], newValue);
}

private void update(int index, int left, int right, int target, int newValue) {
    if (left == right) {
        tree[index] = newValue;
        return;
    }

    int mid = left + (right - left) / 2;

    if (target <= mid) {
        update(index * 2, left, mid, target, newValue);
    } else {
        update(index * 2 + 1, mid + 1, right, target, newValue);
    }

    tree[index] = tree[index * 2] + tree[index * 2 + 1];
}

Updating a single node costs O(log n) because we move down one branch of the segment tree and recompute stored sums on the return. Building the segment tree costs O(n), and the segment tree array needs O(n) space.

The storage rules stay consistent as long as the same value mapping is applied across build, query, and update logic. Node-value queries include both endpoint nodes. Edge-value queries usually skip the lowest shared ancestor’s position in the final same-chain range, because that position represents the edge entering the ancestor rather than an edge below it.

Route Query Flow

The route query moves between chains, not individual nodes. We compare the chain heads of the two current nodes. If both nodes have the same head, the remaining route is inside a single chain, so one range query finishes it. If the heads differ, we process the deeper chain first and move that side upward.

Depth is compared at the chain-head level, not at the node level. The deeper chain head belongs to the chain farther from the root, so the segment from that head down to the current node is part of the route we need.

The main loop for a node-value sum query looks like this:

public long queryRoute(int a, int b) {
    long result = 0L;

    while (head[a] != head[b]) {
        if (depth[head[a]] < depth[head[b]]) {
            int temp = a;
            a = b;
            b = temp;
        }

        int chainTop = head[a];
        result += querySegmentTree(position[chainTop], position[a]);
        a = parent[chainTop];
    }

    int left = Math.min(position[a], position[b]);
    int right = Math.max(position[a], position[b]);

    result += querySegmentTree(left, right);
    return result;
}

We swap a and b when b is in the deeper chain. That keeps the loop on the side that needs to move upward first. The range from position[chainTop] to position[a] covers the current chain segment from its top down to the current node.

After that range is counted, a = parent[chainTop] moves to the node above the current chain. That jump crosses from the current chain to the parent side through a light edge. The number of those jumps is bounded by O(log n), so the loop repeats a logarithmic number of times before both nodes share a chain.

The final range needs the smaller and larger positions because the two nodes can appear in either order within the same chain.

int left = Math.min(position[a], position[b]);
int right = Math.max(position[a], position[b]);

long lastPart = querySegmentTree(left, right);

For node values, that range includes both endpoint nodes. For edge values, the final same-chain range is adjusted so the ancestor position is not counted as an incoming edge.

int left = Math.min(position[a], position[b]) + 1;
int right = Math.max(position[a], position[b]);

if (left <= right) {
    result += querySegmentTree(left, right);
}

That edge-value version belongs to the final same-chain case after the two sides have reached the same chain. The + 1 skips the higher node’s position, because the edge stored there belongs to the edge above that node.

The time cost for a full route query is O(log^2 n). The decomposition creates O(log n) chain ranges, and the segment tree answers each range in O(log n).

Practical Trace

Take a tree rooted at 0, with a heavy chain that goes from 0 to 2 to 5. During decomposition, those nodes receive consecutive positions because we follow the heavy child before visiting light children. A lighter branch from node 0, such as the branch rooted at 1, starts a different chain with its own head.

Now we can read a query from a node in the branch under 1 to node 5. The two nodes do not start in the same chain. We first take the chain whose head is deeper, query from that chain head down to the current node, then move to the parent of that chain head. After that jump, the remaining route may be in the chain that contains 0, 2, and 5.

The decomposition arrays give us enough information to trace that movement without walking every tree node. This small helper prints the current node, its chain head, and its array position:

private void printNodeState(int node) {
    System.out.println(
        "node=" + node
            + ", head=" + head[node]
            + ", position=" + position[node]
    );
}

Running that helper on the two query endpoints helps verify the chain placement during testing. Nodes with the same head belong to the same chain, and their position values define the range inside the base array.

During a query, no range should cross from one chain into a different chain. Each segment tree request belongs to one chain segment. We count one chain piece, jump to the parent side, and repeat until the last remaining piece lies inside a shared chain.

We can print the chain pieces before connecting the query to the segment tree:

private void printRoutePieces(int a, int b) {
    while (head[a] != head[b]) {
        if (depth[head[a]] < depth[head[b]]) {
            int temp = a;
            a = b;
            b = temp;
        }

        int chainTop = head[a];

        System.out.println(
            position[chainTop] + " to " + position[a]
        );

        a = parent[chainTop];
    }

    int left = Math.min(position[a], position[b]);
    int right = Math.max(position[a], position[b]);

    System.out.println(left + " to " + right);
}

The printed ranges are the same ranges a segment tree would receive for a node-value route query. Each range belongs to the flat array, while still representing a connected piece of the original tree route.

For preprocessing, the second DFS that assigns positions costs O(n), because every node receives exactly one array position. The base array needs O(n) space. Combined with the segment tree, the range-mapping side remains linear in space, while each route query runs in O(log^2 n).

Conclusion

Heavy-light decomposition gets its speed from two connected steps. We first root the tree, measure subtree sizes, and choose the largest child branch as the heavy edge from each node. Those heavy edges form chains, while light edges limit how many chain changes a route can have. Then we place chain nodes into array positions, build a segment tree over that array, and turn each route query into a small set of range queries. The preprocessing costs O(n), updates cost O(log n), route queries cost O(log^2 n), and the extra storage stays at O(n).

Share Alexander Obregon's Substack

Thanks for reading! If you found this helpful, highlighting, clapping, or leaving a comment really helps me out.

Alexander Obregon's Substack

Discussion about this post

Ready for more?