Building a Summary Table with GROUP BY and CASE

Turn raw rows into totals, buckets, and labeled reports

Jul 22, 2025

When you’re building reports or summaries from raw data, one of the most practical ways to shape the output is by combining GROUP BY with CASE. They work well together when you need to group values, apply custom labels, or break things into categories that make sense for what you’re trying to measure.

Grouping Rows and Categorizing Values

When data is scattered across hundreds or thousands of rows, it’s hard to make sense of it without some structure. Raw data on its own doesn’t tell you much unless you can pull patterns or summaries from it. That’s where GROUP BY starts to come into play. It lets you pull related rows together based on shared values, and from there, you can start calculating totals, counting rows, or shaping the output to match the story you want your data to tell. On its own, it’s already useful. But when paired with CASE, it opens up more ways to slice and label what those groups actually mean.

How GROUP BY Restructures a Result Set

GROUP BY takes a column, or set of columns, and treats the values in those columns as markers for how to group the rows. When the database engine processes your query, it loops through every row and builds sets based on those column values. Then it runs aggregate functions over each group separately.

Say you have a simple sales table with customer purchases:

You want to know how much each customer has spent in total. A GROUP BY on customer_id lets you collapse multiple purchases into one row per customer:

The engine collects all the rows where customer_id is the same, and then runs SUM(purchase_total) for each group. Without GROUP BY, the database has no reason to organize the rows that way. So this structure lets you see relationships in the data that wouldn’t be visible otherwise.

Now suppose you add a second column to the group:

Now the grouping is split not just by customer but also by location. So if a customer shopped at two locations, they’ll appear more than once in the output. The grouping becomes more specific, which gives you a more detailed breakdown.

GROUP BY doesn’t rearrange the original rows on disk or in memory, it controls how the output is shaped based on shared values. Each grouped set runs through the aggregate functions separately, and that’s what shows up in your result.

The Role of CASE in Defining Buckets

CASE is what gives you control over labeling and reclassifying values inside your query. It doesn’t group anything by itself, but it lets you decide how the data should be labeled before it’s grouped. You can think of it as a value translator. Instead of grouping by the raw value from a column, you group by the result of your logic.

Let’s say you’re looking at the same sales table, but you want to count how many purchases fall into custom price brackets. Something like “Small,” “Medium,” and “Large” purchases based on the purchase total.

You can do this with a CASE expression right inside your SELECT:

The CASE logic runs for every row, turning the raw number into a category. The database then groups those labeled rows and counts them. It doesn't store this label anywhere unless you insert it into a new table, but for the purpose of the query, it works like any other grouped value. Most engines (PostgreSQL, SQL Server, Oracle) require you to repeat the full CASE expression in the GROUP BY because aliases from the SELECT clause aren’t available yet. Some platforms like MySQL, BigQuery, DuckDB, and Trino do let you use the alias directly, so you can shorten the query to purchase_size in those cases. Writing out the full expression keeps the query portable across different systems.

Also, CASE doesn’t have to return text. You can make it return numbers or other values too. It just depends on what you’re trying to shape the result into.

Filtering Inside the Aggregates Instead of WHERE

There’s a difference between filtering rows before they get grouped and filtering rows inside the aggregate functions themselves. If you use a WHERE clause, it limits which rows go into the GROUP BY. But sometimes you want to count or sum different conditions side by side without excluding any rows. That’s where CASE fits inside aggregates.

Here’s a version that shows how many sales were “Small,” “Medium,” or “Large,” but all in one row instead of separate ones:

This runs all the conditions row by row. If the condition matches, a 1 is returned and counted. If it doesn’t, the result is NULL, and NULLs don’t get counted. That’s why you don’t see a need for ELSE in this case.

The value inside the COUNT doesn’t have to be 1. You can use SUM instead if you want to add up the totals in each category:

This totals the actual purchase amounts, broken into your own categories. You still get a single row back, but with more context across columns.

Combining GROUP BY with CASE lets you shape the output around your own categories while keeping all the conditions visible in the same result. It’s a fast way to build summaries that follow real business rules without needing to rework the data later.

Labeling and Reshaping for Custom Reports

When you’ve grouped the rows and added basic logic with CASE, you can start reshaping the output to match the structure of a real report. Raw aggregates help, but most reporting needs go a little further. You might want to split data by two dimensions at the same time, lay everything out in columns instead of rows, or show totals across different categories side by side. These tweaks aren’t extra features of SQL. They’re just different ways to arrange what the engine already knows how to do. The same tools still apply, but now you’re guiding them into a format that’s easier to scan and more useful to whoever is reading the result.

Mixing Multiple Groupings

When you need to show totals across more than one category, you can group by multiple columns. Instead of showing total sales by product, maybe you also want to break that down by sales region. GROUP BY lets you do that by stacking columns together. Each unique combination becomes its own group.

Take this inventory_log table as a starting point:

If you group by product_name, the totals combine across regions:

That gives one row per product. But to see how each product performs in each region, add the second column to the GROUP BY:

Now you get a row for every combination of product and region. That level of grouping is more useful when comparing categories across dimensions. If some combinations don’t exist in the data, they won’t show up in the result. So not every product will have the same number of rows unless all combinations are present.

This is how most real reports are built. The totals don’t always tell the full story unless you split them out across dimensions that matter to whoever’s reading the numbers.

Building Horizontal Summary Tables

Sometimes the vertical layout makes the report harder to scan. If you’re grouping values like delivery speed or sales category, you might want to shift those categories into their own columns. That way, you get just one row per item, and each column tells you how it performed in a specific category. This is where CASE inside aggregates comes back in. Let’s say you’re working with a ticket_sales table that logs ticket revenue by event. Each ticket has a price band that tells you whether it was “Standard,” “Premium,” or “VIP.” But instead of stacking those into rows, you want to lay them out horizontally across columns.

Now you get one row per event. The columns break down the revenue by ticket type. That makes it easier to compare across categories without needing to pivot the data later. Each CASE runs through the full set of rows, checks if the type matches, and adds up the revenue when it does.

This layout is useful in dashboards or reports that track financials, metrics by category, or any kind of labeled quantity across a fixed set of labels.

Using SUM Instead of COUNT

Most earlier examples use COUNT(*) to tally rows. But when your data includes numeric values that matter, SUM usually fits better. Instead of counting how many times something happened, you're measuring how much of it happened.

Now let’s go back to the orders table, but this time focus on how much was spent in different delivery windows. If you already have a column for total cost, you can group and label those amounts by time range.

This tells you how much revenue came from each type of delivery window, broken down by customer type. Instead of counting orders, you’re now adding up the money from those orders. The structure stays the same. Only the aggregate function changes, and now the totals reflect actual value instead of volume.

You can mix COUNT and SUM in the same query, too. That’s helpful when you want to show both how many orders were placed and how much they were worth, all grouped the same way.

Labeling NULLs to Avoid Blank Results

One thing that throws off reports more often than people expect is NULL values. If you group by a column that has any NULLs, those rows still get included, but the group label shows up blank. That can make it look like something was skipped or dropped when the rows are actually valid, just missing a value.

To clean that up, you can catch the NULLs with a CASE expression and turn them into a clearer label. It doesn’t change the data, just the way the group appears in the result.

Suppose you’re reviewing product returns and want to group them by return_reason, but sometimes that reason isn't filled out.

Now instead of seeing a blank label in your report, you get a named group that makes the missing data easier to spot. This makes the output easier to read and avoids confusion in shared reports.

You can apply this same fix to other fields, too. Anywhere blanks might appear, a CASE expression can help fill the gap with a label that makes things clearer.

Conclusion

GROUP BY and CASE work together by splitting rows into labeled groups and applying your logic as the data is processed. GROUP BY shapes the structure of the output based on repeated values, and CASE gives you a way to control how those values are named, counted, or summed. When the database runs your query, it walks through the rows, applies your conditions, and fills in the results based on how everything lines up. It doesn’t store anything, and it doesn’t look ahead. Everything happens row by row, group by group, using the logic you’ve written. That’s what makes this pairing so effective for building summary tables that show exactly what’s happening in the data.

Share Alexander Obregon's Substack

Alexander Obregon's Substack