Avoiding Cartesian Products by Fixing Broken JOINs in SQL

Aug 20, 2025

Accidental Cartesian products are one of the most common mistakes that turn a simple query into something that drags down performance. They happen when a JOIN between two or more tables is missing the condition that tells the database how rows should match. Without that condition, the database pairs every row from one table with every row from the other, creating a result set that explodes in size. A query that should return a few dozen rows can suddenly return millions, which slows things down and produces results that don’t make sense.

How Cartesian Products Happen

When queries produce far more rows than expected, the root cause often comes from how joins are handled. A join tells the database how rows from one table should connect to rows from another, but if that instruction is missing or incorrect, the database falls back on combining everything with everything. This is how a Cartesian product happens, and it explains why results can suddenly explode in size.

The Mechanics of a JOIN

A join is a directive to the database that says, “take these two sets of rows and line them up based on some condition.” That condition, usually written in the ON clause, acts as the bridge between the tables. If the bridge is properly defined, rows that relate to each other get paired up, and irrelevant ones are excluded.

Take a basic case where you have a table of customers and a table of invoices. A customer can have many invoices, and each invoice belongs to one customer. The natural link is the customer_id field.

This query produces only the rows where a customer’s id matches the customer_id in the invoices table. If a customer has three invoices, you’ll see three rows for that customer. If another customer has none, they won’t appear in this inner join at all.

Another example helps reinforce how the join condition acts as the glue. Suppose you join employees with departments:

The result set connects people to the department they belong to, not to every department in the company. Without the join condition, things would go very differently.

What Happens When the Condition Is Missing

When no condition is specified, the database produces a Cartesian product. This means every row in the first table is paired with every row in the second. The database is following instructions literally. It was told to combine the tables, and without guidance on how, it produces all possible combinations.

Let’s go back to the customers and invoices tables, but this time omit the condition:

If you had 200 customers and 2,000 invoices, this query would generate 400,000 rows. Instead of seeing only the invoices that belong to each customer, every customer gets paired with every invoice, which makes the data meaningless and the query painfully slow.

Alex’s record might be listed with every invoice in the system, not just the three she actually owns. Ben’s record would be listed with those same invoices as well, giving the false impression that everyone is tied to everything. A Cartesian product happens when tables are combined without any condition linking them. That can occur with a comma-separated FROM list that has no filtering in the WHERE clause, or with an explicit CROSS JOIN. Inner joins and left joins, on the other hand, need an ON or USING clause, and leaving it out will trigger an error in most database systems.

Why the Result Grows So Fast

The explosion in row count comes from simple multiplication. Every row in one table pairs with every row in the other. If table A has 1,000 rows and table B has 5,000 rows, the result has 5 million rows. Add a third table with 10,000 rows, and the output now balloons to 50 billion rows.

This is not an abstract risk, it can happen in real systems where tables hold millions of rows. Queries that should finish in milliseconds suddenly run for minutes or longer, consuming CPU and memory, and sometimes never finishing at all.

A short example makes the math clear. Take two small tables:

A proper join on a link field doesn’t exist here, but let’s say someone writes:

This result set is 9 rows because 3 × 3 = 9. Each user is matched with every role. If users were in the thousands and roles in the hundreds, the multiplication effect would be huge.

The bigger the inputs, the worse the outcome. Imagine a case where a sales table has 1 million rows and someone joins it to a products table of 5,000 rows with no condition. That’s 5 billion combinations, which explains why queries like this often grind a database server to a halt. It’s why missing join conditions cause such dramatic blowups, and why catching and fixing them early is so important before a query ever reaches production workloads.

How to Spot and Fix Broken JOINs

Queries that create Cartesian products are often caught too late, after they’ve already slowed down reports or applications. Spotting the early signs and fixing the underlying joins keeps queries both correct and efficient.

Spotting the Signs of a Cartesian Product

The easiest clue that a query is broken comes from the row count. If a dataset that should only be a few hundred rows suddenly returns hundreds of thousands, a missing join condition is usually the cause. Another giveaway is repetition. A single row from one table appears over and over, paired with unrelated rows from the other table.

Take a small example. Imagine two tables, students with 50 rows and classes with 10 rows. A join without a condition looks like this:

The result isn’t 50 or 10 rows but 500. Every student appears paired with every class, no matter what they’re actually enrolled in. If you see the same student listed with courses they never registered for, that’s the signal.

Tools like EXPLAIN help here. If the plan shows a CROSS JOIN or a join between relations with no predicate, that confirms there’s no join condition between those tables in the plan. Many people use this to track down queries that balloon unexpectedly.

Correcting the Join Condition

The cure is to write a condition that ties the two tables together based on their real relationship. If students are linked to classes through a table called enrollments, the join condition should reflect that.

Now the output only shows the classes a student is actually enrolled in. Instead of multiplying everything together, the query follows the relationship path through the linking table. Sometimes the fix is as simple as adding back the condition that was accidentally left out. In the earlier customers and invoices case, the repair was just ON c.id = i.customer_id. Other times, it requires being precise about which fields actually connect the tables.

Another point to think about is that join conditions don’t always have to use equality. In certain cases, like matching ranges or effective dates, other operators may be valid. What matters is that the condition reflects the true logic of how the rows relate.

Spotting Typos in Join Fields

Not all broken joins come from missing conditions. Some come from pointing to the wrong column. A typo can be enough to cause results that look valid at a glance but are completely wrong.

Take this query:

The join is written, but it connects emp_id to dept_id, which makes no logical sense. It produces rows, but the matches are meaningless. The correct join should use the dept_id from the employee record.

A subtle mistake that is known to happen is mixing up columns that share a similar name. A query that joins on orders.id = shipments.id instead of orders.id = shipments.order_id may not fail outright, but the matches will be wrong, and the row count will be far higher than expected. Careful use of aliases and checking table schemas can help prevent these errors.

Side by Side Example

A direct comparison makes the problem stand out. Suppose you have:

Broken join without a condition:

Output:

That’s six rows, while the correct answer is only three. With the proper join condition:

Correct output:

Seeing the results side by side makes it clear how a missing condition turns a meaningful result into a meaningless expansion, and that contrast is far easier to understand when it’s visualized.

Avoiding Hidden Cartesian Products with Multiple Joins

Problems compound as more tables are joined. Even when most conditions are correct, leaving out just one causes the same multiplication effect.

Take a query that joins three tables:

The first two joins are correct, but the teachers table is joined without any condition. The result multiplies every row from the student-course combination with every teacher, producing a flood of rows that don’t reflect reality.

The fix is straightforward when you know the right link. If each course has a teacher, the condition belongs on c.teacher_id = t.teacher_id.

This brings the results back to sanity. Each student appears with the classes they’re actually enrolled in, and each class is matched with its real teacher, not all teachers at once.

These kinds of oversights are common in complex reports or data pipelines, where several joins are stacked together. Checking each join one by one and confirming the condition lines up with the schema prevents the runaway effect before it turns into a production issue.

Conclusion

Cartesian products are the result of mechanics, not chance. When a join condition is missing or written incorrectly, the database engine carries out the instructions exactly as given and pairs every row from one table with every row from the other. That’s why queries swell in size so quickly and performance drops. Spotting unusual row counts, repeated data, or cross join steps in an execution plan are all clues that a condition is broken. The fix comes from tying tables together with the right fields so the result reflects real relationships rather than mathematical multiplication. Keeping an eye on how joins are written is what keeps queries efficient and results meaningful.

Share Alexander Obregon's Substack

Alexander Obregon's Substack

Discussion about this post

Ready for more?