The most general form of joining two tables is called the cross-join or, for the
more mathematically inclined, the Cartesian product of the two tables. As
discussed earlier in the section on dataflows, a cross-join results in an output
consisting of all columns from both tables and every combination of rows from
one table with rows from the other. The number of rows in the output grows
quickly as the two tables become bigger. If the first table has four rows and two
columns, and the second has three rows and two columns, then the resulting
output has twelve rows and four columns.
Because the number of rows in the output is the number of rows in each table
multiplied together, the output size grows quickly. If one table has 3,000 rows and
the other 4,000 rows, the result has 12,000,000 rows—which is a bit big for an
illustration. The number of potential columns is the sum of the number of
columns in each input table.
In the business world, tables often have thousands, or millions, or even more
rows, so a cross-join quickly gets out of hand, with even the fastest computers. If
this is the case, why are joins so useful, important, and practical?
The reason is that the general form of the join is not the form that gets used very
often, unless one of the tables is known to have only one row or a handful of rows.
By imposing some restrictions—say by imposing a relationship between columns
in the two tables—the result becomes more tractable. Even though more
specialized joins are more commonly used, the cross-join is still the foundation
that explains what they are doing.