Sql Excel : Equijoins

An equijoin is a join that has at least one condition asserting that two columns in
the tables have equal values, and all the conditions are connected by AND (which is
normally the case). In SQL, the conditions are the ON clause following the JOIN
statement.
An equijoin can return extra rows the same way that a cross-join can. If a column
value in the first table is repeated three times, and the same value occurs in the
second table four times, the equijoin between the two tables produces twelve rows
of output for that column. This is similar to the situation depicted in Figure 1.6
(page 27) that illustrates the cross-join. Using an equijoin, it is possible to add
many rows to output that are not intended, especially when the equijoin is on nonkey
columns.
Equijoins can also filter out rows, when there are no matching key values in the
second table. This filtering can be a useful feature. For instance, one table might
have a small list of ids that are special in some way. The join would then apply this
filter to the bigger table.
Although joins on primary keys are more common, there are cases where such a
many-to-many equijoin is desired. Consider this question: For each zip code, how
many zip codes in the same state have a larger population?
The following query uses a self-join (followed by an aggregation) to answer this
question. A self-join simply means that two copies of the ZipCensus table are
joined together. The equijoin uses the state column as a key, rather than the zip
code column.
SELECT zc1.zcta5,
SUM(CASE WHEN zc1.totpop < zc2.totpop THEN 1 ELSE 0 END) as numzip FROM ZipCensus zc1 JOIN ZipCensus zc2 ON zc1.stab = zc2.stab GROUP BY zc1.zcta5 Notice that ZipCensus is mentioned twice in the FROM clause. Each occurrence is given a different table alias to distinguish them in the query. The dataflow for this query, in Figure 1.8, reads the ZipCensus table twice, feeding both into the JOIN operator. The JOIN in the dataflow is an equijoin because the condition is on the stab column. The results from the join are then aggregated.