Sql Excel : Rewriting the “IN” as a JOIN

Strictly speaking, the IN operator is not necessary, because queries with INs and
subqueries can be rewritten as joins. For example, this is equivalent to the
previous query:
FROM ZipCensus zc JOIN
(SELECT stab, COUNT(*) as numstates
FROM ZipCensus
) zipstates
ON zc.stab = zipstates.stab AND
zipstates.numstates < 100 Note that in the rewritten query, the zipstates subquery has two columns instead of one. The second column contains the count of zip codes in each state. Using the IN statement with a subquery makes it impossible to get this information. On the other hand, the IN does have a small advantage, because it guarantees that there are no duplicate rows in the output, even when the “in” set has duplicates. To guarantee this using the JOIN, aggregate the subquery by the key used to join the tables. In this case, the subquery is doing aggregation anyway to find the states that have fewer than one hundred zip codes. This aggregation has the additional effect of guaranteeing that the subquery has no duplicates. The general way of rewriting an IN subquery using join requires eliminating the duplicates. So, the query: SELECT x.* FROM x WHERE x.col_a IN (SELECT y.col_b FROM y) would be rewritten as: SELECT DISTINCT x.* FROM x JOIN y ON x.col_a = y.col_b; or: SELECT x.* FROM x JOIN (SELECT DISTINCT y.col_b FROM y) y ON x.col_a = y.col_b; The DISTINCT keyword removes duplicates from the output. However, this requires additional processing so it is best to avoid DISTINCT unless it is really necessary.