The CASE expression adds conditional logic into the SQL language. Its most general

form is:

CASE WHEN

. . .

WHEN

ELSE

The

arbitrarily complicated. The

and these should all be the same type. The

the order they are written. When no

statement returns NULL if none of the previous clauses match.

One common use of CASE is to create indicator variables. Consider the following

question: How many zip codes in each state have a population of more than

10,000 and what is the total population of these? The following SQL query is,

perhaps, the most natural way of answering this question:

SELECT zc.stab, COUNT(*) as numbigzip, SUM(totpop) as popbigzip

FROM ZipCensus zc

WHERE totpop > 10000

GROUP BY zc.stab

This query uses a WHERE clause to choose the appropriate set of zip codes.

Now consider the related question: How many zip codes in each state have a

population of more than 10,000, how many have more than 1,000, and what is

the total population of each of these sets?

Unfortunately, the WHERE clause solution no longer works, because two overlapping

sets of zip codes are needed. One solution is to run two queries, which is messy.

Combining the results into a single query is easy using conditional aggregation:

SELECT zc.stab,

SUM(CASE WHEN totpop > 10000 THEN 1 ELSE 0 END) as num_10000,

SUM(CASE WHEN totpop > 1000 THEN 1 ELSE 0 END) as num_1000,

SUM(CASE WHEN totpop > 10000 THEN totpop ELSE 0 END

) as pop_10000,

SUM(CASE WHEN totpop > 1000 THEN totpop ELSE 0 END

) as pop_1000

FROM ZipCensus zc

GROUP BY zc.stab

Notice that in this version, the SUM() function is used to count zip codes that meet

the appropriate condition; it does so by adding 1 for each matching row. COUNT() is

not the right function, because it would count the number of non-NULL values.

The following two statements are very close to being the same, but the second

lacks the ELSE clause:

SUM(CASE WHEN totpop > 10000 THEN 1 ELSE 0 END) as num_10000,

SUM(CASE WHEN totpop > 10000 THEN 1 END) as num_10000,

Each counts the number of zip codes where population is greater than 10,000. The

difference is what happens when no zip codes have such a large population. The

first returns the number 0. The second returns NULL. Usually when counting

things, it is preferable to have the value be a number rather than NULL, so the first

form is generally preferred.

The CASE statement can be much more readable than the WHERE clause because the

CASE statement has the condition in the SELECT, rather than much further down in

the query. On the other hand, the WHERE clause provides more opportunities for

optimization.