Sql Excel : Column Types

The second important attribute of a column is its type, which tells the database
exactly how to store values. A well-designed database usually has parsimonious
columns, so if two characters suffice for a code, there is no reason to store eight.
There are a few important aspects of column types and the roles that columns
Primary key columns uniquely identify each row in the table. That is, no two rows
have the same value for the primary key and the primary key is never NULL.
Databases guarantee that primary keys are unique by refusing to insert rows with
duplicate primary keys. Chapter 2, “What s in a Table? Getting Started with Data
Exploration,” shows techniques to determine whether this condition holds for any
given column. Typically the primary key is a single column, although SQL does
allow composite primary keys, which consist of multiple columns.
Numeric values are values that support arithmetic and other mathematical
operations. In SQL, these can be stored in different ways, such as floating-point
numbers, integers, and decimals. The details of how these formats differ are much
less important than what can be done with numeric data types.
Within the category of numeric types, one big difference is between integers,
which have no fractional part, and real numbers, which do. When doing
arithmetic on integers, the result might be an integer or it might be a real number,
depending on the database. So 5/2 might evaluate to 2 rather than 2.5, and the
average of 1 and 2 might turn out to be 1 instead of 1.5, depending on the database.
To avoid this problem, examples in this book multiply integer values by 1.0 to
convert them to decimal values when necessary.
Of course, just because it walks like a duck and talks like a duck does not mean
that it is a duck. Some values look like numbers, but really are not. Zip codes (in
the United States) are an example, as are primary key columns stored as numbers.
What is the sum of two zip codes? What does it mean to multiply a primary key
value by 2? These questions yield nonsense results (although the values can be
calculated). Zip codes and primary keys happen to look like numbers, but they do
not behave like numbers.
The datasets used in this book use character strings for zip codes and numbers for
primary keys. To distinguish such false numeric values from real numbers, the
values are often left padded with zeros to get a fixed length. After all, the zip code
for Harvard Square in Cambridge, MA, is 02138, not 2,138.
Dates and date-times are exactly what their names imply. SQL provides several
functions for common operations, such as determining the number of days
between two dates, extracting the year and month, and comparing two times.
Unfortunately, these functions often differ between databases. The Appendix
provides a list of equivalent functions in different databases for functions used in
this book, including date and time functions.
Another type of data is character string data. These are commonly codes, such as
the state abbreviation in the zip code table, or a description of something, such as
a product name or the full state name. SQL has some very rudimentary functions
for handling character strings, which in turn support rudimentary text processing.
Spaces at the end of a character string are ignored, so the condition NY = NY
evaluates to TRUE. However, spaces at the beginning of a character string are
counted, so NY = NY evaluates to FALSE. When working with data in
character columns, it might be worth checking out whether there are spaces at the