SQL Excel: Hadoop and Hive

One of the technologies highly associated with big data is Hadoop in conjunction
with MapReduce. Hadoop is an open-source project, meaning that the code is
available for free online, with the goal of developing a framework for “reliable,
scalable, distributed computing.” (The SQL world has free open-source databases
such as MySQL, Postgres, and SQLite; in addition, several commercial databases
have free versions.) In practice, Hadoop is a platform for processing humongous
amounts of data, particularly data from sources such as web logs, high-energy
physics, high volumes of streaming images, and other voluminous data sources.

The roots of MapReduce go back to the 1960s and a language called Lisp. In the
early 2000s, Google developed a parallel framework around MapReduce, and it is
now a framework for programming data-intensive tasks on large grid computers. It
became popular because both Google and Yahoo developed MapReduce engines, and
what big, successful internet companies do must be interesting.
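
The map and reduce steps can be sketched in a few lines of ordinary Python. This is only an illustration of the programming model on a single machine, not how Hadoop or Google's engine is implemented; the sample log lines and the function names (map_phase, shuffle, reduce_phase) are hypothetical and chosen for this example.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical input: each string stands in for one line of a web log.
log_lines = [
    "GET /home",
    "GET /cart",
    "POST /cart",
    "GET /home",
]

def map_phase(line):
    """Emit (key, value) pairs for one input record: here, (url, 1)."""
    method, url = line.split()
    return [(url, 1)]

def shuffle(pairs):
    """Group values by key, the step a framework performs between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values for one key into a single result (a sum)."""
    return key, reduce(lambda a, b: a + b, values)

# Map every record, group the emitted pairs by key, then reduce each group.
mapped = [pair for line in log_lines for pair in map_phase(line)]
hit_counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(hit_counts)
```

Because each call to map_phase looks at one record and each call to reduce_phase looks at one key, a real framework is free to spread those calls across many machines.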

Hadoop is actually a family of technologies, and MapReduce is only one
application. Built on or alongside Hadoop are other tools, all with colorful
names such as Hive, Mahout, Cassandra, and Pig. Although the underlying
technology is different from relational databases, there are similarities in the
problems these technologies are trying to solve. Within the Hadoop world are
languages, such as CQL (the Cassandra Query Language), which is based on SQL
syntax. Hive, in particular, is being developed into a fully functional SQL
engine and can run many of the queries in this book.
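
The overlap with SQL is easiest to see in a simple aggregation. The sketch below runs a GROUP BY query against an in-memory SQLite database, used here only as a convenient stand-in because it ships with Python; the weblog table and its columns are hypothetical. A basic aggregation like this is the kind of query an SQL engine on Hadoop is meant to handle, although the example has not been run against Hive.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a SQL engine.
conn = sqlite3.connect(":memory:")

# Hypothetical table holding one row per web log entry.
conn.execute("CREATE TABLE weblog (method TEXT, url TEXT)")
conn.executemany(
    "INSERT INTO weblog (method, url) VALUES (?, ?)",
    [("GET", "/home"), ("GET", "/cart"), ("POST", "/cart"), ("GET", "/home")],
)

# Count hits per URL; the grouping is the declarative analogue of
# collecting values by key and combining them.
query = """
    SELECT url, COUNT(*) AS hits
    FROM weblog
    GROUP BY url
    ORDER BY url
"""
rows = conn.execute(query).fetchall()
print(rows)
conn.close()
```

The query states what result is wanted and leaves it to the engine to decide how to distribute and combine the work.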