BigData SQL: Impala

One of Impala's original design goals was to provide a SQL-on-Hadoop solution for fast, interactive workloads. Impala competes in a crowded marketplace of low-latency SQL engines for analytic queries on large data sets.
Like all MPP SQL engines, Impala uses a shared-nothing architecture in which the processors are loosely coupled, each with its own memory, its own CPU, and its own chunk of data to work on. Impala is designed for analytic workloads rather than transactional or operational workloads.
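The shared-nothing division of work can be pictured as hash-partitioning rows across workers so that each worker owns its own disjoint chunk of data. The following is a toy Python sketch, not Impala internals; the `partition` function and the worker count are purely illustrative:

```python
# Toy sketch (not Impala code): hash-partitioning rows across
# shared-nothing workers, each of which owns its own chunk of data.
from collections import defaultdict

def partition(rows, key, num_workers):
    """Assign each row to one worker by hashing its distribution key."""
    shards = defaultdict(list)
    for row in rows:
        shards[hash(row[key]) % num_workers].append(row)
    return shards

rows = [{"id": i, "val": i * 10} for i in range(8)]
shards = partition(rows, "id", num_workers=3)
# Every row lands on exactly one worker; no memory or state is shared.
assert sum(len(s) for s in shards.values()) == len(rows)
```

Because each worker touches only its own shard, the workers can scan and aggregate in parallel with no coordination beyond the final merge step.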
However, unlike many high-cost proprietary MPP engines, Impala is designed to scale out by adding commodity servers to the cluster. Impala is intended to work as the back-end engine of BI tools for fast analytic queries on large data sets, and it supports application connectivity through Java Database Connectivity (JDBC) and Open Database Connectivity (ODBC).
Impala is written in C++ for speed and to enable extensive runtime code generation for faster query execution. As a C++ engine, Impala avoids many of the problems that plague Java-based engines, such as garbage-collection pauses and heap-size tuning. Impala is not a storage engine; it is a SQL query engine that operates on data stored in HDFS, HBase, or flat files and directories.
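The point of runtime code generation is to replace a generic, schema-checking interpretation loop with a routine compiled for the exact query at hand. Very loosely, the idea can be sketched in Python by generating a function specialized for a known set of columns; the `make_row_summer` helper below is hypothetical and only illustrates the technique, which Impala actually performs with machine code:

```python
# Toy illustration of runtime code generation (Python, not Impala's
# native codegen): build a function hard-wired to a known schema
# instead of interpreting the schema on every row.
def make_row_summer(columns):
    """Generate a function that sums the given numeric column indexes."""
    body = " + ".join(f"row[{i}]" for i in columns)
    src = f"def row_sum(row):\n    return {body}\n"
    namespace = {}
    exec(compile(src, "<generated>", "exec"), namespace)
    return namespace["row_sum"]

row_sum = make_row_summer([0, 2])   # specialize for columns 0 and 2
print(row_sum((5, 99, 7)))          # 12: the column loop is compiled away
```

The generated function contains no per-row branching on the schema, which is the same reason compiled query fragments beat a generic interpreter on large scans.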
Impala uses an internal in-memory tuple format that places fixed-width data at fixed offsets for faster access, and it takes advantage of specialized CPU instructions (such as those in SSE4.2) for text parsing and CRC32 computation.
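The fixed-offset idea can be illustrated with Python's `struct` module: when every field has a known width, any value can be read directly from its byte offset without scanning the record. The layout below is hypothetical, not Impala's actual tuple format:

```python
# Toy sketch of a fixed-width tuple layout: every field sits at a known
# byte offset, so a reader can jump straight to it.
import struct

# Hypothetical schema: an 8-byte int at offset 0, an 8-byte float at offset 8.
LAYOUT = struct.Struct("<qd")      # little-endian int64 followed by float64

buf = LAYOUT.pack(42, 3.5)         # serialize one tuple
assert LAYOUT.size == 16           # fixed width: every tuple is 16 bytes

# Read the second field directly from its fixed offset (8 bytes in).
(price,) = struct.unpack_from("<d", buf, 8)
print(price)                       # 3.5
```

Fixed widths also make the offset of the n-th tuple in a batch a simple multiplication (`n * LAYOUT.size`), which keeps access patterns predictable for the CPU cache.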