BigData SQL: Apache Drill

In this section, we look at Apache Drill, a low-latency, distributed SQL engine for large-scale data sets. Drill champions the idea of "SQL on everything": the ability to query almost any data with SQL, irrespective of its format and where it resides. Drill can run SQL against relational databases, NoSQL databases, and files of any format (structured, semi-structured, or unstructured), and even against directories that contain files in multiple formats.
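To make "SQL on everything" concrete, the following is a minimal sketch of how a client might submit such queries to Drill over its REST interface. It assumes a Drillbit listening on localhost at port 8047 (the default web port) and the `/query.json` endpoint; the function names and the `/data/logs` directory are hypothetical and used only for illustration.

```python
import json
import urllib.request


def build_drill_request(sql, host="localhost", port=8047):
    """Build (but do not send) a POST request for Drill's REST query endpoint.

    Host and port are assumptions; adjust them to match your cluster.
    """
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}/query.json",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(sql, **kwargs):
    """Send the query to a running Drillbit and return the decoded JSON reply."""
    with urllib.request.urlopen(build_drill_request(sql, **kwargs)) as resp:
        return json.loads(resp.read().decode("utf-8"))


# A single JSON file read through the classpath (cp) storage plugin.
SAMPLE_FILE_QUERY = "SELECT * FROM cp.`employee.json` LIMIT 3"

# A whole directory read through the file system (dfs) storage plugin.
# The path /data/logs is hypothetical.
DIRECTORY_QUERY = "SELECT * FROM dfs.`/data/logs` LIMIT 10"
```

With a Drillbit running, calling `run_query(SAMPLE_FILE_QUERY)` would submit the first query. Both queries name a file or a directory directly in the FROM clause, with no table definition created beforehand.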
Drill is designed to scale out to thousands of nodes and to query multiple terabytes of data at interactive speeds, which is essential for BI and analytic tools. Drill supports SQL against a wide range of data sources, including relational databases, file-based sources, and NoSQL databases, and it can access both structured and semi-structured data. Apache Drill is based partly on Google's research behind Dremel, which introduced innovations for handling nested data sets generically using a columnar representation.
As with Impala, Apache Drill is not a storage engine; it is a query engine that uses a distributed architecture to scale out SQL queries across a cluster of machines on large data sets. Drill relies on its modular, scalable architecture to run low-latency SQL queries on multi-terabyte data sets. It does not maintain any special indices or metadata to speed up queries and, like Impala, relies on optimized full-table scans to produce results.
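The idea of answering a query with parallel full scans rather than index lookups can be illustrated with a toy sketch. This is not Drill's implementation: the partitions are in-memory lists standing in for data spread across nodes, threads stand in for cluster workers, and all names and sample rows are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_partition(rows, predicate, columns):
    """Read every row of one partition; keep matches, projected to `columns`."""
    return [{c: row.get(c) for c in columns} for row in rows if predicate(row)]


def full_scan(partitions, predicate, columns, workers=4):
    """Scatter the scan across partitions, then gather results in partition order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda p: scan_partition(p, predicate, columns), partitions)
        return [row for part in parts for row in part]


# Hypothetical data: three partitions, as if stored on three different nodes.
PARTITIONS = [
    [{"id": 1, "region": "EU", "amount": 40}, {"id": 2, "region": "US", "amount": 75}],
    [{"id": 3, "region": "EU", "amount": 90}],
    [{"id": 4, "region": "APAC", "amount": 10}, {"id": 5, "region": "EU", "amount": 55}],
]

# Roughly: SELECT id, amount FROM t WHERE region = 'EU' AND amount > 50
result = full_scan(
    PARTITIONS,
    lambda r: r["region"] == "EU" and r["amount"] > 50,
    ["id", "amount"],
)
```

No index is consulted at any point. Every partition is read in full, and the speed-up comes only from scanning partitions concurrently and reading just the columns the query needs.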