Drill is an MPP-based SQL query execution engine that performs distributed query processing across the nodes in the cluster.
During query execution, Drill optimizes for columnar storage and execution by using an in-memory columnar data model. With columnar data formats, Drill avoids disk access for columns not used in the query. Drill's execution layer performs SQL processing directly on columnar data, without any intermediate conversion to row-oriented data. Drill's query engine is characterized by the following:
• Columnar/Vectorized: Drill operates on more than one record at a time with SIMD-based optimized instructions using LLVM and JVM optimizations. Internally, Drill also maintains bitmaps to allow checking for null values.
• Pipelining: Drill works in record batches (in columnar format) and pipelines these batches between the Drillbits on each node. Pipelining occurs in memory and hence reduces serialization/deserialization costs.
• Runtime compilation
• Late binding
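To make the columnar and pipelining characteristics above more concrete, here is a minimal Python sketch. It is an illustration only and not Drill's implementation (Drill is written in Java and has its own vector classes). The names IntVector, make_vector, scan, add_constant, and sum_non_null are hypothetical. The sketch assumes a single integer column, a validity bitmap in which a set bit means "not null," and operators chained as in-memory generators so that each batch flows to the next operator without being serialized.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional


@dataclass
class IntVector:
    """One column of a record batch: values plus a validity bitmap."""
    values: List[int]
    validity: int  # bit i is set when row i is non-null

    def is_null(self, i: int) -> bool:
        return not (self.validity >> i) & 1


def make_vector(raw: List[Optional[int]]) -> IntVector:
    """Build a vector from Python values, treating None as SQL NULL."""
    validity = 0
    values = []
    for i, v in enumerate(raw):
        if v is not None:
            validity |= 1 << i
        values.append(0 if v is None else v)  # placeholder for null slots
    return IntVector(values, validity)


def scan(rows: List[Optional[int]], batch_size: int) -> Iterator[IntVector]:
    """Source operator: emit the column as fixed-size batches."""
    for start in range(0, len(rows), batch_size):
        yield make_vector(rows[start:start + batch_size])


def add_constant(batches: Iterable[IntVector], k: int) -> Iterator[IntVector]:
    """Process a whole batch per step; the bitmap is carried over unchanged."""
    for b in batches:
        yield IntVector([v + k for v in b.values], b.validity)


def sum_non_null(batches: Iterable[IntVector]) -> int:
    """Sink operator: aggregate, consulting the bitmap to skip nulls."""
    total = 0
    for b in batches:
        total += sum(v for i, v in enumerate(b.values) if not b.is_null(i))
    return total


column = [1, None, 3, 4, None, 6]
# Operators are chained in memory; each batch is pulled through the pipeline.
result = sum_non_null(add_constant(scan(column, batch_size=4), 10))
print(result)
```

Because the null information lives in the bitmap rather than in the values, the add_constant step can process every slot of a batch uniformly and leave null handling to the operator that needs it.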
Apache Drill is a very mature product, more mature than Impala or Apache Spark. Selecting the right SQL-on-big-data engine depends to a certain extent on the Hadoop distribution your organization uses. If it uses Cloudera, Impala is probably the best way to go. If it uses the MapR distribution, definitely use Apache Drill. If you are using Apache Spark as your framework for different workloads, it is worth giving Spark SQL a try for your interactive queries to see how it performs. Spark SQL is very new and still evolving in its SQL coverage and its support for analytic queries.
If you are starting from a blank slate, we recommend trying Apache Drill before other products on the market. Apache Drill is a very versatile product, reflecting the kind of talent behind it. It has wide support for a variety of data types, data sources, and complex SQL queries.