BigData SQL: General Architecture Pattern

In the next few sections, we take a deeper look at two state-of-the-art massively parallel processing (MPP) SQL engines for big data analytics: Impala from Cloudera (now open source) and Apache Drill. The core ideas behind both engines evolved from Google's Dremel, which introduced two main innovations: a column-striped storage format for nested data, and multilevel query execution trees that allow large data sets to be processed in parallel across large-scale computing clusters. Both engines rely on full scans to return the relevant data, but they also apply smart optimizations to work out what to scan before the actual scanning starts.
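To make the two Dremel ideas concrete, the following is a minimal single-process sketch, not the actual design of Dremel, Impala, or Drill. It assumes each partition is stored column by column as a plain dictionary of lists (flat data only, so the nested-data encoding is not modeled), and it simulates an execution tree in which leaf nodes scan one column and intermediate nodes merge partial counts on the way up to the root. The names `leaf_scan`, `merge_partials`, `run_query`, and the `fanout` parameter are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical columnar partition: column name -> list of values.
Partition = Dict[str, List[object]]


def leaf_scan(partition: Partition, column: str) -> Counter:
    """Leaf node: read only the requested column and count its values."""
    return Counter(partition.get(column, []))


def merge_partials(partials: List[Counter]) -> Counter:
    """Intermediate or root node: combine partial counts from child nodes."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return total


def run_query(partitions: List[Partition], column: str, fanout: int = 2) -> Counter:
    """Simulate SELECT column, COUNT(*) ... GROUP BY column over a tree."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    # Leaf level: one partial result per partition (parallel in a real cluster).
    level = [leaf_scan(p, column) for p in partitions]
    # Each pass merges groups of `fanout` children into one parent node.
    while len(level) > 1:
        level = [
            merge_partials(level[i:i + fanout])
            for i in range(0, len(level), fanout)
        ]
    return level[0] if level else Counter()


partitions = [
    {"country": ["US", "DE", "US"], "bytes": [10, 20, 30]},
    {"country": ["DE"], "bytes": [5]},
    {"country": ["FR", "US"], "bytes": [7, 8]},
]
result = run_query(partitions, "country")
print(dict(result))
```

Only the `country` column is read at the leaves, so the `bytes` column is never touched, which is the benefit a column-striped layout aims for. Because counts can be merged in any grouping, each level of the tree reduces the amount of data passed upward.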