Bigdata SQL: SQL Engines on Traditional Databases

When a SQL query is submitted to the database engine, a number of processes get to work to satisfy the query. At a high level there are two major sets of processes that spring into action: query engine and the storage engine

The query engine parses the query, verifies the semantics of the query against the catalog store, and generates the logical query plan that is then optimized by the query optimizer to generate the optimal physical execution plan, based on CPU, I/O, and network costs to execute the query. The final query plan is then used by the storage engine to retrieve the underlying data from various data structures, either on disk or in
memory. It is in the storage engine that processes such as Lock Manager, Index Manager, Buffer Manager, etc., get to work to fetch/update the data, as requested by the query.

Most RDBMS are SMP-based architectures. Traditional database platforms operate by reading data off disk, bringing it across an I/O interconnect, and loading it into memory for further processing. An SMP-based system consists of multiple processors, each with its own memory cache. Memory and I/O subsystems are shared by each of the processors. The SMP architecture is limited in its ability to move large amounts of data, as
required in data warehousing and large-scale data processing workloads.

The major drawback of this architecture is that it moves data across backplanes and I/O channels. This does not scale and perform when large data sets are to be queried, and more so when the queries involve complex joins that require multiple phases of processing. A huge inefficiency lies in delivering large data off disk across the network and into memory for processing by the DataBase Management System (DBMS).