Bigdata SQL: Apache Hive Query Compiler

The overall objective of this series of steps is for the Hive compiler to take a HiveQL query and translate it into one or more MapReduce jobs. The parser parses the HiveQL and generates a parse tree, also known as an Abstract Syntax Tree (AST). It tokenizes the query and identifies the query type and the components involved: table names, WHERE clause, selected columns, join type, and so on. The parser also verifies that the query is correct in syntax and structure.
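To make the parsing step concrete, here is a minimal sketch of a tokenizer and parser for a tiny SELECT subset. It is not Hive's actual grammar or parser; the names tokenize and parse, the token pattern, and the dictionary-shaped AST are all assumptions made for illustration.

```python
import re

# Hypothetical token pattern for a tiny SELECT subset; not Hive's real grammar.
# Alternatives: integer literal, identifier/keyword, operator or punctuation.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|(\w+)|(<=|>=|!=|[=<>,*]))")


def tokenize(query):
    """Split the query text into a flat list of tokens."""
    query = query.strip()
    tokens, pos = [], 0
    while pos < len(query):
        match = TOKEN_RE.match(query, pos)
        if not match:
            raise SyntaxError(f"unexpected character {query[pos]!r} at position {pos}")
        tokens.append(match.group(match.lastindex))
        pos = match.end()
    return tokens


def parse(query):
    """Check the token order and return a small dictionary-shaped AST."""
    tokens = tokenize(query)
    pos = 0

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of query")
        token = tokens[pos]
        pos += 1
        return token

    def expect(keyword):
        token = take()
        if token.upper() != keyword:
            raise SyntaxError(f"expected {keyword}, found {token}")

    expect("SELECT")
    columns = [take()]
    while pos < len(tokens) and tokens[pos] == ",":
        pos += 1
        columns.append(take())
    expect("FROM")
    table = take()
    where = None
    if pos < len(tokens):
        expect("WHERE")
        where = {"column": take(), "op": take(), "value": take()}
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]}")
    return {"type": "SELECT", "columns": columns, "table": table, "where": where}


if __name__ == "__main__":
    print(parse("SELECT name, age FROM users WHERE age > 30"))
```

A query with the wrong structure, such as one missing FROM, raises SyntaxError here, which mirrors the parser's role of rejecting malformed queries before any later stage runs.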
The Semantic Analyzer takes the parse tree (AST) and verifies that the query is semantically correct, that is, that the objects it refers to are valid and exist; for example, that a referenced table exists. The Semantic Analyzer also performs security-based authorization, checking whether the given user is allowed to access the objects used in the query. Metadata from the metastore is used to complete this step.
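The checks described above can be sketched as follows. This is an illustration only: the METASTORE dictionary is a hypothetical in-memory stand-in for the Hive metastore, and the AST shape, the analyze function, and the SemanticError class are assumed names, not Hive APIs.

```python
# Hypothetical stand-in for the metastore: table name -> columns and allowed readers.
METASTORE = {
    "users": {"columns": {"name", "age"}, "readers": {"alice"}},
}


class SemanticError(Exception):
    """Raised when a query refers to an object that does not exist."""


def analyze(ast, user, metastore=METASTORE):
    """Validate the objects referenced by the AST, then check access for the user."""
    table = metastore.get(ast["table"])
    if table is None:
        raise SemanticError(f"table not found: {ast['table']}")

    # Collect every column the query mentions, including the WHERE column.
    referenced = list(ast["columns"])
    if ast.get("where"):
        referenced.append(ast["where"]["column"])
    for column in referenced:
        if column not in table["columns"]:
            raise SemanticError(f"column not found: {ast['table']}.{column}")

    # Authorization: is this user allowed to read the table?
    if user not in table["readers"]:
        raise PermissionError(f"user {user} may not read {ast['table']}")
    return ast


if __name__ == "__main__":
    # Assumed AST shape for: SELECT name FROM users WHERE age > 30
    query_ast = {
        "columns": ["name"],
        "table": "users",
        "where": {"column": "age", "op": ">", "value": "30"},
    }
    analyze(query_ast, user="alice")
    print("query is semantically valid for alice")
```

Both checks consult only metadata, never table data, which is why this step depends on the metastore rather than on the storage layer.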
The Logical Plan Generator takes the output of the Semantic Analyzer and generates a logical plan for executing the query. The plan specifies which types of operators (Scan, Filter, Join, and Select operators) are used to satisfy the query, and it is structured like an inverted tree.
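As a rough illustration of such an operator tree, the sketch below builds a plan for a single-table query and prints it with the root at the top and the table scan at the bottom. The Operator class, build_plan, render, and the AST shape are hypothetical names chosen for this example and do not correspond to Hive's internal classes.

```python
from dataclasses import dataclass, field


@dataclass
class Operator:
    kind: str                      # e.g. "TableScan", "Filter", "Select"
    detail: str                    # table name, predicate, or column list
    children: list = field(default_factory=list)


def build_plan(ast):
    """Build the tree bottom-up: scan the table, filter rows, then select columns."""
    plan = Operator("TableScan", ast["table"])
    if ast.get("where"):
        where = ast["where"]
        predicate = f"{where['column']} {where['op']} {where['value']}"
        plan = Operator("Filter", predicate, [plan])
    return Operator("Select", ", ".join(ast["columns"]), [plan])


def render(op, depth=0):
    """Return the plan as indented lines, root first (the inverted tree)."""
    lines = ["  " * depth + f"{op.kind}({op.detail})"]
    for child in op.children:
        lines.extend(render(child, depth + 1))
    return lines


if __name__ == "__main__":
    # Assumed AST shape for: SELECT name FROM users WHERE age > 30
    query_ast = {
        "columns": ["name"],
        "table": "users",
        "where": {"column": "age", "op": ">", "value": "30"},
    }
    print("\n".join(render(build_plan(query_ast))))
```

Data flows from the leaves upward: rows leave the scan, pass through the filter, and reach the select at the root. A join would appear as an operator with two children, one per input table.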
The Logical Optimizer takes the logical plan and applies algorithms to optimize the query at a logical level, with two goals in mind: reducing the data scanned and improving query latency. It does this by intelligently applying rules and using basic descriptive statistics of the existing objects to prune the data for the tables involved in the query as early as possible. This layer makes sure that any optimization it applies produces the same result set as the unoptimized plan would. This is based on set equivalence theory.
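One well-known rule of this kind is predicate pushdown: applying a filter before a join instead of after it. The sketch below uses small made-up in-memory tables (USERS and ORDERS are illustrative data, and the function names are hypothetical) to show that both plans return the same rows while the pushed-down plan feeds fewer rows into the join. It illustrates the idea only and is not Hive's optimizer code.

```python
# Made-up sample data, for illustration only.
USERS = [
    {"id": 1, "country": "US"},
    {"id": 2, "country": "DE"},
    {"id": 3, "country": "US"},
]
ORDERS = [
    {"user_id": 1, "total": 10},
    {"user_id": 2, "total": 20},
    {"user_id": 3, "total": 30},
    {"user_id": 1, "total": 5},
]


def join(users, orders):
    """Nested-loop inner join on users.id = orders.user_id."""
    return [{**u, **o} for u in users for o in orders if u["id"] == o["user_id"]]


def filter_after_join():
    """Unoptimized plan: join every user, then filter on country."""
    joined = join(USERS, ORDERS)
    return [row for row in joined if row["country"] == "US"]


def filter_before_join():
    """Optimized plan: push the country filter below the join."""
    us_users = [u for u in USERS if u["country"] == "US"]
    return join(us_users, ORDERS)


if __name__ == "__main__":
    # The rewrite is only valid because both plans yield the same result set.
    assert filter_after_join() == filter_before_join()
    print(filter_before_join())
```

The equality check at the end is the important part: a rewrite rule is acceptable only if the optimized plan is guaranteed to return the same result set as the original plan.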