Big Data SQL: Pipelining Data for Joins in Apache Hive

Many SQL queries in large data warehouses run against star or snowflake schemas. Joins against the fact table typically pull in substantial data from the dimension tables as well, and those dimension tables are not always small, particularly when the dimensions carry many attributes. In such cases, broadcasting a dimension table to every task (a map-side join) may be too expensive to be efficient.
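As a hedged sketch of how this decision is made in practice, Hive can automatically convert a join to a broadcast (map-side) join when the small side fits under a configurable size threshold; dimension tables larger than the threshold fall back to a shuffle-based join. The settings below are standard Hive configuration properties, but the specific threshold value shown is only an illustration:

```sql
-- Let Hive convert eligible joins into broadcast (map-side) joins.
SET hive.auto.convert.join = true;

-- Allow Hive to merge multiple map-joins into one task when the
-- combined size of the small tables stays under the byte threshold.
SET hive.auto.convert.join.noconditionaltask = true;

-- Threshold in bytes (example value); dimension tables larger than
-- this will not be broadcast and the join is shuffled instead.
SET hive.auto.convert.join.noconditionaltask.size = 10000000;
```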
Under such conditions, Tez allows records to be streamed from one processing stage (a vertex in the Tez DAG) to the next. This lets Hive build a pipeline of in-memory joins that streams records through the join stages efficiently, rather than writing intermediate results to disk between each join as the MapReduce engine would.
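The idea can be illustrated with a typical star-schema query. The table and column names here (`sales` as the fact table, `customers` and `products` as dimensions) are hypothetical; the point is that, on Tez, the two joins can execute as vertices of a single DAG with records streamed between them:

```sql
-- Run on the Tez execution engine so the joins form one pipelined DAG.
SET hive.execution.engine = tez;

-- Star-schema query: one fact table joined to two dimension tables.
SELECT c.region,
       p.category,
       SUM(s.amount) AS total_sales
FROM   sales s
JOIN   customers c ON s.customer_id = c.customer_id
JOIN   products  p ON s.product_id  = p.product_id
GROUP BY c.region, p.category;
```

Under MapReduce, each join would typically be a separate job with its output materialized to HDFS; under Tez, rows flow from one join stage to the next in memory where possible.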