Bigdata SQL: Apache Hive Dynamically Partitioned Joins

In cases in which the tables to be joined are already partitioned or bucketed, and the table on the right side of the JOIN clause is not small enough to be broadcasted to all the Map or Reduce processes in the next layer, Tez just transmits the right partitions to the process in the next layers that are required by them, based on the partitioning/bucketing information. This works very well when the Hash join algorithm is used to perform joins.
For example, take the following SQL query (seen earlier, under “Optimization by Using a Broadcast Join”). Assume that both the tables are partitioned/bucketed on the join key, and the inventory table is not small enough to be broadcasted to all the processes.
SELECT sales.item, sales.quantity,inventory.quantityleft
FROM sales JOIN inventory
ON (sales.productId = inventory.productID)
In this case, Tez would optimize the SQL query by broadcasting to the later processes, not the whole table to the later processes in the chain, but only the correct
partitions from the right-hand side of the table