One of the new additions to Hive is an execution model that combines process-based tasks with long-lived daemons running on the worker nodes. LLAP (Live Long and Process) is a long-lived daemon process that runs on the data nodes within the Hive framework. Because the daemon stays up between queries, it avoids the cost of starting a new process for each task and gives the Java JIT (just-in-time) compiler time to optimize frequently executed code. LLAP also provides an in-memory columnar cache that serves reads which would otherwise go directly to the DataNode.
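The startup-cost argument can be made concrete with a toy latency model. The Python sketch below compares launching a fresh process for every task with reusing one long-lived daemon. The `CostModel` fields and the numbers in the usage example are hypothetical and illustrative, not LLAP measurements; the model runs tasks serially and ignores JIT warm-up effects.

```python
from dataclasses import dataclass


@dataclass
class CostModel:
    # Hypothetical costs in milliseconds, for illustration only.
    startup_ms: float  # launching a new JVM/process
    work_ms: float     # processing one task once the process is running


def per_task_process_latency(model: CostModel, n_tasks: int) -> float:
    """Each task launches its own process, so startup is paid n_tasks times."""
    return n_tasks * (model.startup_ms + model.work_ms)


def long_lived_daemon_latency(model: CostModel, n_tasks: int) -> float:
    """The daemon is started once and then reused by every task."""
    return model.startup_ms + n_tasks * model.work_ms


if __name__ == "__main__":
    model = CostModel(startup_ms=1000.0, work_ms=200.0)
    print(per_task_process_latency(model, 10))   # 12000.0
    print(long_lived_daemon_latency(model, 10))  # 3000.0
```

With these made-up numbers, startup dominates the per-process case, which is the overhead a resident daemon amortizes.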
Functionality such as caching, pre-fetching, query fragment processing, and access control can be moved into the daemon. Small queries can be processed by the LLAP daemon directly, while resource-intensive queries can still run inside YARN containers.
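The split between daemon-side and container-side work can be illustrated with a simple routing rule. This is a hypothetical Python sketch: `QueryProfile`, `route`, and both thresholds are invented for illustration. In Hive itself, placement is controlled by settings such as `hive.llap.execution.mode`, whose actual decision logic is not reproduced here.

```python
from dataclasses import dataclass
from enum import Enum


class Target(Enum):
    LLAP_DAEMON = "llap"
    YARN_CONTAINER = "container"


@dataclass
class QueryProfile:
    input_bytes: int
    estimated_cpu_seconds: float


# Hypothetical thresholds for illustration; they are not Hive configuration values.
MAX_DAEMON_INPUT_BYTES = 1 * 1024**3  # 1 GiB
MAX_DAEMON_CPU_SECONDS = 30.0


def route(query: QueryProfile) -> Target:
    """Send small queries to the resident daemon and heavy ones to YARN containers."""
    small_input = query.input_bytes <= MAX_DAEMON_INPUT_BYTES
    cheap_cpu = query.estimated_cpu_seconds <= MAX_DAEMON_CPU_SECONDS
    return Target.LLAP_DAEMON if small_input and cheap_cpu else Target.YARN_CONTAINER


if __name__ == "__main__":
    print(route(QueryProfile(input_bytes=50 * 1024**2, estimated_cpu_seconds=2.0)))
    print(route(QueryProfile(input_bytes=500 * 1024**3, estimated_cpu_seconds=3600.0)))
```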
LLAP is neither an execution engine nor a storage layer. It was introduced to reduce query latency, mainly through local caching. Each LLAP process hosts a number of executors, and each executor runs fragments of map or reduce tasks.
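The relationship between the daemon and its executors can be modeled as a fixed-size worker pool that outlives individual queries. The Python sketch below is a toy under stated assumptions, not LLAP's scheduler: `DaemonSketch` and `Fragment` are hypothetical names, and Python threads stand in for LLAP's executor slots.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Fragment:
    # Hypothetical unit of work: one piece of a map or reduce task.
    name: str
    run: Callable[[], int]


class DaemonSketch:
    """Toy long-lived daemon with a fixed number of executor threads."""

    def __init__(self, num_executors: int) -> None:
        # The pool is created once and reused by every query the daemon serves.
        self._pool = ThreadPoolExecutor(
            max_workers=num_executors, thread_name_prefix="executor"
        )

    def execute(self, fragments: List[Fragment]) -> Dict[str, int]:
        # At most num_executors fragments run at once; the rest wait in the pool's queue.
        futures = {f.name: self._pool.submit(f.run) for f in fragments}
        return {name: fut.result() for name, fut in futures.items()}

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)


if __name__ == "__main__":
    daemon = DaemonSketch(num_executors=4)
    query1 = [Fragment(f"map-{i}", lambda i=i: i * i) for i in range(8)]
    query2 = [Fragment("reduce-0", lambda: sum(range(10)))]
    print(daemon.execute(query1))
    print(daemon.execute(query2))  # same executors, no new process
    daemon.shutdown()
```

Fragment names are used as result keys, so the sketch assumes they are unique within a query.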
An LLAP process is an optional daemon process running on multiple nodes with the following capabilities:
• Caching and data reuse across queries with compressed columnar data in-memory (off-heap)
• Multi-threaded execution, including reads with predicate pushdown and hash joins
• High-throughput I/O using asynchronous elevator algorithms with a dedicated thread and core per disk
• Granular column-level security across applications
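Data reuse across queries, the first capability above, can be sketched as a cache of column chunks shared by every query the daemon serves. This is a minimal Python illustration under stated assumptions: the `(file, stripe, column)` key, the `ColumnarCacheSketch` class, and the LRU eviction policy are choices made for the example, not LLAP's actual design, and the chunks live uncompressed on the Python heap rather than compressed and off-heap.

```python
from collections import OrderedDict
from typing import Callable, List, Tuple

# Hypothetical cache key: (file path, stripe index, column name).
ChunkKey = Tuple[str, int, str]


class ColumnarCacheSketch:
    """Column-chunk cache shared by all queries running in one daemon."""

    def __init__(self, capacity_chunks: int, loader: Callable[[ChunkKey], List]) -> None:
        self.capacity = capacity_chunks
        self.loader = loader  # reads a chunk from storage on a cache miss
        self._chunks: "OrderedDict[ChunkKey, List]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: ChunkKey) -> List:
        if key in self._chunks:
            self._chunks.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._chunks[key]
        self.misses += 1
        chunk = self.loader(key)
        self._chunks[key] = chunk
        if len(self._chunks) > self.capacity:
            self._chunks.popitem(last=False)  # evict the least recently used chunk
        return chunk


if __name__ == "__main__":
    def read_from_storage(key: ChunkKey) -> List:
        # Stand-in for reading and decoding one column chunk from HDFS.
        return [f"{key[2]}-value-{n}" for n in range(3)]

    cache = ColumnarCacheSketch(capacity_chunks=2, loader=read_from_storage)
    cache.get(("/warehouse/sales/part-0.orc", 0, "amount"))  # query 1: miss
    cache.get(("/warehouse/sales/part-0.orc", 0, "amount"))  # query 2: hit
    print(cache.hits, cache.misses)  # 1 1
```

Because only the columns a query touches are cached, a second query over the same columns is served from memory without a storage read.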
This hybrid engine approach provides fast response times through efficient in-memory data caching and low-latency processing by node-resident LLAP processes. LLAP executes queries efficiently by caching columnar data, building JIT-friendly operator pipelines, and adding performance-enhancing features such as asynchronous I/O, pre-fetching, and multi-threaded processing.
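Asynchronous I/O with pre-fetching means reading the next chunks of data while the current chunk is still being processed. The Python sketch below shows that general technique with a background reader thread and a bounded queue. `prefetching_reader` and `read_chunk` are hypothetical names, and this is not LLAP's I/O elevator implementation.

```python
import queue
import threading
from typing import Callable, Iterable, Iterator, List

_DONE = object()  # sentinel placed on the queue after the last chunk


def prefetching_reader(
    read_chunk: Callable[[int], List[int]],
    chunk_ids: Iterable[int],
    depth: int = 2,
) -> Iterator[List[int]]:
    """Yield chunks in order while a background thread reads up to `depth` ahead."""
    buffer: "queue.Queue[object]" = queue.Queue(maxsize=depth)
    errors: List[BaseException] = []

    def producer() -> None:
        try:
            for chunk_id in chunk_ids:
                # put() blocks once `depth` chunks are waiting, bounding memory use.
                buffer.put(read_chunk(chunk_id))
        except Exception as exc:  # hand read failures to the consumer
            errors.append(exc)
        finally:
            buffer.put(_DONE)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buffer.get()
        if item is _DONE:
            if errors:
                raise errors[0]
            return
        yield item


if __name__ == "__main__":
    chunks = prefetching_reader(lambda i: [i] * 3, range(5))
    print(sum(sum(c) for c in chunks))  # 30
```

The bounded queue keeps memory use predictable: the reader can run at most `depth` chunks ahead of the consumer. If a consumer stops iterating early, the reader thread may stay blocked on `put`, which is acceptable for a sketch but would need cancellation in production code.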
LLAP works only with the Tez engine, does not currently support ACID-based transactions in Hive, and supports only the ORC format.
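These three restrictions can be expressed as a pre-flight check. This Python sketch is hypothetical: `TableInfo` and `llap_blockers` are invented names, and the engine string mirrors the values of Hive's `hive.execution.engine` setting (such as `mr` or `tez`). It encodes only the constraints as stated in this section, which may have changed in later Hive releases.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TableInfo:
    name: str
    storage_format: str  # e.g. "ORC", "PARQUET", "TEXTFILE"
    transactional: bool  # True for ACID tables ('transactional'='true')


def llap_blockers(execution_engine: str, tables: List[TableInfo]) -> List[str]:
    """Return every reason the query is ineligible for LLAP under the stated constraints."""
    reasons = []
    if execution_engine.lower() != "tez":
        reasons.append(f"engine '{execution_engine}' is not Tez")
    for table in tables:
        if table.storage_format.upper() != "ORC":
            reasons.append(f"{table.name}: format {table.storage_format} is not ORC")
        if table.transactional:
            reasons.append(f"{table.name}: ACID tables are not supported")
    return reasons


if __name__ == "__main__":
    print(llap_blockers("tez", [TableInfo("sales", "ORC", False)]))  # []
    print(llap_blockers("mr", [TableInfo("logs", "TEXTFILE", True)]))
```

An empty list means none of the listed restrictions applies; it does not guarantee that Hive will actually schedule the query on LLAP.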
All these optimizations enable Hive to run not only ETL/ELT workloads but also interactive applications such as data discovery. The ability to work interactively with SQL on Hive expands the roles of Hive users and speeds up development, testing, data flow, and data pipeline execution.