Hive is one of the first tools in the Hadoop ecosystem that most people learn to use. Hive is a SQL engine built on top of HDFS that uses MapReduce internally. It allows data stored on HDFS to be queried via HQL (Hive Query Language, a SQL-like language that is translated into MapReduce jobs). Hive was designed to run SQL queries as batch processing jobs. It was not built for interactive querying of data on HDFS, in which results come back within a few seconds. However, as organizations adopted Hadoop, the requirements for Hive changed, and users started demanding more from Hive in terms of both capabilities and performance.
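To make the idea of translating a query into a MapReduce job more concrete, the following is a minimal conceptual sketch of how an aggregate such as SELECT dept, COUNT(*) FROM employees GROUP BY dept breaks down into map, shuffle, and reduce phases. The employees table, its rows, and the function names are hypothetical, and the sketch only mirrors the general shape of such a job; it is not the plan Hive actually generates.

```python
from collections import defaultdict

# Hypothetical rows of an `employees` table: (name, dept).
ROWS = [
    ("ann", "sales"),
    ("bob", "eng"),
    ("cho", "eng"),
    ("dee", "sales"),
    ("eli", "eng"),
]


def map_phase(rows):
    """Emit one (key, 1) pair per row; the key is the GROUP BY column."""
    for _name, dept in rows:
        yield dept, 1


def shuffle_phase(pairs):
    """Collect emitted values under their key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Aggregate the values of each key; here the aggregate is COUNT(*)."""
    return {key: sum(values) for key, values in grouped.items()}


def count_by_dept(rows):
    """Conceptual equivalent of: SELECT dept, COUNT(*) ... GROUP BY dept."""
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    for dept, count in sorted(count_by_dept(ROWS).items()):
        print(dept, count)
```

Each phase runs in a single process here. In a real job the map and reduce steps are spread across the cluster and intermediate data is written to disk, which is one reason this execution model suits batch work better than interactive queries.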
Perhaps surprisingly, Hive stores its metadata in a relational database, usually MySQL or Postgres, deployed as a single instance on the Hadoop cluster. Hive supports multiple data and file formats, such as Text, SequenceFile, ORC, RCFile, Parquet, and Avro. Hive has continually expanded its SQL capabilities by adding windowing functions, support for subqueries, and new data types. Hive probably offers the most mature and most comprehensive SQL support within the Hadoop ecosystem.
Following is a list of some of Hive's most notable features:
• Hive allows applications to be written using high-level APIs such as JDBC, ODBC, and Thrift, without writing MapReduce jobs.
• It supports external tables, which make it possible to process data without actually storing it in HDFS.
• Hive offers support for structured and semi-structured data, such as JSON and XML, using SerDes.
• It supports multiple file formats, including TextFile, SequenceFile, ORC, RCFile, Avro files, and Parquet, as well as LZO compression.
• It supports partitioning of data on different columns to improve performance.
• Hive provides complex data types, such as Array, Struct, and Map, which facilitate working with nested and hierarchical data.
• It supports enhanced aggregation and analytic functions, such as Cube, Grouping Sets, and Rollup.
• Hive offers user-defined functions (UDFs), which can be written in Python or Java.
• It ships with built-in functions and constructs for working with XML and JSON data, such as xpath, explode, LATERAL VIEW, json_tuple, and get_json_object.
• Hive has out-of-the-box support for text processing with UDFs.
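As an illustration of the Python and JSON items in the list above, one common way to plug Python code into a Hive query is the TRANSFORM ... USING clause, which streams rows to an external script as tab-separated lines on standard input and reads tab-separated lines back. The following is a minimal sketch under stated assumptions: the script name extract_city.py, the raw_events table, and the user_id, payload, and city columns are all hypothetical, and the default tab-delimited row format is assumed.

```python
"""Hypothetical streaming script for Hive's TRANSFORM clause.

Assumed usage from Hive (table, column, and file names are illustrative):

    ADD FILE extract_city.py;
    SELECT TRANSFORM (user_id, payload)
           USING 'python3 extract_city.py'
           AS (user_id, city)
    FROM raw_events;
"""
import json
import sys

# With the default text serialization, NULL is written as backslash-N.
HIVE_NULL = "\\N"


def transform_line(line):
    """Turn 'user_id<TAB>json_payload' into 'user_id<TAB>city'."""
    user_id, _, payload = line.rstrip("\n").partition("\t")
    try:
        record = json.loads(payload)
    except ValueError:
        # Malformed, empty, or NULL payloads yield a NULL city.
        record = None
    city = record.get("city") if isinstance(record, dict) else None
    return "\t".join([user_id, HIVE_NULL if city is None else str(city)])


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Read rows from stdin and write one transformed row per input row."""
    for line in stdin:
        if line.strip():
            stdout.write(transform_line(line) + "\n")


if __name__ == "__main__":
    main()
```

For a single well-known field, the built-in get_json_object or json_tuple functions would usually be the simpler choice. A streaming script like this is more useful when the extraction logic is awkward to express in HQL.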