Bigdata SQL: Serialization and SerDe in Hive, Part 2

The entire SerDe mechanism in Hive is an extensible, flexible way to let Hive work with any file format. If one wanted Hive to handle a format that has not yet been invented, writing a SerDe for that format would be an easy way to integrate it with Hive. Let's take an example of how to work with JSON data in a Hive table using a JSON SerDe. Suppose we have a file with a JSON record like the following (kept on a single line, since Hive's text input format treats each line as one record):

{"Country":"A","Languages":["L1","L2","L3"],"Geography":{"Lat":"Lat1","Long":"Long1"},"Demographics":{"Male":"10000","Female":"12000"}}

Let us create a Hive table that uses the JSON SerDe to work with the JSON data in the preceding file. The Hive DDL would look like this:

CREATE TABLE JSonExample (
  Country string,
  Languages array<string>,
  Geography map<string,string>,
  Demographics map<string,string>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS TEXTFILE;
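
Note that the OpenX JSON SerDe does not ship with Hive, so its jar must be registered on the classpath before this DDL will run; the path and version below are placeholders for whichever build is available locally:

ADD JAR /path/to/json-serde-1.3.8-jar-with-dependencies.jar;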

Here, we are using the JSON SerDe from OpenX. Once the table is defined, the data can be loaded with:

LOAD DATA LOCAL INPATH 'json_data_file.txt' OVERWRITE INTO TABLE JSonExample;
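
It is worth noting that LOAD DATA only moves the file into the table's warehouse directory; nothing is parsed at load time, and the SerDe does its work when queries run. As an optional sanity check (not required for the example), one can confirm which SerDe the table is bound to:

DESCRIBE FORMATTED JSonExample;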

With the data in place, one can run SQL queries on the table:

SELECT Country FROM JSonExample;           -- returns "A"
SELECT Languages[0] FROM JSonExample;      -- returns "L1"
SELECT Languages FROM JSonExample;         -- returns ["L1","L2","L3"]
SELECT Geography['Lat'] FROM JSonExample;  -- returns "Lat1"
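
The same bracket syntax reaches the other nested fields as well; the following queries are our own additions, but they follow directly from the schema and data above:

SELECT Geography['Long'] FROM JSonExample;       -- returns "Long1"
SELECT Demographics['Male'] FROM JSonExample;    -- returns "10000"
SELECT Demographics['Female'] FROM JSonExample;  -- returns "12000"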

Although at the highest level we are simply writing SQL, under the hood the Hive engine uses the SerDe to read the data from the file, parse it, and pass the resulting objects to the Hive Java code, which runs the MapReduce job that produces the results.
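
For the curious, one optional way to peek at that machinery is to ask Hive for the query plan; the output lists the stages of the generated job, starting with the table scan in which each line of the file is deserialized:

EXPLAIN SELECT Geography['Lat'] FROM JSonExample;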