Big Data SQL: File Format Selection

Different file formats and compression codecs suit different data sets. Impala provides performance gains irrespective of file format; however, choosing the most efficient format for your data yields further improvements. A better data format reduces storage, lets queries process less data, and lowers I/O and network cost because less data is read and transmitted.
Text format is inefficient for both storage and querying, whereas Parquet and ORC are highly optimized file formats that give better storage and I/O efficiency. Hence, it is advisable to convert text data to Parquet, the format for which Impala is most optimized. If the data is available as a text file, create a new Parquet table, load the text data into it, and run Impala queries against that table. Impala was designed to work best with Parquet, which has many built-in optimizations that make it suitable for querying large data sets with low latency.
Parquet's suitability for low-latency queries comes from optimizations for large data blocks and nested data. Internally, Parquet uses an extensible set of column encodings and embeds column statistics for each block, so that a scan can use a block's min/max values to skip data that cannot match a filter.
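
The effect of per-block min/max statistics can be illustrated with a small simulation. The following Python sketch is a simplified model rather than Parquet's actual on-disk layout or Impala's scanner: the `Block` class and the `make_blocks` and `scan_range` functions are hypothetical names introduced only for this illustration. It records a minimum and maximum for each block of column values and lets a range filter skip any block whose statistics rule out a match.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    """A chunk of one column plus the min/max recorded when it was written."""
    values: List[int]
    min_value: int
    max_value: int


def make_blocks(values: List[int], block_size: int) -> List[Block]:
    """Split a column into fixed-size blocks and record min/max for each."""
    blocks = []
    for start in range(0, len(values), block_size):
        chunk = values[start:start + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def scan_range(blocks: List[Block], low: int, high: int) -> Tuple[List[int], int]:
    """Return the values in [low, high] and the number of blocks actually read."""
    matches: List[int] = []
    blocks_read = 0
    for block in blocks:
        # Skip the block when its statistics prove that no value can match.
        if block.max_value < low or block.min_value > high:
            continue
        blocks_read += 1
        matches.extend(v for v in block.values if low <= v <= high)
    return matches, blocks_read


# Made-up data: the integers 1..1000 in sorted order, 100 values per block.
blocks = make_blocks(list(range(1, 1001)), block_size=100)
matches, blocks_read = scan_range(blocks, 250, 320)
print(len(matches), "matching values;", blocks_read, "of", len(blocks), "blocks read")
# prints: 71 matching values; 2 of 10 blocks read
```

In this sketch the values are sorted, so each block covers a narrow range and most blocks can be skipped. With randomly ordered values, most blocks would overlap the filter range and the statistics would help far less.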