BigData Excel: HDFS Caching

Impala can use HDFS (Hadoop Distributed File System) caching to make more effective use of memory. This especially helps repeated queries, which can take advantage of data pinned in memory regardless of how much data is processed overall. With HDFS caching, one can designate a subset of frequently used data to be pinned in memory. It is intended for tables or table partitions that are frequently accessed and small enough to fit in the HDFS memory cache.
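Choosing which data to pin is up to you, not Impala. A minimal sketch of one way to shortlist partitions follows. It assumes you already know each partition's size and how often queries read it. The `Partition` class, the sample numbers, the capacity figure and the greedy "most-read first, skip what does not fit" heuristic are all hypothetical illustrations, not a feature of Impala or HDFS.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Partition:
    name: str           # hypothetical partition identifier, e.g. "year=2023"
    size_mb: int        # size of the partition's data files
    reads_per_day: int  # how often queries touch this partition


def choose_partitions_to_cache(
    partitions: List[Partition], cache_limit_mb: int
) -> Tuple[List[str], int]:
    """Greedily pick the most frequently read partitions that still fit.

    Illustrative heuristic only: it visits partitions from hottest to
    coldest and keeps each one whose size fits in the remaining capacity.
    """
    chosen: List[str] = []
    used_mb = 0
    for p in sorted(partitions, key=lambda p: p.reads_per_day, reverse=True):
        if used_mb + p.size_mb <= cache_limit_mb:
            chosen.append(p.name)
            used_mb += p.size_mb
    return chosen, used_mb


# Hypothetical sample data.
parts = [
    Partition("year=2021", 400, 5),
    Partition("year=2022", 600, 40),
    Partition("year=2023", 700, 120),
]
print(choose_partitions_to_cache(parts, 1500))
```

With a 1500 MB budget, the two most-read partitions (1300 MB together) are kept and the cold 2021 partition is left on disk. A greedy pass like this is easy to reason about but is not guaranteed to be optimal. In practice, measured query patterns should drive the choice.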
Once HDFS caching is set up, you enable it for a table in Impala DDL by naming the cache pool in a CREATE TABLE or ALTER TABLE statement. The syntax looks like:
CREATE TABLE … CACHED IN 'pool_name'
or
ALTER TABLE … SET CACHED IN 'pool_name'
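When caching statements are issued from scripts, it can help to build them in one place. A minimal sketch that assembles the ALTER TABLE … SET CACHED IN form as plain strings follows. It also covers the per-partition variant, ALTER TABLE … PARTITION (…) SET CACHED IN. The helper names, the table `sales` and the pool `hot_pool` are hypothetical. The strict name checks are an assumption made here to keep untrusted input out of the SQL. Running the statements (for example through impala-shell) is left out.

```python
import re
from typing import Dict, Union

# Conservative patterns: anything outside them is rejected rather than escaped.
_TABLE = re.compile(r"[A-Za-z_]\w*(\.[A-Za-z_]\w*)?")  # table or db.table
_KEY = re.compile(r"[A-Za-z_]\w*")                    # partition column
_SAFE_VALUE = re.compile(r"[\w\-]+")                  # pool name / string value


def _check(pattern: "re.Pattern[str]", value: str, what: str) -> None:
    if not pattern.fullmatch(value):
        raise ValueError(f"unsupported {what}: {value!r}")


def cache_table_ddl(table: str, pool: str) -> str:
    """Statement that enables HDFS caching for a whole table."""
    _check(_TABLE, table, "table name")
    _check(_SAFE_VALUE, pool, "pool name")
    return f"ALTER TABLE {table} SET CACHED IN '{pool}'"


def cache_partition_ddl(
    table: str, partition_spec: Dict[str, Union[int, str]], pool: str
) -> str:
    """Statement that enables HDFS caching for a single partition."""
    _check(_TABLE, table, "table name")
    _check(_SAFE_VALUE, pool, "pool name")
    parts = []
    for key, value in partition_spec.items():
        _check(_KEY, key, "partition key")
        if isinstance(value, int) and not isinstance(value, bool):
            parts.append(f"{key}={value}")    # numeric values are unquoted
        else:
            _check(_SAFE_VALUE, str(value), "partition value")
            parts.append(f"{key}='{value}'")  # string values are quoted
    return f"ALTER TABLE {table} PARTITION ({', '.join(parts)}) SET CACHED IN '{pool}'"


if __name__ == "__main__":
    print(cache_table_ddl("sales", "hot_pool"))
    print(cache_partition_ddl("sales", {"year": 2023, "region": "emea"}, "hot_pool"))
```

Rejecting unexpected characters, instead of trying to escape them, keeps the sketch short and avoids depending on the exact string-literal escaping rules of the SQL dialect.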
For a table that is already cached, if new partitions are added through ALTER TABLE … ADD PARTITION statements, the data in those new partitions is automatically cached in the same pool.
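The inheritance rule above can be pictured with a toy model. It mirrors only the behaviour described in this paragraph and says nothing about how Impala actually stores cache metadata. `ToyTable`, its fields and the sample names are hypothetical. Modelling an uncached table's new partitions as uncached (`None`) is a simplifying assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ToyTable:
    """Toy model of partition cache inheritance; not Impala's implementation."""
    name: str
    cache_pool: Optional[str] = None  # pool named via CACHED IN, or None if uncached
    partition_pools: Dict[str, Optional[str]] = field(default_factory=dict)

    def add_partition(self, spec: str) -> None:
        # Stands in for ALTER TABLE ... ADD PARTITION: the new partition takes
        # the table's pool. In this model an uncached table's partition stays None.
        self.partition_pools[spec] = self.cache_pool


sales = ToyTable("sales", cache_pool="hot_pool")
sales.add_partition("year=2024")
print(sales.partition_pools)  # {'year=2024': 'hot_pool'}
```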