The following properties have to be set in order to support transactions:
• hive.support.concurrency — true
• hive.enforce.bucketing — true
• hive.exec.dynamic.partition.mode — nonstrict
• hive.txn.manager — org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
• hive.compactor.initiator.on — true
• hive.compactor.worker.threads — at least 1 (set for the Thrift metastore service)
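As a minimal sketch, the client-side properties from the list above could be set at the start of a Hive session as shown here. This assumes the properties are allowed to be changed per session in your deployment; the compactor properties are read by the metastore service, so they are shown only as comments and would normally go into that service's hive-site.xml.

```sql
-- Session-level settings for a client that works with transactional tables.
SET hive.support.concurrency=true;
SET hive.enforce.bucketing=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

-- Compactor settings are read by the metastore service, so a per-session
-- SET is not expected to affect them. Configure them on the metastore side:
--   hive.compactor.initiator.on   = true
--   hive.compactor.worker.threads = 1 (or more)
```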
When creating a table that supports transactions, the table must be bucketed, stored as ORC, and marked as transactional through TBLPROPERTIES, as follows:
CREATE TABLE Users(Name string, Address string, Designation string)
clustered by (Name) into 10 buckets stored as orc TBLPROPERTIES ('transactional'='true');
However, UPDATE is not supported on the column on which the table is bucketed (Name in this example). Typically, this feature in Hive should be used only for data warehousing applications and not for OLTP-based applications.
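The following sketch illustrates row-level changes on such a table. It assumes that Users has been created as a transactional, bucketed ORC table and that the transaction properties are in effect; the row values are made up for illustration.

```sql
-- Insert a few illustrative rows (the values are hypothetical).
INSERT INTO TABLE Users VALUES
  ('Alice', '12 Hill Road', 'Engineer'),
  ('Bob', '34 Lake View', 'Analyst');

-- Update a non-bucketing column.
UPDATE Users SET Designation = 'Manager' WHERE Name = 'Alice';

-- Delete a row.
DELETE FROM Users WHERE Name = 'Bob';

-- Not supported, because Name is the bucketing column:
-- UPDATE Users SET Name = 'Alicia' WHERE Name = 'Alice';
```

Each of these statements runs as its own transaction, which suits occasional bulk corrections rather than a high rate of single-row changes.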
As part of Hadoop's new API changes to support ACID, a new input format called AcidInputFormat has been added. This class and its associated methods allow external applications to write into Hive, using ACID semantics.
Examples of such applications include streaming applications and data warehousing applications that update dimension tables and insert into or update fact tables. This API supports only batch updates, not simultaneous updates, so it is not a replacement for OLTP applications. Each operation that inserts, updates, or deletes rows writes to a new delta directory. Within the files, records are sorted by original transaction ID, bucket ID, and row ID, and then by the current transaction ID, that is, the ID of the transaction modifying the row in question.
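Because every modifying operation adds a delta directory, the deltas accumulate until the compactor enabled through the properties listed earlier merges them. As a rough sketch, assuming the Users table from the earlier example and a running metastore compactor, the following statements can be used to look at transaction and compaction state and to request a compaction manually:

```sql
-- List open and aborted transactions known to the metastore.
SHOW TRANSACTIONS;

-- List compactions that are queued, running, or recently finished.
SHOW COMPACTIONS;

-- Ask for a major compaction of the (non-partitioned) Users table.
-- The request is queued and carried out by a compactor worker thread.
ALTER TABLE Users COMPACT 'major';
```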