We are happy to announce that the Hive transactions feature is now available on the Altiscale Data Cloud.
Why the Hive Transactions Feature?
ACID properties (Atomicity, Consistency, Isolation, Durability) are important traits of any database system and are among the key considerations when selecting one. With transactions enabled, Hive supports all ACID properties at the row level. While it’s easy to see how transactions apply to an online transaction processing (OLTP) system, their value is less obvious for Hive, which is primarily used for online analytical processing (OLAP). Feedback from our customers here at Altiscale has led us to pinpoint two use cases for Hive transactions: data restatement and changing dimension tables.
Data restatement
Let’s say a table is created and populated with data in Hive. It is later discovered that some of the data in the table is wrong, so the table needs to be updated with the correct values.
Changing dimension tables
As incoming data volume increases on a daily basis, it’s often necessary to modify the dimension tables in a star schema. For example, consider a table of merchant information that includes geo-location, address, and so on. If a merchant moves to a different location, those fields must be updated with the new values.
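As a rough illustration of the merchant example, a row-level update might look like the sketch below. The table name `merchant_dim`, its columns, and all values are hypothetical and not taken from any real schema; the table is assumed to already be a transactional table bucketed on `merchant_id`. Hive does not allow updating bucketing or partition columns, so only non-bucketing columns appear in the SET clause.

```sql
-- Hypothetical dimension table and values, for illustration only.
-- Assumes merchant_dim is a transactional ORC table bucketed on merchant_id.
UPDATE merchant_dim
SET address   = '100 Example Street, Springfield',
    latitude  = 40.0,
    longitude = -90.0
WHERE merchant_id = 1042;
```

A data restatement would follow the same pattern, with the WHERE clause selecting the rows known to be wrong.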
These two use cases are difficult to address without Hive transactions. Hive stores its data in HDFS, which does not support in-place updates to files. Rather than rewriting the entire table for every transaction, Hive follows the “base and delta” files approach, a proven technique in the data warehouse world: each transaction writes its changes to new delta files alongside the table’s base files. Hive then compacts these files automatically. A minor compaction merges many delta files into a single delta file, and a major compaction merges the delta files into the base file.
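Compaction normally runs in the background, but Apache Hive also lets you request it by hand and inspect its progress. The statements below are a minimal sketch, assuming a hypothetical transactional table named `merchant_dim`; whether manual compaction is needed or permitted on a given cluster depends on how it is configured.

```sql
-- Request a minor compaction: merge the table's delta files into one delta file.
ALTER TABLE merchant_dim COMPACT 'minor';

-- Request a major compaction: rewrite the base and delta files into a new base.
ALTER TABLE merchant_dim COMPACT 'major';

-- List compactions that are queued or running.
SHOW COMPACTIONS;

-- List open and aborted transactions.
SHOW TRANSACTIONS;
```

The ALTER TABLE statements only enqueue a request; the compaction itself runs asynchronously.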
However, users should be aware that Hive transactions come with some limitations, which the Hive community plans to address eventually. The top three are:
(1) The table must be in Optimized Row Columnar (ORC) file format
(2) The table must be bucketed
(3) The BEGIN, COMMIT, and ROLLBACK model is not yet supported. Every operation is auto-committed.
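To make the first two limitations concrete, here is a sketch of a table definition that satisfies them. The table name, columns, and bucket count are hypothetical. The session settings are the client-side properties that the Apache Hive documentation lists for transactions in Hive releases of this generation; on a cluster where the feature has been enabled for you, some or all of them may already be set, so treat them as an assumption rather than a required recipe.

```sql
-- Client-side settings for Hive transactions (assumed not already set on the cluster).
SET hive.support.concurrency=true;
SET hive.enforce.bucketing=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

-- Hypothetical table: bucketed, stored as ORC, and flagged as transactional.
CREATE TABLE merchant_dim (
  merchant_id BIGINT,
  name        STRING,
  address     STRING,
  latitude    DOUBLE,
  longitude   DOUBLE
)
CLUSTERED BY (merchant_id) INTO 8 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');
```

Because of the third limitation, each INSERT, UPDATE, or DELETE against such a table commits on its own as soon as it completes.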
Altiscale customers who would like to experiment with this feature can reach out to us via the Altiscale support channel to have it enabled for their cluster.