Data for the table is stored in a set of base files. New records, updates, and deletes are stored in delta files. A new set of delta files is created for each transaction (or in the case of streaming agents such as Flume or Storm, each batch of transactions) that alters a table. At read time the reader merges the base and delta files, applying any updates and deletes as it reads.
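To make this concrete, here is a minimal HiveQL sketch (the `events` table and its columns are illustrative, and the transaction-manager settings shown further below must already be in place):

```sql
-- Transactional tables must be bucketed ORC tables with the
-- 'transactional' property set (see the limitations below).
CREATE TABLE events (id INT, payload STRING)
CLUSTERED BY (id) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

-- Each statement below commits as its own transaction and writes a new
-- delta directory (e.g. delta_0000001_0000001) alongside any existing
-- base files; readers merge base and deltas on the fly at query time.
INSERT INTO events VALUES (1, 'created');
UPDATE events SET payload = 'updated' WHERE id = 1;
DELETE FROM events WHERE id = 1;
```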
To keep the number of delta files in check and to reclaim space, the data is compacted periodically in the background. A minor compaction merges a set of delta files into a single, larger delta file; a major compaction merges the delta files and the base file into a new base file, which speeds up subsequent table scans.
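Compaction normally runs automatically via the compactor threads in the metastore, but it can also be requested explicitly. A short sketch, reusing the hypothetical `events` table from above:

```sql
-- A minor compaction rewrites a set of delta files into a single delta;
-- a major compaction rewrites the deltas and the base into one new base.
ALTER TABLE events COMPACT 'minor';
ALTER TABLE events COMPACT 'major';

-- Queued, running, and completed compactions can be inspected with:
SHOW COMPACTIONS;
```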
The ACID Transaction feature currently has these limitations:
- It only works for ORC files. There is an open JIRA upstream to add support for Parquet tables.
- It works only on bucketed tables that are not sorted.
- INSERT OVERWRITE is not supported on transactional tables.
- BEGIN, COMMIT, and ROLLBACK are not supported; all operations are auto-committed.
- It is not recommended for OLTP workloads.
ACID does not work with Avro files, and the standard HDFS block replication policies apply to ACID tables just as they do to regular tables.
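Note that none of this is enabled by default: the transaction manager and the compactor must be switched on first. A minimal configuration sketch based on the Hive Transactions wiki linked below (exact settings vary by Hive version, and the compactor properties normally belong in hive-site.xml rather than a session):

```sql
-- Client side: enable locking/concurrency and the DB transaction manager.
SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
-- Required on Hive 1.x so that inserts respect the table's bucketing:
SET hive.enforce.bucketing=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Metastore side: turn on the background compactor.
SET hive.compactor.initiator.on=true;
SET hive.compactor.worker.threads=1;
```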
The links below are helpful for understanding ACID tables in Hive:
http://docs.qubole.com/en/latest/user-guide/hive/use-hive-acid.html
https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions