I am trying to import data from MySQL into HBase using Sqoop. The MySQL table has about 9 million records and is nearly 1.2 GB in size. The replication factor of the Hadoop cluster is three.
Here are the issues I am facing:
The data size after the import into HBase is more than 20 GB. Ideally it should be close to, say, 5 GB (1.2 GB * 3, plus some overhead).
The HBase table has VERSIONS set to 1. If I import the same table from MySQL again, the file size under /hbase/ increases (it almost doubles), although the row count in the HBase table stays the same. This seems odd: I am inserting the same rows into HBase, so the file size should stay the same, just like the row count.
As far as I understand, the file size shouldn't increase in the second case if I am importing the same row set, since at most one version should be kept for each entry.
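To make the size expectation from the first issue concrete, here is a rough back-of-the-envelope sketch of the arithmetic. The numbers are the ones quoted above; the function and variable names are only illustrative, and the estimate deliberately ignores any HBase storage overhead, which I have not measured.

```python
# Rough estimate of the expected on-disk size after the import.
# All figures come from the description above; names are illustrative only.

SOURCE_SIZE_GB = 1.2      # size of the MySQL table
REPLICATION_FACTOR = 3    # HDFS replication factor of the cluster
OBSERVED_SIZE_GB = 20.0   # lower bound on the size seen after the import


def replicated_size_gb(source_gb: float, replication: int) -> float:
    """Raw data size once every block is stored `replication` times."""
    return source_gb * replication


expected = replicated_size_gb(SOURCE_SIZE_GB, REPLICATION_FACTOR)
blowup = OBSERVED_SIZE_GB / expected

print(f"expected raw size (no overhead): {expected:.1f} GB")
print(f"observed / expected: at least {blowup:.1f}x")
```

So even before the second import, the observed size is several times larger than what replication alone would account for.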
Any help would be highly appreciated.