I'm using the Cloudera QuickStart VM 5.1.0-1.
I'm trying to load my 3 GB CSV file into Hadoop via Hue. What I have tried so far:
- Upload the CSV to HDFS, specifically into a folder called datasets located at /user/hive/datasets
- Use the Metastore Manager to load it into the default database
Everything appears to work, in that the table is created with the right columns. The main problem shows up when I query the table with Impala using the following statement:
show table stats new_table
The reported size is only 64 MB instead of the actual size of the CSV, which should be 3 GB.
Also, if I run a count(*) via Impala, the number of rows is only 70000 instead of the actual 7 million.
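The expected row count can be double-checked against the local copy of the CSV with something like the sketch below. This is only an illustration: the function name `count_csv_rows`, the assumption that the file has a single header line, and the in-memory sample data are all made up for the example and are not taken from my actual setup.

```python
import csv
import io


def count_csv_rows(stream, has_header=True):
    """Count data records in a CSV stream.

    Uses the csv module so that quoted fields containing newlines
    are counted as one record rather than several lines.
    """
    reader = csv.reader(stream)
    rows = sum(1 for _ in reader)
    # Assumption: at most one header line, which is not a data record.
    if has_header and rows > 0:
        rows -= 1
    return rows


# Tiny in-memory stand-in for the real file (hypothetical data).
sample = io.StringIO("id,name\n1,a\n2,b\n3,c\n")
print(count_csv_rows(sample))  # prints 3
```

For the real file, the stream would come from something like `open(path, newline="")`, where `path` points at the local CSV, and the result can then be compared with what count(*) returns in Impala.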
Any help would be deeply appreciated.
Thanks in advance.