I have log files stored as text in HDFS. When I load the log files into a Hive table, all the files are copied.
Can I avoid having all my text data stored twice?
EDIT: I load it via the following command:
LOAD DATA INPATH '/user/logs/mylogfile' INTO TABLE `sandbox.test` PARTITION (day='20130221')
Then, I can find the exact same file in:
/user/hive/warehouse/sandbox.db/test/day=20130220
I assumed it was copied.
LOAD DATA INPATH 'xxx' INTO TABLE yyy (see post edit), then I find the file in /user/hive/warehouse. I am wondering if it can leave it there (I guess I would have to enforce partition structure in my directories, but that is fine) - Mad Echet
hive.metastore.warehouse.dir property points in your hive configuration? - Abimaran Kugathasan
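For what it is worth, the "leave it there" idea from the comment above could probably be sketched with an external table whose partitions point at the existing directories. This is only a sketch under assumptions: the table name sandbox.test_ext, the single column line, and the directory layout /user/logs/day=20130221 are all hypothetical and not taken from the post. Also note that, as far as I know, LOAD DATA INPATH without the LOCAL keyword moves the files within HDFS rather than copying them, so it may be worth checking whether the original file is still present in /user/logs.

```sql
-- Hypothetical external table: Hive records only the metadata and
-- reads the files where they already are. Dropping the table would
-- not delete the underlying data.
CREATE EXTERNAL TABLE sandbox.test_ext (
  line STRING                      -- assumed: one raw log line per row
)
PARTITIONED BY (day STRING)
STORED AS TEXTFILE
LOCATION '/user/logs';             -- assumed root directory of the logs

-- Register an existing directory as a partition; nothing is moved
-- or copied. The path is an assumed layout, not one from the post.
ALTER TABLE sandbox.test_ext
  ADD PARTITION (day='20130221')
  LOCATION '/user/logs/day=20130221';
```

With this approach each new day of logs would need its own ADD PARTITION statement, which is the "enforce partition structure in my directories" trade-off mentioned in the comment.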