I am trying to load CSV files into a Hive table, and I need to do this through HDFS.
My end goal is to have the Hive table also connected to Impala tables, which I can then load into Power BI, but I am having trouble getting the Hive table to populate.
I create a table in the Hive query editor using the following code:
CREATE TABLE IF NOT EXISTS dbname.table_name (
time_stamp TIMESTAMP COMMENT 'time_stamp',
attribute STRING COMMENT 'attribute',
value DOUBLE COMMENT 'value',
vehicle STRING COMMENT 'vehicle',
filename STRING COMMENT 'filename')
Then I check the table's LOCATION using the following code:
SHOW CREATE TABLE dbname.table_name;
and find that it has gone to the default location: hdfs://our_company/user/hive/warehouse/dbname.db/table_name
So I go to the above location in HDFS and manually upload a few CSV files, which are in the same five-column format as the table I created. This is where I expect the data to be loaded into the Hive table, but when I go back to dbname in Hive and open the table I made, all values are still NULL, and when I try to open it in the browser I get:
DB Error AnalysisException: Could not resolve path: 'dbname.table_name'
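Querying the table directly from the Hive editor, for example with a simple check like this, likewise shows only NULL values:
SELECT * FROM dbname.table_name LIMIT 10;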
Then I try the following code:
LOAD DATA INPATH 'hdfs://our_company/user/hive/warehouse/dbname.db/table_name' INTO TABLE dbname.table_name;
It runs fine, but the table in Hive still does not populate.
I also tried all of the above using CREATE EXTERNAL TABLE instead, specifying the HDFS path in the LOCATION clause. I also tried creating an HDFS directory first, uploading the CSV files to it, and then running CREATE EXTERNAL TABLE with LOCATION pointed at that pre-made directory.
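The external-table version looked roughly like this (the table name and the HDFS path below are placeholders, not the exact ones I used):
CREATE EXTERNAL TABLE IF NOT EXISTS dbname.table_name_ext (
time_stamp TIMESTAMP COMMENT 'time_stamp',
attribute STRING COMMENT 'attribute',
value DOUBLE COMMENT 'value',
vehicle STRING COMMENT 'vehicle',
filename STRING COMMENT 'filename')
-- placeholder path for the HDFS directory where I uploaded the CSV files
LOCATION 'hdfs://our_company/path/to/csv_folder';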
I already made sure I have authorization privileges.
My table will not populate with the CSV files, no matter which method I try.
What am I doing wrong here?