0
votes

I have a CSV file called test.csv in HDFS. The file was placed there through FileZilla. I am able to view the path as well as the contents of the file when I log in to the edge node through PuTTY, using the same account credentials that I used to place the file into HDFS. I then connect to Hive and try to create an external table, specifying the location of my CSV file in HDFS, using the statement below:

CREATE EXTERNAL TABLE(col1 string, col2 string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS ORC
LOCATION '/file path'

When I execute this statement it creates an external table in Hive, but the table is empty: only the columns I defined in the CREATE statement show up. My question is, am I specifying the correct path in the LOCATION parameter of the create statement above? I tried using the path I see in FileZilla when I placed my CSV file into HDFS, which is in the format home/servername/username/directory/subdirectory/file, but this returns an error saying the user whose username is specified in that path does not have ALL privileges on the file path.

NOTE: I checked the permissions on the file and the directory in which it resides, and the user has all permissions (read, write and execute).

I then tried changing the path to the format user/username/directory/subdirectory/file, and with this I was able to create the external table; however, the table is still empty and does not load the data from the CSV file on which it was created.
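For reference, the path as HDFS actually sees it can be checked from the edge node before pointing Hive at it (the paths below are placeholders; also note that the LOCATION clause should point to the directory containing the file, not to the file itself):

    # list the directory as HDFS sees it
    hdfs dfs -ls /user/username/directory/subdirectory/

    # print the file contents straight from HDFS as a sanity check
    hdfs dfs -cat /user/username/directory/subdirectory/test.csv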

I also tried the alternative method of creating an internal table as shown below and then using the LOAD DATA INPATH command. But this also failed, with an error saying that "there are no files existing at the specified path".

CREATE TABLE foobar(key string, stats map<string, bigint>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '|'
MAP KEYS TERMINATED BY ':' ;

LOAD DATA INPATH '/tmp/foobar.csv' INTO TABLE foobar;
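One way to see why that error appears (a sketch, assuming the CSV actually sits on the edge node's local disk rather than in HDFS; foobar.csv here just mirrors the example path above) is to compare the two /tmp locations:

    # /tmp on the edge node's local filesystem (where an SFTP upload would land)
    ls -l /tmp/foobar.csv

    # /tmp inside HDFS (where LOAD DATA INPATH looks)
    hdfs dfs -ls /tmp/foobar.csv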
A little confused here, are you not using the hadoop fs -put <file> command to upload files to HDFS? Not sure what you mean when you say 'the file was placed there through filezilla'. You need to have this file on HDFS, and you can check it using hadoop fs -ls hdfs://hdfshostname/path/to/the/file - dbustosp
I am completely new to this, so forgive me if I don't use the proper terms, but I just mean that my CSV file is already on HDFS and I am having trouble loading that file into Hive. That is my main problem. When I mention FileZilla I am just talking about an FTP application which I used to place the file into HDFS instead of using the put command. Please let me know if you need me to elaborate more - username0594
You will not be able to put your file into HDFS using FileZilla. From your explanation and comments, I believe you are placing your file in the local filesystem and trying to point to it as an HDFS path while creating the table. Once you put your file in the local filesystem using FileZilla, either use LOAD DATA LOCAL INPATH, or put your file into HDFS using the command hdfs dfs -put /user/foobar.csv and then create the table and load the data using LOAD DATA INPATH - Prabhat Ratnala
the "/user/foobar.csv" path that you mentioned in the put command. Is that the location where the file is already present or the destination path where the file needs to go? - username0594
@Prabhat Ratnala the put command was the missing piece, thank you! It worked for me; I was able to load the file into my Hive table - username0594
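For completeness, the sequence that resolved it looks roughly like this (the paths and table name are placeholders):

    # on the edge node: copy the CSV from the local filesystem into HDFS
    hdfs dfs -put /home/username/test.csv /user/username/test.csv

    # then, in Hive, move it from HDFS into the table
    hive> LOAD DATA INPATH '/user/username/test.csv' INTO TABLE foobar;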

1 Answer

-1
votes

First of all, you can't load a CSV file directly into a Hive table that was created with the ORC file format. ORC is a columnar, compressed format that stores data in an optimised way, so plain CSV text cannot simply be dropped into an ORC table. You can load your data into an ORC-format table by following the steps below.

  • Create a temp table stored as text file format.

  • Load data into it using hive> LOAD DATA INPATH ..., or else use the LOCATION parameter while creating the table itself.

  • Now create a Hive table in your required file format (RC, ORC, Parquet, etc.).

  • Load data into it using the following command:

    hive> INSERT OVERWRITE TABLE foobar SELECT * FROM temptbl;

  • You will get the table in ORC file format. The full sequence is sketched after this list.
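A rough sketch of that sequence (the table names temptbl and foobar_orc, the column layout, and the HDFS path are illustrative placeholders):

    -- staging table in plain text format, matching the CSV layout
    CREATE TABLE temptbl (col1 string, col2 string)
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    STORED AS TEXTFILE;

    -- move the CSV from HDFS into the staging table
    LOAD DATA INPATH '/user/username/directory/test.csv' INTO TABLE temptbl;

    -- final table in ORC format
    CREATE TABLE foobar_orc (col1 string, col2 string)
    STORED AS ORC;

    -- rewrite the text data as ORC by selecting it into the new table
    INSERT OVERWRITE TABLE foobar_orc SELECT * FROM temptbl;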

Regarding the second issue: when you load data into a table using the LOAD DATA INPATH command, the file is moved (not copied) from its source path into a directory named after the table under the default warehouse location (/user/hive/warehouse/). That is why the file appears to be gone from its original location; check the warehouse location and you will see the data there.
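For example, assuming the default warehouse location and the foobar table name from the question, the moved file should be visible with:

    hdfs dfs -ls /user/hive/warehouse/foobar/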