3
votes

I have a CSV file with contents like this:

"DepartmentID","Name","GroupName","ModifiedDate"
"1","Engineering","Research and Development","2008-04-30 00:00:00"

I have this table definition:

create external table if not exists AdventureWorks2014.Department
(
    DepartmentID smallint,
    Name string,
    GroupName string,
    rate_code string,
    ModifiedDate timestamp
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '","' LINES TERMINATED BY '\n'
STORED AS TEXTFILE LOCATION 'wasb:///ds/Department' TBLPROPERTIES('skip.header.line.count'='1');

Then I load the data:

LOAD DATA INPATH 'wasb:///ds/Department.csv' INTO TABLE AdventureWorks2014.Department;

But the data is not loaded:

select * from AdventureWorks2014.Department;

The select above returns nothing.

I think the double quotes around each field are the issue. Is there a way to load the data from such a file into Hive tables without having to strip out the double quotes?

3
You shouldn't put double quotes around a smallint value. Use 1 instead of "1". - Venky
So changing the table definition from DepartmentID smallint to DepartmentID string will fix the issue? I will try that. The double quotes are a result of the extract process using SSIS. - user5299

3 Answers

2
votes

Try this (written from a cellphone):

create external table if not exists AdventureWorks2014.Department
(
    DepartmentID smallint,
    Name string,
    GroupName string,
    rate_code string,
    ModifiedDate timestamp
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS TEXTFILE
LOCATION 'wasb:///ds/Department'
TBLPROPERTIES('skip.header.line.count'='1');

Limitation:
This SerDe treats all columns to be of type String. Even if you create a table with non-string column types using this SerDe, the DESCRIBE TABLE output would show string column type. The type information is retrieved from the SerDe. To convert columns to the desired type in a table, you can create a view over the table that does the CAST to the desired type.

https://cwiki.apache.org/confluence/display/Hive/CSV+Serde
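
For example, a minimal sketch of that view-based workaround against the table above (the view name Department_typed is illustrative):

    -- All columns come back as string through OpenCSVSerde,
    -- so cast them to the intended types in a view.
    CREATE VIEW IF NOT EXISTS AdventureWorks2014.Department_typed AS
    SELECT
        CAST(DepartmentID AS smallint)  AS DepartmentID,
        Name,
        GroupName,
        rate_code,
        CAST(ModifiedDate AS timestamp) AS ModifiedDate
    FROM AdventureWorks2014.Department;

Query the view instead of the base table whenever you need typed columns.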

2
votes

FIELDS TERMINATED BY '","' is incorrect. Your fields are terminated by a comma (,), not by ",". Change your DDL to FIELDS TERMINATED BY ','.
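
A sketch of the revised DDL under that fix. Note that plain DELIMITED parsing leaves the double quotes embedded in each value, so the columns are declared string here (an assumption, since quoted values such as "1" would not parse as smallint):

    create external table if not exists AdventureWorks2014.Department
    (
        DepartmentID string,  -- loads as "1", quotes included
        Name string,
        GroupName string,
        rate_code string,
        ModifiedDate string   -- loads as "2008-04-30 00:00:00", quotes included
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
    STORED AS TEXTFILE LOCATION 'wasb:///ds/Department'
    TBLPROPERTIES('skip.header.line.count'='1');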

0
votes

LOAD DATA LOCAL INPATH '/home/hadoop/hive/log_2013805_16210.log' INTO TABLE table_name;