2
votes

I am using load data syntax to load a csv file to a table.The file is same format as hive accepts. But still after load data is issued, Last 2 columns returns null on select.

1750,651,'2013-03-11','2013-03-17'
1751,652,'2013-03-18','2013-03-24'
1752,653,'2013-03-25','2013-03-31'
1753,654,'2013-04-01','2013-04-07'

create table dattable(
DATANUM    INT,  
ENTRYNUM BIGINT, 
START_DATE  DATE,
END_DATE    DATE ) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ;

 LOAD DATA LOCAL INPATH '/path/dtatable.csv' OVERWRITE INTO TABLE dattable ;

Select returns NULL values for the last 2 cols

Other question was what if the date format is different than YYYY-MM-DD. is it possible to make hive identify the format? (Because right now i am modifying the csv file format to accept by hive)

2

2 Answers

1
votes

LasySimpleSerDe (default) does not work with quoted CSV. Use CSVSerDe:

create table dattable(
DATANUM    INT,  
ENTRYNUM BIGINT, 
START_DATE  DATE,
END_DATE    DATE ) 
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
   "separatorChar" = ",",
   "quoteChar"     = "'"
)  
STORED AS TEXTFILE;

Also read this: CSVSerDe treats all columns to be of type String

Define you date columns as string and apply conversion in select.

2
votes

Answer to your 2nd question:

You will need an additional temporary table to read your input file, and then you can do date conversions in your insert select statements.In your temporary table store date fields as string. Ex.

create table dattable_ext(
DATANUM    INT,  
ENTRYNUM BIGINT, 
START_DATE  String,
END_DATE    String) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

Load data into temporary table

LOAD DATA LOCAL INPATH '/path/dtatable.csv' OVERWRITE INTO TABLE dattable_ext;

Insert from temporary table to the managed table.

insert into table dattable select DATANUM, ENTRYNUM,
from_unixtime(unix_timestamp(START_DATE,'yyyy/MM/dd'),'yyyy-MM-dd'),
from_unixtime(unix_timestamp(END_DATE,'yyyy/MM/dd'),'yyyy-MM-dd') from dattable_ext;

You can replace date format in unix_timestamp function with your input date format.