I have a spark Dataframe which contains a field as a timestamp. I am storing the dataframe into HDFS location where hive external table is created. Hive table contains the field with timestamp type. But while reading data from the external location hive is populating the timestamp field as a blank value in the table. my spark dataframe query:
df.select($"ipAddress", $"clientIdentd", $"userId", to_timestamp(unix_timestamp($"dateTime", "dd/MMM/yyyy:HH:mm:ss Z").cast("timestamp")).as("dateTime"), $"method", $"endpoint", $"protocol", $"responseCode", $"contentSize", $"referrerURL", $"browserInfo")
Hive create table statement:
CREATE EXTERNAL TABLE `finalweblogs3`(
`ipAddress` string,
`clientIdentd` string,
`userId` string,
`dateTime` timestamp,
`method` string,
`endpoint` string,
`protocol` string,
`responseCode` string,
`contentSize` string,
`referrerURL` string,
`browserInfo` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
'field.delim'=',',
'serialization.format'=',')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://localhost:9000/streaming/spark/finalweblogs3'
I am not able to get it why this is happening.