
I have a Spark DataFrame that contains a timestamp field. I am writing the DataFrame to an HDFS location on which a Hive external table is defined. The Hive table declares that field with the timestamp type. However, when Hive reads the data from the external location, the timestamp field comes back blank. My Spark DataFrame query:

df.select($"ipAddress", $"clientIdentd", $"userId", to_timestamp(unix_timestamp($"dateTime", "dd/MMM/yyyy:HH:mm:ss Z").cast("timestamp")).as("dateTime"), $"method", $"endpoint", $"protocol", $"responseCode", $"contentSize", $"referrerURL", $"browserInfo")

Hive create table statement:

CREATE EXTERNAL TABLE `finalweblogs3`(
   `ipAddress` string,
   `clientIdentd` string,
   `userId` string,
   `dateTime` timestamp,
   `method` string,
   `endpoint` string,
   `protocol` string,
   `responseCode` string,
   `contentSize` string,
   `referrerURL` string,
   `browserInfo` string)
 ROW FORMAT SERDE
   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
 WITH SERDEPROPERTIES (
   'field.delim'=',',
   'serialization.format'=',')
 STORED AS INPUTFORMAT
   'org.apache.hadoop.mapred.TextInputFormat'
 OUTPUTFORMAT
   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
   'hdfs://localhost:9000/streaming/spark/finalweblogs3'

I can't figure out why this is happening.

Try removing the unix_timestamp part: to_timestamp($"dateTime", "dd/MMM/yyyy:HH:mm:ss Z").cast("timestamp").as("dateTime") - roh
Also, what does the original timestamp look like? Hive only accepts timestamps in the format yyyy-mm-dd hh:mm:ss[.f...] - roh
25/Oct/2011:01:41:00 -0500 is what the timestamp looks like. - Naman Agarwal
Did you try the one I suggested in the first comment? - roh
Yes, I tried it, and the field is still populated as a blank value in Hive. - Naman Agarwal

1 Answer


I resolved it by changing the storage format to Parquet. I still don't know why it does not work with the CSV format.
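
A possible explanation, not confirmed anywhere in this thread: Spark's CSV writer formats timestamp columns in an ISO 8601 style by default (with a `T` between date and time and a zone offset), while Hive's text SerDe expects `yyyy-MM-dd HH:mm:ss[.f...]`, so Hive may fail to parse the value and show it as blank. The sketch below assumes that is the cause and sets the CSV writer's `timestampFormat` option to a Hive-readable pattern. It also shows the Parquet write that resolved the problem. The object name, the sample row, and the `/tmp/...` output paths are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, to_timestamp}

// Hypothetical object name; the local master and /tmp paths are for illustration only.
object WriteWeblogsForHive {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("weblogs-for-hive")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // One sample row using the timestamp layout quoted in the comments.
    val raw = Seq(("127.0.0.1", "25/Oct/2011:01:41:00 -0500"))
      .toDF("ipAddress", "dateTime")

    // Parse the string into a real timestamp column.
    val parsed = raw.withColumn(
      "dateTime",
      to_timestamp(col("dateTime"), "dd/MMM/yyyy:HH:mm:ss Z"))

    // Option 1 (assumption: the default CSV timestamp text is what Hive rejects):
    // write timestamps as "yyyy-MM-dd HH:mm:ss", the layout Hive's text format reads.
    parsed.write
      .mode("overwrite")
      .option("timestampFormat", "yyyy-MM-dd HH:mm:ss")
      .csv("/tmp/finalweblogs3_csv")

    // Option 2 (the fix reported above): write Parquet, where the timestamp
    // is stored as a typed value rather than as formatted text.
    parsed.write
      .mode("overwrite")
      .parquet("/tmp/finalweblogs3_parquet")

    spark.stop()
  }
}
```

With the Parquet route, the Hive table would need to be declared with a Parquet storage format rather than the text SerDe shown in the question. With the CSV route, the existing table definition can stay as it is, provided the assumption about the timestamp text holds.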