2 votes

I am reading call records from a file and converting them to columns in a DataFrame. How can I convert a column with a string value, which is actually a timestamp in the format "yyyymmddhhMMss", to a timestamp before saving that DataFrame to a Hive table? I used the snippet below, but I get null values in the Hive table. As I understand it, I have to specify the format (i.e., "yyyymmddhhMMss") in which the string values exist, but I am not sure how to do that. I am using Spark 2.1.1.

val df2 = df.withColumn("originTimeStamp", $"originTimeStamp".cast("timestamp"))

2 Answers

3 votes

To convert a String column in a DataFrame to a timestamp, you can use the to_timestamp function from the org.apache.spark.sql.functions package. Note that the format pattern is case-sensitive: MM is months, mm is minutes, HH is the 24-hour clock, and hh is the 12-hour clock, so the correct pattern for your data is "yyyyMMddHHmmss", not "yyyymmddhhMMss". Like this:

For Spark 2.2 and above

val df2 = df.withColumn("originTimeStamp", to_timestamp($"originTimeStamp", "yyyyMMddHHmmss"))

For Spark 2.1 and below

val df2 = df.withColumn("originTimeStamp", unix_timestamp($"originTimeStamp", "yyyyMMddHHmmss").cast("timestamp"))

It will give you the following output:

+-------------------+
|    originTimeStamp|
+-------------------+
|2017-01-07 10:06:00|
+-------------------+
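The pattern letters follow Java's SimpleDateFormat conventions, which is why "yyyymmddhhMMss" misbehaves: it reads minutes where the month is and a 12-hour value where the 24-hour value is. As a quick sanity check outside Spark, here is a small Python sketch parsing the same sample value with the equivalent strptime directives (this is only an illustration of the pattern semantics, not Spark code):

```python
from datetime import datetime

# Equivalent of Spark's "yyyyMMddHHmmss":
# %Y = 4-digit year, %m = month, %d = day,
# %H = hour (24-hour clock), %M = minute, %S = second
ts = datetime.strptime("20170107100600", "%Y%m%d%H%M%S")
print(ts)  # 2017-01-07 10:06:00
```

If the string and the pattern disagree, strptime raises a ValueError, whereas Spark's to_timestamp/unix_timestamp return null instead, which matches the nulls you saw in the Hive table.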

I hope it helps!

2 votes

You can use unix_timestamp as below:

import org.apache.spark.sql.types._
import org.apache.spark.sql.functions.unix_timestamp
import spark.implicits._  // for $"..." and toDF

val df2 = Seq((1, "20180508022659")).toDF("id", "originTimeStamp")
df2.withColumn("originTimeStamp", unix_timestamp($"originTimeStamp", "yyyyMMddHHmmss").cast(TimestampType)).show()