How do I convert a column of Unix epoch values to a Date in an Apache Spark DataFrame using Java?

Question

I have a JSON data file that contains one property, [creationDate], which is a Unix epoch value stored as a "long" number. The Apache Spark DataFrame schema looks like this:

root 
 |-- creationDate: long (nullable = true) 
 |-- id: long (nullable = true) 
 |-- postTypeId: long (nullable = true)
 |-- tags: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- title: string (nullable = true)
 |-- viewCount: long (nullable = true)

I would like to group by a "creationData_Year" value, which needs to be derived from "creationDate".

What is the easiest way to do this kind of conversion on a DataFrame using Java?

ErhWen Kuo · Accepted Answer · 2016-01-06T05:50:20

After checking the Spark DataFrame API and the SQL functions, I came up with the snippet below:

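// Requires: import static org.apache.spark.sql.functions.from_unixtime;
DataFrame df = sqlContext.read().json("MY_JSON_DATA_FILE");

// creationDate holds milliseconds, so divide by 1000 to get the seconds that from_unixtime expects
DataFrame df_DateConverted = df.withColumn("creationDt", from_unixtime(df.col("creationDate").divide(1000)));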

The "creationDate" column is divided by 1000 because the time units differ. The original "creationDate" is a Unix epoch in milliseconds, whereas Spark SQL's "from_unixtime" is designed to handle a Unix epoch in seconds.
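
To make the unit difference concrete, here is a minimal plain-Java sketch (no Spark involved) that converts an epoch-millisecond value to seconds and then reads the year from it. The class name EpochUnits, its method names, and the sample value are hypothetical, and the year is taken in UTC as an assumption; Spark may apply a different time zone when it formats the value.

```java
import java.time.Instant;
import java.time.ZoneOffset;

// Hypothetical helper for illustration only; not part of the Spark API.
final class EpochUnits {

    // Convert epoch milliseconds to epoch seconds (the unit from_unixtime works with).
    static long toEpochSeconds(long epochMillis) {
        return Math.floorDiv(epochMillis, 1000L);
    }

    // Extract the calendar year, in UTC (an assumption), from an epoch-millisecond value.
    static int yearFromEpochMillis(long epochMillis) {
        return Instant.ofEpochSecond(toEpochSeconds(epochMillis))
                .atZone(ZoneOffset.UTC)
                .getYear();
    }

    public static void main(String[] args) {
        long sampleMillis = 1452059420000L; // hypothetical sample value in milliseconds
        System.out.println(toEpochSeconds(sampleMillis));
        System.out.println(yearFromEpochMillis(sampleMillis));
    }
}
```

Passing the millisecond value straight through as if it were seconds would land tens of thousands of years in the future, which is why the division comes first.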
