29
votes

I have a DataFrame with a Timestamp column, which I need to convert to a Date format.

Are there any Spark SQL functions available for this?


5 Answers

65
votes

You can cast the column to date:

Scala:

import org.apache.spark.sql.types.DateType

val newDF = df.withColumn("dateColumn", df("timestampColumn").cast(DateType))

Pyspark:

df = df.withColumn('dateColumn', df['timestampColumn'].cast('date'))
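If you would rather avoid the DateType import, cast also accepts the SQL type name as a string in Scala, just as in the Pyspark version (a small variation on the snippet above, using the same placeholder names df and timestampColumn):

    // Equivalent cast using the SQL type name instead of importing DateType
    val newDF = df.withColumn("dateColumn", df("timestampColumn").cast("date"))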

16
votes

In Spark SQL:

SELECT
  CAST(the_ts AS DATE) AS the_date
FROM the_table
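If your data starts out as a DataFrame, one way to run this query (a sketch; the_table and the_ts are the placeholder names from the SQL above) is to register a temp view first:

    // Register the DataFrame under a name so it can be queried, then cast in SQL
    df.createOrReplaceTempView("the_table")
    val datesDF = spark.sql("SELECT CAST(the_ts AS DATE) AS the_date FROM the_table")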

5
votes

Imagine the following input:

import org.apache.spark.sql.functions.current_timestamp

val dataIn = spark.createDataFrame(Seq(
        (1, "some data"),
        (2, "more data")))
    .toDF("id", "stuff")
    .withColumn("ts", current_timestamp())

dataIn.printSchema
root
 |-- id: integer (nullable = false)
 |-- stuff: string (nullable = true)
 |-- ts: timestamp (nullable = false)

You can use the to_date function:

import org.apache.spark.sql.functions.to_date
import spark.implicits._

val dataOut = dataIn.withColumn("date", to_date($"ts"))

dataOut.printSchema
root
 |-- id: integer (nullable = false)
 |-- stuff: string (nullable = true)
 |-- ts: timestamp (nullable = false)
 |-- date: date (nullable = false)

dataOut.show(false)
+---+---------+-----------------------+----------+
|id |stuff    |ts                     |date      |
+---+---------+-----------------------+----------+
|1  |some data|2017-11-21 16:37:15.828|2017-11-21|
|2  |more data|2017-11-21 16:37:15.828|2017-11-21|
+---+---------+-----------------------+----------+

I would recommend these built-in functions over casting and plain SQL.
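to_date also takes an optional format pattern, which is useful when the source column is a string rather than a timestamp (a sketch; the ts_string column name is hypothetical):

    import org.apache.spark.sql.functions.to_date
    import spark.implicits._

    // Parse a string column like "21-11-2017" into a proper date column
    val parsed = df.withColumn("date", to_date($"ts_string", "dd-MM-yyyy"))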

2
votes

For Spark 2.4+:

import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DateType
import spark.implicits._

val newDF = df.withColumn("dateColumn", $"timestampColumn".cast(DateType))

OR

val newDF = df.withColumn("dateColumn", col("timestampColumn").cast(DateType))

0
votes

Tried and tested (note that withColumn returns a new DataFrame, so the result must be assigned back):

df_join_result = df_join_result.withColumn('order_date', df_join_result['order_date'].cast('date'))