
I wanted to use the date_trunc function on a dataframe that has a date column, so that I can create a new column telling me which quarter each record belongs to.

What I have tried is below:

import org.apache.spark.sql.functions._
val test = Seq(("2010-03-05"),("2018-01-16"),("2018-04-20")).toDF("TestDates")
display(test)  // this displays the dates in the notebook

val datetrunctest = test.withColumn("Quarter", date_trunc("QUARTER",$"TestDates"))
display(datetrunctest)  // this fails with: error: not found: value date_trunc

Also, when I try to import the function by name, I get the error below:

import org.apache.spark.sql.functions.date_trunc
Error: value date_trunc is not a member of object org.apache.spark.sql.functions

I am able to use the same function in Spark SQL, as below, by saving the above dataframe test as a table "DailyDates":

val ddd = spark.sql("Select TestDates,date_trunc('QUARTER', TestDates) as QuarterDate from test.DailyDates")
display(ddd)

I have a lot of transformations/aggregations that need to be performed on the dataframe, so I am looking for a way to make this work on the dataframe by adding an additional column. According to the documentation, this should work on any Spark version greater than 2.3.0, and I am using Spark 2.4.3.
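
Based on the linked documentation, the usage I would expect to work looks like the snippet below (the to_date wrapper is my own addition, since date_trunc returns a timestamp rather than a date):

import org.apache.spark.sql.functions._
// date_trunc("QUARTER", ...) should snap each date to the first day of its quarter
val expected = test.withColumn("QuarterDate", to_date(date_trunc("QUARTER", $"TestDates")))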

Snapshot of the Spark version output: (screenshot omitted)

Documentation link: https://spark.apache.org/docs/2.3.0/api/java/org/apache/spark/sql/functions.html#date_trunc-java.lang.String-org.apache.spark.sql.Column-

Does anyone have any idea what the issue could be and how I can get this working?

Looks totally fine, just make sure you use Spark version > 2.3.0. – koiralo
@koiralo I have verified that the version I am using is 2.4.3, but it is still giving me that error. – Nikunj Kakadiya

2 Answers

0 votes

Check the version of Spark you are using:

Open the Spark shell and enter the command

sc.version or spark-submit --version
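
For example, from a notebook cell or the Spark shell (the printed values are only illustrative):

// Check the running Spark version programmatically
println(sc.version)     // e.g. 2.4.3
println(spark.version)  // same value through the SparkSession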

Make sure you import the function

import org.apache.spark.sql.functions._

Then use quarter, which takes a Column parameter:

public static Column quarter(Column e)
import org.apache.spark.sql.functions._
val test = Seq(("2010-03-05"),("2018-01-16"),("2018-04-20")).toDF("TestDates")
// quarter() returns the quarter (1 to 4) for the date each string is cast to
val datetrunctest = test.withColumn("Quarter", quarter($"TestDates"))
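
If the implicit string-to-date cast applies as expected, the Quarter column should hold the quarter number for each row (the output below is illustrative):

datetrunctest.show()
// +----------+-------+
// | TestDates|Quarter|
// +----------+-------+
// |2010-03-05|      1|
// |2018-01-16|      1|
// |2018-04-20|      2|
// +----------+-------+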
0 votes

The same code using date_trunc started working without any code change. I don't know whether something was wrong with the cluster on the backend, but it did not work for half a day and then started working in the evening. Weird. If this happens again, I will raise a support ticket with Databricks with the details.

I was also able to figure out a workaround that does not use the date_trunc function; it is below in case it helps someone with their use case.

import org.apache.spark.sql.functions._
val test = Seq(("2010-03-05"),("2018-01-16"),("2018-04-20"),("2018-08-15"),("2018-11-30")).toDF("TestDates")

// Map each date to the first day of its quarter using year() and quarter(),
// then drop the helper columns.
val datetrunctest = test.withColumn("Year", year($"TestDates"))
  .withColumn("Quarter", quarter($"TestDates"))
  .withColumn("QuarterDate", when($"Quarter" === 1, concat($"Year", lit("-01-01")))
    .when($"Quarter" === 2, concat($"Year", lit("-04-01")))
    .when($"Quarter" === 3, concat($"Year", lit("-07-01")))
    .when($"Quarter" === 4, concat($"Year", lit("-10-01"))))
  .drop("Year", "Quarter")

display(datetrunctest)
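
Since the SQL form of date_trunc did work for me, another possible workaround (a sketch only, assuming expr resolves the function the same way the SQL parser did) is to route the call through expr on the dataframe:

import org.apache.spark.sql.functions._
// Let the SQL parser resolve date_trunc; to_date turns the returned timestamp back into a date
val viaExpr = test.withColumn("QuarterDate", to_date(expr("date_trunc('QUARTER', TestDates)")))
display(viaExpr)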