I want to use the date_trunc function on a DataFrame that has a date column, so that I can add a new column telling me which quarter each record belongs to (for example, for "2018-04-20" the new column should hold 2018-04-01 00:00:00, the start of that quarter).
What I have tried so far is below:
import org.apache.spark.sql.functions._

val test = Seq(("2010-03-05"), ("2018-01-16"), ("2018-04-20")).toDF("TestDates")
display(test) // this displays the dates in the notebook as expected

val datetrunctest = test.withColumn("Quarter", date_trunc("QUARTER", $"TestDates"))
display(datetrunctest) // running this cell fails with **error: not found: value date_trunc**
I also get an error when I try to import the function by name:
import org.apache.spark.sql.functions.date_trunc
error: value date_trunc is not a member of object org.apache.spark.sql.functions
I am able to use the same function in Spark SQL by saving the above DataFrame test as a table DailyDates and querying it:
val ddd = spark.sql("Select TestDates,date_trunc('QUARTER', TestDates) as QuarterDate from test.DailyDates")
display(ddd)
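For reference, I can also push the same SQL expression through expr directly on the DataFrame. This is only a sketch of the workaround I have in mind (I have not settled on it, and I would still like to understand why the functions import fails):

import org.apache.spark.sql.functions.expr

// Workaround sketch: let the SQL parser resolve date_trunc instead of calling it
// through the Scala functions object; column name matches the test DataFrame above.
val exprQuarter = test.withColumn("Quarter", expr("date_trunc('QUARTER', TestDates)"))
display(exprQuarter)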
I have a lot of transformations/aggregations that need to run on this DataFrame, so I am looking for a way to make date_trunc work on the DataFrame itself by adding the extra column (a rough sketch of what I mean is further down). According to the documentation, date_trunc should be available in the functions object from Spark 2.3.0 onwards, and I am on Spark 2.4.3.
Snapshot of the Spark version:
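Besides the screenshot, the version can also be checked from a notebook cell with the standard SparkSession API (sketch, not output from my screenshot):

// Print the runtime Spark version from the notebook
println(spark.version)
println(org.apache.spark.SPARK_VERSION)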
Documentation link: https://spark.apache.org/docs/2.3.0/api/java/org/apache/spark/sql/functions.html#date_trunc-java.lang.String-org.apache.spark.sql.Column-
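To give an idea of the downstream work I mentioned above, it is roughly along these lines (illustrative only: dfWithQuarter stands for the DataFrame after the Quarter column has been added, however that ends up being done, and the count is just a placeholder for my real aggregations):

import org.apache.spark.sql.functions.{col, count}

// Illustrative placeholder, not my actual logic: group records by the truncated
// quarter and aggregate per quarter.
val perQuarter = dfWithQuarter
  .groupBy(col("Quarter"))
  .agg(count("*").as("RecordCount"))
display(perQuarter)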
Does anyone have an idea of what could be causing this and how I can get it working?