I'm trying to round hours using PySpark and a UDF.
The function works properly in plain Python, but fails when used through PySpark.
The input is:
date = Timestamp('2016-11-18 01:45:55') # type is pandas._libs.tslibs.timestamps.Timestamp
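To illustrate what "works properly in Python" means here, a minimal standalone check of pandas' Timestamp.round (assuming pandas is installed):

import pandas as pd

# Timestamp.round("H") rounds to the nearest hour; 01:45:55 rounds up to 02:00
date = pd.Timestamp('2016-11-18 01:45:55')
print(date.round("H").hour)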
from pyspark.sql.functions import udf
from pyspark.sql.types import IntegerType

def time_feature_creation_spark(date):
    return date.round("H").hour

time_feature_creation_udf = udf(lambda x: time_feature_creation_spark(x), IntegerType())
Then I apply it to the Spark DataFrame:
data = data.withColumn("hour", time_feature_creation_udf(data["date"]))
And the error is:
TypeError: 'Column' object is not callable
The expected output is simply the hour nearest to the timestamp (e.g. 20:45 is closest to 21:00, so it should return 21).
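For reference, the rounding itself can be expressed without pandas at all, using only the standard datetime module (which is closer to the type a Spark UDF actually receives). This is a minimal sketch; the helper name round_to_hour is hypothetical, and here minutes of 30 and above round up:

from datetime import datetime, timedelta

def round_to_hour(dt):
    # Shift forward by 30 minutes, then truncate to the hour:
    # anything at :30 or later lands in the next hour before truncation.
    return (dt + timedelta(minutes=30)).replace(minute=0, second=0, microsecond=0).hour

print(round_to_hour(datetime(2016, 11, 18, 20, 45)))   # 20:45 -> nearest hour is 21
print(round_to_hour(datetime(2016, 11, 18, 1, 45, 55)))

Note that 23:45 rolls over to the next day, so the helper returns 0 in that case.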