
I have four string columns 'hour', 'day', 'month', 'year' in my data frame. I would like to create new column fulldate in format 'dd/MM/yyyy HH:mm'.

df2 = df1.withColumn("fulldate", to_date(concat(col('day'), lit('/'), col('month'), lit('/'), col('year'), lit(' '), col('hour'), lit(':'), lit('0'), lit('0')), 'dd/MM/yyyy HH:mm'))

but it doesn't seem to work. I'm getting format "yyyy-mm-dd".

Am I missing something?


2 Answers


Use date_format instead of to_date.

to_date parses a string column into a date type using the given format, while date_format goes the other way: it takes a date/timestamp column and renders it as a string in the given format. Since you want a formatted string as output, date_format is the right function.
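The distinction is the same as parsing versus formatting in plain Python (a local sketch, not Spark code): strptime plays the role of to_date/to_timestamp, and strftime plays the role of date_format.

```python
from datetime import datetime

# Parsing (analogue of to_date/to_timestamp): string -> datetime object
ts = datetime.strptime("31/12/2021 09:00", "%d/%m/%Y %H:%M")

# Formatting (analogue of date_format): datetime object -> string
formatted = ts.strftime("%d/%m/%Y %H:%M")
print(formatted)  # 31/12/2021 09:00
```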

from pyspark.sql.functions import date_format, to_timestamp, concat, col, lit

df2 = df1.withColumn(
    "fulldate",
    date_format(
        # parse the concatenated string into a timestamp first;
        # 'y-M-d H:mm' tolerates single-digit month, day and hour values
        to_timestamp(
            concat(col('year'), lit('-'), col('month'), lit('-'), col('day'), lit(' '), col('hour'), lit(':'), lit('00')),
            'y-M-d H:mm'
        ),
        'dd/MM/yyyy HH:mm'
    )
)

For better readability, you can use format_string:

from pyspark.sql.functions import date_format, format_string, to_timestamp, col

df2 = df1.withColumn(
    "fulldate",
    date_format(
        # %s because the columns are strings; %d would fail at runtime on non-integer input
        to_timestamp(
            format_string('%s-%s-%s %s:00', col('year'), col('month'), col('day'), col('hour')),
            'y-M-d H:mm'
        ),
        'dd/MM/yyyy HH:mm'
    )
)
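Note that because the final formatting step uses two-digit pattern letters (dd, MM, HH), single-digit input components come out zero-padded in the result. The same behaviour in plain Python, as an illustration only:

```python
from datetime import datetime

# Single-digit day, month and hour inputs...
ts = datetime(2021, 3, 5, 7, 0)

# ...are zero-padded by the two-digit format codes
print(ts.strftime("%d/%m/%Y %H:%M"))  # 05/03/2021 07:00
```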

For Spark 3+, you can use the make_timestamp function to build a timestamp column directly from the component columns, then use date_format to render it in the desired pattern:

from pyspark.sql import functions as F

df2 = df1.withColumn(
    "fulldate",
    F.date_format(
        F.expr("make_timestamp(year, month, day, hour, 0, 0)"),
        "dd/MM/yyyy HH:mm"
    )
)
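make_timestamp assembles the timestamp from numeric components, skipping string concatenation and parsing altogether (Spark implicitly casts the string columns to integers here). The equivalent idea in plain Python, as a sketch assuming the columns cast cleanly:

```python
from datetime import datetime

# Build the timestamp from its components,
# like make_timestamp(year, month, day, hour, 0, 0)
year, month, day, hour = "2021", "12", "31", "9"
ts = datetime(int(year), int(month), int(day), int(hour), 0, 0)

# Then render it in the desired pattern, like date_format
print(ts.strftime("%d/%m/%Y %H:%M"))  # 31/12/2021 09:00
```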