from pyspark.sql.functions import from_unixtime

df = spark.read.parquet('xxx')

# Render epoch seconds in the Singapore time zone for this session
spark.conf.set("spark.sql.session.timeZone", "Singapore")

# One-row DataFrame holding an epoch value taken from df
# (the original passed the string literal 'tmstmp' by mistake)
time_df = spark.createDataFrame([(df.first()['timestamp'],)], ['unix_time'])
time_df.select(from_unixtime('unix_time').alias('ts')).collect()

# DataFrames are immutable, so column assignment does not work;
# replace the column with withColumn instead
df = df.withColumn('timestamp', from_unixtime(df['timestamp']))

spark.conf.unset("spark.sql.session.timeZone")
This line fails:

time_df.select(from_unixtime('unix_time').alias('ts')).collect()

with the exception:

Exception: Python in worker has different version 2.7 than that in driver 3.7, PySpark cannot run with different minor versions. Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.
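The exception means the worker processes launched Python 2.7 while the driver runs Python 3.7; the `.collect()` triggers the first job that actually ships work to executors, which is why the failure surfaces there. One way to align the versions, as the message suggests, is to point both environment variables at the same interpreter before the SparkSession is created. A minimal sketch, assuming `/usr/bin/python3` is the same interpreter path on the driver and on every worker node:

```python
import os

# Assumption: /usr/bin/python3 resolves to the same Python 3 install
# on the driver and on all workers; adjust the path for your cluster.
os.environ['PYSPARK_PYTHON'] = '/usr/bin/python3'
os.environ['PYSPARK_DRIVER_PYTHON'] = '/usr/bin/python3'

# SparkSession must be created AFTER these are set, e.g.:
# spark = SparkSession.builder.getOrCreate()
```

These can also be set cluster-wide in `conf/spark-env.sh` instead of per script, so every submitted job picks up a consistent interpreter.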