I'm trying to cast an RFC 2822-style datetime string column to a timestamp column. Outside a DataFrame the conversion works, but inside a DataFrame I get an error message.
My imports:
from pyspark.sql.types import *
from pyspark.sql.column import *
from pyspark.sql.functions import *
from email.utils import parsedate_to_datetime
Outside the DataFrame, this code works:
datestr = "Thu Sep 12 2019 15:58:30 GMT-0500 (hora estándar de Colombia)"
print(parsedate_to_datetime(datestr))
Output:
2019-09-12 15:58:30
But if I work with this DataFrame:
df = spark.createDataFrame(["Thu Sep 12 2019 15:58:30 GMT-0500 (hora estándar de Colombia)"], "string").toDF("Date")
and then try to create another column with the following code:
df2 = df.withColumn("timestamp", parsedate_to_datetime(col("Date")))
I get this error message:
ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.
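A likely explanation, hedged: `parsedate_to_datetime` is an ordinary Python standard-library function, so `parsedate_to_datetime(col("Date"))` runs once on the driver with the Column object itself as its argument, not once per row with each string. In current CPython the parser's first step is a truthiness check (`if not data`), and a PySpark Column refuses conversion to bool, which raises exactly this ValueError. One common workaround is to wrap the function in a UDF so Spark calls it row by row. The sketch below assumes Spark 2.x/3.x; `safe_parsedate` and `with_timestamp_column` are hypothetical names introduced here, and returning None for nulls or unparseable strings (instead of failing the job) is an assumption about how bad rows should be handled.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Optional


def safe_parsedate(value: Optional[str]) -> Optional[datetime]:
    """Null-safe wrapper around parsedate_to_datetime (hypothetical helper).

    Returns None for null or unparseable input instead of raising, so a
    single bad row does not fail the whole Spark job.
    """
    if value is None:
        return None
    try:
        return parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Older Pythons raise TypeError, newer ones ValueError, on bad input.
        return None


def with_timestamp_column(df, src: str = "Date", dst: str = "timestamp"):
    """Add a TimestampType column parsed row by row via a Python UDF."""
    # Imported lazily so safe_parsedate can be used without Spark installed.
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import TimestampType

    parse_udf = udf(safe_parsedate, TimestampType())
    return df.withColumn(dst, parse_udf(col(src)))
```

Usage would be `df2 = with_timestamp_column(df)`. Note that `parsedate_to_datetime` does not recognise the `GMT-0500` token in this format, so it returns a naive datetime with the offset dropped, which is consistent with the `2019-09-12 15:58:30` output shown above. If a Python UDF is too slow, a native expression such as `to_timestamp(substring(col("Date"), 5, 20), "MMM dd yyyy HH:mm:ss")` should parse the same fields (also ignoring the offset); treat that pattern as an untested assumption.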