First I called the sha2 function from pyspark.sql.functions incorrectly, passing it a column of DoubleType, and got the following error:
cannot resolve 'sha2(`metric`, 256)' due to data type mismatch: argument 1 requires binary type, however, '`metric`' is of double type
Then I tried casting the columns to StringType first, but I still get the same error. I'm probably missing something about how column transformations are processed by Spark.
I've noticed that when I just call df.withColumn(col_name, F.lit(df[col_name].cast(StringType()))) without the subsequent .withColumn(col_name, F.sha2(df[col_name], 256)), the column's type is indeed changed to StringType.
How should I apply a transformation correctly in this case?
from pyspark.sql import DataFrame
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

def parse_to_sha2(df: DataFrame, cols: list):
    for col_name in cols:
        df = df.withColumn(col_name, F.lit(df[col_name].cast(StringType()))) \
               .withColumn(col_name, F.sha2(df[col_name], 256))
    return df