I'm new to the PySpark environment and came across an error while trying to encrypt data in an RDD with the cryptography module. Here's the code:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('encrypt').getOrCreate()
df = spark.read.csv('test.csv', inferSchema=True, header=True)
df.show()
df.printSchema()
from cryptography.fernet import Fernet
key = Fernet.generate_key()
f = Fernet(key)
dfRDD = df.rdd
print(dfRDD)
mappedRDD = dfRDD.map(lambda value: (value[0], str(f.encrypt(str.encode(value[1]))), value[2] * 100))
data = mappedRDD.toDF()
data.show()
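For reference, test.csv is shaped roughly like this (the values are placeholders; what matters is that the second column is a string and the third is numeric, matching what the map expects):

id,name,salary
1,Alice,65000
2,Bob,72000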
Everything works fine until I try to map value[1] with str(f.encrypt(str.encode(value[1]))). At that point I receive the following error:
PicklingError: Could not serialize object: TypeError: can't pickle CompiledFFI objects
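My suspicion is that the lambda closes over f, so Spark has to pickle the entire Fernet object (which wraps CompiledFFI internals) in order to ship the task to the executors. If that reading is right, serializing just the closure should fail the same way, with no Spark job involved at all:

from pyspark.serializers import CloudPickleSerializer

# Serialize only the closure; with the cryptography build that produces
# the error above, I'd expect the same "can't pickle CompiledFFI objects"
# TypeError, because the lambda drags the Fernet object f along with it.
CloudPickleSerializer().dumps(lambda value: f.encrypt(str.encode(value)))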
I have not found many resources referring to this error and wanted to see if anyone else has encountered it (or can recommend an approach to column encryption in PySpark).
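For what it's worth, would something along these lines be the right pattern? It is only a sketch: it continues from the code above, but builds the Fernet object inside a mapPartitions function so that only the raw key bytes (which pickle without trouble) are ever shipped to the executors:

from cryptography.fernet import Fernet

key = Fernet.generate_key()  # plain bytes: these serialize fine

def encrypt_partition(rows, key=key):
    # Build Fernet once per partition, on the executor, so the
    # CompiledFFI internals never need to be pickled at all.
    f = Fernet(key)
    for value in rows:
        yield (value[0], f.encrypt(str.encode(value[1])).decode(), value[2] * 100)

data = df.rdd.mapPartitions(encrypt_partition).toDF()
data.show()

Decoding the token also keeps the column a plain string instead of the b'...' repr that wrapping the bytes in str() produces.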