I have a requirement where I need to collect some columns onto the Spark driver, and some of those columns contain non-ASCII characters. But while collecting them I get this error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 187: ordinal not in range(128).
Is there a way I can apply a UDF to the column content while fetching it, and then collect it onto the driver?
I am using PySpark for this.
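To show what I mean outside of Spark, here is a plain-Python sketch of the problem and the decode step I'd want the UDF to perform (the byte values and column name are made up for illustration; my real data differs):

```python
# UTF-8 encoded bytes containing a non-ASCII character,
# similar to what my column holds
raw = b"caf\xc3\xa9"

# Decoding with the default ASCII codec fails, which is
# the same error I see on collect()
try:
    raw.decode("ascii")
except UnicodeDecodeError as e:
    print(e)  # 'ascii' codec can't decode byte 0xc3 ...

# Decoding explicitly as UTF-8 works
text = raw.decode("utf-8")
print(text)  # café

# Roughly, I imagine wrapping this decode in a PySpark UDF before
# collecting (hypothetical DataFrame `df` with a bytes column "raw"):
#
# from pyspark.sql.functions import udf
# from pyspark.sql.types import StringType
# to_unicode = udf(lambda b: b.decode("utf-8") if b is not None else None,
#                  StringType())
# rows = df.select(to_unicode("raw").alias("text")).collect()
```

Is this the right approach, or is there a cleaner way to force a UTF-8 decode during collection?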