I have data in a MySQL table with the utf-8 charset. I have a PySpark script that loads the MySQL data and writes a Parquet file to an S3 bucket. While fetching data from MySQL, I get the data in the format below:
'الشرقية'
Then I encoded it as UTF-8 and got the byte string below:
'\xc3\x98\xc2\xa7\xc3\x99\xe2\x80\x9e\xc3\x98\xc2\xb4\xc3\x98\xc2\xb1\xc3\x99\xe2\x80\x9a\xc3\x99\xc5\xa0\xc3\x98\xc2\xa9'
After that, I decoded it with the mac_arabic encoding and got the text below:
'أ»آ'أôقÄûأ»آ٤أ»آ١أôقÄöأôإ أ»آ)'
Is there a way to recover the Arabic text from any one of these strings?
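One possible recovery, as a minimal sketch: assuming the UTF-8 bytes shown above are Arabic text that was misread as Windows-1252 (cp1252) somewhere along the way and then encoded as UTF-8 a second time, reversing those two steps in Python 3 should give back the original text. The cp1252 guess is an assumption based on the byte pattern, not something confirmed by the setup described here, and the variable names are illustrative.

```python
# The doubly encoded bytes, copied from the UTF-8 byte string above.
raw = (
    b'\xc3\x98\xc2\xa7\xc3\x99\xe2\x80\x9e\xc3\x98\xc2\xb4'
    b'\xc3\x98\xc2\xb1\xc3\x99\xe2\x80\x9a\xc3\x99\xc5\xa0'
    b'\xc3\x98\xc2\xa9'
)

# Step 1: undo the outer UTF-8 layer. This yields the mojibake text,
# i.e. Latin-looking characters instead of Arabic.
mojibake = raw.decode('utf-8')

# Step 2: map each mojibake character back to the single byte it came from
# (assumption: the wrong decoding step used cp1252), then decode those
# bytes as the UTF-8 they originally were.
fixed = mojibake.encode('cp1252').decode('utf-8')

print(fixed)
```

If the assumption holds, `fixed` is the seven-letter Arabic word الشرقية. If the `encode('cp1252')` step raises `UnicodeEncodeError`, the wrong decoding was probably done with a different single-byte charset.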
Below is the code:
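from pyspark.sql import SQLContext

# `sc` is the existing SparkContext (for example, the one the pyspark shell provides)
sqlContext = SQLContext(sc)
df = sqlContext.read.format("jdbc").options(
    url="jdbc:mysql://localhost/db_name",
    driver="com.mysql.jdbc.Driver",
    dbtable="table",
    user="root",
    password="root"
).load()
df.show()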
For the columns in the table, the following config is set: CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL
For the database, the following config is set: ENGINE=InnoDB AUTO_INCREMENT=42627 DEFAULT CHARSET=latin1
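Because the default charset is latin1 while the columns are utf8mb4, one thing that may be worth trying is stating the connection encoding explicitly in the JDBC URL. The sketch below builds the same options as the code above, with the MySQL Connector/J properties `useUnicode` and `characterEncoding` appended to the URL. `jdbc_options` is a hypothetical helper name, and whether this helps depends on the rows having been stored correctly in the first place, which is an assumption.

```python
def jdbc_options(host, database, table, user, password):
    """Build JDBC reader options with an explicit UTF-8 connection encoding.

    Hypothetical helper for illustration; only the two URL properties
    differ from the options used in the original read.
    """
    url = (
        f"jdbc:mysql://{host}/{database}"
        "?useUnicode=true&characterEncoding=UTF-8"
    )
    return {
        "url": url,
        "driver": "com.mysql.jdbc.Driver",
        "dbtable": table,
        "user": user,
        "password": password,
    }


opts = jdbc_options("localhost", "db_name", "table", "root", "root")
print(opts["url"])

# With a live SparkContext the options would be passed like this:
# df = sqlContext.read.format("jdbc").options(**opts).load()
```

If the text still comes back garbled with this URL, the stored bytes themselves are likely double-encoded and would need to be repaired after loading.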
Thanks in advance.
sqlContext? How are you printing? Which client, etc.? By the way, your first string is gibberish; no encoding or decoding will convert it back into Arabic. - mehdix