Removing non-ascii characters in a csv file

Question

I am currently inserting data in my django models using csv file. Below is a simple save function that am using:

def save(self):
myfile = file.csv
data = csv.reader(myfile, delimiter=',', quotechar='"')
i=0
for row in data:
    if i == 0:
        i = i + 1
        continue    #skipping the header row        

    b=MyModel()
    b.create_from_csv_row(row) # calls a method to save in models

The function is working perfectly with ascii characters. However, if the csv file has some non-ascii characters then, an error is raised: UnicodeDecodeError 'ascii' codec can't decode byte 0x93 in position 1526: ordinal not in range(128)

My question is: How can i remove non-ascii characters before saving my csv file to avoid this error.

Thanks in advance.

DivinusVox DivinusVox · Accepted Answer · 2013-08-29T22:57:42

If you really want to strip it, try:

import unicodedata

unicodedata.normalize('NFKD', title).encode('ascii','ignore')

* WARNING THIS WILL MODIFY YOUR DATA * It attempts to find a close match - i.e. ć -> c

Perhaps a better answer is to use unicodecsv instead.

----- EDIT ----- Okay, if you don't care that the data is represented at all, try the following:

# If row references a unicode string
b.create_from_csv_row(row.encode('ascii', 'ignore'))

If row is a collection, not a unicode string, you will need to iterate over the collection to the string level to re-serialize it.

Removing non-ascii characters in a csv file

3 Answers