4
votes

I am currently inserting data into my Django models from a CSV file. Below is the simple save function that I am using:

import csv

def save(self):
    myfile = open('file.csv')  # placeholder path
    data = csv.reader(myfile, delimiter=',', quotechar='"')
    i = 0
    for row in data:
        if i == 0:
            i = i + 1
            continue    # skipping the header row

        b = MyModel()
        b.create_from_csv_row(row)  # calls a method to save in models

The function works perfectly with ASCII characters. However, if the CSV file contains non-ASCII characters, an error is raised: UnicodeDecodeError: 'ascii' codec can't decode byte 0x93 in position 1526: ordinal not in range(128)

My question is: how can I remove non-ASCII characters before saving my CSV file, to avoid this error?

Thanks in advance.

3 Answers

6
votes

If you really want to strip it, try:

import unicodedata

unicodedata.normalize('NFKD', title).encode('ascii','ignore')

* WARNING: THIS WILL MODIFY YOUR DATA * It attempts to find a close match, e.g. ć -> c
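To make the effect of that one-liner concrete, here is a minimal sketch written for Python 3, where strings are already Unicode. The helper name to_ascii is made up for illustration. Note that characters with no ASCII decomposition, such as curly quotes, are simply dropped rather than replaced.

```python
import unicodedata

def to_ascii(text):
    """Hypothetical helper: approximate `text` using ASCII characters only."""
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize('NFKD', text)
    # Encoding with 'ignore' silently drops everything outside ASCII,
    # which removes the combining marks and any character with no base letter.
    return decomposed.encode('ascii', 'ignore').decode('ascii')
```

So to_ascii(u'\u0107') gives 'c', while a curly-quoted word loses its quotes entirely.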

Perhaps a better answer is to use unicodecsv instead.

----- EDIT ----- Okay, if you don't care how the data ends up being represented, try the following:

# If row references a unicode string
b.create_from_csv_row(row.encode('ascii', 'ignore'))

If row is a collection, not a unicode string, you will need to iterate over the collection to the string level to re-serialize it.
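Since csv.reader yields each row as a list of strings, the collection case is probably the one that applies here. A rough sketch of the per-cell version, assuming Python 3 and that every cell is already a str (the name strip_row is made up for illustration):

```python
def strip_row(row):
    """Hypothetical helper: drop non-ASCII characters from every cell in a row."""
    # Encode each cell to ASCII, ignoring anything that does not fit,
    # then decode back so the cells are text again rather than bytes.
    return [cell.encode('ascii', 'ignore').decode('ascii') for cell in row]
```

You would then call b.create_from_csv_row(strip_row(row)) instead of passing row directly.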

3
votes

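If you want to remove non-ASCII characters from your data, then iterate through your data one character at a time and keep only the ASCII ones.

kept = []
for item in data:  # data is a string, item is a single character
     if ord(item) < 128:  # code points 0 - 127 are ASCII
          kept.append(item)  # or write, print, whatever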

If you want to convert Unicode characters to ASCII, then the response above by DivinusVox is accurate.

3
votes

Pandas csv parser (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.parsers.read_csv.html) supports different encodings:

import pandas
data = pandas.read_csv(myfile, encoding='utf-8', quotechar='"', delimiter=',')
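One thing worth checking before picking an encoding: byte 0x93 is the left curly quote in Windows-1252, so the file in the question may well be cp1252 rather than UTF-8. That is a guess from the error message, not something the question confirms. The same idea works without pandas; below is a small sketch using only the standard library in Python 3, with a made-up helper name read_rows and cp1252 as an assumed default.

```python
import csv
import io

def read_rows(raw_bytes, encoding='cp1252'):
    """Hypothetical helper: decode CSV bytes explicitly, then parse the rows."""
    # Decode with a named encoding instead of relying on the ASCII default.
    text = raw_bytes.decode(encoding)
    # newline='' leaves line endings alone so the csv module can handle them.
    reader = csv.reader(io.StringIO(text, newline=''), delimiter=',', quotechar='"')
    next(reader, None)  # skip the header row
    return list(reader)
```

If the guess about the encoding is wrong, the decode step either raises UnicodeDecodeError or produces visibly wrong characters, which is usually easier to debug than an error deep inside the save call.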