0
votes

I have the following code. It works with Python 3.5, but when I run it with Python 2.7 it raises an error.

This is the code:

import codecs
import csv
import numpy as np

def load_data_and_labels():
    # Load data from files
    with codecs.open('./data/train.txt',encoding="utf8") as inf:
        reader = csv.reader(inf, delimiter='\t',quoting=csv.QUOTE_NONE)
        col = list(zip(*reader)) # <--- The error appeared here.
        x_text = col[2]
        colY = col[1]
    # Split by words
    x_text = [clean_str(sent) for sent in x_text]
    x_text = [s.split(" ") for s in x_text]
    # Generate labels
    y = [[1,0] if int(x)==1 else [0,1] for x in colY]
    y = np.array(y)
    return [x_text, y]

This is the error, raised at the line col = list(zip(*reader)):

UnicodeEncodeError: 'ascii' codec can't encode character u'\ufe0f' in position 120: ordinal not in range(128)

This is a sample of the text file structure:

3   1   Hey there! Nice to see you Minnesota/ND Winter Weather 
4   0   3 episodes left I'm dying over here
5   1   "I can't breathe!" was chosen as the most notable quote of the year 
2

2 Answers

1
votes

One of the biggest differences between Python 2 and Python 3 is Unicode support: in Python 3, strings are Unicode by default, while in Python 2 the default str type is a byte string.

So, if you have a file that contains non-ASCII characters and you handle them in Python 2 without any special care, it will fail: whenever a unicode string has to be converted to a byte string, Python 2 implicitly encodes it with the default ASCII codec, which cannot represent any character outside the range 0-127.
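The failure can be reproduced outside the csv module. The snippet below is only a minimal sketch: the sample string is made up (it reuses the u'\ufe0f' character from the error message), and it performs the ASCII encoding explicitly, whereas Python 2 does the same step implicitly.

```python
# Hypothetical sample text containing the character from the error message.
text = u"Nice to see you \ufe0f"

failed_position = None
try:
    # ASCII covers only code points 0-127, so this raises UnicodeEncodeError.
    text.encode("ascii")
except UnicodeEncodeError as exc:
    # exc.start is the index of the first character that could not be encoded.
    failed_position = exc.start

# UTF-8 can represent every Unicode character, so this succeeds.
utf8_bytes = text.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")
```

Encoding explicitly with UTF-8 (or decoding bytes explicitly) avoids the implicit ASCII step altogether.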

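Combine that implicit ASCII conversion with the fact that, quoting the documentation of the csv module (https://docs.python.org/2/library/csv.html), "The csv module doesn’t directly support reading and writing Unicode", and you can see why this can't work: codecs.open hands csv.reader unicode strings, and the Python 2 csv module, which only handles byte strings, tries to encode them as ASCII.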

You can have a look here: https://wiki.python.org/moin/Python2orPython3
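One possible workaround, sketched here under assumptions rather than taken from the question, is to give the Python 2 csv module raw bytes and decode each cell afterwards. The function name read_tsv_columns and its parameters are hypothetical; the file is assumed to be UTF-8 and tab-separated, as in the question.

```python
import csv
import io
import sys

PY2 = sys.version_info[0] == 2


def read_tsv_columns(path, encoding="utf-8"):
    """Return the columns of a tab-separated file as tuples of text strings."""
    if PY2:
        # Python 2: csv only handles byte strings, so read the file as bytes
        # and decode every cell to unicode after parsing.
        with open(path, "rb") as inf:
            reader = csv.reader(inf, delimiter="\t", quoting=csv.QUOTE_NONE)
            rows = [[cell.decode(encoding) for cell in row] for row in reader]
    else:
        # Python 3: csv works on text, so let open() do the decoding.
        with io.open(path, encoding=encoding, newline="") as inf:
            reader = csv.reader(inf, delimiter="\t", quoting=csv.QUOTE_NONE)
            rows = list(reader)
    # Transpose rows into columns, as zip(*reader) does in the question.
    return list(zip(*rows))
```

With this helper, the question's code could use col = read_tsv_columns('./data/train.txt') and keep the rest unchanged. Note that zip stops at the shortest row, so rows with missing fields would truncate the columns.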

0
votes

The Python 2.7 version of the csv module does not support Unicode input; see the note here:

https://docs.python.org/2/library/csv.html