0
votes

I'm trying to write quite a complicated Python program, but it's mostly done. I am having trouble with just one tiny little detail.

The problematic code section is this:

newData = kmeans.sampleNewData(200, means, covariances, priors)

newData = newData.astype(str)
...loops and logic and stuff...
newData[i, j] = columnsList[j][(indexList[j]).index(closestFit)]

Basically, newData is a numpy matrix of size 200 by 4, filled with numbers of type float. I then convert them to strings using the astype method.

I then try to put this columnsList[j][(indexList[j]).index(closestFit)] which is some string, into an entry of newData.

The problem is that columnsList[j][(indexList[j]).index(closestFit)] is not necessarily English. It could, for example, be Hebrew, in which case I get the error:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)

It is worth noting that I have written # -*- coding: utf-8 -*- so we are encoding in utf-8, and when I print columnsList[j][(indexList[j]).index(closestFit)] it indeed prints the correct value. So we can print it. But for some reason I can't put it into the newData matrix.

1
The astype(str) might be creating a byte-string array. What is the dtype? If it is a byte-string dtype, that's ASCII. You may need to specify a unicode dtype to hold these extra characters. - hpaulj
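To make the comment's suggestion concrete, here is a minimal sketch of asking for a unicode dtype explicitly. It assumes the sampled data is a plain 200 by 4 float array; `new_data` below is a hypothetical stand-in for the result of kmeans.sampleNewData, and the width 32 is an arbitrary choice, not something taken from the question.

```python
# -*- coding: utf-8 -*-
import numpy as np

# Hypothetical stand-in for kmeans.sampleNewData(...): a 200 x 4 float array.
new_data = np.zeros((200, 4))

# Request a unicode dtype explicitly instead of relying on astype(str).
# 'U32' means each cell holds up to 32 unicode characters (assumed width).
as_text = new_data.astype('U32')

# A Hebrew word written with escapes so the source stays ASCII-only.
hebrew = u'\u05e9\u05dc\u05d5\u05dd'

# Assignment stores the text directly; no implicit ASCII encoding is needed.
as_text[0, 0] = hebrew
```

Note that a fixed-width dtype such as 'U32' silently truncates longer strings; if the replacement values can be long, an object array (astype(object)) is one alternative to consider.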

1 Answer

0
votes

Encoding is the operation that converts a text string to bytes. It seems that your columnsList[j][(indexList[j]).index(closestFit)] contains a Unicode string, so try

newData[i, j] = columnsList[j][(indexList[j]).index(closestFit)].encode('utf-8')

instead.
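As a rough illustration of the difference between text and bytes, the sketch below uses only the standard library. The Hebrew word is an example value picked for illustration, not data from the question.

```python
# -*- coding: utf-8 -*-

# A unicode (text) string: four Hebrew letters, written with escapes.
text = u'\u05e9\u05dc\u05d5\u05dd'

# Encoding turns text into bytes. Each of these letters takes 2 bytes in UTF-8.
encoded = text.encode('utf-8')

# Decoding with the same codec recovers the original text.
decoded = encoded.decode('utf-8')

# The ASCII codec cannot represent these characters, which is the failure
# reported in the question's traceback.
try:
    text.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
```

The encoded value is longer in bytes than the text is in characters (8 bytes for 4 letters here), so when storing it in a fixed-width byte-string array, check that the width is large enough to avoid truncation.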