I'm trying to write quite a complicated Python program, but it's mostly done. I am having trouble with just one tiny little detail.
The problematic code section is this:
newData = kmeans.sampleNewData(200, means, covariances, priors)
newData = newData.astype(str)
...loops and logic and stuff...
newData[i, j] = columnsList[j][(indexList[j]).index(closestFit)]
Basically, newData is a numpy matrix of size 200 by 4, filled with numbers of type float. I then convert them to strings using the astype method.
I then try to put columnsList[j][(indexList[j]).index(closestFit)], which is some string, into an entry of newData.
The problem is that columnsList[j][(indexList[j]).index(closestFit)] is not necessarily English. It could, for example, be Hebrew, in which case I get the error:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
It is worth noting that I have written # -*- coding: utf-8 -*- so we are encoding in utf-8, and when I print columnsList[j][(indexList[j]).index(closestFit)] it indeed prints the correct value. So we can print it. But for some reason I can't put it into the newData matrix.
astype(str) might be creating a byte string array. What is the dtype? That's ASCII. You may need to specify a unicode dtype to hold these extra characters. - hpaulj
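To illustrate the suggestion in the comment, here is a minimal sketch of requesting a unicode dtype explicitly. It is not the original program: new_data is a zero-filled stand-in for the output of kmeans.sampleNewData, and hebrew_value is a hypothetical stand-in for columnsList[j][(indexList[j]).index(closestFit)].

```python
# -*- coding: utf-8 -*-
import numpy as np

# Stand-in for the sampled data: a 200-by-4 float array (values are made up).
new_data = np.zeros((200, 4))

# Ask for a unicode dtype explicitly. On Python 2, astype(str) produces a
# byte-string ('S') array, which is where the ASCII encode error comes from.
new_data = new_data.astype('U')

# Inspect the dtype, as the comment suggests; kind 'U' means unicode.
print(new_data.dtype)

# Hypothetical non-ASCII value standing in for the looked-up column entry.
hebrew_value = u'שלום'
new_data[0, 0] = hebrew_value

print(new_data[0, 0] == hebrew_value)
```

One thing to watch: the resulting array has a fixed width per element, so a string longer than that width is silently truncated on assignment. If the strings can be long, an object array (astype(object)) may be a safer choice.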