I am trying to analyse some tweets I got from Twitter, but it seems I have an encoding problem. Any ideas?

import json

# Next we read the data into a list called tweets_data.
tweets_data_path = 'C:/Python34/TESTS/twitter_data.txt'

tweets_data = []
tweets_file = open(tweets_data_path, "r")


for line in tweets_file:
    try:
        tweet = json.loads(line)
        tweets_data.append(tweet)
    except ValueError:  # skip lines that are not valid JSON
        continue

print(len(tweets_data))  # 412 tweets
print(tweet)

I got this error:

File "C:\Python34\lib\encodings\cp850.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2026' in position 1345: character maps to <undefined>

At work I don't get the error, but there I have Python 3.3. Do you think that makes a difference?

-----EDIT

The comment from @MarkRansom answered my question.

1
Can you provide the line where the UnicodeEncodeError happens? How did you write these tweets? Did you encode them in UTF-8? - Raito
I will look tonight, but I got the tweets from the Twitter API and checked that the encoding of the file was UTF-8. - Stéphanie C
The problem is that the console you're running on is not capable of handling the character you're trying to print. See stackoverflow.com/questions/3597480/… for some hints. - Mark Ransom
That is exactly it! Thank you so much. - Stéphanie C
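
To illustrate the console problem described in the comments, here is a minimal sketch that escapes any character the console encoding cannot represent before printing it. The helper name console_safe is hypothetical (it is not from the thread), and cp850 is passed explicitly only to mimic the console shown in the traceback.

```python
import sys


def console_safe(text, encoding=None):
    """Return text with characters the target encoding cannot represent
    replaced by backslash escapes (hypothetical helper, for illustration)."""
    # Fall back to ASCII if stdout reports no encoding (e.g. when redirected).
    encoding = encoding or sys.stdout.encoding or "ascii"
    # Round-trip through the target encoding; unencodable characters
    # become escapes such as \u2026 instead of raising UnicodeEncodeError.
    return text.encode(encoding, errors="backslashreplace").decode(encoding)


# U+2026 (horizontal ellipsis) is the character from the traceback.
print(console_safe("Read more\u2026", encoding="cp850"))
```

With this approach the text is still readable on a limited console, at the cost of showing escapes in place of the unsupported characters. Characters that the console encoding does support pass through unchanged.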

1 Answer

You should specify the encoding when opening the file:

tweets_file = open(tweets_data_path, "r", encoding="utf-8-sig")
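
For context, here is a sketch of the whole reading loop with the encoding given explicitly. It assumes one JSON object per line, as in the question. The function name load_tweets and the temporary sample file are made up for the example. Without an encoding argument, open() uses a locale-dependent default, which on Windows is usually not UTF-8.

```python
import json
import os
import tempfile


def load_tweets(path):
    """Parse one JSON object per line, skipping lines that are not valid JSON
    (hypothetical helper, for illustration)."""
    tweets = []
    # utf-8-sig decodes UTF-8 and drops a leading byte order mark if present.
    with open(path, "r", encoding="utf-8-sig") as tweets_file:
        for line in tweets_file:
            try:
                tweets.append(json.loads(line))
            except ValueError:  # json decoding errors are ValueError subclasses
                continue
    return tweets


# Build a small sample file (made-up data) that starts with a BOM.
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8-sig", suffix=".txt", delete=False
) as sample:
    sample.write('{"text": "Read more\u2026"}\n')
    sample.write("\n")  # blank line, skipped by the loader
    sample.write('{"text": "caf\u00e9"}\n')
    sample_path = sample.name

tweets_data = load_tweets(sample_path)
os.remove(sample_path)
print(len(tweets_data))
```

Note that this only affects how the file is decoded. Printing a tweet can still fail if the console cannot display one of its characters, which is the issue pointed out in the comments.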