
I'm working on a project to run sentiment analysis on stocks using NLTK. I've searched through GitHub and found nothing similar for sentiment_analyser or polarity_scores calls.

I also looked at Python 3.4 - 'bytes' object has no attribute 'encode', and it is not a duplicate as I'm not calling bcrypt.gensalt().encode('utf-8'). It does, though, hint at the issue of something being the wrong type.

Can anyone help in resolving this error?

I get the error:

/lib/python3.5/site-packages/nltk/sentiment/vader.py in __init__(self, text)
    154     def __init__(self, text):
    155         if not isinstance(text, str):
--> 156             text = str(text.encode('utf-8'))
    157         self.text = text
    158         self.words_and_emoticons = self._words_and_emoticons()

AttributeError: 'bytes' object has no attribute 'encode'

The dataframe df_stocks.head(5) is :

            prices  articles
2007-01-01  12469   What Sticks from '06. Somalia Orders Islamist...
2007-01-02  12472   Heart Health: Vitamin Does Not Prevent Death ...
2007-01-03  12474   Google Answer to Filling Jobs Is an Algorithm...
2007-01-04  12480   Helping Make the Shift From Combat to Commerc...
2007-01-05  12398   Rise in Ethanol Raises Concerns About Corn as...                

The code is below, with the error occurring on the last line:

import numpy as np
import pandas as pd
from nltk.classify import NaiveBayesClassifier
from nltk.corpus import subjectivity
from nltk.sentiment import SentimentAnalyzer
from nltk.sentiment.util import *
from nltk.sentiment.vader import SentimentIntensityAnalyzer
import unicodedata
for date, row in df_stocks.T.iteritems():
    sentence = unicodedata.normalize('NFKD', df_stocks.loc[date, 'articles']).encode('ascii','ignore')
    ss = sid.polarity_scores(sentence)

Thanks

It seems df_stocks.loc[date, 'articles'] is not a unicode str. What is df_stocks? - aircraft
I did check that one and I don't see how it is a duplicate of the above, as I'm not calling bcrypt.gensalt().encode('utf-8'). The error is coming from within the NLTK library. - Mike
@aircraft Yes, got it, you're correct. It was type str in Python 3, so I'm working on mapping it to unicode at the moment. I've just realised the code is a port from Python 2, which may have caused this error. - Mike

1 Answer


According to the unicodedata.normalize() docs, the function converts a Unicode string into a common normal form.

import unicodedata

print(unicodedata.normalize('NFKD', u'abcdあäasc').encode('ascii', 'ignore'))

This prints:

b'abcdaasc'

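So the issue is here: the value passed to sid.polarity_scores() is not a Unicode string. In Python 3, .encode('ascii', 'ignore') turns the normalized text from df_stocks.loc[date, 'articles'] into a bytes object, and VADER then tries to call .encode('utf-8') on it.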
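
One possible workaround, as a minimal sketch: decode the ASCII bytes back into a str before handing the text to VADER. The helper name to_ascii_text is hypothetical and is not part of NLTK or of the question's code. The sample string is the one from the example above.

```python
import unicodedata


def to_ascii_text(text):
    """Drop non-ASCII characters but return a str, not bytes (hypothetical helper)."""
    normalized = unicodedata.normalize('NFKD', text)
    # encode() yields bytes in Python 3; decode() turns them back into str
    return normalized.encode('ascii', 'ignore').decode('ascii')


# What the question's code produces: a bytes object, which has no .encode()
raw = unicodedata.normalize('NFKD', u'abcdあäasc').encode('ascii', 'ignore')
print(type(raw).__name__, hasattr(raw, 'encode'))   # bytes False

# What the helper produces: a plain str that passes isinstance(text, str)
clean = to_ascii_text(u'abcdあäasc')
print(type(clean).__name__, clean)                  # str abcdaasc
```

In the loop from the question, this would presumably become sentence = to_ascii_text(df_stocks.loc[date, 'articles']) followed by sid.polarity_scores(sentence), assuming sid was created with SentimentIntensityAnalyzer() and the articles column holds str values. If the non-ASCII characters do not need to be stripped, passing the str straight to polarity_scores() may be enough.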