
I am fairly new to Spark and am starting my first project. I need to analyze Twitter data for sentiment analysis, using the TextBlob library in Python. I am able to get the Twitter data and have created the DStream after all the necessary transformations. My challenge is how to make the DStream data (which holds the tweet text) available to TextBlob for analysis, since TextBlob accepts only a string value. How can I get the DStream values into TextBlob for sentiment analysis? Any pointers are highly appreciated.

Thanks, Kary

Can you post what you have tried so far? We can't code from scratch for you. - Spoody
Did you get a chance to look at this link: edureka.co/blog/spark-streaming ? It's pretty self-explanatory. I hope you will be able to find the Python counterparts of the Scala code given in the post. Happy Sparking! - Abhay Dandekar

1 Answer


I recently tried using TextBlob on a streaming dataset and wrote a small function to convert tweets to text and apply TextBlob, so you could write something like this:

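from textblob import TextBlob

# Method of a helper class. benchmark, positive, negative and noresponse
# are not defined here; they must be set elsewhere (a threshold and three labels).
def getSentiment(self, text):
        # polarity is a float in [-1.0, 1.0]
        sentiment = TextBlob(text).sentiment.polarity
        if sentiment > float(benchmark):
            return float(positive)
        elif sentiment < float(benchmark):
            return float(negative)
        else:
            return float(noresponse)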
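If you prefer not to depend on a class and on outside variables, a standalone version could look roughly like the sketch below. The names classify_polarity and get_sentiment and the default values (0.0 as the threshold, 1.0 / -1.0 / 0.0 as the labels) are my own assumptions for illustration, not something TextBlob prescribes.

```python
def classify_polarity(polarity, benchmark=0.0, positive=1.0,
                      negative=-1.0, noresponse=0.0):
    """Map a polarity score to one of three float labels.

    All default values are illustrative assumptions; pick your own.
    """
    if polarity > benchmark:
        return float(positive)
    if polarity < benchmark:
        return float(negative)
    return float(noresponse)


def get_sentiment(text, benchmark=0.0):
    """Score one tweet string with TextBlob and classify it."""
    # Imported inside the function so TextBlob is only needed when called.
    from textblob import TextBlob
    polarity = TextBlob(text).sentiment.polarity
    return classify_polarity(polarity, benchmark)
```

Keeping the threshold logic separate from the TextBlob call makes it easy to check the thresholds without Spark or TextBlob installed.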

Then write a UDF that accepts the text:

sentiment_score_udf = F.udf(lambda x: obj.getSentiment(x), FloatType())

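Here F is pyspark.sql.functions, FloatType comes from pyspark.sql.types, and obj is an instance of the class that defines getSentiment. You can then use the line below to calculate the sentiment score: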

sentiment_score_udf(col("value")).alias("sentiment_score")
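The UDF route works on DataFrames. For a plain DStream of tweet strings, as in the question, the same idea can probably be expressed with map, since each element reaches your function as an ordinary Python string. This is only a sketch: score_tweet, textblob_polarity and add_sentiment are hypothetical helper names, and it assumes TextBlob is installed on every worker node.

```python
def score_tweet(text, polarity_fn):
    """Return (text, polarity) for one tweet, coercing the input to str."""
    text = text if isinstance(text, str) else str(text)
    return (text, polarity_fn(text))


def textblob_polarity(text):
    # Imported inside the function so each Spark executor loads TextBlob itself.
    from textblob import TextBlob
    return TextBlob(text).sentiment.polarity


def add_sentiment(tweets):
    """tweets: a DStream (or RDD) whose elements are tweet strings."""
    return tweets.map(lambda text: score_tweet(text, textblob_polarity))
```

With a DStream named, say, tweet_stream, calling add_sentiment(tweet_stream).pprint() should print (text, polarity) pairs for each batch.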

Hope this helps.