
I'm analyzing the use of some specific hashtags in a Twitter dataset. The end goal is to visualize the use of these hashtags over time. The data is organized in a pandas dataframe in which each row holds information about one tweet. One of the columns is called 'text' and holds the tweets themselves, one string per tweet. The dataframe is indexed by time, so what I want to do is count how many times per day a specific hashtag is used.

This is the information about the dataframe:

 <class 'pandas.core.frame.DataFrame'>
 DatetimeIndex: 9991 entries, 2018-05-25 15:54:01 to 2018-05-25 14:14:37
 Data columns (total 13 columns):
 Unnamed: 0       9991 non-null int64
 ID               9991 non-null int64
 has_media        2015 non-null object
 is_reply         9991 non-null bool
 is_retweet       9991 non-null bool
 medias           2015 non-null object
 nbr_favorite     9991 non-null int64
 nbr_reply        9991 non-null int64
 nbr_retweet      9991 non-null int64
 text             9991 non-null object
 url              9991 non-null object
 user_id          9991 non-null int64
 usernameTweet    9991 non-null object
 dtypes: bool(2), int64(6), object(5)
 memory usage: 956.2+ KB

And specifically the column 'text'

df['text']

gives the following result:

datetime
2018-05-25 15:54:01    Høj stemmeprocent ved #ok18  urafstemning. Dej...
2018-05-25 16:40:24    Man kan tvivle på at de gode medarbejdere fra ...
2018-05-25 18:19:25    Nej @gitteredder  teknikken drillede hos DLF. ...
2018-05-25 22:32:30    Rekordstor stemmeprocent hos @bibliotekarerne ...
2018-05-26 08:42:44    # ok18  stemte ja igår. Ja fordi folkeskolen i...
2018-05-26 10:21:20    Afstemningen er skudt i gang om #OK18  - 26 ti...
2018-05-26 12:12:28    Her godt et døgn efter afstemnings begyndelse ...
2018-05-26 14:14:35    Ikke vær bekymret for debatten - men vær bekym...
....

So how can I count how many times per day a hashtag, for instance #ok18, was used, and make a line graph of that with the days on the x-axis and the hashtag count on the y-axis?

Could you provide a link to some sample data? - Szymon Maszke

1 Answer


This will give you a dataframe containing every tweet that mentions #ok18, regardless of case:

df.loc[df['text'].str.contains('#ok18', case=False, na=False)]

From there, counting is very easy, but if you're going to visualize it you might not want to count immediately; you're going to want to plot the occurrences of the hashtag against a time axis.
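One way to get from that filter to a per-day count is sketched below. This is only an illustration under some assumptions: the helper name `daily_hashtag_counts` and the small `sample` dataframe are made up for the example and are not part of the question's data, and the sketch counts tweets that contain the hashtag at least once rather than every occurrence inside a tweet. It relies on the dataframe having a DatetimeIndex, as the one in the question does.

```python
import pandas as pd


def daily_hashtag_counts(df, tag, column="text"):
    """Return the number of tweets per calendar day whose text contains `tag`.

    Hypothetical helper for illustration; `df` must have a DatetimeIndex.
    """
    # Case-insensitive literal match; na=False treats missing text as "no match".
    mask = df[column].str.contains(tag, case=False, regex=False, na=False)
    # Resampling by day and summing the 0/1 flags gives the daily count;
    # days inside the range with no matching tweet get a count of 0.
    return mask.astype(int).resample("D").sum()


# Made-up sample data, not the asker's tweets.
sample = pd.DataFrame(
    {"text": ["vote #ok18", "nothing here", "#OK18 again", "#ok18 once more"]},
    index=pd.to_datetime(
        [
            "2018-05-25 15:54:01",
            "2018-05-25 16:40:24",
            "2018-05-26 08:42:44",
            "2018-05-28 10:21:20",
        ]
    ),
)

daily = daily_hashtag_counts(sample, "#ok18")
print(daily)
```

With matplotlib installed, calling `daily.plot()` on the resulting Series should then draw a line graph with the days on the x-axis and the count on the y-axis. Note that a tweet written as "# ok18" (with a space, as in one of the rows shown in the question) would not be matched by this filter.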