I'm working on a data.frame with about 700,000 rows. It contains the ids of status updates and the corresponding usernames from Twitter. I just want to know how many different users are in there and how many times each has tweeted, so I thought this would be a very simple task using table(). But now I've noticed that I'm getting different results.
Recently I did it by converting the column to character, like this:
>freqs <- as.data.frame(table(as.character(w_dup$from_user)))
>nrow(freqs)
[1] 239678
Two months ago I did it like this:
>freqs <- as.data.frame(table(w_dup$from_user))
>nrow(freqs)
[1] 253594
I noticed that, done the second way, the data frame contains usernames with a frequency of 0. How can that be? If a username is in the dataset, it must occur at least once.
?table didn't help me, nor was I able to reproduce this issue on smaller datasets.
What am I doing wrong? Or am I misunderstanding the use of tables?
table produces a contingency table, tabular produces a frequency table. – ThomasH
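One possible explanation, sketched on toy data: if from_user is a factor and w_dup was created by dropping rows from a larger data frame, the factor keeps every original level, and table() counts unused levels as 0. This is an assumption about how w_dup was built, since the question does not say; the data and usernames below are hypothetical, and only the column name from_user comes from the question.

```r
# Hypothetical toy data; only the column name from_user mirrors the question.
w <- data.frame(from_user = factor(c("alice", "bob", "bob", "carol")))

# Dropping rows keeps all factor levels, including ones no longer present.
w_dup <- w[w$from_user != "carol", , drop = FALSE]

# table() on a factor reports every level, even unused ones (count 0).
freqs_factor <- as.data.frame(table(w_dup$from_user))

# as.character() discards the levels, so only observed values are counted.
freqs_char <- as.data.frame(table(as.character(w_dup$from_user)))

# droplevels() removes unused levels while keeping the column a factor.
freqs_dropped <- as.data.frame(table(droplevels(w_dup$from_user)))

nrow(freqs_factor)   # includes the unused level "carol" with Freq 0
nrow(freqs_char)     # observed users only
nrow(freqs_dropped)  # observed users only
```

If this is what is happening, the as.character() count is the number of users actually present, and the larger count includes levels left over from removed rows.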