
I built this data frame (tweets_platform) from Twitter data using the twitteR package:

id  source  created
7,71627E+17 iPhone  02/09/2016 08:34
7,71627E+17 iPhone  02/09/2016 08:34
7,71627E+17 Android 02/09/2016 08:34
7,71627E+17 Android 02/09/2016 08:34
7,71627E+17 iPhone  02/09/2016 08:34
7,71627E+17 iPhone  02/09/2016 08:34

I'd like to draw a line chart that highlights the part of the day in which the tweets occur:

library(dplyr)
library(ggplot2)
library(lubridate)
library(scales)

tweets_platform %>%
  count(source, hour = hour(with_tz(created, "EST"))) %>%
  mutate(percent = n / sum(n)) %>%
  ggplot(aes(hour, percent, color = source)) +
  geom_line() +
  scale_y_continuous(labels = percent_format()) +
  labs(x = "Hour of day (EST)",
       y = "% of tweets",
       color = "")

However, when I run the code, the console returns this error:

geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?

and it draws only the empty plot, without the lines. How can I fix the problem?

I ran your code with the provided data. After processing, you are left with two rows: one data point for Android and one for iPhone, yet you are asking ggplot to draw lines. A line needs at least two data points, doesn't it? - jazzurro
@jazzurro I'd like to get a line chart like the one at this link, with two lines (one for iPhone, one for Android). On the y-axis I'd like the number of tweets, and on the x-axis the hours of the day (taken from the created field). - Andrea Angeli
Would you be able to provide an accessible link? If you can upload your file somewhere like Dropbox, I am happy to add your graphic to your question. - jazzurro
@jazzurro Thank you for your help! Here's the code link. - Andrea Angeli

1 Answer


I created some sample data for you, and I think something like the following is what you are after. I do not have your actual data, so some of these steps may be unnecessary for you. I wanted to handle the case where an hour has no tweets at all; that's why I added left_join and the second mutate. You can drop them if every hour is present in your data. Hope this helps.

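library(dplyr)
library(ggplot2)

# Count tweets per source and hour, then turn counts into a percentage within each source
temp <- mydf %>%
  count(source, hour = as.numeric(format(created, "%H"))) %>%
  group_by(source) %>%
  mutate(percent = n / sum(n) * 100) %>%
  # Join onto every source/hour combination so hours without tweets still get a row
  left_join(data.frame(source = rep(c("iPhone", "Android"), each = 24),
                       hour = rep(0:23, times = 2),
                       stringsAsFactors = FALSE),
            ., by = c("source", "hour")) %>%
  # Hours without tweets are NA after the join; show them as 0%
  mutate(percent = coalesce(percent, 0))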


ggplot(data = temp, aes(x = hour, y = percent, group = source, color = source)) +
  geom_line() +
  scale_x_continuous(limits = c(0, 23), breaks = 0:23) +
  scale_y_continuous(limits = c(0, 100))
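
If you prefer not to build the padding table by hand, the same idea can probably be written with complete() from tidyr. This is only a minimal sketch: it assumes tidyr is installed, and `toy` and `hourly` are made-up names with made-up values, not your real tweets. I also fix the time zone to UTC here so the hours do not depend on the machine running it.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data (NOT the asker's real tweets): three tweets in two different hours
toy <- data.frame(
  source  = c("iPhone", "iPhone", "Android"),
  created = as.POSIXct(c("2016-09-02 08:34:00",
                         "2016-09-02 10:05:00",
                         "2016-09-02 08:40:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

hourly <- toy %>%
  # one row per source/hour pair that actually occurs in the data
  count(source, hour = as.integer(format(created, "%H", tz = "UTC"))) %>%
  # add the missing source/hour pairs for hours 0-23, with a count of 0
  complete(source, hour = 0:23, fill = list(n = 0L)) %>%
  # percentage of each source's tweets that fall in each hour
  group_by(source) %>%
  mutate(percent = n / sum(n) * 100) %>%
  ungroup()
```

Because every source then has 24 rows, geom_line() has more than one observation per group to connect, which is what the original message was complaining about.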


DATA

mydf <- structure(list(id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), source = c("iPhone", 
"Android", "iPhone", "iPhone", "Android", "iPhone", "Android", 
"Android", "iPhone", "Android", "iPhone", "Android", "iPhone", 
"iPhone", "Android", "iPhone", "Android", "Android", "iPhone", 
"Android"), created = structure(c(1472772840, 1472780040, 1472772840, 
1472780040, 1472794440, 1472787240, 1472769240, 1472774160, 1472780040, 
1472805240, 1472808840, 1472812440, 1472808840, 1472816040, 1472819640, 
1472819640, 1472812440, 1472813760, 1472812440, 1472813820), class = c("POSIXct", 
"POSIXt"), tzone = "")), .Names = c("id", "source", "created"
), row.names = c(NA, -20L), class = "data.frame")