1
votes

I have a data frame of integer-count observations listed by date and time interval. I want to find the median of these observations by date using the dplyr package. I've already formatted the date column correctly, and used group_by like so:

data.bydate <- group_by(data.raw, date)

When I use summarise() to find the median of each date group, all I'm getting are a bunch of zeroes. There are NA's in the data, so I've been stripping them with na.rm = TRUE.

data.median <- summarise(data.bydate, median = median(count, na.rm = TRUE))

Is there another way I should be doing this?

2
It is recommended to provide some reproducible data. - cdeterman
Without a reproducible example, we cannot be sure what the problem is. Regarding the code, it looks fine. If you have only 0's and NAs, then you might get a bunch of zeros. - akrun

2 Answers

4
votes

You can do something like:

data.raw %>% group_by(date) %>% summarise(median = median(count, na.rm = TRUE))
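
For a quick check that the pipeline behaves as expected, here is a minimal sketch on made-up data. The values in data.raw below are hypothetical, not the asker's data; the point is only that a date whose non-missing counts are mostly zero legitimately gets a median of 0.

```r
library(dplyr)

# Hypothetical toy data (an assumption for illustration only):
# two dates, integer counts, one NA.
data.raw <- data.frame(
  date  = as.Date(c("2015-01-01", "2015-01-01", "2015-01-01",
                    "2015-01-02", "2015-01-02", "2015-01-02", "2015-01-02")),
  count = c(0L, 0L, 5L, 2L, NA, 4L, 7L)
)

# Group by date, then take the median of the non-missing counts per date.
data.median <- data.raw %>%
  group_by(date) %>%
  summarise(median = median(count, na.rm = TRUE))

print(data.median)
```

On this toy data the first date has counts 0, 0, 5, so its median is 0 even though the code is working correctly.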
0
votes

It's possible that each group consists mostly of zeros. To check, compare the number of distinct values in each group with the group's total number of rows. The code below reports both for the count variable in each group.

summarise(data.bydate, unique_code = n_distinct(count), total_count = n())
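
A slightly more direct diagnostic is to count zeros and NAs per group. This is only a sketch on hypothetical data; the column names n_rows, n_missing, n_zero and share_zero are made up for illustration. For non-negative counts, the median is 0 whenever more than half of a group's non-missing values are 0.

```r
library(dplyr)

# Hypothetical toy data standing in for data.raw (an assumption).
data.raw <- data.frame(
  date  = as.Date(c(rep("2015-01-01", 4), rep("2015-01-02", 3))),
  count = c(0L, 0L, 0L, 5L, 2L, NA, 4L)
)

diagnostics <- data.raw %>%
  group_by(date) %>%
  summarise(
    n_rows     = n(),                          # rows in the group, NAs included
    n_missing  = sum(is.na(count)),            # NA counts
    n_zero     = sum(count == 0, na.rm = TRUE),  # zeros among non-missing values
    share_zero = mean(count == 0, na.rm = TRUE)  # fraction of non-missing values that are 0
  )

print(diagnostics)
```

If share_zero is above 0.5 for most dates, a column of zero medians is the expected result rather than a bug.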