0
votes

I have the following data frame:

    I  UserID | Day_of_week | hour | min | sms
   #1    1           1          0     0      12
   #2    1           1          0     30     20
   #3    1           1          1     0      19
   #4    1           1          1     30     11
   #5    1           1          2     0      12
   #6    1           1          2     30     7
   ...   ...         ...       ...    ...   ....
   #10    1          2          0     0      142
   #11    1          2          0     30     201
   #12    1          2          1     0      129
   #13    1          2          1     30     111
   ...   ...         ...       ...    ...   ....

The column Day_of_week goes from 0 to 6 where

  • (1 = Monday, 2 = Tuesday .... 5 = Friday)
  • 0, 6 are Saturday and Sunday respectively.

The column hour ranges from 0 to 23.

I wanted to create the same date frame where I added days of the week (Monday through Friday) and weekends (Saturday and Sunday) with their respective averages for each hour and minute.

I want something like this: the average of all (Monday, Tuesday, ... Friday) at 0 hours and 0 min was 12.1.

x- represent days of week

y- represent weekends

    I  UserID | Day_of_week | hour | min | sms
   #1    1           x          0     0      12.1
   #2    1           x          0     30     19.1
   #3    1           x          1     0      14
   #4    1           x          1     30     11
   ...   ...         ...       ...    ...   ....
   #10    1          y          0     0      1123
   #11    1          y          0     30     23
   #12    1          y          1     0      45
   #13    1          y          1     30     121
   ...   ...         ...       ...    ...   ....
1

1 Answers

1
votes

Sample data:

# Read and expand the dataset with some weekend days
df = read.table(text='I  UserID  Day_of_week  hour  min  velocity
1    1           1          0     0      12
2    1           1          0     30     20
3    1           1          1     0      19
4    1           1          1     30     11
5    1           1          2     0      12
6    1           1          2     30     7
10    1          2          0     0      142
11    1          2          0     30     201
12    1          2          1     0      129
13    1          2          1     30     111',header=T)
df2 = df %>% mutate(Day_of_week=7)
df = rbind(df,df2)

You could use dplyr and do the following:

library(dplyr)
df %>% mutate(daytype = ifelse(Day_of_week %in% seq(1,5),'weekday','weekend')) %>%
  group_by(UserID,daytype,hour,min) %>% 
  summarize(velocity=mean(velocity))

Alternative with data.table:

library(data.table)
setDT(df)[,daytype := ifelse(Day_of_week %in% seq(1,5),'weekday','weekend')][,.(velocity=mean(velocity)),.(UserID,daytype,hour,min)]

or with base R:

df$daytype = ifelse(df$Day_of_week %in% seq(1,5),'weekday','weekend')
aggregate(velocity~UserID+daytype+hour+min,df,FUN='mean')

Output:

    UserID daytype hour min velocity
 1:      1 weekday    0   0     77.0
 2:      1 weekday    0  30    110.5
 3:      1 weekday    1   0     74.0
 4:      1 weekday    1  30     61.0
 5:      1 weekday    2   0     12.0
 6:      1 weekday    2  30      7.0
 7:      1 weekend    0   0     77.0
 8:      1 weekend    0  30    110.5
 9:      1 weekend    1   0     74.0
10:      1 weekend    1  30     61.0
11:      1 weekend    2   0     12.0
12:      1 weekend    2  30      7.0