0
votes

I'm looking to speed up a loop that assigns a rating to each row based on a set of conditions. There are six ratings (0 to 5). I tried a for loop with an if statement for each condition, but with millions of rows that is not an option: I don't know how long it would have taken to finish, because I stopped it manually after it had been running for hours.

The rules are:

Rating 0: if df$Bounce >= 75 and df$time<10 and df$view<1
Rating 1: if df$Bounce >= 75 or df$Assist<1
Rating 2: if df$Bounce < 75 and df$Assist<2
Rating 3: if df$Bounce < 75 and df$Assist<3
Rating 4: if df$Bounce < 75 and df$Assist<=4
Rating 5: if df$Bounce < 75 and df$Assist>=5

I've got more of these 'slow' statements in my script, so the answer to this question will speed up a lot of processes!

A small example dataset:

tc <- textConnection('
belongID   uniqID   Bounce     Assist   time   view    
   1           101     90       10       7       0      
   1           102     75        0       8      10
   2           103     10       30       4       2
   2           104     50        3       1      10
   2           105     74        2       5       4
   3           106      5        1       2       8  ')

df <- read.table(tc,header=TRUE)

The result should be the same dataset with a new column Rating, filled in according to the rules:

belongID   uniqID   Bounce     Assist   time   view     Rating    
   1           101     90       10       7       0       0
   1           102     75        0       8      10       1
   2           103     10       30       4       2       5
   2           104     50        3       1      10       4
   2           105     74        2       5       4       3
   3           106      5        1       2       8       2

Edit: changed rating 1 condition!

3
Did you implement a separate if statement for each case, e.g. if (df$Bounce >= 75 && df$time < 10 && df$view < 1) df$rating = 0; else if ... or did you make something like a decision tree: if (df$Bounce >= 75) { if (df$time < 10 && df$view < 1) df$rating = 0; else if (df$Assist < 1) df$rating = 1; } else { ... }? - Hristo Iliev
I did the first! - Also edited the rating 1 condition - Max van der Heijden

3 Answers

3
votes

Here is a simple algorithm in a function that does what you ask. Since this contains only three rules it should be really fast. (However, I make the implicit assumption that Assist is always an integer.)

rating <- function(Bounce, Assist, time, view){
  x <- pmin(5, Assist + 1)
  x[Bounce >= 75 & time<10 & view<1] <- 0
  x[Bounce >= 75 & Assist < 1] <- 1
  x
}

within(df, rating <- rating(Bounce, Assist, time, view))

  belongID uniqID Bounce Assist time view rating
1        1    101     90     10    7    0      0
2        1    102     75      0    8   10      1
3        2    103     10     30    4    2      5
4        2    104     50      3    1   10      4
5        2    105     74      2    5    4      3
6        3    106      5      1    2    8      2
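
Two details of the function above are worth checking against the rules as written in the question. The assignments run in order, so a row that satisfies both the rating 0 and the rating 1 condition ends up as 1. An Assist of 4 gets rating 5, whereas the rule Assist<=4 gives 4. Below is a minimal sketch of a variant that handles both, assuming the rules are meant to be checked top to bottom (the question does not say so explicitly). The name rating_ordered is hypothetical.

```r
# Hypothetical variant of rating(); still assumes Assist is an integer.
rating_ordered <- function(Bounce, Assist, time, view) {
  x <- pmin(5, Assist + 1)                     # ratings 1 to 5 derived from Assist
  x[Assist == 4] <- 4                          # rule 4 as written: Assist <= 4
  x[Bounce >= 75 & Assist < 1] <- 1            # rule 1
  x[Bounce >= 75 & time < 10 & view < 1] <- 0  # rule 0 last, so it takes precedence
  x
}

# Made-up rows: the first matches rules 0 and 1, the second has Assist == 4
rating_ordered(Bounce = c(90, 50), Assist = c(0, 4), time = c(5, 1), view = c(0, 10))
```

This returns 0 and 4 for the two rows; the original function returns 1 and 5. On the example data both versions give the same ratings.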

2
votes

Don't use a loop:

df$rating <- 999

df[df$Bounce >= 75 & df$time < 10 & df$view < 1, "rating"] <- 0
df[df$Bounce >= 75 & df$Assist < 1 & df$rating == 999, "rating"] <- 1
df[df$Bounce < 75 & df$Assist < 2 & df$rating == 999, "rating"] <- 2
df[df$Bounce < 75 & df$Assist < 3 & df$rating == 999, "rating"] <- 3
df[df$Bounce < 75 & df$Assist <= 4 & df$rating == 999, "rating"] <- 4
df[df$Bounce < 75 & df$Assist >= 5 & df$rating == 999, "rating"] <- 5

The rating == 999 check is required because your rules are not mutually exclusive. If they should be, there's an error in your logic. Otherwise, this ensures that no rule overrides an earlier rule.
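
The same first-match-wins logic can also be written as nested ifelse() calls, which avoids the 999 placeholder: each row takes the rating of the first condition it satisfies, and rows that match no rule come out as NA. This is a sketch on the example data from the question, using the conditions exactly as coded above (rating 1 with "and"):

```r
# Example data from the question
df <- read.table(header = TRUE, text = "
belongID uniqID Bounce Assist time view
1 101 90 10 7  0
1 102 75  0 8 10
2 103 10 30 4  2
2 104 50  3 1 10
2 105 74  2 5  4
3 106  5  1 2  8")

# Conditions are tested in order; the first TRUE one decides the rating.
df$rating <- with(df,
  ifelse(Bounce >= 75 & time < 10 & view < 1, 0,
  ifelse(Bounce >= 75 & Assist < 1,           1,
  ifelse(Bounce < 75  & Assist < 2,           2,
  ifelse(Bounce < 75  & Assist < 3,           3,
  ifelse(Bounce < 75  & Assist <= 4,          4,
  ifelse(Bounce < 75  & Assist >= 5,          5, NA)))))))

df
```

ifelse() works on whole columns at once, so there is still no loop over rows.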

1
votes

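Try:

dumfun <- function(w, x, y, z) {
  # w = Bounce, x = time, y = view, z = Assist
  if (w >= 75 && x < 10 && y < 1) return(0)
  if (w >= 75 && z < 1) return(1)
  if (w < 75 && z < 2) return(2)
  if (w < 75 && z < 3) return(3)
  if (w < 75 && z <= 4) return(4)
  if (w < 75 && z >= 5) return(5)  # an Assist of exactly 5 is included
  NA_real_  # no rule matched; keeps mapply() returning a vector, not a list
}

# Arguments are passed by position: Bounce, time, view, Assist
df$Rating <- mapply(dumfun, df$Bounce, df$time, df$view, df$Assist)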
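
Note that mapply() still calls the function once per row, so on millions of rows it can be expected to stay much slower than a vectorized version. Below is a rough sketch for comparing the two on made-up data. The sim data frame, row_rule and the sizes are all assumptions for illustration, and the timings will differ per machine.

```r
set.seed(1)  # made-up data, only for illustration
n <- 1e5
sim <- data.frame(Bounce = sample(0:100, n, replace = TRUE),
                  Assist = sample(0:10,  n, replace = TRUE),
                  time   = sample(0:20,  n, replace = TRUE),
                  view   = sample(0:10,  n, replace = TRUE))

# Row-by-row version: one function call per row
row_rule <- function(w, x, y, z) {
  if (w >= 75 && x < 10 && y < 1) return(0)
  if (w >= 75 && z < 1) return(1)
  if (w < 75 && z < 2) return(2)
  if (w < 75 && z < 3) return(3)
  if (w < 75 && z <= 4) return(4)
  if (w < 75 && z >= 5) return(5)
  NA_real_  # no rule matched
}
t_row <- system.time(
  r_row <- mapply(row_rule, sim$Bounce, sim$time, sim$view, sim$Assist)
)

# Vectorized version: lowest-priority rule first, so that the
# highest-priority rule is assigned last and wins.
t_vec <- system.time({
  r_vec <- rep(NA_real_, n)
  r_vec[sim$Bounce < 75 & sim$Assist >= 5] <- 5
  r_vec[sim$Bounce < 75 & sim$Assist <= 4] <- 4
  r_vec[sim$Bounce < 75 & sim$Assist < 3]  <- 3
  r_vec[sim$Bounce < 75 & sim$Assist < 2]  <- 2
  r_vec[sim$Bounce >= 75 & sim$Assist < 1] <- 1
  r_vec[sim$Bounce >= 75 & sim$time < 10 & sim$view < 1] <- 0
})

stopifnot(isTRUE(all.equal(r_row, r_vec)))  # both approaches agree
c(row_by_row = t_row[["elapsed"]], vectorized = t_vec[["elapsed"]])
```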