I'm looking to speed up a loop that assigns a rating to each row based on several conditions. There are six ratings to assign (0 to 5). I tried a for loop with an if statement for each condition, but with millions of rows to go through this is not a workable option. I don't know how long it would have taken to finish; it had been running for hours before I stopped it manually.
The rules are:
Rating 0: if df$Bounce >= 75 and df$time < 10 and df$view < 1
Rating 1: if df$Bounce >= 75 or df$Assist < 1
Rating 2: if df$Bounce < 75 and df$Assist < 2
Rating 3: if df$Bounce < 75 and df$Assist < 3
Rating 4: if df$Bounce < 75 and df$Assist <= 4
Rating 5: if df$Bounce < 75 and df$Assist >= 5
I've got more of these 'slow' statements in my script, so the answer to this question will speed up a lot of processes!
A small example dataset:
tc <- textConnection('
belongID uniqID Bounce Assist time view
1 101 90 10 7 0
1 102 75 0 8 10
2 103 10 30 4 2
2 104 50 3 1 10
2 105 74 2 5 4
3 106 5 1 2 8 ')
df <- read.table(tc,header=TRUE)
The result should be the same dataset with a new column Rating, filled in according to the rules:
belongID uniqID Bounce Assist time view Rating
1 101 90 10 7 0 0
1 102 75 0 8 10 1
2 103 10 30 4 2 5
2 104 50 3 1 10 4
2 105 74 2 5 4 3
3 106 5 1 2 8 2
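For illustration, here is a minimal loop-free sketch that reproduces the Rating column above. It assumes the rules are meant as "first match wins", read from Rating 0 down to Rating 5; the question does not state this, but the expected output implies it. The data frame is rebuilt by hand here only so the snippet runs on its own.

```r
# Example data from the question, rebuilt so the snippet is self-contained.
df <- data.frame(
  belongID = c(1, 1, 2, 2, 2, 3),
  uniqID   = 101:106,
  Bounce   = c(90, 75, 10, 50, 74, 5),
  Assist   = c(10, 0, 30, 3, 2, 1),
  time     = c(7, 8, 4, 1, 5, 2),
  view     = c(0, 10, 2, 10, 4, 8)
)

# Assumption: the first matching rule wins. Assigning from the last rule
# to the first lets earlier rules overwrite later ones, which gives the
# same result as an if / else if chain, but on whole columns at once.
df$Rating <- NA_integer_
df$Rating[df$Bounce < 75 & df$Assist >= 5] <- 5L
df$Rating[df$Bounce < 75 & df$Assist <= 4] <- 4L
df$Rating[df$Bounce < 75 & df$Assist < 3]  <- 3L
df$Rating[df$Bounce < 75 & df$Assist < 2]  <- 2L
df$Rating[df$Bounce >= 75 | df$Assist < 1] <- 1L
df$Rating[df$Bounce >= 75 & df$time < 10 & df$view < 1] <- 0L

print(df)
```

Each line is a single vectorized comparison over the whole column, so no row-by-row loop is involved. Note that, with the rules as written, a non-integer Assist strictly between 4 and 5 matches no rule and would stay NA.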
Edit: changed rating 1 condition!
if statement for each case, e.g. if (df$Bounce >= 75 && df$time < 10 && df$view < 1) df$rating = 0; else if ... or did you make something like a decision tree: if (df$Bounce >= 75) { if (df$time < 10 && df$view < 1) df$rating = 0; else if (df$Assist < 1) df$rating = 1; } else { ... }? - Hristo Iliev