Sample data:
library(dplyr)
# 5 ids (A to E), 10 rows each; no seed is set, so the values differ on every run
id <- rep(LETTERS[1:5], each = 10)
x <- round(runif(50, -500, 200), digits = 0)
y <- round(runif(50, -700, 700), digits = 0)
z <- round(runif(50, 250, 300), digits = 0)
df.1 <- data.frame(id = id, x = x, y = y, z = z)
> summary(df.1)
 id          x                y                 z
 A:10   Min.   :-497.0   Min.   :-665.00   Min.   :251.0
 B:10   1st Qu.:-283.2   1st Qu.:-349.50   1st Qu.:261.2
 C:10   Median :-128.0   Median : -33.50   Median :274.5
 D:10   Mean   :-145.4   Mean   : -39.58   Mean   :275.3
 E:10   3rd Qu.: -15.0   3rd Qu.: 293.25   3rd Qu.:288.0
        Max.   : 171.0   Max.   : 696.00   Max.   :299.0
What I'm trying to achieve is:
- put each id into its own data frame
- create a new column called "direction" whose values follow these rules:
  a. identify the column with the widest range among x, y and z
  b. within that column, set direction to TRUE if the next row's value is bigger than the current row's value, FALSE if it is not, and NA on the last row (which has no next row)
For example, when y has the widest range:
  id    x    y   z direction
1  A -320   31 251      TRUE
2  A -199 -530 276     FALSE
3  A -228  390 264      TRUE
4  A -158  363 268      TRUE
5  A -308  150 267     FALSE
6  A  -47  345 261        NA
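Steps a and b can be sketched in base R as two small helpers. The names widest_col and next_is_bigger are hypothetical, "range" is taken to mean max minus min, and the toy data frame is made up; this is a sketch under those assumptions, not a definitive implementation.

```r
# Step a (sketch): name of the column among `cols` with the widest range.
# `widest_col` is a hypothetical helper name, not part of any package.
widest_col <- function(df, cols = c("x", "y", "z")) {
  spans <- vapply(df[cols], function(v) diff(range(v, na.rm = TRUE)), numeric(1))
  names(spans)[which.max(spans)]  # ties go to the first column listed
}

# Step b (sketch): TRUE if the next value is bigger than the current one,
# FALSE otherwise, and NA on the last row because there is no next value.
# `next_is_bigger` is likewise a hypothetical helper name.
next_is_bigger <- function(v) {
  c(v[-1] > v[-length(v)], NA)
}

# Made-up example: y spans 900, x spans 30 and z spans 10
toy <- data.frame(x = c(1, 31, 10), y = c(-400, 500, 0), z = c(250, 260, 255))
col <- widest_col(toy)        # "y"
next_is_bigger(toy[[col]])    # TRUE FALSE NA
```

With dplyr loaded, lead(v) > v returns the same vector as next_is_bigger(v), including the NA on the last row.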
It is really important that direction is calculated on the column with the widest range. In the sample data, column y is likely to have the widest range every time, but in my real data it could be any column.
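Because the widest column can differ from one id to the next, one way to check which column each id would use is to split the x, y and z columns by id and run the range comparison on each piece. The data below is made up so that A and B pick different columns, and widest_by_id is only an illustrative name.

```r
# Made-up data in which the widest column differs between ids:
# for A, x spans 300 (y 20, z 5); for B, z spans 500 (x 10, y 40).
toy <- data.frame(
  id = c("A", "A", "A", "B", "B", "B"),
  x  = c(0, 300, 100, 5, 15, 10),
  y  = c(10, 20, 30, 0, 40, 20),
  z  = c(1, 6, 3, 0, 500, 250)
)

# For each id, pick the column with the widest range (max - min)
widest_by_id <- sapply(split(toy[c("x", "y", "z")], toy$id), function(d) {
  spans <- vapply(d, function(v) diff(range(v)), numeric(1))
  names(spans)[which.max(spans)]
})
widest_by_id  # A = "x", B = "z"
```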
I imagine it would involve mutate and ifelse, but I'm not sure how to go about it. I normally use extensive for loops and only started using dplyr in the last week or two, so I'm trying not to fall back on messy for loops and deeply nested code.
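A possible loop-free version with group_by() and mutate(), along the lines the question anticipates: a hypothetical helper, direction_of(), receives one id's x, y and z at a time, picks the widest of the three and compares each row with the next using dplyr's lead(). The seed is arbitrary and only there to make the sketch reproducible. ifelse(cond, TRUE, FALSE) is kept to mirror the question, although lead(v) > v on its own returns the same logical vector.

```r
library(dplyr)

# Hypothetical helper: given one group's x, y and z, compare each row with
# the next row in whichever column has the widest range.
direction_of <- function(x, y, z) {
  cols <- list(x = x, y = y, z = z)
  spans <- vapply(cols, function(v) diff(range(v)), numeric(1))
  v <- cols[[which.max(spans)]]
  ifelse(lead(v) > v, TRUE, FALSE)  # NA on the last row of each group
}

set.seed(42)  # arbitrary seed so the sketch is reproducible
df.1 <- data.frame(
  id = rep(LETTERS[1:5], each = 10),
  x  = round(runif(50, -500, 200)),
  y  = round(runif(50, -700, 700)),
  z  = round(runif(50, 250, 300))
)

result <- df.1 %>%
  group_by(id) %>%
  mutate(direction = direction_of(x, y, z)) %>%  # evaluated once per id
  ungroup()
```

This keeps every id in one data frame with a direction column, and the column used can differ between ids because the range comparison runs once per group.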
Appreciate your help very much! Thanks!
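ids <- as.character(unique(df.1$id))
for (i in seq_along(ids)) {
  df_i <- df.1 %>%
    filter(id == ids[i])
  # column with the widest range among x, y, z
  spans <- sapply(df_i[c("x", "y", "z")], function(v) diff(range(v)))
  col <- names(which.max(spans))
  # TRUE if the next row is bigger, FALSE if not, NA on the last row
  df_i <- df_i %>%
    mutate(direction = ifelse(lead(.data[[col]]) > .data[[col]], TRUE, FALSE))
  assign(ids[i], df_i)
}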
df_list <- split(df.1, df.1$id)
– Gregor Thomas
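Following the split() suggestion, here is a sketch that keeps one data frame per id in a named list and adds direction to each piece with lapply(), instead of creating separate objects with assign(). add_direction() is a hypothetical helper that combines steps a and b, and the seed is arbitrary.

```r
# Hypothetical helper combining steps a and b for one id's data frame
add_direction <- function(d, cols = c("x", "y", "z")) {
  spans <- vapply(d[cols], function(v) diff(range(v)), numeric(1))
  v <- d[[names(spans)[which.max(spans)]]]
  d$direction <- c(v[-1] > v[-length(v)], NA)  # same as dplyr::lead(v) > v
  d
}

set.seed(42)  # arbitrary seed so the sketch is reproducible
df.1 <- data.frame(
  id = rep(LETTERS[1:5], each = 10),
  x  = round(runif(50, -500, 200)),
  y  = round(runif(50, -700, 700)),
  z  = round(runif(50, 250, 300))
)

# One data frame per id, kept together in a named list instead of
# separate objects created with assign()
df_list <- lapply(split(df.1, df.1$id), add_direction)
df_list$A       # the data frame for id A, with its direction column
names(df_list)  # "A" "B" "C" "D" "E"
```

If a single data frame is needed again, dplyr::bind_rows(df_list) or do.call(rbind, df_list) stacks the pieces back together.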