1
votes

I'm still a bit of a novice with R and I'm trying to create a column that has a value based on a condition across the other rows. Basically, I want to create a 4th column (in my dummy data below) that has the year (row heading) of the max value.

country = c("USA", "Canada", "UK", "France", "Mexico")
region = c("North America", "North America", "Europe", "Europe", "North America")  
x2000 = c(50, 100, 75, 60, 25) 
x2001 = c(200, 50, 150, 125, 60)
x2002 = c(125,75,60, 75, 25)
df = data.frame(country, region, x2000, x2001, x2002) 

   country    region        x2000  x2001  x2002
1     USA  North America      50    200    125
2  Canada  North America     100     50     75
3      UK     Europe          75    150     60
4  France     Europe          60    125     75
5  Mexico  North America      25     60     25

So, I want it to look like this:

country = c("USA", "Canada", "UK", "France", "Mexico")
region = c("North America", "North America", "Europe", "Europe", "North America")  
x2000 = c(50, 100, 75, 60, 25) 
x2001 = c(200, 50, 150, 125, 60)
x2002 = c(125,75,60, 75, 25)
Max_year = (2001, 2000, 2001, 2001, 2001)
df = data.frame(country, region, x2000, x2001, x2002, Max) 


   country    region        x2000  x2001  x2002  Max_Year
1     USA  North America      50    200    125    2001
2  Canada  North America     100     50     75    2000
3      UK     Europe          75    150     60    2001
4  France     Europe          60    125     75    2001
5  Mexico  North America      25     60     25    2001

I found this which seems so far to be the most similar to what I'm trying to do ({if...else..} statement after group_by in dplyr chain). But I want it to return a value based on the max of all potential values.

Of course, my actual data is much much larger, and I need it to group the data by two variables (region here is simplified but I need to keep it in my dataset) and then apply the condition across the data within each "group"

Here's what I was trying to work out based on the example question I had found.

Data_Tidy$Max_Year <- Data_Tidy %>%
group_by(Data_Tidy$Country, Data_Tidy$area)%>%
do(model=if(Data_Tidy$Landing = pmax())

I can either have the years as column headings (as above) or as a tidy long data.frame, and then collapse back to a wide frame.

     country     region      Year   Value
 1      USA   North America  2000    50
 2   Canada   North America  2000   100
 3       UK       Europe     2000    75
 4   France       Europe     2000    60
 5   Mexico   North America  2000    25
 6      USA   North America  2001   200
 7   Canada   North America  2001    50
 8       UK       Europe     2001   150
 9   France       Europe     2001   125
10  Mexico    North America  2001    60
11     USA    North America  2002   125
12  Canada    North America  2002    75
13      UK       Europe      2002    60
14  France       Europe      2002    75
15  Mexico    North America  2002    25

I hope this is clear. Any advice would be greatly appreciated!!

2
Your second approach is spot-on: transform into tidy data, get the maximum value and put it back into wide format. Did you manage to do that?Konrad Rudolph
You know, I think I kept making it too complicated in my mind using the long data.frame. But now I can see how I could have done it that way easily, thanks. But Felasa's answer worked perfectly for the wide data which is where I need the column. Thank you though!Abby_studies_fish

2 Answers

1
votes

You can use apply rowwise to get the indices for the max columns.

df$max <- names(df)[apply(df, 1, function(x) which.max(x[3:6]))+2] 
df
country        region x2000 x2001 x2002   max
1     USA North America    50   200   125 x2001
2  Canada North America   100    50    75 x2000
3      UK        Europe    75   150    60 x2001
4  France        Europe    60   125    75 x2001
5  Mexico North America    25    60    25 x2001

You can easily get rid of the 'x' before the year. Or use a vector c(2000, 2001, 2002) instead of the data.frame names.

0
votes
 transform(df,max_year=as.numeric(sub("\\D","",names(df[-(1:2)])[max.col(df[-(1:2)])])))
  country        region x2000 x2001 x2002 max_year
1     USA North America    50   200   125     2001
2  Canada North America   100    50    75     2000
3      UK        Europe    75   150    60     2001
4  France        Europe    60   125    75     2001
5  Mexico North America    25    60    25     2001