R calculate new column from second data frame

Question

I have 2 data frames that I'm trying to keep separate, but would like to calculate a new column in df1 based on the means of rows in df2:

df1:

Name <- c("t1", "t2", "t3")    
Diam <- c(95, 104, 72)    
Rad <- c(47.5, 52, 36)    
df1 <- data.frame(Name, Diam, Rad)

df2:

Name <- c("t1A", "t1B", "t2A", "t2B", "t2C", "t3B")    
Width <- c(4000, 3200, 2300, 2700, 2100, 3500)    
df2 <- data.frame(Name, Width)

I'd like to add a fourth column to df1 that calculates the means of "Width" in df2 when the first 2 characters of "Name" are identical. My intended output looks like:

df3:

Name <- c("t1", "t2", "t3")    
Diam <- c(95, 104, 72)     
Rad <- c(47.5, 52, 36)    
WidthMean <- c(3600, 2366.667, 3500)    
df3 <- data.frame(Name, Diam, Rad, WidthMean)

I can do it by splitting the "Name" column in df2, then finding the mean according to the new split name, then adding the mean values from df2 as a new column to df1. However, I have to do this to approximately 75 data.frames and was hoping to find a faster way to process the data (and with fewer lines of code!).

Current code:

df2$"NameID" <- df2$Name    
df2$NameID <- gsub(".?$", "", df2$NameID)    
df2out <- aggregate(Width ~ NameID, data = df2, FUN = mean)    
df1$"WidthMean" <- df2out$Width    
df1

When you say 75 data frames, do you mean 75 that are like df1 or like df2 in your example? — TARehman
75 of each. Basically I'll be re-running the code 75 times for different pairs of data.frames. — KKL234
Then you'd want a function that includes the looping portion, probably, and then you'd loop over a list of those two data frames. I can try and edit my answer to reflect it. — TARehman
Do you have a corresponding list of which data frames match to which data frames? — maloneypatr
@maloneypatr; I don't have a list, but each .csv file has a unique identifier that matches to the corresponding .csv. For example, 4 of my files are named "C8001dbh", "C8001bai", "E13002dbh", and "E13002bai" and I need to match the corresponding dbh and bai files. — KKL234

janos janos · Accepted Answer · 2014-07-16T18:31:58

Here's a solution to get df3 using sapply:

df3 <- df1
df3$WidthMean <- sapply(df1$Name, function(name) {
  pattern = paste('^', name, sep='')
  mean(subset(df2, grepl(pattern, Name))$Width)
})
df3

Outputs:

  Name Diam  Rad WidthMean
1   t1   95 47.5  3600.000
2   t2  104 52.0  2366.667
3   t3   72 36.0  3500.000

If you have 75 dfs that look like df1, you can use lapply to do them all:

lapply(list(df1, df1), function(df) {
  df_out <- df
  df_out$WidthMean <- sapply(df$Name, function(name) {
    pattern = paste('^', name, sep='')
    mean(subset(df2, grepl(pattern, Name))$Width)
  })
  df_out
})

Outputs:

[[1]]
  Name Diam  Rad WidthMean
1   t1   95 47.5  3600.000
2   t2  104 52.0  2366.667
3   t3   72 36.0  3500.000

[[2]]
  Name Diam  Rad WidthMean
1   t1   95 47.5  3600.000
2   t2  104 52.0  2366.667
3   t3   72 36.0  3500.000

R calculate new column from second data frame

3 Answers