
I want to build a regression model of a vector (IC50) against a number of different molecular descriptors (A, B, C, D, etc.).

I want to use:

model <- lm(IC50 ~ A + B + C + D)

The molecular descriptors are stored in the columns of a data.frame. I would like to write a function that takes the IC50 vector and the appropriately subsetted data.frame as inputs.

My problem is that I can't convert the columns into a formula for the model.

Can anyone help?

Sample data and a feeble attempt:

IC50  <- c(0.1,0.2,0.55,0.63,0.005)

descs  <- data.frame(A=c(0.002,0.2,0.654,0.851,0.654),
                     B=c(56,25,89,55,60),
                     C=c(0.005,0.006,0.004,0.009,0.007),
                     D=c(189,202,199,175,220))

model  <- function(x=IC50,y=descs) {
  # fails: y is a whole data.frame (a list), so lm() rejects it as a single term
  a  <- lm(x ~ y)
  return(a)
}

I went down the substitute/deparse route, but that didn't bring in the data.

a <- lm(x ~ y[,1] + y[,2] + y[,3] + y[,4])? Can you make your question clearer? - Ven Yao
Aside from your use of x and y being confusing (it is the reverse of the normal usage), why not simply add the single vector as a column to the existing data frame under a known column name? Then your formula is just known_column_name ~ . - joran
@VenYao can you do that for sixty-three columns, please? - DarrenRhodes
@joran I didn't know about the dot formula. Thanks. - DarrenRhodes
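A minimal sketch of joran's suggestion, assuming the sample IC50 and descs objects from the question. The helper name fit_ic50 is hypothetical, and the response column is simply named IC50 so that the formula can refer to it by a known name while "." stands for every descriptor column.

```r
# Sample data from the question
IC50 <- c(0.1, 0.2, 0.55, 0.63, 0.005)
descs <- data.frame(A = c(0.002, 0.2, 0.654, 0.851, 0.654),
                    B = c(56, 25, 89, 55, 60),
                    C = c(0.005, 0.006, 0.004, 0.009, 0.007),
                    D = c(189, 202, 199, 175, 220))

# Hypothetical helper: bind the response onto the descriptors under the
# known column name "IC50", then let "." expand to all remaining columns
fit_ic50 <- function(response, descriptors) {
  dat <- cbind(IC50 = response, descriptors)  # data.frame: IC50, A, B, ...
  lm(IC50 ~ ., data = dat)
}

fit <- fit_ic50(IC50, descs)                     # uses A, B, C and D
fit_sub <- fit_ic50(IC50, descs[, c("A", "B")])  # uses only A and B
```

This scales to any number of columns (63 or more) because the formula never lists them. It assumes descs does not already contain a column called IC50; otherwise the two would clash.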

1 Answer


You can simply write:

model  <- function(x = IC50, y = descs) 
  lm(x ~ ., data = y)  # "." expands to every column of y
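For example, with the sample data from the question, the function can be called with the full data.frame or with a subset of its columns. If you would rather build the formula explicitly from column names, `stats::reformulate()` turns a character vector into a formula; the variable name `wanted` below is purely illustrative.

```r
# Sample data from the question
IC50 <- c(0.1, 0.2, 0.55, 0.63, 0.005)
descs <- data.frame(A = c(0.002, 0.2, 0.654, 0.851, 0.654),
                    B = c(56, 25, 89, 55, 60),
                    C = c(0.005, 0.006, 0.004, 0.009, 0.007),
                    D = c(189, 202, 199, 175, 220))

# The answer's function: x is found in the function's environment,
# "." expands to all columns of the data.frame y
model <- function(x = IC50, y = descs)
  lm(x ~ ., data = y)

fit_all <- model(IC50, descs)                   # all four descriptors
fit_ac  <- model(IC50, descs[, c("A", "C")])    # a column subset

# Explicit alternative: build "IC50 ~ A + C" from a vector of column names
wanted <- c("A", "C")
f <- reformulate(wanted, response = "IC50")
fit_explicit <- lm(f, data = cbind(IC50 = IC50, descs))
```

When subsetting to a single column, use `descs[, "A", drop = FALSE]` so that `y` stays a data.frame rather than collapsing to a plain vector. Also note that five rows and five coefficients (intercept plus A to D) leave no room for real inference, so the sample data is only useful for checking the syntax.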