
I'm looking to conduct a linear regression in R to model the effects of 5 independent variables on 376 columns of data.

I have a large matrix (541 rows and 402 columns) named 'dd', and I want to use only certain columns of it as IVs and DVs in the regression: 376 specific columns as DVs and 5 columns as IVs. I have used the name of each column (for example 'column_42') as an index, separately for the IVs and the DVs:

IVind=paste0('column_',c(4,14,15,24,43)) #index for IV

DVind=paste0('column_',c(10:13, 17:18, 26, 28, 49:54, 58, 60, 1001:1180, 2001:2180)) #index for DV

IV <- dd[,IVind] #save independent variables in 'IV'
DV <- dd[,DVind] #save dependent variables in 'DV'
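Before fitting, it can help to confirm that the two index vectors select what you expect, because indexing a matrix by a column name it does not have stops with a "subscript out of bounds" error. A minimal sketch, using a hypothetical simulated stand-in for dd (the real dd has its own column names and only 402 columns):

```r
# Hypothetical stand-in for 'dd': a 541 x 2180 numeric matrix with columns
# named column_1 ... column_2180
set.seed(1)
dd <- matrix(rnorm(541 * 2180), nrow = 541,
             dimnames = list(NULL, paste0("column_", 1:2180)))

IVind <- paste0("column_", c(4, 14, 15, 24, 43))
DVind <- paste0("column_", c(10:13, 17:18, 26, 28, 49:54, 58, 60,
                             1001:1180, 2001:2180))

# setdiff() lists any requested names that are not column names of dd
not_found <- setdiff(c(IVind, DVind), colnames(dd))
length(not_found)  # 0 means every name was found

# Confirm the expected counts before fitting
length(IVind)  # 5
length(DVind)  # 376
```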

I have tried plugging IV and DV into a linear regression like so:

try <- lm(DV~IV)

but received the following error:

Error in `[[<-.data.frame`(`*tmp*`, i, value = c(2113L, 2031L, 1971L, : replacement has 203040 rows, data has 540

Is there any way I can get around this error? I suspect it may be because IV and DV are stored in separate matrices.

I've tried to index dd directly in the regression function:

lm(dd[,DVind]~dd[,IVind])

only to receive the same error.

Any help is highly appreciated, thank you!

Are any of your values nested? Can you run str(dd[,DVind])? – StupidWolf
Doesn't seem to be? The output of str(dd[,DVind]) is:

 chr [1:541, 1:376] NA "10443.250768" "10433.625258" ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:376] "NZMean_10" "NZMean_11" "NZMean_12" "NZMean_13" ...

– Aleya Marzuki
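The str() output in the comment above shows that dd[,DVind] is a character matrix (chr), while lm() needs numeric variables; character data are treated as categorical. That is most likely the source of the error: the character values appear to be flattened into one long vector (203040 = 540 × 376) instead of being used as a 540 × 376 response matrix. A minimal sketch of converting to numeric first, assuming every non-missing entry is a number stored as text; the small matrix and its column names (iv_1, dv_1, ...) are hypothetical stand-ins:

```r
# Hypothetical stand-in: a character matrix like the one str() reported,
# with an NA in the first row
set.seed(42)
n <- 20
dd_chr <- matrix(as.character(round(rnorm(n * 4, mean = 10000, sd = 50), 6)),
                 nrow = n,
                 dimnames = list(NULL, c("iv_1", "iv_2", "dv_1", "dv_2")))
dd_chr[1, "dv_1"] <- NA

# Convert every entry to double while keeping the matrix shape and names;
# text that is not a number would become NA (with a warning)
dd_num <- dd_chr
storage.mode(dd_num) <- "double"

IV <- dd_num[, c("iv_1", "iv_2")]
DV <- dd_num[, c("dv_1", "dv_2")]

# Multivariate fit: one column of coefficients per DV.
# The default na.action drops the row that contains the NA.
fit <- lm(DV ~ IV)
dim(coef(fit))  # 3 rows (intercept + 2 IVs) x 2 columns (one per DV)
```

For the real data the same storage.mode() line would be applied to dd itself before IV and DV are extracted.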

1 Answer


For a multivariate response (several DVs fitted at once), you need to provide the response as a numeric matrix. For example, with simulated data:

# Simulated stand-in for dd: 1000 rows, 2180 numeric columns
dd = data.frame(matrix(rnorm(2180*1000), ncol=2180))
colnames(dd) = paste0("column_", 1:ncol(dd))

IVind = paste0('column_', c(4,14,15,24,43)) #index for IV
DVind = paste0('column_', c(10:13, 17:18, 26, 28, 49:54, 58, 60, 1001:1180, 2001:2180)) #index for DV

IV <- as.matrix(dd[,IVind]) #save independent variables in 'IV'
DV <- as.matrix(dd[,DVind]) #save dependent variables in 'DV'

fit = lm(DV ~ IV) #DVs (responses) on the left, IVs (predictors) on the right
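A multivariate lm() fit gives the same coefficient estimates as fitting one ordinary regression per DV column; it simply does them all in one call. A small check on simulated data (the sizes and the names iv_1, dv_1, ... are arbitrary):

```r
set.seed(123)
n <- 100
IV <- matrix(rnorm(n * 5), ncol = 5, dimnames = list(NULL, paste0("iv_", 1:5)))
DV <- matrix(rnorm(n * 3), ncol = 3, dimnames = list(NULL, paste0("dv_", 1:3)))

# One multivariate fit: coef() is a (1 + 5) x 3 matrix, one column per DV
fit_all <- lm(DV ~ IV)
B_all <- coef(fit_all)

# Separate fits, one DV at a time; sapply() binds the coefficient vectors
# into a 6 x 3 matrix
B_sep <- sapply(seq_len(ncol(DV)), function(j) coef(lm(DV[, j] ~ IV)))

# Same estimates, up to floating-point tolerance
all.equal(unname(B_all), unname(B_sep))
```

So the single call is a convenience; it does not pool information across the 376 DVs when estimating coefficients.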

If you want nicer-looking coefficient names, specify the dependent variables on the left-hand side with cbind(), separated by commas. Then subset the data to only the dependent and independent variables you are interested in, so that the "." on the right-hand side stands for exactly the five IVs:

LHS = paste0("cbind(", paste(DVind, collapse=","), ")")
FORM = as.formula(paste(LHS, "~ ."))

fit = lm(FORM, data=dd[,c(IVind,DVind)])

dim(fit$coefficients)
[1]   6 376

The coefficient matrix has one row per term ((Intercept), column_4, column_14, column_15, column_24, column_43) and one column per dependent variable (column_10 through column_2180), each column labelled with the name of its DV.
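With 376 responses you will usually want the per-DV results in one table. summary() on a multivariate fit returns one regression summary per response column; a sketch that stacks the estimate and p-value of every term for every DV, on small simulated data with hypothetical names (x1, x2 as IVs, y1 to y4 as DVs):

```r
set.seed(7)
n <- 60
dd <- data.frame(matrix(rnorm(n * 6), ncol = 6))
colnames(dd) <- c("x1", "x2", "y1", "y2", "y3", "y4")
IVind <- c("x1", "x2")
DVind <- c("y1", "y2", "y3", "y4")

FORM <- as.formula(paste0("cbind(", paste(DVind, collapse = ","), ") ~ ."))
fit <- lm(FORM, data = dd[, c(IVind, DVind)])

# summary() of a multivariate fit is a list with one summary.lm per DV,
# in the same order as the columns inside cbind()
s <- summary(fit)

# Stack the coefficient tables into one long data frame
res <- do.call(rbind, lapply(seq_along(s), function(j) {
  ct <- coef(s[[j]])  # columns: Estimate, Std. Error, t value, Pr(>|t|)
  data.frame(DV = DVind[j],
             term = rownames(ct),
             estimate = ct[, "Estimate"],
             p_value = ct[, "Pr(>|t|)"],
             row.names = NULL)
}))
head(res)
```

With 376 DVs you would probably also want to adjust these p-values for multiple testing, for example with p.adjust().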