Problem: Reducing a data set used in regression to several smaller sets where the variables are dependent within a set but independent between the resulting matrices.
I have a large data set with 1000 dummy variables, but only a few of them are 'positive' (equal to 1) in each row, and memory limits my ability to build different models. So I'm trying to split the data set into sets where there is linear dependency between the variables within a set, but no dependency with the other sets.
Small example:
M1 <- c(1L,0L,0L,0L,1L,1L,0L,0L,0L,0L,1L,1L,0L,0L,1L,0L)
dim(M1) <- c(4,4)
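Because `dim<-` fills the vector column by column, the rows of M1 are easy to misread. A small check (not part of the original question) that rebuilds M1 with `matrix()`, which also fills by column by default, and lists which variables each row uses:

```r
M1 <- c(1L,0L,0L,0L,1L,1L,0L,0L,0L,0L,1L,1L,0L,0L,1L,0L)
dim(M1) <- c(4,4)

# matrix() fills column by column by default (byrow = FALSE),
# so it gives the same 4 x 4 integer matrix
M1_alt <- matrix(c(1L,0L,0L,0L,1L,1L,0L,0L,0L,0L,1L,1L,0L,0L,1L,0L), nrow = 4)
stopifnot(identical(M1, M1_alt))

# Nonzero variables (columns) in each row:
# row 1 -> 1, 2; row 2 -> 2; row 3 -> 3, 4; row 4 -> 3
used <- apply(M1, 1, function(row) which(row != 0))
print(used)
```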
Here M1 can be split into two 'independent' matrices (rows 1-2 with variables 1-2, and rows 3-4 with variables 3-4):
M2 <- c(1,0,1,1)
dim(M2) <- c(2,2)
M3 <- c(1,1,1,0)
dim(M3) <- c(2,2)
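To make the split concrete: M2 and M3 are the diagonal blocks of M1, and the off-diagonal blocks are all zero, meaning no row in one subset uses a variable from the other. A quick base-R check of that, written for this example only:

```r
M1 <- c(1L,0L,0L,0L,1L,1L,0L,0L,0L,0L,1L,1L,0L,0L,1L,0L)
dim(M1) <- c(4,4)

M2 <- matrix(c(1,0,1,1), nrow = 2)  # rows 1-2, variables 1-2
M3 <- matrix(c(1,1,1,0), nrow = 2)  # rows 3-4, variables 3-4

# The two subsets are the diagonal blocks of M1 ...
stopifnot(all(M1[1:2, 1:2] == M2), all(M1[3:4, 3:4] == M3))

# ... and the off-diagonal blocks are zero, so the subsets
# share no variables
stopifnot(all(M1[1:2, 3:4] == 0), all(M1[3:4, 1:2] == 0))
```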
But changing M1 to
M1[3,2] <- 1
would make all rows dependent (row 3 now shares variable 2 with rows 1 and 2), so no split is possible.
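The modified matrix can be checked the same way: after `M1[3,2] <- 1`, the variables used by rows 1-2 and by rows 3-4 overlap, and a single shared variable is enough to link the two groups. A minimal sketch of that check:

```r
M1 <- c(1L,0L,0L,0L,1L,1L,0L,0L,0L,0L,1L,1L,0L,0L,1L,0L)
dim(M1) <- c(4,4)
M1[3,2] <- 1

# The lower-left block is no longer all zero
stopifnot(any(M1[3:4, 1:2] != 0))

# Variables used by each original group of rows
vars_12 <- which(colSums(M1[1:2, ]) > 0)   # 1 2
vars_34 <- which(colSums(M1[3:4, ]) > 0)   # 2 3 4

# Variable 2 is used by both groups, which links them into one set
shared <- intersect(vars_12, vars_34)
print(shared)
```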
Ideally, I would like a vector of length nrow(M1) (the number of rows) specifying which subset each row belongs to, so that a separate regression can be fitted on each subset. For the original M1, the result would be the vector:
R <- c(1,1,2,2)
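Given such a vector, the per-subset matrices can be pulled out with base R before fitting. A sketch, assuming the dense M1 from the example, that keeps for each subset only the variables its rows actually use (`blocks` is just an illustrative name); each element, together with the matching rows of the response, could then be passed to a model function such as `lm()`:

```r
M1 <- c(1L,0L,0L,0L,1L,1L,0L,0L,0L,0L,1L,1L,0L,0L,1L,0L)
dim(M1) <- c(4,4)
R <- c(1,1,2,2)

# For each subset: take its rows, then drop the variables (columns)
# that are zero in all of those rows
blocks <- lapply(split(seq_len(nrow(M1)), R), function(rows) {
  sub <- M1[rows, , drop = FALSE]
  sub[, colSums(sub != 0) > 0, drop = FALSE]
})

print(blocks)
```

For this example the two elements reproduce M2 and M3.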
The problem is related to the rank, but all the answers I have been able to find deal with reducing the dimension of the matrix, not with splitting it into independent parts.
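One way to get R, offered as a sketch rather than a rank-based method: treat rows and variables as nodes of a bipartite graph with an edge wherever an entry is nonzero; the requested subsets are then its connected components. The hypothetical helper `row_components()` below finds them with a union-find over the columns in plain base R. It assumes a dense base-R matrix (a sparse matrix would need its nonzero positions extracted without converting to dense), the row loops may be slow on very large data, and rows with no nonzero entry get NA.

```r
# Hypothetical helper: label each row with the index of its independent subset.
# Two variables end up in the same set whenever some row uses both of them.
row_components <- function(X) {
  X <- as.matrix(X)
  parent <- seq_len(ncol(X))          # union-find: each variable starts alone

  find <- function(i) {               # follow parent links up to the root
    while (parent[i] != i) i <- parent[i]
    i
  }

  # Merge all variables that appear together in some row
  for (r in seq_len(nrow(X))) {
    cols <- which(X[r, ] != 0)
    if (length(cols) < 2) next
    root <- find(cols[1])
    for (cc in cols[-1]) {
      other <- find(cc)
      if (other != root) parent[other] <- root
    }
  }

  # A row belongs to the set of any of its nonzero variables;
  # all-zero rows get NA
  root_of_row <- rep(NA_integer_, nrow(X))
  for (r in seq_len(nrow(X))) {
    cols <- which(X[r, ] != 0)
    if (length(cols) > 0) root_of_row[r] <- find(cols[1])
  }

  # Renumber the roots 1, 2, ... in order of first appearance
  match(root_of_row, unique(root_of_row[!is.na(root_of_row)]))
}

M1 <- c(1L,0L,0L,0L,1L,1L,0L,0L,0L,0L,1L,1L,0L,0L,1L,0L)
dim(M1) <- c(4,4)
R_orig <- row_components(M1)
print(R_orig)     # 1 1 2 2

M1[3,2] <- 1
R_linked <- row_components(M1)
print(R_linked)   # 1 1 1 1
```

Union-find is used here because it only needs one pass over the rows and a single integer vector of length ncol, which keeps memory small when there are many dummy variables.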
model.matrix()? – abhiieor