I have two large sparse matrices (say A and B). I want to replace non zero elements to zero in A based on B matrix which contains ranks of all the individual elements ranked across every column. My output matrix should contain top n and bottom n ranked elements from A matrix and all the other non zero values should be made equal to zero.
Below is my approach. I am using loops in the function GetTopNBottomN, I am wondering if it could be optimized since it takes ages when the matrices get large.
#input matrix
TestMatrix = Matrix(c(0.80,0.9,0.6,0,0,0.3,0.5,
0,0,0.3,0,0,0,0,
0.4,0.5,0.6,0,0,0.1,0,
0,0,0,0,0,0,0,
0.3,0.4,0.5,0.2,0.1,0.7,0.8,
0.6,0.7,0.5,0.8,0,0,0),7,sparse = TRUE)
#function to genrate ranks across all the columns for the input matrix
GenerateRankMatrix <- function(aMatrix){ ## Function Begins
n <- diff(aMatrix@p) ## number of non-zeros per column
lst <- split(aMatrix@x, rep.int(1:ncol(aMatrix), n)) ## columns to list
r <- unlist(lapply(lapply(lst,function(x) x * -1), rank)) ## column-wise ranking and result collapsing
RankMatrix <- aMatrix ## copy sparse matrix
RankMatrix@x <- r ## replace non-zero elements with rank
return(RankMatrix)
} # Function Ends
## Function to retain Top N and Bottom N records
GetTopNBottomN <- function(aMatrix,rMatrix){
#aMatrix = original SparseMatrix, rMatrix = RankMatrix
n = 2 ## Top 2 and Bottom 2 Elements across all columns
for(j in 1:ncol(aMatrix)){
MaxValue = max(rMatrix[,j])
if(MaxValue <= 2 * n) next ##Ignore the column if there are less than or equal to 2*n nonzero values
for (i in 1: nrow(aMatrix)){
if(rMatrix[i,j] >n & rMatrix[i,j] <= MaxValue-n){ #IF Cond
aMatrix[i,j] = 0
} #IF ends
}
}
return(aMatrix)
}
#Output
RankMatrix = GenerateRankMatrix(TestMatrix) #Genrate Rank Matrix
#Output Matrix
GetTopNBottomN(TestMatrix,RankMatrix)