Adding values to a matrix using index vectors that include row and column names

Question

Suppose I have a very large matrix of sparse data, but I'm only interested in a sample of it, which makes it even sparser. Suppose I also have a data frame of triples with columns for the row, column, and value of each entry (imported from a CSV file). I know I can use the sparseMatrix() function from library(Matrix) to create a sparse matrix using

sparseMatrix(i=df$row,j=df$column,x=df$value)

However, because of my index values I end up with a sparse matrix that's millions of rows by tens of thousands of columns, most of which are empty because my subset excludes most of the rows and columns. All of those zero rows and columns skew some of my functions. Take clustering, for example: I end up with one cluster that includes the origin when the origin isn't even a valid point. I'd like to perform the same operation, but treating i and j as row names and column names. I've tried creating a dense matrix, sized to fit just the sample, and adding values using

denseMatrix <- matrix(0,nrows,ncols,dimnames=list(unique(df$row),unique(df$column)))
denseMatrix[as.character(df$row),as.character(df$column)]=df$value

(Actually, I've been setting it equal to 1 because I'm not interested in the value in this case.) However, this fills in the entire matrix, because the assignment takes the cross product of all the rows and columns rather than just the pairs (row1, col1), (row2, col2), and so on. Does anybody know a way to accomplish what I'm trying to do? Alternatively, I'd be fine with filling in a sparse matrix and having it somehow discard all of the zero rows and columns to compact itself into a denser form, as long as I can keep some reference back to the original row and column numbers. I appreciate any suggestions!

Here's an example:

> rows<-c(3,1,3,5)
> cols<-c(2,4,6,6)
> mtx<-sparseMatrix(i=rows,j=cols,x=1)
> mtx
5 x 6 sparse Matrix of class "dgCMatrix"

[1,] . . . 1 . .
[2,] . . . . . .
[3,] . 1 . . . 1
[4,] . . . . . .
[5,] . . . . . 1

I'd like to get rid of columns 1, 3 and 5 as well as rows 2 and 4. This is a pretty trivial example, but imagine if instead of row numbers 1, 3 and 5 they were 1000, 3000 and 5000. Then there would be a lot more empty rows between them. Here's what happens when I use a dense matrix with named rows and columns:

> dmtx<-matrix(0,3,3,dimnames=list(c(1,3,5),c(2,4,6)))
> dmtx
  2 4 6
1 0 0 0
3 0 0 0
5 0 0 0
> dmtx[as.character(rows),as.character(cols)]=1
> dmtx
  2 4 6
1 1 1 1
3 1 1 1
5 1 1 1
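
For reference, the all-ones result above comes from R treating the two index vectors as independent row and column selections. A minimal sketch of pairwise assignment, assuming base R's support for indexing a matrix by a two-column character matrix matched against the dimnames, could look like this (the name idx is only illustrative):

```r
rows <- c(3, 1, 3, 5)
cols <- c(2, 4, 6, 6)

# Dense matrix whose dimnames are the distinct original indices
row_ids <- as.character(sort(unique(rows)))
col_ids <- as.character(sort(unique(cols)))
dmtx <- matrix(0, length(row_ids), length(col_ids),
               dimnames = list(row_ids, col_ids))

# Each row of idx is one (row name, column name) pair, so only the
# cells (3,2), (1,4), (3,6) and (5,6) are assigned.
idx <- cbind(as.character(rows), as.character(cols))
dmtx[idx] <- 1
dmtx
```

With this form only the four listed cells become 1 and the rest stay 0.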

Can you show a small example, say 10x10, sparse matrix, plus the triplets you might use in that situation, and what subset you want? — Gavin Simpson
Spacedman, I just skimmed through the package doc and I have to say, it's not an easy read. Are you suggesting there happens to be a method buried somewhere in there that does what I'm looking for? As far as I can tell (and from everything I'd read in the past), it seems to be just an alternate implementation of sparse matrices from the Matrix library. If you know of a reason that one API would be better than the other in this case, I'm all ears. — dscheffy

Gavin Simpson · Accepted Answer · 2011-08-23T17:49:45

When you say "get rid of" certain columns/rows, do you mean just this:

> mtx[-c(2,4), -c(1,3,5)]
3 x 3 sparse Matrix of class "dgCMatrix"

[1,] . 1 .
[2,] 1 . 1
[3,] . . 1

Subsetting works, so you just need a way of finding out which rows and columns are empty? If that is correct, then you can use colSums() and rowSums(), as the Matrix package provides methods for these that work on sparse matrices. This should preserve the sparseness during the operation. Adding dimnames first keeps a reference back to the original rows and columns:

> dimnames(mtx) <- list(letters[1:5], LETTERS[1:6])
> mtx[which(rowSums(mtx) != 0), which(colSums(mtx) != 0)]
3 x 3 sparse Matrix of class "dgCMatrix"
  B D F
a . 1 .
c 1 . 1
e . . 1

or, perhaps safer, indexing with the logical vectors directly:

> mtx[rowSums(mtx) != 0, colSums(mtx) != 0]
3 x 3 sparse Matrix of class "dgCMatrix"
  B D F
a . 1 .
c 1 . 1
e . . 1
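
If the full-size matrix is never needed, another option is to build the compact matrix straight from the triples. The following is only a sketch under a few assumptions: the example vectors rows and cols stand in for df$row and df$column, the names row_ids, col_ids and compact are illustrative, and the original indices are kept as dimnames so each entry can be traced back to its source row and column.

```r
library(Matrix)

rows <- c(3, 1, 3, 5)
cols <- c(2, 4, 6, 6)

# Distinct original indices, sorted; these become the dimnames
row_ids <- sort(unique(rows))
col_ids <- sort(unique(cols))

# match() maps each original index to its position in the compact matrix,
# so no empty rows or columns are ever created.
compact <- sparseMatrix(
  i = match(rows, row_ids),
  j = match(cols, col_ids),
  x = 1,
  dims = c(length(row_ids), length(col_ids)),
  dimnames = list(as.character(row_ids), as.character(col_ids))
)
compact
```

Entries can then be looked up by their original numbers as character names, for example compact["3", "6"].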
