5
votes

I have a sparse matrix in R

I now wish to perform nonnegative matrix factorization on R

data.txt is a text file I created using Python. It has three columns: the first gives the row number, the second the column number, and the third the value.

data.txt

1 5 10
3 2 5
4 6 9

The original data.txt contains 164009 rows, which is the data for a 250000x250000 sparse matrix.

I am using the NMF library and running:

library(Matrix)
library(NMF)
# read the (row, column, value) triplets
x <- scan('data.txt', what = list(integer(), integer(), numeric()))
R <- sparseMatrix(i = x[[1]], j = x[[2]], x = x[[3]])
res <- nmf(R, 3)

It is giving me an error:

Error in function (classes, fdef, mtable): unable to find an inherited method for function nmf, for signature "dgCMatrix", "missing", "missing"

Could anyone help me figure out what I am doing wrong?

2
Give code to build an example sparse matrix, and (working) code to run your example. Do you really mean -> there, or should that be <- ? - Matthew Lundberg
You're still missing a piece. I don't have data.txt. It's best to post R code that creates 'x', but posting example data itself is almost as easy to use. (I can't say about others, but I prefer to use example code without > prompt, so I can paste it right from the website into R.) - Matthew Lundberg
oh data.txt is a text file i created using python, it consists of 3 columns where first column specifies the row number, second the column number and third the value. - user1344389
the point is not that we don't know what structure data.txt would have, it's that providing a reproducible example lowers the barrier to providing questions enormously; rather than starting by constructing an example, would-be answerers can start right in on diagnosing/answering the question. Meet them halfway: tinyurl.com/reproducible-000 - Ben Bolker
I just started coding in R last night and I have a very large dataset for the sparseMatrix ... the dimensions of my sparse matrix are 250000x250000 and I don't really know how to provide a reproducible example ... but I would really appreciate any help with this. I have been working on this for 24 hours straight and have not found anything on the web - user1344389

2 Answers

4
votes

The first problem is that you are providing a dgCMatrix to nmf.

> class(R)
[1] "dgCMatrix"
attr(,"package")
[1] "Matrix"

The help is here:

help(nmf)

See the Methods section. It wants an ordinary (dense) matrix. Coercing with as.matrix is unlikely to help you much, because a dense 250000x250000 matrix would have 6.25e10 entries.
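To put a number on that, here is a rough back-of-the-envelope estimate. It assumes 8 bytes per double and, for the sparse case, the usual compressed-column layout (one double and one integer per non-zero, plus one integer per column pointer); R's object overhead is ignored, so treat the figures as approximate.

n <- 250000      # rows and columns, from the question
nnz <- 164009    # non-zero entries, from the question

# Dense storage: every entry is an 8-byte double
dense_gb <- n * n * 8 / 1e9

# Sparse (compressed-column) storage: values + row indices + column pointers
sparse_mb <- (nnz * (8 + 4) + (n + 1) * 4) / 1e6

dense_gb     # hundreds of gigabytes
sparse_mb    # a few megabytes

So the sparse object is tiny, while the dense copy that as.matrix would have to allocate is far beyond what a typical machine can hold.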

Now, even with your example data, coercion to a matrix is insufficient as written:

> nmf(as.matrix(R))
Error: NMF::nmf : when argument 'rank' is not provided, argument 'seed' is required to inherit from class 'NMF'. See ?nmf.

Let's give it a rank:

> nmf(as.matrix(R),2)
Error in .local(x, rank, method, ...) : 
  Input matrix x contains at least one null row.

And indeed it does:

> R
4 x 6 sparse Matrix of class "dgCMatrix"

[1,] . . . . 10 .
[2,] . . . .  . .
[3,] . 5 . .  . .
[4,] . . . .  . 9
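One possible way around the null-row error is to drop empty rows before factorizing. The following is only a sketch using the three sample triplets from the question's data.txt; the names keep_rows, keep_cols and R_trimmed are made up for illustration, and dropping empty columns as well is a precaution on my part (the error above only mentions rows).

library(Matrix)

# The three sample triplets from the question: (row, column, value)
R <- sparseMatrix(i = c(1, 3, 4), j = c(5, 2, 6), x = c(10, 5, 9))

# Indices of rows and columns that hold at least one non-zero entry
keep_rows <- which(rowSums(R != 0) > 0)
keep_cols <- which(colSums(R != 0) > 0)

# Subset while staying sparse; keep the index vectors to map results back
R_trimmed <- R[keep_rows, keep_cols, drop = FALSE]
dim(R_trimmed)

Rows and columns of any factorization of R_trimmed then correspond to keep_rows and keep_cols in the original numbering. This does not remove the need for a sparse-aware NMF routine at the full 250000x250000 size.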
0
votes

Almost 10 years later there are solutions. Here's a fast one.

If you have a 250k-square dgCMatrix that is anywhere near 1% sparse, you need a sparse factorization algorithm.

I wrote RcppML::nmf for large sparse matrices:

library(Matrix)   # provides rsparsematrix()
library(RcppML)
A <- rsparsematrix(1000, 10000, 0.01)   # random sparse matrix, 1% non-zero
model <- RcppML::nmf(A, k = 10)
str(model)

That should take a few seconds on a laptop.

You might also check out rsparse::WRMF, although it isn't as fast.