I am trying to run hierarchical clustering with fastcluster on a very large set of distances, but I am running into a problem.
I have a very large CSV file of similarities between keywords: about 91 million rows (so a for loop in R takes too long) covering about 50,000 unique keywords. When I read it into a data.frame, it looks like:
> df
kwd1 kwd2 similarity
a b 1
b a 1
c a 2
a c 2
It is a sparse list, and I can convert it into a sparse matrix using sparseMatrix() from the Matrix package:
> myMatrix
a b c
a . . .
b 1 . .
c 2 . .
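For reference, here is a minimal sketch of how a matrix like the one above can be built from the toy data. The index mapping via match() and the choice to keep only the lower triangle are assumptions made for this illustration, not necessarily the exact code used on the full file:

```r
library(Matrix)

# Toy data matching the example above.
df <- data.frame(
  kwd1 = c("a", "b", "c", "a"),
  kwd2 = c("b", "a", "a", "c"),
  similarity = c(1, 1, 2, 2),
  stringsAsFactors = FALSE
)

# Map both keyword columns onto one shared, sorted set of labels.
kwds <- sort(unique(c(df$kwd1, df$kwd2)))
i <- match(df$kwd1, kwds)
j <- match(df$kwd2, kwds)

# Keep only the lower triangle (row index > column index),
# so each unordered pair is stored once.
keep <- i > j
myMatrix <- sparseMatrix(
  i = i[keep], j = j[keep], x = df$similarity[keep],
  dims = c(length(kwds), length(kwds)),
  dimnames = list(kwds, kwds)
)
myMatrix
```

Because everything here is vectorised, no for loop over the rows is needed.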
However, when I try to turn it into a dist object using as.dist(), R gives an error saying the 'problem is too large'. I have read the other dist questions on here, but the code suggested in them does not work for my example data set above.
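A rough size check, assuming about 50,000 keywords and 8 bytes per stored value, suggests why this may be failing. This is only back-of-the-envelope arithmetic. It assumes that as.dist() first converts the sparse matrix to a dense one, which I believe is the case but have not confirmed for this setup:

```r
n <- 50000                          # approximate number of unique keywords

# A dist object stores one value per unordered pair of items.
n_pairs <- n * (n - 1) / 2
n_pairs

# Approximate memory for those values as doubles, in GiB.
n_pairs * 8 / 2^30

# A dense n x n matrix would have more cells than the largest R integer.
n^2 > .Machine$integer.max
```

So even if the conversion succeeded, the resulting dist object would need roughly 9 GiB of RAM for the values alone.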
Thanks for any help!