
I am using the h2o R package (v 3.24.0.5) for some deep learning and need to import a large sparse matrix (2M x 10k) into it. I tried fwrite but got a "cholmod problem too large" error, so I went with the SVMLight format instead. The original matrix looks like this:

    Count    Dist    
1   nan     10.1266
2   859.124 10.8198
3   nan     10.1266

For this I used the sparsio package. Writing works fine, but when reading the file back with h2o.importFile I noticed something wrong: every value is prefixed with its column index, as you can see below:

library(sparsio)
write_svmlight(HiC_mat.All, file="Rdata/mat_kmer-NA.txt")


HIC_df = h2o.importFile("Rdata/mat_kmer-NA.txt")

HIC_df[1:3,1:3]
  C1        C2        C3
1  0     0:nan 1:10.1266
2  0 0:859.124 1:10.8198
3  0     0:nan 1:10.1266

Any idea how I can get rid of these prefixes?

Data should look like this:

  C1        C2        C3
1  0       nan     10.1266
2  0    859.124    10.8198
3  0       nan     10.1266
Can you update your question to include the version of H2O-3 you are using, as well as what you expect your dataset to look like if it were imported correctly? You can also specify the parser_type as "SVMLight". Thanks! - Lauren
@Lauren Done editing. The importFile function doesn't have a parse_type argument, and when trying uploadFile I get the error 'NewChunk has type Numeric, but the Vec is of type String'. Trying a parse after importing it. - Exenter
Same with a parse after importing it, so I guess the problem comes from writing the data with sparsio? - Exenter

1 Answer


OK, so the problem was indeed in how the SVMLight file was written. I used this instead:

write_svmlight(x, y = numeric(nrow(x)), file = filename, zero_based = FALSE) 

and it works for now.
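
For anyone wondering what zero_based changes: the SVMLight format conventionally numbers features starting at 1, and the output in the question shows indices starting at 0 (0:nan, 1:10.1266). That would explain why H2O did not recognise the file as SVMLight and read it as plain space-separated text. Here is a minimal base-R sketch of the difference. Note that to_svmlight_lines is a hypothetical helper written only for illustration; it is not part of sparsio or h2o, and its output may differ from sparsio's in details such as how NaN is spelled.

```r
# Hypothetical helper (NOT a sparsio/h2o function): format a small dense
# numeric matrix as SVMLight-style lines "label index:value index:value ...".
# Zero entries are skipped; NaN entries are kept, as in the question's data.
to_svmlight_lines <- function(x, y = numeric(nrow(x)), zero_based = FALSE) {
  offset <- if (zero_based) 0L else 1L
  vapply(seq_len(nrow(x)), function(i) {
    row <- x[i, , drop = TRUE]
    keep <- which(is.na(row) | row != 0)
    feats <- if (length(keep) > 0) {
      paste0(keep - 1L + offset, ":", as.character(row[keep]))
    } else {
      character(0)
    }
    paste(c(as.character(y[i]), feats), collapse = " ")
  }, character(1))
}

# Same values as the example matrix in the question.
m <- matrix(c(NaN, 859.124, NaN, 10.1266, 10.8198, 10.1266),
            ncol = 2, dimnames = list(NULL, c("Count", "Dist")))

zero_idx <- to_svmlight_lines(m, zero_based = TRUE)   # indices start at 0
one_idx  <- to_svmlight_lines(m, zero_based = FALSE)  # indices start at 1

print(zero_idx[2])  # "0 0:859.124 1:10.8198"
print(one_idx[2])   # "0 1:859.124 2:10.8198"
```

The leading 0 on each line is the label (y), which is why the imported frame has an all-zero first column C1.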