
I have an svmlight-formatted file with values of the form:

92.91 18256731:1 71729421:1 72329637:1 83328561:1 118265976:1 134892759:1 198163358:1 352348616:1 526943048:1
5.30 102156934:1 134892759:1 198163358:1 254112843:1 262373758:1 512748316:1 526943048:1
22.00 32172600:1 72329637:1 118265976:1 134892759:1 198163358:1 411824213:1 443226486:1 445371412:1 526943048:1
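Each svmlight row is a target value followed by `index:value` pairs. The sketch below uses a hypothetical helper (`parse_svmlight_row`, not part of H2O) to parse the three sample rows and compare the largest hashed index with the number of distinct features. If a reader sizes the frame by the largest index (an assumption suggested by the slowdown, not confirmed H2O behaviour), these three rows would imply roughly 527 million columns for only a handful of distinct features.

```python
def parse_svmlight_row(line):
    """Parse one svmlight row into (target, {feature_index: value}).

    Simplified sketch: strips a trailing '#' comment but does not handle
    'qid:' tokens.
    """
    target, *pairs = line.split("#", 1)[0].split()
    features = {}
    for pair in pairs:
        index, value = pair.split(":")
        features[int(index)] = float(value)
    return float(target), features


# The three sample rows from the question.
rows = [
    "92.91 18256731:1 71729421:1 72329637:1 83328561:1 118265976:1 134892759:1 198163358:1 352348616:1 526943048:1",
    "5.30 102156934:1 134892759:1 198163358:1 254112843:1 262373758:1 512748316:1 526943048:1",
    "22.00 32172600:1 72329637:1 118265976:1 134892759:1 198163358:1 411824213:1 443226486:1 445371412:1 526943048:1",
]

parsed = [parse_svmlight_row(r) for r in rows]

# Largest index = implied column count if columns are allocated up to max index.
max_index = max(max(features) for _, features in parsed)
# Distinct indexes = number of features that actually occur.
distinct = {i for _, features in parsed for i in features}

print("largest feature index:", max_index)
print("distinct features:", len(distinct))
```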

I am trying to import this into H2O using h2o.import_file("fname.svmlight").

Does H2O support high-dimensional sparse binary features?

Do I need to convert the hashed values into small indexes for this to work?

Can you show your complete code and some sample data? - Vivek Kumar
And maybe try renaming the file to "fname.svmlight". - Vivek Kumar
The example data are as shown. Importing the file takes ages, but when I convert the hashed values into small indexes it seems to work fine. - user90772
Is this something that is planned to be fixed? - user90772
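The workaround mentioned in the comments (converting hashed values into small indexes) can be sketched as below. `compact_svmlight` is a hypothetical helper, not an H2O function. It maps each distinct hashed index to 1..N, assuming the 1-based indexing of the original SVMlight format, and sorts the pairs within each row because SVMlight expects increasing feature numbers.

```python
def compact_svmlight(lines):
    """Rewrite svmlight rows so hashed feature indexes become 1..N.

    Returns the rewritten rows and the {hashed_index: compact_index} map.
    """
    parsed = []
    for line in lines:
        label, *pairs = line.split()
        parsed.append((label, [(int(i), v) for i, v in (p.split(":") for p in pairs)]))

    # Sorting the distinct hashed indexes keeps their relative order,
    # so rows that were increasing stay increasing after remapping.
    distinct = sorted({i for _, pairs in parsed for i, _ in pairs})
    mapping = {h: n for n, h in enumerate(distinct, start=1)}  # 1-based

    out = []
    for label, pairs in parsed:
        tokens = [label] + [f"{mapping[i]}:{v}" for i, v in sorted(pairs)]
        out.append(" ".join(tokens))
    return out, mapping


rows = [
    "92.91 18256731:1 71729421:1 72329637:1 83328561:1 118265976:1 134892759:1 198163358:1 352348616:1 526943048:1",
    "5.30 102156934:1 134892759:1 198163358:1 254112843:1 262373758:1 512748316:1 526943048:1",
    "22.00 32172600:1 72329637:1 118265976:1 134892759:1 198163358:1 411824213:1 443226486:1 445371412:1 526943048:1",
]

compact, mapping = compact_svmlight(rows)
for row in compact:
    print(row)
```

The mapping has to be saved and reused for any later (for example, scoring) data; otherwise the same hashed feature could land in a different column.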

1 Answer


Your three lines of svmlight are like a virus! According to top, the Java process was as close to 800% CPU (on an 8-core machine) as it can get. After 45 minutes of CPU time (5-6 minutes of wall-clock time), I had to use kill -9 on it to get my machine back.

Even if your type of file is not officially supported, I think the fact that it brings down a machine makes it a serious bug, so I've reported it here: https://0xdata.atlassian.net/browse/PUBDEV-4798

BTW, you can find a unit test showing how to use svmlight here: https://github.com/h2oai/h2o-3/blob/30f382efac687be3959a253d975cb48c341c92b4/h2o-r/tests/testdir_misc/runit_parser_type.R