Importing GloVe into h2o with the word2vec function throws NullPointerException

Question

I'm trying to import GloVe into an h2o cluster from R using the word2vec function. Following the question "Does or will H2O provide any pretrained vectors for use with h2o word2vec?", I downloaded the pretrained glove.840B.300d.txt file and tried to import it into h2o, but parsing failed. I then read the GloVe file into R, removed one line that was read as NA, and saved the result as a CSV. Parsing the CSV file in h2o went well, but I couldn't create a word2vec model from it: the call threw java.lang.NullPointerException.

My h2o version is 3.15.0.99999.

My code:

library(h2o)
h2o.init()
glove <- h2o.importFile("glove.840B.300d.csv", header = FALSE)
model <- h2o.word2vec(pre_trained = glove, vec_size = 300)

Full output:

|==========================================================================| 100%

java.lang.NullPointerException
java.lang.NullPointerException
at water.AutoBuffer.tcpOpen(AutoBuffer.java:488)
at water.AutoBuffer.sendPartial(AutoBuffer.java:679)
at water.AutoBuffer.putA4f(AutoBuffer.java:1383)
at hex.word2vec.Word2VecModel$Word2VecOutput$Icer.write90(Word2VecModel$Word2VecOutput$Icer.java)
at hex.word2vec.Word2VecModel$Word2VecOutput$Icer.write(Word2VecModel$Word2VecOutput$Icer.java)
at water.Iced.write(Iced.java:61)
at water.AutoBuffer.put(AutoBuffer.java:771)
at hex.Model$Icer.write86(Model$Icer.java)
at hex.word2vec.Word2VecModel$Icer.write85(Word2VecModel$Icer.java)
at hex.word2vec.Word2VecModel$Icer.write(Word2VecModel$Icer.java)
at water.Iced.write(Iced.java:61)
at water.Iced.asBytes(Iced.java:42)
at water.Value.<init>(Value.java:348)
at water.TAtomic.atomic(TAtomic.java:22)
at water.Atomic.compute2(Atomic.java:56)
at water.Atomic.fork(Atomic.java:39)
at water.Atomic.invoke(Atomic.java:31)
at water.Lockable.unlock(Lockable.java:181)
at water.Lockable.unlock(Lockable.java:176)
at hex.word2vec.Word2Vec$Word2VecDriver.computeImpl(Word2Vec.java:72)
at hex.ModelBuilder$Driver.compute2(ModelBuilder.java:205)
at water.H2O$H2OCountedCompleter.compute(H2O.java:1263)
at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)

Michal Kurka · Accepted Answer · 2017-10-04T02:01:15

Thanks for the report. The current implementation is limited by the JVM's maximum array length. This model appears to be too large and exceeds that limit.

We will have to fix it in H2O.
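
To make the size argument concrete, here is a rough back-of-the-envelope check in R. It is only a sketch under assumptions that are not in the original post: a vocabulary of roughly 2.2 million words (the figure published for glove.840B.300d), 4 bytes per float, and the guess, based on the putA4f and Iced.asBytes frames in the stack trace, that the whole embedding ends up serialized into a single Java byte array. The helper embedding_bytes is hypothetical and is not part of the h2o API.

```r
# Back-of-the-envelope size check for a word2vec embedding matrix.
# Assumed figures (not from the original post): about 2.2 million words,
# as published for glove.840B.300d, and 4 bytes per float.
vocab_size      <- 2.2e6
vec_size        <- 300      # same value as vec_size in the h2o.word2vec call
bytes_per_float <- 4

# Upper bound on the length of a Java array (Integer.MAX_VALUE).
jvm_max_array_len <- 2^31 - 1

# Hypothetical helper: bytes needed to hold all vectors in one byte array.
embedding_bytes <- function(n_words, dims, bytes_each = 4) {
  n_words * dims * bytes_each
}

total_bytes   <- embedding_bytes(vocab_size, vec_size, bytes_per_float)
exceeds_limit <- total_bytes > jvm_max_array_len

# Rough upper bound on the number of rows that would fit, ignoring the
# words themselves and any other serialized fields.
max_rows <- floor(jvm_max_array_len / (vec_size * bytes_per_float))

cat("Estimated bytes:", format(total_bytes, big.mark = ",", scientific = FALSE), "\n")
cat("Exceeds JVM array limit:", exceeds_limit, "\n")
cat("Approximate max rows at 300 dims:", max_rows, "\n")
```

Under these assumptions the raw float data alone is larger than a single Java array can hold, which is consistent with the explanation above. Whether a smaller vocabulary would actually load in this H2O version has not been verified here.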
