1
votes

I am wondering if it's possible to provide any R sample code for using word2vec and a CNN for text classification with the H2O Deep Water R version? There is very little documentation on either mxnetR or H2O Deep Water for R.

I have already used the h2o R package to train my word2vec word embedding (the vocabulary lookup table) and the document word-vector matrix. I am wondering if there is any sample code that combines the lookup table and the original raw text into a CNN classification model, either with mxnetR (via a custom iterator) or by building the CNN directly with h2o in R.

I am asking because if I convert all of the data into array format at once, my machine does not have enough memory to hold it.
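For reference, the word2vec step I already ran looks roughly like this (the file name, column name, and parameters below are placeholders, not my exact code):

```r
library(h2o)
h2o.init()

# Placeholder input: a frame with a string column "text" (and a label column)
docs  <- h2o.importFile("documents.csv")
words <- h2o.tokenize(h2o.ascharacter(docs$text), "\\\\W+")

# Train the word2vec embedding (the vocabulary lookup table)
w2v <- h2o.word2vec(words, vec_size = 100, epochs = 5)

# Averaged document vectors: one fixed-length row per document
doc.vecs <- h2o.transform(w2v, words, aggregate_method = "AVERAGE")
```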


1 Answer

1
votes

If RAM is a constraint (it must be a very large corpus), then using mx.io.CSVIter could be a way to go. The CSV can be written in batches and will have a limited memory footprint during training. With the vanilla mx.io.CSVIter, you will likely need to reshape the data to features x batch x seq.length as an initial transformation inside the network; a rough sketch follows.
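A minimal sketch of that idea (the file names, dimensions, and the exact reshape axis order are assumptions and may need adjusting to your data and mxnet version):

```r
library(mxnet)

seq.len   <- 50    # assumed tokens per document (padded/truncated)
embed.dim <- 100   # assumed word2vec vector size
# Each CSV row holds one document: seq.len * embed.dim flattened embedding values

train.iter <- mx.io.CSVIter(
  data.csv    = "train_features.csv",   # written out in batches beforehand
  data.shape  = seq.len * embed.dim,
  label.csv   = "train_labels.csv",
  label.shape = 1,
  batch.size  = 128
)

# Reshape the flat row back into a 2-D "image" inside the network, then convolve.
# The shape below uses the backend's (batch, channel, height, width) order with -1
# inferring the batch dimension; the axis order may need adjusting on your setup.
data  <- mx.symbol.Variable("data")
resh  <- mx.symbol.Reshape(data, shape = c(-1, 1, seq.len, embed.dim))
conv1 <- mx.symbol.Convolution(resh, kernel = c(3, embed.dim), num_filter = 64)
act1  <- mx.symbol.Activation(conv1, act_type = "relu")
pool1 <- mx.symbol.Pooling(act1, pool_type = "max", kernel = c(seq.len - 2, 1))
flat  <- mx.symbol.Flatten(pool1)
fc1   <- mx.symbol.FullyConnected(flat, num_hidden = 2)
net   <- mx.symbol.SoftmaxOutput(fc1, name = "sm")

model <- mx.model.FeedForward.create(net, X = train.iter,
                                     num.round = 5, learning.rate = 0.05)
```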

Another option could be to learn the embeddings as part of the model, for example with this demo: http://dmlc.ml/rstats/2017/10/11/rnn-bucket-mxnet-R.html, which also provides an example of a custom iterator with bucketing, which likewise limits RAM consumption.
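If you go that route, the embedding layer itself is a single symbol; a rough sketch (vocab.size and embed.dim are assumed values) that could feed the same kind of convolution stack as above:

```r
library(mxnet)

vocab.size <- 10000   # assumed vocabulary size
embed.dim  <- 100     # assumed embedding size

# The iterator then only needs to supply integer token ids (batch x seq.len),
# which takes far less memory than pre-computed embedding vectors.
data  <- mx.symbol.Variable("data")
embed <- mx.symbol.Embedding(data, input_dim = vocab.size,
                             output_dim = embed.dim, name = "embed")
# "embed" can then be reshaped and passed to the convolution/pooling stack shown earlier.
```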