After importing a fairly large table from MySQL into H2O on my machine, I tried to run a hashing algorithm (murmurhash from the R digest package) on one of its columns and save the result back to H2O. As I found out, using as.data.frame on an H2OFrame object is not always advisable: my original H2OFrame has ~43k rows, but the coerced data.frame usually contains only ~30k rows for some reason (the same goes for using base::apply/base::sapply/etc. on the H2OFrame).
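Roughly, what I see is something like this (a sketch, with data being the imported H2OFrame):

library(h2o)

nrow(data)                  # H2OFrame reports ~43k rows
local_df <- as.data.frame(data)
nrow(local_df)              # the coerced data.frame only has ~30k rows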
I found out there is an apply function for H2OFrames as well, but as far as I can see, it can only be used with built-in R functions.
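For instance, a call like this does work, because the function can be translated by the backend (the numeric column name here is just a placeholder):

# Works: mean is a primitive that H2O knows how to execute server-side
h2o::apply(data[, "some_numeric_column"], 2, mean)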
My own code, however, needs digest and looks like this:
data[, "subject"] <- h2o::apply(data[, "subject"], 2, function(x)
digest(x, algo = "murmur32"))
I get the following error:
Error in .process.stmnt(stmnt, formalz, envs) :
Don't know what to do with statement: digest
I understand that only the predefined functions from the Java backend can be used to manipulate H2O data, but is there perhaps another way to use the digest package from the client side without converting the data to a data.frame? I was thinking that, in the worst case, I will have to use the R MySQL driver to load the data first, manipulate it as a data.frame and then upload it to the H2O cloud. Thanks in advance for any help.
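For reference, the fallback I have in mind would look roughly like this (connection details, table and column names are just placeholders):

library(RMySQL)
library(digest)
library(h2o)

# Load the table locally via MySQL instead of importing it into H2O first
con <- dbConnect(RMySQL::MySQL(), dbname = "mydb", host = "localhost",
                 user = "user", password = "password")
df  <- dbGetQuery(con, "SELECT * FROM my_table")
dbDisconnect(con)

# Hash the column on the client side with digest, one value per row
df$subject <- sapply(df$subject, function(x) digest(x, algo = "murmur32"))

# Push the finished data.frame up to the H2O cluster
h2o.init()
data <- as.h2o(df)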
Your code uses margin = 2 (columns) instead of margin = 1 (rows). Since you are trying to replace the data[, "subject"] column with the results, my guess is that you are actually trying to apply the hash function to each row. I have an answer for you, but I want to make sure I understand what you are trying to do first. – Erin LeDell