I am working with a very large dataset and I would like to keep the data in H2O as much as possible without bringing it into R.
I noticed that whenever I pass an H2O Frame to a function, any modification I make to the Frame is not reflected outside of the function. Is there a way to pass the Frame by reference?
If not, what's the best way to modify the original Frame inside a function without copying the whole Frame?
Another related question: does passing a Frame to other functions (read-only) make extra copies on the H2O side? My datasets are 30GB - 100GB, so I want to make sure that passing them around does not cause memory issues.
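library(h2o)
h2o.init()
# Tries to overwrite column x, but only the function's local fdx is rebound
mod = function(fdx) {
  fdx[,"x"] = -1
}
d = data.frame(x = rnorm(100), y = rnorm(100))
dx = as.h2o(d)
dx[1,]
mod(dx)
dx[1,] # does not change the original value of x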
> dx[1,]
x y
1 0.3114706 0.9523058
> dx[1,]
x y
1 0.3114706 0.9523058
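For comparison, the usual R idiom is to return the modified Frame from the function and reassign it in the caller. Below is a minimal sketch of that, assuming a local H2O cluster started with h2o.init(). The name mod_ret is made up for illustration. As far as I understand, the R object is only a handle to data held in the cluster, so this does not pull the data into R, but I have not confirmed how much H2O copies internally when a column is overwritten.

```r
library(h2o)
h2o.init()  # assumes a local H2O cluster can be started or reached

# mod_ret is a hypothetical name; it returns the modified Frame explicitly
mod_ret <- function(fdx) {
  fdx[, "x"] <- -1
  fdx
}

d  <- data.frame(x = rnorm(100), y = rnorm(100))
dx <- as.h2o(d)

dx <- mod_ret(dx)  # rebind dx to the modified Frame
dx[1, ]            # x in the first row should now be -1
```

This keeps the function free of side effects, at the cost of having to reassign at every call site.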
Thanks!
data.table has a similar mechanism for modifying by reference, but I am not sure it can be used in your case. You can take a look here. – Patric