Let's say I have a function 'getData()' that returns data (think of it as a data stream). I need to build an H2O data frame from this data, inserting each arriving record as a new row only if it is not already present in the data frame.
One obvious way to do this:
- Keep a global H2O data frame.
- Create a 1-row H2O data frame from the arriving data (I am using as.h2o()).
- Check whether it is already present in the global data frame (using h2o.which() or some other function).
- If it is not present, append it to the global data frame (using h2o.rbind()).
This solution is too slow: creating an H2O data frame every time data arrives (step 2) takes too much time, and I have only tested it on a small dataset.
I was also thinking of storing the rows in an R data frame and calling h2o.rbind() on them at intervals.
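One way to combine the batching idea with the duplicate check, offered as a minimal sketch rather than a definitive answer: keep the "already seen?" test in R using a hashed environment as a set of row keys, buffer only unseen rows in a list, and call as.h2o() and h2o.rbind() once per batch instead of once per row. It assumes that getData() returns a 1-row data.frame with the same columns every time, that two rows are duplicates when all their values match after as.character(), and that the global H2O frame is only ever fed through this buffer (rows that reach the frame another way are not in the key set). The names make_row_buffer(), push_to_h2o(), batch_size and global_hf are hypothetical.

```r
# Minimal sketch (hypothetical helper names): deduplicate streamed rows in R
# and push them to H2O in batches instead of one as.h2o() call per row.
# Assumes each incoming row is a 1-row data.frame with a fixed set of columns.

make_row_buffer <- function(batch_size = 1000, flush_fun) {
  seen <- new.env(hash = TRUE)  # hashed environment used as a set of row keys
  pending <- list()             # unseen rows waiting to be sent to H2O

  # Build one string key per row; values are compared via as.character(),
  # and "\x1f" (unit separator) is assumed not to occur in the data.
  row_key <- function(row) {
    vals <- vapply(row, function(v) as.character(v)[1], character(1))
    paste0("k", paste(vals, collapse = "\x1f"))
  }

  # Bind all pending rows into one data.frame and hand it to flush_fun().
  flush <- function() {
    if (length(pending) == 0L) return(invisible(0L))
    batch <- do.call(rbind, pending)
    pending <<- list()
    flush_fun(batch)
    invisible(nrow(batch))
  }

  # Returns TRUE if the row was new (and buffered), FALSE if it was a duplicate.
  add <- function(row) {
    key <- row_key(row)
    if (exists(key, envir = seen, inherits = FALSE)) return(invisible(FALSE))
    assign(key, TRUE, envir = seen)
    pending[[length(pending) + 1L]] <<- row
    if (length(pending) >= batch_size) flush()
    invisible(TRUE)
  }

  list(add = add, flush = flush)
}

# Example flush function: one as.h2o() upload and one h2o.rbind() per batch.
# Needs the h2o package and a running cluster (h2o.init()) when it is called.
global_hf <- NULL
push_to_h2o <- function(batch) {
  hf <- h2o::as.h2o(batch)
  global_hf <<- if (is.null(global_hf)) hf else h2o::h2o.rbind(global_hf, hf)
  invisible(NULL)
}
```

In the stream loop this would be buf <- make_row_buffer(batch_size = 500, flush_fun = push_to_h2o), then buf$add(getData()) for each arrival and one final buf$flush() at the end. The batch size trades latency for fewer uploads: rows only show up in global_hf after a flush.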
What is the best way to do this, given that speed is the priority?
Does getData() return a 1-row data.frame in R? – Erin LeDell