Following the post about data.table and parallel computing, I'm trying to find a way to parallelize an operation on a data.table.
I have a data.table with 4 million rows and 14 columns, and I would like to place it in shared memory so that operations on it can be parallelized using the "parallel" package's parLapply, without copying the table to each node in the cluster (which is what parLapply does). At the moment, the cost of moving the data.table around outweighs the benefit of parallel computation.
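For reference, my current approach looks roughly like this (a toy table stands in for the real one; the column names are placeholders):

```r
library(data.table)
library(parallel)

## Stand-in for the real table (the actual one has
## 4 million rows and 14 columns).
dt <- data.table(id = seq_len(1e5), value = rnorm(1e5))

cl <- makeCluster(2)

## parLapply serializes `dt` to every worker, so each
## node receives its own full copy of the table.
chunks <- split(seq_len(nrow(dt)), rep(1:2, length.out = nrow(dt)))
res <- parLapply(cl, chunks, function(idx, dt) {
  sum(dt$value[idx])
}, dt = dt)

stopCluster(cl)
```

This works, but the serialization of `dt` to each worker dominates the runtime.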
I found the "bigmemory" package as an answer for sharing memory, but it doesn't preserve the data.table structure of the data. So does anyone know a way to:
1) put the data.table in shared memory,
2) maintain the data.table structure of the data while doing so, and
3) run parallel operations on this data.table?
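To illustrate point 2, here is roughly what happens when I push the table through bigmemory (toy data again; a big.matrix only holds a single numeric matrix, so the data.table class and column types are lost):

```r
library(data.table)
library(bigmemory)

dt <- data.table(a = 1:10, b = rnorm(10))

## Coerce to a plain matrix first, since big.matrix
## can only store one numeric type.
bm <- as.big.matrix(as.matrix(dt))
is.big.matrix(bm)  # TRUE

## Reading back gives a plain matrix, not a data.table,
## so keyed joins, by-reference updates, etc. are gone.
m <- bm[, ]
```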
Thanks in advance!