I have a large data frame (126041 obs. of 604 variables). I'm new to the HDF5 format. I save the HDF5 file as follows:

writeH5DataFrame(myData, "C:/myDir/myHDF5.h5", overwrite = TRUE)

  1. How can I read the data frame back? There doesn't appear to be any readH5DataFrame or loadH5DataFrame function.

  2. Also, writeH5DataFrame takes an incredibly long time, probably because of the large number of columns (604 in this case). The documentation mentions that "the data for each column is stored in a separate H5Dataset"; I'm not sure whether this is the reason for the slowness. Is there any way to speed up writing a data frame in HDF5 format?

Not every HDF5 file can be opened in R. Aren't you using a MODIS dataset? HDF5 files from the MODIS dataset cannot be opened directly in R (I am not sure whether that is because of the format or the sinusoidal projection); you have to use an external tool to handle these files. See the MODIS HDF5 spatial data tutorial. – Tomas
No, I am not using the MODIS dataset. – uday

I don't know which package you are using, but with the rhdf5 package it is easy to write and read HDF5 files.
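For your first question, rhdf5's h5write() also accepts a data.frame directly and stores it as a single compound dataset, and h5read() returns it as a data.frame again (its compoundAsDataFrame argument defaults to TRUE). The sketch below is a minimal round trip under that assumption; the file name "df_example.h5", the dataset name "myData" and the small stand-in data are placeholders, not your real data. Row names are not stored, and I have kept the columns to plain integer, double and character types.

```r
## Minimal sketch: data.frame round trip with rhdf5 (assumes rhdf5 is installed).
## "df_example.h5" and "myData" are placeholder names.
library(rhdf5)

## small stand-in for the real 126041 x 604 data frame
myData <- data.frame(id    = 1:5,
                     value = c(0.5, 1.5, 2.5, 3.5, 4.5),
                     label = c("a", "b", "c", "d", "e"),
                     stringsAsFactors = FALSE)

h5file <- "df_example.h5"
if (file.exists(h5file)) file.remove(h5file)  # start from a clean file
h5createFile(h5file)

## the whole data.frame is written as ONE compound dataset
h5write(myData, h5file, "myData")

## reading it back gives a data.frame with the same column names
myData2 <- h5read(h5file, "myData")
str(myData2)
```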

## uncomment the 2 lines below to install the package
## install.packages("BiocManager")
## BiocManager::install("rhdf5")
library(rhdf5)
## create an empty HDF5 file (the "database")
h5createFile("myhdf5file.h5")
## create a group hierarchy: groups hold datasets, like folders hold files
h5createGroup("myhdf5file.h5","group1")
h5createGroup("myhdf5file.h5","group2")

## save a matrix 
A = matrix(1:10, nrow = 5, ncol = 2)
h5write(A, "myhdf5file.h5","group1/A")

## save an array; write.attributes = TRUE also stores the "scale" attribute
B = array(seq(0.1,2.0,by=0.1),dim=c(5,2,2))
attr(B, "scale") <- "liter"
h5write(B, "myhdf5file.h5","group2/B", write.attributes = TRUE)
## list the contents of the file
h5ls("myhdf5file.h5")

   group   name       otype  dclass       dim
0       / group1   H5I_GROUP                  
1 /group1      A H5I_DATASET INTEGER     5 x 2
2       / group2   H5I_GROUP                  
3 /group2      B H5I_DATASET   FLOAT 5 x 2 x 2

## read A and B back into R
D = h5read("myhdf5file.h5","group1/A")
E = h5read("myhdf5file.h5","group2/B")
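For the speed question, one option worth trying, assuming all 604 columns are numeric, is to write the whole frame as a single numeric matrix dataset instead of one dataset per column, and to store the column names in a separate character dataset. I have not benchmarked this against writeH5DataFrame, so treat the speed-up as an assumption. Note that as.matrix() makes a full in-memory copy (roughly 609 MB for 126041 x 604 doubles). The file name "matrix_example.h5" and the dataset names "values" and "colnames" are hypothetical.

```r
## Sketch: store an all-numeric data.frame as ONE matrix dataset (assumption:
## every column is numeric). File and dataset names are hypothetical.
library(rhdf5)

## small all-numeric stand-in for the real data
set.seed(1)
numData <- as.data.frame(matrix(rnorm(1000 * 20), nrow = 1000))

h5file <- "matrix_example.h5"
if (file.exists(h5file)) file.remove(h5file)  # start from a clean file
h5createFile(h5file)

## one dataset for all columns instead of one dataset per column
h5write(unname(as.matrix(numData)), h5file, "values")
## keep the column names separately so the frame can be rebuilt
h5write(names(numData), h5file, "colnames")

## read back and rebuild the data.frame
vals <- h5read(h5file, "values")
cols <- h5read(h5file, "colnames")
numData2 <- as.data.frame(vals)
names(numData2) <- cols
```

Because h5read() returns the matrix in the same orientation it was written from R, the rebuilt frame should have the original dimensions and column order.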