I need to be able to quickly read lots of netCDF variables in python (1 variable per file). I'm finding that the Dataset function in netCDF4 library is rather slow compared to reading utilities in other languages (e.g., IDL).
My variables have shape (2600, 5200) and type float. They don't seem that big to me (file size = 52 MB).
Here is my code:
import numpy as np
from netCDF4 import Dataset
import time

file = '20151120-235839.netcdf'

t0 = time.time()
openFile = Dataset(file, 'r')
raw_data = openFile.variables['MergedReflectivityQCComposite']  # lazy netCDF4.Variable, no data read yet
data = np.copy(raw_data)  # forces the actual read from disk
openFile.close()
print(time.time() - t0)  # time.time is a function, so it must be called
It takes about 3 seconds to read one variable (one file). I think the main slowdown is the np.copy call: raw_data is of type netCDF4.Variable, so the data aren't actually read from disk until the copy happens. Is this the best/fastest way to do netCDF reads in Python?
Thanks.
Consider xarray: it handles gridded data at a higher level and makes the code much nicer. It might also be faster if you're dealing with large arrays; see stackoverflow.com/questions/47180126/… for a discussion of performance. – naught101
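A minimal sketch of the xarray route, assuming xarray is installed with a netCDF backend; the file and variable names are placeholders, so the sketch writes a small file first to be self-contained:

```python
import numpy as np
import xarray as xr

# Write a small sample file; with the real data, replace 'demo.nc'
# with the question's filename and skip this step.
arr = xr.DataArray(np.random.rand(260, 520).astype('float32'),
                   dims=('y', 'x'),
                   name='MergedReflectivityQCComposite')
arr.to_netcdf('demo.nc')

ds = xr.open_dataset('demo.nc')  # opening is lazy and cheap
# Accessing .values triggers the actual read into a numpy array.
data = ds['MergedReflectivityQCComposite'].values
ds.close()
print(data.shape)
```

Beyond raw speed, xarray keeps dimension names and coordinates attached to the array, which tends to simplify downstream code.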