4 votes

I need to be able to quickly read lots of netCDF variables in Python (1 variable per file). I'm finding that the Dataset function in the netCDF4 library is rather slow compared to reading utilities in other languages (e.g., IDL).

My variables have a shape of (2600, 5200) and type float. They don't seem that big to me (file size = 52 MB).

Here is my code:

import numpy as np
from netCDF4 import Dataset
import time

file = '20151120-235839.netcdf'
t0 = time.time()
openFile = Dataset(file, 'r')
raw_data = openFile.variables['MergedReflectivityQCComposite']  # a netCDF4.Variable, not an array
data = np.copy(raw_data)  # forces the full read and an extra copy
openFile.close()
print(time.time() - t0)

It takes about 3 seconds to read one variable (one file). I think the main slowdown is np.copy: raw_data has type <type 'netCDF4.Variable'> rather than a numpy array, hence the copy. Is this the best/fastest way to do netCDF reads in Python?

Thanks.

The power of Numpy is that you can create views into the existing data in memory via the metadata it retains about the data. So a copy will always be slower than a view, via pointers. As @JCOidl says, it's not clear why you don't just use raw_data = openFile.variables['MergedReflectivityQCComposite'][:] – Eric Bridger
This simple step speeds up the read by an order of magnitude. Thank you! I'll try to leverage pointers with Numpy more. Do you know of a good reference explaining this concept a bit more (n00b here)? – weather guy
I'm not sure that it's faster in your case, but I would highly recommend using xarray - it handles gridded data at a higher level, and makes coding much nicer. It might also be faster, if you're dealing with large arrays. See stackoverflow.com/questions/47180126/… for a discussion of performance. – naught101

3 Answers

3 votes

The power of Numpy is that you can create views into the existing data in memory via the metadata it retains about the data. So a copy will always be slower than a view, via pointers. As JCOidl says, it's not clear why you don't just use:

raw_data = openFile.variables['MergedReflectivityQCComposite'][:]

For more info see the SciPy Cookbook and the SO question View onto a numpy array?
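
To make the view-versus-copy distinction concrete, here is a minimal NumPy-only sketch (no file I/O involved):

import numpy as np

a = np.arange(10)
view = a[2:5]         # basic slicing returns a view: it shares a's memory
copy = a[2:5].copy()  # .copy() allocates a new buffer

a[2] = 99
print(view[0])  # 99 -- the view sees the change to a
print(copy[0])  # 2  -- the copy is unaffected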

2 votes

I'm not sure what to say about the np.copy operation (which is indeed slow), but I find that the PyNIO module from UCAR works well for both NetCDF and HDF files. This will place data into a numpy array:

import Nio

file = '20151120-235839.netcdf'  # the file from the question
f = Nio.open_file(file, format="netcdf")
data = f.variables['MergedReflectivityQCComposite'][:]
f.close()

Testing your code versus the PyNIO code on a netCDF file I have resulted in 1.1 seconds for PyNIO, versus 3.1 seconds for the netCDF4 module. Your results may vary; worth a look though.
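
For reference, a minimal timing sketch along these lines (filename and variable name are taken from the question; PyNIO must be installed separately):

import time
from netCDF4 import Dataset
import Nio

fname = '20151120-235839.netcdf'
varname = 'MergedReflectivityQCComposite'

# netCDF4: slicing with [:] reads the variable straight into a numpy array
t0 = time.time()
nc = Dataset(fname, 'r')
d1 = nc.variables[varname][:]
nc.close()
print('netCDF4: %.2f s' % (time.time() - t0))

# PyNIO: the same access pattern
t0 = time.time()
f = Nio.open_file(fname, format='netcdf')
d2 = f.variables[varname][:]
f.close()
print('PyNIO:   %.2f s' % (time.time() - t0))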

1 vote

You can use xarray for that.

import xarray as xr

### Single netCDF file ###
ds = xr.open_dataset('path/file.nc')

### Opening multiple netCDF files and concatenating them by time ###
ds = xr.open_mfdataset('path/*.nc', concat_dim='time')

To read the variable you can simply type ds.MergedReflectivityQCComposite or ds['MergedReflectivityQCComposite'][:]

You can also use xr.load_dataset, but I find that it uses more memory than the open function. For xr.open_mfdataset, you can also chunk along the dimensions of the file if you want. There are other options for both functions, and you might be interested to learn more about them in the xarray documentation.
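
For example, a minimal sketch of pulling the question's variable into a plain numpy array (accessing .values triggers the actual read):

import xarray as xr

ds = xr.open_dataset('20151120-235839.netcdf')     # file from the question
data = ds['MergedReflectivityQCComposite'].values  # plain numpy array
ds.close()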