I have a large (5GB) temperature netCDF file. The file is has 4 dimensions: time, pressure level, latitude, longitude.
The dataset has 31 timepoints and I am only interested in 5 pressure levels.
My parameter is temperature t
:
from netCDF4._netCDF4 import Dataset
# Load the dataset
dataset = Dataset(path)
factor = dataset.variables['t']
To extract a 'cube' of temperature data from my factor
variable around a centre cell, I would simply do subsetting, like this:
radius = 5
# +1 because the subsetting does not include last index
lats_bounds = [nearest_latitude_index-radius,nearest_latitude_index+radius + 1]
lons_bounds = [nearest_longitude_index-radius,nearest_longitude_index+radius +1]
#all timepoints
times_bounds = [0, len(times)]
#just the last 5 pressure levels
pressure_level_bounds = [len(levels)-5, len(levels)]
results = factor[times_bounds[0]:times_bounds[1],pressure_level_bounds[0]:pressure_level_bounds[1], lats_bounds[0]:lats_bounds[1],lons_bounds[0]:lons_bounds[1]]
The problem is that results
will now of type ndarray
with shape (31,5,11,11)
and size 18755
(31*5*11*11) where every index holds just a single value.
I need the values from results
, but for each value, I also need its corresponding timepoint, pressure level, latitude and longitude.
Ideally, I'd like to do subsetting as I have done, but my final result would be an array of tuples... Something like this:
# corresponding timestamp, pressure level, latitude, longitude
# and the temperature value extracted.
final = [
(2342342, 1000, 24.532, 53.531, 277),
(2342342, 1000, 74.453, 26.123, 351),
(2342342, 1000, 80.311, 56,345, 131),
...
]
How can I achieve this?