0
votes

I have a NetCDF data file containing sea ice concentration:

from netCDF4 import Dataset
ds = Dataset('file.nic', 'r')
ds.variables.keys()
>>odict_keys(['latitude', 'longitude', 'seaice_conc', 'seaice_source', 'time'])
ds.dimensions.keys()
>>odict_keys(['latitude', 'longitude', 'time'])

Question: In this dataset, time is stored as days since 2001-01-01 00:00:00. Say I want seaice_conc for a particular time, 1990-12-01. How do I get it without using xarray or writing another function to calculate the difference in days? Is it possible to do it the way xarray does, e.g.:

import xarray as xr
ds1 = xr.open_dataset('file.nc')
seaice_data = ds1['seaice_conc'].sel(time = '1990-12-01')

To give further info on the dataset, it looks like this:

ds1.seaice_conc
<xarray.DataArray 'seaice_conc' (time: 1968, latitude: 240, longitude: 1440)>
[680140800 values with dtype=float32]
Coordinates:
  * latitude   (latitude) float32 89.875 89.625 89.375 89.125 88.875 88.625 ...
  * longitude  (longitude) float32 0.125 0.375 0.625 0.875 1.125 1.375 1.625 ...
  * time       (time) datetime64[ns] 1850-01-15 1850-02-15 1850-03-15 ...
Attributes:
    short_name: concentration
    long_name: Sea_Ice_Concentration
    standard_name: Sea_Ice_Concentration
    units: Percent

Another thing that confuses me: netCDF4 says that time is stored as days since 2001-01-01, but xarray shows me the exact date in yyyy-mm-dd format instead of the 'days since...' definition. Why is that?

Thanks!

3 Answers

1
votes

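The easiest approach I could find is date2index:

from netCDF4 import date2index
from datetime import datetime
# select='nearest' picks the closest time step; the default, select='exact',
# expects the date to be present in the file (the time stamps here fall mid-month)
timeindex = date2index(datetime(1990, 12, 1), ds.variables['time'], select='nearest')
seaice_data = ds.variables['seaice_conc'][timeindex, :, :]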
0
votes

netCDF4.Dataset is a lower-level library than xarray; if it did everything xarray already does, there would be no need for xarray. Still, netCDF4 has a useful function, num2date, which makes handling the date units easier. Roughly:

from netCDF4 import Dataset, num2date
import datetime
import numpy as np

ds = Dataset('file.nic', 'r')
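your_date = datetime.datetime(1990, 12, 1)
time_var = ds.variables['time']
# convert the stored numbers to date objects using the variable's own units
dates = num2date(time_var[:], time_var.units)
# np.argmax returns 0 when nothing matches, so your_date must be a time stamp that exists in the file
select_time = np.argmax(dates == your_date)
seaice_data = ds.variables['seaice_conc'][select_time, :, :]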

I admit it is still more code than xarray.
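
If the requested date is not an exact time stamp in the file, the equality test finds no match and np.argmax quietly returns index 0. One way around that is to turn the date into a "days since" number and take the closest entry. The following is only a sketch in plain Python: it assumes the units are "days since 2001-01-01 00:00:00" on the standard calendar, as stated in the question, and both nearest_time_index and sample_times are made-up names standing in for the real time variable.

```python
from datetime import datetime

# Assumption: units are "days since 2001-01-01 00:00:00" (taken from the question).
REFERENCE = datetime(2001, 1, 1)


def nearest_time_index(time_values, target):
    """Return the index of the entry in time_values closest to target."""
    # Express the target date in the same units as the stored time axis.
    target_days = (target - REFERENCE).total_seconds() / 86400.0
    # Pick the position with the smallest absolute distance to the target.
    return min(range(len(time_values)),
               key=lambda i: abs(time_values[i] - target_days))


# Hypothetical offsets standing in for ds.variables['time'][:];
# they correspond to 1990-11-15, 1990-12-15 and 1991-01-15.
sample_times = [-3700.0, -3670.0, -3639.0]
index = nearest_time_index(sample_times, datetime(1990, 12, 1))
```

With real data you would pass the values read from the time variable and use the returned index to slice seaice_conc along its first axis.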

0
votes

You can do what you are trying to do in Xarray.

For Question 1: it looks like your dates are all on the 15th of each month, so selecting a single time point should work like this:

ds1['seaice_conc'].sel(time='1990-12-15')

Another way you can do this is to use the method='nearest' keyword argument.

ds1['seaice_conc'].sel(time='1990-12-01', method='nearest')

Finally, you may consider reindexing your time axis to the first of each month.

ds1['seaice_conc'].resample(time='MS').mean('time').sel(time='1990-12-01')

As a bonus, you can select time slices with a similar approach:

ds1['seaice_conc'].sel(time=slice('1990-01-01', '1991-12-31'))

The Xarray documentation includes a section on datetime indexing

For Question 2: xarray automatically decodes time coordinates when you use open_dataset, converting the stored 'days since ...' numbers into datetime64 values. You can turn this off with decode_times=False, but that doesn't seem like what you want to do here.

This is also discussed in the Xarray documentation.
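
To make the decoding step concrete, here is a rough illustration of the arithmetic behind it, written with the standard library only. It is not xarray's actual implementation; it assumes the units "days since 2001-01-01 00:00:00" from the question, and decode_days and raw are hypothetical names with made-up sample values.

```python
from datetime import datetime, timedelta

# Assumption: the reference date comes from the units attribute in the question.
REFERENCE = datetime(2001, 1, 1)


def decode_days(offsets):
    """Turn 'days since REFERENCE' numbers into calendar dates."""
    return [REFERENCE + timedelta(days=d) for d in offsets]


# Hypothetical raw values as they might be stored in the file;
# negative numbers are dates before the reference date.
raw = [-3700.0, -3670.0, -3639.0]
decoded = decode_days(raw)
```

The file stores only the numbers plus the units string; the calendar dates you see in xarray are derived from those two pieces when the dataset is opened.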