Why are NaNs introduced into dimension variable when loading a netcdf file into xarray dataset

Question

I am new to xarray so I would like to know if I am doing something wrong.

I have a netCDF file containing three groups (A, B, C). Each group contains several variables that have only a time dimension, along with a corresponding 'time_dimension' variable whose values are Unix timestamps.

In the example below I open the netCDF file directly and print the min and max of the timestamps for each group. This gives me the expected range of timestamps.

I then load each group from the netCDF file into an xarray dataset using open_dataset. For these datasets I again print the min and max of the time dimension coordinate. The min values match those obtained by reading the netCDF file directly, but the max values contain NaNs for two of the groups (A and B).

Although I don't show it in the code, the NaN values are all located at the end of the xarray variable's values array. Group A contained 4 NaN values, while group B contained quite a few more. Also note that each netCDF variable has the same size as the corresponding xarray variable in each group.

Does anyone know why NaN values are being introduced into my time dimension coordinates when they are imported into xarray from netCDF?

This is the code I used to demonstrate the problem:

import xarray as XR
from netCDF4 import Dataset

Filename = r'C:\temp\My_data.nc'

#-------------- load netcdf data directly -----------

print('netcdf')  

root = Dataset(Filename,'r',format='NETCDF4')
grp = root.groups['A']
dt = grp.variables['time_dimension'][:]
print('group A: ',min(dt), max(dt))

grp = root.groups['B']
dt = grp.variables['time_dimension'][:]
print('group B: ',min(dt), max(dt))

grp = root.groups['C']
dt = grp.variables['time_dimension'][:]
print('group C: ',min(dt), max(dt))

root.close()

print('   ')
print('   ')

#-------------- load netcdf data via xarray -----------

print('xarray loaded from netcdf')

ax = XR.open_dataset(Filename, group='A', decode_times=False)
dt = ax['time_dimension'].values
print('group A: ', min(dt), max(dt))
ax.close()

ax = XR.open_dataset(Filename, group='B', decode_times=False)
dt = ax['time_dimension'].values
print('group B: ', min(dt), max(dt))
ax.close()

ax = XR.open_dataset(Filename, group='C', decode_times=False)
dt = ax['time_dimension'].values
print('group C: ', min(dt), max(dt))
ax.close()

This is the output of the above code:

netcdf
group A:  1417532400.0 1480406400.0
group B:  1392129000.0 1439217000.0
group C:  1432913400.0 1436888700.0


xarray loaded from netcdf
group A:  1417532400.0 9.96920996839e+36
group B:  1392129000.0 9.96920996839e+36
group C:  1432913400.0 1436888700.0

RJCL RJCL · Accepted Answer · 2017-05-04T16:15:20

It appears that the problem was caused by not specifying a fill_value or missing_value when the netCDF file variables were created from masked numpy arrays.

This appears to have allowed the masked values to be passed through to the xarray dataset unmasked.

Setting a fill_value when the netCDF file variable was created solved this problem.
