
I have a netCDF file with data as a function of lon, lat and time. I would like to calculate the total number of missing entries in each grid cell, summed over the time dimension, preferably with CDO or NCO so that I do not need to invoke R, Python, etc.

I know how to get the total number of missing values in a variable:

ncap2 -s "nmiss=var.number_miss()" in.nc out.nc

as I noted in my answer to this related question: count number of missing values in netcdf file - R

and CDO can tell me the count summed over space (one value per time step) with

cdo info in.nc

but I can't work out how to sum over time instead. Is there, for example, a way of specifying the dimension to sum over with number_miss() in ncap2?


2 Answers


Even though you are asking for a non-Python solution, I would like to show you that Python finds the answer in one very short line. The variable m_data below is a masked array, which is what the netCDF4 package returns when it reads a variable with missing values. A single sum over the mask, with the time axis specified, gives you the answer.

import numpy as np
import matplotlib.pyplot as plt
import netCDF4 as nc4

# Generate random data for this experiment, dimensions (time, lat, lon).
data = np.random.rand(365, 64, 128)

# Masked data: this is how the netCDF4 package returns a variable with missing values.
# For this example, I mask all values less than 0.1.
m_data = np.ma.masked_array(data, mask=data < 0.1)

# It only takes one operation to find the answer: sum the mask over time (axis 0).
# np.ma.getmaskarray() always returns a full boolean array, whereas m_data.mask
# can be the scalar nomask when nothing is masked (and then axis=0 would fail).
n_values_missing = np.ma.getmaskarray(m_data).sum(axis=0)

# Just a plot of the result.
plt.figure()
plt.pcolormesh(n_values_missing)
plt.colorbar()
plt.xlabel('lon')
plt.ylabel('lat')
plt.show()

# Save a netCDF file of the results.
f = nc4.Dataset('test.nc', 'w', format='NETCDF4')
f.createDimension('lon', 128)
f.createDimension('lat', 64)
n_values_missing_nc = f.createVariable('n_values_missing', 'i4', ('lat', 'lon'))
n_values_missing_nc[:, :] = n_values_missing
f.close()
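The axis argument is what separates the two quantities in the question. Assuming the (time, lat, lon) order used above, summing the mask over axis 0 gives the per-cell count through time, while summing over axes (1, 2) gives one count per time step, which is what cdo info reports. Here is a small sketch with a hand-made mask so the counts are easy to check (the array shape and masked positions are illustrative only):

```python
import numpy as np

# Tiny illustrative array with dimensions (time, lat, lon) = (3, 2, 2).
data = np.arange(12, dtype=float).reshape(3, 2, 2)

# Mask a few hand-picked entries to stand in for missing values.
mask = np.zeros(data.shape, dtype=bool)
mask[0, 0, 0] = True
mask[1, 0, 0] = True
mask[2, 1, 1] = True
m_data = np.ma.masked_array(data, mask=mask)

# getmaskarray always returns a full boolean array of the data's shape.
full_mask = np.ma.getmaskarray(m_data)

# Per grid cell, summed over time: shape (lat, lon).
missing_per_cell = full_mask.sum(axis=0)

# Per time step, summed over space: shape (time,).
missing_per_step = full_mask.sum(axis=(1, 2))

print(missing_per_cell.tolist())  # [[2, 0], [0, 1]]
print(missing_per_step.tolist())  # [1, 1, 1]

# With no masked values, .mask is the scalar nomask, but getmaskarray
# still returns a full array, so the per-cell count is all zeros.
no_missing = np.ma.masked_array(data)
print(np.ma.getmaskarray(no_missing).sum(axis=0).tolist())  # [[0, 0], [0, 0]]
```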

We added the missing() method to ncap2 in NCO 4.6.7 (May 2017) to solve this problem elegantly. To count missing values through time:

ncap2 -s 'mss_val=three_dmn_var_dbl.missing().ttl($time)' in.nc out.nc

Here ncap2 chains two methods together: missing() flags each missing element, and ttl($time) then totals those flags over the time dimension. The resulting 2D variable mss_val is written to out.nc. The old answer below does something different: it totals over space and reports one count per time step (because I misinterpreted the OP).
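For anyone checking the result outside NCO, the counting done by missing().ttl($time) can be mirrored in NumPy on raw (unmasked) data: flag the entries equal to the fill value, then total the flags over time. This is a sketch of the idea, not NCO's implementation; the fill value -999.0 and the (time, lat, lon) layout are assumptions for illustration.

```python
import numpy as np

FILL = -999.0  # hypothetical _FillValue / missing_value for this example

# Raw values with dimensions (time, lat, lon) = (4, 2, 3).
raw = np.ones((4, 2, 3))
raw[0, 0, 0] = FILL
raw[1, 0, 0] = FILL
raw[3, 0, 0] = FILL
raw[2, 1, 2] = FILL

# Step 1: a 0/1 flag per element, 1 where the value is missing.
flags = (raw == FILL).astype(np.int32)

# Step 2: total the flags over the time dimension (axis 0).
mss_val = flags.sum(axis=0)

print(mss_val.tolist())  # [[3, 0, 0], [0, 0, 1]]
```

Exact equality works here because the fill value is stored verbatim. It would not work if NaN were used as the fill value, since NaN never compares equal to itself; np.isnan(raw) is the test in that case.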

Old/obsolete answer:

There are two ways to do this with NCO/ncap2, though neither is as elegant as I would like. Either assemble the answer one record at a time by calling number_miss() on each record, or (my preference) use a boolean comparison against the missing value, followed by the total operator along the axes of choice:

zender@aerosol:~$ ncap2 -O -s 'tmp=three_dmn_var_dbl;mss_val=tmp.get_miss();tmp.delete_miss();tmp_bool=(tmp==mss_val);tmp_bool_ttl=tmp_bool.ttl($lon,$lat);print(tmp_bool_ttl);' ~/nco/data/in.nc ~/foo.nc
tmp_bool_ttl[0]=0 
tmp_bool_ttl[1]=0 
tmp_bool_ttl[2]=0 
tmp_bool_ttl[3]=8 
tmp_bool_ttl[4]=0 
tmp_bool_ttl[5]=0 
tmp_bool_ttl[6]=0 
tmp_bool_ttl[7]=1 
tmp_bool_ttl[8]=0 
tmp_bool_ttl[9]=2
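The delete_miss() call above matters because the comparison against the missing value is itself subject to missing-value handling. Presumably ncap2, like NumPy masked arrays, keeps missing elements missing through the comparison, so they would be skipped by the total unless the attribute is removed first; treat that as an assumption about ncap2. The NumPy sketch below shows the analogous pitfall, with a hypothetical fill value and data, totalling over space per time step as ttl($lon,$lat) does.

```python
import numpy as np

FILL = -999.0  # hypothetical missing value

# Raw data with dimensions (time, lat, lon) = (3, 2, 2).
raw = np.zeros((3, 2, 2))
raw[1, 0, 1] = FILL
raw[1, 1, 0] = FILL
raw[2, 1, 1] = FILL

# Masked view: fill-value entries are hidden, as with a missing_value attribute.
masked = np.ma.masked_equal(raw, FILL)

# Comparing the masked array leaves missing entries masked, so the total skips them.
wrong = (masked == FILL).sum(axis=(1, 2))

# Comparing the raw data (the analogue of delete_miss()) counts them as expected.
right = (raw == FILL).sum(axis=(1, 2))

print(wrong.tolist())  # [0, 0, 0]
print(right.tolist())  # [0, 2, 1]
```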

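The second way loops over the records (note that it uses a different variable, three_dmn_var_int, so its counts differ from those above):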

zender@aerosol:~$ ncap2 -O -s 'for(rec=0;rec<time.size();rec++){nmiss=three_dmn_var_int(rec,:,:).number_miss();print(nmiss);}' ~/nco/data/in.nc ~/foo.nc
nmiss = 0 

nmiss = 0 

nmiss = 8 

nmiss = 0 

nmiss = 0 

nmiss = 1 

nmiss = 0 

nmiss = 2 

nmiss = 1 

nmiss = 2
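Both old approaches produce one count per record. As a sanity check on that equivalence, here is a NumPy sketch with hypothetical integer data and fill value: a per-record loop, analogous to calling number_miss() on three_dmn_var_int(rec,:,:), matches a single vectorized total over the spatial axes.

```python
import numpy as np

FILL = -99  # hypothetical integer missing value

# Integer data with dimensions (time, lat, lon) = (5, 2, 2).
raw = np.zeros((5, 2, 2), dtype=np.int32)
raw[2, :, :] = FILL  # every cell missing at record 2
raw[4, 0, 1] = FILL  # one cell missing at record 4

# Loop over records, counting missing values in each 2D slice.
loop_counts = [int(np.count_nonzero(raw[rec] == FILL)) for rec in range(raw.shape[0])]

# Vectorized: flag missing values, then total over lat and lon in one call.
vector_counts = (raw == FILL).sum(axis=(1, 2)).tolist()

print(loop_counts)    # [0, 0, 4, 0, 1]
print(vector_counts)  # [0, 0, 4, 0, 1]
```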