
I would appreciate help with xarray and opening multiple netCDF files. I have several .nc files with lat, lon, and time as dimensions. Each file holds monthly atmospheric temperature from 1850-01 to 2100-12 for a different climate model. I want to combine the files to plot a multi-model time series and look at the multi-model mean. Opening each file individually with xr.open_dataset works, but xr.open_mfdataset does not. These are the files I want to open:

tas_Amon_CNRM-ESM2-1_hist_ssp119_r1i1p1f2_gr_185001-210012.nc

tas_Amon_CanESM5_hist_ssp119_r10i1p1f1_gn_185001-210012.nc

tas_Amon_EC-Earth3-Veg-LR_hist_ssp119_r1i1p1f1_gr_185001-210012.nc

tas_Amon_EC-Earth3_hist_ssp119_r4i1p1f1_gr_185001-210012.nc

tas_Amon_GFDL-ESM4_hist_ssp119_r1i1p1f1_gr1_185001-210012.nc

tas_Amon_GISS-E2-1-G_hist_ssp119_r1i1p1f2_gn_185001-210012.nc

tas_Amon_IPSL-CM6A-LR_hist_ssp119_r1i1p1f1_gr_185001-210012.nc

and so on...

I applied

import xarray as xr

xr.open_mfdataset('path/to/file/*.nc', concat_dim='time')

My first error was about the files having different calendars: the time coordinate is datetime64 in some files and object dtype in others (that is, some files store proper datetimes and others do not). After a lot of reading about that problem and trying several arguments to xr.open_mfdataset (preprocess=..., combine, concat_dim, etc.), I still could not convert the time coordinate into the same format everywhere. I then found that ncks -A -v time file1.nc file2.nc overwrites the time coordinate of file2 with the time coordinate of file1. Applying xr.open_mfdataset with concat_dim='time' to these adapted files gives a new error: "Every dimension needs a coordinate for inferring concatenation order". Now I am wondering whether I have to bring the files onto the same grid before I can open them with xr.open_mfdataset.

I also tried opening only two files that share the same time coordinate (datetime64), which gives the following error: ValueError: Could not interpret 'tas_Amon_EC-Earth3_hist_ssp119_r4i1p1f1_gr_185001-210012.' as a number

I have been unable to get rid of these errors for several days. Does anyone know why they occur, or has anyone run into the same errors before?


1 Answer


You are aiming to merge multiple CMIP6 model outputs and combine them into an ensemble mean. I am not sure how to solve this using xarray, but this can be done in a few lines with my nctoolkit package, which uses CDO as a backend (read about the package here).

Based on the file names you have supplied, my guess is that you are working with more or less the raw data, with the historical and ssp data combined. This means the grids will differ between models, so you will probably need to do the analysis in two steps. First, regrid everything to a common grid of your choosing; below I simply use the grid of the first file in the ensemble. Second, calculate the ensemble mean. The following should work without problems.

import nctoolkit as nc
ds = nc.open_data('path/to/file/*.nc')
ds.regrid(ds[0], method = "nn")
# Assuming some of the files have more than one variable, you might need to do this
ds.select(variable = "tas")
ds.ensemble_mean()
ds.plot()
# if you want to convert from an nctoolkit dataset to an xarray dataset:
ds_xr = ds.to_xarray()
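
If only the global-mean time series is needed, an xarray-only route might avoid regridding altogether: reduce each model to an area-weighted global mean first, then stack the resulting 1-D series along a new model dimension. The following is a rough sketch, not a tested recipe for these particular files. It assumes each file has a variable called tas with dimensions time, lat, lon, that the time steps are consecutive months starting in 1850-01 (so the time coordinate can simply be replaced by a common one), and that the model name is the third underscore-separated field of the file name. The helper names are made up for this illustration.

```python
from pathlib import Path

import numpy as np
import pandas as pd
import xarray as xr


def model_name(path):
    """Hypothetical helper: take the model name from a CMIP6-style file name,
    assuming the layout tas_Amon_<model>_... shown in the question."""
    return Path(path).name.split("_")[2]


def to_common_months(ds, start="1850-01"):
    """Replace the time coordinate with month-start datetime64 stamps.

    Assumes the dataset holds consecutive monthly steps beginning at `start`,
    whatever calendar or dtype the original time coordinate used.
    """
    months = pd.date_range(start, periods=ds.sizes["time"], freq="MS")
    return ds.assign_coords(time=months)


def global_mean(ds, var="tas"):
    """Area-weighted mean over lat and lon, using cos(latitude) weights."""
    weights = np.cos(np.deg2rad(ds["lat"]))
    return ds[var].weighted(weights).mean(("lat", "lon"))


def multi_model_series(datasets):
    """Stack per-model global means along a new 'model' dimension.

    `datasets` maps a model name to an already opened xarray Dataset.
    The grids may differ, because lat and lon are averaged away first.
    """
    names = list(datasets)
    series = [global_mean(to_common_months(datasets[n])) for n in names]
    return xr.concat(series, dim=pd.Index(names, name="model"))
```

With datasets built as a dict such as {model_name(p): xr.open_dataset(p) for p in paths}, multi_model_series(datasets) would give one series per model, and calling .mean("model") on the result would give the multi-model mean. Maps of the ensemble mean would still need a common grid, as in the nctoolkit example above.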