So I have 3 netCDF4 files (each approximately 90 MB) that I would like to concatenate using the xarray package. Each file has one variable (dis) on a 0.5 degree grid (lat, lon) for 365 days (time). My aim is to concatenate the three files into a single time series of 1095 days (3 years).
Each file (for years 2007, 2008, 2009) has one variable (dis) and three coordinates (time, lat, lon), as follows:
<xarray.Dataset>
Dimensions: (lat: 360, lon: 720, time: 365)
Coordinates:
* lon (lon) float32 -179.75 -179.25 -178.75 -178.25 -177.75 -177.25 ...
* lat (lat) float32 89.75 89.25 88.75 88.25 87.75 87.25 86.75 86.25 ...
* time (time) datetime64[ns] 2007-01-01 2007-01-02 2007-01-03 ...
Data variables:
dis (time, lat, lon) float64 nan nan nan nan nan nan nan nan nan ...
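(For reference, the summary above is just what printing one opened file shows; the path below is a hypothetical stand-in for my real filenames.)

import xarray as xr

# hypothetical path; my real filenames are stored in filestrF
ds_2007 = xr.open_dataset('dis_2007.nc')
print(ds_2007)  # prints the Dataset summary shown above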
I open the files and use the concat function to join them along time, I think successfully. The list comprehension here reads the 3 netCDF filenames out of filestrF:
flist1 = [1, 2, 3]
# filestrF holds the netCDF filenames; indexing [0, 1, 1, f] selects one file per year
ds_new = xr.concat([xr.open_dataset(filestrF[0, 1, 1, f]) for f in flist1], dim='time')
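As an aside, recent xarray versions can do the open-and-concatenate step in one call with open_mfdataset; a minimal sketch, assuming the three filenames are gathered into a plain list (the paths here are hypothetical):

# hypothetical list of the three yearly files
paths = ['dis_2007.nc', 'dis_2008.nc', 'dis_2009.nc']
ds_new = xr.open_mfdataset(paths, combine='nested', concat_dim='time')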
Details of the resulting dataset now show:
Dimensions: (lat: 360, lon: 720, time: 1095)
That seems fine to me. However, when I write this dataset back out to netCDF, the file size explodes, with one year of data now seemingly taking about 700 MB.
ds_new.to_netcdf('saved_on_disk1.nc')
- For 2 concatenated files: ~1.5 GB
- For 3 concatenated files: ~2.2 GB
- For 4 concatenated files: ~2.9 GB
I would have expected roughly 3 x 90 MB = 270 MB, since we are only scaling (3x) along one dimension (time); the variable dis and the other dimensions, lat and lon, stay the same size.
Any ideas what causes the huge increase in size? I have tested reading in a single file and writing it back out without concatenation, and the size does not increase.
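For what it's worth, 360 x 720 x 365 float64 values is about 757 MB per year, which is suspiciously close to the ~700 MB per year I'm seeing, so I wonder whether the output is simply being written uncompressed while the source files are compressed. Here is how one could check the per-variable encoding xarray picks up on read and force compression on write (the zlib/complevel settings below are just illustrative):

# encoding xarray picked up from one of the source files
ds_one = xr.open_dataset(filestrF[0, 1, 1, 1])
print(ds_one['dis'].encoding)   # look for zlib/complevel/dtype entries

# did the concatenated dataset keep that encoding?
print(ds_new['dis'].encoding)

# write with explicit compression, in case the encoding was dropped
ds_new.to_netcdf('saved_on_disk1.nc', encoding={'dis': {'zlib': True, 'complevel': 4}})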