I have faced several challenges when I am dealing with very large RasterStack
object in R. Here is the main story, I have downloaded gridded data from European Climate Assessment website(download site of gridded data and download link of gridded data that I am interested in). So my very first step was importing this data in R as RasterStack
object. Then I intended to crop raster grid of only particular countries, so I used raster::crop
to do that. My ultimate goal is to calculate the annual average temperature for each grid cell. Here is the grid coverage that I cropped from original raw RasterStack
object where grid resolution is defined as 0.25-degree
resolution:
Here is the R script that I took a shot:
library(raster)
library(ncdf4)
library(R.utils)
library(maptools)
raw_netCDF = raster::stack("~/tg_0.25deg_reg_1995-2010_v17.0.nc") # read downloaded gridded data in R
data(wrld_simpl)
Germany <- wrld_simpl[wrld_simpl@data$NAME == "Germany",]
deu_ext <- extent(Germany)
Germany_ <- crop(raw_netCDF, deu_ext)
but above cropped solution Germany_
raised a challenge. The first challenge is treating missing values in large RasterStack
object. If I didn't treat missing values in large RasterStack
object, in newly produced cropped raster grid, all missing values were turned into zero, which cause confusion to read temperature observation such as zero Celsius degree. So I treated missing values in large RasterStack
object in two different way. The first one is down below:
raw_netCDF_ = raster::reclassify(raw_netCDF , cbind(NA, -999))
but raster::reclassify
always failed because of memory problem. so it is not good solution. I tried raster::calc
to treat missing values in very large RasterStack
object but it is extremely slow even I run the same operation in a powerful computer. So using raster::calc
to treat missing values is really not a good idea. Here is R script down below
raw_netCDF_ = raster::calc(raw_netCDF , function(x) { ifelse(is.na(x), -999, x) })
I want to do simple statistics, to calculate an annual mean temperature for each grid cell for whole grid coverage above, and produce its output in clean and simple plaintext data. In final raster grid data in plain text only contain grid coordinate and it's annual mean temperature. Doing such operation for RasterStack
object is not an ordinary task for me.
Perhaps, there must be a possible optimal solution to properly manipulate very large RasterStack
object and make sure all missing values in original raw data can be correctly preserved in the cropped raster grid of Germany.
Desired output:
In exported plain text data, I want to have annual mean Temp
for whole Germany grid for 16 years something like this:
> ann_mean_temp_1996_1999
long lat net_1996_precip net_1997_temp net_1997_temp net_1998_temp net_1999_temp net_2000_temp
1: 6.125 47.375 84.4 86.4 83.4 81.4 80.4 87.4
2: 6.375 47.375 89.3 88.3 84.3 81.3 846.3 846.3
3: 6.625 47.375 80.0 85.0 80.0 83.0 88.0 87.0
4: 6.875 47.375 84.4 83.4 85.4 86.4 82.4 80.4
5: 7.125 47.375 83.0 85.0 84.0 89.0 83.0 84.0
---
1112: 13.875 54.875 63.8 68.8 66.8 67.8 65.8 66.8
1113: 14.125 54.875 69.6 65.6 61.6 60.6 62.6 63.6
1114: 14.375 54.875 60.5 61.5 62.5 67.5 69.5 64.5
1115: 14.625 54.875 62.9 67.9 68.9 67.9 64.9 68.9
1116: 14.875 54.875 64.6 67.6 66.6 62.8 64.6 63.5
If manipulating very large RasterStack
object in R is possible, how can I obtain expected raster grid data with correct resolution (missing values would be properly treated) and apply simple statistics for all daily temperature observation for each grid? How can I make this happen? Is that possible to manipulate RasterStack
object and write all raster grid data in plain text data (ASCII
or csv
) in R? Any efficient way to get this task done? Any more thoughts? Thanks