
I have raster gridded data of Germany's historical daily temperature observations (15 years of historical daily mean temperatures) in a large RasterBrick object. Here is what my raster gridded data looks like:

> Temperature_rasterData
class       : RasterBrick 
dimensions  : 31, 37, 1147, 5479  (nrow, ncol, ncell, nlayers)
resolution  : 0.25, 0.25  (x, y)
extent      : 5.75, 15, 47.25, 55  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 
data source : in memory
names       : X1980.01.01, X1980.01.02, X1980.01.03, X1980.01.04, X1980.01.05, X1980.01.06, X1980.01.07, X1980.01.08, X1980.01.09, X1980.01.10, X1980.01.11, X1980.01.12, X1980.01.13, X1980.01.14, X1980.01.15, ... 
min values  :       -9.24,      -11.32,      -12.05,      -14.12,       -7.91,       -6.35,       -6.74,       -7.77,       -9.79,      -10.17,      -12.20,      -14.90,      -15.68,      -15.61,      -15.22, ... 
max values  :        2.19,        0.68,        0.30,        2.91,        5.25,        5.03,        4.33,        3.40,        1.52,        0.33,       -1.10,       -1.61,       -3.55,       -0.12,        0.19, ... 

However, I intend to discretize the annual distribution of daily temperature into a fixed set of temperature bins (10 bins in total for each year); the method is described in detail in Temperature Effects on Productivity and Factor Reallocation. To do so, I need to find the maximum and minimum temperature values across all layers of the raster gridded data, because I need to divide the annual distribution of daily temperature in each grid cell according to the overall MAX/MIN temperature values.

Unfortunately, I am not able to reproduce this multi-layer RasterBrick here because the original raster gridded data is quite big and hard to shrink into a small reproducible raster; I hope the SO community will understand the situation. Here is smaller raster data for reproducible use: please give the smallest example raster data a try. And here is my R script to process the downloaded raster data:

library(raster)
library(maptools)  # provides the wrld_simpl world map

temp_raster <- raster::stack('~/tg_day_2017_grid_ensmean.nc')
data(wrld_simpl)
Germany <- wrld_simpl[wrld_simpl@data$NAME == "Germany",]  # subset the Germany polygon
deu_ext <- extent(Germany)
Deu_crop <- crop(temp_raster, deu_ext)  # crop the raster to Germany's extent

To get the temperature range from these multi-layer raster data, I tried the following; it is not clever, and I need a more simplified solution. Here is my attempt in R:

nms <- names(Deu_crop)
yrs <- unique(sub('X(\\d+).+', '\\1', nms))  # pull the year out of each layer name

# per-year temperature range over all layers belonging to that year
getRange <- lapply(yrs, function(x) {
    range(Deu_crop[[grep(x, nms)]], na.rm = TRUE)
})
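
If only the overall range across all layers is needed, raster::cellStats might already be a simpler route; a minimal sketch:

overall_min <- min(cellStats(Deu_crop, 'min'))  # smallest of the per-layer minima
overall_max <- max(cellStats(Deu_crop, 'max'))  # largest of the per-layer maxima
breaks <- seq(overall_min, overall_max, length.out = 11)  # 11 breaks define 10 bins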

I really don't know how to discretize the data in a large RasterBrick object. In particular, it is not clear to me how to manipulate the raster data for discretization, because this raster has many layers holding a huge number of daily mean temperature observations. How can I make this happen in R? Is it doable to manipulate multi-layer raster data for discretization? Any ideas?

If there is an easier way to manipulate large raster data, how can I discretize the annual distribution of daily temperature and make a bar plot for each year? What is the easiest way to get this done in R? Thanks in advance!

Here is the kind of bar plot that I want to make from the multi-layer raster data:

[bar plot: number of days in each temperature bin, per year]

Update:

I am going to discretize the annual distribution of daily temperature observations for each year within each of Germany's regions (i.e., polygons); here are Germany's NUTS regions: Germany's shapefile.


1 Answer


Here's a solution (including a reproducible example):

library(raster)
library(lubridate)
library(tidyverse)

# creating some fake temperature data which matches your rasterstack

# create template raster
r <- raster(xmn=5.75, xmx= 15, ymn = 47.25, ymx =55,res=c(0.25,0.25))

# add fake temperature values
Deu_crop <- do.call(stack,lapply(1:5479,function(i) setValues(r,round(runif(n = ncell(r),min = -10,max = 25)))))

# add layer names
names(Deu_crop) <- paste0('X',gsub('-','.',ymd('1980.01.01') + days(1:5479)))

# check rasterstack

Deu_crop

# output
#
# class       : RasterStack 
# dimensions  : 31, 37, 1147, 5479  (nrow, ncol, ncell, nlayers)
# resolution  : 0.25, 0.25  (x, y)
# extent      : 5.75, 15, 47.25, 55  (xmin, xmax, ymin, ymax)
# coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 
# names       : X1980.01.02, X1980.01.03, X1980.01.04, X1980.01.05, X1980.01.06, X1980.01.07, ...
# min values  :         -10,         -10,         -10,         -10,         -10,         -10, ...
# max values  :          25,          25,          25,          25,          25,          25, ...

So Deu_crop should be comparable to your data in terms of structure, just with random temperature values.

The shapefile isn't easily reproducible, so I downloaded yours and used it. As I already mentioned, some polygons are a bit small for the extraction.

The fastest way would be to rasterize the shapefile to match your data raster, but some polygons would not be converted at all and others would probably be assigned to the wrong cells. So in this case it is better to use raster::extract directly with the shapefile, even though it is a bit slow. If you only need to do it a couple of times it is bearable - grab a coffee in the meantime.
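
For comparison, the rasterize route would look roughly like the sketch below (assuming shp as loaded in the next block); small polygons may end up with no cell at all, which is exactly why extract is used instead:

# rough sketch of the faster but lossier alternative
zones <- rasterize(shp, Deu_crop[[1]], field = 1:nrow(shp))  # one integer id per polygon
zonal_means <- zonal(Deu_crop, zones, fun = 'mean')          # spatial mean per zone and layer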

shp <- shapefile('eurostat_NUTS3_29-May-18/deu_adm_2006.shp')

# coffee time
e <- extract(Deu_crop,shp)

# add NUTS_ID as names to list 

names(e) <- shp$NUTS_ID

To calculate the number of days per year per bin, I create a function that uses tidyverse functionality, and I use lapply to iterate over the whole list of extractions (one list item corresponds to one polygon):

# define bins

bins <- seq(-10,25,length.out = 5)

myfun <- function(ix){
  gather(data.frame(e[[ix]], stringsAsFactors = F), 'colname', 'temp') %>% 
    group_by(colname) %>% summarise(temp = mean(temp)) %>% ungroup() %>% # spatial mean
    mutate(year = sub('X(\\d{4}).+', '\\1', colname)) %>% # get years
    select(-colname) %>% # drop colname column
    mutate(bin1 = (temp <= bins[1]) * 1) %>% # bin1
    mutate(bin2 = (temp > bins[1] & temp <= bins[2]) * 1) %>% # bin2
    mutate(bin3 = (temp > bins[2] & temp <= bins[3]) * 1) %>% # bin3
    mutate(bin4 = (temp > bins[3] & temp <= bins[4]) * 1) %>% # bin4
    mutate(bin5 = (temp > bins[4] & temp <= bins[5]) * 1) %>% # bin5
    mutate(bin6 = (temp > bins[5]) * 1) %>% select(-temp) %>% # bin6
    group_by(year) %>% summarise_all(funs(sum)) %>% # count days per bin and year
    mutate(NUTS_ID = names(e)[ix]) # add NUTS_ID
}

# create single dataframe

result <- do.call(rbind,lapply(1:length(e),function(ix) myfun(ix)))

A quick look at the result variable:

result

# output:
#
# # A tibble: 6,864 x 8
# year  bin1  bin2  bin3  bin4  bin5  bin6 NUTS_ID
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>   <chr>
# 1  1980    12    85    91    92    85     0   DEA54
# 2  1981     3    64    99   113    86     0   DEA54
# 3  1982     3    80   113    86    83     0   DEA54
# 4  1983     6    84    90    85   100     0   DEA54
# 5  1984     8    90    92    86    90     0   DEA54
# 6  1985     5    86    85    95    94     0   DEA54
# 7  1986     6    74    97   108    80     0   DEA54
# 8  1987     4    82    99    94    86     0   DEA54
# 9  1988     3    89    87    91    96     0   DEA54
#10  1989     8   103    92    73    89     0   DEA54
# # ... with 6,854 more rows
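
The question also asked for a bar plot per year; a minimal sketch with ggplot2 (already loaded via tidyverse), picking one region from result:

# days per bin and year for a single NUTS region
result %>%
  filter(NUTS_ID == 'DEA54') %>%
  gather('bin', 'days', starts_with('bin')) %>%
  ggplot(aes(x = year, y = days, fill = bin)) +
  geom_col(position = 'dodge') +
  labs(x = 'Year', y = 'Days per bin')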

Update:

To handle the bins requested in the question, I first calculate the bin breaks from the minimum and maximum of the entire data set, and then use a new function, createBins, to apply them to each polygon's extract. This replaces the myfun part of my original solution.

# new function

createBins <- function(df, bins_mat){

  for (i in 1:nrow(bins_mat)){

    # build a column name like 'Bin1;-12.3;-8.7' from bin number, lower and upper bound
    bin <- sprintf('Bin%s;%s;%s', bins_mat[i,1], bins_mat[i,2], bins_mat[i,3])

    # the first bin includes its lower bound; all later bins are left-open
    if (i == 1) df <- df %>% mutate(!!bin := (temp >= bins_mat[i,2] & temp <= bins_mat[i,3]) * 1)
    else df <- df %>% mutate(!!bin := (temp > bins_mat[i,2] & temp <= bins_mat[i,3]) * 1)
  }
  return(df)
}
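
As a quick sanity check, here is createBins applied to a toy data frame with made-up bounds:

toy <- data.frame(temp = c(-5, 5))
toy_bins <- cbind(1:2, c(-10, 0), c(0, 10))  # two bins: [-10,0] and (0,10]
createBins(toy, toy_bins)
#   temp Bin1;-10;0 Bin2;0;10
# 1   -5          1         0
# 2    5          0         1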

# new version of myfun

myfun2 <- function(ix, bins_mat){
  gather(data.frame(e[[ix]], stringsAsFactors = F), 'colname', 'temp') %>% 
    group_by(colname) %>% summarise(temp = mean(temp)) %>% ungroup() %>% # spatial mean
    mutate(year = sub('X(\\d{4}).+', '\\1', colname)) %>% # get years
    select(-colname) %>% # drop colname column
    createBins(., bins_mat) %>% select(-temp) %>%
    group_by(year) %>% summarise_all(funs(sum)) %>% mutate(NUTS_ID = names(e)[ix])
}


# 11 break values spanning the overall data range, defining 10 interval bins

bins <- seq(min(cellStats(Deu_crop,'min')), max(cellStats(Deu_crop,'max')), length.out = 11)

# create a bin matrix (number, bin_minimum, bin_maximum) for later function

bins_mat <- cbind(1:10,bins[1:10],bins[2:11])

# create new result

result <- do.call(rbind, lapply(1:length(e), function(ix) myfun2(ix, bins_mat)))