0
votes

I am trying to compute the overall growth rate of each NUTS2 region (column NUTS_CODE) for the years 2000-2006 (REF_YEAR).

My dataset looks like this:

  NUTS_CODE NUTS_LEVEL SCENARIO_ID REF_YEAR IND_VALUE NUTS_C
  <chr>          <dbl>       <dbl>    <dbl>     <dbl> <chr> 
1 BE10               2           1     2000     49434 BE    
2 BE21               2           1     2000     29019 BE    
3 BE22               2           1     2000     20646 BE    
4 BE23               2           1     2000     21155 BE    
5 BE24               2           1     2000     24963 BE    
6 BE25               2           1     2000     22912 BE  

So I am trying to compute something like

(BE10(which(REF_YEAR == 2006 - BE10(which(REF_YEAR==2000))/BE10(which(REF_YEAR==2000)

(this is not my actual code - I just want to explain what I want to achieve) and this Needs to be done for each and every NUTS_CODE.

I have tried to achieve this by using both a for loop as well as the dplyr function, but somehow it does not work.

library(dplyr)

data$growth<-NA

for(i in 1:nrow(data))
{
  if((data%>%filter(NUTS_CODE == data$NUTS_CODE[i] &
                   SCENARIO_ID == data$SCENARIO_ID[i] &
                   REF_YEAR == (data$REF_YEAR[i]-1)
                   )%>%nrow()
      ) == 0
  )
  {
    data$growth[i]<-0
  } else {
    data$growth[i]<-(((data$IND_VALUE[i]-
                         (data%>%filter(NUTS_CODE == data$NUTS_CODE[i] &
                                          SCENARIO_ID == data$SCENARIO_ID[i] &
                                          REF_YEAR == (data$REF_YEAR[i]==2006) 
                         )
                         )[,"IND_VALUE"]
    )/
      (
        (data%>%filter(NUTS_CODE == data$NUTS_CODE[i] &
                         SCENARIO_ID == data$SCENARIO_ID[i] &
                         REF_YEAR == (data$REF_YEAR[i]==2000) 
        )
        )[,"IND_VALUE"]
      )
    )
    *100)
  }
  print(paste("",i,sep = " "))
}

I do not get an error or a warning, but in data$growth I get a full column of numeric(0) instead of actual values.

Help is appreciated!

2
Get rid of data$ in the pipe, the data set is already known since its beginning. I also believe that the code can be made (much) simpler, there is no need for a for loop and to get all growth rates at the same time,group_by/mutate seems to be more natural than filter.Rui Barradas
To get it right, you want to compute the overall growth rate of IND_VALUE for each NUTS_CODE ?Lisardo Erman
Remember: In 99% of the cases (especially when it comes to data processing) there is no need for for loops as they are likely to be inefficient.Peter Dieter

2 Answers

1
votes

You can do something like the following. This computes the growth rates for all the years, not just 2006, with base year == 2000.

library(tidyverse)

result <- data %>%
  arrange(NUTS_CODE, SCENARIO_ID, REF_YEAR) %>%
  group_by(NUTS_CODE, SCENARIO_ID) %>% 
  mutate(base_rate = first(IND_VALUE),
         growth_rate = (IND_VALUE - base_rate)/base_rate) %>%
  select(-base_rate)

head(result)
## A tibble: 6 x 5
## Groups:   NUTS_CODE, SCENARIO_ID [1]
#  NUTS_CODE SCENARIO_ID REF_YEAR IND_VALUE growth_rate
#  <fct>           <int>    <int>     <int>       <dbl>
#1 BE10                1     2000     27451      0     
#2 BE10                1     2001     38529      0.404 
#3 BE10                1     2002     34339      0.251 
#4 BE10                1     2003     33188      0.209 
#5 BE10                1     2004     24835     -0.0953
#6 BE10                1     2005     20901     -0.239 

If you only want year 2006, you can now filter those years.

result %>% filter(REF_YEAR == 2006)
## A tibble: 12 x 5
## Groups:   NUTS_CODE, SCENARIO_ID [12]
#   NUTS_CODE SCENARIO_ID REF_YEAR IND_VALUE growth_rate
#   <fct>           <int>    <int>     <int>       <dbl>
# 1 BE10                1     2006     21257     -0.226 
# 2 BE10                2     2006     41013      0.460 
# 3 BE10                3     2006     29346      0.423 
# 4 BE10                4     2006     46121      1.20  
# 5 BE21                1     2006     45183      0.613 
# 6 BE21                2     2006     31042     -0.289 
# 7 BE21                3     2006     22901     -0.350 
# 8 BE21                4     2006     45526      0.393 
# 9 BE22                1     2006     24590     -0.0947
#10 BE22                2     2006     47944      0.642 
#11 BE22                3     2006     32856      0.0639
#12 BE22                4     2006     40373      0.759 

Test data creation code.

NUTS_CODE <- sprintf("BE%02d", c(10, 21:22))
SCENARIO_ID <- 1:4
REF_YEAR <- 2000:2006

data <- expand.grid(NUTS_CODE, SCENARIO_ID, REF_YEAR)
names(data) <- c("NUTS_CODE", "SCENARIO_ID", "REF_YEAR")

set.seed(1234)
data$IND_VALUE <- sample(20000:50000, nrow(data), TRUE)
0
votes
library(dplyr)

data %>%
  group_by(NUTS_CODE) %>% 
  summarise(growth_rate = (IND_VALUE[REF_YEAR == 2006] -IND_VALUE[REF_YEAR == 2000]) / IND_VALUE[REF_YEAR == 2000] )