
I am trying to compute growth rates for some variables in an unbalanced panel data set, but I still get results for years in which the lagged observation does not exist.

I've been trying to compute the growth rates with the dplyr library, as shown here:

total_firmas_growth <- total_firmas %>% 
  group_by(firma) %>% 
  arrange(anio, .by_group = T) %>% mutate(
    ing_real_growth = (((ingresos_real_2/Lag(ingresos_real_2))-1)*100)
)

For instance, if a firm has a value for "ingresos_real_2" in 2008 and its next value is in 2012, the code calculates a growth rate instead of returning NA, even though the preceding year is missing (i.e. 2011 is needed to calculate the 2012 growth rate). You can see this in the example for "firma" 115 (id) right below, where the 2012 growth rate should be NA:

total_firmas_growth
    firma   anio   ingresos_real_2   ing_real_growth
1     110   2005             14000                NA
2     110   2006             15000              7.14
3     110   2007             13000             -13.3
4     115   2008             15000                NA
5     115   2012             13000                NA
6     115   2013             14000              7.69

I would really appreciate your help.

Thank you Mr Flick, I just posted an edited version of my question. I would really appreciate your help. (jairo stiven jimenez montoya)

1 Answer


The easiest way to get NAs for the missing years is to pad your original table with a tibble containing every combination of the grouping column and the years. expand() creates that all-by-all tibble from the variables you are interested in, and joining it back to the data adds a row, with NA in the value columns, for each firma-anio pair that was not observed. Note that expand() only uses years that appear somewhere in the data, so a year that is missing for every firm is not added. Since any arithmetic operation that includes an NA results in an NA, this should get you what you're after if you run your group_by, arrange, mutate code after it.

library(dplyr)
library(tidyr)

total_firmas %>% 
  # right_join keeps every firma-anio combination, including unobserved ones
  right_join(
    expand(., firma, anio),
    by = c("firma", "anio")
  )
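
For reference, here is a minimal end-to-end sketch on the sample data from the question. It is one possible way to do it, not the only one. It assumes dplyr and tidyr are installed, uses dplyr's lag() rather than the Lag() from the question, and pads the panel with tidyr's complete() plus full_seq(), a swapped-in shortcut for the expand-and-join step that also adds years no firm reports (2009 to 2011 here). The final filter and the alternative check at the end are illustrative additions.

```r
library(dplyr)
library(tidyr)

# Sample data copied from the question: firm 115 has no rows for 2009-2011.
total_firmas <- tibble(
  firma = c(110, 110, 110, 115, 115, 115),
  anio = c(2005, 2006, 2007, 2008, 2012, 2013),
  ingresos_real_2 = c(14000, 15000, 13000, 15000, 13000, 14000)
)

total_firmas_growth <- total_firmas %>%
  # Pad to every firm x every year between the first and last year in the data;
  # padded rows get NA in ingresos_real_2.
  complete(firma, anio = full_seq(anio, 1)) %>%
  group_by(firma) %>%
  arrange(anio, .by_group = TRUE) %>%
  # NA in the previous year propagates, so growth across a gap is NA.
  mutate(ing_real_growth = (ingresos_real_2 / lag(ingresos_real_2) - 1) * 100) %>%
  ungroup() %>%
  # Drop the padded rows again (assumes the original data has no NA values
  # in ingresos_real_2 that should be kept).
  filter(!is.na(ingresos_real_2))

# Alternative without padding: only keep the growth rate when the previous
# row of the same firm is exactly one year earlier.
total_firmas_growth_alt <- total_firmas %>%
  group_by(firma) %>%
  arrange(anio, .by_group = TRUE) %>%
  mutate(
    ing_real_growth = if_else(
      anio - lag(anio) == 1,
      (ingresos_real_2 / lag(ingresos_real_2) - 1) * 100,
      NA_real_
    )
  ) %>%
  ungroup()

print(total_firmas_growth)
```

Both versions leave the 2012 growth rate for firm 115 as NA. The padding version is closer to the approach described above, while the year-difference check avoids creating extra rows, which may matter for a large panel.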