0
votes

I have panel data for 67 countries for 25 years and I need to generate 5 year averages of all variables for each country.

I already looked at the following posts but it did not solve my problem:

R: Calculating 5 year averages in panel data

Taking a 3 year average across in a panel data set with NAs

I tried the seq()command but it generates the periods 1991-1996, 1996-2001 etc. but that is not what I am looking for.

I would like to have 5-year averages for the years 1991-1995, 1996-2000, 2001-2005, 2006-2010 and 2011-2015 for each country and afterwards add dummy variables for each period and country.

My dataset looks like the following:


+---------------+--------------+------+------+----------------+--------------------+------+
|    Country    | Country Code | Year | GINI | Trade Openness | Financial Openness | HMI  |
+---------------+--------------+------+------+----------------+--------------------+------+
| United States | USA          | 1991 | 60.4 |          45.71 |               81.4 | 5.56 |
| United States | USA          | 1992 | 50.6 |          65.1  |               80.7 | 6.00 |
| United States | USA          | 1993 | 56.1 |          61.0  |               79.1 |  6.1 | 
| United States | USA          | 1994 | 59.1 |          62.07 |               80.7 |  5.7 |
| United States | USA          | 1995 | 62.5 |          54.34 |               76.0 |  6.0 |
| United States | USA          | 1996 | 56.9 |          60.09 |               79.8 |  6.4 |
| United States | USA          | 1997 | 56.8 |          61.7  |               78.1 |  5.9 |
| ...           |              |      |      |                |                    |      |
| Argentina     | ARG          | 1991 | 45.9 |           54.8 |               91.2 |  4.6 |
| Argentina     | ARG          | 1992 | 47.1 |           56.1 |               87.1 |  4.7 |
| Argentina     | ARG          | 1993 | 42.7 |           56.9 |               89.8 |  6.1 |
| Argentina     | ARG          | 1994 | 47.1 |           55.1 |               88.1 |  5.9 |
| Argentina     | ARG          | 1995 | 50.0 |           56.1 |               85.7 |  4.7 |
| Argentina     | ARG          | 1996 | 49.0 |           54.0 |               87.2 |  4.5 |
| Argentina     | ARG          | 1997 | 48.7 |           56.6 |               86.5 |  4.9 |
| ...           |              |      |      |                |                    |      |
+---------------+--------------+------+------+----------------+--------------------+------+

My expected outcome shall be something like this:

+---------------+-------------+-------+-------+-------+------+------+------+----------+----------+
|    Country    |   Period    | GINI  |  TO   |  FO   | HCI  | dUSA | dARG | dperiod1 | dperiod2 |
+---------------+-------------+-------+-------+-------+------+------+------+----------+----------+
| United States | 1991-1995   | 57.74 | 57.64 | 79.58 | 5.87 |    1 |    0 |        1 |        0 |
| United States | 1996-2000   |   ... |   ... |   ... |  ... |    1 |    0 |        0 |        1 |
| ...           |             |       |       |       |      |      |      |          |          |
| Argentina     | 1991-1995   | 46.56 |  55.8 | 88.38 |  5.2 |    0 |    1 |        1 |        0 |
| Argentina     | 1996-2000   |   ... |   ... |   ... |  ... |    0 |    1 |        0 |        1 |
| ...           |             |       |       |       |      |      |      |          |          |
+---------------+-------------+-------+-------+-------+------+------+------+----------+----------+

The periods do not have to be named 1991-1996 etc., 1, 2 and so on is also fine.

2
The dummy coding of variables is a completely different thing. Please consider asking it as a separate question.tmfmnk

2 Answers

2
votes

To create the averages, with dplyr you can do:

df %>%
 arrange(Country, Year) %>%
 group_by(Country) %>%
 mutate(Period = gl(ceiling(n() / 5), 5, length = n())) %>%
 group_by(Country, Country_Code, Period) %>%
 summarise_all(mean)

  Country       Country_Code Period  Year  GINI Trade_Openness Financial_Openness   HMI
  <chr>         <chr>        <fct>  <dbl> <dbl>          <dbl>              <dbl> <dbl>
1 Argentina     ARG          1      1993   46.6           55.8               88.4  5.2 
2 Argentina     ARG          2      1996.  48.8           55.3               86.8  4.7 
3 United_States USA          1      1993   57.7           57.6               79.6  5.87
4 United_States USA          2      1996.  56.8           60.9               78.9  6.15

Sample data:

df <- read.table(text = "
    Country     Country_Code  Year  GINI  Trade_Openness  Financial_Openness  HMI  
                  United_States  USA           1991  60.4           45.71                81.4  5.56 
                  United_States  USA           1992  50.6           65.1                 80.7  6.00 
                  United_States  USA           1993  56.1           61.0                 79.1   6.1  
                  United_States  USA           1994  59.1           62.07                80.7   5.7 
                  United_States  USA           1995  62.5           54.34                76.0   6.0 
                  United_States  USA           1996  56.9           60.09                79.8   6.4 
                  United_States  USA           1997  56.8           61.7                 78.1   5.9 
                  Argentina      ARG           1991  45.9            54.8                91.2   4.6 
                  Argentina      ARG           1992  47.1            56.1                87.1   4.7 
                  Argentina      ARG           1993  42.7            56.9                89.8   6.1 
                  Argentina      ARG           1994  47.1            55.1                88.1   5.9 
                  Argentina      ARG           1995  50.0            56.1                85.7   4.7 
                  Argentina      ARG           1996  49.0            54.0                87.2   4.5 
                  Argentina      ARG           1997  48.7            56.6                86.5   4.9 ", 
                 header = TRUE,
                 stringsAsFactors = FALSE)
0
votes

You may use the following chunk of code which will create groups of years for your year-periods.

Step 1: create a variable for the groups of years. Given your preferences for the years I can suggest the following using "ifelse" function in r.

# 5 year averages
df$Level.5 <- ifelse(df$Year >= 1991 & df$Year <= 1995, 1,
                 ifelse(df$Year >= 1996 & df$Year <= 2000, 2,
                    ifelse(df$Year >= 2001 & df$Year <= 2005, 3,
                       ifelse(df$Year >= 2006 & df$Year <= 2010, 4,
                          ifelse(df$Year >= 2011 & df$Year <= 2015, 5,
                                                       " ")))

Step 2: Then use the "ddply" function from the "plyr" package to calculate the averages based on these groups.

short.df.5 <- ddply(df, .(Country, Level.5), numcolwise(mean))