R: Average years in time series per group

Question

Dear Community,

I am working with R and looking for trends in time series data of bilateral exports over a duration of 20 years. As the data is fluctuating a lot between the years (and in addition is not 100% reliable), I would prefer to use four-years-average data (instead of looking at every single year separately) in order to analyze how the main export partners have changed over time. I have the following dataset, called GrossExp3, covering the bilateral exports (in 1000 USD) of 15 reporter countries for all years between (1998 – 2019) to all available partner countries. It covers the following four variables: Year, ReporterName (= exporter) , PartnerName (= export destination), 'TradeValue in 1000 USD' ( = export value to the destination) The PartnerName column also includes an entry, called “All”, which is the total sum of all exports for each year by reporter.

Here is the summary of my data

> summary(GrossExp3)
      Year      ReporterName       PartnerName        TradeValue in 1000 USD
 Min.   :1998   Length:35961       Length:35961       Min.   :       0      
 1st Qu.:2004   Class :character   Class :character   1st Qu.:      39      
 Median :2009   Mode  :character   Mode  :character   Median :     597      
 Mean   :2009                                         Mean   :  134370      
 3rd Qu.:2014                                         3rd Qu.:   10090      
 Max.   :2018                                         Max.   :47471515

My goal is to return a table which shows the percentage of total trade for each exporter to the export destination in percentage of total exports for that period. Instead of every single year, I want to have the average data for the following periods: 2000-2003, 2004-2007, 2008-2011, 2012-2015, 2016-2019.

What I tried My current code (created with support of this amazing community is the following: (At the current moment, it shows the data for each year separately, but I need the average data in the headline)

# install packages
library(data.table)
library(dplyr)
library(tidyr)
library(stringr)
library(plyr)
library(visdat)

# set working directory
setwd("C:/R/R_09.2020/Other Indicators/Bilateral Trade Shift of Partners")

# load data

# create a file path SITC 3 
path1 <- file.path("SITC Rev 3_Data from 1998.csv")

# load cvs data table, call "SITC3" 
SITC3 <- fread(path1, drop = c(1,9,11,13))

# prepare data (SITC3) for analysis
# Filter for GROSS EXPORTS SITC3 (Gross exports = Exports that include intermediate products)
GrossExp3 <- SITC3 %>%
  filter(TradeFlowName == "Gross Exp.", PartnerISO3 != "All", Year != 2019) %>%  # filter for gross exports, remove "All", remove 2019
  select(Year, ReporterName, PartnerName, `TradeValue in 1000 USD`) %>%
  arrange(ReporterName, desc(Year))
# compare with old subset
summary(GrossExp3)
summary(SITC3)

# calculate percentage of total
GrossExp3Main <- GrossExp3 %>%
  group_by(Year, ReporterName) %>%
  add_tally(wt = `TradeValue in 1000 USD`, name = "TotalValue") %>%
  mutate(Percentage = 100 * (`TradeValue in 1000 USD` / TotalValue)) %>%
  arrange(ReporterName, desc(Year), desc(Percentage))
head(GrossExp3Main, n = 20)

# print tables in separate sheets to get an overview about hierarchy of export partners and development over time
SpreadExpMain <- GrossExp3Main %>%
  select(Year, ReporterName, PartnerName, Percentage) %>%
  spread(key = Year, value = Percentage) %>%
  arrange(ReporterName, desc(`2018`))
View(SpreadExpMain) # shows whole table

Here is the head of my data

> head(GrossExp3Main, n = 20)
# A tibble: 20 x 6
# Groups:   Year, ReporterName [7]
    Year ReporterName PartnerName   `TradeValue in 100~ TotalValue Percentage
   <int> <chr>        <chr>                       <dbl>      <dbl>      <dbl>
 1  2018 Angola       China                   24517058.  42096736.      58.2 
 2  2018 Angola       India                    3768940.  42096736.       8.95
 3  2017 Angola       China                   19487067.  34904881.      55.8 
 4  2017 Angola       India                    2890061.  34904881.       8.28
 5  2016 Angola       China                   13923092.  28057500.      49.6 
 6  2016 Angola       India                    1948845.  28057500.       6.95
 7  2016 Angola       United States            1525650.  28057500.       5.44
 8  2015 Angola       China                   14320566.  33924937.      42.2 
 9  2015 Angola       India                    2676340.  33924937.       7.89
10  2015 Angola       Spain                    2245976.  33924937.       6.62
11  2014 Angola       China                   27527111.  58672369.      46.9 
12  2014 Angola       India                    4507416.  58672369.       7.68
13  2014 Angola       Spain                    3726455.  58672369.       6.35
14  2013 Angola       China                   31947235.  67712527.      47.2 
15  2013 Angola       India                    6764233.  67712527.       9.99
16  2013 Angola       United States            5018391.  67712527.       7.41
17  2013 Angola       Other Asia, ~            4007020.  67712527.       5.92
18  2012 Angola       China                   33710030.  70863076.      47.6 
19  2012 Angola       India                    6932061.  70863076.       9.78
20  2012 Angola       United States            6594526.  70863076.       9.31

I am not sure if the results I get up to this point are right? In addition, I have the following questions:

Do you have any recommendation on how to print nice looking tables with R?
How can I better round the percentage data to one number behind the comma?

As I have been stuck with these issues over the week, I would be very grateful for any recommendations on how to solve the issue!!

Wishing you a nice weekend and all the best,

Melike

** EDIT** here is some sample data

dput(head(GrossExp3Main, n = 20))
structure(list(Year = c(2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 
2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 
2018L, 2018L, 2018L, 2018L, 2018L), ReporterName = c("Angola", 
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola", 
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola", 
"Angola", "Angola", "Angola", "Angola", "Angola"), PartnerName = c("China", 
"India", "United States", "Spain", "South Africa", "Portugal", 
"United Arab Emirates", "France", "Thailand", "Canada", "Indonesia", 
"Singapore", "Italy", "Israel", "United Kingdom", "Unspecified", 
"Namibia", "Uruguay", "Congo, Rep.", "Japan"), `TradeValue in 1000 USD` = c(24517058.342, 
3768940.47, 1470132.736, 1250554.873, 1161852.097, 1074137.369, 
884725.078, 734551.345, 649626.328, 647164.297, 575477.283, 513982.584, 
468914.918, 452453.482, 425616.975, 423008.886, 327921.516, 320586.229, 
299119.102, 264671.779), TotalValue = c(42096736.31, 42096736.31, 
42096736.31, 42096736.31, 42096736.31, 42096736.31, 42096736.31, 
42096736.31, 42096736.31, 42096736.31, 42096736.31, 42096736.31, 
42096736.31, 42096736.31, 42096736.31, 42096736.31, 42096736.31, 
42096736.31, 42096736.31, 42096736.31), Percentage = c(58.2398078593471, 
8.9530467213552, 3.49227247731025, 2.97066942147468, 2.75995765667944, 
2.55159298119945, 2.10164767046284, 1.74491281127062, 1.54317504144777, 
1.53732653342598, 1.3670353890672, 1.22095589599877, 1.11389850877492, 
1.07479467925527, 1.01104506502775, 1.00484959899258, 0.778971352043039, 
0.761546516668669, 0.710551762961598, 0.62872279943737)), row.names = c(NA, 
-20L), groups = structure(list(Year = 2018L, ReporterName = "Angola", 
    .rows = structure(list(1:20), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), row.names = 1L, class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"))
>

Please use dput(head(GrossExp3Main, n = 20)) for posting a snippet of you data. This can then easily be copy-pasted by other users to read into R. — Baraliuh
For printing tables, please check haozhu233.github.io/kableExtra/awesome_table_in_pdf.pdf. Regarding the question about formatting, please check the library scales (scales.r-lib.org). — bttomio
Can you post sample data in dput format? Please edit the question with the output of dput(GrossExp3). Or, if it is too big with the output of dput(head(GrossExp3, 20)). — Rui Barradas
Thanks a lot for your quick comments! I just added some sample data with dput — Mels
Thanks as well for the reference to the nice latex table document. It looks really hepful!! — Mels

Sinh Nguyen Sinh Nguyen · Accepted Answer · 2021-02-12T22:55:11

To do what you want need an additional variable to group the year together. I used cut to do that.

library(dplyr)
# Define the cut breaks and labels for each group
# The cut define by the starting of each group and when using cut function
# I would use param right = FALSE to have the desire cut that I want here.
year_group_break <- c(2000, 2004, 2008, 2012, 2016, 2020)
year_group_labels <- c("2000-2003", "2004-2007", "2008-2011", "2012-2015", "2016-2019")

data %>%
  # create the year group variable
  mutate(year_group = cut(Year, breaks = year_group_break,
    labels  = year_group_labels,
    include.lowest = TRUE, right = FALSE)) %>%
  # calculte the total value for each Reporter + Partner in each year group
  group_by(year_group, ReporterName, PartnerName) %>%
  summarize(`TradeValue in 1000 USD` = sum(`TradeValue in 1000 USD`),
    .groups = "drop") %>%
  # calculate the percentage value for Partner of each Reporter/Year group
  group_by(year_group, ReporterName) %>%
  mutate(Percentage = `TradeValue in 1000 USD` / sum(`TradeValue in 1000 USD`)) %>%
  
  ungroup()

Sample output

   year_group ReporterName PartnerName          `TradeValue in 1000 USD` Percentage
   <fct>      <chr>        <chr>                                   <dbl>      <dbl>
 1 2016-2019  Angola       Canada                                647164.    0.0161 
 2 2016-2019  Angola       China                               24517058.    0.609  
 3 2016-2019  Angola       Congo, Rep.                           299119.    0.00744
 4 2016-2019  Angola       France                                734551.    0.0183 
 5 2016-2019  Angola       India                                3768940.    0.0937 
 6 2016-2019  Angola       Indonesia                             575477.    0.0143 
 7 2016-2019  Angola       Israel                                452453.    0.0112 
 8 2016-2019  Angola       Italy                                 468915.    0.0117 
 9 2016-2019  Angola       Japan                                 264672.    0.00658
10 2016-2019  Angola       Namibia                               327922.    0.00815
11 2016-2019  Angola       Portugal                             1074137.    0.0267 
12 2016-2019  Angola       Singapore                             513983.    0.0128 
13 2016-2019  Angola       South Africa                         1161852.    0.0289 
14 2016-2019  Angola       Spain                                1250555.    0.0311 
15 2016-2019  Angola       Thailand                              649626.    0.0161 
16 2016-2019  Angola       United Arab Emirates                  884725.    0.0220 
17 2016-2019  Angola       United Kingdom                        425617.    0.0106 
18 2016-2019  Angola       United States                        1470133.    0.0365 
19 2016-2019  Angola       Unspecified                           423009.    0.0105 
20 2016-2019  Angola       Uruguay                               320586.    0.00797

R: Average years in time series per group

1 Answers