Dear Community,
I am working with R and looking for trends in time series data of bilateral exports over a duration of 20 years. As the data is fluctuating a lot between the years (and in addition is not 100% reliable), I would prefer to use four-years-average data (instead of looking at every single year separately) in order to analyze how the main export partners have changed over time. I have the following dataset, called GrossExp3, covering the bilateral exports (in 1000 USD) of 15 reporter countries for all years between (1998 – 2019) to all available partner countries. It covers the following four variables: Year, ReporterName (= exporter) , PartnerName (= export destination), 'TradeValue in 1000 USD' ( = export value to the destination) The PartnerName column also includes an entry, called “All”, which is the total sum of all exports for each year by reporter.
Here is the summary of my data
> summary(GrossExp3)
Year ReporterName PartnerName TradeValue in 1000 USD
Min. :1998 Length:35961 Length:35961 Min. : 0
1st Qu.:2004 Class :character Class :character 1st Qu.: 39
Median :2009 Mode :character Mode :character Median : 597
Mean :2009 Mean : 134370
3rd Qu.:2014 3rd Qu.: 10090
Max. :2018 Max. :47471515
My goal is to return a table which shows the percentage of total trade for each exporter to the export destination in percentage of total exports for that period. Instead of every single year, I want to have the average data for the following periods: 2000-2003, 2004-2007, 2008-2011, 2012-2015, 2016-2019.
What I tried My current code (created with support of this amazing community is the following: (At the current moment, it shows the data for each year separately, but I need the average data in the headline)
# install packages
library(data.table)
library(dplyr)
library(tidyr)
library(stringr)
library(plyr)
library(visdat)
# set working directory
setwd("C:/R/R_09.2020/Other Indicators/Bilateral Trade Shift of Partners")
# load data
# create a file path SITC 3
path1 <- file.path("SITC Rev 3_Data from 1998.csv")
# load cvs data table, call "SITC3"
SITC3 <- fread(path1, drop = c(1,9,11,13))
# prepare data (SITC3) for analysis
# Filter for GROSS EXPORTS SITC3 (Gross exports = Exports that include intermediate products)
GrossExp3 <- SITC3 %>%
filter(TradeFlowName == "Gross Exp.", PartnerISO3 != "All", Year != 2019) %>% # filter for gross exports, remove "All", remove 2019
select(Year, ReporterName, PartnerName, `TradeValue in 1000 USD`) %>%
arrange(ReporterName, desc(Year))
# compare with old subset
summary(GrossExp3)
summary(SITC3)
# calculate percentage of total
GrossExp3Main <- GrossExp3 %>%
group_by(Year, ReporterName) %>%
add_tally(wt = `TradeValue in 1000 USD`, name = "TotalValue") %>%
mutate(Percentage = 100 * (`TradeValue in 1000 USD` / TotalValue)) %>%
arrange(ReporterName, desc(Year), desc(Percentage))
head(GrossExp3Main, n = 20)
# print tables in separate sheets to get an overview about hierarchy of export partners and development over time
SpreadExpMain <- GrossExp3Main %>%
select(Year, ReporterName, PartnerName, Percentage) %>%
spread(key = Year, value = Percentage) %>%
arrange(ReporterName, desc(`2018`))
View(SpreadExpMain) # shows whole table
Here is the head of my data
> head(GrossExp3Main, n = 20)
# A tibble: 20 x 6
# Groups: Year, ReporterName [7]
Year ReporterName PartnerName `TradeValue in 100~ TotalValue Percentage
<int> <chr> <chr> <dbl> <dbl> <dbl>
1 2018 Angola China 24517058. 42096736. 58.2
2 2018 Angola India 3768940. 42096736. 8.95
3 2017 Angola China 19487067. 34904881. 55.8
4 2017 Angola India 2890061. 34904881. 8.28
5 2016 Angola China 13923092. 28057500. 49.6
6 2016 Angola India 1948845. 28057500. 6.95
7 2016 Angola United States 1525650. 28057500. 5.44
8 2015 Angola China 14320566. 33924937. 42.2
9 2015 Angola India 2676340. 33924937. 7.89
10 2015 Angola Spain 2245976. 33924937. 6.62
11 2014 Angola China 27527111. 58672369. 46.9
12 2014 Angola India 4507416. 58672369. 7.68
13 2014 Angola Spain 3726455. 58672369. 6.35
14 2013 Angola China 31947235. 67712527. 47.2
15 2013 Angola India 6764233. 67712527. 9.99
16 2013 Angola United States 5018391. 67712527. 7.41
17 2013 Angola Other Asia, ~ 4007020. 67712527. 5.92
18 2012 Angola China 33710030. 70863076. 47.6
19 2012 Angola India 6932061. 70863076. 9.78
20 2012 Angola United States 6594526. 70863076. 9.31
I am not sure if the results I get up to this point are right? In addition, I have the following questions:
- Do you have any recommendation on how to print nice looking tables with R?
- How can I better round the percentage data to one number behind the comma?
As I have been stuck with these issues over the week, I would be very grateful for any recommendations on how to solve the issue!!
Wishing you a nice weekend and all the best,
Melike
** EDIT** here is some sample data
dput(head(GrossExp3Main, n = 20))
structure(list(Year = c(2018L, 2018L, 2018L, 2018L, 2018L, 2018L,
2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L,
2018L, 2018L, 2018L, 2018L, 2018L), ReporterName = c("Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola"), PartnerName = c("China",
"India", "United States", "Spain", "South Africa", "Portugal",
"United Arab Emirates", "France", "Thailand", "Canada", "Indonesia",
"Singapore", "Italy", "Israel", "United Kingdom", "Unspecified",
"Namibia", "Uruguay", "Congo, Rep.", "Japan"), `TradeValue in 1000 USD` = c(24517058.342,
3768940.47, 1470132.736, 1250554.873, 1161852.097, 1074137.369,
884725.078, 734551.345, 649626.328, 647164.297, 575477.283, 513982.584,
468914.918, 452453.482, 425616.975, 423008.886, 327921.516, 320586.229,
299119.102, 264671.779), TotalValue = c(42096736.31, 42096736.31,
42096736.31, 42096736.31, 42096736.31, 42096736.31, 42096736.31,
42096736.31, 42096736.31, 42096736.31, 42096736.31, 42096736.31,
42096736.31, 42096736.31, 42096736.31, 42096736.31, 42096736.31,
42096736.31, 42096736.31, 42096736.31), Percentage = c(58.2398078593471,
8.9530467213552, 3.49227247731025, 2.97066942147468, 2.75995765667944,
2.55159298119945, 2.10164767046284, 1.74491281127062, 1.54317504144777,
1.53732653342598, 1.3670353890672, 1.22095589599877, 1.11389850877492,
1.07479467925527, 1.01104506502775, 1.00484959899258, 0.778971352043039,
0.761546516668669, 0.710551762961598, 0.62872279943737)), row.names = c(NA,
-20L), groups = structure(list(Year = 2018L, ReporterName = "Angola",
.rows = structure(list(1:20), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = 1L, class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
>
scales
(scales.r-lib.org). – bttomiodput
format? Please edit the question with the output ofdput(GrossExp3)
. Or, if it is too big with the output ofdput(head(GrossExp3, 20))
. – Rui Barradas