I have the following dataset, called GrossExp3, covering the bilateral Exports (in 1000 USD) of 15 reporter Countries for all Years from (1998 – 2018) to all available partner countries It covers the following four variables: Year, ReporterName (= exporter) , PartnerName (= export destination), 'TradeValue in 1000 USD' ( = export value to the destination) The PartnerName column also includes an entry, called “All”, which is the total sum of all exports for each Year by reporter
Here is the head of my data
> head(GrossExp3, n = 20)
Year ReporterName PartnerName TradeValue in 1000 USD
1: 2018 Angola Afghanistan 19.353
2: 2018 Angola Albania 2.380
3: 2018 Angola Andorra 0.326
4: 2018 Angola United Arab Emirates 884725.078
5: 2018 Angola Argentina 61.362
6: 2018 Angola Armenia 60.105
7: 2018 Angola American Samoa 12.007
8: 2018 Angola Antigua and Barbuda 422.006
9: 2018 Angola Australia 40220.092
10: 2018 Angola Austria 433.699
Here is the summary of my data
> summary(GrossExp3)
Year ReporterName PartnerName TradeValue in 1000 USD
Min. :1998 Length:37398 Length:37398 Min. : 0
1st Qu.:2004 Class :character Class :character 1st Qu.: 39
Median :2009 Mode :character Mode :character Median : 596
Mean :2009 Mean : 135605
3rd Qu.:2014 3rd Qu.: 10209
Max. :2019 Max. :47471515
My goal is to filter the most important export destinations for each country by year by percentage of total exports, (all Percentage scores over 1%) and track how this changed over time. In particular, I would like to
- add an additional column, called "Percentage", with the percentage of total exports by year (the sum of all entries in the 'TradeValue in 1000 USD' by Year)
- drop all Percentage < 1
- summarize the data for each ReporterName by Year
- arrange by descending Percentage value
What I tried So far I tried it by filtering first for one ReporterName and one year
ONE_country <- GrossExp3 %>%
group_by(Year, ReporterName) %>%
filter(ReporterName == "Botswana", PartnerName != "All", Year == 2018) %>%
arrange(desc(`TradeValue in 1000 USD`)) %>%
summarize(Year, ReporterName, PartnerName, Percent = `TradeValue in 1000 USD`/sum(`TradeValue in 1000 USD`)*100)
head(ONE_country, n = 10)
I am not sure if the results I get here are right. In addition, I want the information for all countries and years to remain in the same dataset. Also I have not managed to drop all Percentage > 1, and would like the Percentage to be round without entries after the comma.
Another question is, why the summarize function does not return all columns, if I don't write them in the function?
As I have been stuck with these issues over the weekend, I would be very grateful for any recommendations on how to solve the issue!! All the best, Melike