I have a (directed) dyadic dataset that looks something like this (see below). What I want to do now is to keep just one observation per year. So in this case only one observation for 1992 (AFG 1992) and one in 1993 (AFG 1993), while deleting other observations. It doesn't matter which observation from the same year I keep in the data (not interested in country2).
country1 country2 year X X1
Afghanistan Colombia 1992 1 0.44
Afghanistan Venezuela 1992 1 0.45
Afghanistan Peru 1992 1 0.46
Afghanistan Brazil 1992 1 0.47
Afghanistan Bolivia 1992 1 0.48
Afghanistan Chile 1992 1 0.49
Afghanistan Argentina 1992 1 0.50
Afghanistan Uruguay 1993 0 0.51
Afghanistan USA 1993 0 0.52
Afghanistan Canada 1993 0 0.53
Afghanistan UK 1993 0 0.54
Afghanistan Netherlands 1993 0 0.55
Afghanistan Belgium 1993 0 0.56
Afghanistan Luxembourg 1993 0 0.57
Afghanistan France 1993 0 0.58
My try:
newdata<- data %>%
group_by(country1,year) %>%
summarise() %>%
select(unique.x=country1, unique.y=year)
This works BUT how do I keep all other variables from "data" in the "newdata"? I can't think of any way of doing this (which I find more practical). Any help?
Desired outcome
country1 year X
Afghanistan 1991 1
Afghanistan 1992 0
dput(data) structure(list(country1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "Afghanistan", class = "factor"), country2 = structure(c(8L, 33L, 24L, 5L, 4L, 7L, 1L, 32L, 31L, 6L, 30L, 21L, 3L, 19L, 14L, 29L, 27L, 26L, 15L, 25L, 2L, 17L, 10L, 18L, 13L, 28L, 23L, 11L, 9L, 16L, 12L, 20L, 22L), .Label = c("Argentina", "Austria", "Belgium", "Bolivia, Plurinational State of", "Brazil", "Canada", "Chile", "Colombia", "Cuba", "Czech Republic", "Denmark", "Dominican Republic", "Finland", "France", "Germany", "Guinea-Bissau", "Hungary", "Italy", "Luxembourg", "Mauritania", "Netherlands", "Niger", "Norway", "Peru", "Poland", "Portugal", "Spain", "Sweden", "Switzerland", "United Kingdom", "United States", "Uruguay", "Venezuela, Bolivarian Republic of"), class = "factor"), year = c(1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1993L, 1993L, 1993L, 1993L, 1993L, 1993L, 1993L, 1993L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1995L, 1995L, 1995L, 1995L, 1995L, 1995L, 1995L, 1995L, 1995L, 1995L), X = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), X1 = c(0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.5, 0.51, 0.52, 0.53, 0.54, 0.55, 0.56, 0.57, 0.58, 0.59, 0.6, 0.61, 0.62, 0.63, 0.64, 0.65, 0.66, 0.67, 0.68, 0.69, 0.7, 0.71, 0.72, 0.73, 0.74, 0.75, 0.76)), .Names = c("country1", "country2", "year", "X", "X1"), class = "data.frame", row.names = c(NA, -33L))
unique(data)
in your case - David Arenburglibrary(data.table) ; unique(setDT(data), by = "year")
. Or just useduplicated
from base R Ordistinct(data, year, .keep_all = TRUE)
from tibbliverse - David Arenburg