Data transformation: from dyadic to observational data in R

Question

I have a (directed) dyadic dataset that looks something like this (see below). What I want to do now is to keep just one observation per year. So in this case only one observation for 1992 (AFG 1992) and one in 1993 (AFG 1993), while deleting other observations. It doesn't matter which observation from the same year I keep in the data (not interested in country2).

 country1   country2    year    X   X1
Afghanistan Colombia    1992    1   0.44
Afghanistan Venezuela   1992    1   0.45
Afghanistan Peru        1992    1   0.46
Afghanistan Brazil      1992    1   0.47
Afghanistan Bolivia     1992    1   0.48
Afghanistan Chile       1992    1   0.49
Afghanistan Argentina   1992    1   0.50
Afghanistan Uruguay     1993    0   0.51
Afghanistan USA         1993    0   0.52
Afghanistan Canada      1993    0   0.53
Afghanistan UK          1993    0   0.54
Afghanistan Netherlands 1993    0   0.55
Afghanistan Belgium     1993    0   0.56
Afghanistan Luxembourg  1993    0   0.57
Afghanistan France      1993    0   0.58

My try:

newdata<- data %>% 
  group_by(country1,year) %>%
  summarise() %>%
  select(unique.x=country1, unique.y=year)

This works BUT how do I keep all other variables from "data" in the "newdata"? I can't think of any way of doing this (which I find more practical). Any help?

Desired outcome

    country1     year   X
    Afghanistan 1991   1
    Afghanistan 1992   0

dput(data) structure(list(country1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "Afghanistan", class = "factor"), country2 = structure(c(8L, 33L, 24L, 5L, 4L, 7L, 1L, 32L, 31L, 6L, 30L, 21L, 3L, 19L, 14L, 29L, 27L, 26L, 15L, 25L, 2L, 17L, 10L, 18L, 13L, 28L, 23L, 11L, 9L, 16L, 12L, 20L, 22L), .Label = c("Argentina", "Austria", "Belgium", "Bolivia, Plurinational State of", "Brazil", "Canada", "Chile", "Colombia", "Cuba", "Czech Republic", "Denmark", "Dominican Republic", "Finland", "France", "Germany", "Guinea-Bissau", "Hungary", "Italy", "Luxembourg", "Mauritania", "Netherlands", "Niger", "Norway", "Peru", "Poland", "Portugal", "Spain", "Sweden", "Switzerland", "United Kingdom", "United States", "Uruguay", "Venezuela, Bolivarian Republic of"), class = "factor"), year = c(1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1993L, 1993L, 1993L, 1993L, 1993L, 1993L, 1993L, 1993L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1995L, 1995L, 1995L, 1995L, 1995L, 1995L, 1995L, 1995L, 1995L, 1995L), X = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), X1 = c(0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.5, 0.51, 0.52, 0.53, 0.54, 0.55, 0.56, 0.57, 0.58, 0.59, 0.6, 0.61, 0.62, 0.63, 0.64, 0.65, 0.66, 0.67, 0.68, 0.69, 0.7, 0.71, 0.72, 0.73, 0.74, 0.75, 0.76)), .Names = c("country1", "country2", "year", "X", "X1"), class = "data.frame", row.names = c(NA, -33L))

and I'm not sure why. Perhaps because I have country1 and country2 variables in my (dyadic) data. — FKG
If you want one observation per year you could do library(data.table) ; unique(setDT(data), by = "year"). Or just use duplicated from base R Or distinct(data, year, .keep_all = TRUE) from tibbliverse — David Arenburg

NWPope NWPope · Accepted Answer · 2017-10-25T21:35:29

newdata <- olddata[!duplicated(olddata$year),]

Answers the question

newdata <- olddata[!duplicated(paste(olddata$country1, olddata$year)),]

Gives you what you want

Data transformation: from dyadic to observational data in R

3 Answers