I'm trying to show duplicates by group using data.table. More specifically I'm trying to find out whether there are multiple observations for a country in a given year. Here's a sample dataset:
# load data.table package, assuming it's installed
library(data.table)
# create dataset
year <- rep(2010:2012, 3)
country <- c(rep("A", 3), rep("B", 3), rep("C", 3))
value <- sample(2:200, 9, replace = TRUE)
df <- cbind(year, country, value)
df <- rbind(df, c(2012, "C", 20))
df # show data
# put data frame in data.table format and set year as key
dt <- data.table(df)
setkey(dt, year)
Note how for 2012 the country "C" is two observations, whereas all other country observations are unique by year.
I'd expected that the following code would provide me the name of the duplicate country and year:
dt[duplicated(country), country, by = year]
Instead, it shows all data from 2011 onwards, which implies that the duplicated function is applied to the entire data.table and not by year. How would I go about to extract the single duplicate value of country "C" in 2012 using data.table?