Randomly creating a var that is zero or one by group, and an additional variable (zero or one), if the variable was one

Question

I have sample data as follows:

panelID= c(1:50)
year= c(2005, 2010)
country = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J")
urban = c("A", "B", "C")
indust = c("D", "E", "F")
sizes = c(1,2,3,4,5)
n <- 2
library(data.table)
library(dplyr)
set.seed(123)
DT <- data.table(   country = rep(sample(country, length(panelID), replace = T), each = n),
                    year = c(replicate(length(panelID), sample(year, n))))
DT [, uniqueID := .I]                                                         # Creates a unique ID     
DT[DT == 0] <- NA 
DT$sales[DT$sales< 0] <- NA 
DT <- as.data.frame(DT)
DT <- DT %>%
group_by(country) %>%
mutate(base_rate = as.integer(runif(1, 12.5, 37.5))) %>%
group_by(country, year) %>%
mutate(tax_rate = base_rate + as.integer(runif(1,-2.5,+2.5)))

I would like to create an extra variable Vote that for each country-year pair is either 1 or 0.

Then I want another variable Vote_won, which is randomly 1 or 0 if Vote==1, and 0 otherwise.

I tried:

DT <- DT %>%
group_by(country, year) %>%
mutate(Vote = sample(c(0,1),3)) %>%
group_by(country, year) %>%
mutate(Vote_won = ifelse(Vote=1, sample(c(0,1),1),0))

But it says:

Error in sample.int(length(x), size, replace, prob) : cannot take a sample larger than the population when 'replace = FALSE'

Martin Gal · Accepted Answer · 2020-06-29T16:23:47

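mutate doesn't change your grouping, so you don't need to call group_by with the same arguments twice. Dropping the second group_by lets you merge the two mutate calls into one. Two more fixes are needed. First, sample(c(0,1),3) asks for three values from a population of two without replacement, which is what raises the error; draw a single value per group with sample(c(0,1),1) instead. Second, the test inside ifelse must be the comparison Vote==1, because Vote=1 is parsed as a named argument. Therefore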

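DT %>%
  group_by(country, year) %>%
  # one draw per country-year group, recycled to every row of that group
  mutate(Vote = sample(c(0, 1), 1),
         # a second draw per group, kept only where a vote took place
         Vote_won = ifelse(Vote == 1, sample(c(0, 1), 1), 0))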

should give you what you are looking for.
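As a quick sanity check, here is a minimal, self-contained sketch of the same approach on a small toy data set. The names toy, obs, err and check are made up for illustration and do not come from the question. The sketch first reproduces the cause of the original error, then confirms that each country-year pair gets a single Vote value and that Vote_won is 0 wherever Vote is 0. It assumes dplyr 1.0.0 or later, because of the .groups argument.

```r
library(dplyr)

set.seed(123)

# Cause of the original error: three draws from two values without replacement.
# tryCatch returns the error message as a string instead of stopping the script.
err <- tryCatch(sample(c(0, 1), 3), error = function(e) conditionMessage(e))

# Hypothetical toy data: 3 countries x 2 years, 2 rows per country-year pair
toy <- expand.grid(country = c("A", "B", "C"),
                   year = c(2005, 2010),
                   obs = 1:2,
                   stringsAsFactors = FALSE)

toy <- toy %>%
  group_by(country, year) %>%
  mutate(Vote = sample(c(0, 1), 1),
         Vote_won = ifelse(Vote == 1, sample(c(0, 1), 1), 0)) %>%
  ungroup()

# Count the distinct values of each new variable within every group
check <- toy %>%
  group_by(country, year) %>%
  summarise(n_vote = n_distinct(Vote),
            n_won = n_distinct(Vote_won),
            .groups = "drop")

# Each country-year pair should carry exactly one Vote and one Vote_won value
stopifnot(all(check$n_vote == 1), all(check$n_won == 1))
# Vote_won can only be 1 where a vote took place
stopifnot(all(toy$Vote_won[toy$Vote == 0] == 0))
```

If the two outcomes should not be equally likely, sample also accepts a prob argument, for example sample(c(0, 1), 1, prob = c(0.7, 0.3)).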
