2
votes

The question title might not completely reflect my problem, and that's perhaps the reason why I cannot come up with a solution for my problem. I have read simmilar questions (e.g., Assign a value to column based on condition across rows or R: Generate a dummy variable based on the existence of one column' value in another column) and on-line guides on creating dummy variables in R (I'm quite new to R), but no-one seems to tackle my problem, or perhaps I just couldn't see how. I have a dataframe like this:

df <- data.frame("Country" = c("US", "US", "US", "US", "IT", "IT", "IT","FR","FR"),          
             "Time" = c(1, 1, 2, 3, 1, 2, 1, 2, 3))

The dataframe is more complex, but each row is an observation of one country at a given moment in time. I want to create a dummy variable that takes the value 1 for an observation if the country to which this observation is assigned was measured at time 1, 2 3 and 0 otherwise and another dummy that takes the value 1 if the country was measured at the moments 2 and 3 but not one, and 0 otherwise. So the dataframe would look like:

df2 <- data.frame("Country" = c("US", "US", "US", "US", "IT", "IT", "IT","FR","FR"),          
              "Time" = c(1, 1, 2, 3, 1, 2, 1, 2, 3),
              "DummyTime123" = c(1, 1, 1, 1, 0, 0, 0, 0, 0),
              "DummyTime23" = c(0, 0, 0, 0, 0, 0, 0, 1, 1))

So, since US was measured at times 1, 2 and 3, american observations take the value 1, and the rest of the observations take the value 0. However, since FR was measured at times 2 and 3, French observations take the value 1, and the rest take the value 0 (note that also US takes the value 0, because it was measured at times 1, 2 and 3, and not only at times 2 and 3).

I have tried to create a dummy for instance with if_else within the tidyverse, like

DummyTime123 = ifelse(country = country, time = 1 & time = 2 & time = 3)

But this does not work, quite reasonably, because no single observation is measured at time 1, 2 and 3. Instead, I want to create a dummy for that observation based on whether the value of one column for this observation (country) is measured at several (and specific) times. I have also considered that my dataframe could be untidy, but I cannot see how and I don't think that's the problem. Of course, I could do this manually (that's what I did so far), but since the dataset is quite large, I would like to find an automated solution.

¿Does anybody have a solution for this problem? It would be really nice if there was a solution for this within the tidyverse, but of course any solution would be helpful.

1

1 Answers

2
votes

With tidyverse you could try the following.

Use group_by with Country to consider all the Time values within each Country.

To satisfy DummyTime123 criteria, you need all values of 1, 2, and 3 in the Time values within a Country. If TRUE, then using + this becomes 1.

For DummyTime23, it sounds like you want both 2 and 3 in Time but do not want any values of Time to be 1. Using & you can make sure both criteria are satisfied.

Let me know if this provides the results expected.

library(tidyverse)

df %>%
  group_by(Country) %>%
  mutate(DummyTime123 = +all(1:3 %in% Time),
         DummyTime23 = +(all(2:3 %in% Time) & !any(Time == 1)))

Output

  Country  Time DummyTime123 DummyTime23
  <chr>   <dbl>        <int>       <int>
1 US          1            1           0
2 US          1            1           0
3 US          2            1           0
4 US          3            1           0
5 IT          1            0           0
6 IT          2            0           0
7 IT          1            0           0
8 FR          2            0           1
9 FR          3            0           1