
I really need your help with this. I have a panel dataframe that looks something like this:

     Name            A                  B      

   1 Marco          01/09/2014         NA    
   2 Marco          NA                 01/01/2015    
   3 Marco          02/01/2015         NA    
   4 Luca           01/01/2015         NA    
   5 Luca           NA                 31/01/2015                        
   6 Silvia         NA                 15/01/2015  

I want to create a dummy variable that takes the value 1 if (condition 1) the observation has a date in column A that is not in 2014, OR (condition 2) the observation has a 2015 date in column B AND, at the same time, the individual has at least one other observation and none of that individual's observations has a 2014 date in column A. In other words, I do not know how to impose a condition on the dummy that checks all the other observations for the same individual (identified by the column "Name"). The result I want is something like this:

         Name            A                  B                     dummy

      1  Marco          01/09/2014         NA                     0    
      2  Marco          NA                 01/01/2015             0     
      3  Marco          02/01/2015         NA                     1    
      4  Luca           01/01/2015         NA                     1     
      5  Luca           NA                 31/01/2015             1                        
      6  Silvia         NA                 15/01/2015             0    

In the example above, the dummy is 0 for the first observation because of the 2014 date in column A (condition 1 is not satisfied). For the second observation the dummy is 0 because, despite the 2015 date in column B, the same individual (Marco) has a 2014 date in column A in at least one of his other observations (observation 1 in this case). Observation 4 has the dummy equal to 1 because the date in column A is in 2015. Observation 5 has the dummy equal to 1 because it has a 2015 date in column B and the same individual (Luca) has no other observation with a 2014 date in column A (he has a 2015 date in observation 4). Finally, the dummy for Silvia must be 0 because, despite the 2015 date in column B, there is no other observation for Silvia in the dataframe.

I hope this is not too convoluted and that I have explained my idea. Let me know if anything is unclear. Apart from the conditions themselves, even just showing me how to impose conditions across different observations for the same individual would already help a lot.

Thank you all! Marco

    structure(list(
      Name = c("Marco", "Marco", "Marco", "Luca", "Luca", "Silvia"),
      A = structure(c(1409529600, NA, 1420156800, 1420070400, NA, NA),
                    class = c("POSIXct", "POSIXt"), tzone = "UTC"),
      B = structure(c(NA, 1420070400, NA, NA, 1422662400, 1421280000),
                    class = c("POSIXct", "POSIXt"), tzone = "UTC")),
      row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))
It would be helpful if you dput() your sample data.frame so that we can start with the same data types you have. – vaettchen
I did it! Thanks. – Marco Mello
I uploaded this simplified version of the dataframe because the complete one contains many other variables that I would have to explain here, which would make the question even more complicated. However, if you can help me with the issue in this very simple dataframe, that will be enough for me to handle the full dataset. – Marco Mello

2 Answers


The NAs make it a little tricky, but here's a direct method, adding the implied condition "A is not NA" to the first case. Using %in% instead of == helps with other NA issues because 1 %in% NA is FALSE, but 1 == NA is NA.

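library(dplyr)

# dd is the data frame from the question's dput() output
dd %>% group_by(Name) %>%
  mutate(dummy = as.integer((
      # condition 1: A holds a date that is not in 2014
      !format(A, "%Y") %in% "2014" & !is.na(A)
    ) | (
      # condition 2: 2015 date in B, more than one row for this Name,
      # and no 2014 date in A anywhere in the group
      format(B, "%Y") %in% "2015"
      & n() > 1
      & !any(format(A, "%Y") %in% "2014")
    )
  ))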
# # A tibble: 6 x 4
# # Groups:   Name [3]
#   Name   A                   B                   dummy
#   <chr>  <dttm>              <dttm>              <int>
# 1 Marco  2014-09-01 00:00:00 NA                      0
# 2 Marco  NA                  2015-01-01 00:00:00     0
# 3 Marco  2015-01-02 00:00:00 NA                      1
# 4 Luca   2015-01-01 00:00:00 NA                      1
# 5 Luca   NA                  2015-01-31 00:00:00     1
# 6 Silvia NA                  2015-01-15 00:00:00     0
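
The difference between `%in%` and `==` around NA can be checked in isolation. This is a minimal base-R sketch; the vector `yrs` is made up for illustration and stands in for what `format(A, "%Y")` returns when some dates are missing.

```r
# Hypothetical vector of years with a missing value (illustration only)
yrs <- c("2014", NA, "2015")

eq_result <- yrs == "2014"     # the NA propagates into the result
in_result <- yrs %in% "2014"   # the NA is treated as "no match"

# any() over a comparison can itself be NA when nothing is TRUE
any_eq <- any(c(NA, "2015") == "2014")
any_in <- any(c(NA, "2015") %in% "2014")

print(eq_result)  # TRUE NA FALSE
print(in_result)  # TRUE FALSE FALSE
print(any_eq)     # NA
print(any_in)     # FALSE
```

This is why the group-level test is written with `%in%`: an NA from `any()` would otherwise leak into the dummy for every row of that individual.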

You can use the year() function from the lubridate package to extract the year from a date. Also note that a comparison involving NA returns NA, so it is easier to convert the NAs to a placeholder value before using them in conditions. Example code:

    library(dplyr)
    library(lubridate)

    Marco <- read.csv("Marcoset.csv", stringsAsFactors = FALSE)
    # Replace NAs with a placeholder date so the comparisons below never return NA
    Marco$A[is.na(Marco$A)] <- "01/01/0001"
    Marco$B[is.na(Marco$B)] <- "01/01/0001"
    Marco$A <- as.Date(Marco$A, "%d/%m/%Y")
    Marco$B <- as.Date(Marco$B, "%d/%m/%Y")

    # Per individual: i2014 flags any 2014 date in A; obs counts the rows
    # whose A is not a 2014 date (placeholder rows are counted too)
    Obs <- Marco %>%
      group_by(Name) %>%
      mutate(i2014 = sign(sum(ifelse(year(A) == 2014, 1, 0)))) %>%
      filter(year(A) != 2014) %>%
      select(Name, i2014) %>%
      group_by(Name, i2014) %>%
      summarise(obs = n())

    # year() returns 1 for the placeholder date 01/01/0001
    Marco <- Marco %>%
      left_join(Obs, by = "Name") %>%
      mutate(dummy = ifelse((year(A) != 2014 & year(A) != 1) |
                              (year(B) == 2015 & obs >= 2 & i2014 == 0), 1, 0)) %>%
      select(-obs, -i2014)
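
The same per-individual check can also be sketched without placeholder dates or a join, using only base R. This is an illustration under assumptions, not part of the answer above: the data frame `df` and its columns `yearA` and `yearB` are hypothetical and hold years already extracted from the date columns.

```r
# Toy data mirroring the question (years only, for illustration)
df <- data.frame(
  Name  = c("Marco", "Marco", "Marco", "Luca", "Luca", "Silvia"),
  yearA = c(2014, NA, 2015, 2015, NA, NA),
  yearB = c(NA, 2015, NA, NA, 2015, 2015)
)

# ave() computes a value per individual and repeats it on each of their rows
has2014 <- ave(df$yearA %in% 2014, df$Name, FUN = any)
n_obs   <- ave(seq_along(df$Name), df$Name, FUN = length)

df$dummy <- as.integer(
  # condition 1: A holds a date that is not in 2014
  (!is.na(df$yearA) & df$yearA != 2014) |
  # condition 2: 2015 in B, another row exists, no 2014 in A for this Name
  (df$yearB %in% 2015 & n_obs > 1 & !has2014)
)

print(df)
```

On this toy data the `dummy` column comes out as 0, 0, 1, 1, 1, 0, matching the result requested in the question.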