1
votes

I am trying to extract the month from a date formatted as a character. I first convert the strings in column 1 to a date format (class character). However, I am unable to extract the month.

df <- data.frame(col1=c(44008, 44001, NA, 77), col2=c(43476, 43479, 77, NA))
df
  col1  col2
1 44008 43476
2 44001 43479
3    NA    77
4    77    NA

My attempt

df %>%
# convert to date
mutate_at(vars(col1), funs(case_when((nchar(col1)>4)~(as.character(as.Date(as.numeric(col1), origin="1899-12-30"))), TRUE~as.character(col1))) ) %>% 
  mutate(
    # extract month
    col1_month = case_when(nchar(col1)==10~lubridate::month(col1), TRUE~as.character(col1)))

I've tried various tricks but can't seem to get this to work.

Desired output:

        col1  col2 col1_month
1 2020-06-26 43476          6
2 2020-06-19 43479          6
3       <NA>    77         NA
4         77    NA         77
3
Are you sure you want col1_month to equal col1 for values with 4 or less digits? I would think NA is a better choice.neilfws

3 Answers

2
votes

I think your logic and code is rather more complex than it needs to be. I'd just convert everything in col1 to a date, store that in its own column and extract the months.

df <- df %>% 
  mutate(Date = as.Date(col1, origin="1899-12-30"), 
         Month = month(Date))

   col1  col2       Date Month
1 44008 43476 2020-06-26     6
2 44001 43479 2020-06-19     6
3    NA    77       <NA>    NA
4    77    NA 1900-03-17     3

And then apply the logic to use dates in a certain range later on. For example:

df %>% 
  filter(nchar(col1) > 4) %>%
  # do stuff...
1
votes

With case_when, the 'type' should be the same across the output from all the conditions. In the first case, we convert to 'Date' from the numeric column across the 'col' columns, then reconvert it to character class to avoid a clash with other values. In the 'col1_month', the 'col1' is converted to Date again before extracting the month

library(dplyr)
library(lubridate)
df %>% 
  mutate(across(starts_with('col'), 
    ~ case_when(nchar(.) > 4 ~ 
         as.character(as.Date(., origin = '1899-12-30')),
       TRUE ~ as.character(.))), 
   col1_month = case_when(nchar(col1) == 10 ~ 
            as.character(month(as.Date(col1))), TRUE ~ col1))

-output

#        col1       col2 col1_month
#1 2020-06-26 2019-01-11          6
#2 2020-06-19 2019-01-14          6
#3       <NA>         77       <NA>
#4         77       <NA>         77
1
votes

A base R approach can be:

#Data
df <- data.frame(col1=c(44008, 44001, NA, 77), col2=c(43476, 43479, 77, NA))
#Compute date
df$Date <- as.Date(df$col1, origin = "1899-12-30")
#Extract month
df$Month <- as.numeric(format(df$Date,'%m'))
#Format for filling
df$Date <- as.character(df$Date)
df$Date <- ifelse(nchar(df$col1)>4,df$Date,df$col1)
df$Month <- ifelse(nchar(df$col1)>4,df$Month,df$col1)

Output:

   col1  col2       Date Month
1 44008 43476 2020-06-26     6
2 44001 43479 2020-06-19     6
3    NA    77       <NA>    NA
4    77    NA         77    77