1
votes

I'm trying to split a date and time column into separate day, month, hour, minute, second, day of week columns. I'm using the lubridate and mutuate function but when I try to use the below code I get this error: Warning message: All formats failed to parse. No formats found.

My new columns are created but they all contain NA's - wondering if someone can help?

My Column looks like this:

tpep_pickup_datetime

01/07/2019 00:51:15
01/07/2019 00:46:30
01/07/2019 00:25:35

and my code is this:

taxidata3 <- taxidata2 %>%
  mutate(tpep_pickup_datetime = mdy_hms(tpep_pickup_datetime), 
         day = day(tpep_pickup_datetime),
         month = month(tpep_pickup_datetime), 
         year = year(tpep_pickup_datetime),
         dayofweek = wday(tpep_pickup_datetime),
         hour = hour(tpep_pickup_datetime),
         minute = minute(tpep_pickup_datetime),
         second = second(tpep_pickup_datetime))
3
What is the output of mdy_hms is it month/day/ or day/monthakrun
Hi akrun - It is day/MonthKarC
Then your code should be dmy_hmsakrun

3 Answers

3
votes

Based on the comments from the OP, the date format is day/month/... instead of month/day/... Here, we need dmy_hms. So, each letter signifies the order of occurrence

library(lubridate)
library(dplyr)
taxidata3 <- taxidata2 %>%
  mutate(tpep_pickup_datetime = dmy_hms(tpep_pickup_datetime), 
         day = day(tpep_pickup_datetime),
         month = month(tpep_pickup_datetime), 
         year = year(tpep_pickup_datetime),
         dayofweek = wday(tpep_pickup_datetime),
         hour = hour(tpep_pickup_datetime),
         minute = minute(tpep_pickup_datetime),
         second = second(tpep_pickup_datetime))
2
votes

This is a stringr solution using regex to match the date components:

Data:

df <- data.frame(
  tpep_pickup_datetime = c("01/07/2019 00:51:15", "01/07/2019 00:46:30", "01/07/2019 00:25:35")
)

Solution:

library(stringr)
df$day <- str_extract(df$tpep_pickup_datetime, "^\\d{2}") 
df$month <- str_extract(df$tpep_pickup_datetime, "(?<=/)\\d{2}")
df$year <- str_extract(df$tpep_pickup_datetime, "\\d{4}")
df$hour <- str_extract(df$tpep_pickup_datetime, "(?<= )\\d{2}(?=:)")
df$minute <- str_extract(df$tpep_pickup_datetime, "(?<=:)\\d{2}(?=:)")
df$second <- str_extract(df$tpep_pickup_datetime, "(?<=:)\\d{2}$")

Result:

df
  tpep_pickup_datetime day month year hour minute second
1  01/07/2019 00:51:15  01    07 2019   00     51     15
2  01/07/2019 00:46:30  01    07 2019   00     46     30
3  01/07/2019 00:25:35  01    07 2019   00     25     35
1
votes

Here is an other alternative using the separate function. The code and the output is as follows :-

library(tidyverse)
df <- data.frame(
    tpep_pickup_datetime = c("01/07/2019 00:51:15", "01/07/2019 00:46:30", 
                             "01/07/2019 00:25:35"))

df %>% 
    separate(tpep_pickup_datetime, c("Day", "Month", "Year_time"),
             sep = "/", remove = FALSE) %>% 
    separate(Year_time, c("Year", "Time"),
             sep = " ", remove = TRUE) %>% 
    separate(Time, c("Hour", "Minute", "Second"),
             sep = ":", remove = TRUE) 

# tpep_pickup_datetime Day Month Year Hour Minute Second
#1  01/07/2019 00:51:15  01    07 2019   00     51     15
#2  01/07/2019 00:46:30  01    07 2019   00     46     30
#3  01/07/2019 00:25:35  01    07 2019   00     25     35