0
votes

I have a large CSV file in which the relevant dates are stored in a single categorical (factor) column, formatted as follows: "Thu, 21 Jan 2012 04:59:00 -0000". I am trying to use as.Date, but it doesn't seem to be working. It would be great to have separate columns for weekday, day, month, and year, but I am happy to settle for one column at this point. Any suggestions?

UPDATE QUESTION: Each row has a different date in the above format (weekday, day, month, year, hour, minutes, seconds). I did not make that clear. How do I transform each date in the column?

Please dput() sample data in the future. - Hack-R
as.POSIXct("Thu, 21 Jan 2012 04:59:00 -0000", format = '%a, %d %b %Y %H:%M:%S %z', tz = 'UTC') - alistaire
I am new around here. By dput(), do you mean this: ("Wed, 31 Mar 2010 23:22:00 -0000"), class = "factor") - CupAJoe
Whoa. as.POSIXct is amazing. I transformed it in seconds. Thank you! - CupAJoe
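
For the updated question (every row has its own date, stored as a factor), the as.POSIXct call from the comment above can be applied to the whole column at once. The following is only a minimal sketch in base R: the data frame `df` and the column names `date_raw`, `timestamp`, `weekday`, `day`, `month`, and `year` are hypothetical stand-ins, and it assumes an English locale so that `%a` and `%b` match names like "Thu" and "Jan".

```r
# Hypothetical stand-in for the CSV column: a factor of date strings
df <- data.frame(
  date_raw = factor(c("Wed, 31 Mar 2010 23:22:00 -0000",
                      "Thu, 21 Jan 2012 04:59:00 -0000"))
)

# Convert the factor to character first, then parse the whole column at once
df$timestamp <- as.POSIXct(as.character(df$date_raw),
                           format = "%a, %d %b %Y %H:%M:%S %z",
                           tz = "UTC")

# Derive the separate columns from the parsed timestamps
df$weekday <- format(df$timestamp, "%a")              # locale-dependent label
df$day     <- as.integer(format(df$timestamp, "%d"))
df$month   <- as.integer(format(df$timestamp, "%m"))
df$year    <- as.integer(format(df$timestamp, "%Y"))
```

Note that the weekday column is recomputed from the parsed date, so it can differ from the weekday written in the string: 21 Jan 2012 was actually a Saturday.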

4 Answers

3
votes

The anytime package can parse this without a format:

R> anytime("Thu, 21 Jan 2012 04:59:00 -0000")
[1] "2012-01-21 04:59:00 CST"
R> 

It returns a POSIXct you can then operate on, or just format(), at will. It also has a simpler variant anydate() which returns a Date object instead.

2
votes
library(lubridate)

my_date <- "Thu, 21 Jan 2012 04:59:00 -0000"

# Get it into date format
my_date <- dmy_hms(my_date)

# Use convenience functions to set up the columns you wanted
data.frame(day=day(my_date), month=month(my_date), year=year(my_date),
           timestamp = my_date)
  day month year           timestamp
1  21     1 2012 2012-01-21 04:59:00
1
votes

We can use

as.Date(str1, "%a, %d %b %Y")
#[1] "2012-01-21"

If we need a date-time rather than just a date

v1 <- strptime(str1, '%a, %d %b %Y %H:%M:%S %z', tz = "UTC")
v1
#[1] "2012-01-21 04:59:00 UTC"

Or using lubridate

library(lubridate)
dmy_hms(str1)
#[1] "2012-01-21 04:59:00 UTC"

data

str1 <- "Thu, 21 Jan 2012 04:59:00 -0000"
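
One detail of the format string above that is easy to miss is the `%z` field: on input it reads the signed UTC offset, so strings with different offsets end up on one common timeline. A small sketch under the same English-locale assumption, where the second string with a `+0200` offset is a made-up example and `fmt`, `x`, and `parsed` are illustrative names:

```r
fmt <- "%a, %d %b %Y %H:%M:%S %z"

# Same wall-clock time, two different UTC offsets (the second is invented)
x <- c("Sat, 21 Jan 2012 04:59:00 -0000",
       "Sat, 21 Jan 2012 04:59:00 +0200")

# as.POSIXct applies the offset and stores both values as UTC instants
parsed <- as.POSIXct(x, format = fmt, tz = "UTC")

format(parsed, "%Y-%m-%d %H:%M:%S")
#[1] "2012-01-21 04:59:00" "2012-01-21 02:59:00"
```

By contrast, as.Date with "%a, %d %b %Y" stops after the year and ignores the time and offset entirely, which is fine only when the calendar date as written is all that is needed.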
0
votes

If you really want the separation in components then start with Dirk's powerful suggestion and then transpose the output of as.POSIXlt:

library(anytime)
times <- c("2004-03-21 12:45:33.123456", # example from ?anytime
          "2004/03/21 12:45:33.123456",
          "20040321 124533.123456",
          "03/21/2004 12:45:33.123456",
          "03-21-2004 12:45:33.123456",
          "2004-03-21",
          "20040321",
          "03/21/2004",
          "03-21-2004",
          "20010101")
t(sapply(anytime::anytime(times),
         function(x) unlist(as.POSIXlt(x))))

      sec                min  hour mday mon year  wday yday  isdst
 [1,] "33.1234560012817" "45" "12" "21" "2" "104" "0"  "80"  "0"  
 [2,] "33.1234560012817" "45" "12" "21" "2" "104" "0"  "80"  "0"  
 [3,] "33.1234560012817" "45" "12" "21" "2" "104" "0"  "80"  "0"  
 [4,] "33.1234560012817" "45" "12" "21" "2" "104" "0"  "80"  "0"  
 [5,] "33.1234560012817" "45" "12" "21" "2" "104" "0"  "80"  "0"  
 [6,] "0"                "0"  "0"  "21" "2" "104" "0"  "80"  "0"  
 [7,] "0"                "0"  "0"  "21" "2" "104" "0"  "80"  "0"  
 [8,] "0"                "0"  "0"  "21" "2" "104" "0"  "80"  "0"  
 [9,] "0"                "0"  "0"  "21" "2" "104" "0"  "80"  "0"  
[10,] "0"                "0"  "0"  "1"  "9" "101" "1"  "273" "1"  
      zone  gmtoff  
 [1,] "PST" "-28800"
 [2,] "PST" "-28800"
 [3,] "PST" "-28800"
 [4,] "PST" "-28800"
 [5,] "PST" "-28800"
 [6,] "PST" "-28800"
 [7,] "PST" "-28800"
 [8,] "PST" "-28800"
 [9,] "PST" "-28800"
[10,] "PDT" "-25200"
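
When reading the matrix above, keep in mind that POSIXlt stores `year` as years since 1900 (so "104" is 2004), `mon` as a 0-based month (so "2" is March), and `wday` with Sunday as 0. Because the matrix mixes in the character `zone` field, every value also comes back as a string. As a rough alternative sketch, the same fields can be pulled out vectorized and kept numeric; the input vector `x` and the data frame `parts` are illustrative names, and base R parsing is used here instead of anytime purely to keep the example self-contained.

```r
# Two illustrative timestamps, parsed with base R and pinned to UTC
x <- as.POSIXct(c("2004-03-21 12:45:33", "2001-10-01 00:00:00"), tz = "UTC")

# as.POSIXlt is vectorized, so its fields can be read without sapply()
lt <- as.POSIXlt(x)

parts <- data.frame(
  year  = lt$year + 1900L,  # stored as years since 1900
  month = lt$mon + 1L,      # stored 0-based (0 = January)
  day   = lt$mday,
  wday  = lt$wday,          # 0 = Sunday, ..., 6 = Saturday
  hour  = lt$hour,
  min   = lt$min
)
parts
```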