2
votes

When getting data via an API, I get the following format of dates back in JSON "/Date(1386201600000)/". I tried rvest, httr, jsonlite, and rjson, but they all return this date format. To solve this I wrote the function convert_JSON_Date to transform this data into readable dates. This works and returns the correct date.

Using the function over the whole date column I get a warning message that NAs were introduced by coercion. I found out that this has something to do with the length of the json date format. Some of them are 20 characters and long some are 21. In my total dataset there are even more different lenghts. When I put the data through the function for each length separately, everything works fine.

I have no idea why the coercion errors occur. I was wondering if someone has an explanation of why this is happening.

# Example data
t <- c("/Date(1184889600000)/", "/Date(1377648000000)/", "/Date(1386201600000)/",
       "/Date(1099353600000)/", "/Date(1403222400000)/", "/Date(1052092800000)/",
       "/Date(1324425600000)/", "/Date(1115942400000)/", "/Date(1343260800000)/",
       "/Date(940377600000)/", "/Date(1438819200000)/", "/Date(975715200000)/",
       "/Date(1125446400000)/", "/Date(1194566400000)/", "/Date(1331856000000)/",
       "/Date(1396569600000)/", "/Date(1346803200000)/", "/Date(1438560000000)/",
       "/Date(950832000000)/", "/Date(1380326400000)/", "/Date(1432771200000)/",
       "/Date(1436572800000)/", "/Date(1376438400000)/", "/Date(1428537600000)/",
       "/Date(869788800000)/", "/Date(1343001600000)/", "/Date(1382486400000)/",
       "/Date(1259539200000)/", "/Date(1427500800000)/", "/Date(1421971200000)/")

# converter for json dates. 
convert_JSON_Date <- function(Input_String){
      start <- stringi::stri_locate(Input_String, regex = "\\(")[1,1]
      end <- stringi::stri_locate(Input_String, regex = "\\)")[1,1]
      # shift 1 position from the start and end to get the string between the parentheses
      JSON_Date <- stringi::stri_sub(Input_String, start+1, end-1)
      # Not interested in time element. This is the time the data was uploaded to server
      JSON_Date <- as.Date(structure(as.numeric(JSON_Date)/1000, class = c("POSIXct", "POSIXt")))
      return(JSON_Date)
}


# NAs introduced by coercion
convert_JSON_Date(t)

# separetely it works
convert_JSON_Date(t[nchar(t) == 20])

# separetely it works
convert_JSON_Date(t[nchar(t) == 21])
3
Maybe this can be useful.SabDeM

3 Answers

2
votes

You are only getting the start and end positions for the first element, which has 21 characters. Therefore for the elements that have 20 characters you are including the closing parenthesis, thereby making as.numeric return NA.

You should change it to extract the entire column for these values:

  start <- stringi::stri_locate(Input_String, regex = "\\(")[,1]
  end <- stringi::stri_locate(Input_String, regex = "\\)")[,1]

Or alternatively you could use base functions to extract the correct values:

start.end <- regexpr("\\d+",t)
as.numeric(substr(t, start.end, start.end + attr(start.end,"match.length")-1))/1000
 [1] 1184889600 1377648000 1386201600 1099353600 1403222400 1052092800
 [7] 1324425600 1115942400 1343260800  940377600 1438819200  975715200
[13] 1125446400 1194566400 1331856000 1396569600 1346803200 1438560000
[19]  950832000 1380326400 1432771200 1436572800 1376438400 1428537600
[25]  869788800 1343001600 1382486400 1259539200 1427500800 1421971200
[25]  869788800 1343001600 1382486400 1259539200 1427500800 1421971200
1
votes

You could run those dates through the date handler R_json_dateStringOp in RJSONIO::fromJSON after removing three zeros from the end of your string.

library(RJSONIO)
## create the JSON string after removing three zeros at the end of each 't'
make <- toJSON(gsub("0{3}(?=\\))", "", t, perl = TRUE))
## run it through fromJSON() with the date handler and collapse result to an atomic vector
do.call(c, fromJSON(make, stringFun = "R_json_dateStringOp"))
# [1] "2007-07-19 17:00:00 PDT" "2013-08-27 17:00:00 PDT" "2013-12-04 16:00:00 PST"
# [4] "2004-11-01 16:00:00 PST" "2014-06-19 17:00:00 PDT" "2003-05-04 17:00:00 PDT"
# [7] "2011-12-20 16:00:00 PST" "2005-05-12 17:00:00 PDT" "2012-07-25 17:00:00 PDT"
#[10] "1999-10-19 17:00:00 PDT" "2015-08-05 17:00:00 PDT" "2000-12-01 16:00:00 PST"
#[13] "2005-08-30 17:00:00 PDT" "2007-11-08 16:00:00 PST" "2012-03-15 17:00:00 PDT"
#[16] "2014-04-03 17:00:00 PDT" "2012-09-04 17:00:00 PDT" "2015-08-02 17:00:00 PDT"
#[19] "2000-02-17 16:00:00 PST" "2013-09-27 17:00:00 PDT" "2015-05-27 17:00:00 PDT"
#[22] "2015-07-10 17:00:00 PDT" "2013-08-13 17:00:00 PDT" "2015-04-08 17:00:00 PDT"
#[25] "1997-07-24 17:00:00 PDT" "2012-07-22 17:00:00 PDT" "2013-10-22 17:00:00 PDT"
#[28] "2009-11-29 16:00:00 PST" "2015-03-27 17:00:00 PDT" "2015-01-22 16:00:00 PST"
0
votes

This line:

stringi::stri_locate(Input_String, regex = "\\)")[1,1]

Will get the position of the closing parenthesis, which can be either position 19 or 20, depending on how long each date string is. Since your first date is one of the longer ones, your end value will be 20. But extracting between start and 20 will return the closing ) for the shorter dates. Really the shorter dates need an end value of 19, and the long ones need a value of 20.

Anyway you just need more general regex to fix this:

as.numeric(stri_extract(t, regex = '\\d+'))

That will return all numbers in the string, which is what you ultimately want.