2
votes

I am trying to read http://www.cboe.com/publish/ScheduledTask/MktData/datahouse/pcratioarchive.csv (more about the data here http://www.cboe.com/data/PutCallRatio.aspx) using

library(data.table)
download.file(url="http://www.cboe.com/publish/ScheduledTask/MktData/datahouse/pcratioarchive.csv", destfile="pcratioarchive.csv")
outDT <- fread("pcratioarchive.csv", header=FALSE, skip=4)

Somehow this detects strange error (which I can not see in pcratioarchive.csv file itself):

outDT <- fread("pcratioarchive.csv", header=FALSE, skip=4) Error in fread("pcratioarchive.csv", header = FALSE, skip = 4) : Expected sep (',') but new line, EOF (or other non printing character) ends field 2 on line 6 when detecting types: 12/2/1999,0.52

Is there a way to get this work with data.table without manually changing the pcratioarchive.csv ?

My session info:

sessionInfo() R version 3.1.1 (2014-07-10) Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8          LC_NUMERIC=C                 
 [3] LC_TIME=en_US.UTF-8           LC_COLLATE=en_US.UTF-8       
 [5] LC_MONETARY=en_US.UTF-8       LC_MESSAGES=en_US.UTF-8      
 [7] LC_PAPER=en_US.UTF-8          LC_NAME=en_US.UTF-8          
 [9] LC_ADDRESS=en_US.UTF-8        LC_TELEPHONE=en_US.UTF-8     
[11] LC_MEASUREMENT=en_US.UTF-8    LC_IDENTIFICATION=en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.9.3 rj_1.1.3-1      

loaded via a namespace (and not attached):
[1] plyr_1.8.1    Rcpp_0.11.1   reshape2_1.4  rj.gd_1.1.3-1 stringr_0.6.2
[6] tools_3.1.1  
1
Check this link <https://stackguides.com/questions/24340370/fread-error-and-strange-behaviour-when-reading-csv> - akrun

1 Answers

3
votes

The pcratioarchive.csv file is malformed. For example:

...
10/12/1995,0.63,,,
10/13/1995,0.76,,,
10/16/1995,0.87
10/17/1995,0.76
...
10/17/2003,0.64,,
10/20/2003,0.62,,
10/21/2003,0.7,1.27,0.59
10/22/2003,0.98,1.89,0.77
...

I'm not familiar enough with fread to know if it has arguments to handle this, but read.csv does.

x <- read.csv("pcratioarchive.csv", header=FALSE, skip=4, fill=TRUE)