2
votes

I'm using fread to read a 179 MB CSV file with 16 columns and 637501 rows. fread does not read the first 29 lines of the file, and it misses the header on the first line as well. I have tried:

fread("filename.csv", sep = ",")
fread("filename.csv", sep = ",", skip = 0L)
fread("filename.csv", sep = ",", skip = 1L)
fread("filename.csv", sep = ",", autostart = 1L)

When I set header = TRUE, row 30 is used as the header, but fread still fails to recognize the first 29 rows. I am able to read the same file with read.csv without any issues (it just takes a lot longer).

Is this a bug or am I missing something?

Link to a sample CSV (20 KB) that produces the same bug: https://dl.dropboxusercontent.com/u/17747104/example.csv

Here's the link to the 179 MB file: https://dl.dropboxusercontent.com/u/17747104/read.csv

1
These sorts of things are basically impossible for anyone to help with without access to your csv, or a small example csv that exhibits the same behavior. – joran
@joran I added a link to a sample file that produces the same bug and a link to the actual file. I've noticed that row 30 of the file has only 16 columns, whereas most of the other rows have 36 columns. – arelangi
It's not a bug, I think. read.table has a feature that will automatically add blank fields if you provide a malformed csv file with a different number of fields per row. I'm not sure how to handle this with fread short of modifying the file itself. – joran
On *nix with data.table 1.8.11+, I'd do fread("awk 'BEGIN{OFS = FS = \",\"}{$36 = $36; print}' yourfile.csv") (replace 36 with whatever the right number of columns is). – eddi

1 Answer

3
votes

As you've now realised by looking at row 30, it has 16 columns whereas the other rows have 36 columns. It seems chopped off, like a data error.
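
If it helps to locate such short rows without opening the file by hand, base R's count.fields can report the number of fields on each line. This is only a minimal sketch: the four-line file written to a temporary path is made-up toy data standing in for the real CSV, and the names path, n_fields and short_lines are purely illustrative.

```r
# Toy stand-in for the real CSV (hypothetical data): line 3 is cut short.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "a,b,c,d",
  "1,2,3,4",
  "5,6",
  "7,8,9,10"
), path)

# Number of comma-separated fields on each line of the file.
n_fields <- count.fields(path, sep = ",")

# Lines that have fewer fields than the widest line.
short_lines <- which(n_fields < max(n_fields))

print(table(n_fields))
print(short_lines)
```

Note that count.fields skips blank lines by default, so on a file containing empty lines the reported positions may not match the raw line numbers exactly.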

Edit: fread gained fill=TRUE in v1.9.8 (on CRAN, Nov 2016; see the release notes). That should resolve it.
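
A minimal sketch of what that looks like, assuming data.table 1.9.8 or later is installed. The small file here is made-up toy data written to a temporary path, not the file from the question; with fill = TRUE the short row should be padded with NA instead of derailing the read.

```r
library(data.table)

# Toy stand-in for the real CSV (hypothetical data): the third line has
# only 2 of the 4 expected fields.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "a,b,c,d",
  "1,2,3,4",
  "5,6",
  "7,8,9,10"
), path)

# fill = TRUE pads rows that have too few fields with NA.
dt <- fread(path, sep = ",", header = TRUE, fill = TRUE)
print(dt)
```

For the real file, the same call with the actual path in place of the temporary one would be the thing to try first.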