I have a large csv file with 19 columns of character/numeric data.
Upon running fread, I got an error message saying one of my numeric columns was being converted to character because the field had value ""
. I then opened up my data in a text editor, and found the source of my problem. On one line, a character column read:
"""PARENTS"", ""Y.M."", AND ""EXPECTING"""
Which corresponded to the string:
"PARENTS", "Y.M.", AND "EXPECTING"
As:
- The first quote is a string protector
- The 2nd to 6th pairs of quotes are a single quote
- The last quote is the close for the string protector.
From what I've seen before, fread would read this converting ""
to \"
. The problem in this case is that the string also contains commas. These are being interpreted as delimiters, which messes with my column order and pushes later character columns into my numeric fields.
Is there a way to stop this, or should I use another package?
Note: I have looked around for a solution for this and sense that ""
+ fread is a source of frustration, but have not seen an example with the added complication of commas.
Reproduce:
Put the following in a txt file:
"A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S"
"168263291","Gruner & Jahr Printing and Publishing Company","Parents Ym and Expecting","""PARENTS"", ""Y.M."", AND ""EXPECTING""",0,0,3,"73130201","055302756","Quad/Graphics Inc.","013034588","02","093671063","000000000","Unclassified","94133","San Francisco","CALIFORNIA","UNITED STATES"
Read the data:
DT <- fread("myfile.csv",colClasses = c(rep("Character",5),
rep("numeric",2),
rep("character",12))
,sep = ",")
fread
was under development, and looking at the page it says it still is – Rorschach