Notice that R's base command read.csv works when called as
read.csv(file=fileName, dec=".", sep=",", header=TRUE)
while fread
does not work in the following demo, which has quoted separators inline. We concentrate now on data.table and fread, since read.csv is too slow.
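For reference, the read.csv behaviour mentioned above can be checked without the real file. This is only a minimal sketch: `csv_text` is a made-up two-line stand-in for MyFile.csv (one header row, one data row), and `check.names = FALSE` is an addition of mine so that the quoted names are kept verbatim instead of being rewritten by `make.names`.

```r
# Hypothetical stand-in for MyFile.csv: a header with quoted commas plus one data row.
csv_text <- 'CustomerID,ProductID,Prod.Sub,"Prod.Sub,feature",A.B.C,"A,B,C,D"
1,2,3,4,5,6'

# Base R: the default quote character is the double quote, so commas inside
# quoted fields are not treated as separators.
df <- read.csv(text = csv_text, dec = ".", sep = ",", header = TRUE,
               check.names = FALSE)

print(names(df))  # the column names as parsed
print(ncol(df))   # number of columns detected
```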
I have a CSV file that uses a comma as the field separator and a dot as the decimal point. The file MyFile.csv
has field names containing commas and dots, such as "Product.Apple.Green,Purple"
where the double quotes are meant to protect the dots and commas inside a name. However, the quoting does not seem to work with fread when called as
Sys.setlocale('LC_NUMERIC', 'fi_FI.UTF-8')
fread(file="MyFile.csv", sep=",", dec=".")
so, for example, the header fields
`CustomerID, ProductID, Prod.Sub, "Prod.Sub,feature", A.B.C, "A,B,C,D"`
are read, with a dash -
denoting a field separation, as
`CustomerID - ProductID - Prod.Sub - "Prod.Sub - feature" - A.B.C - "A - B - C - D"`
where "Prod.Sub,feature"
is wrongly read as two fields, "Prod.Sub - feature",
and "A,B,C,D"
is wrongly read as four fields, "A - B - C - D".
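A reproducible version of this header, without the real file, might look like the sketch below. The `csv_lines` vector is a hypothetical stand-in for MyFile.csv (I dropped the spaces after the commas and invented one data row), and passing `quote = "\""` only spells out what I understand to be fread's default. If the quotes are honoured, `names(dt)` should show six names rather than ten.

```r
library(data.table)

# Hypothetical stand-in for MyFile.csv: header line plus one data line.
csv_lines <- c(
  'CustomerID,ProductID,Prod.Sub,"Prod.Sub,feature",A.B.C,"A,B,C,D"',
  '1,2,3,4,5,6'
)

# Same sep/dec as in the question; quote is passed explicitly for clarity.
dt <- fread(text = csv_lines, sep = ",", dec = ".", quote = "\"", header = TRUE)

print(names(dt))  # the column names as parsed
print(ncol(dt))   # number of columns detected
```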
How can I inline escape separators with data.table's fread?
Which version of `data.table` are you using? When I use `fread('myfile.csv')`, the data is read normally in the latest version (I made a csv-file with the field names you described). Maybe you can include the first line of the csv-file? – Jaap
`fread` works as expected on `dt = fread('CustomerID, ProductID, Prod.Sub, "Prod.Sub,feature", A.B.C, "A,B,C,D" 1,2,3,4,5,6')` (note there is a new line between the header and data - doesn't show in comments) – dww
`C/UTF-8/C/C/C/C`, could that be a problem if fread uses the locale parameters for the separators? – hhh
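The version and locale details asked about in the comments can be collected with a couple of base R calls. This is just a sketch of what one might paste into the question; the variable names are mine, and it assumes data.table is installed.

```r
# Installed data.table version (errors if the package is not installed).
dt_version <- packageVersion("data.table")

# Locale in effect for the session: the full string and the numeric category only.
full_locale <- Sys.getlocale()
num_locale  <- Sys.getlocale("LC_NUMERIC")

print(dt_version)
print(full_locale)
print(num_locale)
```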