I have a problem using fread() to read a column of directory paths that use "\" as the directory separator: a trailing separator at the end of a quoted field makes fread() throw an error.

For the example CSV file below,

file,size
"windows\user",123

fread() and read.csv() agree, and both show the \ as \\ in the printed output:

> fread("example.csv")
            file size
1: windows\\user  123
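
The doubled backslash here is only how R prints the string; the value itself still holds a single backslash. A minimal base R sketch of that (the variable name x is purely illustrative):

```r
# In R source code a literal backslash is written as "\\",
# so this string contains exactly one backslash character.
x <- "windows\\user"

print(x)       # print() shows the escaped form, with a doubled backslash
cat(x, "\n")   # cat() writes the raw characters, with a single backslash
nchar(x)       # 12: seven letters, one backslash, four letters
```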

However, for the following file, fread() gives an error while read.csv() reads it without problems.

file,size
"windows\user\",123

read.csv() gives:

> read.csv("example.csv")
             file size
1 windows\\user\\  123

The fread() error looks like this:

> fread("example.csv",verbose=TRUE)
Input contains no \n. Taking this to be a filename to open
File opened, filesize is 0.000 GB
File is opened and mapped ok
Detected eol as \r\n (CRLF) in that order, the Windows standard.
Using line 2 to detect sep (the last non blank line in the first 'autostart') ... sep=','
Found 2 columns
First row with 2 fields occurs on line 1 (either column names or first row of data)
All the fields on line 1 are character fields. Treating as the column names.
Count of eol after first data row: 2
Subtracted 1 for last eol and any trailing empty lines, leaving 1 data rows
Error in fread("example.csv", verbose = TRUE) : 
' ends field 1 on line 1 when detecting types: "windows\user\",123

I would really like to avoid doing

DT = data.table(read.csv("example.csv"))

if at all possible.

As it happens, I've just been fixing that along with \n inside quoted fields. Will add answer when it's ready to try from GitHub. - Matt Dowle
It does make one wonder what the "right" fix would be, since this is due to the well-documented behavior of scan and, contrary to the questioner's claims, the example is NOT fine with read.csv(). 'file,size "windows\user\",123' throws an error. - IRTFM
@BondedDust read.csv seems to read it fine for me, agreeing with the asker. I looked in ?scan - where do you mean? - Matt Dowle
scan interprets the '\user' as ctrl-u followed by 'ser'. read.csv(text="windows\user\",123", sep=",") returns: Error: '\u' used without hex digits in character string starting ""windows\u". Mac 10.8.5, R 3.1.0 - IRTFM
@BondedDust That's not read.csv, that's the parser. Try typing "windows\user\",123" at the console on its own and you get the same error. To parse you need to double the \. When reading from a file with contents as shown by asker, read.csv(filename) works. - Matt Dowle
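
The distinction drawn in the last comment can be checked with a short sketch in base R. It assumes a temporary file (the path, df and df2 names are illustrative) holding exactly the two lines from the question, and compares that with passing the same content inline through the text argument, where every backslash has to be doubled to get past the R parser:

```r
# Write the asker's file: the second line is "windows\user\",123
# (each backslash and each quote is escaped here for the R parser only).
path <- tempfile(fileext = ".csv")
writeLines(c("file,size", "\"windows\\user\\\",123"), path)

# Reading from the file never involves the R parser.
df <- read.csv(path, stringsAsFactors = FALSE)
cat(df$file, "\n")   # raw value, single backslashes

# The same content as an inline string, with the backslashes doubled.
df2 <- read.csv(text = "file,size\n\"windows\\user\\\",123",
                stringsAsFactors = FALSE)
identical(df$file, df2$file)
```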

1 Answer

Now fixed in v1.9.3 on GitHub.

  • fread() now accepts trailing backslash in quoted fields. Thanks to user2970844 for highlighting.
$ cat example.csv
file,size
"windows\user\",123

> require(data.table)
> fread("example.csv")
              file size
1: windows\\user\\  123
> read.csv("example.csv")
             file size
1 windows\\user\\  123
>
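
To confirm that an installed data.table already includes this fix, something along these lines could be used. It is only a sketch: it assumes the package may or may not be installed, and the temporary file and the DT and DF names are illustrative.

```r
# Recreate the two-line example file in a temporary location.
path <- tempfile(fileext = ".csv")
writeLines(c("file,size", "\"windows\\user\\\",123"), path)

if (requireNamespace("data.table", quietly = TRUE) &&
    utils::packageVersion("data.table") >= "1.9.3") {
  DT <- data.table::fread(path)
  DF <- read.csv(path, stringsAsFactors = FALSE)
  # Both readers should return the same path string.
  print(identical(DT$file, DF$file))
} else {
  message("data.table >= 1.9.3 is not installed; skipping the fread() check")
}
```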