One byte separator argument in read.table()

Question

I am turning strings drawn from unpredictable character sets into a table with a known number of columns, and I am having trouble choosing a suitable separator.

For instance, a sample table might look like:

FILENAME: foo.txt

SEPARATOR: "\u00AA"

ROW1,COL1: foo

ROW1,COL2: b,ar

ROW1,COL3: fo;obar

ROW1,COL4: bo\tt

And so on.

In R I would run

read.table('foo.txt', sep="\u00AA")

and get

invalid 'sep' value: must be one byte

What separator should I use to avoid conflicts with the unpredictable strings? Unicode characters are accepted up to \u007F, but R treats anything higher as multi-byte. Why?

Why not use something normal like , and a quote character like ", after escaping all instances of " in your strings? The command-line tool sed is super handy for this kind of thing. — Justin
I am going for efficiency. I prefer not to put the strings of interest in quotes, but that is an option to keep in mind. — bfb
The crux of my frustration is that I am writing and reading the table in R and also reading the table in Python. A tab-delimited file works great for writing in R and reading in Python, but R cannot read the tab-delimited file. It returns "Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : line 72373 did not have 11 elements" — bfb
R can read tab-separated values perfectly (see ?read.table). That error may be caused by some other malformation in the data. You can inspect that line in the shell using sed -n 72373p filename.txt. — asb
There are certainly 11 elements in line 72373 via visual inspection. Could R be seeing a space instead of a tab? — bfb

bfb · Accepted Answer · 2013-06-20T21:39:12

Figured it out. Thank you for the inspiration.

The key is to set comment.char="" and quote=""

For instance,

read.table('foo', sep="\t", quote="", comment.char="")

returns the proper data.frame.
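
As a minimal sketch of why those two arguments matter: by default read.table() treats " and ' as quote characters and # as a comment character, so a stray apostrophe or # inside a field can change how many elements R sees on a line. The sample rows and the names path, n_fields and fixed below are made up for illustration and are not taken from my actual data.

```r
# Hypothetical sample data: tab-separated fields that contain an apostrophe
# and a '#', both of which read.table() treats specially by default
# (quote = "\"'" and comment.char = "#").
path <- tempfile(fileext = ".txt")
writeLines(c("foo\tb,ar\tfo;obar",
             "it's\tx#y\tbaz",
             "a\tb\tc"),
           path)

# Count fields per line with quoting and comment handling switched off.
n_fields <- count.fields(path, sep = "\t", quote = "", comment.char = "")

# Read the file the same way, so ' and # are kept as ordinary characters.
fixed <- read.table(path, sep = "\t", quote = "", comment.char = "",
                    stringsAsFactors = FALSE)

print(n_fields)
print(dim(fixed))
print(fixed[2, ])
```

If count.fields() called with its default quote and comment.char reports a different count for some line than the call above, a quote character or # on that line is a likely culprit.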
