Read Table: quotes in R

Question

I have a txt file like:

"cd_solicitud""nu_cuit""cd_provincia""tx_provincia"
"9531""203128827"18"Salta"
"9541""272477419"9"Entre Ríos"
"9571""273065780"2"Buenos Aires"
"6331""233703594"7"Córdoba"
"6351""272442465"5"Chaco"

I am trying to read it with:

prov_nos<-read.table("C:/.../prov_demo.txt",
                 header=T, quote = "\"")

But I get the following error:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : line 1 doesn't have 4 elements

Uhh, what's that file format? Seriously, what's the separator here? — daroczig
@daroczig Maybe it's some sort of twisted, late April Fools joke where someone deleted all the commas from a file. Pretty sadistic, if you ask me. — joran
The other R-ish option would be to read it in using readLines and then manually process each row, splitting on "\"" or something and removing entries of zero length. Then stitch it all back together with do.call(rbind,...). — joran
Could you upload the text file somewhere like Dropbox so we can see the raw file? Maybe the separators are getting messed up when pasted here. It is hard to believe that anyone would use this kind of evil file format. — zx8754

alistaire · Accepted Answer · 2016-04-04T18:59:05

You can hack it together if you read it in with readLines and then use strsplit to separate the elements of each row. It's not pretty, but then neither is the data's format:

the_text <- '"cd_solicitud""nu_cuit""cd_provincia""tx_provincia"
             "9531""203128827"18"Salta"
             "9541""272477419"9"Entre Ríos"
             "9571""273065780"2"Buenos Aires"
             "6331""233703594"7"Córdoba"
             "6351""272442465"5"Chaco"'
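# Split the text into one string per line
the_text <- readLines(textConnection(the_text))
# Split each data row on runs of double quotes and bind the pieces into rows
df <- data.frame(do.call(rbind, strsplit(the_text[-1], '"+')))
# The first line holds the column names
names(df) <- strsplit(the_text[1], '"+')[[1]]
# Every line starts with a quote, so the first piece is empty or whitespace; drop it
df[,1] <- NULL
df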
#    cd_solicitud   nu_cuit cd_provincia tx_provincia
# 1          9531 203128827           18        Salta
# 2          9541 272477419            9   Entre Ríos
# 3          9571 273065780            2 Buenos Aires
# 4          6331 233703594            7      Córdoba
# 5          6351 272442465            5        Chaco
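
Note that every column above comes back as text, not numbers. If you want to read straight from a file and get typed columns, here is a minimal sketch along the same lines. It also drops the empty pieces after splitting, as suggested in the comments, instead of deleting the first column. The function name read_quote_delimited and the temporary file are made up for illustration, and the sketch assumes no field is ever empty (an empty "" would be dropped and shift that row):

```r
# Hypothetical helper: parse a file whose fields are separated only by double quotes.
read_quote_delimited <- function(path, encoding = "UTF-8") {
  lines <- readLines(path, encoding = encoding, warn = FALSE)
  lines <- lines[nzchar(trimws(lines))]  # skip blank lines

  # Split on runs of quotes, then drop empty or whitespace-only pieces
  fields <- lapply(strsplit(lines, '"+'), function(x) x[nzchar(trimws(x))])

  # First line is the header, the rest are data rows
  df <- as.data.frame(do.call(rbind, fields[-1]), stringsAsFactors = FALSE)
  names(df) <- fields[[1]]

  # Let R guess column types; as.is = TRUE keeps text columns as character
  df[] <- lapply(df, type.convert, as.is = TRUE)
  df
}

# Usage on a small temporary copy of the sample data
tmp <- tempfile(fileext = ".txt")
writeLines(c('"cd_solicitud""nu_cuit""cd_provincia""tx_provincia"',
             '"9531""203128827"18"Salta"',
             '"9571""273065780"2"Buenos Aires"'), tmp)
prov <- read_quote_delimited(tmp)
str(prov)
```

If identifiers such as nu_cuit should stay as text, skip the type.convert step for those columns.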
