1
votes

I have a text file like this:

"cd_solicitud""nu_cuit""cd_provincia""tx_provincia"
"9531""203128827"18"Salta"
"9541""272477419"9"Entre Ríos"
"9571""273065780"2"Buenos Aires"
"6331""233703594"7"Córdoba"
"6351""272442465"5"Chaco"

I am trying to read it with:

prov_nos <- read.table("C:/.../prov_demo.txt",
                       header = TRUE, quote = "\"")

But I get the following error:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : line 1 doesn't have 4 elements

Uhh, what's that file format? Seriously, what's the separator here? - daroczig
@daroczig Maybe it's some sort of twisted, late April Fools joke where someone deleted all the commas from a file. Pretty sadistic, if you ask me. - joran
@daroczig It's a txt file, as the read.table line says. - GabyLP
The other R-ish option would be to read it in using readLines and then manually process each row, splitting on "\"" or something and removing entries of zero length. Then stitch it all back together with do.call(rbind,...). - joran
Could you upload the text file somewhere like Dropbox so we can see the raw file? Maybe the separators are getting messed up when pasted here. It is hard to believe that anyone would use this kind of evil file format. - zx8754

2 Answers

2
votes

You can hack it together if you read it in with readLines and then use strsplit to separate the elements of each row. It's not pretty, but then neither is the data's format:

the_text <- '"cd_solicitud""nu_cuit""cd_provincia""tx_provincia"
             "9531""203128827"18"Salta"
             "9541""272477419"9"Entre Ríos"
             "9571""273065780"2"Buenos Aires"
             "6331""233703594"7"Córdoba"
             "6351""272442465"5"Chaco"'
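# Split the text into one string per line
the_text <- readLines(textConnection(the_text))
# Split every data row on runs of double quotes and bind the pieces into rows
df <- data.frame(do.call(rbind, strsplit(the_text[-1], '"+')))
# The header line supplies the column names
names(df) <- strsplit(the_text[1], '"+')[[1]]
# Each line starts with a quote, so the first piece is empty or whitespace; drop it
df[, 1] <- NULL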
df
#    cd_solicitud   nu_cuit cd_provincia tx_provincia
# 1          9531 203128827           18        Salta
# 2          9541 272477419            9   Entre Ríos
# 3          9571 273065780            2 Buenos Aires
# 4          6331 233703594            7      Córdoba
# 5          6351 272442465            5        Chaco
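
One thing to watch with this approach: every column comes back as text (character in R 4.0.0 and later, factor in earlier versions), because the pieces were split out of strings. A minimal sketch of converting them afterwards with type.convert() from the utils package; the small data frame below is only a stand-in built from two rows of the question's data so the snippet runs on its own:

```r
# Stand-in for the data frame built above: every column is text.
df <- data.frame(
  cd_solicitud = c("9531", "6351"),
  nu_cuit      = c("203128827", "272442465"),
  cd_provincia = c("18", "5"),
  tx_provincia = c("Salta", "Chaco"),
  stringsAsFactors = FALSE
)

# Let R guess a type for each column; as.is = TRUE keeps
# non-numeric columns as character instead of making factors.
df[] <- lapply(df, type.convert, as.is = TRUE)

str(df)
```

If an identifier such as nu_cuit should stay as text (for example to preserve leading zeros), leave that column out of the conversion.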
4
votes

As I sketched out in my comment, some variation on this:

l <- readLines("~/Desktop/scratch/no_delim.txt")
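# Split one line on double quotes and drop the empty pieces left between them
foo <- function(line){
    line <- strsplit(line,"\"")[[1]]
    line <- line[nchar(line) > 0]
    line
}
# Apply to every line; l[[1]] is the header, the rest are data rows
l <- lapply(l,foo)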

> setNames(as.data.frame(do.call(rbind,l[-1])),l[[1]])
  cd_solicitud   nu_cuit cd_provincia tx_provincia
1         9531 203128827           18        Salta
2         9541 272477419            9   Entre Ríos
3         9571 273065780            2 Buenos Aires
4         6331 233703594            7      Córdoba
5         6351 272442465            5        Chaco

I say "some variation" because if there are other odd characters, odd quoting, or other gotchas in your file, you may need to adjust the splitting and cleanup to handle those.
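
One way to spot such gotchas before they cause trouble is to count the fields on each line before binding the rows together, since do.call(rbind, ...) will recycle short rows with only a warning. A rough sketch, assuming the same split-on-quotes logic as above; split_fields is a hypothetical helper name, and the third line is a deliberately broken row made up for illustration (it is not in the original file):

```r
# Hypothetical helper: same idea as foo() above.
split_fields <- function(line) {
  parts <- strsplit(line, '"', fixed = TRUE)[[1]]
  parts[nzchar(parts)]
}

# Header and first row from the question, plus one made-up row
# that is missing its last field.
lines <- c(
  '"cd_solicitud""nu_cuit""cd_provincia""tx_provincia"',
  '"9531""203128827"18"Salta"',
  '"6351""272442465"5'
)

rows <- lapply(lines, split_fields)

# Number of fields found on each line, compared against the header.
n_fields <- lengths(rows)
bad <- which(n_fields != n_fields[1])

if (length(bad) > 0) {
  message("Lines with an unexpected number of fields: ",
          paste(bad, collapse = ", "))
}
```

Note that a genuinely empty quoted field would also show up here as a short row, because dropping zero-length pieces cannot tell an empty value from the gap between two quotes.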