1
votes

I have a problem similar to this one: read.csv warning 'EOF within quoted string' prevents complete reading of file

That is, when I load a CSV, R says:

Warning message:
In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
EOF within quoted string

I can get rid of this warning by passing quote="" to read.csv

But the main problem remains: only 22,111 of 689,233 rows are read into R. I would like to try removing all special characters from the CSV to see if this clears the problem.

Related, I found this: How to remove specific special characters in R

But is there a way to do it in read.csv, that is, at the stage when I'm reading in the file?

3
Are you certain that your input file is well-formed, meaning that all 689,233 rows have the same number of columns? read.csv (which is a wrapper around read.table) is somewhat sensitive and can die for bad input files. - Tim Biegeleisen
I don't think you can do it within read.csv! I believe it is even better here to not use R and use something like awk or other Linux text post-processing commands. - agstudy
@ElinaJ Could you post the first 2 rows along with rows 22111 and 22112 from your input csv file? - Tim Biegeleisen
I'm afraid it's sensitive data and it's not possible to post... I tried deleting rows 21611-22111 and now I got 230,168 rows to load... - ElinaJ
You can likely solve it by using read.table with option encoding. - daniel
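The well-formedness check suggested in the comments above could be sketched with base R's count.fields, which reports the number of fields on each line. This is only a minimal sketch: the sample file and the names path, n_fields, expected and bad_lines are made up for illustration, and the real file would take the place of the temporary one.

```r
# Hypothetical sample file standing in for the real CSV
path <- tempfile(fileext = ".csv")
writeLines(c("id,name,score",
             "1,alice,10",
             "2,bob",            # deliberately short row
             "3,carol,30"), path)

# Number of fields per line; NA marks lines swallowed by a multi-line quoted string
n_fields <- count.fields(path, sep = ",", quote = "\"", comment.char = "")

# Treat the header's field count as the expected one
expected <- n_fields[1]

# Line numbers whose field count differs from the header (or is NA)
bad_lines <- which(is.na(n_fields) | n_fields != expected)
print(bad_lines)
```

Any line numbers reported this way are candidates for closer inspection before calling read.csv.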

3 Answers

1
votes

Did you try fread from data.table? It can speed up the task and is likely to cope with some common issues. As you haven't provided any data, I'm giving a silly example:

> fread('col1,col2\n5,"4\n3"')
   col1 col2
1:    5 4\n3

0
votes

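It was indeed a special character. There was a → (arrow, hexadecimal value 0x1A) on line 22,112. 0x1A is the SUB control character (Ctrl+Z), which Windows can treat as an end-of-file marker when a file is read in text mode, and that would explain why reading stopped at that line. After deleting the arrow, the data loads normally!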
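For anyone who would rather not edit the file by hand, one possible approach is to read the file as raw bytes, drop every 0x1A byte, and pass the cleaned text to read.csv. This is a minimal sketch under assumptions: the sample file and the names path, bytes, clean and df are invented for illustration, and it assumes the file fits in memory and contains no embedded NUL bytes.

```r
# Build a small hypothetical file with a 0x1A byte inside the second data row
path <- tempfile(fileext = ".csv")
con <- file(path, "wb")
writeBin(c(charToRaw("id,text\n1,ok\n2,bad"),
           as.raw(0x1A),
           charToRaw("value\n3,fine\n")), con)
close(con)

# Read the whole file as raw bytes, so no text-mode end-of-file handling applies
bytes <- readBin(path, what = "raw", n = file.info(path)$size)

# Drop every 0x1A byte
clean <- bytes[bytes != as.raw(0x1A)]

# Parse the cleaned text as CSV
df <- read.csv(text = rawToChar(clean), stringsAsFactors = FALSE)
print(nrow(df))
```

The same filter could be extended to other control bytes by comparing against more raw values, though which bytes are safe to drop depends on the data.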

0
votes

Solution for DataTables CSV export with special characters: find charset in https://cdn.datatables.net/buttons/1.1.2/js/buttons.html5.js or https://cdn.datatables.net/buttons/1.1.2/js/buttons.html5.min.js

and change it from 'UTF-8' to 'UTF-8-BOM'.
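On the R side, a CSV that starts with a byte-order mark can be read by passing fileEncoding = "UTF-8-BOM", which R accepts for reading and which removes the mark if present. The sketch below uses a made-up sample file (path and df are illustrative names) and is not tied to any particular export tool.

```r
# Write a hypothetical CSV that starts with the UTF-8 byte-order mark (EF BB BF)
path <- tempfile(fileext = ".csv")
con <- file(path, "wb")
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)),
           charToRaw("name,value\na,1\n")), con)
close(con)

# "UTF-8-BOM" tells R to strip the mark instead of gluing it onto the first column name
df <- read.csv(path, fileEncoding = "UTF-8-BOM")
print(names(df))
```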