I have a .csv file that contains 285000 observations. When I try to import the dataset, I get the following warning, and only 166000 observations are read.

Joint <- read.csv("joint.csv", header = TRUE, sep = ",")

Warning message: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : EOF within quoted string

When I disable quoting with quote = "", I get an error instead:

Joint2 <- read.csv("joint.csv", header = TRUE, sep = ",", quote="", fill= TRUE)

Error in read.table(file = file, header = header, sep = sep, quote = quote, : more columns than column names

When I use read.table instead of read.csv with the same arguments, it reads 483000 observations:

Joint <- read.table("joint.csv", header = TRUE, sep = ",", quote="", fill= TRUE)

What should I do to read the file properly?

Can you post a small excerpt of your CSV file? It is hard to help you without seeing the format of your data. - Simon Larsen

1 Answer

I think the problem has to do with the file encoding: there are a lot of special characters in the header. If you know how your file is encoded, you can specify it with the fileEncoding argument to read.csv.
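
As a rough sketch of what passing fileEncoding looks like, the snippet below builds a tiny demo file and reads it back. The demo file, its column names, and the "UTF-16LE" encoding are assumptions made only for illustration; the real encoding of joint.csv is not known from the question, so substitute whatever your file actually uses.

```r
# Build a small demo CSV encoded as UTF-16LE.
# This is a hypothetical stand-in; it is NOT the asker's joint.csv.
tmp <- tempfile(fileext = ".csv")
bytes <- iconv("id,name\n1,alpha\n2,beta\n",
               from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
writeBin(bytes, tmp)

# Declaring the encoding lets read.csv decode the bytes before parsing them.
demo <- read.csv(tmp, header = TRUE, sep = ",", fileEncoding = "UTF-16LE")
print(dim(demo))
```

For the file in the question, the equivalent call would be read.csv("joint.csv", header = TRUE, sep = ",", fileEncoding = ...) with the correct encoding name filled in.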

Otherwise, you could try fread from the data.table package. It is able to read the file despite the encoding issues, and it will also be significantly faster for a data file this large.
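
A minimal sketch of the fread approach, assuming the data.table package is installed. The demo file and its contents are made up for illustration, and encoding = "UTF-8" is only a guess that you may need to change or drop.

```r
library(data.table)  # assumes data.table is installed

# Hypothetical demo file; it is NOT the asker's joint.csv.
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,comment",
             "1,plain text",
             "2,it's got an apostrophe"), tmp)

# fread detects most settings on its own; sep and header are spelled out
# here only to mirror the read.csv call from the question.
demo <- fread(tmp, sep = ",", header = TRUE, encoding = "UTF-8")
print(nrow(demo))
```

For the file in the question this would be Joint <- fread("joint.csv"). It is worth comparing nrow(Joint) with the expected 285000 observations before trusting the result.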