
My CSV file (available at the link below and shown in the screenshot) has 8 observations. Observation 5 has a non-standard character in the "author" column, which I've shaded yellow.

https://docs.google.com/spreadsheets/d/1-douIz03OQqahG6WCWY-irOE52oXtDDc4fJ6myMwJDk/edit?usp=sharing

[Screenshot of the spreadsheet, with the affected "author" cell in observation 5 shaded yellow]

When I run the following:

data1 <- read.csv("Book1.csv", colClasses = c("end_date_n" = "character", "start_date_n" = "character"), stringsAsFactors = FALSE)

I get this warning message and only the first 4 rows and a partial 5th row are imported. The import stops at the point where the non-standard character appears in col 5.

In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : EOF within quoted string

When I delete the "author" column from my csv source file, the import works fine.

How can I import the full file without having to delete the problem column?

You may check here. - akrun

@akrun I looked at the link you suggested and tried this: data1 <- read.csv("Book1.csv", sep = ",", quote = "\"", fill = TRUE) but the import still stops at the same point in row 5. The problem is that I can't get past the non-standard character I've identified. - user3614783

1 Answer


A colleague came up with this solution:

"The original character is ^Z (Ctrl-Z, ASCII 26, octal 032), which for decades was used by DOS/Windows as an end-of-file marker. Because UNIX systems never used ^Z this way, the read-in problem is Windows-specific. Windows systems often direct users to enter non-ASCII characters (like é) using “ALT” codes. This may be where the ^Z originates."
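To confirm that a file really contains this character before changing anything, its raw bytes can be scanned for 0x1A. This is a minimal sketch in base R, assuming the file fits in memory; `find_ctrl_z` and the sample data are hypothetical names made up for illustration.

```r
# Hypothetical helper: return the byte offsets (1-based) of every
# Ctrl-Z (0x1A) in a file. Reading as raw bytes avoids the text-mode
# end-of-file handling that truncates the import.
find_ctrl_z <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  which(bytes == as.raw(0x1a))
}

# Made-up sample file with one Ctrl-Z in the "author" column
demo <- tempfile(fileext = ".csv")
writeBin(charToRaw("id,author\n1,Jos\032\n"), demo)

positions <- find_ctrl_z(demo)
print(positions)
```

An empty result would suggest that the truncated import has some other cause, such as an unbalanced quote.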

"Use a utility to translate ^Z to something innocuous. The killZ function below takes the name of a file, translates ^Z to *, then writes the result to the same directory as the original file, with -noz inserted just before the .txt or .csv (or whatever) file extension. You can then read the -noz file in the same way you have been reading the original .txt or .csv file."

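killZ <- function(fname) {
  # Open in binary mode so that ^Z is not treated as end-of-file
  f <- file(fname, "rb")
  on.exit(close(f))  # close the connection even if an error occurs
  res <- readLines(f, warn = FALSE)
  # Translate ^Z (octal 032) to *
  res <- gsub("\032", "*", res, fixed = TRUE)
  # Build the new file name: insert "-noz" just before the extension.
  # The base "tools" helpers avoid treating the extension as a regex.
  ext <- tools::file_ext(fname)
  base <- tools::file_path_sans_ext(fname)
  new_name <- if (nzchar(ext)) paste0(base, "-noz.", ext) else paste0(base, "-noz")
  writeLines(res, con = new_name)
  new_name
}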
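
As a variation on the same idea (not the colleague's code), the substitution can also be done directly on raw bytes, which leaves line endings and every other byte untouched. This is a minimal sketch in base R, assuming the file fits in memory and that the replacement is a single-byte character; `strip_ctrl_z`, its `replacement` argument, and the sample data are hypothetical.

```r
# Hypothetical helper: copy a file to a temporary location, replacing
# each Ctrl-Z byte (0x1A) with a single-byte replacement character.
strip_ctrl_z <- function(path, replacement = "*") {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  bytes[bytes == as.raw(0x1a)] <- charToRaw(replacement)
  out <- tempfile(fileext = ".csv")
  writeBin(bytes, out)
  out
}

# Made-up sample data with a Ctrl-Z in the "author" column of row 2
demo <- tempfile(fileext = ".csv")
writeBin(charToRaw("id,author\n1,Smith\n2,Jos\032 Garcia\n3,Lee\n"), demo)

clean <- strip_ctrl_z(demo)
data1 <- read.csv(clean, stringsAsFactors = FALSE)
print(data1)
```

The cleaned copy can then be passed to read.csv with the same arguments as the original call, for example the colClasses setting from the question.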