
I am encountering an issue while loading a CSV data set in R. The data set can be downloaded from:

https://data.baltimorecity.gov/City-Government/Baltimore-City-Employee-Salaries-FY2015/nsfe-bg53

I imported the data using read.csv as shown below, and the data set was imported correctly.

EmpSal <- read.csv('E:/Data/EmpSalaries.csv')

I then tried reading the same file using read.table, and the resulting data set had a lot of anomalies:

EmpSal1 <- read.table('E:/Data/EmpSalaries.csv',sep=',',header = T,fill = T)

The above code started reading the data from the 7th row, and although the data set contains ~14K rows, only ~5K rows were imported. Looking at the imported data, in a few cases 15-20 rows had been combined into a single row, with the entire row's data appearing in a single column.

I can work with the data set using read.csv, but I am curious why it didn't work with read.table.

@zx8754 Do you mean the link is not working, or the download? You can export the data to any format from the Export tab at the link. - mockash
Strange, now it is working, sorry. - zx8754

2 Answers


read.csv is a thin wrapper around read.table with different defaults; it is defined as:

function (file, header = TRUE, sep = ",", quote = "\"", dec = ".", 
    fill = TRUE, comment.char = "", ...) 
read.table(file = file, header = header, sep = sep, quote = quote, 
    dec = dec, fill = fill, comment.char = comment.char, ...)
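Because read.csv only forwards these defaults to read.table, calling read.table with the same arguments spelled out should give an identical data frame. A small check, sketched with made-up rows passed through the `text` argument instead of a file (the sample data is hypothetical, not from the Baltimore file):

```r
# Hypothetical sample rows; the second row has a double-quoted field that
# contains both an apostrophe and a comma.
lines <- c(
  "Name,JobTitle,AnnualSalary",
  "Smith,\"Officer's Aide, Senior\",50000",
  "Jones,Clerk,40000"
)

via_csv <- read.csv(text = lines)

# read.table with read.csv's defaults written out explicitly
via_table <- read.table(text = lines, header = TRUE, sep = ",",
                        quote = "\"", dec = ".", fill = TRUE,
                        comment.char = "")

identical(via_csv, via_table)  # TRUE
```

The same comparison should work on the real file by replacing `text = lines` with the file path.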

You need to add quote="\"". By default, read.table treats both double and single quotes as quote characters (quote = "\"'"), whereas read.csv treats only double quotes as quote characters (quote = "\""):

EmpSal <- read.csv('Baltimore_City_Employee_Salaries_FY2015.csv')
EmpSal1 <- read.table('Baltimore_City_Employee_Salaries_FY2015.csv', sep=',', header = TRUE, fill = TRUE, quote="\"")
identical(EmpSal, EmpSal1)
# TRUE
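To see the effect without downloading the file, here is a minimal sketch on a few hypothetical rows (the names, titles and salaries are made up, not taken from the Baltimore data), passed via the `text` argument. The apostrophes stand in for job titles or names that contain single quotes.

```r
# Hypothetical sample rows (not from the Baltimore file)
lines <- c(
  "Name,JobTitle,AnnualSalary",
  "Smith,Officer's Aide,50000",
  "Jones,Clerk,40000",
  "Brown,Manager's Assistant,60000",
  "Green,Driver,35000"
)

good <- read.csv(text = lines)

# Same call as in the question: read.table's default quote = "\"'"
bad <- read.table(text = lines, sep = ",", header = TRUE, fill = TRUE)

# The fix from this answer: restrict quoting to double quotes
fixed <- read.table(text = lines, sep = ",", header = TRUE, fill = TRUE,
                    quote = "\"")

nrow(good)   # 4
nrow(bad)    # fewer than 4: rows between the two apostrophes get merged
nrow(fixed)  # 4
identical(good, fixed)
```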

As you mentioned, your data is imported successfully by read.csv() without specifying the quote argument. The default value of quote is "\"" for read.csv and "\"'" for read.table, as the signatures below show:

read.table(file, header = FALSE, sep = "", quote = "\"'",
           dec = ".", numerals = c("allow.loss", "warn.loss", "no.loss"),
           row.names, col.names, as.is = !stringsAsFactors,
           na.strings = "NA", colClasses = NA, nrows = -1,
           skip = 0, check.names = TRUE, fill = !blank.lines.skip,
           strip.white = FALSE, blank.lines.skip = TRUE,
           comment.char = "#",
           allowEscapes = FALSE, flush = FALSE,
           stringsAsFactors = default.stringsAsFactors(),
           fileEncoding = "", encoding = "unknown", text, skipNul = FALSE)

read.csv(file, header = TRUE, sep = ",", quote = "\"",
         dec = ".", fill = TRUE, comment.char = "", ...)
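The differing defaults can also be checked directly from the function definitions with `formals()`. This is a small sketch; besides `quote`, it also shows `comment.char`, which is another default that differs between the two readers (read.table drops text after `#`):

```r
# Compare the default arguments that differ between the two readers
defaults_table <- formals(utils::read.table)
defaults_csv   <- formals(utils::read.csv)

defaults_table$quote         # "\"'" : double and single quotes
defaults_csv$quote           # "\""  : double quotes only
defaults_table$comment.char  # "#"   : text after # is treated as a comment
defaults_csv$comment.char    # ""    : comments disabled
```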

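There are many single quotation marks (apostrophes) in your data, and this is why read.table isn't working for you. With the default quote = "\"'", each apostrophe opens a quoted field that continues, across separators and line breaks, until the next apostrophe, so many physical rows end up merged into a single field of a single row.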
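One way to locate the offending lines is `count.fields()`, which reports `NA` for a line that ends inside a quoted string. A sketch, assuming hypothetical sample rows written to a temporary file (with the real data you would pass the CSV path instead):

```r
# Hypothetical sample rows (not from the Baltimore file)
lines <- c(
  "Name,JobTitle,AnnualSalary",
  "Smith,Officer's Aide,50000",
  "Jones,Clerk,40000",
  "Brown,Manager's Assistant,60000",
  "Green,Driver,35000"
)
path <- tempfile(fileext = ".csv")
writeLines(lines, path)

# read.table's default quote set: lines that end inside an
# apostrophe-quoted field are reported as NA
n_default <- count.fields(path, sep = ",", quote = "\"'")
which(is.na(n_default))   # lines swallowed by an open single quote

# read.csv's quote set: every line should report 3 fields
n_csv <- count.fields(path, sep = ",", quote = "\"")
table(n_csv)

unlink(path)
```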

Try the following code; it should work for you:

r <- read.table('/home/workspace/Downloads/Baltimore_City_Employee_Salaries_FY2015.csv',
                sep = ",", quote = "\"", header = TRUE, fill = TRUE)