I am trying to use sqldf for my large dataset. (The original data are scattered across ~1,000 separate CSV files, so I created a master CSV file containing all of that information and plan to use sqldf to fetch rows as needed.)
When I tried some simple code, none of my SQL queries returned a valid result -- every query returns 0 observations!
I then ran the sample code from the read.csv.sql documentation, which writes the iris dataset to a CSV file and runs a simple SQL query against it. I did this in two versions:
# Version 1: file written with base R's write.csv
library(sqldf)
write.csv(iris, "iris1.csv", quote = FALSE, row.names = FALSE)
iris1 <- read.csv.sql("iris1.csv",
    sql = "select * from file where Species = 'setosa' ")
dim(iris1)
# Version 2: file written with readr's write_csv
library(readr)
write_csv(iris, "iris2.csv")
iris2 <- read.csv.sql("iris2.csv",
    sql = "select * from file where Species = 'setosa' ")
dim(iris2)
I get the following:
> dim(iris1)
[1] 50 5
> dim(iris2)
[1] 0 5
The only difference I can find is that "iris1.csv" was made with write.csv, while "iris2.csv" was made with write_csv. To me they appear to be exactly the same CSV files. What underlying difference could cause this unexpected result? I would just use write.csv, but my dataset is so large that I definitely need write_csv to create the master file.
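For reference, one quick way to check whether the two files really are byte-identical is to compare their raw contents (this assumes both files from the code above exist in the working directory):

```r
# Read the first 200 raw bytes of each file. A 0x0d (\r) before each
# 0x0a (\n) in one file but not the other means the line endings differ
# (CRLF vs LF), even though the files print identically as text.
b1 <- readBin("iris1.csv", what = "raw", n = 200)
b2 <- readBin("iris2.csv", what = "raw", n = 200)
identical(b1, b2)
rawToChar(b1[1:80])  # inspect the header line for \r\n vs \n
```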
According to the write_csv manual, the output (iris2) should be encoded in UTF-8 (as with write.csv for iris1), but I suspect it is CSV-Macintosh, which is not the best option for read.csv.sql. – Majid
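If the difference does come down to line endings (on Windows, base R's write.csv writes CRLF while readr's write_csv writes LF), one possible workaround is to declare the end-of-line character explicitly through read.csv.sql's eol argument. A sketch, not verified on this dataset:

```r
library(sqldf)
# Tell read.csv.sql that the readr-produced file uses LF line endings,
# so the last field is not read with a stray carriage return attached.
iris2 <- read.csv.sql("iris2.csv",
    sql = "select * from file where Species = 'setosa' ",
    eol = "\n")
dim(iris2)
```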