Major update
It looks like development plans for fread
changed and fread
has now gained a fill
argument.
Using the same sample data from the end of this answer, here's what I get:
library(data.table)
packageVersion("data.table")
# [1] ‘1.9.7’
fread(x, fill = TRUE)
# V1 V2 V3 V4 V5 V6 V7
# 1: AA 3 3 3 3 NA NA
# 2: CC ad 2 2 2 2 2
# 3: ZZ 2 NA NA NA NA NA
# 4: AA 3 3 3 3 NA NA
# 5: CC ad 2 2 2 2 2
Install the development version of "data.table" with:
install.packages("data.table",
repos = "https://Rdatatable.github.io/data.table",
type = "source")
Original answer
This doesn't answer your question about fread
: That question has already been addressed by @Matt.
It does, however, give you an alternative to consider that should give you good speed improvements over base R's read.csv
.
Unlike fread
, you will have to help these functions out a little by providing them with some information about the data you are trying to read.
You can use the input.file
function from "iotools". By specifying the column types, you can tell the formatter function how many columns to expect.
library(iotools)
input.file(x, formatter = dstrsplit, sep = ",",
col_types = rep("character", max(count.fields(x, ","))))
Sample data
x <- tempfile()
myvec <- c('"AA",3,3,3,3', '"CC","ad",2,2,2,2,2', '"ZZ",2', '"AA",3,3,3,3', '"CC","ad",2,2,2,2,2')
cat(myvec, file = x, sep = "\n")
## Uncomment for bigger sample data
## cat(rep(myvec, 200000), file = x, sep = "\n")