1 vote

I am trying to load a 3 GB CSV file in R and I am getting the following warning:

Warning messages:

1: In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : Reached total allocation of 7128Mb: see help(memory.size)

2: In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : Reached total allocation of 7128Mb: see help(memory.size)

I tried checking this:

memory.size()

[1] 766.68

memory.limit()

[1] 7128

But my file still doesn't get loaded and I keep getting this warning. Is there a way I can work around this and read the file in R?

Thank you!

1
I've seen people suggest fread() in the past, might be worth looking into. Also gc() might help with the error if garbage collection isn't taking place. - zacdav
@zacdav: Thank you so much :) I used install.packages("data.table"); library(data.table); fread("file.csv", sep = ",", stringsAsFactors = FALSE) - Sweta
The best part is it also shows the timing: Read 74180464 rows and 11 (of 11) columns from 2.980 GB file in 00:08:34 - Sweta
I'm glad that it helped! - zacdav
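
A cleaned-up sketch of the data.table approach mentioned in the comments above; the file name and options shown are assumptions about the file in question, and fread() can usually auto-detect the separator on its own:

# install.packages("data.table")   # if not already installed
library(data.table)

dt <- fread("file.csv",            # placeholder file name
            sep = ",",             # fread() usually detects this automatically
            stringsAsFactors = FALSE,
            showProgress = TRUE)   # prints progress and a timing summary

dim(dt)                            # rows and columns actually read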

1 Answer

1 vote

R can be incredibly memory-inefficient when loading large datasets. From the read.table documentation (a short sketch applying these hints follows the quote):

Memory usage

These functions can use a surprising amount of memory when reading large files. There is extensive discussion in the ‘R Data Import/Export’ manual, supplementing the notes here.

Less memory will be used if colClasses is specified as one of the six atomic vector classes. This can be particularly so when reading a column that takes many distinct numeric values, as storing each distinct value as a character string can take up to 14 times as much memory as storing it as an integer.

Using nrows, even as a mild over-estimate, will help memory usage.

Using comment.char = "" will be appreciably faster than the read.table default.

read.table is not the right tool for reading large matrices, especially those with many columns: it is designed to read data frames which may have columns of very different classes. Use scan instead for matrices.
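
A minimal sketch applying those hints with read.table; the column classes and row count below are guesses for a file like the one in the question (11 columns, roughly 74 million rows) and should be adjusted to the real data:

# Hypothetical column types for an 11-column file; replace with the actual classes.
cols <- c("integer", "numeric", "character", rep("numeric", 8))

df <- read.table("file.csv",
                 header       = TRUE,
                 sep          = ",",
                 colClasses   = cols,   # avoids storing numeric columns as character
                 nrows        = 75e6,   # mild over-estimate of the true row count
                 comment.char = "")     # faster than the read.table default ("#")

# For a purely numeric matrix, scan() is lighter-weight than read.table():
# m <- matrix(scan("file.csv", sep = ",", skip = 1), ncol = 11, byrow = TRUE)

Specifying colClasses and nrows up front lets R allocate roughly the right amount of memory once instead of growing and re-typing columns as it reads, which is usually where the allocation limit is hit.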