I am trying to read a large (~700 MB) .csv file into R.
The file contains an array of integers less than 256, with a header row and 2 header columns.
I use:
trainSet <- read.csv(trainFileName)
This eventually barfs with:
Loading Data...
R(2760) malloc: *** mmap(size=151552) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
R(2760) malloc: *** mmap(size=151552) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Error: cannot allocate vector of size 145 Kb
Execution halted
Looking at the memory usage, it conks out at about 3 GB on a 6 GB machine with zero page file usage at the time of the crash, so there may be another way to fix it.
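As a rough sanity check (just an illustration, not my actual data), R's integer vectors do seem to take 4 bytes per element, which is roughly consistent with a 700 MB text file blowing up to ~3 GB in memory:
object.size(integer(1e6))   # about 4,000,048 bytes, i.e. ~4 bytes per element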
If I use:
trainSet <- read.csv(trainFileName, header=TRUE, nrows=100)
classes <- sapply(trainSet, class)
I can see that all the columns are being loaded as "integer", which I think is 32 bits.
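The closest I've got is the usual trick of feeding the sampled classes back into the full read via colClasses (a sketch below, with trainFileName being my file), but that only avoids the type re-guessing and still stores every value as a 32-bit integer:
# Sketch: detect column classes on the first 100 rows, then reuse them for the full read.
sample <- read.csv(trainFileName, header=TRUE, nrows=100)
classes <- sapply(sample, class)
trainSet <- read.csv(trainFileName, header=TRUE, colClasses=classes)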
Clearly, using 3 GB to load part of a 700 MB .csv file is far from efficient. I wonder if there's a way to tell R to use 8-bit numbers for the columns? This is what I've done in the past in Matlab and it worked a treat; however, I can't find any mention of an 8-bit type in R.
Does it exist? And how would I tell read.csv to use it?
Thanks in advance for any help.