It sounds like memory mapping is the solution for you!
In essence, you map the file into memory so that you can access it piece by piece, much like indexing an array that lives on your hard drive. After that, depending on the sparsity of the matrix, you might want to switch to a sparse matrix, which hopefully fits in your RAM, so that you get RAM speeds and are no longer limited by HDD speeds.
Another solution would be to read the file line by line (or by some other delimited chunk) and put only the non-zero values in a sparse matrix.
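To make the memory-mapping idea a bit more concrete, here is a minimal sketch using memmapfile. Note that memmapfile works on binary data, so a CSV file would first have to be converted; the file name 'demo.bin', the field name 'x' and the tiny 4-by-3 test matrix are all made up for this illustration.

```matlab
% Sketch: memory-map a binary file of doubles and read only part of it.
% 'demo.bin' and the 4-by-3 test matrix are placeholders for illustration.
file_name = 'demo.bin';
A = reshape(1:12, 4, 3);          % stand-in for the real (huge) matrix

% memmapfile needs binary data, so write the matrix as raw doubles.
FID = fopen(file_name, 'w');
fwrite(FID, A, 'double');         % elements are written column by column
fclose(FID);

% Map the file. The data is only read from disk when it is indexed.
mm = memmapfile(file_name, 'Format', {'double', size(A), 'x'});

% Access a single column without loading the whole file.
col2 = mm.Data.x(:, 2);

% Once a block is in memory it can be converted to sparse.
S = sparse(mm.Data.x(:, 1:2));

% Release the mapping before deleting the demo file.
clear mm
delete(file_name);
```

For a real data set you would map the file once and then loop over blocks of columns, converting each block to sparse as you go.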
Kind regards,
Ernst Jan
OK, so when using the fgetl solution I get reasonable performance: approx. 10 s per 100 lines on my laptop.
% Start with a clean slate.
clear all
% Create a large data file.
m = 100; % Rows
n = 230000; % Columns
max_x = 10000;
% Assumed reconstruction (this line is missing from the post): random integers in [0, max_x].
x = round(max_x*rand(m,n));
% Create lots of zeros by setting everything smaller than 0.999*max_x to 0.
x(x < 0.999*max_x) = 0;
% Write data file ('data.csv' is a placeholder name, not from the post).
data_file = 'data.csv';
csvwrite(data_file, x);
% Now create a sparse matrix to put the csv file in:
P = sparse(m,n);
% Open data file
FID = fopen(data_file,'r');
% Set line number counter to 0
line_number = 0;
% Get the first line of the data file (230K numbers)
text_line = fgetl(FID);
% As long as a text line has been retrieved from the file, keep looping!
while ischar(text_line)
    % Increase line_number by 1 (MATLAB indexing starts at 1)
    line_number = line_number+1;
    % Parse the text line (I assume all integers, otherwise change the format %d to %f)
    C = textscan(text_line,'%d','delimiter',',','EmptyValue', 0);
    % Now the numbers are stored in cell C, which we should put in the
    % sparse matrix. With %d, textscan returns int32, so convert to double first.
    P(line_number,:) = double(C{1}); % Can be optimized, but fast enough for now!
    % And let's get the next line!
    text_line = fgetl(FID);
end
% Close the data file
fclose(FID);
So 230k lines should take about 5 to 10 hours (at 10 s per 100 lines that works out to roughly 23,000 s, or 6.4 hours).
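Regarding the row-by-row assignment into the sparse matrix: one common way to speed this up is to collect (row, column, value) triplets for the non-zero entries only and call sparse once at the end. Below is a minimal sketch of that idea, not a tuned implementation. The file name 'small.csv', its contents and the known column count n are assumptions made for this illustration.

```matlab
% Sketch: build the sparse matrix from triplets instead of row assignment.
% 'small.csv' and its 3-by-4 contents are placeholders for illustration.
file_name = 'small.csv';
csvwrite(file_name, [0 5 0 0; 0 0 0 7; 3 0 0 0]);
n = 4;                                 % Number of columns, assumed known

rows = {}; cols = {}; vals = {};       % One cell entry per line read
FID = fopen(file_name, 'r');
line_number = 0;
text_line = fgetl(FID);
while ischar(text_line)
    line_number = line_number + 1;
    % Parse as double (%f) so the values can go straight into sparse
    C = textscan(text_line, '%f', 'Delimiter', ',', 'EmptyValue', 0);
    v = C{1};
    j = find(v);                       % Column indices of the non-zeros
    rows{end+1} = repmat(line_number, numel(j), 1);
    cols{end+1} = j;
    vals{end+1} = v(j);
    text_line = fgetl(FID);
end
fclose(FID);

% Assemble the sparse matrix in one call
P = sparse(vertcat(rows{:}), vertcat(cols{:}), vertcat(vals{:}), line_number, n);
delete(file_name);
```

This way only the non-zero values are kept per line, and the sparse structure is built once instead of being modified on every iteration.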
Kind regards,
Ernst Jan