I have written code to plot data from very large .txt files (20 GB to 60 GB). Each file contains two columns of data, the outputs of two sensors from an experiment I ran. The files are this large because the data was recorded at 4M samples/s. The code works well on relatively small .txt files (10 GB), but when I try to plot my larger data files (60 GB) I get the following error message:

Attempted to access TIME(0); index must be a positive integer or logical.

Error in textscan_loop (line 17)
  TIME = ((TIME(end)+sample_rate):sample_rate:(sample_rate*(size(d,1)))+(TIME(end)));%shift Time along
The basic idea behind my code is to conserve RAM by reading Nlines lines of data from the .txt file on disk into the MATLAB variable C in RAM, plotting C, then clearing C. This happens in a loop, so the data is plotted in chunks until the end of the .txt file is reached. The code is below:

Nlines = 1e6; % number of lines to read per cycle
sample_rate = (1); % sample rate
DECE = 1000; % decimation factor

TIME = (0:sample_rate:sample_rate*((Nlines)-1)); % first instance of time vector
format = '%f\t%f';
fid = fopen('H:\PhD backup\Data/ONK_PP260_G_text.txt');

while(~feof(fid))

  C = textscan(fid, format, Nlines, 'CollectOutput', true);
  d = C{1};  % immediately clear C at this point; you need the memory!
  clearvars C ;
  TIME = ((TIME(end)+sample_rate):sample_rate:(sample_rate*(size(d,1)))+(TIME(end)));%shift Time along 
  plot((TIME(1:DECE:end)),(d(1:DECE:end,:)))%plot and decimate
  hold on;
  clearvars d;
end

fclose(fid);

I think the while loop completes around 110 cycles before the code stops executing and the error message is displayed. I know this because the graph shows around 110e7 data points and the loop processes 1e6 data points at a time.

If anyone knows why this error might be occurring, please let me know.

Cheers, Jim

Never seen a txt file of 20+G size... - herohuyongtao
@herohuyongtao The .txt files contain two columns of data, that represent the outputs of two sensors from an experiment that I did. The reason the data files are so large is that the data was recorded at 4M samples/s - James Archer
The error message clearly does not suggest any issue with the plot command... Did you try running the function with the plot line commented out? It seems more like your TIME channel gets messed up... - sebastian
Please check whether running the code with dbstop if error helps and if not, please describe all relevant variables. I now suspect the error is just caused by a flaw in the code rather than the limitations of plot. - Dennis Jaheruddin
@sebastian I ran the code without the plot command and indeed the same error occurs. My TIME vector gets all crunked up: it should be [1e6 x 1] but it ends up being [1 x 0] at the point of error. - James Archer

1 Answer

The error you encounter is in fact not in the plotting, but in the line the error message refers to: the one that updates TIME.

Though I have been unable to reproduce the exact error, I suspect it is related to this:

Time = 1:0   % an empty 1x0 vector
Time(end)    % 'end' is 0 for an empty vector, so this indexes Time(0)
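
An empty vector like this can come out of the update line in the question: if a chunk ever has zero rows, the range runs from TIME(end)+sample_rate up to TIME(end) and is therefore empty, and the next TIME(end) then fails. A minimal sketch, assuming an empty chunk (the 5-sample starting vector and the empty d are made up for illustration, and whether this is what happens with the real file is not confirmed):

```matlab
sample_rate = 1;
TIME = 0:sample_rate:sample_rate*(5-1);  % pretend the previous chunk had 5 rows
d = zeros(0, 2);                         % assumed: a chunk that came back with no rows

% Same update expression as in the question
TIME = ((TIME(end)+sample_rate):sample_rate:(sample_rate*(size(d,1)))+(TIME(end)));

disp(size(TIME))   % TIME is now empty, so the next TIME(end) would index TIME(0)
```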

In any case, the way forward is clear. You need to run this code with dbstop if error and observe all relevant variables in the line that throws the error.

From here you will likely figure out what is causing the problem, hopefully just something simple like your code being unable to deal with data size that is an exact multiple of 1000 or so.
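
If the debugger shows that textscan returned no rows (for example because it hit a line that does not match the format, which is only a guess here), one way to make the loop robust is to stop on an empty chunk and to build the time axis from a running row count instead of from the previous TIME. This is a sketch under those assumptions, not a confirmed fix; the temporary file, its contents, and the names n_read and n_chunks are made up for the demo:

```matlab
% Demo input: 10 rows of two tab-separated columns in a temporary file
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%f\t%f\n', [1:10; 11:20]);
fclose(fid);

Nlines = 4;        % rows per chunk (1e6 in the question)
sample_rate = 1;   % time step between samples
n_read = 0;        % running count of rows read so far
n_chunks = 0;      % number of non-empty chunks processed

fid = fopen(fname);
while ~feof(fid)
    C = textscan(fid, '%f\t%f', Nlines, 'CollectOutput', true);
    d = C{1};
    if isempty(d)
        % Nothing was parsed: stop here rather than build an empty TIME
        warning('No rows parsed at byte offset %d; stopping.', ftell(fid));
        break
    end
    % Time axis for this chunk, derived from the row count only
    TIME = (n_read + (0:size(d,1)-1)) * sample_rate;
    n_read = n_read + size(d,1);
    n_chunks = n_chunks + 1;
    % plot(TIME(1:DECE:end), d(1:DECE:end,:)); hold on;  % plotting as in the question
end
fclose(fid);
delete(fname);
```

The byte offset reported by ftell gives a place to start looking in the file if parsing stops early.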


Trying to use plot for big data is problematic, as MATLAB tries to plot every single data point.

Obviously the screen will not display all of these points (many will overlap), so it is recommended to plot only the relevant points. One could subsample manually, as you seem to have tried, but fortunately there is a ready-to-use solution for this:
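
For the manual route, taking every Nth sample can skip short spikes entirely. A common alternative is min/max decimation, which keeps the lowest and highest value in each bucket of samples. The sketch below is an illustration of that general technique on synthetic data, not the question's method and not a description of how any particular tool works; x, bucket, and the spike position are all made up:

```matlab
x = sin(linspace(0, 20*pi, 10000)).';   % stand-in for one sensor column
x(1234) = 5;                            % a one-sample spike
bucket = 1000;                          % samples per bucket (hypothetical)

n  = floor(numel(x) / bucket) * bucket; % drop an incomplete last bucket
xb = reshape(x(1:n), bucket, []);       % one bucket per column
lo = min(xb, [], 1);                    % lowest value in each bucket
hi = max(xb, [], 1);                    % highest value in each bucket
t  = ((0:size(xb,2)-1) + 0.5) * bucket; % bucket centres, in samples

naive = x(1:bucket:end);                % every 1000th sample, for comparison
% plot(t, lo, t, hi);                   % draws the envelope of the signal
```

Here the spike survives in hi but is absent from naive, because sample 1234 is not one of the samples picked by the fixed stride.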

The Plot (Big) File Exchange Submission

Here is the introduction:

This simple tool intercepts data going into a plot and reduces it to the smallest possible set that looks identical given the number of pixels available on the screen. It then updates the data as a user zooms or pans. This is useful when a user must plot a very large amount of data and explore it visually.

This works with MATLAB's built-in line plot functions, allowing the functionality of those to be preserved.

Instead of:

plot(t, x);

One could use:

reduce_plot(t, x);

Most plot options, such as multiple series and line properties, can be passed in too, such that 'reduce_plot' is largely a drop-in replacement for 'plot'.

h = reduce_plot(t, x(1, :), 'b:', t, x(2, :), t, x(3, :), 'r--*');

This function works on plots where the "x" data is always increasing, which is the most common, such as for time series.