0
votes

I have a very large text file containing mostly numerical data with headers (strings). It has 13 columns and ~49,000 rows. All cells contain a numerical value (no empty cells, no columns with different # of rows). It is data about the solar wind taken from a satellite. It looks like this:

year,day,hr,min, sec,  Np,     Tp,          Vx_gsm,    Vy_gsm,  Vz_gsm,   Bx_gsm.....
YYYYxDDDxHHxMMxSSSSSSSxNNNNNNNxTTTTTTTTTTTTxVVVVVVVVVVxVVVVVVVVxVVVVVVVVVxB
2000   1  0  0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060 
2000   1  0  1 47.0496 3.42400 -9999.90039  -9999.90039 .....

Now, the columns are actually perfectly aligned in the text file but I couldn't include all 13 columns here but that doesn't matter for now. Wherever there is a "-9999.90039", that means the value was corrupted for some reason. What I need to do is replace the "-9999.90039" values with NaN, so they won't be included in any calculations. I need to create arrays of the columns of cell data for Np and Tp (for now), initialize them to zero, and find the minimum and maximum values in each. This is a large file so I figured I needed to do this in blocks. Also, since I only need to access the values in the numerical cells of 1 column (per calculation) then I didn't need to change much and use textscan. This is what I've gotten so far:

 N = 10;% block size will have to be bigger but I wanted to test first
solarmax = fopen('ACE_magswe_64sec_2000.txt','r');
formatSpec = '%*d %*d %*d %*d %f %f %f %f %f %f %f %f %f';
% my prof said I need to initialize my variables by setting them to zero so
% I put this line below but I don't think it's right
k = 0;% not sure if this is necessary
while ~feof(solarmax)
    k = k+1;% not sure if this is necessary
    C = textscan(solarmax,formatSpec,N,'HeaderLines',2,'Delimiter','\t');
    function y = changeval(num)
    if (num('-9999.90039',num))
        y = 'NaN';
    end
end
fclose(solarmax);
Np = C{1,6};% not sure what to put here to call all values in that column
min(Np)
max(Np)
Tp = C{1,7};% same problem here
min(Tp)
max(Tp)

So, I put asterisks next to those columns I want ignored in FormatSpec and I used HeaderLines to ignore the first 2 rows. After, this I get confused on how I'm supposed to set this up (my only previous experience in programming was C++ in 2006!) Please help!!

UPDATED

Marcin offered me some great advice but I still have some issues that I need help with. Here is my code, now:

N = 1000;
solarmax = fopen('ACE_magswe_64sec_2000.txt','r+');
formatSpec = '%*d %*d %*d %*d %*f %f %f %f %f %f %f %f %f';

minNp = [];
maxNp = [];

minTp = [];
maxTp = [];

while ~feof(solarmax)
    C = textscan(solarmax,formatSpec,N,'HeaderLines',2,'Delimiter','\t');
    Np = cell2mat(C(:,1)); 
    Tp = cell2mat(C(:,2));

    Np(Np == -9999.90039) = NaN;
    Tp(Tp == -9999.90039) = NaN;

    minNp(end+1) = nanmin(Np);
    maxNp(end+1) = nanmax(Np); 

    minTp(end+1) = nanmin(Tp);
    maxTp(end+1) = nanmax(Tp);
end
fclose(solarmax);
nanmin(Np);
nanmax(Np);
nanmin(Tp);
nanmax(Tp);

When compiled and deleting the semicolons at the very end so I get values for the min and max functions, the values turn out to all be NaNs! I thought that the min and max commands already ignored the NaNs so I looked it up and nanmin/nanmax was suggested. However, this yielded the same results. Is there anything else I'm missing?

1
What language is this? It is tagged R, but this is not R code. - James King
I'm going to guess this is matlab based on the % comment character and the ~feof - MrFlick
Confirming this is MATLAB / Octave based on the use of cell arrays and using % for comments. - rayryeng
Oops! I'm sorry! Yes, it's MATLAB. - GreySky

1 Answers

0
votes

Maybe this will be helpful.

Example data based on your question. I made it longer so to have more than 10 rows.

year,day,hr,min, sec,  Np,     Tp,          Vx_gsm,    Vy_gsm,  Vz_gsm
YYYYxDDDxHHxMMxSSSSSSSxNNNNNNNxTTTTTTTTTTTTxVVVVVVVVVVxVVVVVVVVxVVVVVVVVVxB
2000   1  0  0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -17.01060 
2000   1  0  1 47.0496 3.42400 -9999.90039  -9999.90039 49.72259 -67.01060
2000   1  0  0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060 
2000   1  0  1 47.0496 3.42400 -9999.90039  -9999.90039 49.72259 -67.01060
2000   1  0  0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060 
2000   1  0  1 47.0496 3.42400 -9999.90039  -9999.90039 49.72259 -67.01060
2000   1  0  0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060 
2000   1  0  1 47.0496 3.42400 -9999.90039  -9999.90039 49.72259 -67.01060
2000   1  0  0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060 
2000   1  0  1 47.0496 3.42400 -9999.90039  -9999.90039 49.72259 -67.01060
2000   1  0  0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060 
2000   1  0  1 47.0496 3.42400 -9999.90039  -9999.90039 49.72259 -67.01060
2000   1  0  0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060 
2000   1  0  1 47.0496 3.42400 -9999.90039  -9999.90039 49.72259 -67.01060
2000   1  0  0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060 
2000   1  0  1 47.0496 3.42400 -9999.90039  -9999.90039 49.72259 -67.01060
2000   1  0  0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060 
2000   1  0  1 47.0496 3.42400 -9999.90039  -9999.90039 49.72259 -67.01060
2000   1  0  0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060 
2000   1  0  1 47.0496 3.42400 -9999.90039  -9999.90039 49.72259 -67.01060
2000   1  0  0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060 
2000   1  0  1 47.0496 3.42400 -9999.90039  -9999.90039 49.72259 -67.01060
2000   1  0  0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060 
2000   1  0  1 47.0496 3.42400 -9999.90039  -9999.90039 49.72259 -67.01060
2000   1  0  0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060 
2000   1  0  1 47.0496 3.42400 -9999.90039  -9999.90039 49.72259 -67.01060
2000   1  0  0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060 
2000   1  0  1 47.0496 3.42400 -9999.90039  -9999.90039 49.72259 -67.01060
2000   1  0  0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060 
2000   1  0  1 47.0496 3.42400 -9999.90039  -9999.90039 49.72259 -67.01060
2000   1  0  0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -267.01060 
2000   1  0  1 47.0496 3.42400 -99.90039  -9999.90039 49.72259 -167.01060

Modified code based on the example data and your description:

N = 10;% block size will have to be bigger but I wanted to test first

solarmax = fopen('ACE_magswe_64sec_2000.txt','r');

formatSpec = '%*d %*d %*d %*d %f %f %f %f %f %f';

minNp = []; % to store min values for each iteration of the while loop
maxNp = []; % to store max values for each iteration of the while loop

while ~feof(solarmax)    

    % read N=10 rows from the file
    C = textscan(solarmax,formatSpec, N, 'HeaderLines',2,'Delimiter','\t');

    % get Np and Tp column as row vectors
    Np = cell2mat(C(:,2)); 
    Tp = cell2mat(C(:,3)); 

    % chage -9999.90039 to NaN
    Np(Np == -9999.90039) = NaN;
    Np(Np == -9999.90039) = NaN;

    % calculate min or max values for each set ot N=10 values as you
    % did. Probably need to store them, so do this:        
    minNp(end+1) = min(Np);
    maxNp(end+1) = max(Np);        

    % the same do for Tp.
end
fclose(solarmax);

Results in one min and max value for each iteration of loop:

minNp =

    3.2740    3.2740    3.2740


maxNp =

    3.4240    3.4240    3.4240