
I have some data files that I would like to load into MATLAB. Unfortunately, they have a quite complex structure, at least compared to what I am used to. You should be able to download an old example here: https://www.dropbox.com/s/vbh6kl334c5zg1s/fn1_2.out (it opens fine in Notepad or WordPad).

The data files are based on synchrotron data and list the raw data, the regularized "raw" data, and the (indirect) Fourier transformed data together with the fit to the data. They also contain some statistics from the Fourier transformation.

I just need to quote the results from the statistics in my paper, so while it would be nice to plot some of the results, it is not strictly necessary. I do need, however, the raw and regularized data together with the fit, and the Fourier transformed data.

My problem

In the data file, the results from the statistical analysis are listed before the data I need, but the number of lines taken up by the statistics varies from data file to data file. This means that I cannot just treat the statistics as part of the header unless I manually change the number of header lines for each file I import. I need to analyze groups of 5 data files together, and around 30 files in total this time, so I would like to avoid that if possible. I will also need to load this kind of data file again in the future, so even if changing the number of header lines 30 times does not sound bad, it would be nice to be able to do it automatically.

Possible solution

Both the raw and regularized data together with the fit, and the Fourier transformed data, are preceded by a specific line. After this line and one blank line, the data begins.

So I thought that maybe I could use regular expressions to tell MATLAB to ignore everything until it sees this specific line, skip this line and one more, and then import the data.

I googled and found this topic, where regular expressions are used: Trying to parse a fairly complex text file

But I am new to regular expressions, and the suggested code is a bit complex for me. I can gather that he uses named capture, but I am not quite sure I understand how he uses it or whether I can adapt it to my needs. I have checked the official MATLAB documentation, but their examples are somewhat simpler :) (http://www.mathworks.se/help/matlab/matlab_prog/regular-expressions.html#bqm94nz-1)

Sorry for writing such a long post. Any suggestions on how to proceed with this problem will be greatly appreciated.

/Martin

EDIT

The code I have used, based on the link in the comment:

fileName = 'data.dat';
inputfile = fopen(fileName);

% Ignore all until we see one that just consists of this:
startString = '       R          P(R)      ERROR';

mydata = [];

while 1
 tline = fgetl(inputfile);

 % Break if we hit end of file, or the start marker
 if ~ischar(tline)  ||  strcmp(tline, startString)
    break
 end

 data = sscanf(tline, '%f', 3 );
 mydata(end+1,:) = data;

end
fclose(inputfile); 

When I run the code I get the error:

 Subscripted assignment dimension mismatch.

 mydata(end+1,:) = data;

Any suggestions will be greatly appreciated, and my apologies for the strange layout and for leaving the link in the comment. I am not allowed to include more than two links in a post and I cannot add a new answer yet, both because my rep is too low :)
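
A likely cause of the error, offered as a guess rather than a diagnosis: the loop above calls sscanf on every line before the marker, and header lines that do not yield exactly three numbers cannot be assigned as a row of a three-column matrix. A minimal sketch of the "skip until the marker, then read" idea follows. The demo file, its numbers, and the assumption that the table has three columns and ends at the first non-numeric line are all made up for illustration; only startString comes from the code above.

```matlab
% Build a small made-up file so the sketch runs on its own (not the real data).
startString = '       R          P(R)      ERROR';
demoLines = {'statistics header 1 2', 'more header text', startString, '', ...
             '0.0 0.10 0.01', '1.0 0.20 0.02', '2.0 0.15 0.03', '', 'trailing text'};
fileName = [tempname '.dat'];
fid = fopen(fileName, 'w');
fprintf(fid, '%s', strjoin(demoLines, char(10)));
fclose(fid);

fid = fopen(fileName, 'r');
if fid == -1
    error('Could not open %s', fileName);
end

% Phase 1: skip every line until the marker line has been read.
% strtrim makes the comparison tolerant of stray leading/trailing whitespace.
tline = fgetl(fid);
while ischar(tline) && ~strcmp(strtrim(tline), strtrim(startString))
    tline = fgetl(fid);
end

% Phase 2: parse the lines that follow the marker.
mydata = zeros(0, 3);
tline = fgetl(fid);
while ischar(tline)
    row = sscanf(tline, '%f', 3);        % column vector with up to 3 numbers
    if numel(row) == 3
        mydata(end+1, :) = row.';        % store as one row of the table
    elseif ~isempty(mydata)
        break                            % first non-numeric line after the table
    end                                  % otherwise: blank line before the table
    tline = fgetl(fid);
end
fclose(fid);
delete(fileName);                        % remove the demo file again
```

With a real file, the demo-file part would be dropped and fileName set to the actual path. If the marker is never found, mydata simply stays empty.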

I have found this method of ignoring everything until a specific line: link. I have tried to incorporate the solution, and the code is shown in the next comment. (Martin Nors Pedersen)

1 Answer


Since the blocks are separated by at least two newlines, you can use that to split the text into blocks and analyze them individually. Try this code:

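% Read the whole file into one character vector
fileH = fopen('fn1_2.out');
content = fscanf(fileH, '%c', inf);
fclose(fileH);

% Split the text into blocks at blank lines (assumes \r\n line endings)
splitstring = regexp(content, '\r\n\r\n', 'split');

% Keep blocks where a number of the form d.dddd occurs on at least two separate lines
blocks = regexp(splitstring, '\d\.\d{4}.*\r\n.*\d\.\d{4}','match');
numericBlocksIdx = find(cellfun(@(x) ~isempty(x), blocks));
numericBlocks = splitstring(numericBlocksIdx);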

Now numericBlocks{1}, numericBlocks{2}, ... contain the tables that you are interested in. Note that for some tables the header lines are also included, because they are not separated from the table by two newlines. From here you can use functions like textscan to read the data into matrices.
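
As a rough sketch of that last step, one option is to go through a block line by line with sscanf instead of textscan, so that any header lines still attached to the block are dropped. The block below is a made-up stand-in for one element of numericBlocks, with illustrative numbers. The names textLines, rows and data are hypothetical, and keeping only rows with the same column count as the first numeric row is an assumption about the layout.

```matlab
% Made-up stand-in for one element of numericBlocks (header line still attached).
block = sprintf(['       R          P(R)      ERROR\r\n' ...
                 '0.0000 0.1000 0.0100\r\n' ...
                 '1.0000 0.2000 0.0200\r\n']);

% Split into lines, accepting both \r\n and \n line endings.
textLines = regexp(block, '\r?\n', 'split');

rows = {};
for k = 1:numel(textLines)
    % errmsg is non-empty when part of the line could not be read as a number
    [vals, count, errmsg] = sscanf(textLines{k}, '%f');
    if count > 0 && isempty(errmsg)
        rows{end+1} = vals.';            % keep purely numeric lines as row vectors
    end
end

if isempty(rows)
    data = [];                           % no numeric lines in this block
else
    nCols = numel(rows{1});              % column count taken from the first row
    keep = cellfun(@numel, rows) == nCols;
    data = vertcat(rows{keep});          % stack matching rows into a matrix
end
```

In practice the same loop would run over every element of numericBlocks, with block = numericBlocks{i} in place of the made-up string.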