0
votes

My MATLAB code for data import gives different results for text files that appear to be similar. Input1 gives me a normal cell array with every line of the text file as an entry, which I can reference using {i}. Input2 gives me a scalar struct in which all numeric entries from the text file are placed in input.data. I want all files to be converted to regular cell entries, and I do not understand why some files are converted to scalar structs instead.

Code: input = importdata(strcat(direct,'\',filename));

Input1 example: correctly working data import, with the text file on the right. File link: https://drive.google.com/open?id=1aHK4xivqEqJEmaA8Dv8Y0uW5giG-Bbip

Input2 example: incorrectly working data import, with the text file on the right. File link: https://drive.google.com/open?id=1nzUj_wR1bNXFcGaSLGva6uVsxrk-R5vA

2
This seems to depend on the contents of your text files. Please provide the two data files in a minimal form to make your problem reproducible for others. - Georg W.
Please find the text files attached as Google Drive links. - TomNijbroek
You've added the [octave] tag; does GNU Octave give the same result? - Andy
@Andy: I tested it with Octave and got exactly the behaviour described in my answer. - Georg W.

2 Answers

2
votes

UTSL!

I'm guessing you are using GNU Octave, although your question says "Matlab".

In importdata.m around line 178, the code tries to automatically detect the delimiter for your data:

delim = regexpi (row, '[-+\d.e*ij ]+([^-+\de.ij])[-+\de*.ij ]','tokens', 'once');

If you run this against W40A0060; you get A as the delimiter, because there is a number both before and after it.

If you run this against W39E0016; you get {} (empty) as the delimiter, because the E could be part of a number in scientific notation and is therefore excluded from the delimiter candidates.
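
To reproduce this detection step outside importdata, the pattern quoted above can be run directly on the two strings. This is only a minimal sketch: the inputs are the two identifiers mentioned in this answer rather than the complete rows of the files, and the variable names are made up for illustration.

```octave
% Pattern quoted above from importdata.m, copied here for experimentation.
pat = '[-+\d.e*ij ]+([^-+\de.ij])[-+\de*.ij ]';

% The two identifiers discussed in this answer (not the complete file rows).
row1 = 'W40A0060;';
row2 = 'W39E0016;';

% regexpi is case-insensitive, so "E" is treated like "e" (scientific notation).
delim1 = regexpi (row1, pat, 'tokens', 'once');
delim2 = regexpi (row2, pat, 'tokens', 'once');

disp (delim1);            % delimiter candidate captured for row1
disp (isempty (delim2));  % nonzero when no delimiter candidate was found for row2
```

In row1 the A sits between two digit runs and is captured. In row2 the E is absorbed into the "number" part of the pattern, so nothing is left to capture.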

Solution:

You really should pass the correct delimiter to the importdata call rather than trusting that it will be detected automatically.

And if you just want the lines in a cell, use

strsplit (fileread ("W39E0016_Input2.txt"), "\n")
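
A slightly more defensive variant of the one-liner above, as a sketch: it splits on both Unix and Windows line endings and uses fullfile() instead of concatenating a backslash by hand. The temporary file and its contents are invented purely so the example can run on its own; in practice you would pass your own direct and filename.

```octave
% Create a small throwaway file so the example is self-contained.
% (The contents are made up; they do not mirror the real input files.)
direct   = tempdir ();
filename = 'example_input.txt';
fid = fopen (fullfile (direct, filename), 'w');
fprintf (fid, 'W39E0016;\r\n1,2,3\r\nsome text');
fclose (fid);

% Read the whole file as one string, then split it into lines.
% '\r?\n' handles both "\n" and "\r\n" line endings.
txt   = fileread (fullfile (direct, filename));
lines = regexp (txt, '\r?\n', 'split');

% Every line is now a cell entry, regardless of its content.
disp (numel (lines));
disp (lines{1});

% Clean up the throwaway file.
delete (fullfile (direct, filename));
```

Because no delimiter detection is involved, the result has the same shape for every file: a cell array of character vectors that can be indexed with {i}.
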
0
votes

Analysis

This does indeed look strange!

EDIT: The cause for this strange looking behaviour has been deciphered by @Andy (See his solution).

If you request all outputs of the importdata() function, you can see what happens when the data is read:

[dat1,del1,headerrows1]=importdata('Input1.txt')
[dat2,del2,headerrows2]=importdata('Input2.txt')

For your first file it recognizes 69 header rows and no delimiter:

del1 = []
headerrows1 =  69

while for your second file it recognizes only two header rows and a comma (,) as the delimiter:

del2 = ','
headerrows2 =  2

I cannot find an obvious reason in your files for this difference in interpretation.

Suggestion

Your data format is rather complex. It is not a simple table like one exported from Excel: it has multiple lines with a different number of fields per line and varying data types. importdata() is not designed for this type of data. I suggest writing a dedicated import function for this kind of file. Have a look at textread() as a starting point. You can use it to read the lines of the file as text and then interpret them with sscanf(), or use strsplit() to split each line into fields.
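
As a rough illustration of this read-then-interpret approach, the sketch below takes lines that have already been read as text, splits each one into fields, and converts the lines that are purely numeric. The sample lines and the comma delimiter are assumptions made for the example; the real files may need a different delimiter and per-line rules.

```octave
% Hypothetical sample lines; in practice these come from reading the file as text.
lines = {'Header text', '1.5,2,3e2', 'W39E0016;'};

parsed = cell (size (lines));
for k = 1:numel (lines)
  fields = strsplit (lines{k}, ',');   % assumed delimiter: comma
  values = str2double (fields);        % NaN for fields that are not numeric
  if ~any (isnan (values))
    parsed{k} = values;                % purely numeric line -> numeric row vector
  else
    parsed{k} = fields;                % otherwise keep the text fields as-is
  end
end

disp (parsed{2});
```

Keeping the result as one cell entry per line preserves the original line order, so header lines and numeric lines can still be told apart afterwards.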