I'm trying to write a function to read multiple (1000+) text files ('.txt') into MATLAB. A snippet of one file is shown below. The actual file has the same columns but with ~150 000 rows.

Start, Serial, DeviceId, RunNumber, Date, Real, Elapsed, X, EcgVal, EcgStatus, CapnoVal, CapnoStatus, P1Val, P1Status, P2Val, P2Status, P3Val, P3Status, Spo2Val, Spo2Status, CprDepth, CprFrequency, CprStatus, CprWaveVal, FiltEcgVal, FiltEcgStatus, Ecg2Val, Ecg2Status, Ecg3Val, Ecg3Status, Ecg4Val, Ecg4Status
2013-01-01 23:51:12, 00017711, TEMS ACP272, , 01-01-2013, 23:51:12.000, 00:00:00.000, 41275.993889, 0.000000, -1, 0.000000, 0, 0.000000, 0, 0.000000, 0, 0.000000, 0, 0.000000, 0, 0.000000, 0, 0, 0.000000, 0.000000, 1, 0.000000, 1, 0.000000, 1, 0.000000, 1
2013-01-01 23:51:12, 00017711, TEMS ACP272, , 01-01-2013, 23:51:12.008, 00:00:00.008, 41275.993889, 0.000000, -1, 0.000000, 0, 0.000000, 0, 0.000000, 0, 0.000000, 0, 0.000000, 0, 0.000000, 0, 0, 0.000000, 0.000000, 1, 0.000000, 1, 0.000000, 1, 0.000000, 1
2013-01-01 23:51:12, 00017711, TEMS ACP272, , 01-01-2013, 23:51:12.016, 00:00:00.016, 41275.993889, 0.000000, -1, 0.000000, 0, 0.000000, 0, 0.000000, 0, 0.000000, 0, 0.000000, 0, 0.000000, 0, 0, 0.000000, 0.000000, 1, 0.000000, 1, 0.000000, 1, 0.000000, 1
2013-01-01 23:51:12, 00017711, TEMS ACP272, , 01-01-2013, 23:51:12.024, 00:00:00.024, 41275.993889, 0.000000, -1, 0.000000, 0, 0.000000, 0, 0.000000, 0, 0.000000, 0, 0.000000, 0, 0.000000, 0, 0, 0.000000, 0.000000, 1, 0.000000, 1, 0.000000, 1, 0.000000, 1

I've tried the obvious approaches (csvread, dlmread, importdata) without success. When I open this file using the importdata function I get:

þS

followed by 5 blank lines. Using

fid = fopen('TEST.txt','r');
fgetl(fid)

I find that there is an empty row between each data row and that there is a space between each character.

I've also tried using the textscan function as follows

fid = fopen('TEST.txt','r');
c = textscan(fid, '%s', 'Delimiter', ',')

but this returns an empty cell.

An alternative that does work is to open the file in Excel and save it as a CSV file. However, given that I am trying to do this for 1000+ files, this is not feasible.

Any comments, suggestions, or advice are greatly appreciated. Thank you!

UPDATE:

The following seems to work:

data = textscanu('TEST.txt');
str=textscan(data{1},'%s','Delimiter',',')

I will try to write this up in general to read the entire file, skip blank lines, and organize all the columns.
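
One way that plan might look using only built-in functions is sketched below. It assumes the files are UTF-16 (which the two stray leading characters and the "space" between each character suggest, but which has not been confirmed), that a missing byte-order mark means little-endian, and that every row has the same number of comma-separated fields. The name readUtf16Csv is made up for this sketch.

```matlab
function rows = readUtf16Csv(filename)
%READUTF16CSV Read a comma-separated UTF-16 text file into an N x M cell array.
%   Hypothetical helper, not from the original post. Assumes every data
%   row has the same number of comma-separated fields.

    % Read the raw bytes so that no character conversion happens yet.
    fid = fopen(filename, 'r');
    if fid == -1
        error('readUtf16Csv:open', 'Cannot open %s', filename);
    end
    bytes = fread(fid, Inf, '*uint8')';
    fclose(fid);

    % Pick the byte order from the byte-order mark, if there is one.
    encoding = 'UTF-16LE';                         % assumed default
    if numel(bytes) >= 2
        if bytes(1) == 255 && bytes(2) == 254      % FF FE: little-endian
            bytes = bytes(3:end);
        elseif bytes(1) == 254 && bytes(2) == 255  % FE FF: big-endian
            encoding = 'UTF-16BE';
            bytes = bytes(3:end);
        end
    end
    txt = native2unicode(bytes, encoding);

    % Split into lines and drop the blank ones.
    lines = strtrim(regexp(txt, '\r\n|\n|\r', 'split'));
    lines = lines(~cellfun('isempty', lines));

    % Skip the header, then split each row on commas and trim the padding.
    lines = lines(2:end);
    fields = cellfun(@(s) strtrim(regexp(s, ',', 'split')), lines, ...
                     'UniformOutput', false);
    rows = vertcat(fields{:});
end
```

Calling rows = readUtf16Csv('TEST.txt') should then give one cell per field, with the empty RunNumber field kept as an empty string so the columns stay aligned with the header.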

So you want to store everything from the second line onward, i.e. those 32 columns, in an N x 32 cell array for each text file? - Divakar
Yes, that's fine. I'm not super picky about the format - I can reformat/organize the data once it's imported. - DrDunkenstein
How do you save your txt files to begin with? Is it possible they are encoded with 16 bits per character instead of 8? Some Unicode encoding, perhaps? - Shai
The text files are downloaded from a machine (a defibrillator, in fact), so I have no control over how they are saved. It is entirely possible that they are in Unicode or have some strange encoding. - DrDunkenstein
So, it appears there are 33 entries in each row, please make sure of that. - Divakar

1 Answer

Approach #1: With importdata -

%// Import the text as a cell array of lines, assuming file1 is the path to the text file
data = importdata(file1,'');

%// Split each line on spaces, skipping the header line
split_data = cellfun(@(x) strsplit(x,' '), data(2:end), 'UniformOutput', false);

%// Gather the data into an N x number_of_entries cell array
out_data = vertcat(split_data{:});

%// Remove the comma after each entry (if so desired)
out_data = cellfun(@(x) strrep(x,',',''), out_data, 'UniformOutput', false);

%// Remove the sixth column, which held only the comma of the empty RunNumber field
out_data(:,6) = [];
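
To see why the sixth column is dropped, here is the same split applied to a shortened copy of the first sample row (the first six fields only, for illustration). Because Start and DeviceId each contain a space, they become two tokens each, and the empty RunNumber field is left as a lone comma in position six. The variable names are just for this example.

```matlab
% First six fields of the first sample row from the question
row = '2013-01-01 23:51:12, 00017711, TEMS ACP272, , 01-01-2013, 23:51:12.000';

% Splitting on spaces gives 8 tokens; the 6th is the lone ','
tokens = strsplit(row, ' ');

% Stripping commas turns that token into an empty string
cleaned = strrep(tokens, ',', '');

% Dropping it leaves 7 tokens, with the date now in position 6
cleaned(6) = [];
```

Note that this leaves the date and time of Start, and the two words of DeviceId, in separate columns.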

Approach #2: With textscan -

%// Read the whole file, one line per cell, skipping the header line;
%// assuming file1 is the path to the text file
fileID = fopen(file1,'r');
onecell_data = textscan(fileID,'%s','Delimiter','\n','HeaderLines',1);
fclose(fileID);

%// Unpack one level to get an N x 1 cell array of lines
data = [onecell_data{:}];

%// Split each line on spaces (the header was already skipped by 'HeaderLines',
%// so indexing with data(2:end) here would drop the first data row)
split_data = cellfun(@(x) strsplit(x,' '), data, 'UniformOutput', false);

%// Gather the data into an N x number_of_entries cell array
out_data = vertcat(split_data{:});

%// Remove the comma after each entry (if so desired)
out_data = cellfun(@(x) strrep(x,',',''), out_data, 'UniformOutput', false);

%// Remove the sixth column, which held only the comma of the empty RunNumber field
out_data(:,6) = [];
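
Since the question is about 1000+ files, either approach could be wrapped in a function and applied to every '.txt' file in a folder. A minimal sketch, assuming all files sit in one folder; readAllTxt and its reader argument are hypothetical names, not part of MATLAB:

```matlab
function out = readAllTxt(folder, reader)
%READALLTXT Apply READER to every .txt file in FOLDER.
%   Hypothetical helper. READER is a function handle that takes a file
%   path and returns that file's data, for example one of the approaches
%   above wrapped in a function.

    listing = dir(fullfile(folder, '*.txt'));
    out = cell(numel(listing), 1);          % one cell per file
    for k = 1:numel(listing)
        file1 = fullfile(folder, listing(k).name);
        out{k} = reader(file1);
    end
end
```

Usage would be something like all_data = readAllTxt(folder, @myReader), where myReader is whichever per-file routine ends up working for these files.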