How do I store Matlab arrays located in a 'struct within struct within struct' into a database so that I can then retrieve the fields and arrays?

More detail on why I need this below:

I have tons of data saved as .mat files. The hassle is that I need to load a complete .mat file before I can begin manipulating and plotting the data in it. If the file is large, just loading it into memory becomes quite a task.

These .mat files result from the analysis of raw electrical measurement data of transistors. All .mat files have the same structure, but each file corresponds to a different, unique transistor.

Now say I want to compare a certain parameter across all transistors that are common to A and B. I have to manually search for and load all the .mat files I need and then try to do the comparison. There is no simple way to merge all of these .mat files into a single .mat file (since they all have the same variable names but with different data). Even if that were possible, there is no way I know of to query specific entries from .mat files.

I do not see a way of easily doing that without a structured database from which I can query specific entries. Then I could use any programming language (continue with Matlab or switch to Python) to conveniently do the comparison, plotting, etc. without the hassle of the scattered .mat files.

The problem is that the data in the .mat files is organised in structs and large arrays. From what I know, storing that in a simple SQL database is not a straightforward task. I looked into HDF5, but from the examples I saw, I would need a lot of low-level commands to store those structs in an HDF file, and I am not sure whether I can load parts of the HDF file into Matlab/Python or whether I also have to load the whole file into memory first.

The goal here is to merge all existing (and to-be-created) .mat files (with their compound data structure of structs and arrays) into a single database file from which I can query specific entries. Is there a database solution that can preserve the structure of my complex data? Is HDF the way to go? Or is there a simple solution I am missing?

EDIT:

Example of the data I need to save and retrieve:

All(16).rf.SS(3,2).data

Here All is an array of structs with 7 fields. The rf field of each element is itself a struct containing arrays, integers, strings and structs. One of those structs is named SS, which in turn is an array of structs, each containing a 2x2 array named data.

Outside the Matlab world, this is known as an ORM, or Object Relation Mapper. - MSalters
Thanks @MSalters! Looked it up, couldn't find something related to Matlab....Would it be logical to import the .mat files into python then saving it with Object Oriented Databases? Then continue using python from there? - Ahmad Khaled
Not an expert here, but it makes sense. Python (with SciPy/NumPy) is serious competition for MatLab, precisely because it integrates better with the rest of the world. - MSalters
True...I am bound to the .mat files though for now...I will lookup a. How to import .mat files into python and b. How to save those imported .mat files with a python ORM...If I can get both right, that would solve the problem. - Ahmad Khaled
Can you show us an example of the 'struct within struct within struct' data? Going to Python may make it possible to store and retrieve the .mat files but it doesn't mean you can search and filter on the values of fields within those files. If it's important that you can do that, you may be better off focusing on reorganising the data in MATLAB so that you can either use native data structures (see my answer) or get it into a database-friendly table format. - nekomatic

1 Answer

Merge .mat files into one data structure

In general it's not correct that "There is no simple way to merge ... .mat files into a single .mat file (since they all have the same variable names but with different data)".

Let's say you have two files, data1.mat and data2.mat and each one contains two variables, a and b. You can do:

>> s = load('data1')
s = 
  struct with fields:

    a: 'foo'
    b: 3

>> s(2) = load('data2')
s = 
  1×2 struct array with fields:
    a
    b

Now you have a struct array (see note below). You can access the data in it like this:

>> s(1).a
ans =
    'foo'

>> s(2).a
ans =
    'bar'

But you can also get all the values at once for each field, as a comma-separated list, which you can assign to a cell array or matrix:

>> s.a
ans =
    'foo'
ans =
    'bar'

>> allAs = {s.a}
allAs =
  1×2 cell array
    {'foo'}    {'bar'}

>> allBs = [s.b]
allBs =
     3     4

Note: Annoyingly, it seems you have to create the struct with the correct fields before you can assign to it using indexing. In other words

s = struct;
s(1) = load('data1')

won't work, but

s = struct('a', [], 'b', [])
s(1) = load('data1')

is OK.
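The same idea extends to a whole folder of files. The following is only a sketch: it writes two small demo files to a temporary folder so that it runs on its own, and the folder and file names are made up for illustration. It seeds the struct array from the first file, which sidesteps the pre-allocation issue in the note above, and it assumes every file contains the same variables.

```matlab
% Demo data only: two small files in a temporary folder (hypothetical names).
folder = tempname;
mkdir(folder);
a = 'foo'; b = 3; save(fullfile(folder, 'data1.mat'), 'a', 'b');
a = 'bar'; b = 4; save(fullfile(folder, 'data2.mat'), 'a', 'b');

% List every .mat file in the folder.
files = dir(fullfile(folder, '*.mat'));

% Seed the struct array with the first file so the field names are known,
% then append one element per remaining file.
s = load(fullfile(folder, files(1).name));
for k = 2:numel(files)
    s(k) = load(fullfile(folder, files(k).name));
end

% Collect one field across all files.
allBs = [s.b];
```

The merged array can then be written back to a single file with save, although loading that file still reads everything into memory.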

Build an index to the .mat files

If you don't need to be able to search on all of the data in each .mat file, just certain fields, you could build an index in MATLAB containing just the relevant metadata from each .mat file plus a reference (e.g. filename) to the file itself. This is less robust as a long-term solution as you have to make sure the index is kept in sync with the files, but should be less work to set up.
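As a rough sketch of such an index, the code below loads only one variable from each file (load accepts a list of variable names), stores it next to the file path, and then loads the full file only for entries that match a query. The demo files, the choice of a as the metadata field, and the name index are assumptions made for illustration.

```matlab
% Demo data only: two small files in a temporary folder (hypothetical names).
folder = tempname;
mkdir(folder);
a = 'foo'; b = 3; save(fullfile(folder, 'data1.mat'), 'a', 'b');
a = 'bar'; b = 4; save(fullfile(folder, 'data2.mat'), 'a', 'b');

files = dir(fullfile(folder, '*.mat'));

% Empty struct array with the fields the index should hold.
index = struct('file', {}, 'a', {});
for k = 1:numel(files)
    fname = fullfile(folder, files(k).name);
    meta = load(fname, 'a');     % read only variable a, not the whole file
    index(k).file = fname;       % reference back to the source file
    index(k).a = meta.a;
end

% Query the index, then load the full contents of matching files only.
hits = index(strcmp({index.a}, 'bar'));
full = load(hits(1).file);
```

The index itself is small, so it can be saved to its own .mat file and rebuilt whenever files are added or changed.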

Flatten the data structure into a database-compatible table

If you really want to keep everything in a database, then you can convert your data structure into a tabular form where any multi-dimensional elements such as structs or arrays are 'flattened' into a table row with one scalar value per (suitably-named) table variable.

For example if you have a struct s with fields s.a and s.b, and s.b is a 2 x 2 matrix, you might call the variables s_a, s_b_1_1, s_b_1_2, s_b_2_1 and s_b_2_2 - probably not the ideal database design, but you get the idea.
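A minimal sketch of that naming scheme is shown below. flattenValue and mergeInto are hypothetical helpers, not taken from the linked submissions. The sketch assumes arrays are at most two-dimensional, silently drops empty arrays, and does not check that the generated names stay within MATLAB's field-name length limit.

```matlab
function flat = flattenValue(value, name)
% FLATTENVALUE  Flatten a nested value into a scalar struct of leaf fields.
%   Hypothetical helper. Field names and (row, column) subscripts are
%   appended to NAME with underscores, e.g. s_b_2_1.
    flat = struct();
    if isstruct(value) && isscalar(value)
        % Scalar struct: recurse into each field.
        keys = fieldnames(value);
        for i = 1:numel(keys)
            child = [name '_' keys{i}];
            flat = mergeInto(flat, flattenValue(value.(keys{i}), child));
        end
    elseif ischar(value) || isscalar(value)
        % Leaf: a string or a single value is stored as-is.
        flat.(name) = value;
    else
        % 2-D array (numeric, cell or struct): one entry per element.
        for r = 1:size(value, 1)
            for c = 1:size(value, 2)
                child = sprintf('%s_%d_%d', name, r, c);
                flat = mergeInto(flat, flattenValue(value(r, c), child));
            end
        end
    end
end

function target = mergeInto(target, source)
% Copy every field of SOURCE into TARGET.
    keys = fieldnames(source);
    for i = 1:numel(keys)
        target.(keys{i}) = source.(keys{i});
    end
end
```

With s.a = 'x' and s.b = [1 2; 3 4], calling flattenValue(s, 's') returns a struct with the fields s_a, s_b_1_1, s_b_1_2, s_b_2_1 and s_b_2_2. Each flattened struct corresponds to one table row, which struct2table(flat, 'AsArray', true) should be able to convert.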

You should be able to adapt the code in this answer and/or the MATLAB File Exchange submissions flattenstruct2cell and flatten-nested-cell-arrays to suit your needs.