Hi there, MATLAB gurus!

I started learning MATLAB about a month ago (after my trial license expired I switched to Octave). I'm writing a function (purely for educational purposes) that calculates entropy (e.g. in the leaves of a decision tree), and I'm stuck. I get the error below:

>> my_entropy(cell3, false)
f = -0
f =

  -0  -0

f =

  -0  -0   3

error: my_entropy: A(I,J): column index out of bounds; value 2 out of bound 1
error: called from:
error:   C:\big data\octave\my_entropy.m at line 29, column 13

Updated 5.04.15 following @Daniel's suggestion

# The main difference between the entropy function bundled with MATLAB
# and this custom function is that the bundled one converts its input to uint8
# and is used mostly for signal processing,
# while I simply use a straightforward solution, useful e.g. for learning trees

function f = my_entropy(data, weighted)
  # the function accepts only cell arrays;
  # weighted tells whether to return one weighted average entropy
  # or a vector of entropies, one per bucket
  # moreover, I treat vectors as the only representation of "buckets"
  # in other words, vector = bucket (leaf of a decision tree)
  if nargin < 2
    weighted = true;
  end;

  rows = @(x) size(x,1);
  cols = @(x) size(x,2);

  if weighted
    f = 0;
  else
    f = [];
  end;

  for r = 1:rows(data)

    for c = 1:cols(data{r}) # in most cases this will be 1:1

      omega = sum(data{r,c});
      epsilon = 0;

      for b = 1:cols(data{r,c})
        epsilon = epsilon + ( (data{r,c}(b) / omega) * (log2(data{r,c}(b) / omega)) );
      end;

      entropy = -epsilon;

      if weighted
        f = f + entropy
      else
        f = [f entropy]
      end;

    end;

  end;

end;

# test cases

cell1 = { [16];[16];[2 2 2 2 2 2 2 2];[12];[16] }
cell2 = { [16],[12];[16],[2];[2 2 2 2 2 2 2 2],[8 8];[12],[8 8];[16],[8 8] }
cell3 = { [16];[16];[2 2 2 2 2 2 2 2];[12];[16] }

For input

c = { [16];[16];[2 2 2 2 2 2 2 2];[12];[16] }

the answer of my_entropy(c, false) should be

[0, 0, 3, 0, 0]
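
As a quick sanity check on those expected values, here is a minimal sketch that computes the entropy of each bucket on its own, without the nested loops. The name `bucket_entropy` is a hypothetical helper introduced only for this illustration, and the sketch assumes every count is strictly positive (a zero count would give NaN from 0 * log2(0)).

```octave
# Hypothetical helper: Shannon entropy (in bits) of one bucket of counts.
# Assumes all counts are strictly positive.
bucket_entropy = @(counts) -sum((counts / sum(counts)) .* log2(counts / sum(counts)));

c = { [16]; [16]; [2 2 2 2 2 2 2 2]; [12]; [16] };

# cellfun applies the helper to every bucket; c is 5x1, so transpose to a row
expected = cellfun(bucket_entropy, c)';
disp(expected)
```

A bucket with a single kind of item has probability 1 and therefore entropy 0, while eight equally likely kinds give log2(8) = 3 bits.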

This picture can help to visualize it:

[Image: Marbles as data]

Each bucket is one MATLAB vector, the whole pallet is a MATLAB cell array, and the numbers are the counts of each distinct kind of data. So, in this picture the middle cell {2,2} has entropy 3, while the other buckets (cells) have entropy 0.

Any suggestions on how to fix it are appreciated. Best regards! :)

1 Answer

The error is here: for c = 1:cols(cell{r})

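You want the number of columns of cell, which is cols(cell). What you wrote returns the number of columns of the r-th element of cell. For the third bucket that is 8, so c runs past 1 and indexing {3,2} in a 5x1 cell array fails with the "column index out of bounds" error.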
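
To make the difference concrete, here is a small sketch using the question's test input. It uses the variable name `data` from the updated question rather than `cell`; `n_cell_cols` and `n_bucket_cols` are names made up for this illustration.

```octave
cols = @(x) size(x, 2);
data = { [16]; [16]; [2 2 2 2 2 2 2 2]; [12]; [16] };

n_cell_cols   = cols(data);      # columns of the 5x1 cell array itself
n_bucket_cols = cols(data{3});   # columns of the vector stored in row 3

printf("cell array columns: %d, bucket 3 columns: %d\n", n_cell_cols, n_bucket_cols);
```

The inner loop should run over the first number, since the second one is the length of a bucket and is already handled by the innermost loop over b.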

You should avoid variable names that shadow built-in functions, such as cell.
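
Putting both points together, a corrected version could look roughly like the sketch below. This is not the asker's final code: the name `my_entropy_fixed` is hypothetical, the inner loop over b is replaced by a vectorized sum, and the sketch assumes every bucket is a row vector of strictly positive counts. When weighted is true it returns the plain sum of the per-bucket entropies, as the running sum in the original loop does.

```octave
1;  # Octave idiom: marks this file as a script so a function can be defined inline

function f = my_entropy_fixed(data, weighted)
  # data: cell array whose entries are row vectors of strictly positive counts
  if nargin < 2
    weighted = true;
  end

  f = [];
  for r = 1:size(data, 1)
    for c = 1:size(data, 2)          # columns of the cell array, not of data{r}
      p = data{r,c} / sum(data{r,c}); # relative frequencies within the bucket
      f(end+1) = -sum(p .* log2(p));  # entropy of this bucket, in bits
    end
  end

  if weighted
    f = sum(f);                       # plain sum, as in the original loop
  end
end

c = { [16]; [16]; [2 2 2 2 2 2 2 2]; [12]; [16] };
result = my_entropy_fixed(c, false);
disp(result)
```

With the 5x2 test case from the question the same function returns ten values, ordered row by row.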