3
votes

I'm doing a research project on the random forest algorithm. I have found numerous implementations of it, but the core of the code is often written in Fortran, which I have no experience with.

I have to edit the code, change the main parameters (tree depth, number of feature variables, ...) and trace the algorithm's performance during each run.

Currently I'm using "Windows-Precompiled-RF_MexStandalone-v0.02-". The train and predict functions are MATLAB MEX files and cannot be opened or edited. Can anyone advise me on what to do, or is there a reliable, pure-MATLAB implementation of random forests?


I've read through randomforest-matlab carefully. Unfortunately, the main training part is a DLL file. After reading more, most of my questions are now resolved. My main question was how to run several trees simultaneously.
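On running several trees simultaneously: each tree in a forest is trained independently on its own bootstrap sample, so the training calls can simply be handed to a worker pool. Below is a minimal Python sketch of that idea, not taken from any of the packages mentioned here. The names train_stump, train_forest and predict_majority are made up for illustration, and a one-split "stump" stands in for a real tree learner. It uses threads to stay short; for CPU-bound pure-Python training you would likely swap in ProcessPoolExecutor.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def most_common_label(labels, default):
    """Majority label of a list, or `default` when the list is empty."""
    return Counter(labels).most_common(1)[0][0] if labels else default


def train_stump(task):
    """Fit a one-split stump on a bootstrap sample (stand-in for a full tree)."""
    X, y, seed = task
    rng = random.Random(seed)                       # one seed per tree
    n = len(X)
    idx = [rng.randrange(n) for _ in range(n)]      # bootstrap sample
    f = rng.randrange(len(X[0]))                    # one random feature
    fallback = most_common_label([y[i] for i in idx], None)
    best = None
    for t in sorted({X[i][f] for i in idx}):        # candidate thresholds
        left = most_common_label([y[i] for i in idx if X[i][f] <= t], fallback)
        right = most_common_label([y[i] for i in idx if X[i][f] > t], fallback)
        errors = sum((left if X[i][f] <= t else right) != y[i] for i in idx)
        if best is None or errors < best[0]:
            best = (errors, f, t, left, right)
    return best[1:]                                 # (feature, threshold, left, right)


def train_forest(X, y, n_trees=8, workers=4):
    """Train the stumps concurrently; no stump depends on any other."""
    tasks = [(X, y, seed) for seed in range(n_trees)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(train_stump, tasks))   # map keeps task order


def predict_majority(forest, x):
    """Majority vote over all stumps."""
    votes = [left if x[f] <= t else right for f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]


X = [[i, i + 10] for i in range(10)]
y = [0] * 5 + [1] * 5
forest = train_forest(X, y)
```

Because every task carries its own seed, the result does not depend on which worker finishes first.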


4 Answers

5
votes
2
votes

If you're doing a research project on it, the best thing is probably to implement the individual tree training yourself in C and then write MEX wrappers. I'd start with an ID3 tree (before attempting C4.5, for instance). Then write the random forest code itself, which isn't all that hard once the tree code is done.

You'll:

  1. learn a lot
  2. be able to modify them as much as you like
  3. eventually move on to exploring new areas with them
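To give a feel for how little code the tree-then-forest structure needs, here is a rough Python sketch (Python rather than C only to keep it short). It assumes categorical features and class labels that are not tuples. The function names and the max_depth and n_features parameters are my own choices for illustration, not any library's API.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def build_tree(rows, labels, features, max_depth, n_features, rng):
    """ID3-style tree: a leaf is a label, a node is (feature, children, default)."""
    majority = Counter(labels).most_common(1)[0][0]
    if max_depth == 0 or len(set(labels)) == 1 or not features:
        return majority

    def gain(f):
        # Information gain from splitting on feature f.
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[f], []).append(label)
        rest = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
        return entropy(labels) - rest

    # Random forest twist: only a random subset of features is considered.
    candidates = rng.sample(features, min(n_features, len(features)))
    best = max(candidates, key=gain)
    remaining = [f for f in features if f != best]
    children = {}
    for value in {row[best] for row in rows}:
        keep = [i for i, row in enumerate(rows) if row[best] == value]
        children[value] = build_tree([rows[i] for i in keep],
                                     [labels[i] for i in keep],
                                     remaining, max_depth - 1, n_features, rng)
    return (best, children, majority)


def predict_tree(tree, row):
    """Walk down the tree; unseen feature values fall back to the node majority."""
    while isinstance(tree, tuple):
        feature, children, default = tree
        tree = children.get(row[feature], default)
    return tree


def build_forest(rows, labels, n_trees=25, max_depth=3, n_features=1, seed=0):
    """Train each tree on its own bootstrap sample of the data."""
    rng = random.Random(seed)
    features = list(range(len(rows[0])))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx],
                                 features, max_depth, n_features, rng))
    return forest


def predict_forest(forest, row):
    """Majority vote over the trees."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]


rows = [("sunny", "hot"), ("sunny", "mild"),
        ("rainy", "hot"), ("rainy", "mild")] * 5
labels = ["no" if outlook == "sunny" else "yes" for outlook, _ in rows]
forest = build_forest(rows, labels)
```

Since max_depth and n_features are plain arguments here, changing them between runs and recording the accuracy of each run is a simple loop around build_forest.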

I've implemented them myself from scratch so I can help once you post some of your own code. But I don't think anybody on this site will write the code for you.

Will it take effort? Yes. Will you come out of it with more knowledge and ability than you had going in? Undoubtedly.

1
votes

There is a nice library in R called randomForest. It is based on Breiman's original Fortran implementation, but most of it has since been rewritten in C.

http://cran.r-project.org/web/packages/randomForest/index.html

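The main parameters you mention are exposed as function arguments: mtry sets the number of features tried at each split and ntree the number of trees. Tree size is limited through nodesize and maxnodes rather than through an explicit depth setting.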

0
votes

Another library I would recommend is Weka. It is Java-based and easy to read. Its performance is slightly behind R's, though. The source code can be downloaded from http://www.cs.waikato.ac.nz/ml/weka/