3
votes

In my project I need to use classifiers to predict one of 8 classes from 6 input values. I have to compare all the supervised learning classifiers on a device that runs only C++ code. So I use Python to train/fit the machine learning models, but I need to extract the final formula for each classifier so that I can run it in C++. Is there any way to get these formulas/code from the model?

The machine learning algorithms used:

  1. Support Vector Machines
  2. Naive Bayes
  3. Linear regression
  4. Linear discriminant analysis
  5. Decision trees
  6. K-nearest neighbor algorithm
  7. Logistic regression
  8. Neural networks
  9. Gradient Boosting Algorithms
  10. Random Forest.
What ML framework are you using in Python? You might be able to run the model as-is on the device by exporting it and loading it in the C++ executable. – Frank
The ML framework is scikit-learn. Actually, I want to run it on an ultra-low-power device, so I need to use the formula itself, and perhaps optimize it. – Majd Addin
I don't think scikit-learn has a good mechanism to export to a compilable language. You would be much better off using something like TensorFlow, which would allow you to export the models and load them using the C++ API. Re-implementing all these algorithms in a way that performs reasonably on a low-power device would be a LOT of work. – Frank
You are right, but I think just evaluating the classifier (which is a kind of formula) shouldn't be that complicated. For example, the neural network formula is roughly sum += weights * inputs at each node, over 2 or 3 layers. I mean the implementation itself should be simple, regardless of the computation time. (Thanks for your comments) – Majd Addin
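For concreteness, a minimal C++ sketch of that per-layer computation, assuming the weights and biases have already been exported from the trained model (scikit-learn's MLPClassifier exposes them as coefs_ and intercepts_; the layout used here is one assumption among several possible):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// One fully connected layer: out[j] = activation(b[j] + sum_i W[j][i] * in[i]).
// W and b would be filled with values exported from the trained model.
std::vector<double> dense_layer(const std::vector<double>& in,
                                const std::vector<std::vector<double>>& W,
                                const std::vector<double>& b,
                                bool relu)
{
    std::vector<double> out(b.size());
    for (std::size_t j = 0; j < b.size(); ++j) {
        double sum = b[j];
        for (std::size_t i = 0; i < in.size(); ++i)
            sum += W[j][i] * in[i];            // sum += weights * inputs
        out[j] = relu ? std::max(0.0, sum) : sum;
    }
    return out;
}
```

A full forward pass is then just calling this once per layer, feeding each layer's output into the next.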

1 Answer

0
votes

There is no general mechanism for converting a Python machine learning model into C++ code, because evaluating the model as-is requires the full Python runtime and its dependencies.

I also needed to run classifiers on low-power embedded devices / microcontrollers, and I have started implementing some of the algorithms you listed in embedded-friendly C, based on models trained in scikit-learn:

  • Naive Bayes: embayes
  • Random Forests / Decision trees: emtrees. Eventually also gradient boosted trees (XGBoost, LightGBM). A fitted tree compiles down to nested if/else statements, as shown in the sketch after this list.
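To illustrate why tree models port so well, here is a hedged sketch of what a single exported decision tree might look like for the 6-input, 8-class problem in the question. The thresholds and structure below are invented for illustration, not taken from any real model:

```cpp
// Hypothetical output of a tree-to-C++ exporter: each internal node of the
// fitted tree becomes an if/else on one feature, each leaf returns a class
// index (0..7). A random forest is just many such functions plus a vote.
int predict_tree(const float x[6])
{
    if (x[2] <= 0.53f) {
        if (x[0] <= 1.20f) return 0;
        else               return 3;
    } else {
        if (x[5] <= -0.75f) return 5;
        else                return 7;
    }
}
```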

There are some other embedded-friendly classifier projects available:

  • Neural networks: uTensor allows running TensorFlow Lite models on ARM Cortex using CMSIS-NN.
  • K-nearest neighbor (kNN): classic kNN is very simple to implement (see the sketch after this list), but since it stores all the training samples, the model size is typically problematic for embedded devices. Many alternatives have been proposed, for example ProtoNN, implemented in ELL.
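A minimal sketch of the classic brute-force nearest-neighbour lookup mentioned above (1-NN for simplicity), assuming the training samples and labels have been exported from Python as plain arrays:

```cpp
#include <cstddef>
#include <limits>

// train_x holds n_samples x 6 features (row-major, flattened), train_y holds
// class labels 0..7; both are exported from the Python training step.
int predict_1nn(const float* train_x, const int* train_y,
                std::size_t n_samples, const float query[6])
{
    int best_label = -1;
    float best_dist = std::numeric_limits<float>::max();
    for (std::size_t s = 0; s < n_samples; ++s) {
        float dist = 0.0f;
        for (std::size_t f = 0; f < 6; ++f) {
            const float d = train_x[s * 6 + f] - query[f];
            dist += d * d;                     // squared Euclidean distance
        }
        if (dist < best_dist) {
            best_dist = dist;
            best_label = train_y[s];
        }
    }
    return best_label;
}
```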

For the other algorithms you can find various C/C++ implementations, but most are intended for use under a full operating system (like Linux). Depending on how constrained your device is, it might still be possible to reuse them. Then you only have to implement model export from Python and model import into the C++ library.
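As an example of that export/import step: a linear model such as logistic regression, a linear SVM, or LDA reduces to a coefficient matrix and an intercept vector (scikit-learn exposes them as coef_ and intercept_, with shapes (8, 6) and (8,) for this problem), so the C++ side only needs dot products and an argmax. A hedged sketch, assuming the coefficients have been dumped into a header or config file:

```cpp
// Predict the class with the highest linear score: argmax_c (w_c . x + b_c).
// coef and intercept are the values exported from scikit-learn.
int predict_linear(const float coef[8][6], const float intercept[8],
                   const float x[6])
{
    int best_class = 0;
    float best_score = 0.0f;
    for (int c = 0; c < 8; ++c) {
        float score = intercept[c];
        for (int f = 0; f < 6; ++f)
            score += coef[c][f] * x[f];
        if (c == 0 || score > best_score) {
            best_score = score;
            best_class = c;
        }
    }
    return best_class;
}
```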