
I've been looking for a way to 'productionize' R- or Python-based random forest/gradient boosting tree models. Since every component decision tree is a binary tree, I thought exporting the trees to a graph database might be a workable solution (deploying by holding the models in memory and invoking them from a lightweight RESTful library like Flask doesn't scale that well). Here's how a decision tree is normally traversed:

1.) Data gets passed to the root node.

2.) We check whether the current node is a leaf node; if it is, we return a set of attributes (the predicted distribution/value).

3.) If not, the node stores a decision rule, and we check the relevant column to decide which node to pass the data to next (e.g., "If age>9.5, move to left node").

4.) Repeat steps 2-3.
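The loop above can be sketched in a few lines of Python. This is a minimal illustration, not any library's API: it assumes a hypothetical nested-dict tree in which each internal node stores its decision rule as plain data (`feature`, `threshold`) and each leaf stores a `value`. Because the rule lives on the node rather than in the traversal code, the same loop works for any tree. The "greater than goes left" convention follows the age example; real libraries differ (scikit-learn, for instance, sends `x <= threshold` to the left child).

```python
from typing import Any, Dict, Mapping

# Hypothetical in-memory tree: internal nodes hold a rule (feature, threshold)
# plus "left"/"right" children; leaves hold a prediction under "value".
Tree = Dict[str, Any]


def predict(node: Tree, row: Mapping[str, float]) -> Any:
    """Walk from the root to a leaf, applying the rule stored on each node."""
    while "value" not in node:  # step 2: stop as soon as we reach a leaf
        # Step 3: evaluate this node's stored rule. Assumed convention:
        # feature > threshold goes left, as in "If age>9.5, move to left node".
        if row[node["feature"]] > node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["value"]  # leaf reached: return the prediction


tree: Tree = {
    "feature": "age", "threshold": 9.5,
    "left": {"value": 0.8},
    "right": {
        "feature": "height", "threshold": 120.0,
        "left": {"value": 0.3},
        "right": {"value": 0.1},
    },
}

print(predict(tree, {"age": 12, "height": 100}))  # 0.8
print(predict(tree, {"age": 7, "height": 130}))   # 0.3
```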

I'm new to Neo4j and graph databases in general, and it wasn't clear to me whether it is possible to store (and subsequently traverse) decision rules in a node; all the examples I saw tended to be in the vein of

MATCH (neo:Database {name:"Neo4j"})
MATCH (johan:Person {name:"Johan"})
CREATE (johan)-[:FRIEND]->(:Person:Expert {name:"Max"})-[:WORKED_WITH]->(neo)

where the conditional statements are prespecified in the query itself. Is this feasible with Neo4j, and if so, which areas of the documentation should I focus on?

Thank you for any guidance you could provide.

Did you end up implementing this in neo4j? If so, I'd really appreciate an example - howMuchCheeseIsTooMuchCheese
No, I never implemented this. - torgos

2 Answers


Interesting problem.

You need a way to export a model out of R or Python and translate it into a Neo4j graph.
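As one illustration of the translation step, suppose the model was trained in Python with scikit-learn (an assumption; the question doesn't name a library). A fitted tree there exposes parallel arrays (`tree_.children_left`, `tree_.children_right`, `tree_.feature`, `tree_.threshold`, `tree_.value`), where `-1` in `children_left` marks a leaf. The sketch below flattens arrays in that layout into property-graph nodes and relationships; the function name, the `LE`/`GT` relationship types, and the simplification of `value` to one number per node are all hypothetical choices. For a single-output regressor `reg`, one would roughly pass `reg.tree_.value[:, 0, 0]`, and repeat for each tree in an ensemble's `estimators_`.

```python
from typing import Any, Dict, List, Sequence, Tuple

Node = Dict[str, Any]
Rel = Tuple[int, str, int]  # (parent id, relationship type, child id)


def tree_to_graph(
    children_left: Sequence[int],
    children_right: Sequence[int],
    feature: Sequence[int],
    threshold: Sequence[float],
    value: Sequence[float],
    feature_names: Sequence[str],
) -> Tuple[List[Node], List[Rel]]:
    """Flatten scikit-learn-style parallel arrays into graph nodes and relationships."""
    nodes: List[Node] = []
    rels: List[Rel] = []
    for i, left in enumerate(children_left):
        if left == -1:
            # -1 means "no child": this is a leaf, so store the prediction.
            nodes.append({"id": i, "leaf": True, "value": float(value[i])})
        else:
            # Internal node: the decision rule itself becomes node properties
            # (converted to plain Python floats/strings).
            nodes.append({
                "id": i,
                "leaf": False,
                "feature": feature_names[feature[i]],
                "threshold": float(threshold[i]),
            })
            # scikit-learn sends x[feature] <= threshold to the left child,
            # so the relationship type records which side of the rule it is.
            rels.append((i, "LE", int(left)))
            rels.append((i, "GT", int(children_right[i])))
    return nodes, rels


# Tiny hand-written example: one split on age at 9.5, two leaves.
nodes, rels = tree_to_graph(
    children_left=[1, -1, -1],
    children_right=[2, -1, -1],
    feature=[0, -2, -2],
    threshold=[9.5, -2.0, -2.0],
    value=[0.5, 0.1, 0.8],
    feature_names=["age"],
)
print(nodes)
print(rels)  # [(0, 'LE', 1), (0, 'GT', 2)]
```

Naming relationships after the comparison rather than "left"/"right" keeps the rule direction explicit in the graph, so a later traversal query doesn't need to know the exporting library's convention.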

The export mechanism can be PMML (if you're using the R rpart package to generate pruned trees), Google protobuf (if you're using the R gbm package to generate trees), or simply an Excel spreadsheet.
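If you go the PMML route (for rpart, e.g. via the R `pmml` package), the tree arrives as nested `<Node>` elements inside a `<TreeModel>`. One detail matters for the graph layout: in PMML the predicate attached to a node is the condition for *entering* that node, not the rule it applies to its children. Below is a minimal sketch using Python's standard `xml.etree.ElementTree`; the PMML snippet is hand-trimmed (Header, DataDictionary and MiningSchema omitted), the 4.4 namespace is an assumption about your exporter's version, and only `SimplePredicate` is handled.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict

# PMML 4.4 namespace; older exports use e.g. PMML-4_2, so adjust to match the file.
NS = {"p": "http://www.dmg.org/PMML-4_4"}

# Trimmed example: Header, DataDictionary and MiningSchema are omitted here.
PMML = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <TreeModel functionName="regression">
    <Node id="0" score="0.5">
      <True/>
      <Node id="1" score="0.8">
        <SimplePredicate field="age" operator="greaterThan" value="9.5"/>
      </Node>
      <Node id="2" score="0.1">
        <SimplePredicate field="age" operator="lessOrEqual" value="9.5"/>
      </Node>
    </Node>
  </TreeModel>
</PMML>"""


def parse_node(elem: ET.Element) -> Dict[str, Any]:
    """Convert one PMML <Node> (and its subtree) into a nested dict."""
    pred = elem.find("p:SimplePredicate", NS)
    return {
        "id": elem.get("id"),
        "score": elem.get("score"),
        # The predicate is the condition for entering this node. This sketch
        # only reads SimplePredicate; <True/> and any other predicate give None.
        "predicate": None if pred is None else (
            pred.get("field"), pred.get("operator"), pred.get("value")),
        "children": [parse_node(c) for c in elem.findall("p:Node", NS)],
    }


def parse_tree(pmml_text: str) -> Dict[str, Any]:
    """Return the root <Node> of the first TreeModel as a nested dict."""
    root = ET.fromstring(pmml_text)
    top = root.find("p:TreeModel/p:Node", NS)
    if top is None:
        raise ValueError("no TreeModel/Node element found")
    return parse_node(top)


tree = parse_tree(PMML)
print(tree["children"][0]["predicate"])  # ('age', 'greaterThan', '9.5')
```

Attribute values come back as strings, so thresholds and scores would still need converting to numbers before loading them anywhere.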

Parsing that export and unmarshalling it into Neo4j is the part you'll have to solve yourself.
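For the loading side, one possible sketch turns a flat node list and `(parent, type, child)` relationship list into parameterized Cypher that uses `UNWIND` to create a whole tree in a few statements. The `:TreeNode` label, the `model` property (to keep the many trees of an ensemble apart), and the `LE`/`GT` relationship types are hypothetical conventions, not anything Neo4j requires. Relationship types cannot be passed as query parameters in Cypher, so they are interpolated into the query text and checked against a whitelist first.

```python
from typing import Any, Dict, List, Tuple

# Relationship types cannot be query parameters in Cypher, so they are
# interpolated into the query text and checked against this whitelist.
ALLOWED_TYPES = {"LE", "GT"}


def build_load_statements(
    model: str,
    nodes: List[Dict[str, Any]],
    rels: List[Tuple[int, str, int]],
) -> List[Tuple[str, Dict[str, Any]]]:
    """Return (cypher, parameters) pairs that would create one tree in Neo4j."""
    statements: List[Tuple[str, Dict[str, Any]]] = [(
        # Create every node in one batch; += copies the map's properties.
        "UNWIND $nodes AS n\n"
        "CREATE (t:TreeNode {model: $model})\n"
        "SET t += n",
        {"model": model, "nodes": nodes},
    )]
    for rel_type in sorted({t for _, t, _ in rels}):
        if rel_type not in ALLOWED_TYPES:
            raise ValueError(f"unexpected relationship type: {rel_type!r}")
        batch = [{"src": s, "dst": d} for s, t, d in rels if t == rel_type]
        statements.append((
            # Connect parent and child nodes belonging to the same model.
            "UNWIND $rels AS r\n"
            "MATCH (a:TreeNode {model: $model, id: r.src}),\n"
            "      (b:TreeNode {model: $model, id: r.dst})\n"
            f"CREATE (a)-[:{rel_type}]->(b)",
            {"model": model, "rels": batch},
        ))
    return statements


nodes = [
    {"id": 0, "leaf": False, "feature": "age", "threshold": 9.5},
    {"id": 1, "leaf": True, "value": 0.1},
    {"id": 2, "leaf": True, "value": 0.8},
]
rels = [(0, "LE", 1), (0, "GT", 2)]

for query, params in build_load_statements("tree_0", nodes, rels):
    print(query)
    print(params)
```

With the official `neo4j` Python driver, each pair could then be executed as `session.run(query, params)`; for ensembles with many trees, an index on `:TreeNode(model, id)` would likely keep the `MATCH` step from scanning every node.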


I am not affiliated with Yhat in any way, but reading your question made me think of an alternative approach.

Yhat Science Ops

I don't know what that means for your team internally, but it seems like a pretty simple way to expose a model through a basic API call.