I'm looking for a solution to the following graph problem in order to perform graph analysis in Python.
Basically, I have a directed graph of N nodes where I know the following:
- The sum of the weights of the out-edges for each node
- The sum of the weights of the in-edges for each node
- Following from the above, the sum of the sum across all nodes of the in-edges equals the sum of the sum of out-edges
- No nodes have edges with themselves
- All weights are positive (or zero)
- However, I know nothing about to which nodes a given node might have an edge to, or what the weights of any edges are
Represented as a weighted adjacency matrix, I know the column sums and row sums but not the value of the edges themselves. I've realized that there is not a unique solution to this problem (Does anyone how to prove that, given the above, there is an assured solution?). However, I'm hoping that I can at least arrive at a solution to this problem that minimizes the sum of the edge weights or maximizes the number of 0 edge weights or something along those lines (Basically, out of infinite choices, I'd like the most 'simple' graph).
I've thought about representing it as:
Min Sum(All Edge Weights) s.t. for each node, the sum of its out-edge weights equals the known sum of these, and the sum of its in-edge weights equals the known sum of these. Additionally, constrained such that all weights are >= 0
I'm primarily using this for data analysis in Scipy and Numpy. However, using their constrained minimization techniques, I'll end up with approximately 2N^2-2N constraints from the edge-weight sum portion, and N constraints from the positive portion. I'm worried this will be unfeasible for large data sets. I could have up to 500 nodes. Is this a feasible solution using SciPy's fmin_cobyla? Is there another way to layout this problem / another solver in Python that would be more efficient?
Thanks so much! First post on StackOverflow.