0
votes

I'm looking for a solution to the following graph problem in order to perform graph analysis in Python.

Basically, I have a directed graph of N nodes where I know the following:

  • The sum of the weights of the out-edges for each node
  • The sum of the weights of the in-edges for each node
  • Following from the above, the sum of the sum across all nodes of the in-edges equals the sum of the sum of out-edges
  • No nodes have edges with themselves
  • All weights are positive (or zero)
  • However, I know nothing about to which nodes a given node might have an edge to, or what the weights of any edges are

Represented as a weighted adjacency matrix, I know the column sums and row sums but not the value of the edges themselves. I've realized that there is not a unique solution to this problem (Does anyone how to prove that, given the above, there is an assured solution?). However, I'm hoping that I can at least arrive at a solution to this problem that minimizes the sum of the edge weights or maximizes the number of 0 edge weights or something along those lines (Basically, out of infinite choices, I'd like the most 'simple' graph).

I've thought about representing it as:

Min Sum(All Edge Weights) s.t. for each node, the sum of its out-edge weights equals the known sum of these, and the sum of its in-edge weights equals the known sum of these. Additionally, constrained such that all weights are >= 0

I'm primarily using this for data analysis in Scipy and Numpy. However, using their constrained minimization techniques, I'll end up with approximately 2N^2-2N constraints from the edge-weight sum portion, and N constraints from the positive portion. I'm worried this will be unfeasible for large data sets. I could have up to 500 nodes. Is this a feasible solution using SciPy's fmin_cobyla? Is there another way to layout this problem / another solver in Python that would be more efficient?

Thanks so much! First post on StackOverflow.

2
First off -- Welcome! Now the tougher part: Can you state the problem more simply? e.g. are you choosing some edges that span graph that minimize edge-weights? Or what? You haven't actually (yet) said what the problem is. You only say "you want to minimize the sum of the edge weights" -- but subject to what constraints? To minimize the sum of the edge weights -- choose none of the edges. Then the sum is 0. I'm not sure what you mean by the most 'simple' graph, though I appreciate the heuristic. Also -- there are not infinitely many choices.gabe

2 Answers

0
votes

Despite not knowing what your actual question is, the situation sounds like the Assignment Problem, and you should check out the Hungarian Algorithm.

0
votes

The prohibition against self-flows makes some instances of this problem infeasible (e.g., one node that has in- and out-flows of 1). Otherwise, a reasonably sparse solution with at most one self-flow always can be found as follows. Initialize two queues, one for the nodes with positive out-flow from lowest ID to highest and one for the nodes with positive in-flow from highest ID to lowest. Add a flow from the front node of the first queue to the front node of the second, with quantity equal to the minimum of the out-flow of the former and the in-flow of the latter. Update the out- and in-flows to their residual values and remove the exhausted node(s) from their queues. Since the ID of the front of the first queue increases, and the ID of the front of the second queue decreases, the only node that self-flows is the one where the ID numbers cross.

Minimizing the total flow is trivial; it's constant. Finding the sparsest solution is NP-hard; there's a reduction from subset sum where each of the elements being summed has a source node with that amount of out-flow, and two more sink nodes have in-flows, one of which is equal to the target sum. The subset sum instance is solvable if and only if no source flows to both sinks. The algorithm above is a 2-approximation.

To get rid of the self-flow on that one bad node sparsely: repeatedly grab a flow not involving the bad node and split it into two, via the bad node. Stop when we exhaust the self-flow. This fails only if there are no flows left that don't use the bad node and there is still a self-flow, in which case the bad node has in- and out-flows that sum to more than the total flow, a necessary condition for the existence of a solution. This algorithm is a 4-approximation in sparsity.