2 votes

The problem is given as follows: given a DAG and a number 0 < p ≤ 1, return a minimum-cardinality set of vertices whose removal disconnects at least a p-fraction of the paths from a source (i.e., a vertex with no incoming arcs) to a sink (i.e., a vertex with no outgoing arcs). For p = 1, the problem is equivalent to minimum vertex cut. For other values of p, however, I'm not sure what the answer will be.

An algorithm I'm thinking of is to first compute the min-cut set for the DAG and then try to prune it down to satisfy the criterion. It would in itself be interesting to see whether the subset we find is actually a minimum cut set for the specific p that is given. The problem with this algorithm is that it is wasteful: it computes many nodes that we don't need in the final answer and, in fact, solves a "bigger" problem in the first place.

Any pointers to a solution of this problem? Wouldn't it be possible, for some of the min-cut algorithms, to add this extra constraint as an early-stopping criterion?

For checking how many paths are removed, suppose we have indexed each vertex (and will keep the index updated if needed) so that we know how many paths get disconnected by its removal. Please do not worry about the complexity of updating the index. One last thing: there is no constraint on the resulting components in terms of size or anything else.

s-t paths for some given designated vertices s and t? Or any paths? (Niklas B.)
How does a set of vertices disconnect anything? Do you mean a bipartition of the vertices, with all edges between vertices in different parts considered to be deleted? (j_random_hacker)
Also, I'm not sure that p = 1 is equivalent to minimum cut: p = 1 implies all paths between all vertex pairs must be destroyed, implying all edges must be deleted. Even if you only meant "paths between vertices in different parts of the bipartition", min-cut still doesn't necessarily return a bipartition (A, B) that minimises the "minimum-cardinality set" (which I take here to be min(|A|, |B|)). (j_random_hacker)
@j_random_hacker Well, the second scenario would actually make sense. From what I understand, the OP wants to minimize the number of cut vertices, not the cardinality of one of the partitions. Of course, we don't know if that's actually the scenario the OP is interested in. (Niklas B.)
In any given solution, how can we verify that the number of removed paths exceeds a p-fraction of all paths? Isn't that a hard problem in itself? (A.S.H)

2 Answers

1 vote

Here's an idea for getting approximately optimal solutions.

There's a variant of set cover that I want to call partial set cover, where we have sets and elements and want the minimum number of sets whose union contains a p-fraction of the elements. To apply it to this problem, sets correspond to nodes and elements correspond to maximal paths. (Yes, there are too many paths to do this naively; see below.) A set contains an element if and only if the corresponding node lies on the corresponding path.

It's not too hard to write partial set cover as an integer program.

minimize sum_{sets S} x_S
subject to
for all elements e, sum_{sets S containing e} x_S >= y_e
sum_{elements e} y_e >= p * number of elements
for all sets S, x_S in {0, 1}      # 1 if set S is chosen
for all elements e, y_e in [0, 1]  # 1 if element e is covered

This program can be solved by an integer program solver. The solvers are surprisingly good, though of course they cannot promise to find optimal solutions efficiently for this NP-hard generalization of set cover.
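As an illustration, here is a minimal sketch of the program above using the PuLP modeling library for Python. The library choice and all identifiers are my own; set and element ids are assumed to be simple strings or integers, and sets_to_elements would map each node to the set of sampled paths containing it.

# Sketch: partial set cover as an integer program, via PuLP (pip install pulp).
# sets_to_elements: dict mapping set id -> collection of element ids.
import pulp

def partial_set_cover(sets_to_elements, elements, p):
    prob = pulp.LpProblem("partial_set_cover", pulp.LpMinimize)
    x = {S: pulp.LpVariable("x_%s" % S, cat="Binary") for S in sets_to_elements}
    y = {e: pulp.LpVariable("y_%s" % e, lowBound=0, upBound=1) for e in elements}

    prob += pulp.lpSum(x.values())  # minimize the number of chosen sets
    for e in elements:
        # e counts as covered only if some chosen set contains it
        prob += pulp.lpSum(x[S] for S, elems in sets_to_elements.items()
                           if e in elems) >= y[e]
    prob += pulp.lpSum(y.values()) >= p * len(elements)

    prob.solve()
    return [S for S in sets_to_elements if x[S].value() > 0.5]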

In an interesting DAG, there are of course far too many paths to be enumerated. Sampling to the rescue! Since it's easy to count maximal paths in DAGs, it's easy to sample them uniformly at random. Sample a number of paths and use these as the elements in the integer program.
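For what it's worth, here is a sketch of such a sampler (Python; all names are mine). The key fact is that picking the start vertex, and then each next arc, with probability proportional to the number of maximal paths below it makes every maximal path equally likely: the per-step probabilities telescope to 1/(total number of maximal paths).

# Sketch: uniform sampling of maximal source-to-sink paths in a DAG.
# succ maps each vertex to a list of successors; the graph is assumed acyclic.
import random
from functools import lru_cache

def make_sampler(vertices, succ):
    @lru_cache(maxsize=None)
    def paths_from(v):
        # number of maximal paths starting at v; 1 if v is a sink
        return sum(paths_from(w) for w in succ.get(v, ())) or 1

    has_pred = {w for ws in succ.values() for w in ws}
    sources = [v for v in vertices if v not in has_pred]
    weights = [paths_from(s) for s in sources]

    def sample():
        # choose the start, then each next arc, proportionally to path counts
        v = random.choices(sources, weights=weights)[0]
        path = [v]
        while succ.get(v):
            v = random.choices(succ[v], weights=[paths_from(w) for w in succ[v]])[0]
            path.append(v)
        return tuple(path)

    return sample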

The tradeoff is that with more paths, the estimate of the objective is better, but the time to solve the integer program is worse. Hoeffding's inequality gives some hint as to the proper choice of the number of samples. With n vertices, there are 2^n possible solutions. We want the estimated fraction for each of these to be accurate to within some epsilon. Hoeffding says that we need to choose the number of samples to be Theta(n/epsilon^2) so that, almost all of the time, all of the estimates are approximately correct. I'd work out the exact constant, but I doubt that it's practically relevant.
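For the record, the calculation behind that choice goes roughly as follows, with m samples and allowed failure probability delta (my notation):

Pr[some solution's estimate is off by more than epsilon]
    ≤ 2^n * 2*exp(-2*m*epsilon^2)    (union bound over the 2^n solutions + Hoeffding)

Setting the right-hand side to delta and solving for m gives m ≥ (n*ln 2 + ln(2/delta)) / (2*epsilon^2) = Theta(n/epsilon^2).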

0 votes

EDIT1: After seeing the OP's update, this solution applies to the case of one source u and one sink v.

EDIT2: This is actually a heuristic; see the counter-example in the comments below.

This is an O(V+E) solution based on the method of counting paths given in the following thread (pointed out by David Eisenstat, thanks):

Number of paths between two nodes in a DAG

In the first phase, we apply exactly the “backward” method suggested by stalker. At the end of this phase, we obtain and store the following information:

  • For each vertex i, the number F(i, v) of paths from i to v

  • F(u, v), the number of paths from source u to sink v.

In the second phase, we proceed forward with the same method: we simulate the algorithm as if the question were to find the “backward” paths from v to u. At the end, we obtain:

  • For each vertex i, the number B(i, u) of backward paths from i to u. Obviously, B(i, u) = F(u, i).

  • B(v, u), which is equal to F(u, v).

In the third phase, we compute for each vertex i the number P(i) = F(u, i) * F(i, v). It is easy to prove that the number of (u, v) paths that traverse i is P(i), since each such path decomposes uniquely into a u-i prefix and an i-v suffix. Therefore, removing the vertex i removes P(i) paths.

In the fourth and last phase, we proceed greedily: repeatedly remove the vertex with the highest P(i) until the total number of removed paths exceeds p * F(u, v).
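Here is a minimal sketch of the four phases (Python; all identifiers are placeholders), assuming a precomputed topological order topo and adjacency maps succ and pred:

# Sketch of the four phases for one source u and one sink v.

def greedy_path_cut(topo, succ, pred, u, v, p):
    # Phase 1 (backward): F[i] = number of paths from i to v
    F = dict.fromkeys(topo, 0)
    F[v] = 1
    for i in reversed(topo):
        F[i] += sum(F[j] for j in succ.get(i, ()))

    # Phase 2 (forward): B[i] = number of paths from u to i, i.e. F(u, i)
    B = dict.fromkeys(topo, 0)
    B[u] = 1
    for i in topo:
        B[i] += sum(B[j] for j in pred.get(i, ()))

    # Phase 3: P[i] = number of u-v paths through i (endpoints excluded here)
    P = {i: B[i] * F[i] for i in topo if i not in (u, v)}

    # Phase 4: greedily remove vertices by decreasing P[i]. Summing P[i]
    # overcounts paths that pass through several removed vertices, which is
    # exactly why this is a heuristic (see EDIT2). A max-heap would make
    # this phase O(V + k log V) instead of a full sort.
    removed, covered, target = [], 0, p * F[u]
    for i in sorted(P, key=P.get, reverse=True):
        if covered >= target:
            break
        removed.append(i)
        covered += P[i]
    return removed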

The overall algorithm is essentially O(V+E), since:

  • According to the referenced post, the first two phases are O(V+E).

  • The third phase is obviously O(V). The fourth phase is O(V + k log V) for k removed vertices, if they are extracted from a max-heap built in O(V).