0
votes

I have faced the following problem:

  • there are two disjoint sets, A and B
  • for each pair of elements (a, b) (a belongs to set A, where b belongs to set B) there a probability pij is known in advance. It represents the probability (certainty level) that a matches b, or in other words, how closely a matches b (and vice-versa, because pij == pji).
  • I have to find a matching with the highest probability/certainty and find out pairs (a, b) which describe the matching
  • every element must be matched / paired with another from the other set exactly once (like in the standard bipartite matching problem)
  • if possible, I would like to compute a number which approximately expresses the uncertainty level for the obtained matching (let's say that 0 represents random guess and 1 represents certainty)

A simple practical example in which such algorithm is required is described below (this is not actually the problem I am solving!):

  • two people are asked to write letters a - z on a piece of paper
  • for each pair of letters (a, b) we run a pattern matcher to determine the probability that letter a written by person A represents letter b wrote by person B. This gives us the probability matrix which expresses some kind of similarity correlation for each pair of letters (a, b)
  • for each letter that person A wrote, we need to find the corresponding letter written by person B

Current approach: I am wondering if I could just assign weights which are proportional to the logarithm of certainty level / probability that element a from set A matches element b from set B and then run maximum weighted bipartite matching to find the maximum sum. The logarithm is because I want to maximize the total probability of multiple matching, and since single matches (represented as pairs of matched elements a - b) form a chain of events, which is a product of probabilities, by taking the logarithm we converts this to a sum of probabilities, which is then easily maximized using an algorithm for weighted bipartite matching, such as Hungarian algorithm. But I somehow doubt this approach would ensure the best matching in terms of statistical expected maximum.

After searching a bit, the closest problem I found was a two-stage stochastic maximum weighted matching problem, which is NP-hard, but I actually need some kind of "one-stage" stochastic maximum weighted matching problem.

1

1 Answers

1
votes

I wonder if you can use MaxFlow/MinCut. I can't prove it's optimal at the moment, but your problem may be NP-hard anyway. You can use MF/MC to find a perfect matching when you have a bipartite graph with V=(A,B) by creating a source connected to all nodes in A with a weight of 1 and a sink connected to all nodes in B with weight 1. I'm proposing you make the weights of edges that cross from A to B be the probabilities you mentioned above. What do you think?