I have faced the following problem:
- there are two disjoint sets,
A
andB
- for each pair of elements (
a
,b
) (a
belongs to setA
, whereb
belongs to setB
) there a probabilitypij
is known in advance. It represents the probability (certainty level) thata
matchesb
, or in other words, how closelya
matchesb
(and vice-versa, becausepij
==pji
). - I have to find a matching with the highest probability/certainty and find out pairs (
a
,b
) which describe the matching - every element must be matched / paired with another from the other set exactly once (like in the standard bipartite matching problem)
- if possible, I would like to compute a number which approximately expresses the uncertainty level for the obtained matching (let's say that 0 represents random guess and 1 represents certainty)
A simple practical example in which such algorithm is required is described below (this is not actually the problem I am solving!):
- two people are asked to write letters a - z on a piece of paper
- for each pair of letters (
a
,b
) we run a pattern matcher to determine the probability that lettera
written by personA
represents letterb
wrote by personB
. This gives us the probability matrix which expresses some kind of similarity correlation for each pair of letters (a
,b
) - for each letter that person
A
wrote, we need to find the corresponding letter written by personB
Current approach:
I am wondering if I could just assign weights which are proportional to the logarithm of certainty level / probability that element a
from set A
matches element b
from set B
and then run maximum weighted bipartite matching to find the maximum sum. The logarithm is because I want to maximize the total probability of multiple matching, and since single matches (represented as pairs of matched elements a
- b
) form a chain of events, which is a product of probabilities, by taking the logarithm we converts this to a sum of probabilities, which is then easily maximized using an algorithm for weighted bipartite matching, such as Hungarian algorithm. But I somehow doubt this approach would ensure the best matching in terms of statistical expected maximum.
After searching a bit, the closest problem I found was a two-stage stochastic maximum weighted matching problem, which is NP-hard, but I actually need some kind of "one-stage" stochastic maximum weighted matching problem.