I've split the problem into sub problems... The test to check it is all working as expected:
a = [ "apples"
, "oranges"
, "bananas"
, "blueberries" ]
b = [ "apples"
, "blueberries"
, "oranges"
, "bananas" ]
expected = [ (0, 0)
, (None, 1)
, (1, 2)
, (2, 3)
, (3, None) ]
assert expected == get_indexes_for_best_pairing(a, b)
print("Tests pass!")
1. Create a list of all of the pairs
Create a map of the values from list A with the list of the associated indices...
def map_list(list):
map = {}
for i in range(0, len(list)):
if not list[i] in map:
map[list[i]] = []
map[list[i]].append(i)
return map
The map
would look like...
{ "apples": [0]
, "oranges": [1]
, "bananas": [2]
, "blueberries": [3] }
Find all of the pairs by cross referencing list B...
def get_pairs(a, b):
map = map_list(a)
pairs = []
for i in range(0, len(b)):
v = b[i]
if v in map:
for j in range(0, len(map[v])):
pairs.append((map[v][j], i))
return pairs
The pairs
are as follows...
[ (0, 0)
, (3, 1)
, (1, 2)
, (2, 3) ]
2. Get the scores for each pair
Simply loop through the pairs and look up the value in the original list:
def get_pairs_scores(pairs, a):
return [len(a[i]) for i, _ in pairs]
3. Create a list of pairs that each pair excludes
For each pair find the other pairs it excludes...
def get_pairs_excluded_by_pair(pairs, i):
return [j for j in range(0, len(pairs))
if i != j
and ((pairs[i][0] - pairs[j][0]) * (pairs[i][1] - pairs[j][1]) < 0)]
def get_pairs_excluded_by_pairs(pairs):
excludes = []
for i in range(0, len(pairs)):
excludes.append(get_pairs_excluded_by_pair(pairs, i))
return excludes
The pairs_excludes
method will return...
[ []
, [2, 3]
, [1]
, [1] ]
4. Calculate the total cumulative score for each pair minus the pairs it excludes
Plus the score of the pairs that are excluded by the pairs it excludes... etc, etc.
Use a depth first algorithm to traverse a acyclic graph of excludes to get the cumulative score for each pair... This is the meat of what we need to do...
def get_cumulative_scores_for_pairs(pairs, excludes, scores):
cumulative = []
for i in range(0, len(pairs)):
score = 0
checked = set()
checked.add(i)
trail = []
current = [i, 0]
while True:
if len(excludes[current[0]]) > current[1]:
j = excludes[current[0]][current[1]]
if not j in checked:
trail.append(current)
checked.add(j)
current = [j, 0]
score = -score
else:
current[1] += 1
else:
score = scores[current[0]] - score
if len(trail):
current = trail.pop()
else:
break
cumulative.append(score)
return cumulative
This method should return an array that looks like...
[ 6
, -3
, 3
, 3 ]
5. Pick only the best pairs
We need to then store the index with the score so that we can sort on score without losing the index.
Sort the cumulative scores in order we create a list of the indices indices
...
arr = sorted([(i, cumulative[i])
for i in range(0, len(cumulative))],
key=lambda item: item[1])
It looks like...
[ (1, -3)
, (2, 3)
, (3, 3)
, (0, 6) ]
Pick the top scoring items deleting the excluded items as we go, this way we retain the best pairs and discard the worst...
def get_best_pairs(a, b):
pairs = get_pairs(a, b)
excludes = get_pairs_excluded_by_pairs(pairs)
scores = get_pairs_scores(pairs, a)
cumulative = get_cumulative_scores_for_pairs(pairs, excludes, scores)
arr = sorted([(i, cumulative[i])
for i in range(0, len(cumulative))],
key=lambda item: item[1])
top = []
while len(arr):
topitem = arr.pop()
top.append(topitem[0])
arr = [(i, score)
for i, score in arr
if i not in excludes[topitem[0]]]
return sorted([pairs[i] for i in top], key=lambda item: item[0])
Our top
list would look like, where the pair with the index 1
has been dropped because it was low scoring and excluded by higher scoring pairs...
[ (0, 0)
, (1, 2)
, (2, 3) ]
6. Build the result
Sort the selected pairs and build the result by incrementing each index to the next pair. When we run out of pairs increment until we reach the end of each list...
def get_indexes_for_best_pairing(a, b):
pairs = get_best_pairs(a, b)
result = [];
i = 0
j = 0
next = None
pair = None
while True:
if next == None:
if len(pairs):
next = pairs.pop(0)
pair = next
else:
next = (len(a), len(b))
pair = None
if i < next[0]:
result.append((i, None))
i += 1
elif j < next[1]:
result.append((None, j))
j += 1
elif pair != None:
result.append((pair[0], pair[1]));
i += 1
j += 1
next = None
else:
break
return result
And finally our result would look like...
[ (0, 0)
, (None, 1)
, (1, 2)
, (2, 3)
, (3, None) ]