3
votes

I have n sets identified by setId and each one could contain an arbitrary number of elements that is a pair (elementId, priority).

My algorithm should take in input two setId, and give in output a set containing the first m elements which are in the intersection of the two input set and have the highest priority (sum of priorities).

Example:

n=3, m=1

Set1: { (1, 1), (12, 2) }
Set2: { (1, 4), (23, 6), (33, 22) }
Set3: { (33, 1), (1, 16 }


Input: Set2, Set3
Output: { (33, 23) }

My question is: assuming that I have infinite space what is the best data structures I can use in order to optimize performance?

Of course precomputing all possible intersection isn't a valid answer.

Edit:

Realistic numbers:

  • n, sets number, is ~ 10^6
  • average cardinality of sets is ~ 5*10^3.
1
Give us a realistic number of sets and set elements. With a toy example it's rather pointless. And what do you mean by "sum of priorities"? Doesn't make sense. You have sets of tuples.gnasher729
Sort by elementId then step though both sets simultaneously and add to output set when inputs match (if you see what I mean)Skizz
@Skizz: this is my approach, but it doesn't take into account priority, and it is quite complex c=|Set1|+|Set2|, O(c)+O(c * log(c))marka.thore
What are the limits on elementId and priority?Paddy3118
@Paddy3118: elementId doesn't have an upper bound, priority could be limited even to 3 values: { 1, 2, 3 }marka.thore

1 Answers

3
votes

Take one of the sets and convert it to a hash map. Iterate the other set, and for each member try to look up the element in the hash map. If you find it, add the result to a heap; if the size of the heap grows to one greater than the number of elements you wish to keep, throw away the lowest item in the heap.