Computing efficiently intersection of n sets

Question

I have n sets identified by setId and each one could contain an arbitrary number of elements that is a pair (elementId, priority).

My algorithm should take in input two setId, and give in output a set containing the first m elements which are in the intersection of the two input set and have the highest priority (sum of priorities).

Example:

n=3, m=1

Set1: { (1, 1), (12, 2) }
Set2: { (1, 4), (23, 6), (33, 22) }
Set3: { (33, 1), (1, 16 }


Input: Set2, Set3
Output: { (33, 23) }

My question is: assuming that I have infinite space what is the best data structures I can use in order to optimize performance?

Of course precomputing all possible intersection isn't a valid answer.

Edit:

Realistic numbers:

n, sets number, is ~ 10^6
average cardinality of sets is ~ 5*10^3.

Give us a realistic number of sets and set elements. With a toy example it's rather pointless. And what do you mean by "sum of priorities"? Doesn't make sense. You have sets of tuples. — gnasher729
Sort by elementId then step though both sets simultaneously and add to output set when inputs match (if you see what I mean) — Skizz
@Skizz: this is my approach, but it doesn't take into account priority, and it is quite complex c=|Set1|+|Set2|, O(c)+O(c * log(c)) — marka.thore
@Paddy3118: elementId doesn't have an upper bound, priority could be limited even to 3 values: { 1, 2, 3 } — marka.thore

Mark Ransom Mark Ransom · Accepted Answer · 2015-02-24T16:03:44

Take one of the sets and convert it to a hash map. Iterate the other set, and for each member try to look up the element in the hash map. If you find it, add the result to a heap; if the size of the heap grows to one greater than the number of elements you wish to keep, throw away the lowest item in the heap.

Computing efficiently intersection of n sets

1 Answers