5
votes

I have two lists, L1 and L2, of data containing multiple elements, each unique, of an abstract data type (ie: structs). Each of the two lists:

  • May contain between zero and one-hundred (inclusive) elements.
  • Contains no duplicate elements (each element is unique).
  • May or may not contain elements in the other list (ie: L1 and L2 might be identical, or contain completely different elements).
  • Is not sorted.
  • At the lowest level, is stored within a std::vector<myStruct> container.

What I am typically expecting is that periodically, a new element is added to L2, or an element is subtracted/removed from it. I am trying to detect the differences in the two lists as efficiently (ie: with minimal comparisons) as possible:

  • If an entry is not present in L2 and is present in L1, carry out one operation: Handle_Missing_Element().
  • If an entry is present in L2 and not present in L1, carry out another operation: Handle_New_Element().

Once the above checks are carried out, L1 is set to be equal to L2, and at some time in the future, L2 is checked again.

How could I go about finding out the differences between the two lists? There are two approaches I can think of:

  1. Compare both lists via every possible combination of elements. Possibly O(n^2) execution complexity (horrible).

for (const auto& e2 : L2) {
    bool found = false;
    for (const auto& e1 : L1) {
        if (e1 == e2) {
            // Found duplicate entry
            found = true;
        }
    }
}
  2. Sort the lists, and compare the two lists element-wise until I find a difference. This seems like it would be in near-linear time. The problem is that I would need the lists to be sorted. It would be impractical to manually sort the underlying vector after each addition/removal for the list. It would only be reasonable to do this if it were somehow possible to force vector::push_back() to automatically insert elements such that insertions preserve the sorting of the list.

Is there a straightforward way to accomplish this efficiently in C++? I've found similar such problems, but I need to do more than just find the intersection of two sets, or do such a test with just a set of integers, where sum-related tricks can be used, as I need to carry out different operations for "new" vs "missing" elements.

Thank you.

4
Difficult to use std::vector<myStruct> in C. Suggest dropping C tag. - chux - Reinstate Monica
So, your lists are not really linked lists (as in std::list), but are actually arrays (as in std::vector)? - AnT
Do you have a comparison function for elements? (I mean operator<, not just operator==.) - Beta
@stgatilov Correct, L1 is constant. - Cloud
@Beta I do not have comparison functions. It's just a struct rather than a fully-defined class at this time. - Cloud

4 Answers

4
votes

It would be impractical to manually sort the underlying vector after each addition/removal for the list. It would only be reasonable to do this if it were somehow possible to force vector::push_back() to automatically insert elements such that insertions preserve the sorting of the list.

What you're talking about here is an ordered insert. There are functions in <algorithm> that allow you to do this. Rather than using std::vector::push_back you would use std::vector::insert, and call std::lower_bound, which does a binary search for the first element not less than a given value.

auto insert_pos = std::lower_bound( L2.begin(), L2.end(), value );
if( insert_pos == L2.end() || *insert_pos != value )
{
    L2.insert( insert_pos, value );
}

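This finds each insertion point in O(log N) comparisons (the vector insert itself may still shift up to N elements, which should be cheap for lists of at most a hundred elements), and if you are doing fewer than N insertions between your periodic checks, it ought to be an improvement over re-sorting.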

The zipping operation might look something like this:

auto it1 = L1.begin();
auto it2 = L2.begin();

while( it1 != L1.end() && it2 != L2.end() )
{
    if( *it1 < *it2 ) {
        Handle_Missing( *it1++ );
    } else if( *it2 < *it1 ) {
        Handle_New( *it2++ );
    } else {
        it1++;
        it2++;
    }
}

while( it1 != L1.end() ) Handle_Missing( *it1++ );
while( it2 != L2.end() ) Handle_New( *it2++ );
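
The snippets above assume that myStruct can be compared with operator< (and operator!=), which the asker says does not exist yet. A minimal sketch of one way to supply an ordering, assuming hypothetical fields id and name (the real members of the struct are not shown in the question), with std::set_difference standing in for the hand-written zip:

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical stand-in for the question's struct; the fields are assumptions.
struct myStruct {
    int id;
    std::string name;
};

// Lexicographic ordering over all fields, built with std::tie.
bool operator<(const myStruct& a, const myStruct& b) {
    return std::tie(a.id, a.name) < std::tie(b.id, b.name);
}

// Elements of `from` that do not appear in `other`.
// Both vectors must already be sorted with the operator< above.
std::vector<myStruct> only_in(const std::vector<myStruct>& from,
                              const std::vector<myStruct>& other) {
    assert(std::is_sorted(from.begin(), from.end()));
    assert(std::is_sorted(other.begin(), other.end()));
    std::vector<myStruct> result;
    std::set_difference(from.begin(), from.end(),
                        other.begin(), other.end(),
                        std::back_inserter(result));
    return result;
}
```

With this, only_in(L1, L2) gives the elements to hand to Handle_Missing, and only_in(L2, L1) the ones for Handle_New. It walks the lists twice instead of once, which should not matter at these sizes.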
4
votes

Can you create a hash value for your list items? If so, just compute the hash and check the hash table for the other list. This is quick, does not require sorting, and prevents your "every possible combination" problem. If you're using C++ and the STL you could use a map container (std::unordered_map is the hash-based one) to hold each list.

  • Create a hash for each item in L1, and use a map to associate it with your list item.
  • Create a similar map for L2, and as each L2 hash is created check to see if it's in the L1 map.
  • When a new element is added to L2, calculate its hash value and check to see if it's in the L1 hash map (using map.find() if using STL maps). If not then carry out your Handle_New_Element() function.
  • When an element is subtracted from the L2 list and its hash is in the L1 hash map (so it is present in L1 but no longer in L2), carry out your Handle_Missing_Element() function.
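
The steps above could be sketched roughly as follows. This is only an illustration under assumptions: the fields id and name, the myStructHash functor and the diff_lists helper are all hypothetical, and it uses std::unordered_set rather than a map from hash value to item so that the container deals with hash collisions itself.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for the question's struct; the fields are assumptions.
struct myStruct {
    int id;
    std::string name;
};

bool operator==(const myStruct& a, const myStruct& b) {
    return a.id == b.id && a.name == b.name;
}

// Hypothetical hash functor that combines the hashes of both fields.
struct myStructHash {
    std::size_t operator()(const myStruct& s) const {
        std::size_t h1 = std::hash<int>{}(s.id);
        std::size_t h2 = std::hash<std::string>{}(s.name);
        return h1 ^ (h2 + 0x9e3779b9 + (h1 << 6) + (h1 >> 2));
    }
};

struct Diff {
    std::vector<myStruct> missing;  // in L1 but not in L2
    std::vector<myStruct> added;    // in L2 but not in L1
};

Diff diff_lists(const std::vector<myStruct>& L1, const std::vector<myStruct>& L2) {
    std::unordered_set<myStruct, myStructHash> in_L1(L1.begin(), L1.end());
    std::unordered_set<myStruct, myStructHash> in_L2(L2.begin(), L2.end());
    // The question guarantees that neither list contains duplicates.
    assert(in_L1.size() == L1.size());
    assert(in_L2.size() == L2.size());

    Diff d;
    for (const myStruct& e : L1) {
        if (in_L2.count(e) == 0) d.missing.push_back(e);
    }
    for (const myStruct& e : L2) {
        if (in_L1.count(e) == 0) d.added.push_back(e);
    }
    return d;
}
```

Each lookup is O(1) on average, so a full comparison costs roughly O(|L1| + |L2|) without any sorting.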
3
votes

A container that automatically sorts itself on inserts is std::set. Insertions will be O(log n), and comparing the two sets will be O(n). Since all your elements are unique you don't need std::multiset.
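
A minimal sketch of that idea, assuming hypothetical fields id and name for myStruct and a hypothetical comparator myStructLess (std::set needs some strict ordering, and the question does not define one):

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical stand-in for the question's struct; the fields are assumptions.
struct myStruct {
    int id;
    std::string name;
};

// Ordering supplied to std::set, so myStruct itself needs no operator<.
struct myStructLess {
    bool operator()(const myStruct& a, const myStruct& b) const {
        return std::tie(a.id, a.name) < std::tie(b.id, b.name);
    }
};

using StructSet = std::set<myStruct, myStructLess>;

// Elements of `from` that are absent from `other`: one linear pass,
// because both sets already iterate in sorted order.
std::vector<myStruct> only_in_set(const StructSet& from, const StructSet& other) {
    std::vector<myStruct> result;
    std::set_difference(from.begin(), from.end(),
                        other.begin(), other.end(),
                        std::back_inserter(result), myStructLess{});
    // A difference can never hold more elements than its source.
    assert(result.size() <= from.size());
    return result;
}
```

Here only_in_set(L1, L2) yields the elements for Handle_Missing_Element(), and only_in_set(L2, L1) those for Handle_New_Element().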

2
votes

For each element of both arrays, maintain the number of times it occurs in the other array. You can store these counts in separate arrays with the same indexing, or in the structs themselves.

When an element x is inserted into L2, you have to check it for equality with all the elements of L1. On each equality with y, increment counters of both elements x and y.

When an element x is removed from L2, you have to again compare it with all the elements of L1. On each equality with y from L1, decrement counter of y. Counter of x does not matter, since it is removed.

When you want to find non-duplicate elements, you can simply iterate over both arrays. The elements with zero counters are the ones you need.

In total, you need O(|L1|) additional operations per insert and remove, and O(|L1| + |L2|) operations per duplication search. The latter can be reduced to the number of sought-for non-duplicate elements, if you additionally maintain lists of all elements with zero counter.

EDIT: Ooops, it seems that each counter is always either 0 or 1 because of uniqueness in each list.

EDIT2: As Thane Plummer has written, you can additionally use a hash table. If you create a hash table for L1, then you can do all the comparisons in insert and remove in O(1). BTW since your L1 is constant, you can even create a perfect hash table for it to make things faster.
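
The bookkeeping described in this answer might look roughly like the sketch below. Since each counter is only ever 0 or 1, it stores booleans. The DiffTracker class, its member names and the fields id and name are all hypothetical, and the linear scans over L1 are kept as in the original description rather than replaced by a hash table.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the question's struct; the fields are assumptions.
struct myStruct {
    int id;
    std::string name;
};

bool operator==(const myStruct& a, const myStruct& b) {
    return a.id == b.id && a.name == b.name;
}

// Hypothetical helper: L1 is fixed, L2 changes one element at a time.
class DiffTracker {
public:
    explicit DiffTracker(std::vector<myStruct> l1)
        : L1_(std::move(l1)), l1_matched_(L1_.size(), false) {}

    // O(|L1|): compare the new element against the elements of L1.
    void insert(const myStruct& x) {
        bool matched = false;
        for (std::size_t i = 0; i < L1_.size(); ++i) {
            if (L1_[i] == x) {
                assert(!l1_matched_[i]);  // would mean x is already in L2
                l1_matched_[i] = true;
                matched = true;
                break;  // L1 has no duplicates, so stop at the first match
            }
        }
        L2_.push_back(x);
        l2_matched_.push_back(matched);
    }

    // Removes x from L2; returns false if x was not there.
    bool remove(const myStruct& x) {
        for (std::size_t i = 0; i < L2_.size(); ++i) {
            if (!(L2_[i] == x)) continue;
            if (l2_matched_[i]) clear_match_in_l1(x);
            L2_.erase(L2_.begin() + static_cast<std::ptrdiff_t>(i));
            l2_matched_.erase(l2_matched_.begin() + static_cast<std::ptrdiff_t>(i));
            return true;
        }
        return false;
    }

    // Zero-counter elements of L1: present in L1, absent from L2.
    std::vector<myStruct> missing() const { return unmatched(L1_, l1_matched_); }
    // Zero-counter elements of L2: present in L2, absent from L1.
    std::vector<myStruct> added() const { return unmatched(L2_, l2_matched_); }

private:
    void clear_match_in_l1(const myStruct& x) {
        for (std::size_t i = 0; i < L1_.size(); ++i) {
            if (L1_[i] == x) {
                l1_matched_[i] = false;
                return;
            }
        }
    }

    static std::vector<myStruct> unmatched(const std::vector<myStruct>& items,
                                           const std::vector<bool>& matched) {
        std::vector<myStruct> out;
        for (std::size_t i = 0; i < items.size(); ++i) {
            if (!matched[i]) out.push_back(items[i]);
        }
        return out;
    }

    std::vector<myStruct> L1_;      // declared first: its size initializes l1_matched_
    std::vector<bool> l1_matched_;  // "counter" for each element of L1
    std::vector<myStruct> L2_;
    std::vector<bool> l2_matched_;  // "counter" for each element of L2
};
```

Calling missing() and added() at each periodic check then costs O(|L1| + |L2|), matching the estimate given earlier in this answer.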