2
votes

I've got a database with approximately 200 000 items, which is sorted by username. Now when I add an item to end of array and call my quick sort function to sort that array it takes almost a second to sort, which is not acceptable. There are definitely quite some optimisations that can be done. For example if I sequentially compare each string from n-1 to 0, and then move items accordingly performance is much greater.

Other idea is that I could perform binary search from 0 to n-1, well not infact search, but something similar to take advantage of my already sorted array. However I've failed to write a proper function that would return an index where my new element should be placed.

void quick_sort(int left, int right)
{
    int i = left, j = right;
    if (left >= right) return;
    char  pivotC[128];
    DataEntry *tmp;

    strcpy_a(pivotC, sizeof pivotC, User[(left + right) / 2]->username);

    while (i <= j)
    {
        while (StringCompare(User[i]->username, pivotC))
            i++;
        while (StringCompare(pivotC, User[j]->username))
            j--;
        if (i <= j) 
        {
            tmp = User[i];
            User[i] = User[j];
            User[j] = tmp;
            i++;
            j--;
        }
    }
    if (left < j)
        quick_sort(left, j);
    if (i < right)
        quick_sort(i, right);
}

Any help is greatly appreciated.

8
yup , you can use binary searck - m0bi5
Use STL containers, like std::map. If you can't use them, read about balanced search trees and use binary search - Basile Starynkevitch
Why aren't you using std::sort()? - sashoalm

8 Answers

8
votes

the solution is to rewrite your code to use the stl, I don't understand why people write C code in C++.

You need a vector of User

std::vector<User> users;
//then you can keep it ordered at each insertion
auto it = upper_bound(users.begin(), users.end(), user_to_insert, 
    [](auto& lhs, auto& rhs ) { /* implementation left to the reader */});
users.insert(it, user_to_insert);

You now have the same functionality in a much nicer and clean way

3
votes

Reinventing the wheel is fine if you want to learn how to code binary search, otherwise reusing is better.

std::lower_bound performs a binary search on a sorted range [first, last), returning an iterator to the searched element x if already present; otherwise the iterator would be pointing to the first element greater than x. Since standard containers' exposing an insert would insert before the iterator, this iterator can be used as-is. Here's a simple example.

#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>

int main()
{
    std::list<int> data = { 1, 5, 7, 8, 12, 34, 52 };

    auto loc = std::lower_bound(data.begin(), data.end(), 10);
    // you may insert 10 here using loc
    std::cout << *loc << '\n';

    loc = std::lower_bound(data.begin(), data.end(), 12);
    // you may skip inserting 12 since it is in the list (OR)
    // insert it if you need to; it'd go before the current 12
    std::cout << *loc << '\n';
}

12

12

1
votes

Easy , direct method cause binary searching is too mainstream. Just need a few lines:

int where_to_add(int array[], int element)
{
    int i;
    for (i = length; i >= 0 && array[i-1] > element; i--);
    return i;
}

Let me know if this is the answer you were looking for

1
votes

Binary search will be of limited interest, as you will need to insert anyway and this will remain a time consuming operation (O(N)). So your first idea of a linear search followed by insertion is good enough; you can combine in a single backward loop. (This is a step of StraightInsertionSort.)

The truly efficient ways to handle dynamic sorted lists are by maintaining a balanced tree or using a hash table.

1
votes

If you're sorting a sorted list with only a few new out of place trailing items then you should take advantage of the rare case in which insertion sort actually works efficiently. Implementing insertion sort on a sorted list with only a few trailing out of place values can sort in O(n) time. You're just inserting your few out of place values into place, while quick sort is picking a pivot and going through the entire quick sort process. Also, if you're not incorporating some type of efficient pivot selection process into your quick sort, and going with some "average of first 3 items" approach on an already sorted list you're going to be sorting in O(n^2) time.

0
votes

You can do binary search like this way.. Here You can assume that if val is string type then compare using string comparison function and int AR[] is set of string or You can map them to integer. As the array is sorted , I think binary search will give you the best performance.

int bsearch(int AR[], int N, int VAL)
{
    int Mid,Lbound=0,Ubound=N-1;

    while(Lbound<=Ubound)
    {
        Mid=(Lbound+Ubound)/2;
        if(VAL>AR[Mid])
            Lbound=Mid+1;
        else if(VAL<AR[Mid])
            Ubound=Mid-1;
        else
            return Mid;
    }

    return 0;
}
0
votes

From what I can see, you're using a C array to store your entries, which means a big penalty in performance with huge number of entries whenever you try to insert an new entry because you may need to move a lot of entries in the array.

If you plan to keep a C array and not using some stl ordered containers (mostly thinking about std::map though), you may try to split your C array into two arrays. One will be a first array containing your key and an index to an element of the second array. You still need to sort the first array but its element is only two words (one for key, one for index) instead of a big block including key and some values) and should be faster. When inserting an item, you allocate at the end of the second array and take the index to insert it as a pair with key inside the first array. If you plan to remove an element dynamically, you can be a little smarter but your question appears not to cover it.

But even so, it might be still too slow, so you should indeed consider std::map or some algorithms like binary tree using AVL, Red Black tree, Splay tree, etc. where you do not need to move element physically.

-1
votes
int add(Container c, int r, int l, Unit t)
{
    if(c[r]>t)
        return r;
    if(c[l]<t)
        return l+1;
    if(c[r]==c[l])
    {
         if(c[r]==t)
            return -1;
         return -1;
    }
    int m=(r+l)/2;
    if(c[m]==t)
          return -1;
    if(c[m]>t)
          return add(c,m,l,t);
    if(c[m]<t)
          return add(c,r,m,t);
}

It will probably give you the index you need to add...I hope it can help.It assumes you do not need to add when its already in.