I am programming a simple heapify algorithm. It needs to work on a large data set: 100,000 elements, with values in the range 0 to 10^9.
The algorithm works for sizes up to around 25,000, but if the data set gets any larger than that, it starts throwing segmentation faults.
I have used unsigned long long throughout, so I don't see where the problem is. I am using vectors to store my data, and all of the data is of type unsigned long long.
Has anyone encountered such a problem before?
Here is the heapify procedure I'm using:
vector<unsigned long long> array;
vector<unsigned long long> orig_index;
vector<unsigned long long> new_index;

unsigned long long heapify(unsigned long long count) {
    unsigned long long temp, left, right, min;
    long long j;
    unsigned long long n_swaps = 0;
    for (long long i = (count % 2 ? (count - 2) / 2 : (count - 1) / 2); i >= 0; --i) {
        left = (2 * i) + 1;
        right = (2 * i) + 2;
        j = i;
        while (((array[j] > array[left]) && left < count) || ((array[j] > array[right]) && (right < count))) {
            //Swap
            //Find lesser of left or right
            if (right >= count) {
                min = array[left];
            } else {
                min = array[left] > array[right] ? array[right] : array[left];
            }
            if (array[j] > array[left] && (min == array[left])) {
                temp = array[j];
                array[j] = array[left];
                array[left] = temp;
                //Update indexes
                orig_index.push_back(j);
                new_index.push_back(left);
                j = left;
                right = (2 * left) + 2;
                left = (2 * left) + 1;
                ++n_swaps;
            }
            else if ((right < count) && (array[j] > array[right])) {
                temp = array[j];
                array[j] = array[right];
                array[right] = temp;
                //Update indexes
                orig_index.push_back(j);
                new_index.push_back(right);
                j = right;
                left = (2 * right) + 1;
                right = (2 * right) + 2;
                ++n_swaps;
            }
        }
    }
    return n_swaps;
}
Here is the random data generator function I'm using. count is the size of the data (e.g. 20k or 30k) and max is the range.
void generate(unsigned long long count, unsigned long long max) {
    srand(time(NULL));
    //Dummy array of max size
    vector<unsigned long long> dummy;
    //Populate the dummy
    for (unsigned long long i = 0; i < max; ++i) {
        dummy.push_back(i);
    }
    //Select random number from dummy, swap with last and pop
    unsigned long long temp;
    unsigned long long swap;
    unsigned long long dummy_size = max - 1;
    cout << "****************Indices************" << endl;
    for (unsigned long long i = 0; i < count; ++i) {
        temp = rand() % dummy_size;
        cout << temp << endl;
        array.push_back(dummy[temp]);
        //Swap and pop
        swap = dummy[temp];
        dummy[temp] = dummy[dummy_size];
        dummy[dummy_size] = swap;
        --dummy_size;
    }
    cout << "*************End*****************" << endl;
    dummy.clear();
}
The main function:
int main(void) {
    unsigned long long count = 25000;
    unsigned long long max = 1000000;
    //Generate random numbers and push on array
    generate(count, max);
    //Print array
    for (unsigned long long i = 0; i < array.size(); ++i) {
        cout << array[i] << " ";
    }
    cout << endl;
    //Build heap
    unsigned long long n_swaps = heapify(count);
    cout << n_swaps << "\n";
    for (unsigned long long i = 0; i < orig_index.size(); ++i) {
        cout << orig_index[i] << " " << new_index[i] << endl;
    }
    return 0;
}
I hope the algorithms are correct, but I just can't find why the segmentation fault happens for big data sets and not for small ones.
Use vector::at() when accessing your vector items instead of [] to ensure you are not going out of bounds (this may not be a "big data" issue). If you are going out of bounds, an out_of_range exception will be thrown instead of a segmentation fault, thus giving you more information. Also, where is your main function showing how you're calling these functions? - PaulMcKenzie

Use at() instead of []. The while loop, and the code that swaps. - PaulMcKenzie

Using [] and going out of bounds is undefined behavior. Your code really wasn't working, but you believed it was. Using at() will always throw an exception as soon as you go out of bounds -- no random segmentation faults can happen. - PaulMcKenzie
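A minimal sketch of what the commenters are suggesting (this is not the original code; the helper name child_is_smaller and the tiny test vector are my own, for illustration only): test the child index against count before reading the element, and read through at(), which throws std::out_of_range instead of silently reading past the end of the vector.

#include <iostream>
#include <stdexcept>
#include <vector>

// Hypothetical helper, not from the original post: true only if the child
// index is in range AND the child is smaller than the parent. The bounds
// check happens before any element access, and at() throws on a bad index.
static bool child_is_smaller(const std::vector<unsigned long long>& a,
                             unsigned long long parent,
                             unsigned long long child,
                             unsigned long long count) {
    return child < count && a.at(child) < a.at(parent);
}

int main() {
    std::vector<unsigned long long> a = {5, 3, 8, 1};
    unsigned long long count = a.size();
    unsigned long long j = 0, left = 1, right = 2;

    // The original while condition reads array[left] / array[right] before
    // testing left < count / right < count; this ordering avoids that.
    while (child_is_smaller(a, j, left, count) || child_is_smaller(a, j, right, count)) {
        // ... swap with the smaller child, then recompute j, left, right ...
        break; // sketch only
    }

    // at() on a bad index fails loudly instead of (maybe) segfaulting:
    try {
        std::cout << a.at(100) << "\n";
    } catch (const std::out_of_range& e) {
        std::cout << "out_of_range: " << e.what() << "\n";
    }
    return 0;
}

With at(), the first bad index in the while loop surfaces as an exception at the exact access, which is much easier to track down than an intermittent segmentation fault.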