Problem
I am calling thrust::sort_by_key with the values passed through a zip_iterator. The call runs many times in a loop, and after a few iterations it becomes roughly 10x slower. What causes this drop in performance?
Symptom
I am sorting three vectors with sort_by_key; one of them acts as the key vector:
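    #include <thrust/device_vector.h>
    #include <thrust/sort.h>
    #include <thrust/iterator/zip_iterator.h>
    #include <thrust/tuple.h>

    struct Segment
    {
        int v[2];
    };

    thrust::device_vector<int>     keyVec;
    thrust::device_vector<int>     valVec;
    thrust::device_vector<Segment> segVec;
    // ... code which fills these vectors ...
    // Sort keyVec; valVec and segVec are permuted along with it
    thrust::sort_by_key( keyVec.begin(), keyVec.end(),
        thrust::make_zip_iterator( thrust::make_tuple( valVec.begin(), segVec.begin() ) ) );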
Each vector usually holds about 4 million elements. The first two calls to sort_by_key take 0.04 s each, the third takes 0.1 s, and every call after that takes 0.3 s. That is a slowdown of roughly 10x.
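Per-iteration timings like the ones above can be collected with a small wall-clock helper. This is only a minimal sketch of how such a measurement could be taken, not the code used for the numbers in this question; `timeSeconds` is a hypothetical name and is not part of Thrust.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper (not part of Thrust): runs `work` once and returns the
// elapsed wall-clock time in seconds.
// For GPU code, `work` should finish with cudaDeviceSynchronize(); otherwise
// asynchronous launches may still be running when the clock stops.
template <typename Work>
double timeSeconds(Work&& work)
{
    const auto start = std::chrono::steady_clock::now();
    std::forward<Work>(work)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Wrapping only the sort_by_key call (followed by cudaDeviceSynchronize()) in a lambda and passing it to this helper inside the loop gives one number per iteration, which keeps the fill code out of the measurement.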
Extra Information
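To confirm that sort_by_key with the zip_iterator is the only source of the degradation, I replaced the call above with a manual version: sort the keys together with an index vector, then move the values into place with a handwritten kernel: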
    thrust::device_vector<int> indexVec( keyVec.size() );
    thrust::sequence( indexVec.begin(), indexVec.end() );
    // Sort the keys and indexes
    thrust::sort_by_key( keyVec.begin(), keyVec.end(), indexVec.begin() );
    thrust::device_vector<int>     valVec2( keyVec.size() );
    thrust::device_vector<Segment> segVec2( keyVec.size() );
    // Use index array and move vectors to destination
    moveKernel<<< x, y >>>(
        toRawPtr( indexVec ),
        indexVec.size(),
        toRawPtr( valVec ),
        toRawPtr( segVec ),
        toRawPtr( valVec2 ),
        toRawPtr( segVec2 ) );
    // Swap back into original vectors
    valVec.swap( valVec2 );
    segVec.swap( segVec2 );
This handwritten version takes 0.03 s, and that time stays the same across all iterations, unlike the slowdown seen with sort_by_key and the zip_iterator.
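Since the body of moveKernel is not shown, here is the same sort-the-indices-then-gather idea sketched on the host in plain C++, which can also serve as a reference for checking results. Everything in it is an assumption made for illustration: `sortByKeyViaIndices` is a hypothetical name, std::stable_sort stands in for thrust::sort_by_key, and the gather loop stands in for whatever moveKernel does on the device.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

struct Segment
{
    int v[2];
};

// Hypothetical host-side reference, not the original device code.
// Sorts `keys` ascending and applies the same permutation to `vals` and `segs`.
void sortByKeyViaIndices(std::vector<int>& keys,
                         std::vector<int>& vals,
                         std::vector<Segment>& segs)
{
    assert(vals.size() == keys.size() && segs.size() == keys.size());
    const std::size_t n = keys.size();

    // Equivalent of thrust::sequence: index[i] = i.
    std::vector<std::size_t> index(n);
    std::iota(index.begin(), index.end(), std::size_t{0});

    // Sort the indices by key (stand-in for sort_by_key on keys and indices).
    std::stable_sort(index.begin(), index.end(),
                     [&keys](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });

    // Gather: element i of each output comes from position index[i] of the input.
    std::vector<int> keys2(n), vals2(n);
    std::vector<Segment> segs2(n);
    for (std::size_t i = 0; i < n; ++i)
    {
        keys2[i] = keys[index[i]];
        vals2[i] = vals[index[i]];
        segs2[i] = segs[index[i]];
    }

    // Swap back into the original vectors.
    keys.swap(keys2);
    vals.swap(vals2);
    segs.swap(segs2);
}
```

On the device, thrust::gather from <thrust/gather.h> could play the role of the gather loop, once per value vector, with indexVec as the map.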