My program should compute all distinct rotations of words and texts in parallel.
In case it is unclear what this means, the rotations of "BANANA" are:
- BANANA
- ANANAB
- NANABA
- ANABAN
- NABANA
- ABANAN
(Each rotation simply moves the first letter of the previous one to the end.)
vector<string> rotate_sequentiell( string* word )
{
    vector<string> all_rotations;
    for ( unsigned int i = 0; i < word->size(); i++ )
    {
        // Rotation i: the suffix starting at i, followed by the first i characters.
        string rotated = word->substr( i ) + word->substr( 0, i );
        all_rotations.push_back( rotated );
    }
    if ( verbose ) { printVec( &all_rotations, "Rotations" ); }
    return all_rotations;
}
It should be possible to make this parallel. Instead of moving just one letter to the end, I want to move two letters at once. For example, take BANANA, move "BA" to the end, and you get NANABA, which is the third entry in the list above. Each rotation can therefore be computed directly from the original word, independently of the others.
I implemented it like this:
vector<string> rotate_parallel( string* word )
{
    // Pre-size the result so that every iteration writes to its own slot.
    vector<string> all_rotations( word->size() );
    #pragma omp parallel for
    for ( unsigned int i = 0; i < word->size(); i++ )
    {
        string rotated = word->substr( i ) + word->substr( 0, i );
        all_rotations[i] = rotated;
    }
    if ( verbose ) { printVec( &all_rotations, "Rotations" ); }
    return all_rotations;
}
I pre-calculated the number of possible rotations and used #pragma omp parallel for, so it should do what I think it does.
To test these functions, I have a 40 KB text file that is meant to be "rotated". I want all the distinct rotations of one giant text.
What happens now is that the sequential version takes about 4.3 seconds and the parallel version takes about 6.5 seconds.
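To make "distinct" concrete, here is a minimal sketch of what I mean, not my actual code. The name distinct_rotations is made up for this illustration, and it assumes an input small enough that all rotations fit in memory. It builds every rotation from a doubled copy of the word, then sorts and removes duplicates:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of the original program): returns the
// distinct rotations of `word`, in sorted order.
std::vector<std::string> distinct_rotations(const std::string& word)
{
    const std::size_t n = word.size();
    // In the doubled word, rotation i is the substring [i, i + n).
    const std::string doubled = word + word;

    std::vector<std::string> rotations;
    rotations.reserve(n);
    for (std::size_t i = 0; i < n; ++i)
        rotations.push_back(doubled.substr(i, n));

    // Sorting puts equal rotations next to each other so unique can drop them.
    std::sort(rotations.begin(), rotations.end());
    rotations.erase(std::unique(rotations.begin(), rotations.end()),
                    rotations.end());
    return rotations;
}
```

For "BANANA" all six rotations are different, whereas a periodic word such as "ABAB" has only two distinct rotations ("ABAB" and "BABA").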
Why is that so? What am I doing wrong?
This is how I measure time:
clock_t start, finish;
start = clock();
bwt_encode_parallel( &glob_word, &seperator );
finish = clock();
cout << "Time (seconds): "
<< ((double)(finish - start))/CLOCKS_PER_SEC;
I compile my code with:
g++ -O3 -g -Wall -lboost_regex -fopenmp -fmessage-length=0
std::string allocates memory dynamically, and these allocations need synchronisation. You're probably looking at the serial time plus the synchronisation overhead. - molbdnilo
Don't use clock() to measure how much time your program is taking. Use omp_get_wtime(). You're currently measuring CPU time when you mean to be measuring wall time. This is absolutely your main problem. - NoseKnowsAll
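To illustrate the CPU time versus wall time point, here is a minimal sketch that records both for the same piece of work. The Timing struct and the measure function are hypothetical names introduced for this example. It uses std::chrono::steady_clock as a portable stand-in for omp_get_wtime(), so it builds without -fopenmp. The assumption is a platform such as Linux with g++, where clock() reports processor time for the whole process, so time spent by several threads running at once adds up:

```cpp
#include <chrono>
#include <ctime>

// Hypothetical result type: both measurements for one run, in seconds.
struct Timing
{
    double cpu_seconds;   // processor time consumed by the process (clock())
    double wall_seconds;  // elapsed real time (steady_clock)
};

// Hypothetical helper: runs `work` once and reports CPU time and wall time.
template <typename Work>
Timing measure(Work work)
{
    const std::clock_t cpu_start = std::clock();
    const std::chrono::steady_clock::time_point wall_start =
        std::chrono::steady_clock::now();

    work();

    const std::chrono::steady_clock::time_point wall_end =
        std::chrono::steady_clock::now();
    const std::clock_t cpu_end = std::clock();

    Timing t;
    t.cpu_seconds = static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC;
    t.wall_seconds =
        std::chrono::duration<double>(wall_end - wall_start).count();
    return t;
}
```

In the question's setup this could be called as measure([&]{ bwt_encode_parallel( &glob_word, &seperator ); }). If the parallel version helps, wall_seconds should drop even when cpu_seconds stays the same or grows. When compiling with -fopenmp, two calls to omp_get_wtime() around the work give the same kind of wall-time reading.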