Unfortunately, OpenMP cannot prevent the data race here on its own. The `shared` clause makes the vector visible to all threads, but it does nothing to order their accesses. `std::vector::push_back` is not thread-safe: it may reallocate the vector's underlying storage in order to grow it, and concurrent calls can corrupt the container.
This code can be parallelized, but how well it scales depends on how much implementation effort you are willing to invest. To decide how much effort is worthwhile, measure what fraction of your application's total runtime this piece accounts for. Here are two (of the many possible) ways to parallelize your problem:
- Low effort, solid performance: make tempVector 1D and the same size as objVector. Instead of building 4 vectors of indices into objVector, set tempVector[i] to the bin (0-3) that objVector[i] belongs to. This can be done with a simple OpenMP parallel for, since each thread writes distinct elements. When tempVector is used later, collecting all the values of a particular bin requires scanning all of tempVector, but with only 4 bins this may still perform quite well.
- More effort, best scalability: give each thread its own local tempVector and parallelize across objVector with an OpenMP parallel for. Each thread can then safely use push_back, because it is the only thread accessing its vector. Merging all of the local copies into a single shared tempVector can be done by atomically adding up the sizes, so each thread reserves a disjoint region, and then copying the pieces over in bulk.
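A sketch of the second approach, again assuming int elements and a hypothetical `keep` predicate in place of your real selection logic. The atomic capture reserves each thread a disjoint slice of the shared vector before the bulk copy:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical filter standing in for your real selection logic.
bool keep(int v) { return v % 2 == 0; }

std::vector<int> parallelFilter(const std::vector<int>& objVector) {
    // Preallocate to the worst case so threads can copy in without resizing.
    std::vector<int> tempVector(objVector.size());
    int sharedSize = 0;

    #pragma omp parallel
    {
        std::vector<int> local;  // only this thread touches it,
                                 // so push_back is safe
        #pragma omp for nowait
        for (std::ptrdiff_t i = 0; i < (std::ptrdiff_t)objVector.size(); ++i)
            if (keep(objVector[i]))
                local.push_back(objVector[i]);

        // Atomically claim a disjoint region of the shared vector...
        int offset;
        #pragma omp atomic capture
        { offset = sharedSize; sharedSize += (int)local.size(); }

        // ...then copy this thread's results into it in bulk.
        std::copy(local.begin(), local.end(), tempVector.begin() + offset);
    }

    tempVector.resize(sharedSize);  // trim the unused tail
    return tempVector;
}
```

Note that the order of elements in the merged vector depends on which thread claims its region first, so sort afterwards if your later processing needs a deterministic order.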