In a word, no, you can't. The OpenCL paradigm is a data-parallel one in which workgroups are intended to be independent. The only mechanism for synchronizing across workgroups is at the command queue level, i.e. separate kernel launches. If your algorithm can't accommodate that, you need either a new algorithm or a different programming model.
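As a rough illustration of restructuring an algorithm around kernel launches, here is a minimal sketch of a two-pass sum. It is a CPU model written in plain C, not OpenCL API code: the names `partial_sum_pass`, `final_sum_pass`, `two_pass_sum` and the `MAX_GROUPS` limit are hypothetical, and each function merely stands in for one kernel enqueued on an in-order command queue.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for kernel 1: each workgroup reduces only its own slice
   into partial[g]. No group reads another group's data, so the
   groups could run in any order or in parallel. */
void partial_sum_pass(const int *in, size_t n, size_t group_size,
                      long *partial, size_t num_groups)
{
    for (size_t g = 0; g < num_groups; ++g) {
        size_t begin = g * group_size;
        size_t end = (begin + group_size < n) ? begin + group_size : n;
        long acc = 0;
        for (size_t i = begin; i < end; ++i)
            acc += in[i];
        partial[g] = acc;
    }
}

/* Stand-in for kernel 2: it starts only after kernel 1 has finished,
   so every partial[g] is guaranteed to be complete and visible. */
long final_sum_pass(const long *partial, size_t num_groups)
{
    long total = 0;
    for (size_t g = 0; g < num_groups; ++g)
        total += partial[g];
    return total;
}

/* Stand-in for the host: the boundary between the two calls plays
   the role of the cross-workgroup synchronization point. */
long two_pass_sum(const int *in, size_t n, size_t group_size)
{
    enum { MAX_GROUPS = 64 };            /* hypothetical limit for this sketch */
    long partial[MAX_GROUPS] = {0};
    size_t num_groups;

    assert(group_size > 0);
    num_groups = (n + group_size - 1) / group_size;
    assert(num_groups <= MAX_GROUPS);

    partial_sum_pass(in, n, group_size, partial, num_groups);
    return final_sum_pass(partial, num_groups);
}
```

In real OpenCL code the two passes would be two kernels enqueued one after the other; the point of the sketch is only that the dependency between workgroups has been moved to the launch boundary.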
Keep in mind that there are often far more workgroups than the hardware can execute simultaneously, and synchronizing all of them is impossible in such cases. It is possible to implement a spinlock or critical section across a hardware-dependent number of workgroups using atomic memory access primitives; however, such schemes are really an abuse of the programming model and tend to be useful only where there is relatively little interaction between workgroups.
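To make the limit concrete, here is a small model in plain C of a spin barrier built on an atomic arrival counter. It is only a sketch under stated assumptions: the function name `global_spin_barrier_completes` and its parameters are hypothetical, and it assumes a device that keeps a resident workgroup in its hardware slot until that workgroup finishes, with no preemption.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if a spin barrier across all num_groups workgroups can
   complete when at most resident_slots groups are resident at once.
   Assumption: a group spinning at the barrier never gives up its slot. */
bool global_spin_barrier_completes(unsigned num_groups,
                                   unsigned resident_slots)
{
    unsigned arrived = 0;     /* models the atomic arrival counter */
    unsigned resident = 0;    /* groups currently holding a slot */
    unsigned next_group = 0;  /* next group waiting to be scheduled */

    assert(resident_slots > 0);

    for (;;) {
        bool progress = false;

        /* Schedule waiting groups into free slots; each one reaches
           the barrier and increments the counter. */
        while (resident < resident_slots && next_group < num_groups) {
            ++resident;
            ++next_group;
            ++arrived;
            progress = true;
        }

        if (arrived == num_groups)
            return true;      /* every group arrived: barrier releases */

        /* The resident groups are all spinning, so no slot is freed
           and the remaining groups can never start: deadlock. */
        if (!progress)
            return false;
    }
}
```

Under this model, 4 workgroups on a device with 8 slots pass the barrier, while 8 workgroups on a device with 4 slots never do, which is why such tricks only work for a hardware-dependent number of workgroups.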