I have locally sorted queues in different CUDA blocks, one queue per block. Say there are m blocks. I have two problems:
1) I need to select only the k blocks, out of the m blocks, whose queue heads are the k smallest of the m head elements.
2) In one block, I need to load the queues of other blocks into shared memory. Can this be done?
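
To make the first operation concrete, here is a host-side reference of the result I am after. It is only a sketch of the expected output, not a GPU implementation. The names `select_k_blocks` and `heads` are made up for illustration, and I am assuming the m queue heads have already been copied into one array indexed by block, with 0 <= k.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Host-side reference only (hypothetical helper, not a CUDA kernel).
// heads[b] is assumed to hold the head (smallest element) of block b's queue.
// Returns the indices of the k blocks with the smallest heads,
// ordered by head value; ties are broken by the lower block index.
std::vector<int> select_k_blocks(const std::vector<int>& heads, int k) {
    std::vector<int> idx(heads.size());
    std::iota(idx.begin(), idx.end(), 0);  // 0, 1, ..., m-1

    // Clamp k so that asking for more blocks than exist is harmless.
    k = std::min<int>(k, static_cast<int>(idx.size()));

    // Only the first k positions need to end up sorted.
    std::partial_sort(idx.begin(), idx.begin() + k, idx.end(),
                      [&](int a, int b) {
                          if (heads[a] != heads[b]) return heads[a] < heads[b];
                          return a < b;
                      });
    idx.resize(k);
    return idx;
}
```

For example, with heads {7, 2, 9, 4, 1} and k = 3, I would want blocks 4, 1 and 3 to be selected.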
Can anyone please tell me how to do these two operations?