parallel convex hull algorithm in gpu

Question

I am implementing a divide and conquer approach to convex hull in CUDA. This is my approach: Bottom up:

Create a list of lists to store convex hulls;
curSize = size of input (all points);
for i: 1 to log N
begin
curSize = curSize / 2;
Take every 2 adjacent convex hulls in list of lists and merge them into bigger hull (using standard upper and lower common tangent approach for Divide & Conquer Convex hull) in curSize threads
//Initially it merges every 2 adjacent points in list to a hull, then in next iteration, it merges convex hulls of size 2 into bigger convex hulls and so on.., finally the list of lists will have a single list of points on the hull
end

But its getting too complicated and I feel I am not utilizing the parallel power of CUDA as at every level of the tree I create N/2^i threads whose complexity is O(N) in merging all adjacent hulls at this level. Hence net complexity is still O(N logN).

Can you tell me how to make it better or give any alternative neater parallel algorithm for convex hull (it'll be great if I can get algorithm for parallel version of graham's scan)?

Sergey Bankevich Sergey Bankevich · Accepted Answer · 2011-07-30T15:17:04

The complexity of your algorithm is still O(N) (not changed in comparison to one-threaded version) because you make 3 things:

Manage threads: O(N)
Remove some vertices from lists: amortized O(N) since there are only N points.
Look at points adjacent to removed, but not removed during current step: O(N) since there are not more than 2 such points for each merge.

But if your points are not sorted, you should better parallelize sorting.

parallel convex hull algorithm in gpu

1 Answers