How does the GPU handle vertex shaders?

Question

Hi everybody,

I have a few questions:

Are vertex shaders run once for each vertex, or once per vertex per primitive (i.e. three times the number of primitives)?
How are these shaders mapped to the GPU's kernels: one shader per kernel, or one primitive per kernel (still for vertex shaders)?
If there is just one shader call per vertex, mapped to one kernel, how does the GPU keep track of the dependencies between vertices and primitives? (Primitives may share some vertices; keeping track of these dependencies on a per-vertex basis seems really costly. I can't believe the GPU does so…)

Thanks in advance for your replies.

remove CUDA tag. Vertex shaders have nothing to do with CUDA, nor do they relate in any specified way to CUDA kernels. — Robert Crovella

v.oddou · Accepted Answer · 2014-06-26T02:17:48

The answer is: it depends. The graphics hardware tries to keep the number of times a vertex is shaded to a minimum, and how well it succeeds depends on buffering and batching. This is all explained here: http://fgiesen.wordpress.com/2011/07/09/a-trip-through-the-graphics-pipeline-2011-index/ There are multiple cache lines of vertex buffer tags (here, the indices pulled from the index buffer). The IA (input assembler) unit pulls vertex indices from the index buffer and fills a cache line; when the line is full (modulo primitive size), it is sent to the card's scheduler hardware, which dispatches the cache line to a block of shading cores. The IA stage then keeps filling a new cache line in parallel while those cores work on the previous request; it never waits until the index buffer is fully depleted or until the core units are all busy. When results come back, the shaded vertex data is put into a piece of memory that primitive assembly will reference later.
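To make the batching idea concrete, here is a minimal sketch in Python. It is only a software model of the behaviour described above, not how any particular GPU is wired. The function name `build_batches`, the `batch_capacity` parameter, and the rule "flush when the next primitive's new vertices do not fit" are assumptions made for illustration; it also assumes `batch_capacity` is at least the number of vertices per primitive.

```python
def build_batches(indices, batch_capacity, verts_per_prim=3):
    """Group an index buffer into batches of unique vertices (toy model).

    Returns a list of (unique_vertices, primitives) pairs. Each primitive
    is a tuple of slots into that batch's unique_vertices list, which is
    how a later primitive-assembly step could find the shaded results.
    """
    batches = []
    slots = {}   # vertex index -> slot in the current batch
    unique = []  # vertex indices that must be shaded in the current batch
    prims = []   # primitives expressed as batch-local slots

    # Walk the index buffer one whole primitive at a time; trailing
    # indices that do not form a full primitive are ignored.
    for start in range(0, len(indices) - verts_per_prim + 1, verts_per_prim):
        prim = indices[start:start + verts_per_prim]
        new = [v for v in dict.fromkeys(prim) if v not in slots]

        # Assumed flush rule: never split a primitive across two batches.
        if unique and len(unique) + len(new) > batch_capacity:
            batches.append((unique, prims))
            slots, unique, prims = {}, [], []
            new = list(dict.fromkeys(prim))  # everything is new again

        for v in new:
            slots[v] = len(unique)
            unique.append(v)
        prims.append(tuple(slots[v] for v in prim))

    if unique:
        batches.append((unique, prims))
    return batches


def shaded_vertex_count(batches):
    """Total vertex-shader invocations: one per unique vertex per batch."""
    return sum(len(unique) for unique, _ in batches)
```

For two triangles sharing an edge, `build_batches([0, 1, 2, 2, 1, 3], batch_capacity=4)` produces a single batch with four unique vertices, so the shared vertices are shaded once. With `batch_capacity=3` the second triangle lands in a new batch and vertices 1 and 2 are shaded again, for six invocations in total.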
There are two different stages: input assembly (immediately followed by vertex shading) and primitive assembly, which comes later. The graphics pipeline is a bit more specialized than generic kernels, and I doubt all stages are implemented as generic kernels. Particularly on slightly older hardware, notably GPUs with specialized memory for shaded vertex output, these stages need special wiring.
Check the article series, it's all explained there. There isn't a perfect 1:1 relation: some vertices get re-shaded if their uses are too far apart in the index buffer.
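The effect of distance in the index buffer can be estimated with a small simulation. The sketch below assumes a simple FIFO cache of recently shaded vertices; the FIFO policy, the function name `count_vertex_shader_runs`, and the `cache_size` parameter are illustrative assumptions, and real hardware may use a different scheme.

```python
from collections import deque


def count_vertex_shader_runs(indices, cache_size):
    """Count vertex-shader invocations under an assumed FIFO vertex cache.

    A vertex is shaded only when its index is not among the last
    `cache_size` distinct vertices that were shaded.
    """
    cache = deque(maxlen=cache_size)  # oldest entry is evicted automatically
    runs = 0
    for v in indices:
        if v not in cache:
            runs += 1        # cache miss: the vertex has to be shaded
            cache.append(v)  # remember it for nearby reuse
    return runs


# Two triangles sharing an edge, then the first triangle drawn again.
indices = [0, 1, 2, 2, 1, 3, 0, 1, 2]
for size in (0, 3, 4):
    print(size, count_vertex_shader_runs(indices, size))
```

With no cache every index is shaded (9 runs, once per vertex per primitive). With 4 entries each of the 4 distinct vertices is shaded exactly once. With 3 entries vertex 0 has already been evicted by the time it is referenced again, so it and the vertices after it are re-shaded, giving 7 runs.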