Intro
I am trying to render squares in DirectX 11 in the most efficient way. Each square has a color (float3) and a position (float3). A typical scene contains about 5 million squares.
I tried three approaches:
- Render raw data
- Use geometry shader
- Use instanced rendering
Raw data means that each square is represented by 4 vertices in the vertex buffer and two triangles in the index buffer.
Geometry shader and instanced rendering mean that each square has just one vertex in the vertex buffer.
My results (on an NVIDIA GTX 960M) for 5M squares are:
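For reference, here is a minimal sketch of how the geometry shader variant can expand one vertex into a square. It is an illustration under assumptions, not my exact shader: the constant buffer layout, the HalfSize parameter, and applying the corner offset after the transform are all hypothetical.

```hlsl
// Hypothetical constant buffer; names and register slot are assumptions.
cbuffer PerFrame : register(b0)
{
    float4x4 Transform; // combined transform matrix
    float    HalfSize;  // assumed half edge length of a square
    float3   Padding;   // keeps the buffer a multiple of 16 bytes
};

// One point per square, already transformed by the vertex shader.
struct GSInput
{
    float4 Position : SV_POSITION;
    float3 Color    : COLOR0;
};

struct GSOutput
{
    float4 Position : SV_POSITION;
    float3 Color    : COLOR0;
};

// Emits a 4-vertex triangle strip (two triangles) for each input point.
[maxvertexcount(4)]
void main(point GSInput input[1], inout TriangleStream<GSOutput> stream)
{
    // Corner offsets in triangle-strip order.
    static const float2 corners[4] =
    {
        float2(-1.0f, -1.0f),
        float2(-1.0f,  1.0f),
        float2( 1.0f, -1.0f),
        float2( 1.0f,  1.0f)
    };

    [unroll]
    for (int i = 0; i < 4; ++i)
    {
        GSOutput output;
        output.Position = input[0].Position;
        output.Position.xy += corners[i] * HalfSize; // assumed: offset applied after the transform
        output.Color = input[0].Color;
        stream.Append(output);
    }
}
```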
- Geometry shader 22 FPS
- Instanced rendering 30 FPS
- Raw data rendering 41 FPS
I expected the geometry shader not to be the most efficient method. On the other hand, I am surprised that instanced rendering is slower than raw data rendering. The computation in the vertex shader is exactly the same: a multiplication by a transform matrix stored in a constant buffer, plus the addition of the Shift variable.
Raw data input
struct VSInput {
    float3 Position : POSITION0;
    float3 Color : COLOR0;
    float2 Shift : TEXCOORD0; // This is the xy deviation from the square center
};
Instanced rendering input
struct VSInputPerVertex {
    float2 Shift : TEXCOORD0;
};
struct VSInputPerInstance {
    float3 Position : POSITION0;
    float3 Color : COLOR0;
};
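To make the vertex shader computation concrete, here is a minimal sketch of the instanced version. The constant buffer name, the output struct, and the order of operations (transform first, then add Shift) are assumptions made for illustration. The raw data version would be the same except that it takes a single VSInput parameter.

```hlsl
// Hypothetical constant buffer; name and register slot are assumptions.
cbuffer PerFrame : register(b0)
{
    float4x4 Transform; // combined transform matrix
};

// Per-vertex stream: the 4 corners of a unit square.
struct VSInputPerVertex
{
    float2 Shift : TEXCOORD0; // xy deviation from the square center
};

// Per-instance stream: one entry per square.
struct VSInputPerInstance
{
    float3 Position : POSITION0;
    float3 Color    : COLOR0;
};

struct VSOutput
{
    float4 Position : SV_POSITION;
    float3 Color    : COLOR0;
};

VSOutput main(VSInputPerVertex perVertex, VSInputPerInstance perInstance)
{
    VSOutput output;
    // Transform the square center, then offset it to the requested corner.
    output.Position = mul(float4(perInstance.Position, 1.0f), Transform);
    output.Position.xy += perVertex.Shift; // assumed: offset applied after the transform
    output.Color = perInstance.Color;
    return output;
}
```

Which of the two streams advances per vertex and which per instance is decided by the input layout on the application side, not by the shader.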
Note
For bigger models (20M squares), instanced rendering is more efficient (evidently because of memory traffic).
Question
Why is instanced rendering slower than raw data rendering in the case of 5M squares? Is there another efficient way to accomplish this rendering task? Am I missing something?
Edit
StructuredBuffer method
One possible solution is to use a StructuredBuffer, as @galop1n suggested (for details, see his answer).
My results (on an NVIDIA GTX 960M) for 5M squares:
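A minimal sketch of the idea, which may differ from what the answer actually does: the vertex shader reads the square from a StructuredBuffer and derives the corner from SV_VertexID, so no vertex buffer is needed. The Square struct, the HalfSize parameter, the register slots, and the assumption of a non-indexed draw of 6 vertices per square as a triangle list are all hypothetical.

```hlsl
// Hypothetical constant buffer; names and register slot are assumptions.
cbuffer PerFrame : register(b0)
{
    float4x4 Transform; // combined transform matrix
    float    HalfSize;  // assumed half edge length of a square
    float3   Padding;   // keeps the buffer a multiple of 16 bytes
};

// One element per square, bound as a shader resource view.
struct Square
{
    float3 Position;
    float3 Color;
};

StructuredBuffer<Square> Squares : register(t0);

struct VSOutput
{
    float4 Position : SV_POSITION;
    float3 Color    : COLOR0;
};

// Assumes a non-indexed draw of 6 * squareCount vertices (triangle list).
VSOutput main(uint vertexId : SV_VertexID)
{
    // Corner offsets for the two triangles of one square.
    static const float2 corners[6] =
    {
        float2(-1.0f, -1.0f), float2(-1.0f, 1.0f), float2(1.0f, -1.0f),
        float2( 1.0f, -1.0f), float2(-1.0f, 1.0f), float2(1.0f,  1.0f)
    };

    uint squareIndex = vertexId / 6; // which square this vertex belongs to
    uint cornerIndex = vertexId % 6; // which corner of that square

    Square square = Squares[squareIndex];

    VSOutput output;
    output.Position = mul(float4(square.Position, 1.0f), Transform);
    output.Position.xy += corners[cornerIndex] * HalfSize; // assumed: offset applied after the transform
    output.Color = square.Color;
    return output;
}
```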
- StructuredBuffer 48 FPS
Observations
- Sometimes I observed that the StructuredBuffer method oscillated between 30 and 55 FPS (each value accumulated over 100 frames). It seems to be a little unstable. The median is 48 FPS. I did not observe this with the previous methods.
- Consider the balance between the number of draw calls and the StructuredBuffer size. For smaller models, I got the fastest behavior with buffers of 1K to 4K points. When I rendered the 5M-square model that way, the large number of draw calls made it inefficient (30 FPS). The best behavior I observed with 5M squares was with 16K points per buffer. Both 32K and 8K points per buffer seemed to be slower.