Below is a simple vertex and fragment shader pair, written in Metal, that renders 64 identical 2D quads.
#include <metal_stdlib>
using namespace metal;

// Inferred from how it is used below; position carries the [[position]] attribute.
struct VertexOut {
    float4 position [[position]];
    float2 tex;
};

// du and dv are defined elsewhere (not shown) and range from 0 to 1.
vertex VertexOut vertexMain(uint k [[vertex_id]],
                            uint ii [[instance_id]],
                            device float2* tex [[buffer(2)]],
                            device float2* position [[buffer(1)]],
                            device float* state [[buffer(0)]]) {
    VertexOut output;
    int i = 4 * ii + 1;                        // per-instance data starts after state[0]
    float2 pos = position[k];
    pos *= float2(state[i+2], state[i+3]);     // per-instance scale
    pos += float2(state[i], state[i+1]);       // per-instance offset
    pos.x *= state[0];                         // global x scale stored in state[0]
    output.position = float4(pos, 0, 1);
    output.tex = tex[k] * float2(du, dv);      // shrink the sampled region by du/dv
    return output;
}
fragment float4 fragmentMain(VertexOut input [[stage_in]],
                             texture2d<float> texture [[texture(0)]],
                             sampler sam [[sampler(0)]]) {
    return texture.sample(sam, input.tex);
}
The sampler uses normalized coordinates, so du and dv can range from 0 to 1; they control how large a region of the texture is sampled, starting at the lower-left corner.
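du and dv are not declared in the shader listing above. Purely as a sketch, if they were exposed as Metal function constants (say, at indices 0 and 1), the host side could specialize the vertex function like this; the library variable and the constant indices are assumptions, not part of the original code:

// Sketch only: assumes the shader declares
// constant float du [[function_constant(0)]]; and dv at index 1.
let constantValues = MTLFunctionConstantValues()
var du: Float = 0.5   // sample half the texture in x
var dv: Float = 0.5   // sample half the texture in y
constantValues.setConstantValue(&du, type: .float, index: 0)
constantValues.setConstantValue(&dv, type: .float, index: 1)
let vertexFunction = try library.makeFunction(name: "vertexMain",
                                              constantValues: constantValues)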
It seems I have a misunderstanding about how sampling works in Metal. I would expect the computational cost to remain constant no matter what values du and dv hold. However, as I increase du and dv toward 1, the frame rate drops. I am not using any mipmapping, nor am I changing the size of the quads that are rasterized on screen. The effect is more dramatic with linear filtering but happens with nearest filtering as well. It seems to me that since the number of pixels drawn to the screen is the same, the load on the GPU should not depend on du and dv. What am I missing?
EDIT: Here is my sampler and color attachment:
let samplerDescriptor = MTLSamplerDescriptor()
samplerDescriptor.normalizedCoordinates = true
samplerDescriptor.minFilter = .linear
samplerDescriptor.magFilter = .linear
let sampler = device.makeSamplerState(descriptor: samplerDescriptor)
let attachment = pipelineStateDescriptor.colorAttachments[0]
attachment?.isBlendingEnabled = true
attachment?.sourceRGBBlendFactor = .one
attachment?.destinationRGBBlendFactor = .oneMinusSourceAlpha
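For context, here is a sketch of how the texture and sampler above would be bound when encoding a frame; the buffer and texture names (stateBuffer, positionBuffer, texCoordBuffer, quadTexture), the pipelineState, and the 4-vertex triangle-strip assumption are placeholders, not the actual code:

// Sketch of per-frame encoding; names are placeholders for the real objects.
encoder.setRenderPipelineState(pipelineState)
encoder.setVertexBuffer(stateBuffer, offset: 0, index: 0)      // [[buffer(0)]] in vertexMain
encoder.setVertexBuffer(positionBuffer, offset: 0, index: 1)   // [[buffer(1)]]
encoder.setVertexBuffer(texCoordBuffer, offset: 0, index: 2)   // [[buffer(2)]]
encoder.setFragmentTexture(quadTexture, index: 0)              // [[texture(0)]] in fragmentMain
encoder.setFragmentSamplerState(sampler, index: 0)             // [[sampler(0)]]
encoder.drawPrimitives(type: .triangleStrip, vertexStart: 0,
                       vertexCount: 4, instanceCount: 64)      // 64 instanced quads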