Alright, so this has been bugging me for a while now, and I could not find anything on MSDN that goes into the specifics I need.

This is more of a 3-part question, so here goes:

1) When creating the swap chain, applications specify a backbuffer pixel format, most often either B8G8R8A8 or R8G8B8A8. That gives 8 bits per color channel, so a total of 4 bytes is used per pixel... so why does the pixel shader have to return a color as a float4, when a float4 is actually 16 bytes? (There's a sketch of what I mean after the three questions.)

2) When binding textures to the pixel shader, my textures are in DXGI_FORMAT_B8G8R8A8_UNORM format, so why does the sampler hand back a float4 per pixel?

3) Am I missing something here? Am I overthinking this, or what?
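
To make the mismatch concrete, here's a minimal HLSL sketch of what I mean (the resource and function names are just placeholders I made up):

    // Both the backbuffer bound as the render target and the texture below
    // were created as DXGI_FORMAT_B8G8R8A8_UNORM, i.e. 4 bytes per pixel.
    Texture2D    diffuseTex : register(t0);
    SamplerState linearSamp : register(s0);

    float4 PSMain(float4 pos : SV_Position, float2 uv : TEXCOORD0) : SV_Target
    {
        // Sampling hands me a float4 (16 bytes)...
        float4 texel = diffuseTex.Sample(linearSamp, uv);

        // ...and SV_Target forces me to return a float4 (16 bytes) as well,
        // even though the render target only stores 4 bytes per pixel.
        return texel;
    }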

Please provide links to support your claims, preferably from MSDN!

The pixel shader returns single-precision floats so that its output can be written to any format of backbuffer. – Chuck Walbourn
Fancy seeing you here Chuck, big fan of DxTex and DxTK... back to my issue... Single-precision floats are still 4 bytes, so I don't see why SV_TARGET won't take a single float instead of 4 floats when that's all the render target format calls for. I can see it from the point of view of easier data manipulation, but then this quadruples memory usage. My only explanation is that the OM stage truncates each float to 8 bits, but where is that in the documentation? – Miguel
Truncates isn't really the right word. There are really only three types that matter in shaders: signed integer, unsigned integer and single-precision floating point. Double-precision floating point also exists, but I've never seen it used outside of scientific simulation. On export from the pixel shader, the GPU will convert (not truncate) the 32-bit floating point value to whatever format the render target was created as. That could be 8 bits per channel, 10, 11, 16 or 32, depending on the format. There's no increase in storage; the shader just runs its calculations at higher precision. – Adam Miles

1 Answer

GPUs are designed to perform calculations on 32-bit floating-point data, at least if they want to support D3D11. As of D3D10 you can also perform 32-bit signed and unsigned integer operations. There's no requirement or language support for types smaller than 4 bytes in HLSL, so there's no "byte"/"char" or "short" for 1- and 2-byte integers or lower-precision floating point.
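
To illustrate, this is essentially the whole menu of scalar types available inside a shader (a trivial sketch, nothing version-specific):

    // The scalar types you will actually use in HLSL are all 4 bytes wide:
    float f = 0.5f;  // 32-bit floating point
    int   i = -7;    // 32-bit signed integer
    uint  u = 42u;   // 32-bit unsigned integer
    // There is no "byte", "char" or "short", so an 8-bit colour channel
    // can't even be expressed as a variable inside the shader.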

Any DXGI format that uses the "FLOAT", "UNORM" or "SNORM" suffix is a non-integer format, while "UINT" and "SINT" are unsigned and signed integer. Any read the shader performs on the first three types will be provided to the shader as 32-bit floating point, irrespective of whether the original format was 8-bit UNORM/SNORM or 10/11/16/32-bit floating point. Data in vertices is usually stored at a lower precision than full-fat 32-bit floating point to save memory, but by the time it reaches the shader it has already been converted to 32-bit float.
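
For example (the texture name here is made up), a read from an 8-bit UNORM texture looks exactly like a read from a 16- or 32-bit float texture, because the conversion to 32-bit float happens before the value ever reaches the shader:

    Texture2D<float4> colourTex : register(t0); // created as, say, DXGI_FORMAT_R8G8B8A8_UNORM

    float4 ReadExample(float4 pos : SV_Position) : SV_Target
    {
        // Each texel is still only 4 bytes in memory, but the UNORM-to-float
        // conversion has already happened by the time the shader sees it:
        // the byte value 128 arrives as roughly 0.502, and 255 as exactly 1.0.
        float4 texel = colourTex.Load(int3(pos.xy, 0));
        return texel;
    }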

On output (to UAVs or render targets) the GPU converts the "float" or "uint" data down to whatever format the target was created as. If you try outputting float4(4.4, 5.5, 6.6, 10.1) to a target that is 8-bit normalised, it'll simply be clamped to (1.0, 1.0, 1.0, 1.0) and still only consume 4 bytes per pixel.
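
Using those same numbers, a sketch of the write-side conversion (the shader itself is hypothetical):

    // Render target created as DXGI_FORMAT_R8G8B8A8_UNORM (4 bytes per pixel).
    float4 WriteExample() : SV_Target
    {
        // The shader computes at 32-bit float precision, but on output the
        // hardware clamps each channel to [0, 1] and stores it as a single
        // byte, so this lands in memory as the bytes (255, 255, 255, 255),
        // i.e. (1.0, 1.0, 1.0, 1.0).
        return float4(4.4, 5.5, 6.6, 10.1);
    }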

So to answer your questions:

1) Because shaders only operate on 32-bit types, but the GPU will compress/convert your output as necessary so that it can be stored in whatever resource you currently have bound, according to that resource's format. It would be madness to have special keywords and types for every format the GPU supported.

2) The "sampler" doesn't "need a float4 per pixel to work". I think you're mixing your terminology. The declaration that the texture is a Texture2D<float4> is really just stating that this texture has four components and is of a format that is not an integer format. "float" doesn't necessarily mean the source data is 32 bit float (or actually even floating point) but merely that the data has a fractional component to it (eg 0.54, 1.32). Equally, declaring a texture as Texture2D<uint4> doesn't mean that the source data is 32 bit unsigned necessarily, but more that it contains four components of unsigned integer data. However, the data will be returned to you and converted to 32 bit float or 32 bit integer for use inside the shader.

3) You're missing the fact that the GPU decompresses texture/vertex data on reads and compresses it again on writes. The amount of storage used for your vertex/texture data is only as much as the format you created the resource with, and has nothing to do with the fact that the shader operates on 32-bit floats/integers.
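
To put point 2 into code (a hedged sketch; the texture names and the formats mentioned in the comments are only examples):

    // "float4" here means "four non-integer components", not "stored as 32-bit
    // floats"; any UNORM/SNORM/FLOAT format can sit behind this declaration.
    Texture2D<float4> albedoTex : register(t0); // e.g. B8G8R8A8_UNORM or R16G16B16A16_FLOAT

    // "uint4" means "four unsigned-integer components"; any UINT format works.
    Texture2D<uint4>  idTex     : register(t1); // e.g. R8G8B8A8_UINT or R16G16B16A16_UINT

    SamplerState linearSamp : register(s0);

    float4 PSMain(float4 pos : SV_Position, float2 uv : TEXCOORD0) : SV_Target
    {
        float4 albedo = albedoTex.Sample(linearSamp, uv); // arrives as 32-bit floats
        uint4  ids    = idTex.Load(int3(pos.xy, 0));      // arrives as 32-bit uints
        return ids.x == 0 ? albedo : albedo * 0.5;
    }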