OpenGL fragment shader: how large is the computation-time difference between "4 times of 1 channel" and "1 time of 4 channels"?
For example, I could do the computation on 1 channel at a time and run it 4 times.
Or I could pack all the data into 4 channels and do it in 1 pass.
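To make the two options concrete, here is a rough GLSL sketch of what I mean (names like u_src and v_uv, and the exact texture formats, are just placeholders, not my real shaders):

```glsl
#version 300 es
precision highp float;

uniform sampler2D u_src;   // placeholder: the 1-channel input texture (e.g. R32F)
in vec2 v_uv;              // placeholder: interpolated texture coordinate
out vec4 o_color;

// "4 times of 1 channel": this shader is run once per single-channel texture.
void main() {
    float a = texture(u_src, v_uv).r;  // only .r carries data
    float r = a * a;                   // stand-in for the real per-channel math
    o_color = vec4(r, 0.0, 0.0, 1.0);  // write back a single channel
}
```

```glsl
#version 300 es
precision highp float;

uniform sampler2D u_src;   // placeholder: the packed 4-channel input texture (e.g. RGBA32F)
in vec2 v_uv;
out vec4 o_color;

// "1 time of 4 channels": one pass processes all four channels at once.
void main() {
    vec4 a = texture(u_src, v_uv);     // all four channels in one fetch
    o_color = a * a;                   // same math, applied component-wise
}
```

In the first variant the same shader would be run 4 times, once per single-channel texture; in the second variant all 4 channels go through a single draw.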
Some things to consider:
- (a) there is some per-pass overhead for setting up and launching a fragment shader;
- (b) is the time of a 1-channel texture fetch roughly the same as that of a 4-channel fetch? Compared to one multiplication in the shader, how expensive is a texture fetch? If the fetch is cheap and there are many calculation steps (many multiplications, additions, etc.), then the fetch time hardly matters;
- (c) how much does the computation time differ between 4 separate float a * float a multiplications and 1 vec4(a, a, a, a) * vec4(a, a, a, a) multiplication? (See the sketch below.)
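For (b) and (c), this is roughly the pattern I have in mind: a single fetch followed by many arithmetic steps (the 32-iteration loop is just a stand-in for the real math, not my actual code):

```glsl
#version 300 es
precision highp float;

uniform sampler2D u_src;   // placeholder input texture
in vec2 v_uv;
out vec4 o_color;

void main() {
    vec4 a = texture(u_src, v_uv);     // one 4-channel fetch
    vec4 x = a;
    // Stand-in for "many calculation steps": 32 multiply-adds per channel.
    for (int i = 0; i < 32; ++i) {
        x = x * a + 0.5;
    }
    o_color = x;
}
```

If the arithmetic dominates, the 1-channel vs 4-channel fetch difference should matter much less than how the multiplications themselves are executed.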
I know for sure that "1 time of 4 channels" is faster than "4 times of 1 channel", but I want to know how much faster it is.
The reason I consider "4 times of 1 channel" is that the whole implementation involves several passes. For example, input texture 1 is used to render into texture 2, which means two textures exist at the same time; only after texture 2 has been computed can texture 1 be deleted. So one extra texture is needed in GPU memory during each pass. With the 1-channel approach, that extra texture has 1 channel; with the 4-channel approach, it has 4 channels. This causes a difference in memory usage. (This is just a simple example; the real implementation involves more steps.)
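As a rough worked example of the space difference (assuming a 1024 x 1024 texture with 32-bit float channels, which is just an illustrative size): the extra ping-pong texture costs about 1024 * 1024 * 4 bytes = 4 MB with a 1-channel format (R32F), versus 1024 * 1024 * 16 bytes = 16 MB with a 4-channel format (RGBA32F), while the total amount of payload data is the same in both approaches.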
I want to balance the trade-off between GPU memory and GPU computation time.
Any ideas or resources on these questions?