Using gl_SampleMask with multisample texture doesn't get per-sample blend?

Question

I have a problem when using gl_SampleMask with a multisample texture.

To simplify the problem, here is an example.

I draw two triangles to a framebuffer that has a 32x multisample texture attached. The vertices of the triangles are (0,0) (100,0) (100,1) and (0,0) (0,1) (100,1).

In the fragment shader, I have code like this:

#extension GL_NV_sample_mask_override_coverage : require
layout(override_coverage) out int gl_SampleMask[];

...

out_color = vec4(1,0,0,1);
coverage_mask = gen_mask(int(gl_FragCoord.x / 100.0 * 8.0));
gl_SampleMask[0] = coverage_mask;

The function int gen_mask(int X) generates an integer with X 1s in its binary representation.

I expected to see 100 pixels filled with solid red, but instead I got what looks like alpha-blended output. The pixel at (50,0) shows (1,0.25,0.25), which looks like two (1,0,0,0.5) draws on a (1,1,1,1) background.

However, if I skip writing the coverage mask and instead check gl_SampleID in the fragment shader, writing (1,0,0,1) or (0,0,0,0) to the output color depending on the bit of coverage_mask selected by gl_SampleID,

if ((coverage_mask >> gl_SampleID) & (1 == 1) ) {
    out_color = vec4(1,0,0,1);
} else {
    out_color = vec4(0,0,0,0);
}

I get 100 red pixels, as expected.

I've checked the OpenGL wiki and documentation but couldn't find why the behavior changes here.

I'm using an Nvidia GTX 980 with driver version 361.43 on Windows 10.

I'll put the test code on GitHub later if necessary.

I don't quite follow your expectations. Since you're setting fewer bits in the coverage mask for fragments that are towards the left, doesn't it make sense that those fragments end up with more transparency? — Reto Koradi
gl_FragCoord.x / 100.0 * 8.0: assuming you have done the appropriate math to make gl_FragCoord.x go from [0, 100], this means that the largest coverage you'll get is... 8 bits. — Nicol Bolas
@NicolBolas As I've tested before, when the texture has 32 samples, Nvidia's implementation splits one pixel into four smaller fragments, each with 8 samples. So each fragment shader only has 8 bits of gl_SampleMask available. — Mochimazui
@RetoKoradi See my last comment to Nicol Bolas. I'll provide my code later. — Mochimazui

Nicol Bolas · Accepted Answer · 2016-01-04T14:08:49

when the texture has 32 samples, Nvidia's implementation splits one pixel into four smaller fragments, each with 8 samples. So each fragment shader only has 8 bits of gl_SampleMask available.

OK, let's assume that's true. How do you suppose NVIDIA implements this?

Well, the OpenGL specification does not allow them to implement this by changing the effective size of gl_SampleMask. It makes it very clear that the size of the sample mask must be large enough to hold the maximum number of samples supported by the implementation. So if GL_MAX_SAMPLES returns 32, then gl_SampleMask must have 32 bits of storage.

So how would they implement it? Well, there's one simple way: the coverage mask. They give each of the 4 fragments a separate 8 bits of coverage mask that they write their outputs to. Which would work perfectly fine...

Until you overrode the coverage mask with override_coverage. Now each of the 4 fragment shader invocations can write to the same samples as the others.

Oops.

I haven't directly tested NVIDIA's implementation to be certain of that, but it is very much consistent with the results you get. Each FS instance in your code will write to, at most, 8 samples. The same 8 samples. 8/32 is 0.25, which is exactly what you get: 0.25 of the color you wrote. Even though 4 FS invocations may be writing to the same pixel, each one is writing to the same 25% of the coverage mask.

There's no "alpha-blended output"; it's just doing what you asked.

As to why your second code works... well, you fell victim to one of the classic C/C++ (and therefore GLSL) blunders: operator precedence. Allow me to parenthesize your condition to show you what the compiler thinks you wrote:

((coverage_mask >> gl_SampleID) & (1 == 1))

Equality testing has a higher precedence than any bitwise operation. So it gets grouped like this. Now, a conformant GLSL implementation should have failed to compile because of that, since the result of 1 == 1 is a boolean, which cannot be used in a bitwise & operation.

Of course, NVIDIA has always had a tendency to play fast-and-loose with GLSL, so it doesn't surprise me that they allow this nonsense code to compile. Much like C++. I have no idea what this code would actually do; it depends on how a true boolean value gets transformed into an integer. And GLSL doesn't define such an implicit conversion, so it's up to NVIDIA to decide what that means.

The traditional condition for testing a bit is this (the explicit != 0 is needed in GLSL, which does not implicitly convert an int to a bool):

((coverage_mask & (0x1 << gl_SampleID)) != 0)

It also avoids undefined behavior if coverage_mask isn't an unsigned integer.

Of course, doing the condition correctly should give you... the exact same answer as the first one.
