1 vote

Apple has a useful tutorial called Displaying an AR Experience with Metal that shows how to extract the Y and CbCr textures from an ARFrame's capturedImage property and convert them to RGB for rendering. However, I've run into problems trying to take an RGBA texture and perform the inverse operation, i.e., converting back to the Y and CbCr textures.

I rewrote the fragment shader in the tutorial as a compute shader that writes to an RGBA texture I created from a Metal buffer:

// Same as capturedImageFragmentShader but it's a kernel function instead
kernel void yCbCrToRgbKernel(texture2d<float, access::sample> yTexture [[ texture(kTextureIndex_Y) ]],
                             texture2d<float, access::sample> cbCrTexture [[ texture(kTextureIndex_CbCr) ]],
                             texture2d<float, access::write> rgbaTexture [[ texture(kTextureIndex_RGBA) ]],
                             uint2 gid [[ thread_position_in_grid ]])
{
    constexpr sampler colorSampler(mip_filter::linear, mag_filter::linear, min_filter::linear);

    const float4x4 ycbcrToRGBTransform = float4x4(
        float4(+1.0000f, +1.0000f, +1.0000f, +0.0000f),
        float4(+0.0000f, -0.3441f, +1.7720f, +0.0000f),
        float4(+1.4020f, -0.7141f, +0.0000f, +0.0000f),
        float4(-0.7010f, +0.5291f, -0.8860f, +1.0000f)
    );

    float4 ycbcr = float4(yTexture.sample(colorSampler, float2(gid)).r, cbCrTexture.sample(colorSampler, float2(gid)).rg, 1.0);
    float4 result = ycbcrToRGBTransform * ycbcr;
    rgbaTexture.write(result, ushort2(gid));
}

I tried to write a second compute shader to perform the reverse operation, calculating the Y, Cb, and Cr values using the conversion formulae found on YCbCr's wikipedia page:

kernel void rgbaToYCbCrKernel(texture2d<float, access::write> yTexture [[ texture(kTextureIndex_Y) ]],
                              texture2d<float, access::write> cbCrTexture [[ texture(kTextureIndex_CbCr) ]],
                              texture2d<float, access::sample> rgbaTexture [[ texture(kTextureIndex_RGBA) ]],
                              uint2 gid [[ thread_position_in_grid ]])
{
    constexpr sampler colorSampler(mip_filter::linear, mag_filter::linear, min_filter::linear);

    float4 rgba = rgbaTexture.sample(colorSampler, float2(gid)).rgba;

    // see https://en.wikipedia.org/wiki/YCbCr#ITU-R_BT.601_conversion for the conversion formulae (the constants below come from that section)

    float Y = 16.0 + (65.481 * rgba.r + 128.553 * rgba.g + 24.966 * rgba.b);
    float Cb = 128 + (-37.797 * rgba.r - 74.203 * rgba.g + 112.0 * rgba.b);
    float Cr = 128 + (112.0 * rgba.r - 93.786 * rgba.g - 18.214 * rgba.b);

    yTexture.write(Y, gid);
    cbCrTexture.write(float4(Cb, Cr, 0, 0), gid); // this probably is not correct...
}

My problem is how to write data to these textures correctly. I know it is incorrect because the resulting display is a solid pink color; the expected result is obviously the original, unmodified image.

The pixel formats for the Y, CbCr, and RGBA textures are .r8Unorm, .rg8Unorm, and .rgba8Unorm respectively.

Here is my Swift code for setting up the textures and executing the shaders:

    private func createTexture(fromPixelBuffer pixelBuffer: CVPixelBuffer, pixelFormat: MTLPixelFormat, planeIndex: Int) -> MTLTexture? {
        guard CVMetalTextureCacheCreate(kCFAllocatorSystemDefault, nil, device, nil, &capturedImageTextureCache) == kCVReturnSuccess else { return nil }

        var mtlTexture: MTLTexture? = nil
        let width = CVPixelBufferGetWidthOfPlane(pixelBuffer, planeIndex)
        let height = CVPixelBufferGetHeightOfPlane(pixelBuffer, planeIndex)

        var texture: CVMetalTexture? = nil
        let status = CVMetalTextureCacheCreateTextureFromImage(nil, capturedImageTextureCache!, pixelBuffer, nil, pixelFormat, width, height, planeIndex, &texture)
        if status == kCVReturnSuccess {
            mtlTexture = CVMetalTextureGetTexture(texture!)
        }

        return mtlTexture
    }

    func arFrameToRGB(frame: ARFrame) {

        let frameBuffer = frame.capturedImage

        CVPixelBufferLockBaseAddress(frameBuffer, CVPixelBufferLockFlags(rawValue: 0))

        // Extract Y and CbCr textures
        let capturedImageTextureY = createTexture(fromPixelBuffer: frameBuffer, pixelFormat: .r8Unorm, planeIndex: 0)!
        let capturedImageTextureCbCr = createTexture(fromPixelBuffer: frameBuffer, pixelFormat: .rg8Unorm, planeIndex: 1)!

        // create the RGBA texture
        let rgbaBufferWidth = CVPixelBufferGetWidthOfPlane(frameBuffer, 0)
        let rgbaBufferHeight = CVPixelBufferGetHeightOfPlane(frameBuffer, 0)
        if rgbaBuffer == nil {
            rgbaBuffer = device.makeBuffer(length: 4 * rgbaBufferWidth * rgbaBufferHeight, options: [])
        }

        let rgbaTextureDescriptor = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .rgba8Unorm, width: rgbaBufferWidth, height: rgbaBufferHeight, mipmapped: false)
        rgbaTextureDescriptor.usage = [.shaderWrite, .shaderRead]
        let rgbaTexture = rgbaBuffer?.makeTexture(descriptor: rgbaTextureDescriptor, offset: 0, bytesPerRow: 4 * rgbaBufferWidth)

        threadGroupSize = MTLSizeMake(4, 4, 1)
        threadGroupCount = MTLSizeMake((rgbaTexture!.width + threadGroupSize!.width - 1) / threadGroupSize!.width, (rgbaTexture!.height + threadGroupSize!.height - 1) / threadGroupSize!.height, 1)

        let yCbCrToRGBACommandBuffer = commandQueue.makeCommandBuffer()!
        let yCbCrToRGBAComputeEncoder = yCbCrToRGBACommandBuffer.makeComputeCommandEncoder()!
        yCbCrToRGBAComputeEncoder.setComputePipelineState(yCbCrToRgbPso)
        yCbCrToRGBAComputeEncoder.setTexture(capturedImageTextureY, index: Int(kTextureIndex_Y.rawValue))
        yCbCrToRGBAComputeEncoder.setTexture(capturedImageTextureCbCr, index: Int(kTextureIndex_CbCr.rawValue))
        yCbCrToRGBAComputeEncoder.setTexture(rgbaTexture, index: Int(kTextureIndex_RGBA.rawValue))
        yCbCrToRGBAComputeEncoder.dispatchThreadgroups(threadGroupCount!, threadsPerThreadgroup: threadGroupSize!)
        yCbCrToRGBAComputeEncoder.endEncoding()

        let rgbaToYCbCrCommandBuffer = commandQueue.makeCommandBuffer()!
        let rgbaToYCbCrComputeEncoder = rgbaToYCbCrCommandBuffer.makeComputeCommandEncoder()!
        rgbaToYCbCrComputeEncoder.setComputePipelineState(rgbaToYCbCrPso)
        rgbaToYCbCrComputeEncoder.setTexture(capturedImageTextureY, index: Int(kTextureIndex_Y.rawValue))
        rgbaToYCbCrComputeEncoder.setTexture(capturedImageTextureCbCr, index: Int(kTextureIndex_CbCr.rawValue))
        rgbaToYCbCrComputeEncoder.setTexture(rgbaTexture, index: Int(kTextureIndex_RGBA.rawValue))
        rgbaToYCbCrComputeEncoder.dispatchThreadgroups(threadGroupCount!, threadsPerThreadgroup: threadGroupSize!)
        rgbaToYCbCrComputeEncoder.endEncoding()

        yCbCrToRGBACommandBuffer.commit()
        rgbaToYCbCrCommandBuffer.commit()

        yCbCrToRGBACommandBuffer.waitUntilCompleted()
        rgbaToYCbCrCommandBuffer.waitUntilCompleted()

        CVPixelBufferUnlockBaseAddress(frameBuffer, CVPixelBufferLockFlags(rawValue: 0))
    }

The end goal is to use Metal shaders to do image processing on the RGBA texture and eventually write the result back to the Y and CbCr textures for display on the screen.

Here are the parts I am unsure about:

  1. How do I write data in the correct format to these textures given that the type for the textures in the kernel function is texture2d<float, access::write> but they have differing pixel formats?

  2. Is my rewrite of capturedImageFragmentShader from the Displaying an AR Experience with Metal sample as a compute shader as straightforward as I thought, or am I missing something there?

A while ago I wrote a Metal texture viewer that does something pretty close to what you're looking for here, check it out: github.com/eldade/EEMetalTextureViewer (the shader in question is here: github.com/eldade/EEMetalTextureViewer/blob/master/…). There is also a sample program that grabs YCbCr data from the camera and converts it live. – ldoogy
That's not what I'm looking for; you only have shaders for converting from YCbCr -> RGB. – Dalton Sweeney
Just glancing at this, I noticed that you don't specify the coordinate space of your samplers. What happens if you add coord::pixel to the front of your sampler constructors (e.g., constexpr sampler colorSampler(coord::pixel, ...))? I ask because it looks like the fragment function is using normalized coordinates provided by the rasterizer, but you're using work item indices that presumably correspond 1:1 with pixel coordinates. – warrenm
Try dividing all the values by 255 in the rgbaToYCbCrKernel kernel. – Hamid Yusifli
There are quite a few issues that need to be dealt with to get RGB -> BT.709 -> RGB implemented properly. For example, your code does not convert to linear light when moving to YCbCr. There are also very tricky scaling issues with linear light in the decode stage. If you are interested, here is an example project that does it properly (though the RGB -> YCbCr part is not in Metal): github.com/mdejong/MetalBT709Decoder – MoDJ

1 Answer

0 votes

I just had to implement the same thing. Your first issue is a confusion between the values stored in the texture buffer and how those values are presented inside the Metal kernel. As is typical in GPU shaders, when integer pixel data is accessed as float, the values are normalized to [0,1] on read and scaled back to [0,MaxIntValue] on write. For Metal this conversion is documented on page 228 of https://developer.apple.com/metal/Metal-Shading-Language-Specification.pdf, section 7.7.1.1 "Converting Normalized Integer Pixel Data Types to Floating-Point Values".

So, for example, if the texture format of the Y channel is .r8Unorm, the data is stored as 1 byte per pixel with values from 0 to 255, but once accessed in the kernel via texture2d<float> the values are in [0,1], and when you write to such a texture they are automatically scaled back to [0,255]. So inside your kernel you should assume you are dealing with values in [0,1], not [0,255], and adjust your transforms accordingly.
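For example, the BT.601 "studio swing" formulas quoted in the question can be kept if every 0-255 constant is also rescaled by 1/255 (this is what Hamid Yusifli's comment is getting at). A minimal sketch of just the math, not what I ended up using, since as explained below Apple's sample actually follows the full-range JPEG variant:

// rgb is the sampled color, already normalized to [0,1] by the texture read.
// Each 0-255 constant from the Wikipedia formulas is divided by 255.
float Y  =  16.0/255.0 + ( 65.481 * rgb.r + 128.553 * rgb.g +  24.966 * rgb.b) / 255.0;
float Cb = 128.0/255.0 + (-37.797 * rgb.r -  74.203 * rgb.g + 112.0   * rgb.b) / 255.0;
float Cr = 128.0/255.0 + (112.0   * rgb.r -  93.786 * rgb.g -  18.214 * rgb.b) / 255.0;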

The second issue is the RGBA to YCbCr transform itself. Assuming that the sample code from Apple is correct, we can see that it follows the JPEG convention given at the end of the Wikipedia page: the coefficients match exactly if you replace 128 by 128/255 = 0.5 and put everything in matrix form. The extra subtlety is that matrices are initialized in column-major order in Metal, so the corresponding math operation reads:

       |+1.     +0.     +1.402  -0.701 |   |Y |
       |+1.     -0.3441 -0.7141 +0.5291|   |Cb|
RGBA = |+1.     +1.772  +0.     -0.886 | . |Cr|
       |+0.     +0.     +0.     +1.    |   |1 |
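Written out per channel, this is just the familiar full-range (JPEG) decode, which is a handy sanity check:

R = Y + 1.402 * (Cr - 0.5)
G = Y - 0.3441 * (Cb - 0.5) - 0.7141 * (Cr - 0.5)
B = Y + 1.772 * (Cb - 0.5)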

Next, what you need is the inverse transform. You can find it in the same JPEG section of the Wikipedia page (again replacing 128 by 0.5), or, if you want to keep the same matrix form, you can simply compute the inverse of the 4x4 matrix and use that. This is what I did, and after putting it back into column-major order I got:

const float4x4 rgbaToYcbcrTransform = float4x4(
   float4(+0.2990, -0.1687, +0.5000, +0.0000),
   float4(+0.5870, -0.3313, -0.4187, +0.0000),
   float4(+0.1140, +0.5000, -0.0813, +0.0000),
   float4(+0.0000, +0.5000, +0.5000, +1.0000)
);
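In the same row-major notation as above, this corresponds to the JPEG encode with 128 replaced by 0.5:

|Y |   |+0.299  +0.587  +0.114  +0. |   |R|
|Cb|   |-0.1687 -0.3313 +0.5    +0.5|   |G|
|Cr| = |+0.5    -0.4187 -0.0813 +0.5| . |B|
|1 |   |+0.     +0.     +0.     +1. |   |1|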

Then, adapting your kernel code, something like this should work (I did not test this exact code; my texture layout is slightly different):

// Ignore alpha as we can't convert it, just set it to 1.
float3 rgb = rgbaTexture.sample(colorSampler, float2(gid)).rgb;
float4 ycbcr = rgbaToYcbcrTransform * float4(rgb, 1.0);    
yTexture.write(ycbcr[0], gid);
cbCrTexture.write(float4(ycbcr[1], ycbcr[2], 0, 0), gid);
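For reference, here is what the complete kernel might look like. This is an untested sketch: it keeps the kTextureIndex_* bindings from the question, but uses read() with the thread position instead of a sampler (as warrenm points out in the comments, sampling at float2(gid) with a normalized-coordinate sampler reads the wrong texels), and it handles the 4:2:0 chroma subsampling of the capturedImage buffer only naively:

kernel void rgbaToYCbCrKernel(texture2d<float, access::write> yTexture [[ texture(kTextureIndex_Y) ]],
                              texture2d<float, access::write> cbCrTexture [[ texture(kTextureIndex_CbCr) ]],
                              texture2d<float, access::read> rgbaTexture [[ texture(kTextureIndex_RGBA) ]],
                              uint2 gid [[ thread_position_in_grid ]])
{
    const float4x4 rgbaToYcbcrTransform = float4x4(
        float4(+0.2990, -0.1687, +0.5000, +0.0000),
        float4(+0.5870, -0.3313, -0.4187, +0.0000),
        float4(+0.1140, +0.5000, -0.0813, +0.0000),
        float4(+0.0000, +0.5000, +0.5000, +1.0000)
    );

    // read() takes integer texel coordinates directly, so no sampler is needed.
    // Ignore alpha as we can't convert it; just set the w component to 1.
    float3 rgb = rgbaTexture.read(gid).rgb;
    float4 ycbcr = rgbaToYcbcrTransform * float4(rgb, 1.0);

    yTexture.write(ycbcr.x, gid);
    // The CbCr plane of the 4:2:0 capturedImage is half the size of the
    // Y plane, so naively write each thread's chroma at gid / 2. (A more
    // careful implementation would average the four RGB texels that map
    // to each chroma texel.)
    cbCrTexture.write(float4(ycbcr.y, ycbcr.z, 0.0, 0.0), gid / 2);
}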