Apple has a useful tutorial called Displaying an AR Experience with Metal that shows you how to extract the Y and CbCr textures from an ARFrame's capturedImage property and convert them to RGB for rendering. However, I've run into problems trying to take an RGBA texture and perform the inverse operation, i.e. converting back to the Y and CbCr textures.
I rewrote the fragment shader from the tutorial as a compute shader that writes to an RGBA texture I created from a Metal buffer:
// Same as capturedImageFragmentShader, but as a kernel function instead
kernel void yCbCrToRgbKernel(texture2d<float, access::sample> yTexture    [[ texture(kTextureIndex_Y) ]],
                             texture2d<float, access::sample> cbCrTexture [[ texture(kTextureIndex_CbCr) ]],
                             texture2d<float, access::write>  rgbaTexture [[ texture(kTextureIndex_RGBA) ]],
                             uint2 gid [[ thread_position_in_grid ]])
{
    constexpr sampler colorSampler(mip_filter::linear, mag_filter::linear, min_filter::linear);
    const float4x4 ycbcrToRGBTransform = float4x4(
        float4(+1.0000f, +1.0000f, +1.0000f, +0.0000f),
        float4(+0.0000f, -0.3441f, +1.7720f, +0.0000f),
        float4(+1.4020f, -0.7141f, +0.0000f, +0.0000f),
        float4(-0.7010f, +0.5291f, -0.8860f, +1.0000f)
    );
    float4 ycbcr = float4(yTexture.sample(colorSampler, float2(gid)).r,
                          cbCrTexture.sample(colorSampler, float2(gid)).rg,
                          1.0);
    float4 result = ycbcrToRGBTransform * ycbcr;
    rgbaTexture.write(result, ushort2(gid));
}
I tried to write a second compute shader to perform the reverse operation, calculating the Y, Cb, and Cr values using the conversion formulae found on the YCbCr Wikipedia page:
kernel void rgbaToYCbCrKernel(texture2d<float, access::write>  yTexture    [[ texture(kTextureIndex_Y) ]],
                              texture2d<float, access::write>  cbCrTexture [[ texture(kTextureIndex_CbCr) ]],
                              texture2d<float, access::sample> rgbaTexture [[ texture(kTextureIndex_RGBA) ]],
                              uint2 gid [[ thread_position_in_grid ]])
{
    constexpr sampler colorSampler(mip_filter::linear, mag_filter::linear, min_filter::linear);
    float4 rgba = rgbaTexture.sample(colorSampler, float2(gid)).rgba;
    // Conversion formulae from https://en.wikipedia.org/wiki/YCbCr#ITU-R_BT.601_conversion
    // (the constants below are the 8-bit BT.601 ones)
    float Y  =  16.0 + ( 65.481 * rgba.r + 128.553 * rgba.g +  24.966 * rgba.b);
    float Cb = 128.0 + (-37.797 * rgba.r -  74.203 * rgba.g + 112.0   * rgba.b);
    float Cr = 128.0 + (112.0   * rgba.r -  93.786 * rgba.g -  18.214 * rgba.b);
    yTexture.write(Y, gid);
    cbCrTexture.write(float4(Cb, Cr, 0, 0), gid); // this probably is not correct...
}
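One thing that occurred to me: those constants come from the 8-bit forms of the formulas (Y in [16, 235], Cb/Cr in [16, 240]), while rgba here is sampled as normalized [0, 1] floats and the Y/CbCr targets are UNorm formats that clamp writes to [0, 1]. A sketch of the same formulas rescaled by 1/255, which is my guess at what the UNorm textures actually expect:
// Guess: rescale the 8-bit formulas so the results stay within [0, 1]
// before being written to the .r8Unorm / .rg8Unorm textures.
float Y  = ( 16.0 + ( 65.481 * rgba.r + 128.553 * rgba.g +  24.966 * rgba.b)) / 255.0;
float Cb = (128.0 + (-37.797 * rgba.r -  74.203 * rgba.g + 112.0   * rgba.b)) / 255.0;
float Cr = (128.0 + (112.0   * rgba.r -  93.786 * rgba.g -  18.214 * rgba.b)) / 255.0;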
My problem is how to write data to these textures correctly. I know it is incorrect because the resulting display is a solid pink color; the expected result is obviously the original, unmodified display.
The pixel formats for the Y, CbCr, and RGBA textures are .r8Unorm, .rg8Unorm, and .rgba8Unorm respectively.
Here is my Swift code for setting up the textures and executing the shaders:
private func createTexture(fromPixelBuffer pixelBuffer: CVPixelBuffer, pixelFormat: MTLPixelFormat, planeIndex: Int) -> MTLTexture? {
    guard CVMetalTextureCacheCreate(kCFAllocatorSystemDefault, nil, device, nil, &capturedImageTextureCache) == kCVReturnSuccess else { return nil }
    var mtlTexture: MTLTexture? = nil
    let width = CVPixelBufferGetWidthOfPlane(pixelBuffer, planeIndex)
    let height = CVPixelBufferGetHeightOfPlane(pixelBuffer, planeIndex)
    var texture: CVMetalTexture? = nil
    let status = CVMetalTextureCacheCreateTextureFromImage(nil, capturedImageTextureCache!, pixelBuffer, nil, pixelFormat, width, height, planeIndex, &texture)
    if status == kCVReturnSuccess {
        mtlTexture = CVMetalTextureGetTexture(texture!)
    }
    return mtlTexture
}
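A side note on createTexture: it recreates the texture cache on every call. My assumption is that CVMetalTextureCacheCreate only needs to run once and the cache can then be reused across frames, e.g.:
// Sketch: create the texture cache once and reuse it for every texture.
if capturedImageTextureCache == nil {
    CVMetalTextureCacheCreate(kCFAllocatorSystemDefault, nil, device, nil, &capturedImageTextureCache)
}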
func arFrameToRGB(frame: ARFrame) {
    let frameBuffer = frame.capturedImage
    CVPixelBufferLockBaseAddress(frameBuffer, CVPixelBufferLockFlags(rawValue: 0))

    // Extract the Y and CbCr textures from the captured image's planes
    let capturedImageTextureY = createTexture(fromPixelBuffer: frameBuffer, pixelFormat: .r8Unorm, planeIndex: 0)!
    let capturedImageTextureCbCr = createTexture(fromPixelBuffer: frameBuffer, pixelFormat: .rg8Unorm, planeIndex: 1)!

    // Create the RGBA texture, backed by a Metal buffer sized to the full-resolution (Y) plane
    let rgbaBufferWidth = CVPixelBufferGetWidthOfPlane(frameBuffer, 0)
    let rgbaBufferHeight = CVPixelBufferGetHeightOfPlane(frameBuffer, 0)
    if rgbaBuffer == nil {
        rgbaBuffer = device.makeBuffer(length: 4 * rgbaBufferWidth * rgbaBufferHeight, options: [])
    }
    let rgbaTextureDescriptor = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .rgba8Unorm, width: rgbaBufferWidth, height: rgbaBufferHeight, mipmapped: false)
    rgbaTextureDescriptor.usage = [.shaderWrite, .shaderRead]
    let rgbaTexture = rgbaBuffer?.makeTexture(descriptor: rgbaTextureDescriptor, offset: 0, bytesPerRow: 4 * rgbaBufferWidth)

    // One threadgroup per 4x4 tile, with the count rounded up to cover the whole texture
    threadGroupSize = MTLSizeMake(4, 4, 1)
    threadGroupCount = MTLSizeMake((rgbaTexture!.width + threadGroupSize!.width - 1) / threadGroupSize!.width,
                                   (rgbaTexture!.height + threadGroupSize!.height - 1) / threadGroupSize!.height,
                                   1)

    // First pass: YCbCr -> RGBA
    let yCbCrToRGBACommandBuffer = commandQueue.makeCommandBuffer()!
    let yCbCrToRGBAComputeEncoder = yCbCrToRGBACommandBuffer.makeComputeCommandEncoder()!
    yCbCrToRGBAComputeEncoder.setComputePipelineState(yCbCrToRgbPso)
    yCbCrToRGBAComputeEncoder.setTexture(capturedImageTextureY, index: Int(kTextureIndex_Y.rawValue))
    yCbCrToRGBAComputeEncoder.setTexture(capturedImageTextureCbCr, index: Int(kTextureIndex_CbCr.rawValue))
    yCbCrToRGBAComputeEncoder.setTexture(rgbaTexture, index: Int(kTextureIndex_RGBA.rawValue))
    yCbCrToRGBAComputeEncoder.dispatchThreadgroups(threadGroupCount!, threadsPerThreadgroup: threadGroupSize!)
    yCbCrToRGBAComputeEncoder.endEncoding()

    // Second pass: RGBA -> YCbCr, writing back into the captured image's planes
    let rgbaToYCbCrCommandBuffer = commandQueue.makeCommandBuffer()!
    let rgbaToYCbCrComputeEncoder = rgbaToYCbCrCommandBuffer.makeComputeCommandEncoder()!
    rgbaToYCbCrComputeEncoder.setComputePipelineState(rgbaToYCbCrPso)
    rgbaToYCbCrComputeEncoder.setTexture(capturedImageTextureY, index: Int(kTextureIndex_Y.rawValue))
    rgbaToYCbCrComputeEncoder.setTexture(capturedImageTextureCbCr, index: Int(kTextureIndex_CbCr.rawValue))
    rgbaToYCbCrComputeEncoder.setTexture(rgbaTexture, index: Int(kTextureIndex_RGBA.rawValue))
    rgbaToYCbCrComputeEncoder.dispatchThreadgroups(threadGroupCount!, threadsPerThreadgroup: threadGroupSize!)
    rgbaToYCbCrComputeEncoder.endEncoding()

    // Commit both passes in order and wait so the CPU can safely unlock the pixel buffer
    yCbCrToRGBACommandBuffer.commit()
    rgbaToYCbCrCommandBuffer.commit()
    yCbCrToRGBACommandBuffer.waitUntilCompleted()
    rgbaToYCbCrCommandBuffer.waitUntilCompleted()

    CVPixelBufferUnlockBaseAddress(frameBuffer, CVPixelBufferLockFlags(rawValue: 0))
}
The end goal is to use Metal shaders to do image processing on the RGBA texture and eventually write the result back to the Y and CbCr textures for display on the screen.
Here are the parts I am unsure about:
1. How do I write data in the correct format to these textures, given that the textures in the kernel function are all typed texture2d<float, access::write> but have differing pixel formats? (My current assumption is sketched after this list.)
2. Is my rewrite of capturedImageFragmentShader from Displaying an AR Experience with Metal as a compute shader as simple as I thought, or am I missing something there?
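For question 1, my working assumption (which may be wrong) is that Metal keeps only the components the pixel format actually stores and discards the rest, so the writes would look like:
// Assumption: on write, .r8Unorm keeps only .x and .rg8Unorm keeps only .xy
// of the written float4; the remaining components are ignored.
yTexture.write(float4(Y, 0.0, 0.0, 0.0), gid);
cbCrTexture.write(float4(Cb, Cr, 0.0, 0.0), gid);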
A comment from warrenm: Have you tried adding coord::pixel to the front of your sampler constructors (e.g. constexpr sampler colorSampler(coord::pixel, ...))? I ask because it looks like the fragment function is using normalized coordinates provided by the rasterizer, but you're using work item indices that presumably correspond 1:1 with pixel coordinates.