I'm doing realtime video processing on iOS at 120 fps and want to first preprocess the image on the GPU (downsampling, color conversion, etc., which are not fast enough on the CPU) and then postprocess the frame on the CPU using OpenCV.
What's the fastest way to share the camera feed between the GPU and the CPU using Metal?
In other words, the pipeline would look like:
CMSampleBufferRef -> MTLTexture or MTLBuffer -> OpenCV Mat
I'm converting CMSampleBufferRef -> MTLTexture the following way:

CVPixelBufferRef pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);

// textureBGRA: wrap the camera pixel buffer in a Metal texture via the texture cache (no copy)
{
    size_t width = CVPixelBufferGetWidth(pixelBuffer);
    size_t height = CVPixelBufferGetHeight(pixelBuffer);
    MTLPixelFormat pixelFormat = MTLPixelFormatBGRA8Unorm;

    CVMetalTextureRef texture = NULL;
    CVReturn status = CVMetalTextureCacheCreateTextureFromImage(NULL, _textureCache, pixelBuffer, NULL, pixelFormat, width, height, 0, &texture);
    if (status == kCVReturnSuccess) {
        textureBGRA = CVMetalTextureGetTexture(texture);
        CFRelease(texture);
    }
}
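(For completeness, _textureCache above is a CVMetalTextureCacheRef created once during setup, roughly like this; device is assumed to be the id<MTLDevice> I use for rendering:)

// Created once, e.g. at init time
CVMetalTextureCacheRef _textureCache;
CVReturn err = CVMetalTextureCacheCreate(kCFAllocatorDefault, NULL, device, NULL, &_textureCache);
if (err != kCVReturnSuccess) {
    NSLog(@"Failed to create Metal texture cache: %d", (int)err);
}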
After my Metal shader is finished, I convert the MTLTexture to an OpenCV Mat:

cv::Mat image;
...
CGSize imageSize = CGSizeMake(drawable.texture.width, drawable.texture.height);
int imageByteCount = int(imageSize.width * imageSize.height * 4);
int mbytesPerRow = 4 * int(imageSize.width);
MTLRegion region = MTLRegionMake2D(0, 0, int(imageSize.width), int(imageSize.height));

// GPU -> CPU readback: copy the texture contents into the Mat's pixel buffer
// (image must already be allocated to match the texture; see the note below)
[drawable.texture getBytes:image.data bytesPerRow:mbytesPerRow fromRegion:region mipmapLevel:0];
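(image.data has to point at a buffer of the right size before getBytes: is called; one way to arrange that, assuming the BGRA8 texture from before, is to allocate the Mat once with matching dimensions and reuse it each frame:)

// Allocate once and reuse; CV_8UC4 matches MTLPixelFormatBGRA8Unorm (4 bytes per pixel)
cv::Mat image((int)drawable.texture.height, (int)drawable.texture.width, CV_8UC4);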
Some observations:
1) Unfortunately, MTLTexture.getBytes seems expensive (copying data from GPU to CPU?) and takes around 5 ms on my iPhone 5S, which is too much when processing at ~100 fps.
2) I noticed some people use MTLBuffer instead of MTLTexture with the following method:
metalDevice.newBufferWithLength(byteCount, options: .StorageModeShared)
(see: Memory write performance - GPU CPU Shared Memory). A rough sketch of this approach is below.
However, the CMSampleBufferRef and the accompanying CVPixelBufferRef are managed by CoreVideo, I guess.
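To make observation 2 concrete, here is a rough, unmeasured sketch of what the shared-MTLBuffer route could look like (device, commandBuffer, processedTexture, width and height stand in for your own objects): have the GPU blit the processed frame into a buffer allocated with MTLResourceStorageModeShared, then wrap that same memory in a cv::Mat on the CPU without another copy.

size_t bytesPerRow = 4 * width;                          // BGRA8: 4 bytes per pixel
id<MTLBuffer> sharedBuffer =
    [device newBufferWithLength:bytesPerRow * height
                        options:MTLResourceStorageModeShared];

// GPU side: copy the processed texture into the shared buffer
id<MTLBlitCommandEncoder> blit = [commandBuffer blitCommandEncoder];
[blit copyFromTexture:processedTexture
          sourceSlice:0
          sourceLevel:0
         sourceOrigin:MTLOriginMake(0, 0, 0)
           sourceSize:MTLSizeMake(width, height, 1)
             toBuffer:sharedBuffer
    destinationOffset:0
destinationBytesPerRow:bytesPerRow
destinationBytesPerImage:bytesPerRow * height];
[blit endEncoding];
[commandBuffer commit];
[commandBuffer waitUntilCompleted];                      // or addCompletedHandler: to avoid blocking

// CPU side: no extra memcpy, the Mat just points into the shared storage
cv::Mat image((int)height, (int)width, CV_8UC4, sharedBuffer.contents, bytesPerRow);

Whether the blit plus the synchronization actually ends up cheaper than getBytes at this frame rate is exactly what I'm not sure about.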