
I'm doing real-time video processing on iOS at 120 fps and want to first preprocess the image on the GPU (downsampling, color conversion, etc., which are not fast enough on the CPU) and later postprocess the frame on the CPU using OpenCV.

What's the fastest way to share camera feed between GPU and CPU using Metal?

In other words the pipeline would look like:

CMSampleBufferRef -> MTLTexture or MTLBuffer -> OpenCV Mat

I'm converting CMSampleBufferRef -> MTLTexture the following way:

CVPixelBufferRef pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);

// textureBGRA
{
    size_t width = CVPixelBufferGetWidth(pixelBuffer);
    size_t height = CVPixelBufferGetHeight(pixelBuffer);
    MTLPixelFormat pixelFormat = MTLPixelFormatBGRA8Unorm;

    CVMetalTextureRef texture = NULL;
    CVReturn status = CVMetalTextureCacheCreateTextureFromImage(NULL, _textureCache, pixelBuffer, NULL, pixelFormat, width, height, 0, &texture);
    if(status == kCVReturnSuccess) {
        textureBGRA = CVMetalTextureGetTexture(texture);
        CFRelease(texture);
    }
}
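
Here _textureCache is assumed to be a CVMetalTextureCacheRef created once during setup, roughly like this (_device being the id<MTLDevice>; this setup code is not part of the original snippet):

// One-time setup of the CoreVideo / Metal texture cache (assumed, not shown above)
CVReturn err = CVMetalTextureCacheCreate(kCFAllocatorDefault, NULL, _device, NULL, &_textureCache);
if (err != kCVReturnSuccess) {
    NSLog(@"Failed to create Metal texture cache: %d", err);
}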

After my Metal shader has finished I convert the MTLTexture to an OpenCV Mat:

cv::Mat image;
...
CGSize imageSize = CGSizeMake(drawable.texture.width, drawable.texture.height);
int imageByteCount = int(imageSize.width * imageSize.height * 4);
int mbytesPerRow = 4 * int(imageSize.width);

MTLRegion region = MTLRegionMake2D(0, 0, int(imageSize.width), int(imageSize.height));
[drawable.texture getBytes:image.data bytesPerRow:mbytesPerRow fromRegion:region mipmapLevel:0];
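
For getBytes to write into valid memory, the Mat has to be allocated beforehand with matching dimensions and a 4-channel 8-bit type (presumably done in the elided code); for example:

// Allocate (or reuse) a buffer of imageByteCount bytes that getBytes: can write into
image.create(int(imageSize.height), int(imageSize.width), CV_8UC4);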

Some observations:

1) Unfortunately MTLTexture.getBytes seems expensive (copying data from the GPU to the CPU?) and takes around 5 ms on my iPhone 5S, which is too much when processing at ~100 fps

2) I noticed some people use MTLBuffer instead of MTLTexture with the following method: metalDevice.newBufferWithLength(byteCount, options: .StorageModeShared) (see: Memory write performance - GPU CPU Shared Memory)

However, the CMSampleBufferRef and the accompanying CVPixelBufferRef are managed by CoreVideo, I guess.

The GPU is not supported for all resolutions. I know it's not an answer to your question; I just wanted to give some information about the GPU. – HariKrishnan.P
Have you tried GPUImage? github.com/BradLarson/GPUImage – Sunil Sharma
I tried GPUImage but the biggest bottleneck is transferring data from GPU to CPU. GPUImage uses OpenGL under the hood and, unlike the Metal API, cannot have shared memory. – pzo
I would look for a way to do the OpenCV work on the GPU too. Some parts of OpenCV are available in MetalPerformanceShaders.framework, mostly the image processing stuff. iOS 10 adds convolutional neural networks. If you need other operators, file a feature-request bug with Apple. – Ian Ollmann
I am trying to apply a simple vignette filter to a live camera feed using Metal. The results are pretty slow and laggy; please check this if you can tell me what is missing: stackoverflow.com/q/53898780/1364053 – nr5

1 Answer


The fastest way to do this is to use an MTLTexture backed by an MTLBuffer; it is a special kind of MTLTexture that shares memory with an MTLBuffer. However, your CPU processing (OpenCV) will run a frame or two behind. This is unavoidable, since you need to submit the commands to the GPU (encoding) and the GPU needs to render them; using waitUntilCompleted to make sure the GPU is finished just chews up the CPU and is wasteful.

So the process would be: first you create the MTLBuffer, then you use the MTLBuffer method "newTextureWithDescriptor:offset:bytesPerRow:" to create the special MTLTexture. You need to create the special MTLTexture beforehand (as an instance variable). Then you set up a standard rendering pipeline (faster than using compute shaders) that takes the MTLTexture created from the CMSampleBufferRef and renders into your special MTLTexture; in that pass you can downscale and do any colour conversion as necessary. Then you submit the command buffer to the GPU, and in a subsequent pass you can just call [theMTLbuffer contents] to grab the pointer to the bytes that back your special MTLTexture for use in OpenCV.
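
A rough sketch of that setup (sizes and names here are illustrative, error handling omitted; depending on the device/OS you may need a compute or blit pass instead of rendering directly into a buffer-backed texture):

// One-time setup: a shared-storage MTLBuffer and a texture that aliases its memory
NSUInteger width = 640, height = 360;        // your downscaled output size (example values)
NSUInteger bytesPerRow = width * 4;          // BGRA8 = 4 bytes per pixel

id<MTLBuffer> sharedBuffer = [device newBufferWithLength:bytesPerRow * height
                                                 options:MTLResourceStorageModeShared];

MTLTextureDescriptor *desc =
    [MTLTextureDescriptor texture2DDescriptorWithPixelFormat:MTLPixelFormatBGRA8Unorm
                                                        width:width
                                                       height:height
                                                    mipmapped:NO];
desc.usage = MTLTextureUsageRenderTarget | MTLTextureUsageShaderRead;

// The "special" MTLTexture: anything rendered into it lands in sharedBuffer
id<MTLTexture> backedTexture = [sharedBuffer newTextureWithDescriptor:desc
                                                               offset:0
                                                          bytesPerRow:bytesPerRow];

// Later, once the command buffer that rendered into backedTexture has completed,
// wrap the same memory in a cv::Mat with no copy:
cv::Mat frame((int)height, (int)width, CV_8UC4, [sharedBuffer contents], bytesPerRow);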

Any technique that forces a halt in the CPU/GPU pipeline will never be efficient, as half the time will be spent waiting: the CPU waits for the GPU to finish, and the GPU also has to wait for the next encodings. When the GPU is working you want the CPU to be encoding the next frame and doing any OpenCV work, rather than waiting for the GPU to finish.
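
One way to keep both sides busy (assuming the sharedBuffer and backedTexture from the sketch above) is to let the command buffer notify you when the GPU is done instead of blocking on it:

// Encode the preprocessing pass that renders into backedTexture, then:
[commandBuffer addCompletedHandler:^(id<MTLCommandBuffer> cb) {
    // Runs on a Metal-owned thread once the GPU has finished this frame;
    // the CPU has been free to encode the next frame in the meantime
    cv::Mat frame((int)height, (int)width, CV_8UC4, [sharedBuffer contents], bytesPerRow);
    // ... do the OpenCV postprocessing here, or hand it off to your own queue ...
}];
[commandBuffer commit];

In practice you would rotate between two or three such buffers, so the GPU can already be writing the next frame while the CPU is still reading the previous one.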

Also, when people refer to real-time processing they usually mean processing with real-time (visual) feedback. All modern iOS devices from the 4S up have a 60 Hz screen refresh rate, so any feedback presented faster than that is pointless; but if you need two frames (at 120 Hz) to make one (at 60 Hz), then you have to use a custom timer or modify CADisplayLink.
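
For example, one simple (purely illustrative) way to pair two 120 Hz camera frames with one 60 Hz display update is to present only every other processed frame; the method names below are placeholders:

// In captureOutput:didOutputSampleBuffer:fromConnection: (called at 120 fps)
self.frameIndex += 1;
[self encodePreprocessForFrame:sampleBuffer];   // GPU + OpenCV work, every frame
if (self.frameIndex % 2 == 0) {
    [self presentLatestDrawable];               // visual feedback, ~60 Hz
}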