The app detects specific 2D images (with ARKit) and has an mlmodel (of type Object Detection) that detects certain pieces of furniture; the model is trained and works. Depending on what is detected, I need to add different 3D objects to the scene.
I created an AR session with ARWorldTrackingConfiguration, and I can detect the 2D image. In renderer(_:didAdd:for:) I add the 3D object, and it works perfectly:
override func viewDidAppear(_ animated: Bool) {
    super.viewDidAppear(animated)

    guard let referenceImages = ARReferenceImage.referenceImages(inGroupNamed: "AR Resources", bundle: nil) else {
        fatalError("Missing expected asset catalog resources.")
    }

    let configuration = ARWorldTrackingConfiguration()
    configuration.worldAlignment = .gravityAndHeading
    configuration.detectionImages = referenceImages
    configuration.maximumNumberOfTrackedImages = 1
    configuration.isAutoFocusEnabled = false

    sceneView.session.run(configuration, options: [.resetTracking, .removeExistingAnchors])
}
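For reference, this is roughly how the image anchor is handled; the scene and node names below are simplified placeholders, not the real asset names from the project:

func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {
    // Only react to detected reference images.
    guard anchor is ARImageAnchor else { return }
    // Load the 3D object that corresponds to the detected image and attach it
    // to the node ARKit created for the anchor (asset name is a placeholder).
    if let objectScene = SCNScene(named: "art.scnassets/object.scn"),
       let objectNode = objectScene.rootNode.childNodes.first {
        node.addChildNode(objectNode)
    }
}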
I also set up the ML model:
override func viewDidLoad() {
    super.viewDidLoad()
    sceneView.delegate = self
    sceneView.session.delegate = self
    setupML()
}
internal func setupML() {
    guard let modelPath = Bundle.main.url(forResource: "furnituresDetector", withExtension: "mlmodelc") else {
        fatalError("Missing model")
    }
    do {
        let coreMLModel = try VNCoreMLModel(for: MLModel(contentsOf: modelPath))
        let request = VNCoreMLRequest(model: coreMLModel) { [weak self] (request, error) in
            DispatchQueue.main.async {
                if let results = request.results {
                    print(results.count)
                }
            }
        }
        self.requests = [request]
    } catch {
        print("Core ML model error: \(error)")
    }
}
For the moment I just want to print the number of results to see whether the ML model detects something or not.
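Later, since it is an object detection model, I expect the results to be VNRecognizedObjectObservation instances, so reading the detections inside that completion handler should look roughly like this sketch (the label handling is just an example):

// Sketch only: cast the results and read the top label of each detection.
if let observations = request.results as? [VNRecognizedObjectObservation] {
    for observation in observations {
        guard let topLabel = observation.labels.first else { continue }
        print("Detected \(topLabel.identifier) with confidence \(topLabel.confidence)")
        // Depending on the identifier, I would choose which 3D object to place.
    }
}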
Up to this point everything works perfectly: I run the app and the camera feed is fluid. Instead of instantiating a new camera session, I reuse the session already started by the ARSCNView, as suggested in Combining CoreML and ARKit.
So my solution was to use session(_:didUpdate:) to perform the request to the Core ML model and continuously know whether the model has detected something visible in the camera.
func session(_ session: ARSession, didUpdate frame: ARFrame) {
    DispatchQueue(label: "CoreML_request").async {
        guard let pixelBuffer = session.currentFrame?.capturedImage else {
            return
        }
        let exifOrientation = self.exifOrientationFromDeviceOrientation()
        let imageRequestHandler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: exifOrientation, options: [:])
        do {
            try imageRequestHandler.perform(self.requests)
        } catch {
            print(error)
        }
    }
}
If I run the app it works, but the camera feed becomes very slow; if I delete the code inside session(_:didUpdate:), the camera is fluid again. So the problem is here. I suppose this is not the proper place to make the request, because this method is called every time the camera produces a new frame. But I don't know where else to do the request, or what to do instead. Do you have any idea?
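One idea I am considering (I am not sure it is the right approach) is to create the background queue only once and skip frames while a previous request is still running, so at most one Vision request is in flight at a time. A rough sketch, where the property names are mine and not from the project:

// Created once, instead of a new queue per frame.
private let visionQueue = DispatchQueue(label: "CoreML_request")
private var isProcessingFrame = false

func session(_ session: ARSession, didUpdate frame: ARFrame) {
    // Drop this frame if a request is still running.
    guard !isProcessingFrame else { return }
    isProcessingFrame = true

    let pixelBuffer = frame.capturedImage
    let exifOrientation = self.exifOrientationFromDeviceOrientation()

    visionQueue.async {
        // Allow the next frame to be processed once this request finishes.
        defer { DispatchQueue.main.async { self.isProcessingFrame = false } }
        let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: exifOrientation, options: [:])
        do {
            try handler.perform(self.requests)
        } catch {
            print(error)
        }
    }
}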
I will update this if I find a solution. Thanks!